When talking online, how do we know that the person we’re talking to is who we think they are? Can we tell if a piece of writing has been changed by a different author?
These questions are at the heart of the latest national security challenge launched by HMGCC Co-Creation, which is seeking technology solutions grounded in forensic linguistics to help verify the identity of online authors. A key requirement is to automate processes with explainable and defensible decisions.
HMGCC Co-Creation will provide funding for time, materials, overheads and other indirect expenses for applicants successful in phase 2 of the competition.
Technology themes
Artificial intelligence, behavioural and social sciences, communication systems, data science and engineering, machine learning, software development.
Key Information
HMGCC Co-Creation will be hosting a two-stage competition process.
1) Phase 1. The objective is to rapidly assess brief proposals. Those unsuccessful will be informed with feedback. Successful applicants will be invited to phase 2. For further information please see How to apply – Phase 1.
2) Phase 2. Following a feedback phase, successful phase 1 applicants will be requested to submit a proposal directly to cocreation@hmgcc.gov.uk. For further information please see How to apply – Phase 2.
The challenge
Context of the challenge
HMGCC Co-Creation is launching a challenge on behalf of national security to find a solution that detects changes in authorship within online communications and provides a detailed explanation of the detected differences.
UK government departments, like many private sector organisations with a global reach, conduct significant communication online. Because these exchanges often take place only as online text, there is a need for assurance that the person being messaged is the intended recipient. Changes in writing style, motivation, mood and attitude can serve as clues to identity, and any such changes need to be understood in the context of national security concerns.
The gap
Specialists can analyse written online communication and may be able to spot inconsistencies across multiple messages from an individual, for example changes in linguistic style, tone, mood and motivations. However, this can be labour intensive and does not scale for many different online interactions.
It is believed that a machine can learn an individual's writing patterns, enabling automated anomaly detection. The machine's explanation of why a change has been detected can then be investigated further.
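As a minimal sketch of the idea, the snippet below builds a contact's baseline profile from character n-gram counts (a common stylometric feature), compares a new message against it with cosine similarity, and reports the most divergent n-grams as a crude explanation. The feature choice, threshold and function names are illustrative assumptions, not part of the challenge specification; a real solution would use richer features and a calibrated decision rule.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Count character n-grams, a common stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def flag_anomaly(baseline_messages, new_message, threshold=0.5):
    """Compare a new message against a contact's baseline profile.

    Returns (is_anomalous, similarity, diverging_ngrams); the last
    element lists the n-grams most over-represented in the new message
    relative to the baseline, a crude form of explanation. The
    threshold is illustrative and would need calibration.
    """
    baseline = Counter()
    for msg in baseline_messages:
        baseline.update(char_ngrams(msg))
    profile = char_ngrams(new_message)
    similarity = cosine_similarity(baseline, profile)
    b_total = sum(baseline.values()) or 1
    p_total = sum(profile.values()) or 1
    diverging = sorted(
        profile,
        key=lambda g: profile[g] / p_total - baseline[g] / b_total,
        reverse=True,
    )[:5]
    return similarity < threshold, similarity, diverging
```

A message written in the contact's usual register scores high similarity and is not flagged, while a message in a markedly different style scores low and is flagged along with the n-grams driving the difference.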
Example use case
Javier is an operational officer investigating serious and organised crime gangs. He often communicates with different suspected criminals to gain evidence.
These communications are mainly in an email-style layout, typically including a greeting, one or more paragraphs of text and a sign-off. They can be written in English or in foreign languages, including those using non-Latin scripts.
Javier is wary when dealing with the suspected criminals. To ensure everyone's safety, he rarely meets his contacts in person, but this means he may not know if he is being deceived.
From his first communication with each contact, Javier logs all of his interactions in a central repository. The number of communications received can vary considerably, with the minimum total length being about two short paragraphs. He spots subtle changes across communications from the same contact, but it is difficult for him to judge whether the changes are significant. He flags this to a central forensic linguistics team to analyse, and also to check whether two supposedly different contacts show common communication styles.
As Javier and his colleagues have multiple contacts in the criminal world, they would like a tool to automatically flag authorship inconsistencies.
Project scope
Applicants should aim to deliver a demonstrator in this 12-week project, to at least Technology Readiness Level (TRL) 6. A developed model is required that can be transferred to the sponsor for initial trials. Essential and desirable requirements are listed below, along with constraints and elements which would not be required for this challenge.
Essential requirements:
- Authorship analysis of the writing style of an online contact to detect changes over time, and to identify whether changes relate to new authors, additional authors, or use of generative AI.
- Ability to assess text in English and foreign languages, including those using non-Latin scripts.
- Ability to assess authorship of email-style layout, typically a greeting, one or more paragraphs and sign-off.
- Ability to provide a detailed explanation of why an authorship change has been detected.
- The solution is expected to use an n-tier architecture, with a minimum of two tiers: user interface and application. Additionally, the solution should be self-contained, or 'black box', with integration capabilities (for example APIs) to allow ingest of data, models and languages, and egress of outputs.
- Authentication and authorisation for roles, for example user and administrator.
- Model provided either in safetensors format or containerised with Docker or Kubernetes; likewise, the solution as a whole should be containerised or capable of being built and deployed as a container.
- Design patterns that allow deployment anywhere (on-premises, on cloud infrastructure, or in a data centre) and support offline operation.
- Good documentation, along with any example code used.
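The integration requirement above might, for instance, take the shape of a JSON ingest/egress contract between the user-interface tier and the application tier. The sketch below is one hypothetical illustration; all field names (`contact_id`, `authorship_change_detected`, and so on) are assumptions, not part of the challenge specification.

```python
import json

def handle_analysis_request(raw_request: str) -> str:
    """Application-tier entry point: ingest JSON, egress a JSON verdict.

    A real solution would invoke its trained authorship model here;
    this placeholder only shows the structure an explainable verdict
    might take.
    """
    request = json.loads(raw_request)
    verdict = {
        "contact_id": request["contact_id"],
        "message_count": len(request["messages"]),
        "authorship_change_detected": False,  # model output would go here
        "confidence": None,                   # e.g. a calibrated score
        "explanation": [],                    # e.g. diverging style features
    }
    return json.dumps(verdict)
```

Keeping the contract at the tier boundary like this leaves the model itself swappable, which fits the 'black box' requirement.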
Desirable requirements:
- Cross-case writing analysis enabling comparison of writing styles across online conversations with different individuals to detect authorship matches.
- Cross-genre writing style analysis using additional genres such as SMS, social media and formal documents.
- Ability to assess authorship using more than writing style alone; this could include, for example, metadata analysis and behavioural science characteristics.
- The model could be plugged into a corporate knowledge base, with the ability to search on historical information.
Constraints:
- Training data sets must be compliant with UK law, including GDPR. Use of already developed algorithms, anonymised datasets or synthetic training data should be considered.
- Capable of functioning even with a small number of words.
Not required:
- Audio and visual genres.
- Analysis of handwritten communications.
- Digital forensic analysis of AI watermarks.
- Cloud-only based solutions.
- Horizon scanning only.
Key dates
Monday 1 September 2025
Phase 1 competition opens
Friday 19 September 2025
Clarifying questions published
Thursday 25 September 2025 at 5pm
Phase 1 competition closes
Monday 6 October 2025
Applicants notified with feedback
Tuesday 7 October 2025
Phase 2 competition opens
Friday 24 October 2025 at 5pm
Phase 2 competition closes
Tuesday 4 November 2025
Applicants notified
Wednesday 12 November 2025
Pitch day in Milton Keynes
Wednesday 19 November 2025
Commercial onboarding begins*
*Please note, the successful solution provider will be expected to have availability for a one-hour onboarding call via MS Teams on the date specified, to begin the onboarding/contractual process.
Late November/ early December 2025
Target project kick-off
Eligibility
This challenge is open to sole innovators, industry, academic and research organisations of all types and sizes. There is no requirement for security clearances.
Solution providers from, or in direct collaboration with, countries listed by the UK government under trade sanctions and/or arms embargoes are not eligible for HMGCC Co-Creation challenges.
Clarifying questions
Clarifying questions or general requests for assistance can be submitted directly to cocreation@hmgcc.gov.uk before the deadline with the challenge title as the subject. These clarifying questions may be technical, procedural, or commercial in subject, or anything else where assistance is required. Please note that answered questions will be published to facilitate a fair and open competition.