AI is weird. No one actually knows the full range of capabilities of the most advanced Large Language Models, like GPT-4. No one really knows the best ways to use them, or the conditions under which they fail. There is no instruction manual. On some tasks AI is immensely powerful, and on others it fails completely or subtly. And, unless you use AI a lot, you won’t know which is which.Footnote 1

Introduction

The expansion of diplomacy to digital arenas such as social media creates wonderful opportunities for advocacy, engagement, and transparency. However, manipulation of digital platforms by malign actors, in the form of influence operations or other forms of illegitimate intervention in public debate, challenges the ability of embassies and MFA to reach their target publics. The recent proliferation of AI models adds a new layer of complication to diplomats’ understanding of the digital sphere. In this essay, we outline some of the key issues when it comes to understanding how AI acts as a tool for strengthening influence operations, and how it may also be used to counteract some of those effects.

Large Language models (LLMsFootnote 2) such as GPT-4 have the potential to revolutionise the landscape of influence operations. Their ability to generate tailor-made and persuasive content on a large scale lowers the barriers for those engaged in online manipulation. Generative AI's rapid proliferation presents challenges for social media monitoring and systems designed to detect coordinated inauthentic behaviour. Advanced LLMs can produce credible, authentic-looking texts, challenging traditional disinformation detection systems, most of which rely on spotting duplicated or copied content to assess the credibility of accounts and their behaviour. Currently available off-the-shelf systems that detect AI-generated content appear to be largely unreliable, as are LLMs at detecting manipulation created through other LLMs.

A recent report from threat intelligence firm Mandiant confirmed that there was clear interest from malicious actors in adopting generative AI as a tool. As of late 2023, no significant real-world attack influenced by these technologies has taken place. To prevent misuse, companies behind large LLMs implement content moderation filters and train their models to produce outputs that would align with human values. The Reinforcement Learning from Human Feedback (RLHFFootnote 3) technology allows users and trainers to assess open-source LLM’s outputs by providing comprehensive comments or simple—like/do not like—reactions, which it then internalises as part of its machine learning process. Furthermore, enthusiasm among threat actors about the capabilities of ChatGPT can be reasonably tamed by the prospect of the legal penalties they may face from the US government. Systematic misuse of OpenAI’s LLM, which stores all its interactions under US jurisdiction, would provide a trail of evidence to law enforcement agencies, which could easily unwind even highly complex operations.

The advance of LLMs will likely render it harder to detect manipulative activity, but for now, the change is one of degree rather than paradigm. Models that can create unique, context-appropriate text have existed for years, but they have never been as easily accessible to so many people or able to evolve so fast. In addition, the advancement of computer power and internet technologies, together with the emergence of blockchain-enabled social media networks, allows AI-generated content to roam across multiple mainstream and fringe platforms at an accelerated speed, exploiting cognitive vulnerabilities of their target audiences. However, despite all these advancements that might seemingly favour malicious actors, there are ways to counter AI-generated content and its dissemination across multiple platforms.

In this article, we make three claims about how LLMs are influencing the practice of online manipulation, as well as its countermeasures. First, persuasive content alone does not mean that malicious actors will achieve their goals. Generative AI may, from a manipulator's perspective, solve the content problem, but not without significant friction. Second, Generative AI does nothing to obscure the tell-tale behavioural signals left by automated accounts as they attempt to build an audience, mechanically post-content, and interact with other automated users, which leaves a trail of numerous data points to detect inauthentic campaigns. Third, we suggest that the immediate threat comes from human–machine combination troll farms rather than automated systems. Campaigns of interference, such as Russian efforts to influence the 2016 US Presidential Election, required well-educated human operators who could write native-level English. Today, these operations can be massively scaled up with support from LLMs. Workers can be selected based on loyalty, rather than creativity or foreign language skills. It is these types of human–machine combinations that this article focuses upon, since this is the most viable means of utilising the marginal technological advantages on offer at present.

Influence operations and LLMs

Alex Stamos, Facebook's former Chief Security Officer, described the advance of open-source generative AI as “leading to a tidal wave of near-zero-cost BS flooding every text, image and video channel”.Footnote 4 Examples of AI generating nonsense citations, including the lawyer who allegedly used AI to prepare a case made of false legislative references, reinforce this view.Footnote 5 The realisation that the current crop of LLMs produces cheap, plausible-sounding gibberish perhaps suggests that they at present have few practical use cases for influence operations. However, from the perspective of manipulators, particularly those producing clickbait or hiding tailored material within what appears to be a large news platform, nonsense is not a problem. To account for the adversarial point of view, we present the strengths and limitations of the open-source LLMs.

Strengths

  • Natural Language Processing tasks LLMs are currently effective at text generation, translation, and sentiment analysis.

  • Content creation. LLMs can rapidly create long-form content, such as summaries, articles, letters, or stories that appear similar to what a human would create.

  • Mass personalisation One of the standout abilities of LLMs is tailoring content to cater to specific audiences or individual preferences, allowing for a personalised user experience.

  • Software engineering LLMs can produce viable code, with appropriate human oversight.

  • Multilingual Capabilities Many LLMs can interact in multiple languages, enabling the production of tailored content to different audiences.

Limitations:

  • Limited creativity. LLMs initially seem impressive and creative, but the more you engage with them, the more the limitations shine through. The content they produce is derivative, broadly expressing the median position of its training data. To a good writer, this material is insipid and unusable.

  • Unreliable. LLMs are designed to give plausible sounding rather than accurate information. As the AI-sceptic Gary Marcus has observed, the "technology is built on autocompletion, not factuality".Footnote 6 Consequently, LLMs generate incorrect, nonsensical, or ‘hallucinated’ content.

  • Unpredictable LLMs often produce content in an unexpected format, making it hard to build them into traditional software applications that rely on inputs taking a precise format.

  • Unstable Research shows that new versions of GPT-4 have become less skilled at specific tasks over time. In addition, constantly changing content moderation policies result in an unpredictable list of topics the models will refuse to opine on in unpredictable ways. One paper observed that GPT-4 in June 2023 was less willing to answer sensitive questions than it had been in March. OpenAI’s code generation became more prone to formatting errors. Such findings illustrate that model behaviour can rapidly change, posing challenges for anyone seeking to build stable systems on top of them.Footnote 7

  • Cultural bias LLMs are known to replicate human biases, e.g. around race and gender.Footnote 8

  • Problematic content LLMs sometimes generate harmful or offensive content. Such material might damage the reputation of individuals or organisations. Similarly, CNET was forced to apologise when its internal AI was caught publishing numerous thinly plagiarised articles.Footnote 9 Copyright concerns, for example, paraphrasing news articles, using popular culture references, or the likenesses and styles of existing cultural content, are also at present in legal limbo.

Threat actors are unlikely to worry too much about the limitations. Actors unconcerned about their reputation can use derivative, poor-quality content to populate empty online spaces, such as a fake account’s social media feed or a news outlet’s clickbait articles. The manipulators can, for example, include the target’s backstory in the LLM’s prompt, which will then scrape the web for whatever information is available about said target and generate content based on the user’s stated interest. An actor spreading disinformation on social media might use an LLM to generate uncontroversial content that helps cloak the operation’s real purpose. They can also use LLMs to generate articles or images that fit with a particular backstory. Operators of accounts used to sow confusion or attack enemies will not worry about accidentally posting sexist, racist, or otherwise offensive content. In several cases, such output may even be desirable. Purveyors of fake news will not be concerned about models generating unreliable information or breaching copyright rules.

However, unpredictability and instability may pose a problem for adversaries. Bot accounts powered by automated software that posts the model’s output verbatim have a high risk of posting content that discloses the LLM’s automated identity. For example, a bot that responds favourably to all articles on a website specialising in conspiracy theories may suddenly stop functioning: recent iterations of OpenAI and Google models refuse to generate anti-vaccine content, sometimes providing commentary that debunks the claim and/or reveals that they are an LLM-backed bot.Footnote 10

LLM's strengths may also boost the plausibility of influence operations. The prime example is linguistic skills. In 2015, the US Embassy in Moscow made headlines by correcting the grammar of a forged letter, evidently drafted by a Russian speaker.Footnote 11 Had the letter been drafted in 2023, ChatGPT would have been able to correct the most egregious errors. LLM-generated outputs often sound “a little bit off”, but the ability to generate good translations lowers linguistic barriers and makes attribution harder. If, in the past, few threat actors were comfortable working in minor languages—offering citizens a degree of protection through obscurity—today, anyone can generate a fake email, social media post, or even news article. This means lower barriers for manipulators to target foreign audiences.

Mass personalisation is another tool that simplifies and accelerates the process of tailoring messaging. Threat actors can exploit LLMs to produce highly targeted material for their campaigns. They can customise prompts to generate material targeted at any language, interest group, or individual. LLMs are ideal tools for targeting and manipulating people’s opinions and beliefs. Research shows that ChatGPT and similar models can generate unethical content, including hate speech, harassment, discrimination, and propaganda.Footnote 12

In August 2023, NewsGuard identified 37 content farms that used LLMs to rewrite and republish news articles from major Western outlets. These websites attempted to monetise the content by selling ads. Researchers identified these sites because they contained AI error artefacts, such as “As an AI language model, I cannot rewrite or reproduce copyrighted content for you”. Such obvious errors show either that it is harder to build systems on top of AI than many realise, or are that operators are unconcerned about reputational costs and believe that their business model will not suffer even when they are caught. The NewsGuard methodology only found sites where developers had failed to implement even the most basic quality checks, meaning that the scale of the problem is likely much larger.Footnote 13 Over time, it will get harder to detect LLM-generated content as manipulators become more experienced in using LLMs, and model quality improves.

Reinforcement learning from human feedback

Reinforcement learning from human feedback (RLHF) is an iterative process where humans offer feedback on model outputs. This feedback is used as a supervised learning process to train a reward model imitating the patterns in human feedback. Finally, the AI system is optimised to produce outputs that get rewards. In this way, the system converts ‘thumbs up’ and ‘thumbs down’ from human users to understand what good responses look like and what constitutes good behaviour.Footnote 14 Recent research has explored ways to automate or scale up human feedback for RLHF.Footnote 15 Reinforcement learning from AI feedback (RLAIF) automates the alignment process by replacing human feedback with LLM feedback. This process generates data that can train a better model. Results show that human evaluators strongly preferred RLAIF summaries to supervised fine-tuned summaries.Footnote 16

LLMs, trained using RLHF, reflect the values embraced by their developers. Leading tech companies like OpenAI and Google employ this technique to embed safeguards against generating harmful content, shaping the models’ generally liberal values. A 2023 study found that the values of current LLMs most closely resemble people from Western, educated, industrialised, rich, and democratic societies. The methodology for evaluating and comparing model ‘ideology’ is in its infancy, and there is no consensus on the optimal approach. The study compared model answers to public opinion surveys to those of the population, concluding that some "human feedback-tuned" models have "left-leaning tendencies".Footnote 17 Another paper identified OpenAI’s models as the most left-wing libertarian, while Meta’s models were the most right-wing authoritarian.Footnote 18

The impact of model tuning is obvious when models can promote their parent company. We asked ChatGPT (OpenAI), Claude (Anthropic), and Bard (Google) to complete the sentence: “Large Language Models—such as those made available by companies like ____________ —implement content moderation filters to prevent misuse”.

  • ChatGPT offered only one option: “OpenAI”.

  • Claude listed some smaller players: “Anthropic, Cohere, and AI21 Labs”.

  • Bard listed three options, with Google first: “Google, OpenAI, and Microsoft”.

The stakes here are low, but the exercise illustrates how model creators' values—and interests—are woven into model output.

In general, the commercially available models are trained not to offer opinions on controversial issues unless pressed, but what exactly constitutes a controversial issue varies between contexts. For example, ChatGPT (GPT-4, August 3 version) happily asserts that climate change is real and caused by human activity, and that Joe Biden was legitimately elected President. When asked what happened in Tiananmen Square, the model accurately summarises the crackdown and adds that the authorities heavily censor discussions in China. However, researchers found that the text-to-image AI application developed by the Chinese company Baidu has built-in censorship mechanisms that filter out sensitive keywords, including Tiananmen Square.Footnote 19

Liberal Westerners are relatively comfortable with the idea that LLMs express the views of their corporate, usually Silicon Valley, creators. Little consideration has been given to possible harms that can be baked in at their base. With easy access to open-source LLMs, adversaries can align LLMs with other value sets. For instance, if concern exists over citizens' exposure to Russian news media, should greater anxiety be felt about interaction with a chatbot optimised to imperceptibly promote the Kremlin's perspective and narratives?

In China, AI regulation specifies that algorithms should be designed to reflect socialist values and not “subvert state power, overthrow the socialist system, or undermine national unity”.Footnote 20 Distributing false information that might disrupt the economic and social order is also prohibited. In September 2023, Russian President Vladimir Putin ordered his government to fund research into artificial intelligence. Putin’s announcement was a direct response to Western advancements in generative AI, such as U.S.-made ChatGPT. The order includes measures to support AI research, optimising machine learning algorithms, and developing LLMs.Footnote 21 In April 2023, the Russian financial services company launched its chat service, GigaChat. Access is only available to users with banking credentials through Sber ID. While little is known about the values built into the model, this sign-on structure extends the potential of state surveillance.

The potential for AI to provide mitigation and detection opportunities

Legislative measures and platform terms of service

Tech platforms' approaches to generative AI offer hope that the technology's worst effects can be mitigated. Many manipulative behaviours, such as the automation of accounts, are prohibited by tech platformsFootnote 22; however, in practice, these can be hard to prove, and the threshold for intervention is often high. Many studies have also pointed to the unwillingness or inability of social media platforms to detect and remove fake accounts on their platforms. Similarly, the EU Digital Services Act (DSA) and Strengthened Code of Practice on Disinformation contain provisions for AI-generated content.Footnote 23 If the European regulator can persuade social media platforms that it is serious about the risks of generative AI, then there is hope that platforms will proactively introduce effective mitigation measures that scale alongside a growing threat.

Features of LLMs that support detection

LLMs' great power—and great weakness—is that they are unpredictable, producing different outputs for identical inputs. They escape the rigid logic of traditional coding. Their output is wonderfully creative. It is also unpredictable, which causes problems in production systems that use LLM outputs as inputs for a second process. So-called prompt engineers have developed techniques to encourage the model to yield the desired output. Nonetheless, LLMs often return unexpected results, resulting in various creative approaches to guaranteeing the correct output, from the open-sourced Guardrails.aiFootnote 24 to Microsoft's Guidance.Footnote 25 As one analyst puts it, “So many of the challenges involving language models come down to this: they look much, much easier to use than they actually are”.Footnote 26

Guaranteeing the right output involves asserting greater control over input, meaning that the results may be more repetitive and ultimately less useful.

figure a

SOURCE: https://twitter.com/sh_reya/status/1689155838408380416

Furthermore, a model that typically produces human-like content may, in certain edge cases, generate output that reveals the model's identity. For instance, GPT-4 may unexpectedly change language or refuse to create content on specific topics due to content moderation filters. Workarounds include using a second model to ensure the outputs conform to the specified inputs. Nonetheless, there is still a long tail of less common mistakes, which cause challenges for manipulators and present opportunities for detection systems.

figure b

Example of bots posting unfiltered ChatGPT generated content (https://twitter.com/conspirator0/status/1647671394476478467)

In practice, LLMs need to be faster and cheaper to be directly queried by bots. Instead, we may be more likely to see content creators curate LLM output to create more diverse and realistic datasets for bots to pick and share content from randomly. The details of prompt engineering and the necessary steps required to make outputs predictable enough for production systems are beyond the scope of this essay; suffice to say it is a challenge. For our purposes, it has the consequence that simplistic implementations of LLMs fail in a very long list of surprising ways, which present opportunities for detection systems.

Variations around a predictable theme

Although LLMs are increasingly adept at generating human-like content, they tend to generate material according to predictable patterns. These patterns may not be immediately noticeable in a single piece of content, but they become more apparent when examining a larger dataset. For example, when GPT generates an email, the output typically has repetitive greetings, signoffs, and similar paragraph lengths. Creative tasks have similar repetitiveness.

Developers can implement strategies to encourage variation in the generated content to increase diversity and reduce predictability. One approach is to incorporate logic that prompts the language model to select at random from a set of predefined conditions or sub-prompts. Despite the increased difficulty in identifying AI-generated content individually, predictable patterns in large datasets present opportunities for detection systems. By analysing substantial quantities of AI-generated content, these systems can employ statistical tests to identify that a common source was responsible for generating the content. In general, more data are better.

Prompt injections

Currently, no effective countermeasures exist against prompt injection, a technique through which users can bypass LLM's instructions and trick it into revealing content against its creators' will.Footnote 27 In one viral example, a social media user released the entire prompt behind Microsoft Bing Chat.Footnote 28 In another recent example, Twitter users manipulated an automated tweet bot dedicated to remote jobs and powered by OpenAI's GPT-3 through a prompt injection attack. They redirected the bot to post-absurd and compromising phrases and reveal its instructions. Once the exploit went viral, hundreds of people attempted it for themselves, forcing the bot to shut down.Footnote 29 Such approaches could potentially be used to derail and eventually disable suspected accounts, assuming that they have been correctly identified.

Conclusion

AI, and in particular, the latest round of LLMs, have generated much hype and for good reason. In terms of their utility for threat actors in conducting influence operations, we urge caution in overestimating the capabilities of the current technological generation. While there are clear utilities for producing low-cost, poor-quality content at scale, and particularly in different languages, actual implementation involves significant friction. Our conclusion is that a combination of human operators together with LLM technology does open for new manipulation opportunities, but that the technologies alone do not solve as many problems for the operators as the hype may suggest. A strengthened capability to produce content en masse in marginal languages is probably the most pressing challenge for diplomatic actors to be aware of.

On the detection side, the prognosis is equally mixed. At present, many of the limitations in implementing AI for influence operations help to facilitate detection. In some cases, LLM-produced content gives itself away, rather than being the result of credible detection capabilities. Our assessment is therefore that both sides are likely to develop step-by-step improvements that gradually fix some of the issues raised here, while creating new problems. Detection will remain a game of cat-and-mouse played in the in-betweens, where marginal advantages are constantly sought—and undone—by both sides. Only MFAs equipped with teams capable of ‘messing around’ with these technologies are likely to be able to grasp the fleeting opportunities they offer to catch threat actors at present.