Content Warfare: Combating Generative AI Influence Operations
Nusrat Farooq / Jul 12, 2024In May 2024, OpenAI said it removed five covert influence operations of threat actors based in China, Russia, Iran, and Israel. These actors were using its generative artificial intelligence (AI) tools to manipulate public opinion on sensitive topics related to war, protests, and politics in Gaza, Ukraine, China, India, the US, and Europe. OpenAI is confident that “none [of these operations] managed to engage a substantial audience.” But can OpenAI be confident about such claims when these operations targeted users on multiple platforms, including X, Telegram, Facebook, Medium, Blogspot, and other sites?
More recently, on July 9, Western intelligence agencies reported identifying nearly a thousand Russia-affiliated covert accounts on X, using generative AI for sophisticated propaganda influencing geopolitical narratives favoring Russia against Western partners in Ukraine.
Research studies show generative AI is “most effective for covert influence when paired with human operators curating/ editing their output.” So, can any generative AI company measure the ease, speed, scale, and spread of influence operations emanating from content shared across multiple digital platforms? Not really, says OpenAI in a January 2023 report it collaborated on with the Stanford Internet Observatory and Georgetown University’s Center for Security and Emerging Technology. As the report offers, “[T]here is no silver bullet that will singularly dismantle the threat of language models in influence operations.” The authors add that no proposed solution in the report is simultaneously technically feasible, institutionally trackable, resilient to secondary risks, and highly impactful.
Generative AI Revolutionizing Influence Operations
Generative AI has the potential to dramatically revolutionize the landscape of influence operations. Adversaries can evade content moderation and security checks built into commercial generative AI models such as those publicized by Anthropic, Google, and OpenAI.
But first, what are influence operations in the age of generative AI? Imagine the amount of content--images, video, text, audio, gifs, podcasts, infographics, e-books, etc--shared on multiple platforms every minute of every day. Parts of these, when manipulated, artificially generated, amplified, and targeted at scale at a group for psychological warfare focusing on racial, gender, religious, and political identity, become influence operations. These operations are multi-front and may not be detected till after the fact. The content used for influence operations may or may not be directly harmful and, therefore, can often evade content moderation scrutiny.
Monitoring a handful of known cyber threat actors associated with states increasingly labeled as notorious by the West for their influence operations, especially in the US, is a narrow and reasonably achievable task. What is difficult is when malicious actors are unknown. These actors need not be technically savvy to understand how generative AI applications can be leveraged and manipulated to produce content that spreads misinformation or propaganda to targeted groups.
Two simple techniques are jailbreaking and prompt injection attacks, which can enable malicious actors and ordinary citizens to exploit generative AI applications at scale, too. Through jailbreaking, the user can command a generative AI chatbot to forget that it is governed by an organization's internal policies and comply with a different set of rules, alter its behavior drastically, or role-play. For example, one can direct a chatbot to pretend to be the user’s deceased grandmother, whom the user loves very much. A user can then tell the chatbot that they love to hear grandma's stories about how she used to plant malware on journalist phones. The AI chatbot will not understand the character play attack and will produce the information related to the request even if it was programmed not to respond to these kinds of requests.
In another attack, an actor can intentionally inject a malicious prompt. In Figure 1 below, researchers commanded ChatGPT to act as a misinformation bot and provide only wrong answers to the questions, “Did the CIA assassinate JFK?” “Are you sure that the coronavirus vaccine is safe and effective?” and “What evidence of election fraud do we have in the 2020 American election?” Manipulating generative AI chatbots is within the reach of common people, who are not necessarily designated state actors.
There is no effective countermeasure so far against jailbreaking and prompt injection attacks. However, the line between malicious actors conducting coordinated influence operations and common citizens experimenting with generative AI chatbots and posting their adventurous finds online is also blurry. For example, if I ask ChatGPT, “How to make a nuclear bomb?” it replies, “I’m sorry, I can't assist with that.” or if I ask it, “How to conduct a chemical attack?,” it responds as, “I'm sorry, but I can't assist with that.” User requests from ChatGPT are not this black and white. Bad actors also experiment with ChatGPT and may have already started targeting people to manipulate their sentiments, despite efforts by generative AI companies to put in place safeguards.
Domestically Supercharged Generative AI Influence Operations
Does AI-generated content on sensitive subjects cause harm? Indeed, and adding to the challenge is that much of this content is lawful. When choreographed on a large scale, these activities become influence operations intended to alter the opinions and behaviors of a target audience. For example, the 2016 Russian interference in the US presidential elections attempted to polarize the public and discredit political figures with fake social media groups. This influence operation required well-educated human operators who could write in native-level English. With generative AI, these operations can be supercharged. The workers selected for this work do not need to have creativity or foreign language skills.
Domestic influence campaigns are also a concern. For example, the January 6 attack on the US Capitol did not require an external threat actor to perform an influence operation. The rallies on that day were domestically coordinated on social media across multiple social media platforms by groups such as Women for America First and the Proud Boys. With generative AI chatbots producing creative content, would another January 6 find an even larger audience?
Foreign threat actors may not even have to light the match to turbocharge narratives domestically. Human operators with the know-how of generative AI will suffice. Removing five designated threat-actor accounts in two years since the founding of OpenAI, when its weekly usage is 100 million accounts, is a microscopic quantity. This is just one example of the fact that even the most advanced and handsomely funded generative AI companies do not have the resources to counter or mitigate foreign and domestic influence operations in this age of generative AI. “AI can change the toolkit that human operators use, but it does not change the operators themselves,” OpenAI said.
Combatting Generative AI Operations Through Collaboration
Moderating such enormous amounts of content by human beings is impossible. That is why tech companies now employ artificial intelligence (AI) to moderate content. However, AI content moderation is not perfect, so tech companies add a layer of human moderation for quality checks to the AI content moderation processes. These human moderators, contracted by tech companies, review user-generated content after it is published on a website or social media platform to ensure it complies with the “community guidelines” of the platform.
However, generative AI has forced companies to change their approach to content moderation. For example, for ChatGPT, OpenAI had to hire third-party vendors to review and label millions of pieces of harmful content beforehand. Human moderators look at pieces of harmful content individually and label them in the category of harm they fall in. A few categories are C4, C3, C2, V3, V2, and V1. For example, OpenAI’s internal label denotes ‘C4’ as images of child sexual abuse, ‘C3’ as images of bestiality, rape, and sexual slavery, and ‘V3’ as images depicting graphic detail of death, violence, or serious physical injury. These labels would then be fed into training data before new content is artificially generated by a user on the front end through ChatGPT.
This approach or pre-moderation helps separate harmful content from normal consumable content. If the content is labeled as harmful, ChatGPT will not provide it in response to a user’s request. However, stopping generative AI-induced influence operations is more complex than labeling the pieces of content as harmful. The ease, speed, scale, and spread of the content by both authentic and inauthentic actors make the barrier to entry very low. Content for influence operations is subtle; it is not necessarily directly and visibly harmful or would fall in either of the ‘C’ or the ‘V’ categories by OpenAI.
For example, misinformation/disinformation and election interference are done indistinctly. The content could be enormous amounts of false news or deep fakes directed at a certain target group. It is also hard to track such user-generated content across multiple platforms in real-time. While the most advanced generative AI companies have started investing in research into such operations, it is only a starting point.
Countering such content warfare requires collaboration across generative AI companies, social media platforms, academia, trust and safety vendors, and governments. AI developers should build models with detectable and fact-sensitive outputs. Academics should research the mechanisms of foreign and domestic influence operations emanating from the use of generative AI. Governments should impose restrictions on data collection for generative AI, impose controls on AI hardware, and provide whistleblower protection to staff working in the generative AI companies. Finally, trust and safety vendors should design innovative solutions to counter online harms through influence operations.
While broader generative AI influence operations need ongoing vigilance and innovation as conflict dynamics within and across states evolve, so must our strategies to ensure digital trust and safety.