Syllabus: Large Language Models, Content Moderation, and Political Communication
Prithvi Iyer, Justin Hendrix / Sep 26, 2024

This piece was first published on May 7, 2024, and is updated sporadically with additional resources. While we cannot post every link we receive, we encourage the Tech Policy Press community to share material that may be relevant. The date of publication posted here represents the last date when updates were made.
With the advent of generative AI systems built on large language models, a variety of actors are experimenting with how to deploy the technology in ways that affect political discourse. This includes the moderation of user-generated content on social media platforms and the use of LLM-powered bots to engage users in discussion for various purposes, from advancing certain political agendas to mitigating conspiracy theories and disinformation. It also includes the political effects of so-called AI assistants, which are currently in various stages of development and deployment by AI companies. These various phenomena may have a significant impact on political discourse over time.
For instance, content moderation, the process of monitoring and regulating user-generated content on digital platforms, is a notoriously complex and challenging issue. As social media platforms continue to grow, the volume and variety of content that needs to be moderated have also increased dramatically. This has led to significant human costs, with content moderators often exposed to disturbing and traumatic material, which can have severe psychological consequences. Moreover, content moderation is a highly contentious issue, driving debates around free speech, censorship, and the role of platforms in shaping public discourse. Critics argue that content moderation can be inconsistent, biased, and detrimental to open dialogue, while proponents of better moderation emphasize the need to protect users from harmful content and maintain the integrity of online spaces. With various companies and platforms experimenting with how to apply LLMs to the problem of content moderation, what are the benefits? What are the downsides? And what are the open questions that researchers and journalists should grapple with?
In this syllabus, we examine some of what is known about the use of large language models (LLMs) to assist with content moderation tasks, engage in various forms of political discourse, and deliver political content to users. We also consider the ethical implications and limitations of relying on artificial intelligence in these contexts, and how bad actors may abuse these technologies.
This syllabus is a first draft; it will be periodically updated. If you would like to recommend relevant resources to include, do reach out via email.
AI and Political Communication
In this section, we track academic research examining the use of generative AI for counterspeech, hate speech detection, and political communication, as well as for creating and mitigating disinformation campaigns.
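Several of the papers collected below study zero-shot counterspeech generation: prompting a general-purpose LLM to draft a civil reply to a hateful post without task-specific fine-tuning. The following is a minimal sketch of that setup, offered only as orientation; it assumes the OpenAI Python SDK (v1.x), and the instruction text and model name are placeholders invented here rather than anything taken from the cited work.

```python
# Minimal sketch: zero-shot counterspeech generation with a general-purpose LLM.
# Assumes the OpenAI Python SDK (v1.x) and an OPENAI_API_KEY in the environment;
# the instruction wording and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

INSTRUCTIONS = (
    "You write counterspeech: a short, civil reply to a hateful post that "
    "challenges the hateful claim with facts or empathy, avoids insults, "
    "and never repeats slurs."
)

def generate_counterspeech(hateful_post: str, model: str = "gpt-4o-mini") -> str:
    """Draft one candidate counterspeech reply for a single post, zero-shot."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": hateful_post},
        ],
    )
    return response.choices[0].message.content.strip()
```

In the research below, candidate replies like these are typically judged by human raters or automatic metrics for relevance, civility, and persuasiveness, and the evaluation choices matter as much as the generation method.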
Blogs
- AI chatbots are intruding into online communities where people are trying to connect with other humans
- Belgian AI firm develops LLM to combat online hate speech
- Generative AI is already helping fact-checkers. But it’s proving less useful in small languages and outside the West
Counterspeech and hate speech detection
- Brown, H., Lin, L., Kawaguchi, K., & Shieh, M. (2024). Self-Evaluation as a Defense Against Adversarial Attacks on LLMs. arXiv preprint arXiv:2407.03234.
- Ben-Porat, C. S., & Lehman-Wilzig, S. (2020). Political discourse through artificial intelligence: Parliamentary practices and public perceptions of chatbot communication in social media. In The Rhetoric of Political Leadership (pp. 230-245). Edward Elgar Publishing.
- Argyle, L. P., Bail, C. A., Busby, E. C., Gubler, J. R., Howe, T., Rytting, C., ... & Wingate, D. (2023). Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale. Proceedings of the National Academy of Sciences, 120(41), e2311627120.
- Kumar, A. (2024). Behind the Counter: Exploring the Motivations and Perceived Effectiveness of Online Counterspeech Writing and the Potential for AI-Mediated Assistance (Doctoral dissertation, Virginia Tech).
- Zhu, W., & Bhat, S. (2021). Generate, prune, select: A pipeline for counterspeech generation against online hate speech. arXiv preprint arXiv:2106.01625.
- Cypris, N. F., Engelmann, S., Sasse, J., Grossklags, J., & Baumert, A. (2022). Intervening against online hate speech: A case for automated Counterspeech. IEAI Research Brief, 1-8.
- Gabriel, I., Manzini, A., Keeling, G., Hendricks, L. A., Rieser, V., Iqbal, H., ... & Manyika, J. (2024). The Ethics of Advanced AI Assistants. arXiv preprint arXiv:2404.16244.
- Tomalin, M., Roy, J., & Weisz, S. (2023). Automating Counterspeech. In Counterspeech (pp. 147-170). Routledge.
- Doğanç, M., & Markov, I. (2023). From Generic to Personalized: Investigating Strategies for Generating Targeted Counter Narratives against Hate Speech. In Proceedings of the 1st Workshop on CounterSpeech for Online Abuse (CS4OA) (pp. 1-12).
- Costello, T. H., Pennycook, G., & Rand, D. (2024). Durably reducing conspiracy beliefs through dialogues with AI.
- Saha, P., Agrawal, A., Jana, A., Biemann, C., & Mukherjee, A. (2024). On Zero-Shot Counterspeech Generation by LLMs. arXiv preprint arXiv:2403.14938.
- Hong, L., Luo, P., Blanco, E., & Song, X. (2024). Outcome-Constrained Large Language Models for Countering Hate Speech. arXiv preprint arXiv:2403.17146.
- Jin, Y., Wanner, L., & Shvets, A. (2024). GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? arXiv preprint arXiv:2402.15238.
- Mun, J., Allaway, E., Yerukola, A., Vianna, L., Leslie, S. J., & Sap, M. (2023, December). Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 9759-9777).
- Agarwal, V., Chen, Y., & Sastry, N. (2023). HateRephrase: Zero- and few-shot reduction of hate intensity in online posts using large language models. arXiv preprint arXiv:2310.13985.
- Zheng, Y., Ross, B., & Magdy, W. (2023, September). What makes good counterspeech? A comparison of generation approaches and evaluation metrics. In Proceedings of the 1st Workshop on CounterSpeech for Online Abuse (CS4OA) (pp. 62-71).
- Gligoric, K., Cheng, M., Zheng, L., Durmus, E., & Jurafsky, D. (2024). NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps. arXiv preprint arXiv:2404.01651.
- Jahan, M. S., Oussalah, M., Beddia, D. R., & Arhab, N. (2024). A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs. arXiv preprint arXiv:2404.00303.
- Zhang, M., He, J., Ji, T., & Lu, C. T. (2024). Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection. arXiv preprint arXiv:2402.11406.
- Nirmal, A., Bhattacharjee, A., Sheth, P., & Liu, H. (2024). Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales. arXiv preprint arXiv:2403.12403.
- Roy, S., Harshavardhan, A., Mukherjee, A., & Saha, P. (2023). Probing LLMs for hate speech detection: strengths and vulnerabilities. arXiv preprint arXiv:2310.12860.
- Morbidoni, C., & Sarra, A. (2022). Can LLMs assist humans in assessing online misogyny? Experiments with GPT-3.5.
- Roberts, E. (2024). Automated hate speech detection in a low-resource environment. Journal of the Digital Humanities Association of Southern Africa, 5(1).
- Aldjanabi, W., Dahou, A., Al-qaness, M. A., Elaziz, M. A., Helmi, A. M., & Damaševičius, R. (2021, October). Arabic offensive and hate speech detection using a cross-corpora multi-task learning model. In Informatics (Vol. 8, No. 4, p. 69). MDPI.
- Guo, K., Hu, A., Mu, J., Shi, Z., Zhao, Z., Vishwamitra, N., & Hu, H. (2023, December). An Investigation of Large Language Models for Real-World Hate Speech Detection. In 2023 International Conference on Machine Learning and Applications (ICMLA) (pp. 1568-1573). IEEE.
Political campaigns and disinformation
- Ezzeddine, F., Ayoub, O., Giordano, S., Nogara, G., Sbeity, I., Ferrara, E., & Luceri, L. (2023). Exposing influence campaigns in the age of LLMs: a behavioral-based AI approach to detecting state-sponsored trolls. EPJ Data Science, 12(1), 46.
- Bai, H., Voelkel, J., Eichstaedt, J., & Willer, R. (2023). Artificial intelligence can persuade humans on political issues.
- Bang, Y., Chen, D., Lee, N., & Fung, P. (2024). Measuring Political Bias in Large Language Models: What Is Said and How It Is Said. arXiv preprint arXiv:2403.18932.
- Törnberg, P. (2023). ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning. arXiv preprint arXiv:2304.06588.
- Barman, D., Guo, Z., & Conlan, O. (2024). The Dark Side of Language Models: Exploring the Potential of LLMs in Multimedia Disinformation Generation and Dissemination. Machine Learning with Applications, 100545.
- Urman, A., & Makhortykh, M. (2023). The Silence of the LLMs: Cross-Lingual Analysis of Political Bias and False Information Prevalence in ChatGPT, Google Bard, and Bing Chat.
- Lucas, J., Uchendu, A., Yamashita, M., Lee, J., Rohatgi, S., & Lee, D. (2023). Fighting fire with fire: The dual role of LLMs in crafting and detecting elusive disinformation. arXiv preprint arXiv:2310.15515.
- Sun, Y., He, J., Cui, L., Lei, S., & Lu, C. T. (2024). Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges. arXiv preprint arXiv:2403.18249.
- Mirza, S., Coelho, B., Cui, Y., Pöpper, C., & McCoy, D. (2024). Global-liar: Factuality of LLMs over time and geographic regions. arXiv preprint arXiv:2401.17839.
LLMs and Content Moderation
In this section, we provide resources on the opportunities and risks of using large language models for content moderation. Research on this topic examines the ability of LLMs to classify posts based on a platform’s safety policies at scale.
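A common pattern in this line of work (see, for example, the Willner and Chakrabarti piece below on policy-driven classification) is to place a platform's written policy in the prompt and ask the model to apply it to each post. Here is a minimal sketch of that pattern; it assumes the OpenAI Python SDK (v1.x), and the policy text, labels, and model name are simplified placeholders rather than any platform's actual rules.

```python
# Minimal sketch: policy-driven content classification with an LLM.
# Assumes the OpenAI Python SDK (v1.x) and an OPENAI_API_KEY in the environment;
# the policy text, labels, and model choice are simplified placeholders.
from openai import OpenAI

client = OpenAI()

POLICY = """You are applying a content policy. Label the post with exactly one of:
REMOVE - direct threats, targeted harassment, or incitement to violence
REVIEW - borderline insults, slurs in ambiguous context, or graphic material
ALLOW - everything else, including strongly worded political opinions
Respond with the label only."""

def classify(post: str, model: str = "gpt-4o-mini") -> str:
    """Apply the written policy to one post and return the model's label."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep labels as stable as possible across reruns
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": post},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify("I can't stand this candidate and will vote against them."))
```

The papers in this section probe, among other things, how faithfully such a classifier tracks the policy as written, how it behaves on implicit hate speech and low-resource languages, and when its decisions should be deferred to human reviewers.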
Tech Policy Press coverage
- Sullivan, D., & Badiei, F. (2024, October 29). Between hype and hesitancy: How AI can make us safer online. Tech Policy Press.
- Willner, D., & Chakrabarti, S. (2024, January 29). Using LLMs for policy-driven content classification. Tech Policy Press.
- Barrett, P. M., & Hendrix, J. (2024, April 3). Is Generative AI the Answer for the Failures of Content Moderation? Tech Policy Press.
- Boicel, A. (2024, April 4). Using LLMs to Moderate Content: Are They Ready for Commercial Use? Tech Policy Press.
- Iyer, P. (2024, April 3). Transcript: Dave Willner on Moderating with AI at the Institute for Rebooting Social Media. Tech Policy Press.
Blogs
- Weng, L., Goel, V., & Vallone, A. (2023, August 15). Using GPT-4 for content moderation.
- tome01. (n.d.). Leveraging LLMs in Social Media Content Moderation & Analysis.
- Spectrum Labs. (n.d.). AI-Based content moderation: Improving trust & safety online.
- Faieq, Z., Sartori, T., & Woodruff, M. (2024, February 15). Supervisor LLM Moderation: Using LLMs to Moderate LLMs.
- WhyLabs. (n.d.). Content Moderation with Large Language Models (LLMs).
- Cheparukhin. (2023, September 1). How OpenAI’s GPT-4 LLM Promises to Reshape Content Moderation. HackerNoon.
- Keller, D. (2024, August 22). Regulating platform risk and design: ChatGPT says the quiet part out loud.
- Asjad, M. (2024, August 2). Google AI introduces ShieldGemma: A comprehensive suite of LLM-based safety content moderation models built on Gemma 2. MarkTechPost.
Technical papers released by AI companies
- Wang, G., Cui, Y., & Li, M. (2023, September 5). Build a generative AI-based content moderation solution on Amazon SageMaker JumpStart. AWS Machine Learning Blog.
- Palangi, H., & Ray, D. (2022, May 23). (De)ToxiGen: Leveraging large language models to build more robust hate speech detection tools. Microsoft Research.
- Anthropic. (n.d.). Content moderation. Claude.
Events
- Moderating AI and Moderating with AI.
- The Future of Content Moderation and its Implications for Governance
Academic papers
- Bhatia, A., & Sukthankar, G. (2024, September). Moderating Democratic Discourse with LLMs. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation (pp. 123-132). Cham: Springer Nature Switzerland.
- Huang, T. (2024). Content Moderation by LLM: From Accuracy to Legitimacy. arXiv preprint arXiv:2409.03219.
- Wen, R., Crowe, S. E., Gupta, K., Li, X., Billinghurst, M., Hoermann, S., ... & Piumsomboon, T. (2024). Large Language Models for Automatic Detection of Sensitive Topics. arXiv preprint arXiv:2409.00940.
- Yang, K. C., & Menczer, F. (2023). Large language models can rate news outlet credibility. arXiv preprint arXiv:2304.00228.
- Jha, P., Jain, R., Mandal, K., Chadha, A., Saha, S., & Bhattacharyya, P. (2024). MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention. arXiv preprint arXiv:2406.05344.
- Yuan, Z., Xiong, Z., Zeng, Y., Yu, N., Jia, R., Song, D., & Li, B. (2024). RigorLLM: Resilient guardrails for large language models against undesired content. arXiv preprint arXiv:2403.13031.
- Mahomed, Y., Crawford, C. M., Gautam, S., Friedler, S. A., & Metaxa, D. (2024, June). Auditing GPT's Content Moderation Guardrails: Can ChatGPT Write Your Favorite TV Show? In The 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 660-686).
- Dorn, D., Variengien, A., Segerie, C. R., & Corruble, V. (2024). Bells: A framework towards future proof benchmarks for the evaluation of llm safeguards. arXiv preprint arXiv:2406.01364.
- Lykouris, T., & Weng, W. (2024). Learning to Defer in Content Moderation: The Human-AI Interplay. arXiv preprint arXiv:2402.12237.
- Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y., ... & Khabsa, M. (2023). Llama Guard: LLM-based input-output safeguard for human-AI conversations. arXiv preprint arXiv:2312.06674.
- Shah, C., & Bender, E. M. (2023). Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web? ACM Transactions on the Web.
- Gomez, J. F., Machado, C. V., Paes, L. M., & Calmon, F. P. (2024). Algorithmic Arbitrariness in Content Moderation. arXiv preprint arXiv:2402.16979.
- Kumar, D., AbuHashem, Y., & Durumeric, Z. (2024). Watch Your Language: Investigating Content Moderation with Large Language Models. arXiv preprint arXiv:2309.14517.
- He, Z., Guo, S., Rao, A., & Lerman, K. (2024). Whose Emotions and Moral Sentiments Do Language Models Reflect? arXiv preprint arXiv:2402.11114.
- Qiao, W., Dogra, T., Stretcu, O., Lyu, Y. H., Fang, T., Kwon, D., ... & Tek, M. (2024, March). Scaling Up LLM Reviews for Google Ads Content Moderation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (pp. 1174-1175).
- Ma, H., Zhang, C., Fu, H., Zhao, P., & Wu, B. (2023). Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning. arXiv preprint arXiv:2310.03400.
- Franco, M., Gaggi, O., & Palazzi, C. E. (2023, September). Analyzing the use of large language models for content moderation with ChatGPT examples. In Proceedings of the 3rd International Workshop on Open Challenges in Online Social Networks (pp. 1-8).
- Kwon, T., & Kim, C. (2023). Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online. arXiv preprint arXiv:2401.02974.
- Axelsen, H., Jensen, J. R., Axelsen, S., Licht, V., & Ross, O. (2023). Can AI Moderate Online Communities? arXiv preprint arXiv:2306.05122.
- Kolla, M., Salunkhe, S., Chandrasekharan, E., & Saha, K. (2024). LLM-Mod: Can Large Language Models Assist Content Moderation?
- Zhou, X., Sharma, A., Zhang, A. X., & Althoff, T. (2024). Correcting misinformation on social media with a large language model. arXiv preprint arXiv:2403.11169.
- Udupa, S., Maronikolakis, A., & Wisiorek, A. (2023). Ethical scaling for content moderation: Extreme speech and the (in)significance of artificial intelligence. Big Data & Society, 10(1), 20539517231172424.
- Nicholas, G., & Bhatia, A. (2023). Toward Better Automated Content Moderation in Low-Resource Languages. Journal of Online Trust and Safety, 2(1).
- Mullick, S. S., Bhambhani, M., Sinha, S., Mathur, A., Gupta, S., & Shah, J. (2023, July). Content moderation for evolving policies using binary question answering. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track) (pp. 561-573).
- Murthy, S. A. N. Text Content Moderation Model to Detect Sexually Explicit Content.