Recent revelations about Facebook’s struggles with non-English moderation raise big questions about how we think about the responsibility of social media platforms to address disinformation. On Monday, former Facebook employee Sophie Zhang told a committee of the British parliament that the company explicitly prioritizes the U.S. and Western Europe, putting less attention and resources into the rest of the world. Another whistleblower, Frances Haugen, revealed to the U.S. Senate that 87% of Facebook’s investments to counter misinformation are focused on English language content, while just 9% of Facebook users actually speak English.
It is likely such asymmetries exist across the major social media platforms. Indeed, much of the dialogue on content moderation and disinformation is led by what’s happening in the United States. It’s understandable. The richest country in the world. Home to the biggest social platforms. And Donald Trump. But a lack of appreciation for other cultural contexts can lead to disastrous consequences. Most famously, the genocide in Myanmar is an example of platforms not being aware of how technology can be weaponized in a specific cultural context to incite violence.
Platforms work with thousands of often low-paid moderators, many located outside the U.S. and Europe, who are generally making split-second decisions about whether content stays up or comes down. To make better decisions, platforms need to work with experts in every corner of the world who can provide more nuanced local insight on a daily basis, and they need to hire many more of them. But the scale of the problem is such that we also need better technologies to assist their work.
This is something I’m working on at Kinzen. Our focus is on the global problem of disinformation. We recognize that a key challenge to addressing it is the difficulty of scaling the capacity for human intervention in every country or cultural context. That’s why we work with experts on the creation of new technologies to understand disinformation in multiple languages and multiple formats, like audio and video.
But even when we discuss technological solutions to disinformation, we are often talking only about research and tools primarily developed for the English language. This is not just a matter of the inherent biases of the people working behind the scenes, but also of the overwhelming use of English across the web. The size of training data sets skews results such that there is a wide gulf between the quality of natural language processing in English and in almost any of the other roughly 6,500 languages used by humans today.
This is why we work with experienced journalists and researchers to constantly update algorithmic systems, a process we call “sub-editing the machine,” in which we put a human editor in the loop. We look for people with proven expertise in their field of research. They review our data systems and suggest tweaks honed specifically to the threats of disinformation and hate speech in their locale or context.
Creating systems that can continuously incorporate new information is where some of the most interesting challenges are taking place right now. Take Turkish, for example. An ancient language, it was ordained the country’s “official language” after the fall of the Ottoman Empire in the early 20th century. As part of this “modernization”, the script was changed from the Arabic to the Latin alphabet. The confluence of ancient and modern Turkish is reflected in both society and language, and it creates real challenges for AI systems.
The conversion introduced several vowels and consonants that are unique to Turkish, and these now present a challenge for AI systems transcribing audio content. Sounds such as “ğ” or “Ğ”, “ö” or “Ö”, and “ı” or “I” prove especially difficult for current systems to pick up, and require another level of analysis when transcribing words like ağıt (elegy) or değişim (transformation).
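The trouble with these characters extends beyond transcription into ordinary text processing. As a minimal sketch of one well-known pitfall, the “Turkish I” problem, the example below shows how a locale-insensitive pipeline mis-cases Turkish words; the example words (Iğdır, istanbul) are our own illustrations, not drawn from any particular moderation system.

```python
# Unicode's default, locale-insensitive case mapping does not know that
# in Turkish the lowercase of "I" is the dotless "ı" and the uppercase
# of "i" is the dotted "İ". A generic pipeline therefore silently
# corrupts Turkish text during case normalization.

city = "Iğdır"  # a Turkish city name containing both "ğ" and "ı"

# Default lowercasing maps "I" -> "i", which is wrong for Turkish:
assert city.lower() == "iğdır"           # Turkish expects "ığdır"

# The same happens in the other direction:
assert "istanbul".upper() == "ISTANBUL"  # Turkish expects "İSTANBUL"

print(city.lower(), "istanbul".upper())
```

Locale-aware libraries such as ICU provide Turkish-specific case mappings; the point is that a pipeline which never considers locale will mishandle these letters without raising any error.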
Arabic poses a different challenge. Tools and dashboards have to be redesigned to account for the right-to-left direction of the written language, and that’s only the start. As with every other language, the rich variety of dialects within Arabic means that technology struggles to distinguish, say, a Moroccan speaking Arabic from a Saudi Arabian. Picking up Arabic diacritics and vowel marks can sometimes lead to more confusion. For example, مَدْرَسَة (school) and مُدَرِّسَة (teacher) share the same base letters; only their vowel marks distinguish the two meanings.
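The diacritics point can be made concrete with a short Python sketch: stripping the combining vowel marks, a common normalization step, collapses the two words from the example above into the same string. This is an illustration we constructed, assuming standard Unicode encoding of the words, not a description of any specific production pipeline.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # Remove combining marks (Arabic harakat such as fatha, damma,
    # kasra, shadda, and sukun) while keeping the base letters.
    return "".join(c for c in text if not unicodedata.combining(c))

school = "مَدْرَسَة"    # madrasa: "school"
teacher = "مُدَرِّسَة"  # mudarrisa: "(female) teacher"

assert school != teacher  # distinct words as written

# Once the vowel marks are stripped, the two become indistinguishable:
assert strip_diacritics(school) == strip_diacritics(teacher)

print(strip_diacritics(school))
```

A system that normalizes away diacritics gains robustness to the common practice of writing Arabic without vowel marks, but at the cost of merging genuinely different words; which trade-off is right depends on the task.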
Hindi is also fascinating, but mastering it alone doesn’t begin to account for everything that’s unfolding in India. It’s only one of the 22 languages recognised in the Eighth Schedule of the Indian Constitution. There are even more “unofficial” languages spoken there, and thousands of dialects. Switching languages, from Hindi to English and back again, as happens frequently, can confuse AI systems, as can the propensity to drop English words into speech that is predominantly Hindi.
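As a rough illustration of why code-switching is detectable but messy, here is a small Python sketch that tags which scripts appear in a sentence by Unicode range. It is a toy heuristic of our own devising, far simpler than real language identification:

```python
def scripts_used(text: str) -> set:
    # Tag each alphabetic character by script: Devanagari occupies
    # U+0900 through U+097F, and here we treat ASCII letters as Latin.
    # Real language identification is far more involved than this.
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue
        if "\u0900" <= ch <= "\u097f":
            scripts.add("Devanagari")
        elif ch.isascii():
            scripts.add("Latin")
    return scripts

# A "Hinglish" sentence: Hindi with an English word dropped in
# ("I liked this video a lot"):
print(scripts_used("मुझे यह video बहुत पसंद आया"))
```

A pipeline that routes content to per-language models could use a signal like this to flag mixed-script posts for models trained on code-switched data, though romanized Hindi written entirely in Latin script would evade it.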
It’s absolutely essential that local experts have a role in shaping the technology of content moderation. But they also have a critical role in providing insights that allow for a greater contextual understanding of what’s happening on the ground. Ensuring they learn from each other is vital. Disinformation knows no borders. Regardless of the language differences, we see the same trends and tactics emerging in various languages. For example, after the 2020 U.S. election my colleague Karolin Schwarz wrote about how, a year out from the German election, the “voter fraud” false narrative was already shaping German discussions online, with the German term for “electoral fraud” appearing in multiple podcasts.
COVID has taken the international intermingling of conspiracy thinking to new levels. This makes it even more important that disinformation researchers collaborate to learn how global narratives become local threats, and vice versa. For example, a term like “vaccines kill” might emerge in one language, but before it spreads to others, colleagues, working together, can get ahead of the trend in their own country. This international approach also reveals differences between countries. In the West, we’ve heard a lot about hydroxychloroquine and ivermectin as supposed “COVID cures”, but in India there are similar concerns about ayurvedic medicine.
Sometimes the expert provides insight into something that is truly unique to a country. For example, in Swedish folklore, it’s believed that witches fly to Blåkulla on Maundy Thursday to feast with the devil. During the witch persecutions of the 1600s, a woman being absent on this day could imply she was a witch. Nowadays, it’s a tradition for Swedish children to go door to door during Easter dressed as witches – almost always with headscarves and brightly colored clothes – and ask their neighbors for candy. In the last few years, however, the word “påskkärring” (“Easter hag” or “Easter witch”) has also become a far-right dog whistle, since Islamophobes use it to refer to Muslim women.
Such complexity requires a range of solutions, and they need to be well thought through. Some won’t involve technology or platforms at all. But platforms have a role too. They can work more closely with locals on the ground – these may be representatives of civil society or experts on the area. A tighter feedback loop between those doing moderation inside the platforms and the civil society groups and researchers on the frontlines is required to ensure more informed decisions are being made. This can enable more rapid responses to real-time threats. Spending on Trust and Safety has to be decoupled from the average revenue per user (ARPU), as others have publicly recommended. Enabling this community-informed moderation should allow for a more nuanced position by platforms when countering threats. This, along with the scaled responses such as those Kinzen is developing, can create international solutions to international problems.
The reality is that the conversation about how to counter disinformation is likely to remain driven by events in the U.S. in the near term. But the testimony of Zhang and Haugen sheds light on the need for non-English expertise to drive solutions from now on. Likewise, journalist and free expression advocate Maria Ressa, who recently won the Nobel Peace Prize, has pointed to hard lessons learned outside of the U.S. and Europe that should inform the types of solutions we build going forward. It’s time to turn the conversation more to what’s happening in the Philippines or Nigeria or Japan or Indonesia, so that we can prevent another Myanmar. Rather than treating the world like a succession of growth markets to be conquered, platforms need to see it as a complex web of cultures to be understood.
Shane Creevy has been working at the forefront of social journalism for over a decade. He spent eight years at Storyful as a journalist and led its video business as Global Video Editor. As the Head of Editorial at Kinzen, he provides oversight of the company’s content services and its unique approach to Natural Language Processing.