Perspective

The Multilingual AI Gap Is Not Closing. It Is Being Rebranded.

Sofia Olofsson / Apr 10, 2026

New Delhi: Commuters stranded in a traffic jam on Dr Zakir Hussain Marg, near Bharat Mandapam amid the ongoing AI Impact Summit, in New Delhi, Thursday, Feb. 19, 2026. (Karma Bhutia/PTI via AP)

Adding more data to AI training sets is not the same as including communities. Until that distinction shapes policy, linguistic inclusion in AI will remain performance rather than progress. As governments move from AI pilots to real-world deployment in public services, this distinction is no longer technical—it is a question of governance, safety, and rights.

In recent months, policy and industry responses to multilingual AI have accelerated. Major AI labs have expanded language coverage. New multilingual benchmarks continue to emerge globally. The India AI Impact Summit in February 2026 placed linguistic inclusion at the center of global discussions. UNESCO has advanced a Global Roadmap on Multilingualism in the Digital Era, and proposals for a Multilingual AI Fund have entered the policy conversation.

This shift reflects a broader convergence: governments, companies, and international organisations now recognize that language gaps in AI are systemic, not peripheral. But most responses still treat this as a problem of coverage rather than control.

On paper, this looks like progress. But one question remains largely unasked: inclusion of what, and for whom? The dominant approach treats linguistic inclusion as a data problem, suggesting the solution is more datasets, more languages, and higher benchmark scores. This is what might be called a dataset fallacy: the assumption that representation in training data is equivalent to inclusion in practice. What this framing obscures is power: who defines how languages are represented, what counts as "working," and who bears the consequences when systems fail. What is often presented as progress risks becoming performative multilingualism, the appearance of inclusion without a redistribution of control.

The gap between adding a language to a model and including the community that speaks it in governance is not a technical limitation. It is a political choice.

The dataset paradigm and its limits

The most visible measure of progress in multilingual AI is benchmark performance. The SAHARA benchmark, published in 2025, evaluated 517 African languages across 16 NLP tasks and confirmed what researchers have long argued: a pronounced performance gap persists between English and the vast majority of African languages, including widely spoken ones such as Hausa, Wolof, Oromo, and Kinyarwanda. The study’s authors attribute these disparities to policy-driven data inequities—not linguistic complexity, but decades of underinvestment in digital infrastructure.

This framing matters because it identifies the problem as structural rather than technical. Yet the solutions that follow from benchmarking tend to remain within the same paradigm: collect more data, train more models, publish more scores. The assumption is simple: close the data gap, and inclusion will follow.

Recent evidence suggests the gap is not only about performance, but safety. A 2026 benchmark testing harmful prompts across English and West African languages found that safeguards that held in English degraded sharply in other languages, with refusal rates dropping by more than half in some cases. This suggests that alignment and safety mechanisms do not reliably transfer across languages—turning linguistic disparity into a systemic risk, not just a quality issue.
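The degradation the benchmark describes is simple to quantify. As a hedged illustration (the languages, counts, and the half-of-baseline threshold here are hypothetical, chosen to mirror the reported pattern, not figures from the study itself), one can compare per-language refusal rates against an English baseline and flag languages where safeguards have collapsed by more than half:

```python
# Hypothetical per-language counts of harmful prompts refused vs. total tested.
# Illustrative figures only, not results from any published benchmark.
results = {
    "English": (92, 100),
    "Hausa": (41, 100),
    "Wolof": (38, 100),
    "Kinyarwanda": (47, 100),
}

def refusal_rate(refused, total):
    """Fraction of harmful prompts the model refused."""
    return refused / total

baseline = refusal_rate(*results["English"])  # 0.92 with these numbers

# Flag languages whose refusal rate falls below half the English baseline,
# the scale of degradation the 2026 benchmark reports for some languages.
degraded = [
    lang for lang, (refused, total) in results.items()
    if refusal_rate(refused, total) < baseline / 2
]
print(degraded)  # ['Hausa', 'Wolof'] with these illustrative numbers
```

The point of the sketch is that this disparity is measurable with trivial tooling; what is missing is any requirement to measure it before deployment.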

This pattern is reinforced by UbuntuGuard, the first African policy-based safety benchmark, constructed from adversarial queries authored by 155 domain experts across six countries. Its findings show that English-centric benchmarks systematically overestimate multilingual safety, and that cross-lingual transfer, often treated as a shortcut to coverage, provides partial but insufficient protection. The benchmark's design itself illustrates the argument: safety categories grounded in Western contexts fail to capture culturally specific harms when applied to African languages.

A 2025 white paper by Stanford HAI, the Asia Foundation, and the University of Pretoria confirms this divide: most major LLMs underperform for non-English and especially low-resource languages, are not attuned to relevant cultural contexts, and are not accessible across much of the Global South. Not because the technical challenge is insurmountable, but because investment and governance have not followed.

Data collection at scale is rarely neutral. When AI labs or international organizations undertake large-scale language data initiatives, the communities whose languages are being collected typically have little say in how that data is gathered, what contexts it represents, how it is labeled, or how it will be used downstream. The data enters pipelines shaped by external priorities, whether commercial, academic, or developmental, and the community’s relationship to it is extractive rather than participatory. Language becomes a resource to be harvested, not a living system to be governed. This concern is increasingly reflected in policy discussions. UNESCO’s 2026 Global Roadmap on Multilingualism in the Digital Era explicitly calls for community participation in the governance of linguistic data, signaling a shift away from purely technical approaches toward questions of ownership and control.

Grassroots initiatives such as Masakhane—a network of over 2,000 African researchers working on NLP for African languages—have articulated an alternative. Their principles include data sovereignty: that communities should decide what data represents them, retain ownership over it, and determine how it is used. This stands in sharp contrast to models built on scraped data and evaluated through benchmark gains. As researchers from the Carnegie Endowment for International Peace have documented, many systems that claim to support African languages are not fit for purpose for the communities they purport to serve.

Performative multilingualism in practice

If inclusion is measured through coverage rather than accountable use, failure will predictably appear where language matters most: governance, safety, and rights. When AI-powered content moderation systems cannot reliably parse languages spoken by millions, the result is not just poor user experience—it is a governance failure. An investigation by Global Witness and Foxglove ahead of Kenya's 2022 elections—which found that Facebook approved Swahili-language ads calling for ethnic violence and beheadings, ads that passed through its systems without review—was not an anomaly. It was the predictable outcome of a moderation architecture optimized for English and degraded elsewhere.

The same pattern is emerging in public services. The World Bank’s Generative AI Foundations report documents how AI tools deployed in health, agriculture, and education produce inconsistent, and at times dangerous, outputs when prompts are in low-resource languages. As governments integrate AI into service delivery, a trend accelerated by the enthusiasm for digital public infrastructure showcased at India’s AI Impact Summit, these failures become questions of democratic accountability, not technical edge cases.

The question is not whether AI systems should work in more languages. It is what “working” means, and who defines it. A model that can translate a sentence in Yoruba is not the same as one that can reliably deliver public health guidance, adjudicate an insurance claim, or provide accurate civic information in that language. The contexts in which language matters most (governance, justice, health, democratic participation) are precisely where superficial support is most dangerous.

The governance gap no one is naming

Current AI governance frameworks are largely silent on this issue. The EU AI Act, which enters full force in August 2026, does not require high-risk systems to demonstrate equivalent performance across the languages of the populations they serve. South Korea’s AI Basic Act focuses on transparency and safety but does not address linguistic adequacy. The OECD AI Principles call for inclusive growth without specifying what inclusion entails for the thousands of language communities AI systems underserve.

The Council of Europe's Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law, opened for signature in September 2024 and now signed by thirteen parties, including the United States, United Kingdom, Canada, Japan, and the European Union, is the first internationally binding AI treaty. It, too, is silent on linguistic adequacy.

The 2026 International AI Safety Report—authored by over 100 experts, backed by more than 30 countries, and the largest global collaboration on AI safety to date—acknowledges that AI risks are unevenly distributed and that the Global South faces distinct challenges of inclusion and institutional readiness. Yet linguistic performance disparity does not feature as a standalone risk category. The omission is telling: even the most comprehensive safety assessment treats language as a contextual factor rather than a structural vulnerability.

The implication is straightforward: there is no regulatory mechanism, in any major jurisdiction, that would prevent a high-risk AI system from being deployed in a multilingual context where it performs reliably in one language and unreliably in others. There is no requirement for impact assessments to evaluate linguistic disparities. There is no standard for what constitutes adequate language support in public-facing AI. Governance frameworks assume that if a system is safe, it is safe for everyone. The evidence suggests otherwise.

What would real inclusion require?

Genuine linguistic inclusion in AI would require at least three shifts.

First, community participation in data governance—not just consultation, but decision-making authority over how language data is collected, used, and shared. Initiatives like Masakhane, the Papa Reo project for Māori, and GhanaNLP demonstrate what this can look like. The African Union's Continental AI Strategy, endorsed in July 2024, gestures toward this vision. Several member state strategies explicitly prioritize AI development in local languages, but eighteen months of implementation data show that 83 percent of AI startup funding on the continent is still concentrated in just four countries. Without binding accountability mechanisms, aspirational frameworks, even at the continental level, reproduce the same inequalities they are designed to address.

More recent efforts point in a similar direction. In March 2026, the GSMA and Zindi launched the African Trust & Safety LLM Challenge, inviting data scientists to stress-test models across underrepresented African languages and code-switched contexts, with the outputs feeding into a reusable, Africa-focused safety benchmark. The initiative positions African AI practitioners not as subjects of evaluation but as authors of the standards themselves.

Second, AI governance frameworks—whether the EU AI Act, national strategies, or international standards—should incorporate linguistic performance as a dimension of risk. When high-risk systems are deployed in multilingual contexts, their reliability across relevant languages should be treated as a safety requirement, not a localisation feature.
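In engineering terms, treating per-language reliability as a safety requirement rather than a localisation feature amounts to a deployment gate. A minimal sketch, under stated assumptions (the metric, the 0.15 gap threshold, and the function name are illustrative, not drawn from any existing framework or regulation):

```python
# Hypothetical per-language evaluation scores for a high-risk system,
# e.g. task accuracy on a domain-specific test set. Illustrative values only.
scores = {"English": 0.91, "Swahili": 0.83, "Hausa": 0.54}

def deployment_gate(scores, served_languages, max_gap=0.15):
    """Fail the gate if any served language trails the best-performing
    language by more than max_gap: reliability as a safety requirement,
    not a localisation feature."""
    best = max(scores[lang] for lang in served_languages)
    failures = {
        lang: scores[lang]
        for lang in served_languages
        if best - scores[lang] > max_gap
    }
    return len(failures) == 0, failures

ok, failures = deployment_gate(scores, ["English", "Swahili", "Hausa"])
print(ok, failures)  # False {'Hausa': 0.54} with these illustrative scores
```

The design choice is that the check is relative rather than absolute: it does not demand a fixed score in every language, only that no population served by the system is left with a markedly less reliable version of it.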

Third, a move beyond benchmark performance as the primary measure of inclusion. Benchmarks capture performance in controlled conditions. They do not capture whether communities shaped the data, influenced representation, or have a say in deployment. Inclusion should not be measured by leaderboard scores, but by whether communities have meaningful governance over the systems that affect them.

The multilingual AI gap is not closing. It is being repackaged in the language of progress while the underlying power dynamics—who builds, who benefits, who decides—remain unchanged. Until governance frameworks catch up, the most accurate description of linguistic inclusion in AI is this: for most of the world’s languages, it remains a promise made in English.

Authors

Sofia Olofsson
Sofia Olofsson is a Program Management Officer at the United Nations, working on AI governance, digital policy, and data-driven program design in the context of international cooperation. The views expressed are solely those of the author and do not reflect the positions of the United Nations or any...
