Digital Sovereignty Means Breaking the Western Monopoly on AI Meaning
Sujata Mukherjee, Sasha Maria Mathew / Apr 28, 2026

This is the second in an article series related to language equity in trust & safety. See the first article here.

Categorize by Pauline Wee & DAIR / Better Images of AI / CC BY 4.0
Traditionally, digital sovereignty has focused on “hard” assets—data centers, high-performance chips, and network cables. Yet this focus neglects a more fundamental layer of control: the invisible architecture of meaning that determines how AI interprets human intent. In an algorithmically infused world, the power to decide what constitutes harm, humor, or a threat in languages like Kannada, Amharic, or Icelandic is currently held by a small number of Western developers. By encoding a narrow cultural consensus into global systems, these developers have created a form of "linguistic capitalism," a monopoly on meaning-making that threatens a community’s autonomy as much as control over any physical infrastructure does.
This brings us to the emerging concept of semantic sovereignty, defined as the right and ability of a community to ensure that its language, culture, values, and knowledge systems are accurately and fairly represented in the digital environment. This concept of communal self-determination is becoming increasingly salient in digital rights discourse, as AI deployment scales. Language is more than a vehicle for data exchange; it is a web of behaviors, rituals, and rules that encodes a community’s lived reality. When AI systems are trained through a single cultural lens, they don’t just fail to understand speakers of low-resource languages—they systematically misrepresent them at scale.
However, sovereignty in the digital age cannot be a passive status; it must be an active practice. This is why we advocate for a transition from the broad goal of semantic sovereignty to the practical execution of semantic ownership. Reclaiming this ownership requires a shift in how we build and govern technology. Drawing on the OCAP principles (Ownership, Control, Access, and Possession) established by Canadian First Nations, we argue for a "semantic engineering" layer that provides cultural communities with the actual tools of ownership. Sovereignty is ultimately an act of creation. As Mateo Romero and Robert Preucel have noted, it is the process of claiming digital space by building systems that reflect a community’s own reality.
In our previous article, we proposed measures to estimate the growing language equity gap. In this piece, we go further: we argue that the gap is not simply a data problem or a translation problem. It is an alignment failure—a consequence of AI systems whose governing purpose was never defined by, or for, the communities they now purport to serve. As LLMs begin to power public infrastructure and mediate personal decisions in the Global Majority, cultural and linguistic alignment is at the front line of semantic ownership.
Semantic engineering, or governing Purpose
To move from theory to implementation, we must first understand how cultural bias settles into AI systems. The Data-Information-Knowledge-Wisdom (DIKW) hierarchy, a standard framework in information science, illustrates how raw data matures into actionable insights. When mapped onto an AI training pipeline, these layers reveal where dominant cultural assumptions accumulate:
- Data (D): The raw pre-training corpus. This foundation already contains linguistic biases and cultural assumptions well before a single value choice is made.
- Information (I): The stage where raw text becomes organized internal knowledge, forming semantic associations and syntactic patterns.
- Knowledge (K): The model’s library of facts and rules, distilled through fine-tuning and instruction tuning.
- Wisdom (W): The preference-learning stage (such as RLHF). Here, the model implicitly adopts judgments of what is "safe" or "appropriate."
However, the DIKW hierarchy is too linear and simplistic. Recent research proposes an extension by adding Purpose (P), forming a networked cognitive loop. In the DIKWP model, Purpose is not simply another layer stacked on top of Wisdom. It is the governing frame that cascades downward, shaping how the model learns representations and renders judgments. Purpose is where agency resides, but it is also the lever that current AI development most consistently leaves implicit and culturally uncontested.
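To make the downward cascade concrete, here is a minimal Python sketch, under stated assumptions, of how a single community-defined Purpose object could parameterize both the data layer (what enters the corpus) and the Wisdom layer (how outputs are preference-scored). All names, the `red_lines` term list, and the scoring logic are hypothetical illustrations, not an account of any production pipeline:

```python
from dataclasses import dataclass, field

@dataclass
class Purpose:
    """Hypothetical community-defined governing frame."""
    community: str
    red_lines: set = field(default_factory=set)  # terms the community flags as harmful

def filter_corpus(docs, purpose):
    """Data layer: the purpose shapes what enters the pre-training corpus at all."""
    return [d for d in docs if not any(t in d.lower() for t in purpose.red_lines)]

def reward(output, purpose):
    """Wisdom layer: the same purpose drives preference scoring."""
    return -1.0 if any(t in output.lower() for t in purpose.red_lines) else 1.0

purpose = Purpose(community="kn-IN", red_lines={"slur_a", "slur_b"})
print(filter_corpus(["benign text", "text containing slur_a"], purpose))  # ['benign text']
print(reward("another slur_b example", purpose))  # -1.0
```

The point of the sketch is structural: the same object threads through otherwise separate pipeline stages, which is what distinguishes a governing purpose from a post-hoc filter.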
This framing redefines the problem: safety is not a translation problem. A model can translate words accurately, converting data to useful information, yet still fail the safety test because it lacks localized linguistic and cultural specifications. When a model embeds a purpose that reflects the cultural defaults of primarily English-speaking designers, speakers of languages like Swahili or Kannada pay the price. In these contexts, trust & safety failures are not merely data deficits; they are purpose alignment failures.
Instrumenting Purpose: an accounting of available methods
How, then, do we engineer purpose into a model, and to what extent can it shape a model’s ability to reason about the world? There are several paths, each with different technical and governance trade-offs.
- Pre-training ownership: Community control over the data and information layers is the most direct path to semantic ownership. Because cultural values are deeply ingrained in these base layers, this would require massive technical and governance investments that have yet to be realized in practice.
- Reward model design and auditing: Involving communities in the design of reward models—defining exactly what a preference model should optimize for in a specific cultural context—can powerfully shape a model’s cultural alignment. This approach also presents significant trade-offs in scalability and consensus management, and can lead to unpredictable or "hallucinated" cultural reasoning as cultural norms and “red lines” evolve.
- Representation Engineering: This operates by embedding culture-specific semantic associations within a model’s internal representation space to "steer" it toward specific cultural associations. While this can drive cross-cultural alignment without significant retraining, it still requires deep technical expertise and access to ML infrastructure.
- Inference-time Methods: Techniques like cultural prompting or Retrieval-Augmented Generation (RAG) can provide models with access to community-defined rules in real-time, and have been moderately successful in culturally divergent settings. These methods are accessible but limited; they often fail to generalize across novel contexts and do not override the model’s original, underlying cultural defaults.
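The inference-time path above can be sketched in a few lines. The following toy example retrieves community-authored rules relevant to a query and prepends them to the prompt, in the style of RAG; the rule text is hypothetical, and simple keyword matching stands in for the embedding search a real system would use:

```python
# Hypothetical community-authored rules; a real system would store many more,
# authored and maintained by native speakers.
RULES = [
    {"terms": {"caste", "jati"}, "rule": "Caste-coded insults are high-severity harms in this context."},
    {"terms": {"dowry"}, "rule": "Dowry-related coercion should be treated as a threat, not humor."},
]

def retrieve_rules(query):
    """Keyword retrieval standing in for embedding-based search."""
    q = set(query.lower().split())
    return [r["rule"] for r in RULES if r["terms"] & q]

def build_prompt(query):
    """Prepend matched community rules to the model prompt."""
    context = "\n".join(retrieve_rules(query)) or "No community rules matched."
    return f"Community context:\n{context}\n\nUser query:\n{query}"

print(build_prompt("is this caste joke okay"))
```

The limitation noted above is visible even in this sketch: if no rule matches, the model falls back entirely on its pretrained cultural defaults.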
Constitutional AI (CAI), a method developed by Anthropic, offers a more promising balance. In this approach, a pretrained model critiques and revises its own outputs based on a set of explicit written principles, with a separate model scoring outputs based on those principles and driving a fine-tuning loop through reinforcement learning. When native speakers and cultural experts are empowered to author these “constitutions” and create adversarial datasets for evaluation, they can systematically shape a model’s behavioral dispositions, i.e. participate in engineering its purpose.
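The critique-and-revise loop at the heart of CAI can be illustrated with a toy sketch. In real CAI both the critique and the revision are produced by the model itself against natural-language principles; here, simple rule-based stand-ins (a hypothetical banned-term list and a redaction step) make the control flow runnable:

```python
# Hypothetical, community-authored constitution and flagged term.
CONSTITUTION = ["Do not reproduce caste-coded slurs."]
BANNED = {"slur_x"}

def critique(text):
    """Return violated principles (stand-in for an LLM self-critique)."""
    return [p for p in CONSTITUTION if any(b in text for b in BANNED)]

def revise(text):
    """Rewrite the draft to satisfy the constitution (stand-in for an LLM revision)."""
    for b in BANNED:
        text = text.replace(b, "[removed]")
    return text

def constitutional_pass(draft, max_rounds=3):
    """Iterate critique and revision until no principle is violated."""
    for _ in range(max_rounds):
        if not critique(draft):
            break
        draft = revise(draft)
    return draft

print(constitutional_pass("reply containing slur_x"))  # 'reply containing [removed]'
```

In the full method, pairs of original and revised outputs then feed a fine-tuning loop, which is what lets a community-authored constitution shape the model's dispositions rather than just its surface outputs.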
Deliberately transcribing culturally attuned purpose into a model offers another important benefit—it renders cultural value assignment explicit and legible, turning the black box into a more transparent system. In trust & safety, many failures occur because a term is information-neutral but context-dangerous. A conventional benchmark tells us only that a model returned a false negative. It does not disclose whether that failure was a data gap, a reasoning failure, or a constitutional misalignment. A purpose-driven evaluation regime offers a more transparent alternative. By explicitly encoding community-defined cultural dispositions into the model (through any of the engineering paths described above) and then evaluating whether those dispositions are faithfully executed in practice, we can locate points of failure in the chain of reasoning. When a model fails on a caste-coded term, a purpose evaluation can locate the failure precisely: the model correctly identified the information (the words used) but defaulted to its pretrained cultural heuristic rather than the constitutionally mandated prior.
While the methods described above show technical promise, their true challenge is institutional: building the organizational infrastructure for genuine community authority requires distributed governance rather than a few narrow, remote points of control.
Semantic ownership through distributed linguistic governance
Effective global governance of general-purpose AI models requires moving from an “etic” perspective—observing a culture from the outside through one's own biases—to an “emic” perspective, which seeks to understand the world as community members experience it. Currently, the industry standard is the “etic” approach: centralized teams in Silicon Valley attempt to moderate content in Swahili or Amharic using translated and/or superficial guidelines. The human consequences of this remoteness gap are not theoretical; they are documented in catastrophic failures across the Global South.
Community-driven, participatory constitutional alignment exercises based on crowdsourced public input are being studied. However, even the best-intentioned participatory methods can feel tokenistic. To move from participation to true semantic ownership, the governing purpose of an AI system must be defined and managed by the linguistic communities themselves. This requires trust & safety organizations to evolve, colocating policy teams within the regions they aim to serve.
The shift to semantic ownership also necessitates a fundamental redefinition of the language specialist's role. The current model of perfunctory attention to language inclusion, which relies on hiring entry-level, production-oriented labelers or consulting local advocacy groups at infrequent intervals, is inadequate. We recommend elevating these roles to semantic engineers, shifting from perfunctory content labeling to the technical task of architecting meaning. These practitioners don’t just review posts; they design the ontologies and knowledge graphs that represent a culture’s core values. For companies building frontier models intended for global deployment, employing semantic engineers drawn from local language and cultural contexts is increasingly an epistemic requirement rather than an organizational prerogative.
These roles require hybrid capabilities—combining the linguistic and cultural lens of the social sciences with the technical ability to translate those nuances into structured data that a model can actually ingest. Transitioning to this model requires moving beyond volume-oriented, quota-driven hiring. Today, hiring for language specialists often prioritizes broad language ‘coverage’—treating a single standardized language as a monolith while ignoring the distinct regional dialects and social cues that define how harm is actually expressed. Future semantic engineers must be trained in knowledge engineering and systematic language modeling. Preparing this workforce requires trust & safety organizations to invest in “upskilling” current language specialists, moving them away from labeling tasks towards designing semantic guardrails—like community constitutions—that govern model purpose at scale.
Both our organizational recommendations—colocating policy teams and elevating language specialists to semantic engineers—imply a fundamental shift in budgetary logic, treating language support not as a variable cost of moderation but as a deliberate, fixed R&D investment. For organizations, this means prioritizing deep cultural competency over broad, superficial "coverage," even if it necessitates a slower, more deliberate rollout of AI features in complex linguistic markets. The strategic choice is straightforward: pay now for proactive alignment to enable scalable policy architectures, or pay later for reactive crisis management where harm is moderated after the fact.
Conclusion
In our previous work, we focused on measuring the language equity gap. In this piece, we argue for a structural fix: instrumenting semantic ownership as a constitutional frame for LLMs. In the analog world, we don't let outsiders define the safety "red lines" for a local community. Yet, with generative models increasingly deployed in intimate personal and high-stakes professional spaces, we allow a narrow cultural consensus to dictate what is "safe" for divergent linguistic contexts. Semantic ownership is the digital reclamation of that analog right to self-definition.
Generative models are no longer peripheral tools; they increasingly mediate intimate personal and high-stakes professional decisions for speakers of every language. When the purpose layer of these systems is calibrated to a narrow cultural consensus, communities are not merely misrepresented; they are dispossessed of their right to self-determination and the capacity to set their own boundaries between what is safe and what is harmful.
However, this path is not without risk. As the demand for semantic autonomy grows, so does the threat of a "splinternet" composed of parallel, self-contained reality bubbles, where culturally scoped AI intermediaries make cross-cultural understanding impossible. The same constitutional architecture that empowers a linguistic minority to define its own norms could, in the wrong hands, be weaponized by authoritarian regimes to enforce ideological conformity under the guise of cultural self-determination.
The rebuttal to this concern lies in the failure of the alternative: a centralized, Western-centric ‘monoculture’ often creates the friction and resentment that authoritarianism feeds upon. Transparent, community-owned semantic guardrails lay the groundwork for international legitimacy. True interoperability in AI will not come from a forced global consensus, but from a common technical protocol that allows diverse cultural purposes to communicate without erasing one another.
This article is meant to provoke future work rather than settle it. We write as trust & safety practitioners who recognize the stakes on both sides, and we welcome collaboration on designing solutions that are equal to the complexity.