Contesting AI Safety
Victor Zhenyi Wang / Sep 12, 2024

Since the release of ChatGPT in 2022, AI policy and regulation have increasingly focused on safety. With California set to pass the first comprehensive legislation around AI safety, a natural question might be: what do policymakers mean when they talk about safety?
The 2022 White House Blueprint for an AI Bill of Rights (AI Blueprint) emphasizes safety under its first principle, Safe and Effective Systems. Here, safety means preventing unintended harmful outcomes by narrowly scoping data use, subjecting AI companies to effective audits by independent institutions, and broadly addressing harms such as discrimination and bias. This view, which draws on conventional safety science, is concerned with minimizing risks to within an acceptable threshold under the technology’s standard operating conditions, and it approaches safety from a sociotechnical perspective.
Yet California’s recent bill, SB 1047 - the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act - interprets safety differently. It focuses primarily on preventing certain kinds of AI models from causing incidents that result in mass casualties or significant financial damages, up to and including the extinction of humanity. These are the existential risks associated with AI. This view is motivated partly by fears of malicious actors operating outside the standard operating conditions most AI users experience, and partly by more general anxieties about the AI systems themselves. Such existential risks stem from a distinct intellectual history that traces back to the early days of cybernetics.
AI safety and existential risk
Concerns about the risks artificial intelligence poses to humanity are as old as the field itself. In the early days of cybernetics, machines embodied concepts such as learning or intelligence. Norbert Wiener, in a 1960 essay, raised a fundamental issue in human-machine interaction, warning that “if the machines become more and more efficient and operate at a higher and higher psychological level, the catastrophe […] of the dominance of the machine comes nearer and nearer.” Wiener suggests that if machines become sufficiently capable, they may question human control, seek to reverse it, and then dominate us. In this way, existential risks are inherent to how we conceptualize AI, separate from how users interact with AI models.
The dangers posed by these machines arise from the idea that they “transcend some of the limitations of their designers.” Even if rampant automation and unpredictable machine behavior may destroy us, the same technology promises unimaginable benefits in the far future. Ahmed et al. describe the epistemic culture of AI safety that drives much of today’s research and policymaking, one focused primarily on the technical problem of aligning AI. This culture traces back to the cybernetics and transhumanist movements. Within it, AI safety is understood in terms of existential risks - unlikely but highly impactful events, such as human extinction. The inherent tension between a promised utopia and cataclysmic ruin characterizes this predominant vision of AI safety.
Both the AI Blueprint and SB 1047 assert claims about what constitutes a safe AI model, but they fundamentally disagree on the definition of safety. A model deemed safe under SB 1047 might not satisfy the Safe and Effective Systems principle of the White House AI Blueprint; a model that follows the AI Blueprint could still cause the critical harms SB 1047 targets. What does it truly mean for AI to be safe?
AI safety as a ‘boundary object’
I argue that AI safety functions as a boundary object - something that allows different communities of practice to share a loose common understanding, even without agreement on a precise definition. Safety is discussed, conceptualized, and applied differently within each community through varying reinterpretations and framings. By positioning safety in this role, we recognize it as a concept that is contestable and shaped by the social and political processes surrounding it.
Boundary objects allow for collaboration between different parties. In the White House’s Executive Order 14110 on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, safety appears to encompass everything from civil rights and labor disruptions to biosecurity, the monitoring of model training by foreign actors, and the safe enjoyment of “gains … from technological innovation.” The Executive Order, an artifact shaped by different conceptions of safety, reflects the various communities of practice around AI, each with its own interpretation of safety.
How does safety’s polysemous nature affect the politics of AI? In the context of national security, an AI cold war between the US and China offers a glimpse of how different communities of practice use safety as a foundation for collaboration. Here, two distinct objects of safety for the US - China and AI - merge on the issue of national security. For example, the CHIPS and Science Act and export controls on certain categories of microchips are mostly about competition with China. Yet, within an AI cold war narrative, regulatory action on the supply chain for frontier computing chips also becomes a safety issue: it limits the distribution of frontier AI systems and constrains the capabilities of the chips themselves.
Concerns about China’s AI capabilities are amplified by existential risk narratives that fuel new anxieties about the global impact if China were to “win” the AI arms race. Even though escalating tensions with China may contribute to existential risk as much as reduce it, these political tradeoffs vanish under the guise of AI existential safety. Meanwhile, one might question whether framing AI through national security is the best way to achieve the kind of existential safety regime that SB 1047 seems concerned with. Terms like “catastrophic risk” or “cold war” support ideas of safety that primarily serve specific commercial and political interests, and they effectively shut down a more robust debate around AI safety.
With this in mind, consider Vice President Harris’ remarks at the AI safety summit in the UK last November: “[w]hen a woman is threatened by an abusive partner with explicit deepfake photographs, is that not existential for her?” With this statement, Harris contests the summit’s primary framing of safety as chiefly a matter of existential risks and technical governance. In short, she raises the question, “safety for whom?”
The “whom” in AI existential safety is universal, addressing existential risk to all of humanity without specifying the subject. Lucy Suchman suggests that the subject in reference to AI has historically been universalized. For example, in epistemology, we might make a claim about a subject S who knows some proposition p; feminist critics, however, might challenge this framing of an impartial agent by asking, “who is S?” For those most concerned with existential risk, the question of “whom” may seem irrelevant when the very fate of humanity is at stake. In other communities of practice, however, these questions are integral to understanding and shaping the sociopolitical arrangements around us.
For instance, advocacy organizations and academics have long raised the alarm about how automating social services exacerbates discrimination and domination. Virginia Eubanks describes the creation of a “digital poorhouse” in which automated eligibility and risk-scoring systems profile and punish the poor, while hiring algorithms have been shown to discriminate against women and racial minorities. Countless other examples of actual harms produced by the automation of various aspects of public life existed long before generative AI became prevalent. Marginalized groups face harm from these sociotechnical systems under their standard operating conditions, even if such systems might appear to offer benefits to a more abstract, general subject S. On this view of AI safety, AI should not exacerbate these existing injustices. This raises the question: does the existence of these harms suggest that AI is misaligned?
Recognizing safety as a boundary object brings contestability back into regulatory discussions of AI. The trajectory of AI adoption in society is not inevitable; it is contingent on the actions we take and the policies we set today. We can imagine a more participatory process for developing responsible AI. We can impose strict constraints on AI use in society, as the EU AI Act does by prohibiting uses that pose unacceptable risks, such as certain forms of facial recognition. We can also pass stronger employment laws that guarantee worker rights and protections in the face of growing automation. Ultimately, our goal should be governing AI, not AI governing us.