When AI Policy is Handcuffed, Spend on Alignment

Philip Fox / Mar 18, 2025

Philip Fox works on AI policy at the KIRA Center, an independent think tank in Berlin.

Cristóbal Ascencio & Archival Images of AI + AIxDESIGN / Better Images of AI / Glitched Landscape / CC-BY 4.0

“The reason for this urgent e-mail concerns breakdowns of internal controls.” This is what then-Citigroup employee Richard M. Bowen III wrote to his bank’s executives in 2007 in an effort to warn them of the risky mortgage operations that played a crucial role in the financial crisis. He had been blowing the whistle internally for over a year, but to no avail. Others had similarly called out instruments like Collateralized Debt Obligations (CDOs) in public, including a prescient warning by Warren Buffett in 2003 that derivatives incentivize “false accounting” and could threaten “the whole economic system.” Neither regulators nor corporate decision-makers heeded the advice, and the public learned about the dangers only when it was too late. The party was over by then, and the global financial system almost crumbled.

We have seen a similar pattern with social media: When companies like Meta switched to engagement-based algorithms in the early 2010s, they were well aware of how strongly this could shape a user’s information diet, an effect that researchers had studied as early as 2014. Still, it took a couple more years before the impacts on democratic discourse and mental health became a topic of public concern.

The pattern seems to be this: Economic incentives favor the diffusion of some poorly understood product or technology that some insiders consider potentially dangerous. But while public scrutiny is low, these dangers are easy for policymakers to ignore, especially if something has beneficial as well as risky applications (such as social media or derivatives). No one wants to stifle innovation via regulatory overreach in response to risks few people seem to care about.

The public is asleep on AI

AI is a dual-use technology that will likely be vastly more impactful than social media, but it matches the pattern. Its potential impact – which could be anything from stunningly beneficial to moderately transformative to catastrophic – is still largely invisible to the public. Since 2022’s “ChatGPT moment,” important advances – like the OpenAI o3 model’s performance on challenging tests of PhD-level reasoning – have attracted relatively little coverage. The one major public reaction to AI news that we did see recently, around DeepSeek’s R1, has been nervous and often erratic, as Peter Wildeford, Lennart Heim, and others have explained (ranging from misleading cost estimates to ill-conceived ‘Sputnik moment’ parallels).

This puts safety-focused policy in a tight spot. With the public's attention turned elsewhere, policymakers seem to have little to gain and much to lose from AI safety efforts in the short term.

Of course, the most important reason why AI safety is currently unpopular with policymakers is plausibly the fear of missing out in the race for geostrategic dominance. While that’s surely a major part of the story, polls suggest it is incomplete. For example, a 2024 survey of US adults found a strong bipartisan preference (75%) for safe AI development over racing “as fast as possible to be the first country to get extremely powerful AI.” So, at least when explicitly asked, the American public does voice a clear preference for safety – but people simply don’t care enough right now to push AI safety onto the public agenda.

There will be a wake-up moment

This situation will change at some point. Given the tremendous rate of progress, a major “wake-up moment” within the next 1-3 years is likely – or likely enough, at any rate, that AI companies will take this into account in their strategic planning (more on this below). On the assumption that, by that time, it won’t be too late to avert the worst outcomes, possible warning shots fall into three main categories:

  1. Singular accidents or misuse cases. Think of an AI-enabled cyberattack on critical infrastructure causing a 3-day outage in a developed country or an advanced AI agent attempting (and only barely failing) to self-replicate in the wild.
  2. Systemic labor market impacts. If a large number of people start losing their jobs, AI will make headlines very soon.
  3. The random and unpredictable. Who would have predicted that, of all AI-related news in the past 12 months, DeepSeek would cause the greatest stir? Public and media reactions aren’t always well-calibrated, so it’s very hard to anticipate what will shake people up. A head of state falling in love with an AI companion?

Of course, wake-up moments can go both ways. If AI solves quantum gravity, triples global GDP growth, or invents an affordable vaccine for Alzheimer’s disease, public awareness of AI’s opportunities could spike disproportionately in a positive direction before a significant negative incident subsequently raises awareness of risks.

Setting precedents while nobody’s watching

Either way, we’re now living through a phase of generally low awareness around AI’s potential impacts. This combines with a competitive landscape in which key actors face strong independent incentives to rush ahead without much democratic oversight: the economic incentives of AI developers competing for investment and market share, and the geostrategic incentives of nation-states to keep a technological edge over adversaries. Moreover, there is an internal drive among researchers to realize whatever is technologically possible, described by John von Neumann in the context of the atomic bomb: “What we are creating now is a monster whose influence is going to change history, provided there is any history left. Yet it would be impossible not to see it through.”

The current lack of public attention adds a more neglected factor to this. It doesn’t just fuel a general race dynamic but specifically incentivizes AI companies to quickly create facts on the ground that are hard to roll back. Because public scrutiny is currently low but likely to increase in the future, companies have reason to create path dependencies that improve their negotiating position after an eventual public wake-up, which usually comes with calls for stronger guardrails and oversight. Such dependencies can arise on three different levels:

  1. Economic dependencies. The increasing integration of AI into various sectors of the economy raises the AI industry’s systemic importance (even if it largely burns money in the meantime). This could make it ‘too big to fail,’ similar to certain banks during the financial crisis.
  2. Security dependencies. Public-private partnerships in the military domain or the use of AI in critical infrastructure tighten the bond between AI companies and state institutions, increasing the political leverage of the former.
  3. Technological dependencies. State-of-the-art reasoning models already generate synthetic data to train the next generation of models and will be increasingly used to automate AI R&D. If AI companies find a way to supercharge this, it could eventually result in ‘recursive self-improvement’: AI models continuously improving themselves in potentially greatly accelerated R&D cycles. Given the lack of public insight into lab-internal processes and typical government reaction times, this could kick us off on a technological trajectory that is very hard to stop.

AI companies have a strong reason to shape the economic, political, and technological landscape around them as described, in ways that increase their bargaining power and make political action after a wake-up moment harder rather than easier – not because the people working at these companies are evil or intrinsically power-seeking, but because a distinctive combination of competitive pressure, institutional self-preservation, and public indifference favors such behavior.

AI policy in the meantime

This is a very tricky situation for those who think that we can reap AI’s benefits only by emphasizing both innovation and safety. The AI industry faces a number of incentives to accelerate, raising its relative political strength in the meantime, while safety advocates find themselves in a dilemma described by Anton Leicht: On the one hand, they cannot rely on broad public support for safety policy (and, what is more, don’t seem particularly well-placed as a community to rally for it). On the other hand, the required policy measures aren’t mere technocratic fixes and are thus politically too costly to be quietly pushed through behind the scenes. In short, greater public buy-in seems both important and hard to obtain without a wake-up event (which, as I have said, might even make things more difficult for a time if it primarily makes the opportunities of AI more tangible). Is there a way out of this conundrum?

While there are a couple of different options, I want to focus here on one possible way forward that seems both natural and fairly neglected in recent discussions: While the window for more committed interventions remains narrow, focus on drastically ramping up government funding for alignment research to mitigate national-security-level risks from advanced AI. (I understand this broadly to cover, e.g., research on AI control or robustness against jailbreaks – basically anything that increases societal control over powerful AI and reduces potentially catastrophic risks from AI systems of arbitrary capability.)

As a policy goal, this has a couple of advantages:

  • The focus on national security threats makes the issue sufficiently non-partisan. Everyone can agree that we should prevent such things as cyberattacks on critical infrastructure, engineered pandemics, or various loss-of-control scenarios. The proposal is thus not easily politicized and doesn’t depend on a lot of public support.
  • It’s relatively cheap: Current research spending on alignment is very low (see below), so one would expect plenty of low-hanging fruit and substantial impact even from modest increases.
  • Middle powers without homegrown frontier AI could be involved in the project (perhaps incentivized via some benefit-sharing), facilitating international coordination.
  • This could even set an important precedent in US-China cooperation in the medium term.

One clarification before a look at the numbers: I’m not assuming that alignment, especially for superhuman AI, is a tractable problem. Perhaps it’s essentially unsolvable or solvable only to a limited, ultimately insufficient degree. Even if that were the case, it seems like something scientists could learn and subsequently convey to policymakers. And so, if policymakers became aware that a major, concerted push to solve alignment didn’t bear fruit, that would be an important insight to inform decision-making.

The Global Alignment Fund

How much does the world (including companies, governments, academia, and civil society) currently spend on alignment? A precise estimate is hard to obtain, as the question hasn’t been studied rigorously, and public data are scarce. However, a very optimistic, back-of-the-envelope calculation suggests that global spending is most likely <$400M annually – or <0.0004% of global GDP. (Stephen McAleese estimates that philanthropic organizations, industry, and academia spend no more than $176M/year on research to minimize risks from advanced AI. If we add to this $100M/year for the world’s AI Safety Institutes and a generous margin of error, $400M seems like a reasonable upper bound.) Even as a fraction of general VC funding for AI start-ups ($131.5B in 2024), this is a tiny 0.3%. (Oscar Delaney and Oliver Guest point out that current spending is also imbalanced.)
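To make the back-of-the-envelope arithmetic explicit, here is a minimal sketch using only the figures cited above plus an assumed world GDP of roughly $100T (the exact percentages shift slightly with that assumption):

```python
# A rough reproduction of the back-of-the-envelope estimate above.
# All figures are the estimates cited in the text, not precise data.

mcaleese_estimate = 176e6      # philanthropy + industry + academia, $/year
safety_institutes = 100e6      # assumed combined budget of AI Safety Institutes, $/year
upper_bound = 400e6            # cited upper bound after a generous margin of error

global_gdp = 100e12            # world GDP of roughly $100T (my assumption)
ai_vc_funding_2024 = 131.5e9   # VC funding for AI start-ups in 2024

print(f"Known components:          ${(mcaleese_estimate + safety_institutes) / 1e6:.0f}M/year")
print(f"Share of global GDP:       {upper_bound / global_gdp:.4%}")          # ~0.0004%
print(f"Share of 2024 AI VC funds: {upper_bound / ai_vc_funding_2024:.1%}")  # ~0.3%
```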

To address this imbalance, I propose a Global Alignment Fund. This fund would be government-backed, at least initially, and boost progress on the most fundamental problem in AI safety: nobody knows how to reliably control increasingly advanced AI systems. Since current attention to this problem is so low, even a minor ramp-up in government spending would go a long way.

To illustrate, consider the 11 countries – including Germany, India, Japan, the UK, and the US – that recently established (or committed to establishing) AI Safety/Security Institutes. I estimate that if these countries added just 1% of their annual public R&D spending to the Fund, this would immediately lead to an almost 10x increase in global spending on alignment research. (These countries have a total GDP of around $60T. I assume that they spend between 0.65% (India) and 4.93% (South Korea) of their annual GDP on R&D and that governments provide, on average, around 20% of that expenditure.) If governments were willing to spend more than 1%, even greater resources could be marshaled to solve what might be the most significant technological challenge humanity currently faces. The incentives to take part in this endeavor will vary somewhat by government – countries without frontier AI models might participate in exchange for model access rights, facilitating the adoption of safe models – but one central incentive is universally shared: It is in every country’s long-term strategic interest to mitigate national-security-level risks from AI.
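The implied calculation can be sketched as follows. The average R&D intensity used here is an illustrative value within the cited 0.65%–4.93% range, not a figure from the text, and the resulting multiple moves accordingly (roughly 7–9x new funding on top of current spending):

```python
# A rough sketch of the "almost 10x" estimate, under the stated assumptions.
# The average R&D intensity is an illustrative guess within the cited range.

total_gdp = 60e12            # combined GDP of the 11 countries, ~$60T
avg_rd_intensity = 0.025     # assumed average R&D spending as a share of GDP
govt_share_of_rd = 0.20      # assumed government share of R&D expenditure
fund_contribution = 0.01     # 1% of public R&D spending redirected to the Fund

annual_fund_budget = total_gdp * avg_rd_intensity * govt_share_of_rd * fund_contribution
current_spending = 400e6     # upper-bound estimate of current alignment spending

print(f"Fund budget: ${annual_fund_budget / 1e9:.1f}B/year")                          # ~$3B
print(f"Relative to current spending: {annual_fund_budget / current_spending:.1f}x")  # ~7.5x
```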

The implementation details of this proposal are beyond the scope of this text. Governments could pool funds internationally to set up a “CERN for AI Safety,” or they could opt for a more decentralized approach. They could fund research projects directly or offer AI companies tax credits to reward investments in alignment research, as Americans for Responsible Innovation recently suggested. What matters is this: When low public awareness, accelerating race dynamics, and an anti-regulatory vibe shift make costly safety policy increasingly difficult, a Global Alignment Fund can serve as an effective and politically more low-key measure to reduce severe risks from AI. The AI policy community should seriously consider this proposal.

Authors

Philip Fox
Philip Fox works on AI policy at the KIRA Center, an independent think tank in Berlin. He is a co-author of the International AI Safety Report and has a PhD in philosophy from Humboldt University of Berlin.
