Perspective

Using AI to Reform Government Is Much Harder Than It Looks

Ben Green / Jun 3, 2025

This piece is part of “Ideologies of Control: A Series on Tech Power and Democratic Crisis,” in collaboration with Data & Society. Read more about the series here.

Elon Musk participates in a farewell press conference alongside US President Donald Trump on Friday, May 30, 2025, in the Oval Office. (Official White House photo by Molly Riley)

Last week, Elon Musk announced his official departure from the Trump administration and the Department of Government Efficiency (DOGE). In a Friday afternoon press conference with President Donald Trump in the Oval Office, Musk indicated that he would continue to advise the President, and that DOGE would continue its work. “The DOGE influence will only grow stronger,” he said. “It is permeating throughout the government.”

But despite the list of supposed accomplishments that Trump read from behind the Resolute Desk, Musk’s tenure in government fell far short of his expectations. He originally boasted that, aided by AI, it would be possible to cut a trillion dollars of government spending. Now, after reaching just a fraction of that goal (and possibly even increasing long-term budget deficits), Musk appears chastened. In an interview with the Washington Post, he acknowledged that “it sure is an uphill battle trying to improve things in DC.”

Given that Musk’s true priority was apparently to fire workers and cut off aid to vulnerable groups, his inability to achieve more drastic cuts is welcome to many. However, his experience highlights a broader lesson that transcends DOGE: using AI to reform government is much harder than policymakers and technologists assume.

This lesson is important because it isn’t only Musk and DOGE who are bullish on AI as a tool for reforming government. In April, the Office of Management and Budget released a memo directing agencies to “accelerate the Federal use of AI.” Meanwhile, beyond the Trump administration, many states—including ones led by Democrats—are also actively exploring how they can use AI to improve efficiency and decision-making.

For policymakers and other government officials, these efforts seem like a prudent response to the rapid developments in AI. The novel capabilities of publicly available tools, combined with promises by technology companies that further advances are just around the corner, make it easy to believe that AI is poised to rapidly improve all facets of government.

Despite this hype, however, the reality is much more sobering. The problem isn’t just that many AI tools are unreliable and depend on messy datasets, although those issues are pervasive. The deeper problem is a large gap between technical novelty and practical functionality.

Even technically sophisticated AI tools are often unhelpful in practice. Meanwhile, the bipartisan belief in the rationality of reforming government with AI provides cover for austerity policies and the right-wing takeover of government processes.

The key to understanding this gap between hype and reality is considering how governments integrate AI into their operations. When we examine AI in context, rather than fixating on a tool’s technical capabilities, we can identify three particularly significant challenges that make it difficult to improve governments with AI.

Challenge #1: There’s a big difference between benchmark tests and practical functionality.

Government officials—particularly the leaders of DOGE—sometimes present AI as a full-scale replacement for human workers. These claims are often grounded in comparisons that show AI tools matching or exceeding human performance on tests like the bar exam.

Despite their compelling nature, these tests of AI performance are misleading, as they fail to measure how AI performs on real-world tasks. Actual human labor requires more multifaceted behaviors than what these tests evaluate.

After all, lawyers don’t sit around answering bar exam questions all day. Crucially, because AI operates very differently from people, there’s no evidence that an AI tool that scores well on the bar will be as good a lawyer as someone who achieves the same score. In fact, there’s evidence that—despite their bar scores—large language models regularly hallucinate when responding to legal questions.

One of DOGE’s plans is to replace many government coders with AI agents that write software. These AI software tools may appear suitable for the task, given tests showing that they can pass engineering interview tasks at near-perfect rates. The issue is that real software engineering work is much more complicated than these interview tasks. Government software engineers must follow security protocols, integrate their code into complex codebases, and ensure the code is easy to maintain. AI is unable to manage all of these tasks. As a result, injecting AI code into government software will likely lead to broken software, hacks, and compromised data.
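To make that gap concrete, consider a minimal, hypothetical sketch (the `claims` table, its data, and the function names are invented for illustration). Both functions below return the right answer on a benchmark-style correctness test, but only one satisfies the kind of security requirement a real government codebase imposes:

```python
import sqlite3

def find_claims_unsafe(conn: sqlite3.Connection, claimant_name: str):
    # Typical of code that passes an interview-style test: correct output
    # for well-behaved input, but vulnerable to SQL injection. An input
    # like "x' OR '1'='1" returns every row in the table.
    query = f"SELECT id, status FROM claims WHERE claimant = '{claimant_name}'"
    return conn.execute(query).fetchall()

def find_claims_safe(conn: sqlite3.Connection, claimant_name: str):
    # A parameterized query: the database driver handles escaping, which
    # is the sort of requirement a security protocol enforces but a
    # coding benchmark rarely checks.
    query = "SELECT id, status FROM claims WHERE claimant = ?"
    return conn.execute(query, (claimant_name,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (id INTEGER, claimant TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?, ?)",
        [(1, "Ada Lovelace", "approved"), (2, "Grace Hopper", "pending")],
    )
    malicious = "x' OR '1'='1"
    print(find_claims_unsafe(conn, malicious))  # leaks both rows
    print(find_claims_safe(conn, malicious))    # correctly returns []
```

On an ordinary test case, the two functions are indistinguishable; the flaw in the first only surfaces once it meets adversarial input in production. That is precisely the kind of failure that near-perfect interview-task scores never measure.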

Challenge #2: AI isn’t beneficial unless it’s integrated into domain-specific workflows.

Recognizing that AI generally can’t replace human workers, many government agencies have adopted AI tools to augment workers. They hope that AI can help government workers analyze information and make decisions.

Although this outcome is possible, a significant barrier is that AI doesn’t automatically make people work more efficiently and effectively. An algorithm will be beneficial only if decision-makers want to receive the tool’s advice and can act on it. For this to occur, AI tools need to be highly tailored to the domain-specific goals and workflows of government staff. However, technologists rarely spend the time to learn about these needs and operational processes.

Mismatches between the needs of workers and the outputs of AI tools are a common source of failure for government AI. Consider how social workers in Allegheny County, Pennsylvania, responded to an AI-based tool designed to help them identify which families to investigate for child maltreatment. Caseworkers noted a significant discrepancy between their goals and the advice provided by the algorithm. While they prioritize children’s immediate safety, the screening algorithm predicts children’s safety over a two-year span. Given this difference, caseworkers didn’t find the algorithm helpful and often ignored its suggestions.

Similarly, here’s how one employee assessed the new AI chatbot that DOGE deployed to federal workers at the General Services Administration: “It’s about as good as an intern. Generic and guessable answers.”

Challenge #3: It’s difficult to achieve reliable human-AI collaboration.

Suppose a government adopts an AI tool whose advice is relevant and actionable for decision-makers. Problem solved, right? Not necessarily.

In theory, pairing people and AI should yield the best of both worlds: the accuracy and consistency of algorithms alongside the oversight and expertise of people. In practice, however, this desired fusion mostly fails to materialize.

A central challenge is that people are bad at judging the quality of AI advice: they struggle to discern which recommendations to follow and which to reject. As a result, human decision-makers often place too much trust in the algorithm, following incorrect and biased recommendations. As a notable example of this behavior, police across the United States have relied on clearly incorrect matches by facial recognition algorithms to arrest Black men with no connection to the crime being investigated. One spent ten days in jail, and it took almost a year for the charges against him to be dropped.

A typical response to these issues is to pair AI outputs with explanations, using text or visualizations to help human users understand the algorithm’s advice. Although it seems intuitive that this information will help people decide whether the algorithm’s advice is trustworthy, that isn’t what happens in practice. Instead, explanations have the unintended effect of increasing user trust in AI tools, making people more likely to accept incorrect advice.

Responding to these challenges

Expecting AI to be helpful in every situation, prompting the desire to “accelerate” an “AI-first strategy” in government, is a recipe for failure. When governments ignore these three challenges, members of the public are left to face understaffed departments, inefficient bureaucratic processes, and nonsensical decisions.

The direct response to these challenges is for policymakers to become more skeptical about whether adopting AI will benefit the public. Rather than strive simply to increase the government’s use of AI, policymakers should strive to enhance the government’s ability to assess whether AI will improve organizational processes. Before adopting any AI tool, governments should require concrete evidence that the tool is reliable for the intended purpose, that workers find the tool useful, and that the tool demonstrably improves human decision-making.

Alongside this direct response, policymakers need to rethink the values that justify government AI adoption in the first place. Across the political spectrum, the rationale of increasing efficiency underlies most efforts to promote government AI. But as DOGE highlights, “increasing government efficiency” generally means “implementing austerity.” To leverage AI in ways that actually improve the lives of the public, policymakers must begin by altering their assumptions about what it means to improve government. Reforms should be oriented toward supporting the dignity of public servants and the ability of all people to lead flourishing lives. Otherwise, efficiency-driven AI reforms are going to prove the truth of Musk’s promise—or threat—of DOGE’s influence permeating all levels of government.

Authors

Ben Green
Ben Green is an assistant professor in the University of Michigan School of Information and an assistant professor (by courtesy) in the Gerald R. Ford School of Public Policy. His first book, The Smart Enough City: Putting Technology in Its Place to Reclaim Our Urban Future, was published in 2019 by...
