What Does It Take To Moderate AI Overviews?
Varsha Bansal / Sep 22, 2025

When right-wing commentator Charlie Kirk was shot during a public event in Utah this month, what followed was typical of any tragedy these days: a whirlwind of online misinformation. Except this time, it also included AI-generated false claims. X’s chatbot Grok claimed Kirk was “fine and active” even as news of his death spread everywhere, while Google’s AI Overviews, the AI-generated summary that sometimes appears at the top of Google’s search results, spread unverified claims that Kirk was on a hit list of perceived enemies of Ukraine and, in a separate query, misidentified someone as a potential suspect.
This incident is the latest example of a concerning trend. In moments of crisis, the same loop repeats: early speculation gets scraped, summarized, and laundered by AI systems. “In practice, early rumors and speculation from social platforms get ingested by AI systems, stripped of context and returned as authoritative answers,” said Alex Mahadevan, director of MediaWise at the Poynter Institute. “After the Kirk incident and during the LA protests, that showed up as confident misidentifications and invented details that flipped as facts evolved.”
Concerns around AI-generated misinformation are high, with the likelihood of chatbots repeating false information nearly doubling, from 18% in August 2024 to 35% in August 2025. At the same time, placing AI-generated summaries atop Google’s search results gives far more visibility to a summary that may itself be inaccurate. “Instead of acknowledging limitations, citing data cutoffs, or declining to weigh in on sensitive topics, the models are now pulling from a polluted online ecosystem,” said McKenzie Sadeghi, AI editor at NewsGuard. “The result is authoritative-sounding but inaccurate responses, and providing confident-sounding answers even when no reliable sources exist.”
Process of moderating AI Overviews
Google launched AI Overviews in May 2024 and has since expanded it to over 200 countries, reaching 2 billion monthly users. But just weeks into the launch, when a user Googled why cheese wasn’t sticking to their pizza, the Overviews feature suggested adding glue. In another incident, it suggested that humans should eat one rock per day.
It’s unclear how Google moderates breaking news events, but to verify the accuracy and grounding of other routine queries, the company has contracted AI raters. One such worker told Tech Policy Press that the accuracy of the model’s responses has improved over time, but that they have seen the model get things wrong 25 percent of the time. “It’s baffling [because it] seems like it should be straightforward,” said this worker, who requested anonymity. “You're looking through text and looking for keywords and then coming back with an answer — why hallucinate at all?”
One possible explanation lies in the way the model handles queries, this worker explained. The model can handle a query in two ways: one where it simply “just verbatim plugs it in and does a sort of a lazy search,” and another where it interprets the query, rephrases it, and breaks it down into multiple sub-queries that it runs separately. It is at the point of drafting those queries that the model can go wrong. “Sometimes it asks questions that are entirely off topic,” the rater explained.
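To make the distinction between those two modes concrete, here is a minimal illustrative sketch in Python. The `search` and `llm` interfaces, the function names, and the parameter values are assumptions made for illustration only; they are not Google’s actual pipeline or API.

```python
# Illustrative sketch of the two query-handling modes the rater describes.
# The `search` and `llm` interfaces are assumed stand-ins, not Google's system.

def answer_verbatim(query: str, search, llm) -> str:
    """Mode 1: plug the user's query into search verbatim (a 'lazy search')."""
    documents = search(query, top_k=5)
    return llm.summarize(question=query, context=documents)

def answer_decomposed(query: str, search, llm) -> str:
    """Mode 2: interpret and rephrase the query, then run sub-queries separately."""
    # If the model drafts off-topic sub-queries at this step, the retrieved
    # context (and therefore the final summary) can go wrong before the
    # answer is ever generated.
    sub_queries = llm.rewrite_and_decompose(query)  # e.g., a handful of rephrasings
    documents = []
    for sub_query in sub_queries:
        documents.extend(search(sub_query, top_k=3))
    return llm.summarize(question=query, context=documents)
```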
Another worker told Tech Policy Press that the process of moderating AI Overviews is “pretty grueling and taxing.” The Overviews are fed into a user interface, and raters pull from a queue that randomly assigns them cases. They work through each case according to Google’s style guide and workflow, checking the accuracy of the model’s response to the user’s query. But it doesn’t end there. This is followed by a process called “consensus,” in which raters meet and bring their individual versions of a particular case into alignment. “The raters then do marathon meetings, sometimes for days, to align their cases,” said this worker.
Professor Chirag Shah of the University of Washington, who studies search, explains that Google’s AI Overviews is a retrieval-augmented generation (RAG) system: it retrieves top search results and then uses them to generate answers. “While retrieving first ensures that the latest and breaking news are discovered, the generation part still uses the model with a cutoff time weeks or months before,” said Shah. “That can often cause misalignments and erroneous generation.”
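A minimal sketch of such a retrieve-then-generate loop, assuming generic `retrieve` and `generate` interfaces rather than Google’s implementation, helps show where the misalignment Shah describes can creep in:

```python
# Minimal sketch of a retrieval-augmented generation (RAG) loop.
# `retrieve` and `generate` are assumed interfaces for illustration only.

def rag_answer(query: str, retrieve, generate) -> str:
    # Retrieval step: pulls the latest documents, including breaking news
    # published long after the generation model's training cutoff.
    top_documents = retrieve(query, top_k=10)  # list of document texts

    # Generation step: a fixed model (cutoff weeks or months old) writes the
    # summary. If the fresh documents are speculative, contradictory, or
    # off-topic, the summary can come out confidently wrong.
    prompt = (
        "Answer the question using only the sources below.\n\n"
        f"Question: {query}\n\nSources:\n" + "\n---\n".join(top_documents)
    )
    return generate(prompt)
```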
Regulatory oversight
Nine months after its launch in the US and other regions, AI Overviews made its way into the European Union, but only in eight member states, including Spain, Portugal, and Germany; it remains unclear if and when the feature will launch in the rest. This was potentially due to the regulatory uncertainty in the region, where rules including the Digital Services Act (DSA), the Digital Markets Act (DMA), and the AI Act apply. In fact, prior to launching AI Overviews in the region, Google submitted an “ad-hoc risk assessment” to the European Commission. The report is currently under review. “The Commission will continue to monitor the situation as part of its ongoing compliance assessment under the DSA and will not hesitate to use its enforcement powers under the DSA if evidence of non-compliance is found,” said Executive Vice-President Henna Virkkunen on behalf of the European Commission.
According to policy experts in the EU, since Google is designated as a Very Large Online Search Engine (VLOSE), it has clear risk assessment and mitigation obligations under the DSA. And given Google's monopoly in online search, coupled with the fact that AI Overviews are the first thing users see, the company should be held accountable. “The DSA makes it clear that very large online platforms and search engines are responsible for systemic risks that stem from the design and functioning of their services and related systems — including AI Overviews in this case,” said Raziye Buse Çetin, head of policy at the European non-profit AI Forensics. She added that this accountability extends to the underlying model as well. “Google’s AI Overviews run on Gemini, which would fall under the AI Act’s risk management framework,” she said. “If Gemini is designated as a model with systemic risk, its provider must assess and mitigate those risks at the Union level.”
Just days earlier, an alliance of media and digital organizations in Germany filed a complaint against Google’s AI Overviews, arguing that the feature violates the DSA by diverting traffic away from independent media, since users are less likely to click on links when AI-generated summaries answer their queries. The complaint also cites studies arguing that AI spreads faulty or fabricated content, which runs directly counter to the goals of the DSA, and says that Google is obliged to counter such misinformation risks.
Policy experts in the US have different views. Neil Chilson, head of AI Policy at the Abundance Institute, said that the European approach to misinformation clashes with America’s legal and cultural commitment to free speech. “Under the First Amendment, even inaccurate and false speech is protected,” he said. “Americans trust that the best way to defeat falsehoods is not to suppress them, but to confront them with more speech — that protection applies to AI-generated summaries as well.” Interestingly, earlier this year, a Minnesota solar company sued Google for defamation after its AI Overviews feature falsely said that the company faced a lawsuit.
Chilson added that summaries — whether created by humans or AI — are only as reliable as their sources. “In fast-moving events, sources can be wrong,” he said. “The reporting around the Charlie Kirk assassination illustrated this: even law enforcement officials were updating and correcting their statements in real time.”
At the same time, however, the Center for AI and Digital Policy (CAIDP) has called for algorithmic transparency in the US, so that it is clear whether an algorithm is moderating, summarizing, promoting, generating, mediating, or deciding content, or affecting other decisions. “US lawmakers should favor robust mechanisms for algorithmic transparency that provide access to the logic, factors, and data that provide the basis for the outputs (whether it is a summary like Google Overviews or other automated decision or chatbot output),” said Christabel Randolph, an associate director with the CAIDP.
Then there’s the question of access to credible sources. With the recent implementation of the European Media Freedom Act, AI Overviews has already come under scrutiny over media fairness and for reducing web traffic to news sites and credible journalistic outlets. Poynter’s Mahadevan is concerned about the same dynamic in the US. “The AI overviews are starving legitimate news sources and fact-checkers, who are doing the legwork during breaking news, of the clicks they need to pay for all that work,” he said. “They're sort of crippling the fourth estate, while at the same time filling an information vacuum with misleading information that can fuel polarization.”
One reason there hasn’t been much policy intervention around AI Overviews in the US, policy experts say, is that there aren’t clear, unaddressed legal harms. “In the US, the First Amendment broadly protects speech, and defamation law already provides remedies for certain false statements,” said Chilson. The general policy contours in the US suggest that AI companies should not be legally liable for mixing the speech of third parties, even if that information is not accurate, explains Spence Purnell, a senior fellow at the think tank R Street Institute. “In the United States, we are allowed to speak untruths so long as speech doesn’t violate a few very narrow exceptions like libel and slander,” he added.
That said, one potential way forward, according to Mahadevan, would be for Google to disable AI Overviews on volatile, breaking-news queries, and to require agreement from multiple high-reliability sources while attaching per-claim citations with visible timestamps. Çetin also explains that for events such as the false reports of Kirk’s death, moderation happens reactively, through classifiers, content filters, and blocking tools, because such cases cannot be fixed at the model level.
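As a purely hypothetical illustration of what that kind of gating could look like, the sketch below suppresses summaries on breaking-news queries unless every claim is corroborated by multiple high-reliability sources and carries a timestamped citation. The class, function names, thresholds, and the `is_breaking_news` classifier are all assumptions, not an existing Google mechanism.

```python
# Hypothetical sketch of the gating Mahadevan describes: suppress the AI
# summary on volatile breaking-news queries unless multiple high-reliability
# sources agree, and attach a timestamped citation to each claim.
# All names, thresholds, and data structures are illustrative assumptions.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Claim:
    text: str
    sources: list[str]    # outlets corroborating the claim
    reliability: float    # 0.0-1.0 score for the weakest corroborating source
    published: datetime   # timestamp shown next to the citation

def should_show_overview(query: str, claims: list[Claim], is_breaking_news,
                         min_sources: int = 2,
                         min_reliability: float = 0.8) -> bool:
    # Ordinary queries fall through to the normal pipeline.
    if not is_breaking_news(query):
        return True
    # Volatile queries: every claim must be corroborated by several
    # high-reliability sources before any summary is shown at all.
    return all(
        len(c.sources) >= min_sources and c.reliability >= min_reliability
        for c in claims
    )

def render_claim(c: Claim) -> str:
    # Per-claim citation with a visible timestamp, as the proposal suggests.
    return f"{c.text} [{', '.join(c.sources)}, updated {c.published:%Y-%m-%d %H:%M} UTC]"
```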