The Missing Fair Use Argument in the Copyright Battle Over AI Summaries
Robert Diab / Feb 4, 2026
In 2025, much of the litigation over AI focused on the use of protected content in AI overviews and summaries. Yet attention has largely remained fixed on a set of earlier cases challenging the use of copyrighted works in model training. This is despite the fact that summaries are rapidly becoming the dominant use of AI, as seen in the explosive growth of tools like Perplexity and OpenAI’s search functions, and in Google’s AI Overviews becoming a fixture on the world’s most trafficked website.
The stakes for the future of AI are high, yet the legal issues in these cases have been under-explored.
Two of the lawsuits target Google for antitrust violations, alleging that publishers’ content is effectively compelled for inclusion in AI Overviews as a condition of search visibility. However, most of the cases assert that AI-generated summaries infringe copyright. Plaintiffs object to the scraping of their content for use in “retrieval augmented generation” (RAG), arguing that AI summaries can substantially reproduce protected works and divert traffic, thereby defeating any claim of fair use.
Courts have ruled on dismissal motions in three of these cases, reaching mixed conclusions on whether AI summaries are substantially similar to the underlying works. But no court has yet addressed the more fundamental question: even where summaries do resemble protected content, might they still qualify as fair use?
That question exposes a gap in the litigation to date. AI summary tools are not merely abridgment engines. They are systems that automate and transform the search process itself—often enabling further inquiry, comparison across sources, and ongoing dialogue. Taking that wider view changes how all four fair use factors should be understood.
Summary of claims and decisions in the copyright actions
The principal cases include actions by Advance Magazine Publishers (Condé Nast) against Cohere, an enterprise AI provider; by Britannica, Merriam-Webster, and Dow Jones against Perplexity; and by various plaintiffs, including book authors and the Center for Investigative Reporting (CIR), against OpenAI and Microsoft.
In the cases against Cohere and Perplexity, plaintiffs advance the same four claims. First, the defendant copies protected works when scraping content for inclusion in its RAG database. Second, the resulting outputs or summaries include full or substantial reproductions of their works. Third, those outputs serve as substitutes for accessing the plaintiffs’ content on their own websites. Fourth, in cases where the defendant’s output contains hallucinations that are falsely attributed to the plaintiffs, the defendant damages the plaintiffs’ trademarks.
Plaintiffs further contend that these summaries do not qualify as fair use because they’re not transformative. They argue the output does no more than duplicate the plaintiffs’ content and divert users away from accessing the plaintiffs’ website, with the intent to monetize the diverted traffic.
The actions against OpenAI and Microsoft are part of the first wave of lawsuits challenging the use of protected works for model training. CIR and author groups have also alleged that ChatGPT and Copilot summaries or “abridgements” involve unauthorized copying that results in substantially similar outputs.
In April 2025, Judge Sidney Stein of the Southern District of New York denied a motion to dismiss several claims but granted dismissal of CIR’s abridgement claim. After reviewing roughly a dozen pages of summaries from ChatGPT and Copilot, Stein found them “not substantially similar, qualitatively or quantitatively,” to CIR’s original news content. Citing the Second Circuit’s decision in Nihon v. Comline (1999), Stein held that the abridgments “differ in style, tone, length, and sentence structure from CIR’s articles.” They also “present the ‘facts in a different arrangement’—bullet point lists or short summary paragraphs—‘with a different sentence structure and different phrasing.’” Notably, two contained lengthy verbatim extracts. Scanning through them as a whole, I find it hard to see an obvious difference between the form of these summaries and what we find in a Perplexity or ChatGPT summary of news or commentary sources today.
In October 2025, however, Judge Stein declined to dismiss infringement claims related to AI summaries in a class action brought by authors of works of fiction, including George R.R. Martin. Surveying various ChatGPT summaries of books in the “Game of Thrones” series, Stein found them to be substantially similar “based on the output’s incorporation of such copyrightable elements of Martin’s original work as setting, plot, and characters.”
That reasoning has been questioned by copyright scholar Matthew Sag. The fact that ChatGPT can generate brief summaries of Martin’s books or outlines for potential sequels “falls well short of demonstrating that such outputs by themselves would be regarded by the ordinary observer as substantially similar to a fully realized novel,” Sag argues. If those outputs are deemed sufficiently similar, he suggests, then the same would have to be said of the countless plot summaries found on Wikipedia.
In November 2025, Judge Colleen McMahon of the same court in New York denied Cohere’s motion to dismiss in litigation over its use of AI summaries. McMahon found that the plaintiffs had established a plausible case for infringement by pointing to evidence that Cohere copied entire works when scraping the plaintiffs’ websites. She also found evidence that Cohere produces full or substantial excerpts of plaintiff content in summaries and misattributes hallucinated answers, which could be damaging to plaintiffs’ brands. A key difference between this case and the OpenAI cases is that many of the summaries at issue consisted of output “nearly identical to Publishers’ works,” including one that copied “eight of ten paragraphs from a New Yorker article with very minor alterations.”
Perplexity has yet to file a reply in the two actions against it discussed above, but it has sketched the outlines of a defense in blog posts. The company maintains that it does not crawl or scrape sites, but relies instead on “user-driven agents,” which “only fetch content when a real person requests something specific” and “do not store the information or train with it.” Even so, fetching content likely still entails intermediate copying. Either way, the company contends, its summaries use “publicly reported facts” to create content that is new and useful to millions of users. To the extent that its answers contain excerpts of original works, it argues, those excerpts are fair use because they are limited and non-competitive in nature, and any instances of full or substantial copying are anomalous and allegedly engineered by the plaintiffs in bad faith.
Yet Perplexity undermines its own fair use defense. In one blog post, it highlights the launch of a publisher revenue-sharing program (why bother if there’s no infringement?) and, elsewhere, touts its product as a substitute for visiting original sources. A dismissal motion would likely fail here as it did in Cohere’s case, since evidence of substantial copying in even a few summaries would be enough to send the matter to trial. Where it would go from there points to larger questions about AI summaries and the scope of fair use.
A global mapping of AI summaries, copyright, and fair use
To be clear, an AI summary engine could completely avoid infringement in a narrow set of circumstances: the app fetches only factual information from a source containing protected expression; it does not make a temporary copy of a substantial portion of the work; and it produces a summary that is not substantially similar to the original. Some summaries may avoid copying in the process of fetching, but most likely do not.
Without delving into the intricacies of how AI summary engines draw on original content in scraping for RAG or live fetching, public explanations of how these systems operate tend to reveal some form of intermediate copying. However, US courts have consistently held that where outputs for consumption do not substantially reproduce protected expression, intermediate copying may qualify as fair use (see Vanderhye v. iParadigms LLC; Authors Guild, Inc. v. HathiTrust; Authors Guild v. Google, Inc.). The more pressing question at the moment is whether AI summary engines that frequently reproduce protected content in substantial part can still claim fair use.
Courts consider four factors in determining whether something constitutes fair use when material is drawn from a protected work. The first is whether the defendant’s purpose in using the material is transformative or different, not resulting in a substitute for the original work. Addressing AI summaries in passing, a recent report from the US Copyright Office notes that “the use of RAG is less likely to be transformative where the purpose is to generate outputs that summarize or provide abridged versions of retrieved copyrighted works, such as news articles, as opposed to hyperlinks.” The report cites a body of cases involving the production of news abstracts, including Associated Press v. Meltwater and Nihon v. Comline, as well as story abridgments in cases like Penguin v. Colting and others.
In those cases, courts have generally found that excerpts of original content were either too long and too close to the core of a protected work, or that the defendant’s work lacked the additional commentary or insight needed to be transformative. That reasoning is echoed in Judge Stein’s finding against OpenAI noted above. As Pamela Samuelson and co-authors put it, in these cases, “the potential expressive substitution effect was too significant.”
Yet, as the Copyright Office also notes, “transformativeness is a matter of degree,” and it depends on the “functionality” of the technology at issue and “how it is deployed.” Even where an AI summary does not amount to more than an abridgement of original content, the tool itself (an app like Perplexity or ChatGPT’s web-search function) is more than an index of abridgments of the kind at issue in the Meltwater and Nihon cases. A generative AI app that produces AI summaries stands ready to do, and often does, much more than offer an abridgement; it can facilitate a conversation about the summary that may draw on a range of other sources or do a host of other things. When prompted, it might further explain an idea, event, or figure mentioned in a summary; explain it in language a 10-year-old can understand, or condense it to 300 words; or present the information visually as an infographic.
Viewed more broadly, this functionality points to a different line of authority: Sony Corp. of America v. Universal City Studios (the Betamax case). Although Sony’s VCR technology could facilitate infringement, it had substantial non-infringing uses such as time-shifting and recording authorized content. AI summary apps are similar. They automate several steps in a search: consulting many sources at once to compile an answer; pinpointing a detail or idea in more than one source; or retaining the context of a discussion as you pivot to explore a related topic.
The second and third factors of the fair use analysis concern the nature of the work (expressive versus functional) and “the amount and substantiality of the portion used in relation to the copyrighted work as a whole.” AI overviews that consist of substantial reproductions of one expressive source — an essay or commentary — would cut more strongly against fair use. But once again, many AI-generated answers will draw on more than one source and can prompt a further conversation that would draw on additional sources.
But what about the impact on the market for the plaintiff’s content?
The Supreme Court has held the fourth factor to be the most crucial in the fair use analysis: “the effect of the use upon the potential market for or value of the copyrighted work.” AI summaries and overviews have clearly had an impact on traffic to publisher sites. How much of an impact is a matter of debate. The more salient question is whether AI summaries function largely as substitutes for original content, placing them in direct competition with copyright holders.
If there were evidence that AI summaries frequently consist of substantially similar content (mere abridgements) and that users often stop at an initial response, this would weigh strongly against a finding of fair use. But not all summary apps are the same. If outputs predominantly draw on facts or are not substantially similar to protected works, they would not directly compete with original sites. The fair use case would be stronger if there were also evidence that, a substantial portion of the time, user engagement does not end with a first answer. That pattern would support a view of AI summary apps as conversational and transformative rather than merely substitutive. On that view, such tools would compete not in a market for original content on a given topic but in a distinct market for AI summaries.
Taking a holistic view of the various factors, courts will need to consider the varied ways in which AI summary tools are actually used, along with the technical and design safeguards that can be employed to avoid substantial copying. Courts have also considered the public benefit of the challenged use. Here, the benefit lies in automating several steps in the process of finding answers buried in a single source or distributed across many. The technical breakthrough this involves enables us to readily accomplish tasks that cannot be achieved by consulting search engines or original works one by one. It’s a function of enormous benefit to billions of users, making information accessible in a genuinely novel and vastly more powerful way than before.
AI summaries may have an impact on the market for a plaintiff’s goods by reducing demand for their content. This does not mean that providers of AI summaries compete with publishers for the provision of that content, or that publishers should be entitled to control the market for information available for summary. AI summaries serve a different function: not search indexing or abridging, but rather providing automated assistance in searching for answers to nuanced questions and enabling further dialogue — to enhance understanding. Novel and beneficial uses of this kind are precisely what fair use is meant to protect.