Thomson Reuters v. ROSS Provides Insight into How Courts May Evaluate Fair Use Defense for AI Training Data

Ariel Soiffer, Arianna Evers, Louis Tompros / Mar 4, 2025

A Thomson Reuters Building in Canary Wharf, London. Shutterstock

“Only one thing is impossible for God: To find any sense in any copyright law on the planet,” wrote Mark Twain. With the continued progression of generative AI and the related large-scale training of large language models (LLMs) on content allegedly protected by copyright, content owners have initiated numerous lawsuits alleging copyright infringement, while many AI providers rely on the defense of fair use.

These cases are largely in the early stages without substantive rulings about copyright infringement. Recently, however, a federal district court in Delaware granted partial summary judgment for Thomson Reuters in a case against ROSS Intelligence (ROSS), a firm that says it “builds AI-driven products to augment lawyers' cognitive abilities.” The court found that ROSS had infringed certain of Thomson Reuters’s copyrighted material and that ROSS could not benefit from a fair use defense. In doing so, the court noted that “[i]t is undisputed that ROSS’s AI is not generative AI (AI that writes content itself)” and that the case concerns only non-generative AI. Nevertheless, the decision is one of the first to attempt to make sense of copyright law in the AI context and may therefore be a useful guidepost for future rulings.

By way of brief factual background, ROSS obtained access to Thomson Reuters’s Westlaw headnotes—summaries of key points of law in a document—via “bulk memos” that ROSS obtained from a third party that had created those memos using Westlaw headnotes. The court found that Thomson Reuters has a valid copyright to the material, ROSS copied protected elements of Westlaw’s headnotes, and some headnotes and memos were substantially similar.

To show copyright infringement, the moving party must demonstrate that (1) it owned a valid copyright and (2) the defendant copied protectable elements of the copyrighted work. The second element requires a showing that (i) the defendant actually copied the copyrighted work and (ii) the defendant’s work is substantially similar to the copyrighted work. See Dam Things from Den. v. Russ Berrie & Co., 290 F.3d 548, 561–62 (3d Cir. 2002). Actual copying means “the defendant did, in fact, use the copyrighted work in creating his own.” Tanksley v. Daniels, 902 F.3d 165, 173 (3d Cir. 2018). Substantial similarity asks whether “the later work materially appropriates the copyrighted work.” Id. at 173. Summary judgment may be appropriate when “no reasonable jury could find” otherwise. Id. at 171. The court found that Thomson Reuters has copyright registration for Westlaw’s content, and there is no genuine dispute that the headnotes meet the minimal requirement for originality. The court found actual copying of a portion of the headnotes by assessing potential access and probative similarity. Moreover, the court found that, for some of the headnotes, no reasonable jury could find that the headnotes and the bulk memos are not substantially similar.

Following the copyright infringement finding, the court rejected ROSS’s fair use defense. The test for whether a use is a “fair use” is based on four factors: (1) the purpose and character of the use, including commercial use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the work that was taken; and (4) the effect on the market for the work.

The court found that ROSS’s use was commercial and non-transformative, with ROSS admitting to commercial use and the court finding that ROSS’s use does not have a “further purpose or different character.” Applying the recent Supreme Court case Warhol v. Goldsmith, the court looked at the broad purpose and character of ROSS’s use, which was to use Westlaw’s headnotes as an index to create a legal research tool that competes with Westlaw. The court acknowledged that ROSS’s model was not generative AI that creates new content (rather, it returned relevant judicial opinions that already existed), leaving open the question of whether a generative AI model that provides new material based on training data would be sufficiently transformative. Further, the court held that the intermediate copying that ROSS performed was not necessary, distinguishing this case from cases involving intermediate copying of computer source code. Earlier cases came to different outcomes because copying was necessary for interoperability or to reverse engineer unprotected functional elements within a program, but in this case, it was not necessary for ROSS to make copies. As a result, the first factor leaned against fair use.

However, the second factor (the nature of the work) favored fair use, because the headnotes were found to be only minimally creative, although the court noted that this factor is rarely important. As to the third factor (the amount and substantiality of the work), the court also found that it favored fair use, because ROSS never made the headnotes publicly available and used them only to facilitate its AI.

Finally, the fourth factor (the effect on the market for the original work), which the court noted was the most important factor, favored Thomson Reuters. Relevant to the analysis was the original market, which the judge identified as legal research platforms, and at least one potential derivative market, which the judge identified as data to train legal AIs. The court concluded that ROSS created a product that could substitute for Westlaw’s product and that ROSS’s activities had a potential effect on a market for AI training data, even absent a showing that Thomson Reuters had used the data to train its own legal search tools. The potential for content owners to establish value in the data used to train AI models could make it more difficult for generative AI model developers to succeed in their fair use defenses.

As a result of this analysis, the court found that the fair use defense failed for the Thomson Reuters materials protected by copyright. The four-factor fair use test necessitates balancing different factors that will vary in any given case. At this point in time, there is nothing close to a consensus on how the fair use defense will play out for AI model developers. However, this opinion provides an early guidepost on the road to finding sense in at least one part of copyright law, contra Mark Twain. It may prove significant for the many generative AI cases that are to come.

Although it does not involve generative AI, this case is the first to rule against the fair-use defense for an AI model. This decision will likely be frequently cited in future cases.

First-year associate Isaiah Chatman contributed to this article.

Authors

Ariel Soiffer
Ariel Soiffer is a Partner at WilmerHale, where his practice focuses on technology-related transactions and advising clients on technology-related matters. Mr. Soiffer draws on his prior business experience as a management consultant to provide practical solutions to legal and business challenges th...
Arianna Evers
Arianna Evers is a Partner at WilmerHale. Evers helps clients navigate privacy, cybersecurity, and artificial intelligence (AI)-related challenges in an increasingly complex and fluid legal environment. Ms. Evers represents clients in high-stakes enforcement actions with regulators, including the Fe...
Louis Tompros
Louis Tompros, an experienced first-chair trial and appellate litigator, is a Partner at WilmerHale. He makes the fast-moving and legally complicated field of intellectual property understandable for juries and judges. He has handled the most challenging patent, trademark and copyright matters at tr...
