Perspective

In Copyright ‘Wins’ for Anthropic and Meta, Judges Leave Ample Room for Future Defeats

Eryk Salvaggio / Jun 26, 2025

Eryk Salvaggio is a fellow at Tech Policy Press.

Earlier this week, a federal judge in San Francisco ruled that Anthropic did not violate copyright law by using books it had purchased to train its Claude AI models, despite lacking the permission of individual authors. The summary judgment is a courtroom win for the AI industry in the narrow area of training models as fair use, but it is specific to a subset of works that Anthropic purchased, and hinges on a particular training method. US District Judge William Alsup determined that under such conditions, training qualifies as an "extremely transformative" example of fair use under federal law.

In another ruling issued Wednesday in a case against Meta, US District Court Judge Vince Chhabria largely agreed that AI training is transformative enough to merit fair use protection, but pointed to larger issues of the impact on the marketplace for writing — noting that the lawyers in the case failed to bring up specific frameworks that would allow him to consider it.

If you squint, these rulings seem like a win for AI firms — and could set precedents as courts grapple with how copyright law applies to artificial intelligence. However, the ruling in the Anthropic case is extremely narrow: it applies only to books that Anthropic bought and digitized, and the company faces separate allegations that it used pirated digital copies of millions of books. For the issues at stake over the vast scraping of unlicensed data, the case is anything but a victory for the industry — and, while part of the same case, that outcome will be determined in December.

The case hinged on Anthropic's argument that its collection of books, which the company purchased and scanned for a "research library," was what it used for training Claude. The authors — Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson — argued that Anthropic's use of these books in a digitized research library violated their digital rights. In the decision, Alsup noted that "all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies."

This refers specifically to works Anthropic purchased and trained on, not the millions of pirated books alleged to be in its dataset. Crucially, the authors of this case did not assert that "any LLM outputs infringing upon their works ever reached users" of Anthropic's Claude model. In other words, the case was less about what the LLMs produce and more about how they were trained. Furthermore, the authors brought the suit on behalf of a larger class, centering not on direct harms they themselves experienced but on the damage to all authors.

In their initial filing, the plaintiffs wrote that "LLMs allow anyone to generate — automatically and freely (or very cheaply) — texts that writers would otherwise be paid to write and sell." While a compelling summary of the threat LLMs pose to human authorship and thought, it was too broad a harm to litigate in a copyright infringement lawsuit, which typically hinges on specifics. In this case, the specifics weren't in the plaintiffs' favor. According to this week’s ruling, all of the authors' books were purchased by Anthropic, in a process described by the court:

Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text (including front and back cover scans for softcover books). Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It acquired copies of millions of books, including of all works at issue for all Authors (emphasis added).

This is a crucial point in this case: Anthropic purchased at least some of these books. It just didn’t compensate the authors or seek their permission. But this is the point of fair use to begin with: it removes the burden of requesting permission for uses that impinge neither on the rights of readers, including research, nor on the rights of publishers, for whom copyright was originally intended. This may not seem fair to many authors, who reasonably conclude that their work is feeding an automated competitor without compensation or their explicit consent. But it reflects a long-standing consolidation of copyrights in the publishing industry that favors those who print books over those who write them — not to mention the entirety of the used book market, in which authors see their work recirculated without payment.

That means that the simple act of purchasing these books to scan them — and, apparently, destroying the original copies — stands in stark (and legal) contrast to Anthropic’s alleged behavior within the vast data-grab of the AI industry and its use of pirated and unlicensed data. That conduct isn’t being ignored here; it will be assessed in a separate trial this winter.

I have argued that copyright cases, such as this one, would be an unsuccessful long-term strategy for curtailing the AI industry's infringement of what are, fundamentally, data rights. But the US has limited imagination for laws related to rights over our digital traces, leaving most policy to be determined through dense user agreements that give our data away under, for example, "a non-exclusive, transferable, sublicensable, royalty-free, worldwide license to: host, use, distribute, modify, run, copy, publicly perform or display, translate, and create derivative works of any information, data, and other content made available by you or on your behalf."

It doesn’t have to be this way. But given the reluctance to transform policies around copyright in the social media era, many authors are left to seek remedies or protection through copyright law rather than (or alongside) demands for policy change. Unfortunately, copyright is not the ideal mechanism for the transformation of the digital rights landscape.

This tension between Alsup and Chhabria is evident in the two rulings. Alsup likens AI’s risk to the market to teaching schoolchildren how to write well, which he asserts “is not the kind of competitive or creative displacement that concerns the Copyright Act.” Alsup’s allusion to schoolchildren learning might suggest he was anthropomorphizing these models, but his actual finding is quite concrete: he does not ask “does the AI model have a right to learn,” but “did Anthropic have a right to make a copy in the form of a large language model?”

Chhabria challenges the narrowness of this analogy directly:

Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made. ... when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.

With AI, perhaps the precedent is not being set — but at least the cement is getting mixed.

Models are transformative, even when outputs aren’t

Training a model on a collection of texts is reasonably transformative: a model is a measurement of statistical likelihoods of nearest neighbors, whether those neighbors are words, patterns of pixels, or musical notes. While the data stored in a model is unrecognizable compared to its sources, this reliance on correlation can create instances where copyright-protected material is nonetheless regenerated for a user. At that point, as this case has made clear, a different set of considerations applies to the model manufacturers, who may then legally be considered publishers of unlicensed content.

Judge Alsup notes that "authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public. If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop" (page 28).

This shifts the legal strategy for aggrieved authors toward examples of generated outputs that are infringing, rather than the inputs involved in the training process. As such, the precedent points to a situation in which training is fair use, but generation is publishing. Cases such as the one brought by The New York Times hinge on this publishing aspect of generative AI: that it conflicts with the journalistic business model by circumventing paywalls. That case, still underway, focuses on evidence of significant elements of Times articles appearing in generated text produced by OpenAI’s models.

In both cases, the judges signaled an openness to broader arguments, while being forced to consider the specificities of what was in front of them. Plaintiffs in the Meta case largely lost on this point, with Judge Chhabria writing:

Courts can’t decide cases based on general understandings. They must decide cases based on the evidence presented by the parties. ... As for the potentially winning argument—that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution—the plaintiffs barely give this issue lip service.

Lawyers around the United States are likely listening.

Process matters

From the Anthropic case, it seems that even if training an AI model is deemed fair use, how the model is trained still matters. Training is not protected as fair use by default, but under precise conditions. This is what makes the Anthropic case a potentially Pyrrhic victory. The court has set a precedent that hinges on the idea that content is permissible training material if it was legally obtained: companies must lawfully acquire a copy. That copy could come from a secondhand bookstore, or be bought by the lot for a nickel each, and the author may or may not be directly compensated. In any case, the precedent suggests that AI companies must legally obtain a copy of the books they train on.

Oddly, once Anthropic purchased a copy, it digitized the book and destroyed the original. The court found that this aided Anthropic’s argument, since such destruction ensured the book was not duplicated. This was partially in response to the authors’ claims that the digitization of their books was unauthorized. It’s unclear whether AI companies will be required to destroy physical copies of books they train on, or simply remove them from their own inventory, or ultimately how important this oddity is to ensuring fair use is upheld. But it points to a bizarre situation in which physical books are shredded as part of the training process for purely legal reasons.

Interestingly, Judge Alsup treated the creation of digital copies of these books, the process by which printed matter became training data, as on par with creating a “more convenient space-saving and searchable digital cop[y].” This covered both the books used as data and their digital copies, which were used and reused to train various LLMs. Purchasing books from publishers to treat as we like falls under the first-sale exception to copyright, and is not unique to the age of AI. The case seems to settle on the idea that using a book for LLM training can be justified by a single purchase of that book.

Ultimately, Anthropic purchased the books at the heart of this narrow finding. The court found that it had the legal right to make digital copies of those books; those digital copies could then be compressed into a model, just as I could cut up similarly obtained books to make a collage. This case has determined that an LLM trained on books purchased by the company, so long as the LLM does not show evidence of reproducing that text, is a protected (fair) use. Anthropic was also able to show the court that it had implemented a filter designed to prevent any original training data (i.e., passages from the authors’ books) from coming through, a factor that ultimately helped the company. All of this is very specific to Anthropic’s methods, and not a blanket approval of all forms of AI training — especially not training from illegally obtained sources.

Understandably, this may feel like a setback to authors who perceive their language as shaping the outcome of every text an LLM generates in response to a user query. The arrangement of words is what authors do; there may have been some lingering hope that authors would somehow benefit as the source of the statistical patterns derived from the language they craft. This case suggests otherwise.

But importantly, the ruling maintains a framework under which scraping the Web without permission still violates the law. While the court found that using a book for LLM training is ultimately transformative, stealing a book is still stealing a book. Judge Alsup writes that you can't just "steal a work you could otherwise buy (a book, millions of books) so long as you at least loosely intend to make further copies for a purportedly transformative use (writing a book review with excerpts, training LLMs, etc.), without any accountability." That is to say, fair use has never justified theft; it only sets limits on what you can do with a work once you’ve bought it.

The case is, therefore, a very narrow win for the AI industry: for now, it suggests that companies must purchase (or obtain the rights to) the books they train on, given that the market for these books’ authors clearly includes selling a copy for LLM training. The presence or absence of a single book will likely have little impact on the outcome of an LLM. However, the ruling points to headaches for any company that has trained its model on unlicensed, illegally obtained text and possibly images.

How such precedent might apply to text with little resale value is up for debate. In December, the case will proceed to focus on pirated works scraped into the Books3 dataset. None of the precedent established in this ruling seems to apply there, as much of the ruling suggests it was the purchase of the material that justified training as fair use.

New precedent, new questions

This opens up a new set of questions. For one, the judges in both cases are at odds. In the Meta case, Judge Chhabria seemed to agree on the transformative nature of AI, framing the question of illegal use in similar terms to Judge Alsup:

Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal.

Although the devil is in the details, in most cases the answer will likely be yes.

Chhabria nonetheless handed the win to Meta, pointing to how the case was presented by the plaintiffs’ lawyers. But Chhabria frames the stakes more broadly, arguing that while most copyright cases are determined through narrow specificities, what is at stake in these copyright claims is “preserving the incentive for human beings to create artistic and scientific works.” This, he suggests, is the question the plaintiffs in the Meta case should have raised: the impact of AI on the market for writing in general.

This puts Chhabria and Alsup in conflict, and Chhabria addresses Alsup explicitly, noting that he “focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on.”

It’s unclear how other judges may decide on these matters and how broadly such precedent may be applied. We may ask about situations in which material is not for sale, per se: are publicly accessible Reddit comments the property of Reddit, or the user? What is the mechanism that might assure their authors or publishers are compensated, or that the material is obtained legally?

Might this precedent suggest that individual illustrators also have a right to sell digital copies of their images to AI companies? Or, by sharing them online, have we “given them away,” in contrast to the publishers of books, whose contents are not shared wholesale online?

Unfortunately, with Getty dropping key claims in one of the most-watched US cases on the subject, we are unlikely to know anytime soon. But extending a similar precedent to images would place enormous burdens on companies that have relied on billions of scraped images to train their models. It could suggest a key distinction will emerge between the rights we have over what we share online and what we share in other forms of media. If so, this may create a true crisis for the digital commons.

We have yet to learn what the court will decide even in the Anthropic case. In these two wins for the AI industry, judges have left ample room for future defeats. But for now, any claim that "training is fair use" is too broad a statement. The details of how we train AI clearly make a difference. Given the case-by-case nature of fair use findings more broadly, a legal consensus in the short term is unlikely.
