To Preserve US AI Leadership, Congress Must Address Copyright

Joshua Levine / Mar 13, 2025

Joshua Levine is Manager of Technology Policy at the Foundation for American Innovation.

The United States currently boasts many of the world’s leading AI model developers, ranging from established tech giants to emerging startups. But the US risks losing its competitive edge due to its own mistaken policies, which have left the door open for a slew of copyright disputes. With at least 39 active lawsuits targeting AI developers for using copyrighted materials in model training, America’s leadership in AI is under threat.

As with any new technology, disruption can manifest in new social norms, labor market churn, and institutional instability. But this should not dissuade Americans from developing and leveraging a new technology such as AI, particularly given the potential benefits it could unlock. Copyright law is a form of industrial policy, balancing ownership rights that incentivize the creation of new creative works against public access to information that spurs new innovations and uses. This is the primary reason intellectual property and copyright were enshrined in the US Constitution. To preserve that balance in the age of AI, Congress should codify the use of publicly available data for AI training as fair use and kickstart the development of opt-out standards for content creators.

AI development relies on vast datasets to train models capable of diverse applications. Current legal disputes question whether the presence of copyrighted material in training data constitutes infringement. Plaintiffs are asking courts for damages equal to the GDP of some small countries, the scrubbing of public data sets, and even the destruction of AI models. If successful, these lawsuits would create a chilling effect on the US AI industry, rendering many firms insolvent or close to it.

Codifying AI model training as fair use would align with longstanding copyright principles. The doctrine of fair use, enshrined in the Copyright Act of 1976, allows for non-permissive use of copyrighted works, evaluated on a case-by-case basis using a four-part test. Courts have upheld this principle in cases involving new technologies, such as Authors Guild v. Google, which held that digitizing millions of books to enable search functionality was fair use, and Sony Corp. v. Universal City Studios (the "Betamax" case), which found that VCRs had uses that did not violate copyright law and were beneficial to the public. AI model training, which synthesizes data to enable a model to perform various capabilities, fits squarely within this tradition.

Failure to clarify the legality of AI training risks ceding leadership to China, America’s primary competitor in the AI race. Unlike the US, China imposes few restrictions on the use of data for AI training in non-domestic applications. This permissive environment allows Chinese developers to rapidly advance their AI capabilities while aligning them with national strategic goals. America’s innovators face the burden of litigation and the chilling effect of uncertainty, which could incentivize them to relocate to jurisdictions with more favorable regulatory frameworks, such as Japan or Singapore.

The recent emergence of DeepSeek captured headlines and upset markets, but two aspects of the firm's success stand out. First, the Chinese government's extensive support for the firm indicates that the Chinese Communist Party's long-standing strategy of subsidizing national champions will continue. In DeepSeek's case, this includes access to infrastructure and compute, as well as making data collected by government entities available for model training. More noteworthy, however, is the confirmation that DeepSeek's model was trained extensively on sources from the Western internet, particularly "shadow libraries": repositories of materials, including copyrighted works, that are freely available online. American firms have largely avoided using such materials for fear of legal liability, although some have been found accessing similar repositories for training materials.

Critics argue that allowing AI developers to use copyrighted materials without explicit permission undermines creators' rights. To mitigate such concerns, lawmakers should explore technical solutions. In addition to a codified fair use exemption, Congress should direct the National Institute of Standards and Technology to develop an opt-out system that lets copyright holders exclude their work from AI training datasets. Such a tool, modeled on existing standards like robots.txt for web scraping, would give creators agency over how their work is accessed and used without hindering innovation by model developers, and could pave the way for technologies and payment arrangements that reduce transaction costs between the parties.
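To make the robots.txt analogy concrete, the sketch below shows how a training-data pipeline might honor such a directive. The user-agent name "AITrainingBot" and the rule file are illustrative assumptions, not an existing standard; the point is only that the machinery for machine-readable opt-outs already exists in widely deployed form.

```python
# Hypothetical sketch of a robots.txt-style opt-out check for AI training.
# "AITrainingBot" and these rules are assumptions for illustration only.
from urllib.robotparser import RobotFileParser

OPT_OUT_RULES = """\
User-agent: AITrainingBot
Disallow: /gallery/
Disallow: /private/
"""

def may_train_on(url_path: str) -> bool:
    """Return True if the (hypothetical) opt-out rules permit training use."""
    parser = RobotFileParser()
    parser.parse(OPT_OUT_RULES.splitlines())
    return parser.can_fetch("AITrainingBot", url_path)

# Filter candidate pages before they enter the training corpus.
candidates = ["/blog/post1", "/gallery/artwork.png"]
corpus = [p for p in candidates if may_train_on(p)]
print(corpus)  # → ['/blog/post1']
```

A real standard would need to resolve questions this sketch elides, such as how rules propagate to already-scraped copies and how compliance is audited, but the basic mechanism is well understood.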

Trade associations representing large rights holders, individual artists, and some in civil society have rejected such attempts, saying that enabling AI training on works is tantamount to large-scale theft and selling out creators. Two themes underlie this view. The first is that generative AI products will harm the market for human-created work. The second is that generative AI models are plagiarism machines that simply reproduce existing works rather than producing new expressive material.

First, it is too early to tell how generative art will impact markets for contemporary creations, but the limited data available so far challenge this line of reasoning. A 2023 study examined how humans perceive generative AI art: participants were asked to rate works, some made by a human and some made with an AI model, and consistently rated the AI-generated art lower than the works created by human artists. The value of art is difficult to determine and prone to dramatic swings, and this is true of the market for AI art, in some cases even more so than for traditional mediums. As detailed by the Copia Institute, revenues for various forms of media have never been higher, even as the overall amount of content has grown and morphed.

These data points suggest that AI-generated art is more likely to expand human artistry than displace it, whether as its own medium or as an aspect of the creative process. Jobs related to digital art have been growing and are not anticipated to slow down. Christie's recently held its first AI art auction, with bidding activity surpassing initial expectations. Anthropic released data showing that after computational and mathematical tasks, creative workers are using AI the most to augment their day-to-day work. Data shows that more than 60 percent of independent musicians are taking advantage of new tools to assist with production, cover art, and even songwriting. Artists are leveraging AI to share intimate stories and creations.

To the second point, AI model training and the functions of generative AI models are technical processes that simulate learning and iterative development. Since the American founding, copyright has balanced restrictions on the distribution and reproduction of certain works against open access to information, so that the public can learn from and build on existing works, whether for free or for a price. The fair use doctrine protects the non-permissive use of certain copyrighted works, including in the digital realm. Beyond the court cases and precedents noted above, the positive arguments for the transformative potential of generative AI to promote progress in science and the useful arts should be taken seriously.

Such benefits must be understood, however, in the context of preserving an incentive for creation. This is the balance that copyright law in the United States has strived for, and it is time to rethink how to ensure that balance endures. Compulsory licensing regimes, or requirements for affirmative consent from every creator whose work is used in model training, are technically and economically difficult, if not impossible, and would fail to strike the appropriate balance, particularly when compared to the approaches of America's allies and adversaries. Examining how the digital world has changed incentives and the relative elasticities of supply and demand for creative content should lead one to question whether current copyright practices achieve the policy's stated goals.

Codifying fair use for AI training would benefit AI developers and the broader economy. Clear legal guidelines would support investment from model developers and companies building the technologies that underpin AI development, such as advanced semiconductors and cloud infrastructure. Legal clarity for training models can provide assurance to firms working on complementary technologies to continue investing and improving their products, creating positive spillovers for the US economy and its competitive standing vis-à-vis other states.

Such legal clarity can also benefit creatives and artists, whether or not they want their work used for model training, by allowing individual rights holders, collectives, and corporations to decide for themselves the value and availability of their data. Firms are already negotiating such deals, while startups are building platforms and tools to support direct transactions between creators and developers, as well as methods to protect data from scrapers that ignore applicable standards. Enabling a text-and-data-mining (TDM)-style exception for the narrow use of training AI models does nothing to impede rights holders from being paid for their content. Rather, it creates an opportunity for new technical and legal solutions that can support the development of technology vital to the economic, strategic, and cultural clout of the US while supporting, rather than replacing, creative talent and expression.

Congress has a history of adapting copyright law to accommodate technological change, from the VCR to the Internet. Today's lawmakers face a similar moment. Clarifying that AI model training constitutes fair use and accelerating the development of opt-out standards would advance the foundational goals of copyright: incentivizing authorship, promoting progress in science and the useful arts, and supporting the public through access to information. The future of US technological and geopolitical strength depends on it.
