Discussing the Copyrightability of Generative AI Outputs
Rashmi Bagri / Feb 12, 2025
Elise Racine & The Bigger Picture / Better Images of AI / Web of Influence I / CC-BY 4.0
In the two years since the launch of ChatGPT, AI tools have faced an avalanche of legal challenges, most of them accusing the companies behind the tools of copyright infringement. Plaintiffs ranging from authors and visual artists to news organizations and music publishers have alleged that these AI models misused their work to train algorithms and generate outputs.
At the heart of these legal disputes lie two major copyright issues: whether the unauthorized use of copyrighted material scraped from the internet, without permission, compensation, or attribution to original creators, amounts to copyright violation, and whether the output generated by tools trained on this copyrighted material deserves copyright protection. Major players in the AI industry – including OpenAI, Stability AI, Anthropic, Microsoft, Google, and Meta – currently face accusations of exploiting creative works and violating intellectual property rights on an unprecedented scale. Globally, courts and policymakers are grappling with crucial aspects of the AI copyright debate, potentially reassessing the foundational principles of copyright law in light of these dilemmas.
Complex Copyright Issues
The copyright issues surrounding artificial intelligence are twofold yet intertwined, and each raises complex legal and ethical questions. The first and most contentious issue pertains to the input – the vast datasets used to train these AI tools are often scraped from the internet, encompassing everything from books and articles to artwork, music, and code – and much of this training data comes from copyrighted material. The second issue revolves around the output – the work generated by these AI tools. Can these outputs, which users can tailor based on prompts, qualify for copyright protection?
Some argue that granting copyright protection to AI-generated works is an audacious leap, given that the tools themselves rely heavily on replication and their output is often derivative, if not outright duplicative, of the copyrighted works used to train these models. Others demand copyright protection for AI-generated works as a way to “democratize” the creative process.
However, one cannot make a determination on the second issue without comprehending the first. AI models cannot exist without training data, most of which is protected by copyright. This fact is undisputed: companies like OpenAI have openly admitted it, whistleblowers have corroborated it, and members of the public can often determine whether their written or visual work has been used to train these models. Numerous online tools and platforms, such as the website ‘Have I Been Trained,’ reverse image search engines, and text detection and matching tools, can indicate whether a particular piece of work appears in a model’s training data.
Additionally, generative AI tools have been known to regurgitate exact replicas of content they were trained on, sometimes even producing the original watermarks. Therefore, anything these tools create is, at best, derivative and, at worst, a reproduction, and certainly not transformative, regardless of the level or clarity of human involvement in prompting. This is why the argument that AI-generated outputs should receive copyright protection does not hold water.
Still, courts have used the extent of human prompting as a determining factor in some rulings, creating confusion around the copyrightability of AI-generated works. For instance, the UK allows copyright protection for AI-generated outputs where the human user makes a significant contribution. Similarly, China has granted copyright in cases where a user prompt exhibits sufficient originality, setting a precedent that human input can justify copyright protection for AI-generated works.
Given that research, AI companies, and whistleblowers have indicated that the models rely on copyrighted work and do not independently create new content, granting copyright protection based on human prompting is questionable. In contrast to the decisions in the UK and China, other judgments, like Thaler v. Perlmutter and the USPTO’s decision concerning Zarya of the Dawn, reinforce that human authorship is essential for copyright protection.
Copyright is granted to incentivize creativity, thus requiring human intent and authorship, elements AI-generated outputs lack (as they are mere syntheses of their training data). While human users can guide the model, they do not influence its core mechanism. Another source of confusion stems from the argument that since computer-generated works have been given copyright protection in most jurisdictions, AI-generated works should qualify, too. However, the underlying distinction is that computers are merely tools that amplify human creativity without introducing independent elements. Conversely, AI creates outputs by remixing existing works into new forms, making its outputs inherently derivative of its input and giving human users little control over the final product.
Proposed Solutions
Since AI tools pilfer data from across the internet without filtering out copyrighted content, copyright scholars and policymakers globally are working to protect the interests of creators. In the UK, a new copyright directive allows copyright holders to opt out of having their works used to train AI models. This policy, similar to the EU’s opt-out mechanism under its Copyright Directive, makes creators responsible for protecting their work instead of holding AI companies accountable.
AI companies have responded to copyright concerns by attempting to filter copyrighted data using methods like deduplication, content filtering, and metadata exclusions, but these efforts have been insufficient given the vast volume and diversity of training data. Datasets like LAION-5B contain billions of images scraped without robust copyright checks. Even similarity detection tools such as Google reverse image search, TinEye, and MOSS, which are designed to reduce obvious copyright violations by flagging near-exact matches of copyrighted content, often fail to detect derivative works, imitated styles, and paraphrased content. OpenAI has even acknowledged that creating an AI model without copyrighted data is nearly impossible because effective filtering is extremely difficult. And since there can be no generative AI models without copyrighted content, their outputs remain derivative rather than creative or innovative.
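The limits of this kind of filtering can be illustrated with a minimal sketch. The example below is not any company’s actual pipeline; it simply shows, under the assumption that deduplication relies on normalized exact-match hashing, why a verbatim copy is caught while a light paraphrase of the same content slips through:

```python
import hashlib

def fingerprint(text: str) -> str:
    # Normalize case and whitespace, then hash. This catches exact or
    # near-exact duplicates only; it has no notion of meaning or style.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# A (hypothetical) set of fingerprints of known copyrighted passages.
copyright_filter = {fingerprint("The quick brown fox jumps over the lazy dog.")}

def is_flagged(candidate: str) -> bool:
    return fingerprint(candidate) in copyright_filter

# A verbatim copy is flagged, even with different spacing or casing...
print(is_flagged("the quick brown fox  jumps over the lazy dog."))  # True
# ...but a simple paraphrase with the same substance is not.
print(is_flagged("A quick brown fox leaps over a lazy dog."))  # False
```

Similarity detectors extend this idea with perceptual hashes or embeddings, but the same gap remains: the closer an output gets to a restyled or paraphrased derivative, the harder it is for automated checks to flag it.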
The solutions proposed in the UK and EU burden individual copyright holders, while Japan’s proposed rules allow AI developers to train on copyrighted material without consent, prioritizing AI advancement. China, on the other hand, mandates different licensing agreements for training datasets in its draft copyright directive. The US and India are leaving these disputes to the courts, focusing on fair use and transformative purpose.
Both countries consider human involvement and potential economic harm in copyright violation cases, shifting the burden of proof to the plaintiff. Further complicating the subject is the hypocrisy embedded in the policies of these AI companies. Research highlights how the acceptable use policies of these companies explicitly prohibit using their AI-generated outputs to train other AI models. For instance, OpenAI has raised concerns about the Chinese AI model DeepSeek allegedly using ChatGPT’s outputs to train its own model without permission. This stance is deeply ironic: while these companies freely exploit the works of others to train their systems, they simultaneously assert ownership over their outputs and restrict others from using their content for the same purpose.
AI companies defend their large-scale copyright infringement by citing fair use and invoking exceptions such as text and data mining for research purposes. However, both defenses require a clearly defined public interest and no harm to the market for the original work. AI models, as commercial products, do not prioritize public interest or respect copyright holders’ rights. They harm the market for original works and disrupt the creative ecosystem by using others’ labor without permission or credit. Despite claims of “democratizing creativity,” this approach exploits original creators and leaves them economically vulnerable.
AI and the Future of Human Creativity
Considering these factors, granting copyright protection to AI-generated outputs could have severe implications for the creative economy. AI does not create original content but rather parrots it, lacking intent or originality. This not only diminishes the value of human creativity but also establishes a framework where works derived from existing copyrighted content may then receive new copyright protections. There is already a risk of a perpetual cycle whereby AI-generated works are subsequently used in future training datasets that then contribute to new AI-generated content. This AI slop threatens to crowd out genuine creativity, even absent copyright protection. And it’s no surprise that the tech companies that have created these AI models are also the same ones that allow AI-generated content to run rampant with few checks.
Extending copyright to such unoriginal content threatens the fundamental purpose of copyright law, which is to protect and promote human creative and artistic expression. Human creativity is, as the court put it in Thaler v. Perlmutter, “the sine qua non at the core of copyrightability...” Thus, genuine creativity is safeguarded not by extending copyright to AI-generated works but by upholding the rights of original creators.