Generative AI and Copyright Issues Globally: ANI Media v OpenAI
Aklovya Panwar / Jan 8, 2025
Courts around the world are entertaining lawsuits related to copyright and AI. Two categories of litigation are prominent: those related to the copyright violation caused by generative AI and the granting of copyright to AI-generated work. Examples of the latter are rare and not likely to succeed in most cases unless copyright jurisprudence undergoes a seismic shift in multiple countries. The main point of concern is the issues originating from the first category of cases.
Recently, in a first in India, the news agency ANI filed a lawsuit against OpenAI for alleged copyright violations. ANI says OpenAI used its news content to train ChatGPT without permission. ANI also claims that ChatGPT falsely attributed fabricated news stories to the agency, damaging its reputation and potentially spreading misinformation. ANI seeks damages and a permanent injunction against OpenAI for using the agency’s content. ANI states that the case concerns not just copyright infringement but also the protection of the public against misinformation. It further argues that exploiting commercially available information in the public domain without a licensing agreement is not permissible in this context.
OpenAI, in its defense, stated that using publicly available data for training LLMs comes under the fair use doctrine. The California company claims it did not access data or information illegitimately and explicitly avoided accessing content behind paywalls. It says it is transforming the data in such a way that it does not violate ANI’s copyright. Further, OpenAI contends that it blocked ANI’s domain following a legal notice, which means it is no longer using ANI’s data for training purposes, and that this shows its commitment to copyright compliance. Copyright law protects the expression of ideas but not ‘ideas’ or ‘facts,’ the company says, and all it has done is gather data, information, and facts from diverse sources, which were not reproduced verbatim but transformed in a manner distinct from how ANI expresses them. It also argues that the court lacks jurisdiction because its operations are based outside India. On the interim injunction, it submitted that no injunction has been granted against it in similar cases in the US, Canada, and Germany.
The Delhi High Court has framed four issues for adjudication - (i) Whether storing copyrighted data for training ChatGPT amounts to copyright infringement; (ii) Whether generating user responses using copyrighted data constitutes infringement; (iii) Whether this use falls under ‘fair use’ as per Section 52 of the Copyright Act; and (iv) Whether Indian courts have jurisdiction over this matter, given that OpenAI’s servers are based abroad.
Insights from abroad
These issues are broadly similar to those raised in ongoing cases around the world and point to a common pattern. Exploring those cases yields some insights that may be relevant to the ANI case.
In the US, multiple cases are underway on the same issue. The first to be highlighted is The New York Times v OpenAI. In 2023, the news company filed a lawsuit in the Southern District of New York against OpenAI and Microsoft for using its articles unlawfully to train AI models. It contends that this use violates its copyright and undermines its business model by allowing users to bypass its paywall. According to the lawsuit, the Times’ earnings and journalistic integrity are seriously threatened by OpenAI’s capacity to produce outputs that closely resemble or imitate its material. Here, too, the defense taken is that of the fair use doctrine. Another pertinent issue that arose was the removal of the Copyright Management Information (CMI) that encompasses the copyright notice, title and other identifying information, terms and conditions of use, and identifying numbers or symbols referring to the CMI. The Times claimed that the removal of CMI was done intentionally to hide the copyright infringement. To support its fair use defense, OpenAI recently sought to compel the production of documents concerning (1) the Times’s use of nonparties’ generative artificial intelligence (“Gen AI”) tools; (2) the Times’s creation and use of its own Gen AI products; and (3) the Times’s position regarding Gen AI. But Magistrate Judge Ona Wang denied OpenAI’s motion, ruling that “This case is about whether Defendant trained their LLMs using Plaintiff’s copyrighted material, and whether that use constitutes copyright infringement. (ECF 170, ¶¶ 158-168). It is not a referendum on the benefits of Gen AI, on Plaintiff’s business practices, or about whether any of Plaintiff’s employees use Gen AI at work. The broad scope of document production sought here is simply not relevant to Defendant’s purported fair use defense.”
Another case is Raw Story Media v. OpenAI. The case is before the Southern District of New York. The plaintiffs alleged that OpenAI violated the Digital Millennium Copyright Act (DMCA) by removing CMI from their articles before using them to train ChatGPT. The court dismissed the lawsuit filed by Raw Story Media against OpenAI, ruling that the plaintiffs lacked standing as they did not demonstrate tangible injury from the alleged misuse of their copyrighted content in training ChatGPT. Judge Colleen McMahon emphasized that the plaintiffs failed to show actual harm or instances where ChatGPT reproduced their copyrighted material without attribution. The plaintiffs have the option to amend their complaint to address the court's concerns. In this case, the court emphasized the importance of plaintiffs demonstrating “concrete harm” to establish standing. The ruling suggests that without specific examples of direct or imminent infringement, claims may not succeed.
The next case is Andersen v. Stability AI before the Northern District of California, San Francisco Division, in which several visual artists claim that their copyrighted works were exploited without their consent. The AI models that underpin image generators such as Stable Diffusion and Midjourney were allegedly trained using the works. The plaintiffs bring claims of direct copyright infringement and also allege unjust enrichment. They argue that copyright infringement has occurred since the AI-generated outputs closely resemble their original works. In its defense, Stability AI argues that the plaintiffs have not sufficiently demonstrated the use of their specific works in training the model and that they also have not shown how the generated outputs directly infringe on those works. The court allowed claims of substantial similarity to proceed but dismissed generalized allegations of infringement. In a recent development, the plaintiffs filed a Second Amended Complaint, and all four defendants (Stability AI, Runway, Midjourney, and DeviantArt) have filed their responses.
In Kadrey v. Meta Platforms, a case before the Northern District of California, the plaintiff filed a lawsuit against Meta for copyright infringement. The allegations relate to AI-generated content that closely resembles original works by author Richard Kadrey. Kadrey claims that Meta’s AI systems generated outputs replicating his original literary works without permission. He argues that these outputs infringe on his copyrights and misrepresent his authorship, harming his reputation as a creator. Meta contends that the AI-generated outputs do not constitute direct copies of Kadrey’s works and, thus, do not infringe copyright. It also argues that any resemblance is coincidental or, alternatively, that the use falls under fair use due to the transformative aspects of the generated content. The case is still in its early stages, with ongoing motions regarding the sufficiency of Kadrey’s claims. The outcome will depend on how courts interpret the relationship between AI-generated content and existing copyright protections, and in particular on whether the generated content is sufficiently transformative to constitute fair use. Recently, the plaintiff has made further claims against Meta along the lines of the crime-fraud doctrine. The plaintiff is therefore seeking leave to file a Third Amended Complaint to reassert DMCA CMI claims and to add claims under the California Comprehensive Computer Data Access and Fraud Act (CDAFA). Meta opposes these claims as baseless.
In the UK, Getty Images has sued Stability AI before the High Court in Getty Images v. Stability AI. Getty claims that millions of its copyrighted photos were illegally collected by Stability AI to train its Stable Diffusion AI model. Getty makes the case that Stable Diffusion’s outputs do more than just copy Getty’s artwork: they bear its trademarks as well. According to Getty, the AI-generated photos can be traced to their original sources, including cases where approximations of Getty’s watermarks appear on generated images. Stability AI sought the dismissal of parts of Getty’s claims. It argues that the training and development activities occurred outside the UK and therefore fall outside UK jurisdiction for copyright claims. It also argues that parts of the UK’s Copyright, Designs and Patents Act 1988 (CDPA) apply only to tangible articles and not to software or intangible data. Stability AI maintains that it has not infringed any copyrights as it did not directly copy any specific works. The High Court dismissed Stability AI’s request for summary judgment, which means Getty’s claims will proceed to trial. The court noted that the claims have substantial merit, particularly on the question of where the training activities took place and whether they occurred in the UK.
In Canada, the case to watch is CanLII v. Caseway AI, which is before the Supreme Court of British Columbia. CanLII (the Canadian Legal Information Institute) filed a lawsuit against Caseway AI, alleging that it infringed on CanLII’s copyright by using its legal content without permission for training its AI models. CanLII alleges that Caseway AI unlawfully used its database of legal information without obtaining necessary licenses or permissions, thereby violating its Terms of Use, which prohibit bulk downloads. CanLII argues that this unauthorized use constitutes copyright infringement, because it reproduces the copied work and creates a derivative work based on it, and that it undermines CanLII’s business model. Caseway AI argues that its use falls under fair dealing provisions or other exceptions within Canadian copyright law. It also contends that its use of CanLII’s data is transformative and does not harm CanLII’s market. This case is still developing. It has potential implications for how open legal databases protect their content when emerging technologies like AI increase the risk of unauthorized use.
What is relevant for ANI?
Looking at these cases, some aspects seem pertinent to the ANI case. The first is the high burden of proof on the plaintiff: courts emphasize the need for plaintiffs to show concrete harm, such as direct market loss or reputational damage. For instance, Raw Story Media v. OpenAI failed due to insufficient demonstration of harm caused by the alleged removal of CMI. Similarly, in the Kadrey and Getty Images cases, the courts required evidence of material harm to proceed. Indian courts, like their international counterparts, may require ANI to demonstrate clear harm, including financial loss (for example, disruptions to its paywall revenue model or reduced subscriptions) or reputational damage (for example, misinformation attributed to ANI). ANI would have a high burden of proof to establish how the alleged misuse directly affects its business or credibility.
The second is the interpretation of the fair dealing doctrine under Indian copyright law, which will involve determining whether the use of ANI’s content was transformative. Across jurisdictions, courts assess whether such use adds value, changes the purpose, or creates a non-substitutive work. In The New York Times v. OpenAI, the defendants argued fair use by claiming their AI transformed the content; however, the Times alleged that OpenAI’s outputs replicated its articles and threatened its business model. Indian courts may analyze whether AI’s use of ANI’s factual news transforms it meaningfully or merely substitutes for it, as transformation is harder to prove for factual content than for creative works. While greater leniency in allowing fair use for factual content can be argued, this is not absolute, especially where the use is economically detrimental. The same issue can be seen in CanLII v. Caseway AI (Canada), where Caseway AI argued that scraping CanLII’s factual legal content falls under “fair dealing” for research purposes. It is important to note, however, that CanLII’s material was not copyright-protected because court documents are public records.
ANI’s news, while factual, represents an economically valuable product of journalistic effort. Indian courts may view AI scraping as undermining its commercial interest. Even factual works deserve protection under Indian copyright law, as seen in Eastern Book Company v. D.B. Modak (2008), where "substantial effort" was deemed protectable as the court held that for copyright protection, the material does not require novelty or invention, but minimal creativity.
Third is the degree of copying, where courts consider the amount and substantiality of the material used. In Getty Images v. Stability AI (UK), the use of substantial portions of a copyrighted database without authorization allowed claims to proceed. ANI must establish that OpenAI scraped large volumes of its news database, constituting substantial copying with a direct market impact. A failure to prove the materiality of copying could weaken ANI’s case, as seen in the Raw Story Media case, where lack of evidence led to dismissal.
Finally, these cases demonstrate the growing importance of licensing frameworks to mitigate disputes between content creators and AI developers. Robust future agreements could set parameters for permissible AI training uses. Indian courts may also consider market-specific impacts, such as disruption to ANI's subscription model, as a key factor in shaping their verdict.
ANI will need to provide evidence that its material was not only accessed but used in a way that caused harm or violated statutory protections under the Indian Copyright Act. While courts globally emphasize the balance between innovation and copyright, Indian jurisprudence may lean towards stronger protection for content creators given the economic stakes in sectors like news media.
Conclusion
The outcomes of these cases will significantly shape the future of copyright law concerning artificial intelligence. They will establish new norms and expectations for both content creators and AI developers. As courts navigate these complex issues, new legislation could emerge. This legislation may balance innovation with intellectual property rights. Such changes would foster an environment where technological advancement does not come at the expense of creators’ rights. The legal landscape is poised for transformation as these cases unfold. This could lead to more stringent regulations around data usage in AI training. It might also lead to clearer definitions of fair use that reflect the unique challenges posed by emerging technologies.