The Ghibli-Style AI Trend Shows Why Creators Need Their Own Consent Tools
Mallory Knodel, Audrey Hingle / Apr 2, 2025
Tokyo, Japan—the robot statue in a garden at the Ghibli Museum. Shutterstock
Last week’s flood of Studio Ghibli-style AI images generated by OpenAI’s latest update to GPT-4o quickly captured the internet’s imagination, reportedly bringing over one million new users to the platform in a single day. Dreamy skies, expressive faces, and soft brushstrokes uncannily replicated the studio’s signature aesthetic, and the style was quickly applied to everything from popular memes to family photos. OpenAI CEO Sam Altman jumped on the trend, updating his X avatar to a Studio Ghibli-style image and tweeting about it. The White House got in on it, too, posting a particularly cruel image of a woman who was arrested earlier this month by US Immigration and Customs Enforcement.
The resemblance of these images to Studio Ghibli’s style wasn’t accidental. Output this faithful to the studio’s aesthetic is strong evidence that OpenAI trained its model on copyrighted Studio Ghibli content, likely scraped without permission.
The studio almost certainly didn’t consent to having its distinctive style replicated. Amid the coverage of the trend, comments resurfaced from the studio’s founder, Hayao Miyazaki, who once said of AI-generated video: “I am utterly disgusted… I strongly feel that this is an insult to life itself.” While we can’t undo what’s already happened, we can still build protections for the next generation of artists and better tools that respect creativity.
If we want an internet that values consent, creativity, and fairness, we need tools that respect the boundaries creators set, and those boundaries should travel with the work itself. Current proposals to manage AI scraping mostly focus on robots.txt, which is primarily useful for website owners and publishers who control their domains. But robots.txt doesn’t effectively address content shared across platforms, nor does it give individual creators an easy way to communicate consent when publishing on third-party sites or when others reuse their work. To fill this gap, new solutions are emerging, from embedding machine-readable metadata directly into files to new tools and protocols aimed at making consent more portable, persistent, and easier to enforce.
The Limits of Robots.txt
Many current discussions around managing AI scraping focus on updating robots.txt for the AI era. While robots.txt is a critically important tool because of its simplicity, widespread adoption, and long-standing role in guiding web crawlers, it was never designed to serve as a robust rights-management tool. First proposed 30 years ago as a simple, voluntary protocol for website-crawler interaction, it lets site owners signal how they want search engines, researchers, and archiving projects to handle their content, relying on clear signals and good manners rather than enforcement.
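To make the mechanism concrete, here is a minimal sketch of how a compliant crawler is expected to honor robots.txt, using only Python’s standard library. The user-agent tokens below are ones crawler operators have published (GPTBot for OpenAI, CCBot for Common Crawl); the domain and paths are placeholders.

```python
# A minimal sketch of voluntary robots.txt compliance, using only the
# Python standard library. GPTBot and CCBot are published crawler
# user-agent tokens; example.com and its paths are placeholders.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An AI-training crawler that plays by the rules stays out entirely...
print(parser.can_fetch("GPTBot", "https://example.com/gallery/art.png"))    # False
# ...while an ordinary search crawler may still index the site.
print(parser.can_fetch("SearchBot", "https://example.com/gallery/art.png")) # True
```

Nothing in this exchange is enforced: the file is a request, and a scraper that never performs the check, or ignores its answer, faces no technical barrier. That is the core limitation the rest of this piece grapples with.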
That worked well enough when crawlers indexed content for search, research, or archiving. But the stakes are much higher now. Today’s AI systems scrape vast amounts of content from the open web, including websites like Wikipedia, news outlets such as The Guardian and The New York Times (which is now suing OpenAI), public domain and pirated books, code from platforms like GitHub, and public forums like Reddit. Some of this material is in the public domain or openly licensed, but much is copyrighted, raising ongoing legal and ethical concerns.
While robots.txt might be well suited for website owners and publishers who can tell AI scrapers to buzz off their entire sites, it does little to address issues faced by individual content creators, such as artists, musicians, writers, and other creative professionals who share content across multiple platforms or websites. These creators need a way to easily communicate their consent preferences when publishing their work on third-party sites or when others use it.
Case Study: Bluesky and the Social Media Challenge
A recent debate on Bluesky perfectly illustrated the complexity of consent in the era of AI scraping. The platform introduced a proposal to let users opt in or out of having their posts scraped for AI training. According to Bluesky’s CEO Jay Graber, this proposal represented a way to give individuals more control over how their content was used, but it triggered a backlash. Many users misunderstood the proposed feature as a potential shift in platform policy that would allow Bluesky to train AI on users’ posts rather than a tool to control third parties. The proposal hasn’t yet led to any action or platform changes.
The confusion speaks to a broader problem: most people don’t know how to express consent preferences for their online content, if such options even exist. Where they do exist, the technical mechanisms are often hidden, inconsistent, or limited to domain-level control.
Emerging Tools for Content-Level Consent
As the limits of domain-level controls like robots.txt become more apparent, new approaches are emerging to embed consent directly into the content, making it portable, persistent, and platform-agnostic. Some focus on embedding consent signals directly into individual files, making preferences easier for creators to manage across platforms.
Examples include adding machine-readable metadata directly within images, videos, and other digital files, and tools such as Spawning’s Do Not Train tool suite or the TDM·AI proposal, which provide creator-friendly solutions for content-level control. Others suggest structured HTTP headers and extending signaling mechanisms to APIs and cloud services to ensure consistent preference communication across digital environments. Together, these tools offer a more scalable and creator-centric way to manage how content travels and is used online, especially in the context of AI training.
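As a simplified illustration of what file-level signaling can look like, the sketch below uses the Pillow imaging library to write a machine-readable reservation into a PNG’s text chunks. The tdm-reservation and tdm-policy field names borrow from the W3C TDM Reservation Protocol vocabulary; embedding them as PNG text chunks is our illustrative assumption, not a settled standard, and a scraper honors them only if it chooses to look.

```python
# A simplified sketch of per-file consent signaling with the Pillow
# imaging library (pip install Pillow). The tdm-reservation/tdm-policy
# names borrow from the W3C TDM Reservation Protocol vocabulary;
# storing them as PNG text chunks is an illustrative assumption.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def embed_consent(src: str, dst: str, policy_url: str) -> None:
    meta = PngInfo()
    meta.add_text("tdm-reservation", "1")    # "1" = rights reserved
    meta.add_text("tdm-policy", policy_url)  # where the terms live
    with Image.open(src) as img:
        img.save(dst, pnginfo=meta)

def read_consent(path: str) -> dict:
    # A compliant scraper would check these fields before ingesting
    # the file into a training set.
    with Image.open(path) as img:
        return {k: v for k, v in img.text.items() if k.startswith("tdm-")}

embed_consent("artwork.png", "artwork_tagged.png",
              "https://example.com/ai-policy")
print(read_consent("artwork_tagged.png"))
# {'tdm-reservation': '1', 'tdm-policy': 'https://example.com/ai-policy'}
```

One caveat any such scheme has to confront: many platforms strip embedded metadata on upload, which is one reason proposals like TDM·AI pair declarations with content identifiers rather than relying on the file alone.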
Why Signals Aren’t Enough Without Enforcement
Expressing consent is only one half of the equation; ensuring those preferences are respected is the other. The current crop of proposed tools relies entirely on voluntary compliance. Without enforcement, even the clearest signals can be ignored. The growing backlash against AI scraping reflects a deeper concern over the erosion of long-standing norms online.
As regulators, primarily in the EU, move to define legal frameworks for AI transparency and data usage, the technical community has a narrow window to weigh in and help shape meaningful and enforceable norms. The EU AI Act and its accompanying Code of Practice have added urgency as rights holders and cultural organizations demand enforceable safeguards and more meaningful opt-out (and opt-in) mechanisms. If we want tools that truly empower creators, expressed preferences must be backed by accountability—which means regulation, not just best practices.
Recommendations
Getting this right matters deeply, not just for publishers and artists, but also for researchers and journalists whose work depends on open access to information. As policymakers and technologists debate the future of AI data use, now is the time to weigh in. To have your say, consider joining the conversations with the IETF or following the upcoming events and livestreams from Brussels.
Here are our recommendations for how to build a better internet for content creators:
- Empower Creators with Content-Level Signals: Creators need simple, built-in ways to express how their work can and cannot be used. These signals should be embedded directly into the content itself (images, videos, text files), not just at the domain level. This makes preferences portable, persistent, and platform-agnostic.
- Prioritise Clear Signals Now, Expect Enforcement Soon: The more consistent and understandable our signaling systems become, the easier it will be for policymakers to craft enforceable rules around them. We need infrastructure that sets the stage for regulatory action.
- Expect Complexity: For widespread adoption, technical designs must be lightweight and interoperable, while the legal frameworks that support them must be robust. In other words, we cannot reduce complex rights and equity relationships to protocols built for automated systems and service-to-service communication. We need signals that aren’t just legible to machines but are usable and understandable by the people they’re meant to protect.
- Someone, Please Launch An Ethical Alternative: There’s a growing demand among developers and everyday users for AI systems trained on ethically sourced data. For companies looking to stand out, building or supporting models that respect creator consent isn’t just the right thing to do. It’s a market opportunity waiting to happen.