
Startup Aims To Help Publishers Collect Fees from AI Companies

Anya Schiffrin, Haaris Mateen / May 14, 2024

A toll booth on the New York State Thruway in the 1960s.

As the news media industry panics about whether artificial intelligence (AI) will kill its already ailing business model, a handful of startups have emerged intending to help publishers monetize their content on chatbots and other AI interfaces.

In some ways it is an old story: in the age of search, Google knew that providing news snippets would diminish the number of people going directly to news websites. AI will have the same effect, but on steroids. With search, people looking for information might read the first or possibly second page of Google results. Now AI will look through dozens of pages and the linked article, distill the information and provide custom answers to user prompts. This could have the effect of killing traffic to destination websites.

Pretty much the only thing that news publishers have going for them is that without quality information the performance of the large language models (LLMs) that power AI chatbots such as OpenAI’s ChatGPT will deteriorate. So, there may be a chance for publishers to monetize the up-to-date information they produce.

“In order to reduce hallucinations and increase accuracy, LLMs need to retrieve content and data on an ongoing basis,” said Olivia Joslin, co-founder of TollBit, a startup that announced in March 2024 that it had raised $7 million for its business aimed at helping publishers license content to AI companies.

One way of trying to make sure publishers get paid is to sue AI companies for using their copy. The New York Times filed a lawsuit against OpenAI in December. A group of progressive outlets including the Intercept and Raw Story sued in February, and eight papers owned by Alden Global Capital sued last month. Some industry leaders, such as IAC chairman Barry Diller, believe it is necessary to push for new copyright laws that change the definition of “fair use.”

But Joslin and TollBit co-founder Toshit Panigrahi – both alumni of Toast, a restaurant management system – think there may be a faster, and cheaper, way to start generating revenue for publishers.

“Instead of a cat and mouse game where companies that want data spend money on scraping and content owners have to keep spending more on cyber security, why not make an agreement? You can’t build a sustainable AI ecosystem with endless litigation,” Panigrahi said in an interview.

AI agents and web scrapers scour the internet constantly for verified information that can ground models in facts. Before AI, companies that invested in cyber security could usually tell whether the entity reading their website was a person or a bot. Now web scraping tools use AI to avoid detection and scrape more efficiently. TollBit's answer is to offer scrapers the option to pay the destination directly for content – a ‘toll’ on its use.

The idea is that TollBit works with a publisher's existing cybersecurity tools to forward web scrapers or AI agents to TollBit. TollBit then shows them a message offering the chance to pay a licensing fee without having to negotiate with the publisher ahead of time. TollBit is essentially a metered tool sitting on top of the internet: once a scraper is sent there, it can decide whether to pay.

In this scheme, the price of the licensing fee is up to the publisher, and so are the other conditions like whether to insist on attribution for the content being used. All kinds of agreements can be made, including a bulk price for an LLM that sends in more queries.
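To make the flow concrete, here is a minimal Python sketch of how such a toll might behave. It is an illustration under stated assumptions, not TollBit's actual implementation: the names `PublisherTerms`, `quote` and `handle_visit`, the field names, and the idea of a simple request-count threshold for bulk pricing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PublisherTerms:
    """Terms chosen by the publisher (all field names are hypothetical)."""
    price_per_request: float       # standard fee per fetch, in dollars
    bulk_threshold: int            # prior fetches after which the bulk fee applies
    bulk_price_per_request: float  # discounted fee for high-volume callers
    require_attribution: bool      # whether the caller must credit the publisher


def quote(terms: PublisherTerms, prior_requests: int) -> dict:
    """Build the offer shown to a bot that has been forwarded to the toll."""
    is_bulk = prior_requests >= terms.bulk_threshold
    price = terms.bulk_price_per_request if is_bulk else terms.price_per_request
    return {"price": price, "attribution_required": terms.require_attribution}


def handle_visit(is_bot: bool, has_paid: bool,
                 terms: PublisherTerms, prior_requests: int) -> dict:
    """Decide what a visitor receives.

    `is_bot` stands in for the verdict of the publisher's existing
    bot-detection tooling; this sketch does not attempt detection itself.
    """
    if not is_bot or has_paid:
        # Human readers and bots that have already paid get the content.
        return {"action": "serve_content"}
    # Unpaid bots get an offer; whether to accept is up to the bot.
    return {"action": "offer_license", **quote(terms, prior_requests)}
```

The publisher controls every number in `PublisherTerms`; the toll only applies them. A bot that declines the offer simply never reaches the content through this path.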

Joslin and Panigrahi say that AI models seeking current information for retrieval-augmented generation (RAG) will need to fetch it from a publisher, reference it every time a user asks for it, and then display it. This means it will be easy to verify how many times the content is used.

This licensing idea – one of many that will likely emerge in the months ahead – does not preclude regulation. In fact, regulation may be necessary to ensure metered systems such as the one TollBit provides are respected, Panigrahi says. “I anticipate in the future there will be regulations that say, for example, if there are 5,000 citations of an article then you have to pay for it.” He thinks that even now audits can be done on the output of these generative AI tools, especially when RAG is employed, that could tell if the source content is being cited appropriately.

It’s important to note that TollBit is more like a tollbooth than a way of valuing content. It sits ready to set up a “contract” with the scraper or AI agent, collects the payment, and takes a cut. It does not set the price. Publishers are still going to have to wrestle with the question of what kind of news they can sell and earn money from. In recent interviews, Panigrahi said that the cost per thousand (CPM) often used to sell advertisements could be a good indicator. So could a determination of the traffic lost to AI – though we would argue that traffic has already fallen so much that it may not be the best metric.
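As a rough illustration of CPM-style pricing, the arithmetic is simply the number of uses divided by 1,000, multiplied by the rate. The short Python sketch below assumes a hypothetical rate of $20 per thousand uses; that figure and the function name `cpm_fee` are invented for the example and do not come from TollBit or any publisher.

```python
def cpm_fee(uses: int, cpm_dollars: float) -> float:
    """Fee owed for `uses` retrievals at a cost-per-thousand rate."""
    return uses / 1000 * cpm_dollars


# Hypothetical example: 5,000 uses of an article at a $20 CPM.
print(cpm_fee(5000, 20.0))  # 100.0
```

At that assumed rate, an article cited 5,000 times would earn $100 before the intermediary's cut.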

Joslin and Panigrahi say TollBit is onboarding new customers every week, including “international conglomerates,” but would not provide details. Publishers are not charged an upfront fee, but TollBit will take a cut of the revenue collected. Joslin and Panigrahi declined to say what runway the company has or to provide revenue projections.

In a recent podcast, Ben Lerer, managing partner of Lerer Hippeau, a venture capital firm that invested in TollBit, explained why he thinks TollBit’s work will help publishers.

“Hand-to-hand combat is not going to work. You need to meet the agents with agents. This layer provides a path to monetization and a means for these agents to legally and fairly access content. This isn't just to protect the website that is being disintermediated but also the AI bot that is going to use the content.”

However, it’s clear that TollBit’s business model is based on several optimistic assumptions, including the belief that AI companies will be willing to pay for quality content and that they will be willing to pay for more than a couple of news sources.

It is possible that, just as advertising became fully automated, so could the licensing part of scraping to train AI systems. AI agents and scrapers could automatically scour different sites and pay for the cheapest one, deciding, say, that The Times of India reporting on the news of the day is as valuable as that of The New York Times in certain contexts, and is perhaps cheaper.

However, Joslin is convinced that audiences will want reliable AI applications with high quality information. “If I don’t see the publisher I know, why would I trust the information? Branding and voice are important,” Joslin said.

Authors

Anya Schiffrin
Anya Schiffrin is the director of the Technology, Media, and Communications at Columbia University’s School of International and Public Affairs and a lecturer who teaches on global media, innovation and human rights. She writes on journalism and development, investigative reporting in the global sou...
Haaris Mateen
Haaris Mateen is Assistant Professor of Finance at the University of Houston C.T. Bauer College of Business. His research focuses on the economics of climate change and on media economics. He received his PhD in Economics from Columbia University.
