Could an Alliance of News Organizations Build an LLM for Journalism?
Prithvi Iyer / Feb 7, 2025

The relationship between AI and journalism is fraught. On the one hand, news media organizations see the utility of generative tools and systems in introducing efficiencies, reducing costs, and potentially creating new opportunities to serve readers and advertisers, while AI firms want to incorporate accurate, up-to-date news and information into their platforms. On the other hand, newsrooms are concerned that AI firms will undermine their already precarious economics and steal their content, even as AI firms want to preserve fair use maximalism to train their models on everything available online. Amid these competing dynamics, some AI firms and news media organizations are striking deals, and others are locked in litigation.
No wonder, then, that some in the field of journalism are keen to explore how to challenge these dynamics and put more power back in the hands of journalists and the organizations that employ them. One way to do that would be to limit their reliance on commercial models. A group of researchers from Microsoft Research, Data & Society, the Associated Press, Brown, and Cornell see “a critical opportunity” to “pursue a participatory approach to the development and governance of LLMs and AI, owned and led by journalists.” In a paper published on arXiv last month, they say that “In the present moment, where journalists’ work is expropriated by corporate LLMs without compensation or design input, questions of meaningful participation are particularly salient.”
To explore such questions, the team drew on best practices developed in the Fairness, Accountability, and Transparency (FAccT) research community. The participatory approach seeks to address power imbalances and “distribute power to those who typically do not have it.” To ensure that the journalism community was at the center of this initiative, the researchers consulted a “field partner” – a product manager at a large news organization – to discuss how newsrooms can “better assert intellectual property protections against data scraping from AI companies.” The research team iteratively adapted the research questions based on conversations with the field partner, eventually arriving at three key questions:
- “What are the market and structural factors that shape newsroom decision-making about AI?”
- “What are the organizational and technical requirements needed to enable the creation of a participatory design-created, newsroom-owned, and journalism-specific LLM?”
- “What new functionality might a participatory LLM enable in a journalist’s day-to-day—and what material and affective demands would participation levy upon them in return?”
To answer these questions, the researchers conducted 20 interviews with reporters, data journalists, labor organizers, and newsroom executives from newsrooms of all sizes. First, they queried how respondents use AI in their work (if at all). Then, they engaged interviewees in “participatory design fiction,” gathering reactions to an invitation to join a fictitious organization called the “Newsroom Tooling Alliance (NTA).” This exercise helped identify the “key contours of the context that such an initiative must address” if implemented. The interviews revealed numerous trade-offs and points of tension that might shape “journalism as a design space for participatory AI.”
Key Findings
The interviews shed light on the field's current structural and financial constraints, reflected in the pressure to produce more content with fewer resources. Respondents attributed the financial strain partly to changing news consumption habits and traditional newsrooms' shrinking audiences. Respondents, especially those from smaller newsrooms, shared concerns about fairness and compensation when it comes to AI. One respondent noted:
"I hate the idea that The New York Times can strike a deal with OpenAI, but OpenAI just, like, takes my shit. Google’s the same way. They’re all the same. I’m a businessman, but I have no bargaining power. I live at the pleasure of the platform. So the rich are getting richer, the poor are getting poorer.”
Given these systemic tensions around AI and journalism, how could the field cooperate toward a collectively owned LLM? The findings reveal important factors that would shape whether organizations compete or cooperate. A key theme was the differing incentives of large and small news organizations: while respondents supported a degree of cooperation between news organizations to resist corporate capture by technology companies, larger organizations also seek a competitive edge over rivals, trying to “do AI better” than their competitors. Despite these rivalries, some saw an opportunity to support smaller newsrooms that lack the resources to develop AI tools independently. Another source of tension related to balancing data pooling with data protection. Respondents broadly agreed on the merits of pooling data to develop a common LLM, but some believed that certain work, such as exclusive investigations and source lists, should remain proprietary.
Another source of tension concerned agreeing on use cases for such a participatory LLM: one member organization might approve of using the LLM for a certain task or business purpose while another is strongly opposed to it. Regarding the leadership of a journalist-led LLM, respondents raised important questions about representation and accountability. Such an initiative would also need to manage how member organizations worked with corporate tech actors.
Lastly, the interviews also shed light on some day-to-day operational tensions that could arise when developing an LLM for and by journalists. For one, most respondents admitted to using commercial tools such as ChatGPT and Claude for daily tasks like digging through municipal records and creating interactive charts. However, there was widespread fear about sharing data with these models, indicating that “whereas today’s commercial models seem to meet the needs of journalists, on closer read, many newsrooms are instead wary of disclosing their proprietary data to commercial tools.” A core concern among respondents was whether adopting any form of automation, even if led by journalists, would threaten journalism jobs. “Journalists, as a workforce, are already very constrained in time, capacity, and resourcing, and the introduction of automation has a history of encouraging the elimination of their jobs, instead of augmenting their work,” the authors concluded.
So, what could a journalist-led LLM look like?
The researchers propose a participatory LLM for journalism, stewarded by member organizations as part of what they call the “Newsroom Tooling Alliance.” Here is an overview:
- Members would commit to contributing some of their work products to a shared database and would retain the right to choose which data to include. To promote greater data sharing, each member’s revenue share would be determined by the size and quality of the data it contributes (a hypothetical illustration follows this list).
- Data scientists on staff would use this database to fine-tune “the most transparent and consentful open-source LLM available (or to experiment with creating a de novo LLM) in order to develop applications for journalistic tasks.”
- Members would also be tasked with auditing text outputs for correctness and bias.
- The idea is that members’ use of the LLM would “be billed at cost for upload and download time, without the additional charges billed by companies like ChatGPT to use its tooling.” The core costs might be covered by foundations or through the development of other revenue opportunities.
- This Newsroom Tooling Alliance would be overseen by a steering committee consisting of journalists and newsroom executives.
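To make the revenue-sharing element of the proposal concrete, here is a minimal, purely illustrative sketch of how an alliance might apportion a revenue pool in proportion to the size and quality of each member’s contributed data. The member names, quality scores, and size-times-quality weighting are assumptions for illustration only; the paper does not specify a formula.

```python
# Hypothetical sketch: apportioning pooled revenue by contribution size and quality.
# The member names, quality scores, and weighting scheme below are illustrative
# assumptions, not details from the researchers' proposal.

from dataclasses import dataclass


@dataclass
class Contribution:
    member: str      # contributing newsroom
    tokens: int      # size of the data contributed (e.g., tokens of text)
    quality: float   # editorial quality score in [0, 1], assigned during audits


def revenue_shares(contributions: list[Contribution], pool: float) -> dict[str, float]:
    """Split a revenue pool in proportion to each member's size-times-quality weight."""
    weights = {c.member: c.tokens * c.quality for c in contributions}
    total = sum(weights.values())
    if total == 0:
        return {member: 0.0 for member in weights}
    return {member: pool * weight / total for member, weight in weights.items()}


if __name__ == "__main__":
    members = [
        Contribution("Metro Daily", tokens=2_000_000, quality=0.9),
        Contribution("County Gazette", tokens=500_000, quality=0.8),
        Contribution("City Weekly", tokens=1_000_000, quality=0.6),
    ]
    for member, share in revenue_shares(members, pool=100_000.0).items():
        print(f"{member}: ${share:,.2f}")
```

Any real alliance would still need to negotiate how data quality is measured and audited; the sketch only illustrates the principle of tying compensation to contribution rather than to bargaining power.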
Whether the project stays a “design fiction” or becomes a reality remains to be seen. Either way, the investigation points to the need for alternatives to corporate AI systems, and such a participatory LLM initiative would give “newsrooms more control over how their data is used and monetized, power over what functionality is prioritized for development, and the assurance that they are not subject to the capricious decisions of an external partner.” Such an initiative could serve as a model for other creative industries that stand to lose out as AI firms continue to recklessly hoover up data.