Perspective

Why Commercial Tools Can Scrape Social Media But Researchers Can't

Brandon Silverman / Nov 11, 2025

This piece is part of “Seeing the Digital Sphere: The Case for Public Platform Data” in collaboration with the Knight-Georgetown Institute. Read more about the series here.

Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Hidden Labour of Internet Browsing / CC-BY 4.0

When researchers at the Institute for Strategic Dialogue (ISD) submitted a formal data access request to X under Europe's Digital Services Act in early 2025, they had every reason to be optimistic. As an established research institution studying online reactions to Germany's elections—precisely the type of public interest work that the European Union established the DSA to enable—their request seemed straightforward. Yet, after weeks of extensive back-and-forth, X rejected the application without providing either the data or a clear legal basis for refusal.

When other organizations went through the same process with new data-sharing programs launched by Meta and TikTok, researchers experienced so many challenges that the European Commission launched a formal investigation. After hundreds of interviews with researchers, its preliminary findings confirmed that the programs were not providing the access and data they claimed to offer.

These cases exemplify a stark paradox at the heart of our digital society: even with the support of emerging regulations, independent researchers still face major barriers to accessing even basic social media data essential for studying everything from election integrity to public health crises. However, if you happen to be a commercial for-profit entity hoping to study social media, there is a thriving, $100 million-plus market that freely trades this same data with very few restrictions.

The commercial data bonanza

In September, Chris Miles and I did a comprehensive analysis of the social media monitoring market that revealed an ecosystem of more than 250 for-profit tools used by corporations to track conversations, measure sentiment, and analyze trends across social media platforms. Companies like Brandwatch earn between $100 and $200 million annually from more than 5,000 corporate clients, while Meltwater reported $439 million in revenue in 2022, serving over 27,000 clients worldwide. These tools offer brands everything researchers struggle to access: multi-platform coverage, real-time tracking, historical data, and sophisticated analytics.

Pricing alone shows who gets access. While basic commercial packages start around $500 monthly, enterprise tiers routinely exceed $2,000 per month, with the most premium options reaching $8,000 monthly. For corporations, this is a standard business expense; for academic researchers and civil society researchers operating on shoestring budgets, it's out of reach.

But the real story isn't just cost—it's about how these commercial tools actually obtain their data. Our investigation, which included mystery shopper interviews with 21 major social media monitoring companies, revealed a troubling pattern of opacity. While some tools maintain direct partnerships with platforms like X, Meta, and LinkedIn, many rely on data scraping or purchasing from third-party suppliers that scrape content themselves. Even the largest monitoring companies acknowledged using "augmented" data collection practices and third-party sources to fill gaps where official partnerships don't exist.

In other words, when direct access isn't available, commercial tools simply find workarounds. One customer representative candidly admitted: "Even when they added more restrictions on Facebook, we found a way to get more data."

The research desert

Meanwhile, independent researchers face a starkly different reality. Over the past few years, platforms have systematically dismantled the infrastructure that once supported public interest research. Twitter's API, which powered over 17,500 academic papers since 2020, now comes with prohibitively expensive pricing. Meta shut down CrowdTangle in August 2024, mid-election year, replacing it with a far more restrictive system that many researchers struggle to access.

The implementation of Europe's Digital Services Act, which explicitly requires platforms to provide data access for researchers, has been promising but also slow and frustrating. A major reason for the lack of progress has been how inadequate the platforms' responses have been to date. According to the DSA Collaboratory, of 38 completed applications, only 30 researchers have heard back from platforms, and of those 30, more than a third were rejected. Even when platforms grant access, researchers report broken APIs, inconsistent data returns, and arbitrary revocations of already-approved access.

The overall impact has been devastating. A survey of 167 researchers found 30 canceled projects, 47 stalled initiatives, and 27 cases where researchers abandoned platforms entirely due to data access restrictions. Sixty percent of independent technology researchers now face significant barriers to accessing data essential for studying technology's societal impacts.

Why this matters for democracy

This two-tiered system—where corporations buy access while researchers are locked out—fundamentally undermines democratic oversight of technology. Consider what's at stake.

Platform bias and overall accountability: When researchers can't systematically study algorithmic bias, content moderation, or the spread of extremist content, platforms operate without meaningful independent oversight. The public and politicians are left trusting companies' own reports about their systems' impacts, which we know are often misleading or incomplete.

Consumer protection: Understanding how platforms affect the lives of users—from teen mental health to financial scams—requires data access. We've seen over and over how scams and fraud proliferate on platforms without meaningful external oversight. Moreover, without any data, policymakers are left crafting ineffective regulations based on incomplete information.

Election integrity: Researchers studying government-sponsored election interference lack the data they need to track coordinated manipulation or understand how foreign-sponsored false narratives spread, while social media monitoring companies often sell this exact capability to political campaigns.

Public health: During the COVID-19 pandemic, social media data proved invaluable for tracking vaccine hesitancy and mental health impacts. Yet researchers studying these patterns now face arbitrary access restrictions, even as pharmaceutical companies purchase powerful social listening tools that can give them a minute-by-minute readout of how their brands are being talked about online.

The irony is stark. Corporations can purchase comprehensive social media intelligence to optimize their advertising and brand management, while researchers attempting to study how these same platforms affect democracy, public health, and social cohesion are systematically excluded.

The path forward

This crisis demands immediate policy action on multiple fronts.

Safe harbor for public interest scraping: Given the slow pace of building effective data access programs, one clear and easy path forward is providing legal protections for responsible public interest scraping. Existing legal ambiguities around non-commercial web scraping leave researchers vulnerable to litigation and adversarial blocking, while commercial entities often navigate these same gray zones freely. Clear legal protections for public interest research would level the playing field and provide immediate transparency and accountability.

Mandatory transparency requirements and safety standards: Platforms should be required to share data, and to do so in ways that are standardized across the industry to make the data usable and accessible, while adhering to ethics and privacy protections designed and informed by civil society, not by the platforms themselves.

Enforceable requirements: The DSA provides a framework, and other countries should learn from its requirements, but enforcement remains a large question mark. Data sources should be regularly audited for accuracy and reliability. Platforms that reject researcher applications should be required to provide specific justification, and rejection rates should be monitored. Most importantly, platforms that systematically deny reliable access should face meaningful penalties.

Public data infrastructure: Just as clinical trial data must be shared publicly under FDA requirements, we need similar mandates for platform data relevant to public interest questions. The clinical trial transparency regime demonstrates how sensitive data can be shared with qualified researchers while protecting privacy and commercial interests.

Breaking the cycle

The current system creates a vicious cycle. Platforms restrict data access, citing privacy concerns and commercial interests. In the absence of independent research, public understanding of platform impacts relies primarily on company-sponsored studies and voluntary transparency reports. Researchers do their best work with limited data, and platforms inevitably respond to criticism they deem unfair by further restricting access.

Breaking this cycle requires recognizing a fundamental principle: data about public social media activity—the posts, shares, likes, and comments that shape public discourse—should be accessible for public interest research. Not all data, and not private information, but the public digital conversations that increasingly govern our politics, health behaviors, and social relationships are a public good and should be treated as such.

When investigations into the US meatpacking industry in the early 20th century exposed unsanitary practices, the industry's reputation and sales plummeted. Rather than embrace transparency, industry leaders denied the allegations and complained about costs. But mandatory inspection ultimately strengthened the industry by rebuilding consumer trust—proving that accountability serves business interests.

Today's technology platforms often make similar arguments but the stakes are far higher. These systems don't just process our meat or our oil—they mediate our democracy, shape our children's development, and increasingly determine what we know about the world.

We cannot afford a future where only those who can pay get to understand how technology shapes society. The time for voluntary cooperation has passed. What we need now is enforceable transparency, genuine researcher access, and recognition that understanding our digital public sphere is a prerequisite for governing it wisely.

The choice is clear: continue down a path where commercial interests are served while public interest goes wanting, or build the transparent, accountable system that democratic oversight requires. The research community is ready to do its part. The question is whether policymakers will provide the framework that makes independent inquiry possible.

***

The author draws on research from the Institute for Data, Democracy & Politics at George Washington University and the Coalition for Independent Technology Research, representing insights from hundreds of researchers across six continents working to understand technology's impact on society.

Authors

Brandon Silverman
Brandon Silverman is a policy expert and outspoken advocate for social media transparency. He is the co-founder and former CEO of CrowdTangle, a social media analytics platform used to increase transparency on sites like Facebook, Instagram, and Reddit. After Meta acquired CrowdTangle in 201...
