Megan A. Brown is a research scientist at the New York University Center for Social Media and Politics.
On February 21, TikTok announced its new researcher API (application programming interface), and with it the set of terms and conditions for researchers to access it. Excitingly, TikTok’s announcement expands data access to researchers, offering a new look at TikTok data that was previously inaccessible. With Twitter’s recent announcement that its API, a predominant source of social media data for researchers, will no longer be freely available (though recent updates leave uncertain what will happen to the researcher program), it’s heartening to see another social media platform expand access to researchers.
However, the new API is not without its critics. Researchers have (rightfully) criticized TikTok’s API Terms of Service for its restrictions on data retention, data sharing, and licensing agreements. These requirements are incompatible with the research process, which relies on rigorous analyses over time, sharing data for replication of findings, and independent publication of research.
But TikTok’s API requirements are not fundamentally different from other platform APIs. The problem here is not TikTok. The problem is that researchers have to rely on platforms to access data in the first place.
Social media platforms have a huge impact on our society, but the data to study them is largely locked inside of the platforms themselves. Researchers, within reason, should have access to platform data to study questions of public interest regarding the impact of platforms on issues of democratic representation, public health, and more.
The way research on platforms currently works is that platforms voluntarily make data available to researchers, either by publishing datasets or by opening APIs that allow researchers to collect and analyze platform data. These data are one of the primary means by which researchers can assess what is happening on digital platforms outside of platform transparency reports, which are limited in scope and often consist largely of transparency theater.
Core to many of the concerns about TikTok’s API Terms of Service is that the policies that researchers must follow to get access to data are fundamentally at odds with how researchers do their work. This is true! TikTok’s API ToS requires that researchers refresh data every fifteen days, bars researchers from sharing data, and requires that researchers send their papers to TikTok thirty days in advance. This, as correctly noted in another Tech Policy Press column, flies directly in the face of academic research norms, which require the long-term retention of data for analysis, open data for replication, and the independent sharing of results in peer-reviewed journals.
But other platforms require the same thing. Twitter’s Academic API, which was previously lauded for its transparency and openness, also requires that researchers regularly refresh their data to remove deleted content. YouTube’s researcher program similarly requires regular refreshment of data (though there are provisions that you can stop refreshing when data must remain fixed for analysis), bars researchers from sharing data, and requires that researchers provide YouTube with a copy of research in advance of publication.
Furthermore, before there is ever any data collection, analysis, publication, or replication, researchers typically have to apply to the platform in the first place, providing details ranging from their qualifications and their institution to the research questions and types of analysis they plan to undertake with the platform data. This effectively grants platforms the power to screen researchers and research projects. At any stage, the platforms can decide to ban individual researchers or revoke access altogether, as Facebook did in 2018 in the wake of Cambridge Analytica, leaving researchers with little recourse. This issue has once again come into stark focus in recent weeks, as Twitter announced it was halting free access to its API, imperiling research projects around the world.
It shouldn’t be this way. To perform public-interest research investigating social media — including on vital questions around democracy, public health, crisis informatics, and more — researchers should not have to rely on the good will of platforms to provide data access. And platforms should not get to decide what data is available to researchers, which researchers get access to data, and what restrictions on collection, analysis, and publication are put in place.
This is not to say there should not be any restrictions on researchers for studying platforms. There are serious ethical and privacy concerns regarding the scope and scale of platform data that must be considered. However, platforms should not be the arbiters of these decisions.
For more independent and rigorous research on social media platforms, data access requirements should be legally mandated. First and foremost, researchers collecting publicly available data should be protected from litigation from companies when their research serves a non-commercial public interest. Second, research projects of utmost importance — and subject to sufficient scrutiny for ethics, methods, and impact — should be granted specialized access to non-public data. Finally, a third party should mediate the relationship between platforms and researchers for data access, as several current policy proposals suggest.
With an increasing focus on the outsized power that social platforms have in governing the way we communicate in our day-to-day lives, it’s vital that upcoming regulations, including the Digital Services Act in Europe and other legislation in the United States, are informed by independent research. While TikTok’s Researcher API is a promising step in the right direction, mandated data requirements are paramount for ensuring that these lines of research continue.