TikTok’s API Guidelines Are a Minefield for Researchers
Joe Bak-Coleman / Feb 22, 2023
Dr. Joe Bak-Coleman is an associate research scientist at the Craig Newmark Center for Journalism Ethics and Security at Columbia University and an RSM assembly fellow at the Berkman Klein Center's Institute for Rebooting Social Media.
As Twitter winds down its free API, imperiling public interest and research projects, researchers like myself were delighted to hear that TikTok will provide data access to qualified researchers through a new API. At face value, such access could be instrumental in providing insight into our rapidly changing digital world, particularly as TikTok is quickly becoming a dominant social media platform even as it is plagued with concerns over security, privacy, and teen mental health.
In practice, the Terms of Service (ToS) create a veritable minefield for researchers. As is common with API ToS, they begin by requiring the researcher to describe a distinct research project, subject to approval by TikTok. Although this does forfeit a great deal of independence, it is fairly commonplace and rarely strictly adhered to in practice.
If a researcher does manage to get approval for a (sufficiently innocuous?) research project, that’s when the headaches will begin. While collecting data, they’re required to check every 15 days to see if anything they’ve collected has been removed, and then remove that data from their dataset. If the data removed are not missing at random, any analysis they conduct might lead to substantially different results 15 days later. Any data collected over longer timescales will be biased by data missing from early dates, making it nearly impossible to reliably study phenomena on timescales of more than a couple of weeks.
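To see why removals that are not missing at random matter, consider a toy simulation. It is a minimal sketch, not a model of TikTok's data or API: the per-post score, the removal probabilities, and the function name `simulate` are all hypothetical and chosen only for illustration. It compares the average score in a full dataset with the average after two kinds of mandated deletion, one where every post is equally likely to be removed and one where higher-scoring posts are more likely to be removed.

```python
import random
import statistics


def simulate(n_posts=10_000, seed=0):
    """Compare a dataset's mean before and after two kinds of removal."""
    rng = random.Random(seed)

    # Hypothetical per-post outcome, e.g. some content score in [0, 1].
    scores = [rng.random() for _ in range(n_posts)]

    # Missing at random: every post has the same 20% chance of removal.
    kept_random = [s for s in scores if rng.random() >= 0.20]

    # Not missing at random: removal probability rises with the score
    # (0% at score 0, 40% at score 1, roughly 20% of posts overall).
    kept_biased = [s for s in scores if rng.random() >= 0.40 * s]

    return {
        "full": statistics.mean(scores),
        "random_removal": statistics.mean(kept_random),
        "biased_removal": statistics.mean(kept_biased),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, both removal schemes discard about a fifth of the posts, but only the second shifts the estimate: the mean after random removal stays close to the full-data mean, while the mean after score-dependent removal is noticeably lower. A researcher who reruns the same analysis after a mandated deletion would be, in effect, analyzing a different population.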
So imagine a rule-following researcher who is forced to collect and analyze their data within two weeks, then submits the paper to a journal for publication. As required, they check to see if any data has changed or been removed, and find that a single post has been removed, requiring them to delete the data. This would certainly run afoul of my university’s data retention policy, for instance, which requires me to keep data for 3 years following the end of a project. Depending on their funding, deletion may even violate the law. They’re now at a crossroads where they can either violate the ToS, or try to convince their university or funder to make an exception.
Perhaps the researcher will overcome this hurdle, persuading all relevant institutions to bend the rules. When peer review arrives, the editor or peer reviewers may request access to the data in accordance with their policies. Our researcher has to explain to the editor and peer reviewers that they cannot share the data, and if they could, some data no longer exist. They cannot even respond to requests to rerun or adjust the analysis, as the missing data are not missing at random and will bias the findings. I would certainly hope that journals would reject a paper if the data cannot be produced, but it’s possible an author could talk their way through.
Now the real trouble begins.
Upon acceptance, the author is asked to sign a standard form granting the journal exclusive rights to publication. Unfortunately, they’ve already granted TikTok “worldwide, free, non-exclusive, and perpetual” rights to any research products. They’ll have to convince the journal to waive its exclusive rights to publication, or perhaps pay for open access.
As a ToS-required courtesy, they send a copy of the manuscript to TikTok 30 days prior to publication. From here, TikTok has a lot of options. The company is free to distribute the work to journalists, publish it alongside the author’s name and university logo, or re-analyze the work using its in-house data and generate a rebuttal before it hits the press.
What if TikTok really does not like the author's work? The company can simply delete the underlying data from its servers, which requires the researcher to, in turn, delete their data. The author would now need to reach out to the journal editor, their university, and funders and say that not only has a portion of data been deleted prior to publication, but now it is all gone.
Perhaps they persuade everyone to allow the work to get out there—it’s important enough to make exceptions. As with any published work, there is the remote possibility that the researchers are accused of academic dishonesty. When asked to turn over their data and materials to university committees and journal investigators, they can only respond “I had to delete it.” This is unlikely to be a particularly convincing defense, and certainly not enough to clear the researcher of allegations. This could be cause for a retraction, or even being let go from their position.
In most cases, of course, their work will be published uneventfully. Perhaps it even garners news attention, and TikTok wants to use the research to promote the API. In signing up for the API, they’ve permitted use of their institution's name and branding for promotion, typically something they’re not permitted to do under branding guidelines. If they didn’t manage to clear an unconditional transfer of full institutional branding rights to a third-party private company (unlikely), they’re in for a rough meeting with higher-ups. Their institution may choose to sue TikTok for unlicensed use, arguing that the researcher did not have authority to license the brand. TikTok might then sue the researcher for failing to uphold the ToS. The university might take action against the researcher, all over a logo on a blog post put out by TikTok.
This slew of worst-case scenarios highlights what is required to follow the basic ToS. A researcher must propose a project; have it accepted; convince their university to transfer branding rights; conduct their work within two weeks; convince their institution, funder, journal, and peer reviewers to allow publication of a paper based on deleted or missing data; and send a copy to TikTok prior to publication. After all of this, they will have no ability to defend their work against allegations of impropriety, and no other researcher will be able to reproduce their findings.
Every API ToS is violated at times, but the conditions TikTok has created seem distinct in that there appears to be no plausible pathway to published research without violating the ToS. Even then, researchers have to hope their university is comfortable with transferring name-use rights for promotional purposes. Given this lack of options, our choices are to either avoid the API entirely or break the ToS and let the lawsuits fly.
Ideally, the research community can convince TikTok to loosen the ToS and make them more practical. Failing that, the question is not whether the ToS will be broken, but how and by whom. Many researchers simply lack the institutional support or job security required to face a lawsuit from a $300 billion company. Instead, we’ll have to rely on those with sufficient legal and institutional protection to flagrantly violate the ToS and invite a lawsuit they know they can win.
Although more restrictive than most, these terms of service share many restrictions with those imposed by other platforms. For instance, YouTube also has a data refresh policy (30 days), rendering it likewise impossible to meet both the ToS and guidelines for data retention, much less norms of reproducible research. In this sense, these ToS highlight precisely why we need to enshrine standards of transparency and data access into law. We simply cannot expect companies to provide us with the basic levels of data access required for independent oversight and evaluation of their societal impact.