Enabling Independent Research Without Unleashing Ethics Disasters

Josephine Lukito, J. Nathan Matias, Sarah Gilbert / May 10, 2023

Dr. Sarah Gilbert is a postdoctoral associate at Cornell University and Research Director of the Citizens and Technology Lab. Dr. J. Nathan Matias is a Fellow at the Center for Advanced Study in the Behavioral Sciences at Stanford University and an Assistant Professor at the Cornell University Department of Communication, where he is founder of the Citizens and Technology Lab. Dr. Josephine ("Jo") Lukito is an Assistant Professor at the University of Texas at Austin’s School of Journalism and Media.

When Twitter cut off the National Weather Service storm warning system by revoking its API access, it was the clearest indication yet of a critical digital infrastructure in dramatic decline. By charging at least $42,000 a month for API access, Twitter is effectively pulling the plug on thousands of essential public-interest projects across government, journalism, civil society, and academia.

How can research conducted independently of tech companies persist in the face of attacks and negligence from Twitter and other tech firms? New constellations of researchers are scrambling to collect data, build shared infrastructures, and advocate for new transparency laws. If we're not careful about privacy, ethics, and gatekeeping, the remedy might be worse than the problems we're trying to solve.

For the last six months, our team of researchers has interviewed dozens of journalists, civil society practitioners, and academics about the ethics and privacy risks of independent research. Groups such as the Coalition for Independent Technology Research (of which we are members) are organizing mutual aid, advocacy, and collective action for researchers affected by the Twitter API restrictions. Because the response is moving so fast, we have decided to share some preliminary suggestions for the field.

Ethics, Privacy, and Gatekeeping in Independent Research

Everyone working on researcher access must wrestle with three basic challenges: privacy, ethics, and gatekeeping. To understand why, we need to look at another important tech news story. The week Twitter cut off its API access was also the week that applications opened for the $725 million that Meta agreed to pay consumers in response to legal complaints about sharing data with Cambridge Analytica. The scandal, which justifiably generated global outrage, has been widely cited as the main reason, or excuse, for companies to restrict independent access to tech platforms.

How can we uphold the public good without creating another Cambridge Analytica? Whether it's a scraping project or a new regulatory scheme for data access, initiatives for independent research need to address three questions:

Privacy: What rights and power should people hold over their participation in independent research, and what kind of data sharing between researchers is justifiable?
Ethics: What relationship and obligations should independent researchers have toward the people whose data we are studying? What forces will ensure that researchers face consequences for abusing their power?
Gatekeeping: Who should be allowed to carry out independent research?

Without care, quick solutions to data access could make the situation much worse by opening the door to unethical practices—or by restricting access in ways that prevent important work from happening across journalism, civil society, and academia.

The problems of ethics, privacy, and gatekeeping are clearest when academics rush to create data collection and sharing projects. Universities are used to relying on the U.S. Common Rule and Institutional Review Boards (IRBs) to manage gatekeeping. But any solution that relies solely on IRBs will automatically cut off journalists and civil society groups, which don't have the same regulatory systems and protections.

How can we create solutions that can help civil society, academia, and journalism collectively serve society with independent research? Based on our research, we have a few initial ideas.

First: Why Independent Research Matters

Without independent access to data, researchers cannot continue important work exposing injustices perpetrated by big tech and governments, both in the US and worldwide. For example, citizen scientists would not have discovered that Airbnb manipulated data to hide illegal short-term rentals, journalists could not have uncovered inequitable ad pricing on Facebook, and academics would not have identified algorithmic biases across many platforms, including Twitter and YouTube.

Access to internet data was how researchers tracked the disproportionate impacts of the COVID-19 pandemic on Black Americans, uncovered the use of biased algorithms to determine access to government-subsidized housing, and revealed that governments were undermining Americans’ right of assembly by surveilling and suppressing protests. And without independent research, the role of Facebook in the genocide of Rohingya Muslims in Myanmar, racist development practices in South Africa, and social media manipulation during elections in India would not have been exposed.

This work has life-changing impacts. For example, groups like AI for the People and Data for Black Lives develop art and educational resources to empower Black people. This research also informs policies that support data rights, regulate the use of emerging technologies, and secure housing.

Managing Threats to Independent Data Collection

What options do researchers have when platforms resist data collection? Typically, they have four options.

First, until Twitter started cutting off access, many researchers worked through official platform APIs, which are available in various forms from multiple platforms. But the terms are often restrictive, and are subject to change at the whim of the company.

Second, researchers can scrape data without seeking special permission from platforms. This essential technique is sometimes possible when platforms don't take active steps to restrict it. But scraping carries possible legal risks, which is why researchers have called for a regulatory safe harbor for scraping. Additionally, scraping carries significant privacy and ethics challenges, especially if it involves data that people expect to be private.

Third, independent researchers can ask others to share data that has already been collected, using it for purposes other than those for which it was originally gathered. While several shared data repositories already exist, the privacy, ethics, and legal constraints on sharing data are significant. And many data repositories created by academics explicitly restrict access to anyone outside of universities.

The final option is to ask people to donate their data. Consent-based data donation solves many ethics and privacy challenges, but it can be costly to recruit people at the scale the research sometimes needs. And researchers sometimes have legitimate reasons to study powerful public figures who might not want to appear in a journalistic investigation or academic study.

The Gatekeeping Problem in Independent Research Access

How can we support independent research while avoiding the next Cambridge Analytica? The history of research is littered with horrific stories about people who justified terrible things with the belief that it was for the common good. Independent research of all kinds needs guardrails and accountability. The problem is that the most common, familiar forms of accountability in one sector tend to create problems for people in other sectors.

While relying on existing institutions, such as IRBs, to assess the ethical risks of research may seem like the obvious solution, requiring IRB review in all instances would damage the research landscape. Prior work has uncovered a number of challenges faced by IRBs, such as keeping up with evolving technologies and practices, a task further complicated by users' highly contextual expectations. Further, many studies using public data are exempt from review. But most importantly, many of those conducting important independent research, such as citizen scientists, activists, and journalists, would be locked out of this work.

To imagine forms of accountability that don't turn into harmful gatekeeping, we need to be able to recognize how different fields manage privacy and ethics—and build our approach to research access on those existing norms. That's what our research team is studying.

Solutions to the Privacy and Ethics Problem

So how do journalists, civil society, and academics manage ethics and privacy today? Even though we have different laws, institutions, and practices that govern our work, all three sectors share many common approaches to ethics and privacy. Our interviews revealed four key tactics: centering the human, increasing agency and awareness, protecting data, and being proactive.

Centering the human: Interviewees described thinking of data not only as “numbers” but also as a reflection of human experiences. This included accounting for context, power, and impact; for example, being aware of potential harms to vulnerable people while holding those responsible for harm accountable. Community norms and expectations provided important guidelines for ethical decision-making processes.
Increasing agency and awareness: Whether, and even how, to obtain informed consent for research is a thorny issue. When gaining consent would be challenging, the researchers we spoke to described increasing people's agency, such as by offering opt-out options, or increasing their awareness, such as by sharing results with communities.
Protecting data: Protecting data was a key tactic for minimizing the potential for harm. Interviewees discussed protecting data and participants by developing anonymization protocols for sharing or describing their data in research papers, and by limiting use and distribution of their data.
Being proactive: Importantly, researchers also noted that they wanted to be proactive in building ethical norms and guidelines. They did not want to wait until something bad happened, which is how ethical guidelines have often emerged historically; the Belmont Report, for instance, followed public outrage over the Tuskegee Syphilis Study.

These tactics go a long way toward protecting users. However, the guardrails that ensure these tactics are used aren't consistent across sectors, and in some cases they are entirely absent. Without at least some guardrails, neither researchers nor the public will be adequately protected.

Charting a Path Forward on Platform Access

As internet researchers from across constituencies develop new solutions in response to Twitter and other tech firms limiting access to key infrastructure, it is important that ethics—and the humans who comprise the data—are not forgotten in the chaos. Drawing from our interviews with experts, we offer the following set of recommendations:

Invest time and money in understanding how ethics are currently managed within each sector. Our upcoming report will offer more detail on this question.
Respect the expertise of practitioners—any new infrastructure project needs legal and ethical input from across constituencies, as members from each have deep and varied experiences to draw from.
Include communities affected by digital research and compensate them for their time and expertise.
Support conversations and action on ethics and privacy guardrails and ensure that they will work for civil society, journalists, and academia alike.

We hope this advice will be useful to policymakers, research teams, and funders working on solutions to the platform access problem. We're grateful to the research teams who have spoken with us over the last six months, and we look forward to sharing more results and recommendations in the coming months.

Authors

Josephine Lukito

Dr. Josephine ("Jo") Lukito is an Assistant Professor at the University of Texas at Austin’s School of Journalism and Media and incoming Professor of Digital Communication in the Digital Democracy Centre at the University of Southern Denmark. She is also the Director of the Media & Democracy Data Co...

J. Nathan Matias

Dr. J. Nathan Matias is an Assistant Professor at the Cornell University Department of Communication. He holds a Ph.D. from MIT Media Lab’s Center for Civic Media, a Master’s from MIT, and a dual Master’s and BA from Cambridge University. He has been a fellow at Columbia University’s Knight First Am...

Sarah Gilbert

Dr. Sarah Gilbert is a postdoctoral associate at Cornell University and Research Director of the Citizens and Technology Lab. Her work focuses on understanding and designing healthy online communities, studying topics like what influences participation, how volunteer moderators’ labor impacts commun...