The Demise of CrowdTangle and What It Means for Independent Technology Research
Justin Hendrix / Jun 21, 2024

Audio of this conversation is available via your favorite podcast service.
A topic we return to often in this podcast is the dire need for independent technology researchers to have access to platform data. Without it, we cannot understand the extent of the harms and effects of social media on people and on society, and we cannot understand the limits of those harms. This makes it difficult to respond in acute moments such as elections, and to understand issues such as the relationship between tech platforms and social cohesion, or mental health, or any number of the other issues policymakers care about.
In this episode, I speak with two people on the front lines of the fight to secure access to data, including advocating for Meta to do better in light of the impending deprecation of CrowdTangle, a tool used by researchers to study Meta's products, including Facebook and Instagram. They are:
- Brandi Geurkink, the executive director of the Coalition for Independent Technology Research, and
- Claire Pershan, EU advocacy lead at the Mozilla Foundation.
What follows is a lightly edited transcript of the discussion.
Justin Hendrix:
Let's start with what's been happening with the European Commission, which has announced that it has opened an investigation into Meta around various concerns, including disinformation and the decisions it has taken to limit researchers' access to platform data, such as the demise of its CrowdTangle service. Claire, can you give us a general sense of what's going on with the Commission? What do we know about this investigation?
Claire Pershan:
These formal proceedings are looking into multiple possible infringements of the Digital Services Act, but the one I think that's most relevant to CrowdTangle and to public data access is the question of whether or not Meta was right to decide to deprecate CrowdTangle ahead of many elections in the EU. The way that's been stated in their press release is that the Commission will be looking into the non-availability of an effective third-party real-time civic discourse and election monitoring tool. So that is a mouthful. But the idea is basically that Meta has an obligation to address its systemic risks, including the risk to civic discourse and election integrity at a time of many elections, and that the decision to deprecate CrowdTangle at this time would suggest that perhaps it has not fully mitigated this risk.
Justin Hendrix:
I suppose we're just getting started with these investigations under the Digital Services Act, so no one's exactly sure what to expect in terms of what the timeline will be or exactly what types of documentation will be made available to the public as we go along.
Claire Pershan:
Yeah, absolutely. So there are now formal proceedings into four companies under the Digital Services Act: Meta, TikTok, AliExpress, and X, formerly Twitter. But that's just, in a sense, the tip of the iceberg in terms of the work that the Commission is doing. They've also been formally requesting information from the designated companies, and they've been creating secondary elements to the regulation as well. So there's a lot else that's happening, but definitely I think a lot of this is just getting started. And interestingly, CrowdTangle and the DSA have a long history, which I can definitely talk about here.
Justin Hendrix:
I think one of the things that I'm wondering about too is the extent to which this investigation and the Commission's statements about CrowdTangle fit into the broader conversation about how the DSA will be implemented, in particular given Article 40 and its provision requiring platforms to provide researchers with access to platform data. What do we know about that process? How are we coming along on getting to a point where there are set rules?
Brandi Geurkink:
I think that we are still in the early stages of understanding how platforms are complying with their obligations under Article 40.12. So one of the things that we are doing at the Coalition for Independent Technology Research is a DSA data access audit. I think it's important that we acknowledge that it's not only up to the Commission, through their formal proceedings and investigations, to understand and to let the public know how companies are complying with their obligations under the DSA. There's also a lot that we can be doing as the research community itself: really organizing and sharing information with one another about the state of our access to information, especially in Europe, where that access is now codified in law through the Article 40.12 provisions. And so that's one of the things that we're aiming to do: collect information from researchers about the state of their access to data so that we can let companies, policymakers, and ultimately the public know what that state is.
And at the moment, it is still early days. There are still a lot of researchers who have submitted applications to these programs and haven't heard back yet about the state of their access. There's an open question about what that says in itself and what some of these timelines should be. But all of these are really new questions that we are grappling with, and so I think that we're still in the early stages of understanding this.
Justin Hendrix:
Despite all the sort of, I don't want to call it enthusiasm, but the hope that we hear about the DSA, and the likelihood that it'll allow independent researchers to really get a solid grasp of what's going on inside many of these platforms, especially the major ones... I'm worried that we're still looking at years of lawyering away in terms of process and protocol, and it's unclear exactly what the balance of power will be between the researchers and the platforms. Even when that process is complete, we could be what, five years away from the first published papers that we can directly point to and say, yeah, the DSA led us to that conclusion?
Claire Pershan:
I would say don't lose hope. Also, don't hold your breath, but don't lose hope. What the DSA sets out in terms of data access is a two-tiered regime, which I personally think is very smart. So there is going to be a tier of data, potentially including personal data, that we'll probably only be seeing papers from years from now. And that kind of research will be incredibly important and impactful when it comes. I don't know exactly when it will appear; this relates to the delegated act and the infrastructure being built up. But the other tier is related to public data. It very much is about transparency, and this is the tier that Brandi was just speaking about, Article 40.12. I think we can be much closer to achieving this kind of transparency than it might seem, even though the new programs that have appeared so far are, as Brandi was saying, not all there.
And that's why I think there has been such a strong reaction to Meta's announcement that it would shut down CrowdTangle, because CrowdTangle was the inspiration for these other programs. And so if we're losing CrowdTangle, what does that mean for the trajectory of all of these other programs? Is this now a race to the bottom? But the way that the community has responded to the shuttering of CrowdTangle is what gives me a lot of hope. And because this provision is legally enforced, we may get further guidance on it. We may also just learn a lot through the process of advocating to protect CrowdTangle. So I'm hopeful that this can actually serve as an example, and actually open a much larger discussion of what it is that we do want and need related to public data, at least.
Justin Hendrix:
Let's talk about the community. You played a key role in kind of drawing them together, as did Brandi. What was this latest letter about? What were your demands?
Claire Pershan:
So a few weeks ago, Mozilla responded to Meta's publication of public dashboards for monitoring the European parliamentary elections, which took place June 6th through 9th. And we reacted because, as much as we were glad to see something like election dashboards, and these are incredibly useful, they are something that is now in European election guidelines, this was not what the CrowdTangle community needed. CrowdTangle is still being shuttered, and so we were very concerned that this was window dressing, or even a public relations move to try to make it seem as though the company was sharing data publicly with the community in a way that it in fact is not. And so our reaction was, I think, fairly strong, because we also feel that CrowdTangle is emblematic of the larger discussion around public data access. And we do not want to settle for something that we believe was fundamentally public relations and not actually moving us forward.
Brandi Geurkink:
What I want to add is that the CrowdTangle community is not necessarily the same community as what is being promised by the Meta Content Library, and not what is obligated under Article 40.12 of the Digital Services Act. I think that's a really important distinction: there's the spirit of the law and then there's the letter of the law. So on one hand we're thinking about the months and years it might take, and all of the lawyers needed, to go through compliance requirements. And then there's also the spirit of what this law is about, which is about providing the public with access to trustworthy information about technology's impacts on society. Some of the most critical voices who are doing that work are independent journalists, data reporters at both nonprofit and for-profit newsrooms, and independent researchers who are not necessarily affiliated with a particular institution. A lot of those people currently have access to CrowdTangle. And some of the most critical investigations, pieces of research, and reporting that have been done were done using CrowdTangle.
And Meta is deciding to take that away. Meta is deciding to replace that with the Meta Content Library. But the fact is that whether journalists are going to be allowed to access the Meta Content Library, and whether researchers who aren't affiliated with an institution that is considered eligible are going to be able to access it, those questions still have not been answered. So it's not that the CrowdTangle community is necessarily being served by this other tool that Meta is making a lot of promises about and is rolling out, because those pieces are still really critical and they're still unanswered.
Justin Hendrix:
What functionally is the difference between CrowdTangle and this content library that Meta is offering in its place?
Claire Pershan:
The fundamental differences between CrowdTangle and Meta's new content library are related to functionality, that is, the ways the data can be probed and used. But even more critically, they relate to the scope and scale of who has access. CrowdTangle at certain times was used by as many as tens of thousands of researchers and groups globally, whereas with the Meta Content Library we don't know exactly how many groups have access, but it would be somewhere in the hundreds, at an optimistic estimate. So the number of eyes that are able to take part in this critical type of monitoring is just so limited that the fundamental purpose of at least things like rapid-response monitoring or election observation is largely lost.
Brandi Geurkink:
I do think that as the public interest research community, we really run a risk of defining what good looks like, with regard to transparency and access to public data, by focusing on software solutions that are offered by specific platforms. I do think specificity is really important. If everybody who currently has access to CrowdTangle were to get access to Meta's Content Library, and the Content Library were improved along the lines of some of the things that Claire has mentioned and what other advocates have been calling for, that would be a good outcome. But I also think that just focusing on CrowdTangle versus the Content Library is missing the bigger picture. We need to be having the conversation about what the public interest research community needs on our own terms, terms that aren't just defined by the parameters set by the specific software solutions that various companies have. And I hope that's where we get to, because if we're not doing that, we're only ever going to be responding; we're not going to be making the proactive case.
Justin Hendrix:
This reminds me of that phrase that Mike Wagner used describing the US 2020 elections research project, "independence by permission." Sounds like what you're saying is you want to get past the permission part of this.
Brandi Geurkink:
Yeah. And I think that there are a lot of different ways that public interest researchers conduct research. CrowdTangle, Meta's Content Library, and the other APIs that have been rolled out by other platforms, largely in response to the DSA and Article 40, are one part of the tooling that researchers rely on. There is a whole host of other ways that researchers do this work. And so it's also important to think about that in the bigger picture, and to think about what kinds of protections we need for researchers who do this work in a more non-permissioned way. I do think that it's important that we make sure that what Meta has invested in and what they have promised is as good as it can be. And frankly, it's irresponsible to shutter CrowdTangle when the Meta Content Library has not lived up to the promises that the company itself has made. So we'll see. Look, it's not August 14 yet. Let's see where we're at at that point.
But if I had to guess, based on how it's looking now, based on the researchers who have not yet received access to the tool, I think that Meta should think really seriously about whether it's responsible at this point in time to shutter that tool when Meta's Content Library is still not up to par.
Justin Hendrix:
Of course, this will have implications across elections. For the US election, for instance, I'm unaware of Meta having any specific project in place to provide access to platform data related to the 2024 election as it did in 2020. So lots of questions will likely go unanswered if in fact we're in this limbo. Let me ask you a question about your observations of other platforms, to the extent that you're paying attention to them. Is there anyone that you would call out as being ahead of the pack, as appearing to maybe internalize some of the types of criticisms that you're levying here and possibly doing a better job? Somebody, I don't want to say using best practices, but a platform you feel is moving in the right direction?
Claire Pershan:
I would love to talk about other platforms, but unfortunately it won't be to name a best example, necessarily.
Brandi Geurkink:
I know. I was going to say, let the audience hear the silence on that question. Honestly, I do think this goes back to what I was saying in my earlier comment: there's real danger in defining what good looks like based on what the pack is doing. That's not where we should be. Where we should be is defining what good looks like independently of the pack. Again, if the Meta Content Library delivers what is promised, then that's great. CrowdTangle was considered the gold standard for transparency. But what we need to be asking ourselves is, do we feel that way based on our imaginations of what good access to public data looks like, what is robust, what is necessary for the kinds of research that we want to be seeing? Or do we want to just define that by the tools that are put in front of us? So I think that's really the question that we need to be asking ourselves here.
Claire Pershan:
I would add that part of the reason CrowdTangle was considered the gold standard was that it actually brought together public data from other platforms as well. There was a time, it now feels very long ago, a simpler time, when CrowdTangle had also integrated data from X's public API and from Reddit's API. That of course is no longer the case. I think it's really important to think about the role that Meta's platforms play in our information space. The DSA designates platforms with over 45 million monthly active users in the EU, but Meta has 250 million monthly active users. So this is information infrastructure, and that's why it was so powerful when it was integrated with other key platforms and researchers could actually see and monitor narratives and trends across all of them. That is not something that we have today, and that's why the fact that we're now moving toward a race to the bottom is so impactful.
Justin Hendrix:
I want to ask you also a little bit about a conversation that I've been observing play out in the press lately, which is around the idea that companies should be required to disclose their own internal research. We're seeing more and more pressure on companies, I suppose, after the Frances Haugen leaks. Of course, we've learned that Meta was conducting a good amount of science, including identifying certain harms that were happening across its platforms. I'm talking to you on the day that the US Surgeon General has an op-ed in the New York Times, where he's saying a bunch of things that we could agree or disagree with regarding whether labels should be put on social media platforms to tell children or teens that they may cause harm.
But he has this one line where he says, "Companies must be required to share all of their data on health effects with independent scientists and the public. Currently, they do not. And allow independent safety audits. While the platforms claim they are making their products safer, Americans need more than words. We need proof." It seems he's almost trying to make a rhetorical shift away from let's define the harms to let's start from a different place, which is: we don't know that there aren't harms. Which is an interesting thing to me, starting the other way around. It's almost the way you'd start with a pharmaceutical or some other type of product, one that you wouldn't release into the wild before you had ruled out that it might cause harm. What do you make of this idea that the companies themselves should have to share the data of their own experiments?
Brandi Geurkink:
Yeah. There's also this new law in Minnesota which will require companies to release the experimental data from the A/B tests that they conduct, I think for tests involving more than 1,000 people. And I think that could be very critical for increasing public understanding. Many advocates in the community have called for the results of A/B tests to become public information. I think it's also important to consider what this shift will mean for the industry and how they do this kind of testing in the first place. We've also seen recent research from colleagues at the Citizens and Technology Lab, this Upworthy Archive project, which actually looked at the quality of validation of tests and found that little attention and emphasis is often placed on validating test results. So I think we're going to have to see how the industry responds to this law, and really understand the role that validation plays in that as well, because this is a very fast-moving space.
And I think that immediately drawing this comparison to the pharmaceutical industry is maybe not going to produce the best outcome for science and for public understanding either, because tech companies aren't used to performing in the way that pharmaceutical companies are; they have been regulated at very different speeds and have had very different amounts of time to adjust. I do think it's the direction we want to be moving in, but we should also take with a grain of salt the test results that would come from the immediate application of that law, and really be thinking carefully about the importance of validating those results as well.
Justin Hendrix:
What have I missed?
Brandi Geurkink:
I think from my perspective, I just want to reinforce the point that independent researchers around the globe, including those who are working on election integrity and public health initiatives and on identifying and mitigating gender-based online violence, are extremely concerned about their ability to continue their work when they lose access to CrowdTangle on the 14th of August. We just need to continue reinforcing that point: the research community is not in a good state right now with regard to access to public data. The promises that have been made, both by companies like Meta about the Meta Content Library and the giant promise that is the DSA, with its ability to enable transparency and accountability and to improve the public's understanding of what is happening on social media platforms and the impact that might have on our lives, on our health, on the structure of our communities: these are promises at this point, and we do not know how they're going to play out.
And so it's really important that as we are waiting to see how some of these promises play out, that we do everything that we can to pressure companies to keep what is there and what is providing access to the community right now like CrowdTangle.
Justin Hendrix:
Claire, you're sitting in Belgium at the moment, and you are closer to the kind of political machinations that go on around the DSA. Do you get the sense that there's a lot of urgency in Meta's ranks to answer the sorts of concerns that you and Brandi have raised here today, or does it seem like business as usual from them?
Claire Pershan:
I would say that it depends on the particular provisions that we're thinking about. Generally we're seeing a lot of very positive signs of companies being concerned and making changes, and we're also seeing a lot of slowness, a gap between what's really needed by researchers, and by consumers actually, and what companies have yet to provide. So both. I would say, though, that some of what I perceive to be the most promising developments have related to the DSA's risk mitigation measures, in the sense that we've seen companies, both TikTok with TikTok Lite and Meta with CrowdTangle, respond to the Commission's inquiries with changes. In the case of TikTok Lite, TikTok actually decided to voluntarily roll back the feature after the Commission questioned whether it posed risks to children, because it was nudging people to use the app.
And in the case of CrowdTangle, we have seen these election displays. And even though the displays are not what we asked for, they were something. So in the space between companies needing to mitigate a risk and the Commission bringing that to their attention, there is something that we can work with. However, we just had elections here in the EU. A new Commission will be selected. There will be new priorities, there will be new budgeting, and it is not a guarantee that the regulator will be able to continue doing this kind of work, which is very intensive. I think we have a very savvy regulator right now, and that is not something that we can guarantee going forward. I hope so.
Justin Hendrix:
Brandi, Claire, perhaps we need to reengage on this conversation in August, when we'll have a little more clarity on what's happened with regard to the demise of CrowdTangle and what perhaps has been offered in its place. And then also perhaps a little more clarity into what the makeup of the Parliament will be and how that will affect the operation of the commission, and we'll see how things go from there. So I appreciate the two of you taking the time to talk me through what's happening at the moment, and for all of the work that you do to try to enable independent research into technology companies.
Brandi Geurkink:
Thank you so much, Justin.
Claire Pershan:
Thanks for having us.