Ofcom’s New Consultation Documents Demonstrate the Limits of Social Media Regulation
Tim Bernard / May 21, 2024

On May 8, Ofcom, the regulator responsible for enforcing the UK’s Online Safety Act (OSA)—and in many ways clarifying what the Act will mean in practice—released its second major tranche of consultation materials, this time on protections for children under the OSA. Weighing in at 1,378 pages (excluding the Welsh translations), the documents are a trove of research, analysis, and serious thought about harms to children via social media and other online services and how they might be mitigated.
One particularly impressive section is Volume 5, which details each of the 47 non-governance measures in the proposed code, including an evaluation of each one for tradeoffs with rights for both adults and children, such as freedom of expression and privacy, as well as associated costs. Another useful chapter proposes risk profiles, laying out specific risk areas associated with different features of online services. Ofcom’s team also “show their work” at every stage, explaining their frameworks, evidence, and reasoning.
Evidence on Harms
Volume 3 of the consultation package is dedicated to describing categories of problematic content, how that content manifests online, and what its impacts may be, including the risk factors that may make the likelihood of harm more acute. With regard to those impacts, it is notoriously difficult to obtain reliable evidence that encounters with online content cause harm, for reasons including the difficulty of obtaining control groups of “offline” children, ethical concerns with creating treatment groups, and the challenge of identifying and controlling for confounding factors for harms that are in reality complex social phenomena.
While Ofcom can be credited for centering the experiences of young people, some of the most regularly cited evidence comes from studies that used surveys or interviews to ask children (as well as parents and practitioners), sometimes in very small numbers, for their subjective perceptions of the harm that they have experienced online. One of the studies cautions that “[t]he findings should be taken as illustrative, rather than generalisable to the wider population of children and young people.” Another notes that “[c]hildren appeared to be unaffected by the content and did not recognise the content as hazardous... . However, parents/carers believed their children may have been harmed by the content due to their belief that it was age-inappropriate.” Much of the other evidence is correlational only. (This is at the heart of much of the controversy surrounding Jonathan Haidt’s recent popular book, The Anxious Generation.)
While most of the content discussed is defined as “harmful” in the OSA itself, either as “primary priority content” or “priority content,” the Act also defines “non-designated content” (NDC), i.e., content “of a kind which presents a material risk of significant harm to an appreciable number of children in the United Kingdom.” Ofcom’s analysis here is revealing. Two potential types of NDC are discussed: “body image content” and “depressive content.” In both cases, the guidance concluded that more evidence was required for the “material risk” requirement to be met (at least for material that did not already qualify for the OSA-designated categories of “eating disorder content” or “suicide and self-harm content”). It would have been beyond Ofcom’s remit to evaluate each of the designated categories of content for material risk, but if they had done so, it is far from clear what they would have concluded.
Reduction in Content Prevalence
Many of the elements of the mitigation code sound helpful and uncontroversial. On the procedural side, though one may quibble with some of the mandated strictures associated with risk assessments, it is hard to argue that those who run online services ought not regularly think about risk to children in a structured way, and Ofcom has prescribed a framework for this. As just one example from the design side, certain services that employ recommendation systems are required to implement a mechanism that allows children “to provide negative feedback on content that is recommended to them” (RS3), which again seems like a very sensible feature.
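To make this concrete, the following is a minimal, purely illustrative sketch of how an RS3-style negative-feedback signal might feed back into ranking. The class, method names, and the penalty value are all invented for this example; nothing here is drawn from Ofcom’s code or any platform’s actual system.

```python
from collections import defaultdict

class NegativeFeedbackStore:
    """Tracks per-user "show me less of this" signals, keyed by topic."""

    def __init__(self):
        self._dislikes = defaultdict(set)  # user_id -> set of disliked topics

    def record(self, user_id: str, topic: str) -> None:
        """Store a negative-feedback event from the user."""
        self._dislikes[user_id].add(topic)

    def demote(self, user_id: str, ranked_items: list[dict], penalty: float = 0.3) -> list[dict]:
        """Re-rank recommended items, downweighting topics the user has flagged.

        `penalty` is an arbitrary illustrative multiplier; a real system would
        need to tune it and then verify that children's feeds actually change.
        """
        disliked = self._dislikes[user_id]
        for item in ranked_items:
            if item["topic"] in disliked:
                item["score"] *= penalty
        return sorted(ranked_items, key=lambda item: item["score"], reverse=True)

# Example: a child flags diet content, and later recommendations demote it.
store = NegativeFeedbackStore()
store.record("child_123", "diet_content")
feed = store.demote("child_123", [
    {"topic": "diet_content", "score": 0.9},
    {"topic": "football", "score": 0.6},
])
```

Even this toy version hints at the questions the code leaves to platforms: how strong the penalty should be, how “similar” content is identified, and how anyone would verify that the feature changes what children actually see.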
Ofcom cites a range of sources, including academic research from inside and outside of platforms, to support the effectiveness of the code’s measures. The code also stipulates that services should internally monitor and report the performance of measures to mitigate harms to children (GA4) and use this data as an input for future risk assessments. However, Ofcom does not appear to require specific metrics corresponding to the code’s measures to be submitted or reported publicly. A useful point of comparison is the Meta Oversight Board. In 2021, it established an implementation committee and associated team that not only evaluates how completely Meta implements each of the Board’s recommendations, but also measures how well each recommendation’s goals are achieved. This is no small task. In a paper outlining their work and suggesting insights for regulators, members of the Oversight Board Data and Implementation team write:
Determining the correct data to gauge the size and impact of a policy or operational change is challenging. The scale at which social media platforms operate, and the overhead involved in tracking and validating data for consumption by regulators, makes it very difficult for an external body to assess the causal impact of new or amended policies, enforcement systems, and transparency initiatives.
The authors’ team deals only with Facebook and Instagram (and, as of very recently, Threads), and the team’s lead previously worked for Meta’s CrowdTangle, helping researchers obtain the Facebook data they needed. For Ofcom to figure out what data they need from all covered services, and then to obtain and analyze it, would be virtually impossible, so it is unclear how we will know which of the code’s measures are having their intended impact on which services.
Procedure and Design
As alluded to earlier, the code contains both procedural and design-based components. Procedural elements can be independently justified: good training for content moderators (CM6) or Terms of Service that are accessible to all users (TS2) may be considered appropriate, regardless of the degree of impact. In fact, we might question whether this latter measure will have significant impact: there are a number of topics that Terms of Service are required to cover, including those that are mandated in TS1 and TS3, and a 13-15-year-old participant in an Ofcom study is quoted in the materials as saying: “The rules they show you are paragraphs long and no one reads all that.” (As a point of comparison, Google’s Terms of Service evidence considerable effort to make their language accessible, yet I suspect that few Tech Policy Press readers have ever examined them closely.)
However, design decisions are more complicated to justify. Ofcom understood their responsibility to consider “proportionality” to mean only prescribing measures (especially for smaller services) when they have “evidence that the measures proposed will make a material difference” (and Ofcom regularly describes itself as “an evidence-based regulator”). This evidence can be hard to come by, which must have greatly constrained Ofcom’s options. Indeed, a key advocate of age-appropriate design legislation, Baroness Beeban Kidron, has already critiqued the proposed code on these grounds (emphasis added):
“The code is weak on design features ... [T]he requirement for measures to have an existing evidence base fails to incentivize new approaches to safety... . How can you provide evidence that something does not work if you don’t try it?”
Perhaps Kidron has, as the saying goes, said the quiet part out loud. It is very difficult to obtain convincing evidence that specific interventions are feasible and effective when implemented on every service within scope without actually trying them in each of these contexts.
Legislating Safety by Design
Kidron is correct that experimentation is critical, especially in order to tackle the broad and variable range of possible online harm. But it is very difficult to require companies to do this. Ofcom has included some design measures that have a good amount of evidence behind them, and are likely to have a positive impact. Other evidence-based lists of good design features can be found in the Prosocial Design Network library—categorized by evidence level—and the Neely Center Design Code for Social Media. The lead author of the latter has analogized these codes to building safety codes. But platforms don’t have doorways with specific widths or materials with predictable tensile strengths.
Online services are not only unique with respect to their affordances and their user bases, but they are also in constant flux, with product tweaks and attention trends changing all the time, not to mention bad actors attempting to evade every new safety feature. What works on one platform may be ineffective, or even counterproductive, on another, and many interventions will require initial calibration and then further adaptation as unintended consequences emerge or conditions change. The following examples illustrate this:
- One measure included in the guidance (RS2) is that certain platforms should “significantly reduce the prominence of content that is likely to be P[riority ]C[ontent]” in the recommended feeds of children. This is clarified to cover content that has not yet been assessed and removed from the platform (or hidden from children in the UK), but that has either been flagged for review under a Priority Content category by classifiers, users, or trusted flaggers, or is otherwise assessed as likely to be Priority Content based on other relevant available information. Due to the nature of different platforms’ recommendation algorithms, no specific metric for downranking can be given, though the detailed description explains that this should be done in “a way that overrides any existing engagement patterns (likes, watch time, reshares etc.)”. Even so, factors like the degree of downranking, platform- or user-specific features of the feed, and the initial prevalence of Priority Content could make all the difference between this measure having a great deal of impact and virtually none (see the sketch after this list).
If Ofcom were to require detailed reporting to establish whether or not such a measure was being implemented effectively, or if this data were revealed as part of risk assessments, they would still be hampered by the lack of standardized, industry-wide metrics to judge the platform against. There are metrics that can reasonably be applied across platforms, such as the Integrity Institute’s Misinformation Amplification Factor, though, due to platform differences, even this is better understood as a heuristic to suggest where platforms may be falling short, rather than a hard indicator that could justify penalties for non-compliance. (Under the DSA, the revelation of red flags like this is a key part of the regime—Mathias Vermeulen has called it “a data-gathering machine”—and the data it obtains is already being used to prompt investigations, though none has yet concluded, so it remains to be seen whether they will result in enforcement action.)
- To return to the Terms of Service case: while the procedural measure, as worthy as it is, may not have much impact, a design-based solution could be more effective. Services may, for example, be able to integrate bite-sized educational touchpoints into the regular user experience to gradually explain the platform’s rules and procedures, or gamify the process of learning about the Terms of Service. But this would take judgment, continuous experimentation, and data gathering, and would presumably look wildly different on different platforms, or even for different user populations within one service. This would be very difficult to lay out in an enforceable code.
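Returning to the RS2 downranking example above, here is a minimal, hypothetical sketch of such a demotion applied on top of an engagement-based score, together with a crude prevalence check of the kind a platform or regulator might use to gauge whether it is working. The names and the demotion factor are invented for illustration and are not taken from Ofcom’s code.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    engagement_score: float        # likes, watch time, reshares, etc.
    likely_priority_content: bool  # flagged by classifiers, users, or trusted flaggers

def rank_for_child_feed(candidates: list[Candidate], demotion_factor: float = 0.05) -> list[Candidate]:
    """Demote likely Priority Content, overriding the engagement-based ordering.

    `demotion_factor` is illustrative: near 1.0 it barely changes the feed,
    near 0.0 it effectively suppresses flagged items.
    """
    def adjusted(c: Candidate) -> float:
        return c.engagement_score * (demotion_factor if c.likely_priority_content else 1.0)

    return sorted(candidates, key=adjusted, reverse=True)

def flagged_prevalence(feed: list[Candidate], top_k: int = 20) -> float:
    """Share of the top-k feed slots occupied by likely Priority Content,
    one crude way to check whether the demotion is actually biting."""
    top = feed[:top_k]
    return sum(c.likely_priority_content for c in top) / len(top) if top else 0.0
```

Even in this toy version, whether prominence is “significantly” reduced depends entirely on the chosen demotion factor, the distribution of engagement scores, and how much flagged content is in the candidate pool to begin with, which is precisely the calibration problem that a one-size-fits-all code cannot settle.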
Ultimately, Safety by Design (SbD) is rooted in a set of design principles, considerations that should be prioritized by product managers, designers, and others involved in the operation of online services, and it cannot be readily reduced to a code that lays out specific design features and procedures. Internet design codes, including this one, can make services appear to practice SbD, but no more than that. Risk assessments and other procedural measures, such as metric tracking, a code of conduct, and training (GA4-7), can be check-box compliance exercises. Mandated features can be implemented without the application of judgment, experimentation, and the detailed data work required to confirm their continued efficacy in any single context.
As specified in the OSA and the consultation materials, if a service follows the relevant provisions of the code, it will be considered to be complying with its duties under the OSA. This is only to be expected of a fair law. But will this actually lead to a significant reduction of harm? Several causes for doubt have been described:
- We have limited evidence that online content is a significant cause of many specific harms.
- Even evidence-based measures may be ineffective in reducing the prevalence of “harmful content” when blindly applied in new contexts.
- The regulator does not have a clear mechanism for ongoing evaluation of the efficacy of these measures in reducing the prevalence of “harmful content”.
Incentivizing Safety by Design
Regardless of these doubts, genuine SbD could still be of great value. Ironically, the type of interview- and survey-based study that Ofcom relied heavily on matches neatly with techniques often recommended for product research, and in commissioning and analyzing these reports, Ofcom has done a service to those actively engaged in SbD. The OSA, however, may not provide sufficient incentives for experimentation and creativity, and may even disincentivize research and experimentation in the following ways:
- By imposing a detailed code, regulators may shift the focus of product safety teams from creative, principles-based development to generic compliance.
- If (potentially better!) alternatives to the code’s measures are to be considered legal substitutes, the company must demonstrate equivalence—a tall order when there is no baseline efficacy for the code’s measures.
- The expectation that the results of research will be included in risk assessments may discourage companies from commissioning such work, out of fear that the results may invite bad publicity, regulatory scrutiny, or lawsuits.
It is not altogether clear how governments could encourage research and the adoption of SbD. However, here are a few closing suggestions for legal and extralegal incentives that governments could consider:
- Mandate risk assessments that include reporting on the extent of safety measure testing and implementation, without requiring implementation of particular measures or full details of testing results.
- Grant immunity against certain liabilities (cf. some US state legislation) in exchange for allowing independent, vetted researchers access to platforms to conduct live experiments.
- Offer diplomatic support (when consistent with state interests) to companies that demonstrate a dedication to applying the approach and are experiencing tensions with other governments.
- Participate in photo ops and meetings that celebrate successful advances in online safety.
- Provide SbD training for internet service employees.
- Prioritize research into effective safety interventions when awarding grants.
Safety by Design is a worthy approach to online harms from the perspective of platforms. However, the prescription of specific features and procedures that appear to be consistent with SbD is less convincing. The current evidence on the causes of online harms, the variation between online services, and their ever-changing nature do not support this as a core legislative remedy.