Social media data and metadata is best managed through an architecture that is highly protective of contextual privacy, says Richard Reisman.
As the case for social media “middleware” continues to gain support among scholars, policymakers, and developers – as a way to increase “user control over what information is received,” and thus to improve the quality of online discourse while protecting freedom of expression – our understanding of the related concerns, and of how to overcome them, has also advanced. A recent article in Tech Policy Press by the Initiative for Digital Public Infrastructure’s Chand Rajendra-Nicolucci and Ethan Zuckerman builds on Cornell scholar Helen Nissenbaum’s argument that privacy is not the simple binary of personal ownership that many presume, but depends on context. They use that insight to help cut through the concern, raised by Stanford Cyber Policy Center’s Daphne Keller in 2021, that middleware agents could sacrifice privacy. “Contextual integrity” is the idea that privacy is a nuanced matter of social norms, in specific contexts, governing just what should be shared with whom, for what uses.
To complement and expand on Rajendra-Nicolucci and Zuckerman’s article, I draw attention to further insights from Keller later that year, and to a solution architecture that I proposed in response. Those further comments and suggestions add a layer of architectural structure to managing those privacy issues in context. The core idea is that social media data and metadata are best managed through an architecture that is highly protective of contextual privacy, achieved by breaking the problem down into multiple levels.
Most discussion of middleware considers only a single level of it. An open market for “attention agent” middleware services must offer wide diversity, and so must be open to lightweight service providers. That can make privacy harder to protect. But adding a second, more tightly controlled data intermediary layer between the platform service and the attention agent service can ensure tighter control of privacy. A body of work on data intermediaries, cooperatives, and fiduciaries supports such a strategy. Tightly controlled data intermediaries can support lightweight and less tightly controlled attention intermediaries by limiting how data is released, or by requiring that the algorithms be sent “to” the data, rather than the other way around. Such data intermediaries can also potentially help limit abuses of personal data by the platforms.
Here is some further background, plus more on how and why two layers of middleware intermediaries may help better address privacy, and potentially help bridge divides between polarized social media users.
Middleware attention agents and contextual privacy
The idea of user-controlled attention agents is not new. A Stanford group led by political scientist Francis Fukuyama brought it to the attention of the tech policy community as “middleware” and explained why it was urgently needed to protect democracy from corporate, and potentially authoritarian, control of social media feeds and recommendations. A 2021 Fukuyama article in Democracy was followed by a set of articles critiquing various issues with the middleware proposal (which I summarized in Tech Policy Press), including the one by Keller later addressed in Rajendra-Nicolucci and Zuckerman’s recent article.
Soon after that, I moderated a session with Fukuyama and Keller at a Tech Policy Press mini-symposium that expanded the debate. I asked Keller a question in which I indicated that for attention agent services to be effective, it’s important to consider not only personal content data but also metadata about content and reaction flows. Keller elaborated on why she thought that might “make things more complicated and harder.” She agreed that “a lot of content moderation does depend on metadata,” such as for “spam detection and demotion” and based on “patterns of connections between accounts,” referring to the Actors-Behaviors-Content (ABC) framework of Camille Francois. The concern is that such metadata “is often personally identifiable data about users, including users who haven’t signed up for this new middleware provider” and is “a different kind of personally identifiable data…that adds another layer of questions I’m not sure how to solve.”
I was not yet aware of Nissenbaum’s contextual integrity formulation, but wondered aloud whether such data should be considered private. “Because in regular culture, people get reputations for how they behave and we decide who to listen to based on their reputation, not just what they say,” I said at the time. “I think there’s a counterargument that that’s fundamental to how society figures out what’s meaningful and what isn’t.” How to do this effectively in practice, however, is another question.
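To make Keller’s point concrete, here is a minimal sketch of the kind of “patterns of connections between accounts” signal she described. It assumes a hypothetical share log of (account, item, timestamp) tuples and flags pairs of accounts that repeatedly share the same items within a short window. This is a crude coordination heuristic chosen for illustration, not any platform’s actual method, and the function name and thresholds are assumptions. Note that the flagged accounts need not have signed up with any middleware provider, which is exactly the privacy wrinkle Keller raised.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Set, Tuple

# Hypothetical share record: (account_id, item_id, unix_timestamp_seconds).
Share = Tuple[str, str, float]

def coordinated_pairs(shares: Iterable[Share], window: float = 60.0,
                      min_shared: int = 3) -> Set[Tuple[str, str]]:
    """Flag account pairs that shared at least `min_shared` of the same items,
    each time within `window` seconds of one another (illustrative thresholds)."""
    by_item = defaultdict(list)
    for account, item, ts in shares:
        by_item[item].append((account, ts))

    pair_counts = defaultdict(int)
    for posts in by_item.values():
        close_pairs = set()  # count each pair at most once per item
        for (a, ta), (b, tb) in combinations(posts, 2):
            if a != b and abs(ta - tb) <= window:
                close_pairs.add(tuple(sorted((a, b))))
        for pair in close_pairs:
            pair_counts[pair] += 1

    return {pair for pair, n in pair_counts.items() if n >= min_shared}


shares = [
    ("bot1", "i1", 0), ("bot2", "i1", 5),
    ("bot1", "i2", 100), ("bot2", "i2", 110),
    ("bot1", "i3", 200), ("bot2", "i3", 230),
    ("alice", "i1", 5000),  # shared the same item, but much later
]
print(coordinated_pairs(shares))  # {('bot1', 'bot2')}
```

The signal comes entirely from metadata about timing and co-sharing, not from the content of any post, which is why it cannot be computed from one consenting user’s data alone.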
Two levels of intermediaries
That conversation with Keller spurred me to refocus on ideas for both data and attention intermediaries (“infomediaries”) that I first encountered in work by John Hagel and his co-authors in 1997 and 1999, and to consider more recent work on the related idea of “data cooperatives.” That led to the idea that there should be two levels of intermediaries. My blog post a month later, “Resolving Speech, Biz Model, and Privacy Issues – An Infomediary Infrastructure for Social Media?”, proposed a more sophisticated distribution of functions.
The idea is to separate out two interacting levels of “middleware” services that intermediate as user agents between the platforms and the users:
- Data infomediaries – a few highly privileged agents, tightly secured and accredited, for managing limited sharing of sensitive user data.
- Attention agent infomediaries – many more limited agents that can make controlled use of that data to provide filtering and recommendation services.
This concentrates the elements of the infomediary role that involve sensitive data or have network-wide impact into a small number of reasonably large, competing entities that can apply the resources and controls needed to maintain privacy while still offering some diversity. It enables much larger numbers of filtering services that serve diverse user needs to be lean and unburdened.
- Because the data infomediaries would be accredited custodians of sensitive messaging data, as fiduciaries for the users, they could share that data among themselves, providing a collective resource to safely power the filtering services.
- Support for the filtering services might be provided in two ways. One is to provide time- and purpose-limited access to the data. The other, perhaps simpler and more secure, is for the data infomediaries to serve as secure platforms that enable the attention agent infomediaries to “send the algorithm to the data” and return rankings, without ever divulging the data itself.
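The second option, sending the algorithm to the data, can be sketched as follows. This is a minimal illustration under stated assumptions, not a specification: the `DataInfomediary` class, the `Signal` record, and the scorer interface are hypothetical names introduced here. An attention agent supplies a scoring function; the data infomediary runs it against the signals it holds and returns only an ordering of item IDs, never the underlying user data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical engagement record, held only by the data infomediary.
@dataclass(frozen=True)
class Signal:
    item_id: str
    user_id: str
    reaction: str  # e.g. "like", "reshare", "report"

# A ranking algorithm supplied by an attention agent: it maps one item's
# signals to a score, but it only ever runs inside the infomediary.
Scorer = Callable[[List[Signal]], float]

@dataclass
class DataInfomediary:
    signals: List[Signal] = field(default_factory=list)

    def rank(self, candidate_items: List[str], scorer: Scorer) -> List[str]:
        """Run the agent's scorer next to the data; return only an ordering of item IDs."""
        by_item: Dict[str, List[Signal]] = {item: [] for item in candidate_items}
        for s in self.signals:
            if s.item_id in by_item:
                by_item[s.item_id].append(s)
        scores = {item: scorer(sigs) for item, sigs in by_item.items()}
        # Only item IDs leave the infomediary; no user IDs or reactions are returned.
        return sorted(candidate_items, key=lambda item: scores[item], reverse=True)


# Example scorer an attention agent might submit (illustrative weights).
def reshares_minus_reports(sigs: List[Signal]) -> float:
    reshares = sum(1 for s in sigs if s.reaction == "reshare")
    reports = sum(1 for s in sigs if s.reaction == "report")
    return reshares - 2 * reports

infomediary = DataInfomediary([
    Signal("a", "u1", "reshare"), Signal("a", "u2", "reshare"),
    Signal("b", "u1", "report"), Signal("c", "u3", "like"),
])
print(infomediary.rank(["a", "b", "c"], reshares_minus_reports))  # ['a', 'c', 'b']
```

A real deployment would also need to sandbox the submitted code, since nothing in this sketch stops a scorer from leaking data through side channels; that enforcement is the hard part such an infomediary would have to supply.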
This should enable powerful filtering and ranking based on rich data and metadata within and across platforms and user populations. The platforms would no longer control or be gatekeepers for user attention or data. The interface boundaries between platforms, data intermediaries, and attention intermediaries can be well defined. Implementation will not be trivial, but the challenge is not unlike that of many complex systems already working in other domains, such as finance.
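The first option in the list above, time- and purpose-limited access, implies a well-defined check at the boundary between data and attention intermediaries. A minimal sketch of such a check, assuming a hypothetical `AccessGrant` record and purpose labels invented for illustration, might look like this:

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical grant issued by a data infomediary to one attention agent."""
    agent_id: str
    purposes: FrozenSet[str]  # illustrative purpose labels, e.g. {"ranking"}
    expires_at: float         # Unix time after which the grant is void

def authorize(grant: AccessGrant, agent_id: str, purpose: str,
              now: Optional[float] = None) -> bool:
    """Permit a data request only from the named agent, for a granted purpose,
    before the grant expires."""
    now = time.time() if now is None else now
    return (grant.agent_id == agent_id
            and purpose in grant.purposes
            and now < grant.expires_at)


grant = AccessGrant("agent-42", frozenset({"ranking"}), expires_at=1_000.0)
print(authorize(grant, "agent-42", "ranking", now=500.0))       # True
print(authorize(grant, "agent-42", "ad_targeting", now=500.0))  # False: purpose not granted
print(authorize(grant, "agent-42", "ranking", now=1_500.0))     # False: expired
```

Keeping the grant immutable (`frozen=True`) means an attention agent cannot widen its own purposes or extend its expiry; only the data infomediary can issue a new grant.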
Such an approach might evolve into a more general infrastructure that works across multiple social media platforms and user subsets. It can support higher levels of user communities and special interest groups on this same infrastructure, so that the notion of independent platforms can blur into independent groups and communities, each using a full suite of interaction modalities, all on a common backbone network infrastructure – the emerging “fediverse” and “pluriverse.”
My update in Tech Policy Press on related discussions at the Stanford HAI conference soon afterwards reported that in the Middleware session (with Fukuyama and others), Hebrew University computer scientist Katrina Ligett lent support to this idea of using data infomediaries to secure data and metadata for attention middleware:
[Ligett] reinforced the need for filters to be bolder, to consider not only content items, but the social flow graph of how content moves through the network and draws telling reactions from users. Ligett made a connection to the emerging approach of data cooperatives that would intervene between a platform and the user on usage of personal data, again as the user’s agent, as being another kind of middleware. She also emphasized that some aspects of so-called personal data – such as this social flow graph data – are really pooled and collective, with a value greater than the sum of its parts.
The direct focus of filtering against harm is the personalization and tailoring of what Ligett called “the incoming vector” from the platform to us – but what drives those harms is how the platforms learn from the patterns in “the outgoing vector” of content and reactions from us. Unlike putting harmful content on a billboard, the platforms learn how to make it effective by feeding it to those most susceptible, when they are most susceptible. Ligett argued that interventions must benefit from a collective perspective. This is how social mediation can enter the digital realm, providing a digital counterpart to traditional mediation processes.
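Ligett’s point that social flow graph data is pooled and collective can be illustrated with a small sketch. It assumes hypothetical reshare edges of the form (item, from_user, to_user), split across two data infomediaries, and measures how many reshare hops an item travels from its origin. Neither shard alone reveals the full cascade; the pooled graph does, which is one reason the data infomediaries described above might share data among themselves. The function and data are illustrative assumptions, not a proposed standard.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

# Hypothetical reshare edge: (item_id, from_user, to_user), meaning to_user
# reshared item_id after receiving it from from_user.
Edge = Tuple[str, str, str]

def cascade_depth(edges: Iterable[Edge], item: str, origin: str) -> int:
    """Largest number of reshare hops from `origin` to any user reached,
    using breadth-first search (each user counted at their nearest hop)."""
    children: Dict[str, List[str]] = defaultdict(list)
    for item_id, src, dst in edges:
        if item_id == item:
            children[src].append(dst)
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        for nxt in children[user]:
            if nxt not in hops:  # skip users already reached (handles cycles)
                hops[nxt] = hops[user] + 1
                queue.append(nxt)
    return max(hops.values())


# Two hypothetical data infomediaries, each holding part of one item's cascade.
shard_a = [("x", "alice", "bob"), ("x", "bob", "carol")]
shard_b = [("x", "carol", "dan"), ("x", "dan", "erin")]

print(cascade_depth(shard_a, "x", "alice"))            # 2
print(cascade_depth(shard_b, "x", "alice"))            # 0
print(cascade_depth(shard_a + shard_b, "x", "alice"))  # 4
```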
I expanded on the benefits of this two-level intermediary architecture soon after that in a companion blog post. It suggested that the emerging fediverse could support such a dual-layered framework and could potentially offer robust and varied filtering services (whether by unbundling today’s dominant platforms or by evolving into a more distributed ecosystem). This structure could simultaneously maintain a significant, almost centralized, federated support service that would be shielded from both the platforms (large or small) and the filtering services themselves. Infomediaries could also play a crucial role in addressing the business model challenges associated with middleware and the advertising revenue model, aligning with Hagel’s early recommendations.
But what about filter bubbles and bridging?
At that Stanford HAI conference, Kate Starbird noted that “toxicities on social media are not primarily related to individual pieces of content that can be labeled, but rather to the algorithms that amplify and recommend, creating influence across networks.” The privacy issues relating to social media metadata that Keller and I discussed, plus the above comments by Ligett and Starbird, bring us to perhaps the most commonly raised and fundamental concern about attention agent middleware – from the policy community as well as ordinary users: “Won’t user agency over feeds lead to worsening filter bubbles and echo chambers?”
As I’ve noted before, skeptics are right that user-selected filtering services might sometimes foster filter bubbles. But these skeptics should also consider the power that multiple services all filtering for user value might achieve, working together in “coopetition.” A diversity of filtering services might collaborate to mine the wisdom of the crowd. User-selected filtering services may not always lead to better quality information for individual users, but collectively, a powerful vector of emergent consensus can bend toward quality. The genius of democracy is its reliance on free speech to converge on truth – when mediated toward consensus by an open ecosystem of supportive and appropriately adversarial institutions. Well-managed and well-regulated technology can augment that mediation, instead of disrupting it.
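One simple way several independent filtering services could pool their judgments is rank aggregation. The sketch below uses a Borda count, a classic voting rule swapped in here purely for illustration; it is an assumption, not a mechanism proposed in the works cited. Each service contributes its own ranking, and items that rank well across many services rise to the top.

```python
from collections import defaultdict
from typing import Dict, List

def borda_aggregate(rankings: List[List[str]]) -> List[str]:
    """Combine several services' rankings by Borda count: an item at position p
    (0-based) in a list of length n earns n - p points; items a service omits
    earn nothing from that service."""
    points: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            points[item] += n - position
    # Highest total first; ties broken alphabetically for a stable result.
    return sorted(points, key=lambda item: (-points[item], item))


# Three hypothetical filtering services, each ranking the same three items.
print(borda_aggregate([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]))  # ['b', 'a', 'c']
```

Here item b collects 8 points, a collects 6, and c collects 4. No single service dictates the result, which is the sense in which diverse services could “mine the wisdom of the crowd” together.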
That is why access to metadata on user activity patterns is essential – and why a multilevel intermediary architecture seems the only way to make that data available to manage discourse in a reasonably open and robust way. There are a variety of proposals for reducing polarization that would use such metadata, such as the concept of bridging systems. I have long seen such methods as highly desirable components of attention agent services. But that brings us back to the question of who has the legitimacy to decide what counts as a “bridge” between users on polarizing topics, and how to build one.
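A toy version of a bridging signal shows why it depends on activity metadata, and where the legitimacy question bites. The sketch below is a hypothetical min-across-groups rule, not any deployed bridging algorithm: it scores each item by its lowest approval rate across viewpoint clusters, so content approved on only one side scores low. It assumes each rating already carries a cluster label; deciding who assigns those labels, and how, is precisely the question of who decides what counts as a bridge.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rating record: (item_id, viewpoint_cluster, approved).
Rating = Tuple[str, str, bool]

def bridging_scores(ratings: List[Rating], clusters: List[str]) -> Dict[str, float]:
    """Score each item by its LOWEST approval rate across the given (non-empty)
    list of clusters. A cluster with no ratings for an item counts as 0.0,
    a deliberately conservative assumption."""
    approvals: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    items = set()
    for item, cluster, approved in ratings:
        items.add(item)
        totals[(item, cluster)] += 1
        approvals[(item, cluster)] += int(approved)
    scores = {}
    for item in sorted(items):
        rates = [approvals[(item, c)] / totals[(item, c)] if totals[(item, c)] else 0.0
                 for c in clusters]
        scores[item] = min(rates)
    return scores


ratings = [
    ("p", "left", True), ("p", "left", True), ("p", "right", False), ("p", "right", False),
    ("q", "left", True), ("q", "right", True), ("q", "right", False),
]
print(bridging_scores(ratings, ["left", "right"]))  # {'p': 0.0, 'q': 0.5}
```

Item p is loved by one cluster and rejected by the other, so it scores 0.0; item q earns at least partial approval from both, so it scores 0.5. Changing the cluster list, for example adding a cluster that never rated either item, changes every score, which is why the labeling step carries so much weight.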
The future is not yet here, nor is it easily distributed
The core motivation for delegating user agency to attention agent middleware is to optimize for “freedom of impression” – as a complement to freedom of expression – and to incorporate the benefits of social mediation. That is illustrated with this simple diagram:
The common fear that user agency would run amok is countered by the understanding that thought is a social process that has long been tolerably well moderated in the real world by our social mediation ecosystem. The paradox of open societies is that the price of openness is informal reliance on the kind of social glue that can only be legitimized by openness. The path forward is not to retreat from openness, but to integrate robust online support for the bottom-up generation and nurturing of the social glue that mediates it.
Keller summarized the privacy concerns and other issues that have limited enthusiasm for user-agent middleware in her 2021 Democracy article, but she has also explained in depth why it seems to be the only approach to managing online speech, in a way that serves both individuals and society, that can thread the needle of First Amendment freedoms (elsewhere in 2021, in 2022, and again last month). Now, there are promising paths toward resolving those concerns. These ideas are gaining real-world support in Bluesky, in Rajendra-Nicolucci and Zuckerman’s work on Gobo, and elsewhere. They are also gaining some traction with legislators in both the US and EU, perhaps most significantly in pending New York Senate Bill S6686.
This evolutionary path towards a better, more democratic internet will not be quick or easy, and will likely encounter many disruptive twists and turns as we learn more about how we shape technology and how technology shapes us. But that is the nature of the kind of social and political transformation that historic advances in media and communications can cause.
Richard Reisman is a Nonresident Senior Fellow at the Foundation for American Innovation, Contributing Author to the Centre for International Governance Innovation’s Freedom of Thought Project, and frequent contributor to Tech Policy Press. He blogs on human-centered digital services and related tech policy at SmartlyIntertwingled.com, and his work was cited in an FTC Report to Congress on Combating Online Harms. His book, FairPay: Adaptively Win-Win Customer Relationships, and related blog, FairPayZone.com, introduce new customer-value-first revenue strategies for digital services that were described in Harvard Business Review. He has managed and consulted for businesses of all sizes, developed pioneering online services, and holds over 50 media-tech patents licensed by over 200 companies to serve billions of users (now all in the public domain).