What is Media Diversity and Do Recommender Systems Have It?
Priyanjana Bengani, Jonathan Stray, Luke Thorburn / Oct 25, 2023
Priyanjana Bengani is a Senior Research Fellow at the Tow Center for Digital Journalism at Columbia University. Jonathan Stray is a Senior Scientist at The Center for Human-Compatible Artificial Intelligence (CHAI), Berkeley. Luke Thorburn is a doctoral researcher in safe and trusted AI at King’s College London.
One of the classic criticisms of recommender systems arises when they show you a narrow range of content, be it popular items or items similar to whatever you’ve already clicked on. This can be a failing of more elementary approaches to recommendation systems, but production recommender systems typically include diversification algorithms for myriad reasons. We’ve previously discussed the mechanics of filter bubbles, echo chambers, and other types of information-limiting environments. Here we examine the many meanings of “diversity” in media, why society in general and platforms in particular might want it, whether existing platforms have it (and what that even means), and how recommender systems can help achieve it.
There are many different definitions of diversity used by communications scholars, as well as a variety of metrics and diversification algorithms used in recommenders that cover much of the same ground from different perspectives. We might want each user to see items from a variety of sources, across diverse viewpoints, on multiple topics, or including multiple media formats. We could also think of diversity in terms of novelty and serendipity, thereby ensuring users don’t see posts covering the same topics every time they log in, which can lead to boredom and engagement dropping off in the longer term. These conceptions of diversity are all from the perspective of the consumer – that is, which items each user sees. We can also look at diversity from the view of the producer – that is, what users each producer reaches. Measures such as coverage or popularity inequality are used to ensure that the system displays content from more than a few “superstars,” such that it is viable for a wider range of content producers.
Why society wants diversity in media
Media diversity has been a key concept in social science and media policy since at least the mid-20th century, with many competing ideas but no single agreed-upon definition. A recent systematic review notes that “research on this topic has been held back by the lack of conceptual clarity about media diversity and by a slow adoption of methods to measure and analyze it.”
There are myriad reasons to care about media diversity: informed citizens, inclusive public discourse, or preventing large companies from dominating the media landscape. Different conceptions of diversity have focused on item content (e.g. what each article contains), the structure of the media ecosystem (e.g. the number of different publishers and who owns them or the demographics of journalists), or individual exposure (what each person sees, as determined both by self-selection and recommender systems). Recent works give several different conceptions and reasons for diversity in media, including:
- “[the] heterogeneity of media content in terms of one or more specified characteristics.” (Bernstein et al, 2019)
- “Diversity refers to the idea that in a democratic society informed citizens collect information about the world from a diverse mix of sources with different viewpoints so that they can make balanced and well-considered decisions.” (Helberger et al, 2016)
- “support and seek to give voice to a plurality of competing views – from those with different backgrounds, histories and stories. Help build a more inclusive, less fragmented society” (European Broadcasting Union)
- “news consumers emphasize that they find it important to be able to hear and read about topics they haven't thought of, viewpoints they don't quite understand, and perspectives that are unknown.” (Harambam, 2018)
Ultimately, from a societal standpoint, the type of media diversity you care about often depends on how you think democracy should work.
Why platforms want media diversity
Platforms, on the other hand, have other motivations for incorporating media diversity. When users consume a wider array of items, it can improve long-term metrics including user conversion and retention. Platforms also need to ensure that the system doesn’t simply recommend the most popular items or it won’t be attractive to the smaller or newer content creators. TikTok’s algorithm, for example, prioritizes content over the number of followers an account has, thereby allowing even small creators to gain an audience.
Most real-world recommenders run diversification algorithms prior to presenting users with a list of items. The rationale for diversification given by the different platform operators differs based on priorities and values. Examples from large platforms include:
- avoiding monotony — “don’t see multiple posts from the same person or the same seed account” (Instagram)
- encouraging novelty — “help users discover new content or inculcate new tastes” (Spotify)
- preventing rabbit holes — “avoid recommending a series of similar content – such as around extreme dieting or fitness, sadness, or breakups – to protect against viewing too much of a content category that may be fine as a single video but problematic if viewed in clusters” (TikTok)
- accounting for evolving or unexpressed preferences — “intersperse recommendations that might fall outside people's expressed preferences, offering an opportunity to discover new categories of content” (TikTok)
- minimizing popularity bias — “spread consumption across artists and facilitate consumption of less popular content” (Spotify)
These different goals mean that there is no one ideal way to incorporate diversity in recommender systems. Instead, approaches may differ widely depending on a platform’s priorities and use cases.
Are platforms diverse today?
For those outside of a platform, measuring media diversity can be challenging because good data on who sees what is often inaccessible, and the core algorithms and policies are ever-changing. Nonetheless, a variety of audits, surveys, collaborations with platform companies, and corporate research provide a window into what’s actually happening, and can help confirm or dispel widely believed myths based on anecdotal evidence. In this section, we’ll try to summarize what is publicly known.
- Many have voiced concerns about algorithmic systems creating echo chambers and filter bubbles of like-minded content that could potentially result in more polarization across the board. Yet survey research shows that there is more incidental exposure to different sources for those who consume news online, while those who only read news offline tend to engage in a more self-selected media diet associated with their own partisan leanings (selective exposure).
- Two different studies into Google News (Germany) and Google Search (US) show the concentration of news sources. In the case of Google News, over a six-day period, researchers found that 86% of all articles were from just eleven outlets, with Focus Online and Die Welt making up 37% of these recommendations. These eleven outlets weren’t necessarily the ones that had the most reach according to data compiled by the German Audit Bureau of Circulation (IVW). Meanwhile, on Google Search, an audit of location-specific news results in 3,000 US counties found three outlets making up just over 15% of all results, with the platform favoring national outlets over local news outlets, indicating the presence of superstar economics.
- Similarly, an audit of Apple News over a two-month period found that over 45% of stories shown in the platform’s algorithmically driven “trending stories” section came from just three sources (CNN, Fox News, People). By comparison, 24% of the human-driven “top stories” section came from three sources (Washington Post, CNN, NBC News).
- For Facebook, one axis of diversity is cross-cutting content, i.e. ensuring that users are exposed to content from across the partisan aisle. In 2015, the company conducted a study on exposure to ideologically diverse news and opinion with 10.1 million US users and 7 million links over a six-month period and found “~45% of the hard content that liberals would be exposed to would be cross-cutting, compared with 40% for conservatives.” More recently, a collaborative study conducted by academics with Facebook, focusing on the 2020 US election, found “[conservative] audiences are more homogeneously conservative and, therefore, more isolated.” This is a result of both individual preferences and platform decisions.
- Spotify has run multiple experiments trying to ascertain the impact of recommendations on consumption diversity for podcasts as well as music. In the case of music, the company found that personalized recommendations (i.e. recommendations based on past listening behavior) led to 25% more streams and 10% fewer skips for users who listened to less diverse music (“specialists”). Song streams even went up by 10% for users with more diverse tastes. In the case of podcasts, Spotify found that personalization led to users consuming more homogenous podcasts (decreasing the individual-level diversity of podcast streams by 11.5%) even though the aggregate diversity of podcasts streamed increased by almost 6%.
The different conceptualizations of diversity mean that there isn’t a single optimal measure or a consensus as to what the baseline should be. If you are unhappy with some of the numbers reported above, what do you think the numbers should be? As Nechushtai & Lewis put it in their study of news recommender diversity, “in most cases, public-facing algorithms … do not function catastrophically or perfectly, but somewhere in between. Precisely where they fall on such a spectrum remains open for debate.”
Algorithmic approaches to media diversity
Most recommender systems today include some mechanism for diversifying the content shown to users as well as the users or creators whose content is shown. These systems account for multiple conflicting objectives (including trade-offs between diversity and engagement, fairness, accuracy, and other objectives), of which diversity is simply one goal — and probably not the primary one.
Different recommender systems incorporate diversification algorithms at different stages of the pipeline, typically in a re-ranking step that comes after initial item-ranking. This re-ranking approach tries to re-order the list of selected items to prevent homogeneity, such as having multiple posts in a row from the same person or on the same topics. In this design, if the candidate generation step did not produce a sufficiently diverse set, the subsequent diversification step won’t help.
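To make the re-ranking idea concrete, here is a minimal sketch of one very simple rule: avoid showing two posts in a row from the same author when an alternative exists. It is an illustration only, not how any particular platform does it. The `Item` type, its fields, and the function name are hypothetical, and the input is assumed to be already sorted by an upstream ranker.

```python
from typing import NamedTuple


class Item(NamedTuple):
    item_id: str
    author: str
    score: float  # relevance score from an upstream ranker (hypothetical field)


def rerank_no_adjacent_author(ranked: list[Item]) -> list[Item]:
    """Greedily reorder so consecutive items have different authors when possible."""
    remaining = list(ranked)  # assumed to be sorted by descending score
    result: list[Item] = []
    while remaining:
        last_author = result[-1].author if result else None
        # Take the best-ranked remaining item by a different author;
        # fall back to the best-ranked item if every candidate shares the author.
        pick = next((it for it in remaining if it.author != last_author), remaining[0])
        remaining.remove(pick)
        result.append(pick)
    return result


candidates = [
    Item("a1", "alice", 0.9),
    Item("a2", "alice", 0.8),
    Item("b1", "bob", 0.7),
    Item("a3", "alice", 0.6),
]
reranked = rerank_no_adjacent_author(candidates)
print([it.item_id for it in reranked])  # ['a1', 'b1', 'a2', 'a3']
```

The last two items still come from the same author because the candidate set contains only one alternative. This is the limitation noted above: a re-ranker can only work with the diversity that candidate generation gives it.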
Diversity for consumers
Almost all of the many available diversification algorithms ultimately depend on the existence of “a consistent notion of similarity that can compare any pair of items.” This can also be referred to as “similarity metric,” “dissimilarity metric,” or “distance metric.” There are many different item attributes that such a metric might consider, as in the table below.
[Table: item attributes that a similarity metric might consider]
While a similarity function compares only two items, the diversity of an entire list of items is called intra-list diversity. Intra-list diversity is sometimes defined as the average pairwise dissimilarity between the items recommended to a user on a single “slate,” the set of items displayed to a user in a single session, though there are many other definitions.
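As a rough illustration of the average-pairwise-dissimilarity definition, the sketch below represents each item as a set of topic tags and uses Jaccard distance as the dissimilarity. Both choices are assumptions made for the example; a real system might compare sources, viewpoints, or embeddings instead.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |a & b| / |a | b|: 0.0 for identical tag sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(slate: list[set[str]]) -> float:
    """Average pairwise dissimilarity over all unordered pairs of items in the slate."""
    pairs = list(combinations(slate, 2))
    if not pairs:
        return 0.0  # fewer than two items: there are no pairs to compare
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical slate: two overlapping politics stories and one sports story.
slate = [{"politics", "us"}, {"politics", "eu"}, {"sports"}]
print(round(intra_list_diversity(slate), 3))  # 0.889
```

A slate of identical items scores 0.0 and a slate of items with no shared tags scores 1.0, so higher values mean a more varied slate under this particular dissimilarity.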
For users consuming content, diversity is important both within a single slate (as measured by intra-list diversity), and across multiple slates.
- Per slate: Diversity in a single session, to maintain short-term engagement, prevent boredom, and avoid redundancy. A user is served a diverse slate of recommendations in a single go that takes into account what they are likely to engage with as well as what they might find novel or serendipitous. (Examples: ClusterDiv, Topic Diversification, Determinantal Point Process)
- Across slates (longer term): Diversity across sessions, to maintain longer-term retention and engagement. Because a user's interests evolve, their recommendations must be diverse over time; this can also help prevent the user from going down rabbit holes. (Examples: Temporal Switching or Temporal User-based Switching, SlateQ)
Homogenous recommendations in a single session can cause boredom, while recommendations over multiple sessions that don’t account for changes in user interests or preferences can reduce longer-term engagement. Repetitive recommendations can lead to rabbit holes. Moreover, monotonous recommendations are not necessarily useful: “real value is found in the ability to suggest objects users would not readily discover for themselves.” Diversity generally minimizes overfitting to a user’s preferences and habits, which can be a problem when systems optimize only for accuracy.
Additionally, different users may want different amounts of diversity. For example, some users may be satisfied listening to a small set of popular artists, while others may value musical exploration. To estimate the amount of diversity that a user finds most satisfying, recommenders can use measures like Shannon entropy or root mean square diversity error to quantify the difference between a user’s preferred level of diversity and the diversity of the recommendations. The intuition is that if a user has explored items with different characteristics in the past, they are more likely to be amenable to diverse recommendations. On the other hand, if a user has stayed within the confines of specific interests, they’ll be less satisfied with diverse recommendations.
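One way to picture the entropy idea is to compute Shannon entropy over the categories in a user's history: a user who sticks to one genre scores zero, while a user who spreads listening evenly across genres scores higher. The sketch below covers only the entropy part, and the genre labels and histories are invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of the category distribution in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1 / p) over every category that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical listening histories, one genre label per stream.
specialist = ["rock"] * 8
explorer = ["rock", "jazz", "pop", "folk"] * 2

print(shannon_entropy(specialist))  # 0.0
print(shannon_entropy(explorer))    # 2.0
```

A recommender could compare the entropy of a candidate slate against the entropy of the user's history and treat a large gap in either direction as a sign that the slate is more or less diverse than the user is used to.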
Diversity for producers
Diversity can also be evaluated in terms of the audience for each item, rather than the items for each user. This is known as producer diversity, and it can have a large effect on the economics of content production. Multiple studies have shown that some recommender systems — especially those using collaborative filtering — can expose individuals to a wide range of items (high consumer diversity), while selecting those items only from a small set of sources (low producer diversity). This results in the system being profitable for only a small subset of providers, also known as the “superstar economics” effect. In such an environment, new entrants fail to find an audience and smaller players fail to grow their consumer base beyond their typical traditional audience.
- Single item: Ensure that the same item is shown to a diverse set of users irrespective of their demographics, interests, and ideological beliefs.
- Item-group: Ensure that all items within a group (e.g. movies) get shown to at least some users. Ensure that recommendations are dissimilar even within an item group; for example, show different genres of movies if the system selects movies to serve a user.
Single-item diversity ensures that diverse user segments — be it based on demographics, interests, ideological beliefs, or some other metric — see each item regardless of popularity. Item-group diversity, on the other hand, refers to the idea that a diversity of items within a group are recommended. For example, on a music streaming platform, users don’t just see different genres of music, but also see a diverse set of artists within each genre. This ensures that small or new producers in each genre have a chance at success.
Producer diversity can be measured in a variety of ways. Coverage is the percentage of items in the catalog recommended at least once. High coverage ensures that recommendations are extracted from the long tail as opposed to favoring popular items. Alternatively, the Gini coefficient, widely used in economics to measure the inequality of income distributions, can be used to measure inequality of item popularity instead. Spotify has used the Gini coefficient to determine the impact of personalization on podcast consumption, finding that it reduced market concentration and distributed a larger fraction of streams to less popular podcasts.
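Both measures are straightforward to compute from a log of what was recommended. The sketch below is a minimal version: coverage is returned as a fraction rather than a percentage, the Gini coefficient uses the standard formula over sorted values, and the catalog and log are made up for the example.

```python
from collections import Counter


def catalog_coverage(recommended: list[str], catalog: set[str]) -> float:
    """Fraction of catalog items that were recommended at least once."""
    if not catalog:
        return 0.0
    return len(set(recommended) & catalog) / len(catalog)


def gini(values: list[float]) -> float:
    """Gini coefficient: 0.0 when all values are equal, approaching 1.0 when one dominates."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    xs = sorted(values)
    # For ascending-sorted x_1..x_n: G = sum_i (2i - n - 1) * x_i / (n * sum(x))
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


# Hypothetical four-item catalog and a log of six recommendations.
catalog = {"a", "b", "c", "d"}
log = ["a", "a", "a", "b", "a", "a"]

counts = Counter(log)
# Include zero counts so never-recommended items contribute to the inequality.
exposures = [counts[item] for item in sorted(catalog)]

print(catalog_coverage(log, catalog))  # 0.5
print(round(gini(exposures), 3))       # 0.667
```

Here half the catalog is never shown and one item takes most of the exposure, so coverage is low and the Gini coefficient is high. Leaving the zero-exposure items out of `exposures` would understate the inequality.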
Where next
There are times when the different interpretations and implementations of diversity can be at odds with one another, and there are times they might complement one another. Because the term is so loaded, it’s important to home in on specifics when discussing how to incorporate and evaluate diversity in recommenders.
Societal objectives might be at odds with corporate objectives. Even within the same platform, diversity might be treated differently for different contexts — be it the main content feed, suggested friends, or recommended groups — with no agreement as to what is the optimal baseline, how much personalization is too much personalization, and the extent to which these things should stay constant or change over time. Designers of recommender systems must determine what types of diversity the relevant stakeholders care about, and measure it over time.
Stakeholder desires and chosen metrics can act as guiding principles, but key questions around what optimal diversity looks like in any system remain subject to discussion. As Nechushtai & Lewis ask, “What standards should be used to assess [recommender] performance? Which visions of the public and of public life should such algorithms embody and encourage?”