Generative AI, Section 230 and Liability: Assessing the Questions

Justin Hendrix / Mar 23, 2023

Audio of this conversation is available via your favorite podcast service.

In this episode of the podcast, we hear three perspectives on generative AI systems and the extent to which their makers may be exposed to potential liability. I spoke to three experts, each with their own views on questions such as whether Section 230 of the Communications Decency Act-- which has provided broad immunity to internet platforms that host third party content-- will apply to systems like ChatGPT. My guests, in order of appearance, are:

  • Jess Miers, Legal Advocacy Counsel at the Chamber of Progress, an industry coalition whose partners include Meta, Apple, Google, Amazon, and others;
  • James Grimmelmann, a law professor at Cornell with appointments at Cornell Tech and Cornell Law School;
  • Hany Farid, a professor at the University of California Berkeley with a joint appointment in the computer and information science departments.

As you listen to each guest grapple with how courts and lawmakers may interpret these emerging technologies, pay attention to the points at which they agree and the points at which they disagree, including how to interpret the degree to which the outputs of large language models are the same as or different from technologies such as search and recommendations, the degree to which it is advantageous for the companies producing these systems to be exposed to potential liability, how to think about the intellectual property questions around training data and the outputs of these systems, and the balance between the potential promise and the potential peril of AI more generally.

Each interview runs for about 25 minutes.

Below is a lightly edited transcript of the discussion.

Jess Miers:

Jess Miers, I am Legal Advocacy Counsel at Chamber of Progress, which is a center-left industry coalition promoting technology's progressive future.

Justin Hendrix:

Can you tell us just a little bit about the Chamber of Progress, how it was constituted, where it came from, how it operates?

Jess Miers:

Yeah, absolutely. Chamber of Progress started, I believe, in 2020, so it's a relatively new tech trade association. We work to ensure that everyone benefits from technological leaps and that the tech industry is operating responsibly and fairly. We're incredibly principled in our approaches to that mission as well; for example, we're First Amendment absolutists and Section 230 absolutists because we believe that laws like the First Amendment and Section 230 protect consumers online. We have several tech company partners. We don't represent them, and they do not have a vote on our positions. We're very, like I said, mission- and principle-driven.

Justin Hendrix:

What's your background? Where were you at prior to Chamber of Progress?

Jess Miers:

Yeah, so I was formerly with Google while I was still in law school. I was on their trust and safety team, working on product policy related work and legal removals work. And then I transferred over to Google's government affairs and public policy team, where I was working primarily on US intermediary liability law. So again, Section 230 and content regulation topics at the state and federal level. And I'm also an attorney.

Justin Hendrix:

I must point out for my listeners that you are wearing a Section 230 necklace while we're conducting this interview, which is perhaps the first Section 230 bling that I've seen.

Jess Miers:

That's awesome. I mean, you do know I have a Section 230 tattoo too, right?

Justin Hendrix:

Wow.

Jess Miers:

I can show it. It's on the left wrist, and it's how I got started.

Justin Hendrix:

I will confirm for the listener that there is in fact a Section 230 tattoo. We'll get into the conversation a bit. I was prompted to get in touch with you after seeing some comments included in a Washington Post piece by Will Oremus and Cristiano Lima, "AI Chatbots May Have a Liability Problem." This article set out some questions that have been raised by experts, questions that also intersect with the Supreme Court oral argument in Gonzalez v. Google, which could potentially have bearing on the degree to which AI systems and, of course, large language models could be covered by, excluded from, or otherwise handled with regard to Section 230. So I want to talk a little bit about that, and about some of the cases in which you think the outputs of large language models would receive Section 230 protection and perhaps some of the cases where they might not. I'll put that question to you just to start.

Jess Miers:

Just to kick the conversation off, I have to say, one of my driving principles as an internet lawyer is that just because the technology is new, and it always will be new, always evolving and innovating, doesn't mean that the established legal principles underpinning the modern web necessarily need to change. I say that because in several of these conversations I've had, I feel like sometimes we rush into the discussion of, "Well, how are we going to regulate? And how do our laws change?" before we take a step back and actually think about, "Do we need to change? Do we need to take an exceptional view of this technology?" As it stands right now, there's a lot of discussion about whether Section 230 applies to, let's start with, generative AI products like ChatGPT, for example. That's mostly what I've been discussing with technologists and other lawyers as well.

Starting with Section 230, the opposition, the folks who say that Section 230 doesn't protect generative AI products, argue, for starters, that Section 230 was developed or enacted in the pre-algorithm era. Congress never knew about AI technology like this or intended Section 230 to protect it; it didn't exist at the time, so Section 230 isn't written to apply to these products. I push back on that by saying that Section 230 was initially established with the recognition that the online world would undergo frequent advancements and that the law would have to accommodate these changes to promote a thriving digital ecosystem.

My argument is, actually, this is exactly what Congress intended. It anticipated that the internet was going to keep evolving, and now here we are, with generative AI presenting an interesting question. You're asking what are the cases in which Section 230 would and would not protect, and I think it's actually a really complicated question, because there's the legal side, we can take a textualist legal analysis approach to these products, but we can't do that without thinking about how the courts right now are undermining current Section 230 case law, how politics, the current political wars against online speech and Section 230, are positioned to undermine 230 as well, and also what Congress and the states are doing too.

I think starting with generative AI like ChatGPT, I know the technologists hate this argument, and I am a technologist myself, I understand that there are complexities to these programs, but for legal purposes, it is my opinion that ChatGPT and other generative AI products right now operate a lot like Google search. A user provides an input or a query asking it to give an output, in the same way that a user presents a query to Google search and Google search presents a list of results. Now, in those results for Google search, you're going to see something we call snippets. You'll see the URL that you can actually click on to go to the site, and right below it you're going to see a snippet. Those snippets are generated by Google, and they summarize what is going to be at that page.

We have seen early lawsuits brought against Google over the actual snippets, stating that Google developed the snippet in part, for example; the claim is that because Google developed the snippet in part, Google doesn't get to use Section 230. This is typically the argument used for why 230 shouldn't protect ChatGPT. Those cases have consistently come out saying that Google does get Section 230 because Google is, one, responding to third party inputs when you actually make the search. But the second important part is that Google is summarizing third party content. Google is still not the creator, in whole or in part, of the actual content that it's summarizing, the actual content at issue.

And so, analogizing that to ChatGPT, for example, the same legal principles should apply here. The only way you can get ChatGPT to respond with an output is if you, a third party user, give it an input. And taking that a step further, if we want to argue about the content that's actually output from ChatGPT, you could make the argument that ChatGPT is actually just curating or summarizing existing public information or existing third party content. I think legally speaking, if we're just looking at the text of the statute and at prior case law as it has been applied to analogous products, Section 230 should apply to generative AI products in their current technical iterations.

I think getting into where Section 230 wouldn't apply, it's going to depend on a few things. Let's say the technology advances so much that the AI is making its own decisions, it's not drawing from third party inputs. I say that even hesitantly because how is the AI trained? The data had to come from somewhere. So I could make an argument, but I think those arguments get more tenuous the more, I'd say, potentially advanced AI becomes. I haven't seen an example or case like this yet, at least in the generative AI space, so I would tend to still stick to my point that I think Section 230 still protects those products. There's a lot of places we can go with this, so I'll go ahead and pause in case there's more questions.

Justin Hendrix:

Let's talk about some of the types of transformations that could potentially happen, because that does seem to be where the question really is. I could imagine a future state where a large language model, or maybe some other technology, is being applied, and it does appear that a search engine or a video platform or what have you is making significant alterations to the way that the material is presented, possibly based on some kind of optimization: they're trying to make the material more engaging, or make the material better satisfy what the engine predicts the user was looking for based on information they may have about that user or their prior searches or what have you. Where is the line for you between the manipulation or transformation of that content and becoming an information content provider?

Jess Miers:

Yeah, that's a really important question. Actually, it goes back to my initial point at the beginning of this podcast, which is, again, just because the technology is new doesn't necessarily mean we need to change the underlying legal principles. This is actually a perfect example where if we look at the Gonzalez case for example, the case that's up in the Supreme Court, that case entirely has to do with YouTube learning from its users and curating or displaying content, recommending content in a way that will promote specific user engagement. That's what the court is wrestling with right now with regards to whether Section 230 should go that far, if it should protect those recommendations.

Now, I've always said this case had no business being in the Supreme Court because there is no circuit split. Courts below the Supreme Court have consistently held in other cases that, of course, Section 230 applies to the curation of content. And that's really what's happening here. Again, whether we're talking about ChatGPT, YouTube and its recommendation systems, or even an offline newspaper company deciding what goes on the front page to, again, drive reader engagement or reader interest, all of those boil down to curatorial functions. And those are regular functions that publishers consistently use to express themselves and to bring attention to the speech that they are catering to.

I think in that regard, in my opinion, Section 230 should continue to apply as long as, and these are some developing or already developed exceptions in Section 230, the service, the intermediary, doesn't, for example, change the underlying meaning of the content. That's always been a principle in Section 230 case law: an online company can edit your speech as long as it doesn't change the underlying meaning of it. Or if we're talking about something that's illegal, for example, and it's uncovered that the online service at issue has materially contributed to the unlawful expression or has somehow converted whatever the third party content is into its own first party content because it materially contributed to it, then that would be a situation where Section 230 may not apply. That stems from the famous Roommates case, as your listeners may or may not be aware.

Justin Hendrix:

Let's just get a little nitty-gritty; some specific examples might help the listener think through the hypothetical. Right now, for instance, Google in its search results for videos will automatically rewrite video headlines. Some studies say about a third of the headlines that you see in search engine results pages are automatically reformulated, perhaps to better match the search query. There is a possibility, I suppose, that a large language model could produce a meaning that is slightly different, or could potentially make a particular piece of content more enticing to a user based on the formulation it comes back with from its predictive model. Can you imagine hypotheticals where that would potentially put the platform in a bind with regard to Section 230?

Jess Miers:

Yeah, I've wondered a bit about what would happen in the situation where YouTube came up with or generated a headline or a title that was defamatory, for example. YouTube would likely never do this, but let's say that you've got a video titled "Jess Miers hates 230," and it's a video about Jess Miers talking about why Section 230 should be repealed. I mean, that would be incredibly damaging for myself and my brand. Is YouTube liable when the underlying content, the underlying video, is created by, let's say, me, another third party user? So if I tried to sue YouTube and I was basing it on the video itself, Section 230 would apply. If I'm suing YouTube based on the actual title, I think I might actually have a viable argument to say that YouTube potentially materially contributed to the unlawfulness of the content at issue, the content being the title. And the reason that claim is likely viable is because they've changed the underlying meaning of the content itself in creating that title.

There's also a question, too, because we jump into the lawsuit question quickly without thinking about who the parties to the suit would be. Is it YouTube's own algorithm, or are they embedding OpenAI software, and could OpenAI be liable for the mistake that occurred here as well? Or could OpenAI be protected by Section 230 because of the way that YouTube implemented the content? These are harder questions than we've seen before in the 230 space, for sure.

Justin Hendrix:

When it comes to other kinds of transformations, so if we could imagine various synthetic media types of approaches that might be incorporated by the platforms in the future, for instance, transformations to audio, possibly even transformations to video, upscaling of content to higher resolution or fixing lighting, maybe optimizing the sounds that are coming through a particular video, what do you think about that? Is there a point at which the platform becomes a co-creator or co-producer of material?

Jess Miers:

I don't think so for Section 230-based claims specifically. So again, thinking about things like the false light privacy tort or, again, defamation and negligence, for those types of claims where 230 would normally apply, I don't think that the company is at risk of becoming a co-creator. Again, this stems back to decades of existing precedent with Section 230. An intermediary can actually do a lot with third party content. They can solicit third party content. They can remix existing third party content when they curate it. They can edit it as long as they don't change its underlying meaning. And so in a situation where all they're doing is heightening the audio quality, or they're making the display better, or they're even teeing it up in a way that is more interesting to a specific user, as long as they are not participating in changing that underlying content, and they're not actually creating or developing that content, Section 230 will protect them.

Now, I will say, obviously Section 230 isn't limitless; there are exceptions. An important one would be for intellectual property. Right now there is a circuit split, for example, when it comes to right of publicity claims in states where right of publicity is treated like a privacy claim. And by the way, for those of you who are listening, right of publicity claims are claims that arise when somebody's name or audio or likeness is being used without their authorization. It's a little bit more complicated than that, but that's the gist of it. Some states treat right of publicity like a privacy tort, and that should be protected by Section 230.

Other states, though, treat right of publicity like an intellectual property tort. And in those states, if somebody could bring a viable claim against an internet service that has altered the audio, for example, or made someone else's audio sound like me, or used deepfake technology, et cetera, we could be totally outside the purview of Section 230 if we go down the intellectual property route.

Justin Hendrix:

In Gonzalez, the concern is about the degree to which the platform perhaps provided support to a terrorist organization, and whether that case should even be allowed to be heard. If we think about some of the transformations that could happen, I'll just give you a hypothetical: if we find out in the future that a platform like YouTube is optimizing video, or upscaling thumbnails, or automatically producing snippets using large language models that are particularly targeted to an individual and their search history or their demographic profile or what have you, and those types of transformations do appear to make that content more attractive to a user than it might otherwise have been, would it be your position that for the most part they still shouldn't have their day in court?

Jess Miers:

Absolutely, I would say that Section 230 would apply. There's always going to be this question, right? Section 230 is a three part test. You're always asking, first, is the defendant an interactive computer service, i.e. a website, or somebody who is using an interactive computer service? Second, are the plaintiff's claims attempting to hold the defendant liable as a publisher of, third, third party content? The third party content piece is really important because, again, there is a huge difference between a website that is creating its own content and/or materially contributing to unlawful content, such that it is actually a co-creator, it has developed, it has built that content as well, versus a website that is hosting the underlying third party content, remixing underlying third party content, and displaying it in a way that is attractive to its users.

All of those are not only protected activities under 230, but they're also protected publisher discretion, editorial discretion, under the First Amendment as well. Now, let's see what happens with the Gonzalez case, though. Again, we've seen decades of 230 case law and precedent that have consistently said that if you are just displaying third party content, remixing it, making it interesting to your users, you're good to go. Recommendations: you're good to go under 230. I'm worried, I am a little concerned, that what could come out of Gonzalez is a discussion or an arbitrary line that gets drawn that says Section 230 only applies to algorithms when those algorithms are implemented 'neutrally.' That could really throw a wrench in the discussion, because now the courts would have to assess on a case by case basis, for every implementation of what you've described as a recommendation or some sort of transformation, whether it was done neutrally, whatever that means, by the service itself. That's going to create, I think, a sticky question that could go either way depending on what jurisdiction you're in.

Justin Hendrix:

I guess another question, bear with me as I puzzle this out a little bit, when you think about what these models are doing, you've referred to remixing content on a number of occasions, is that, you think, an appropriate metaphor for what large language models are doing or other synthetic media approaches?

Jess Miers:

This is where the technologists have pushed back on me on Twitter as well. I actually think it's a very reductive way of explaining these technologies. If you're a technologist and you're explaining how the technology works, it's probably a little bit more than just remixing third party content. It's also a matter of how many people are using OpenAI, what the algorithm is learning from those people who are putting in inputs, and other technical complexity that goes into the outputs from a generative AI product.

However, if we're talking about just the law, from the perspective of the law and what Section 230 says, I think for now, the way the technology currently works, it is analogous to, again, a Google search, for example. Or even the discussion that took place in the Roommates case, where we get that materially-contributed-to-unlawful-content test. Roommates provided just a blank form for people to input text, and I would argue that the same thing is happening here with ChatGPT. ChatGPT doesn't create an output for you on its own. If you figure out a way to ask ChatGPT to tell you how to make a bomb, the question is going to be, who started that query? Who is the one who caused ChatGPT to create that output in the first place? And that is always going to be a third party user.

Until we go beyond a third party user's interaction with the service, if for some reason one of these models is now doing things completely and entirely on its own and it's creating it without input from a user, I think that'll probably be a harder question with regards to whether 230 protects. But then I would also have a lot of questions too as to, again, what's the underlying claim and who are the parties.

Justin Hendrix:

We've already seen, in many hypotheticals and actually in many real use cases, an interlocutor with a chat product being able to convince the system to go around its own guardrails, for instance, and to produce material which it claims it does not want to produce. Are there cases where you can imagine, even based on the input from an individual, that... I feel like this is very important to your argument here, that when I enter something into that box at ChatGPT, I am personally then shaping what's coming out of it, which I understand to some extent. I think I'd probably side with the technologists on some level, who would say it's not remixing, it's predicting; it's making up what it thinks you want to hear based on the string of words you've given it. I don't know, that's where I struggle: it's not necessarily remixing from a bag of prior material that it's cutting and clipping, but rather it's predicting, based on that prior material, based on the patterns in that prior material, what the response should be.

So in that way, I can see how if I'm having a direct conversation with a search engine, say I'm talking to it over a smart home device or something and it gives me something that is, I don't know, illegal or gives me something that leads me to do something that brings harm to myself or others, I could imagine that that would produce liability.

Jess Miers:

Again, there's how the law would work, and there's the reality of where the courts are actually going with this. I actually think that is the crux of the issue here: trying to figure out who, at the end of the day, is the information content provider. Now, under the actual statute, my argument would be that the person who kicks off the query is the information content provider, not the interactive computer service. I think we can compare that to some of these other recommendation cases that we've seen in Section 230 that have come out strongly in favor of Section 230 protection. One of them, I believe it's the Dyroff case, has to do with a website that was automatically sending recommendations, I think via email, to specific users who had been putting in queries in search of drugs.

In that case, I believe the user died from an overdose from taking the drugs that they had found on the service. The argument was similar. The website in that case was just predicting what the user was interested in. The user didn't input anything to ask for that automatic update. At the same time, the website was protected because the underlying content that it was notifying the user about was third party content. The website itself did not post where to buy those drugs, and the website itself was not the source of that information. And so, I think if you compare it to that, or even just take a step back and think about, again, how Google search works or how YouTube recommendations work, both Google search and YouTube work by taking in your interests and then suggesting or, choose your word, predicting what results are going to be most relevant to you as a user who has submitted a query.

Arguably, that is what ChatGPT is doing as well. It is using inputs to predict and provide an output based on what you have asked it. I would say, look, practically speaking, when we're talking about folks actually litigating against OpenAI and these generative AI products, I think we're going to have a really hard discussion with the courts, because we have to get the courts to understand the technology first, and then we have to have a difficult discussion, in my opinion, about where to draw the line with Section 230. And if these Supreme Court cases make things even more complicated with this neutral tools analysis, this could be even more complicated. To your point, I absolutely could see courts create an arbitrary line where ChatGPT products don't get the protection of Section 230 based on this prediction analysis. I think that would be the wrong way to go because, again, at the end of the day, the information content provider, the interlocutor, as you mentioned before, is the one that kicks the query off; they are the creator of the output.

Justin Hendrix:

I think it will all, to some extent, hinge on that. It'll be a complicated question that I'm sure we'll see play out. I hope that the courts will look to some technical experts as they ask these questions. As Justice Elena Kagan brought up, there's a competency issue here that probably goes well beyond the Supreme Court.

Jess Miers:

I absolutely agree with that. I really hope that we start getting more technologists in the room on these discussions. I know I joked that they were pushing back on me on Twitter, but this needs to be a fluid discussion. This can't just be a purely legal discussion or a purely engineering discussion. What's at stake here is innovation in these technologies. If the folks building these technologies feel as if, for example, Section 230 doesn't apply, there's going to be less incentive for them to create their own generative AI products, because they don't want to be liable for what some user, some interlocutor, causes their product to do or say. So yeah, there's a lot at stake. There are access to information issues. We're talking about online speech and the development of more of these products and innovation. I hope that we continue to have a multi-stakeholder approach to these discussions.

Justin Hendrix:

And then on the other hand, of course, safety concerns, so-

Jess Miers:

Exactly.

Justin Hendrix:

... weighing the innovation with potential safety concerns as well.

Jess Miers:

Absolutely.

Justin Hendrix:

Jess, thank you so much for speaking to me today.

Jess Miers:

Thank you for having me. This is such an important conversation. I'm happy to be here.

Justin Hendrix:

If you're enjoying this podcast, consider subscribing. Go to techpolicy.press/podcast and subscribe via your favorite podcast service. While you're there, sign up for our newsletter. Next up in this series of discussions on generative AI is James Grimmelmann.

James Grimmelmann:

James Grimmelmann. I'm a law professor at Cornell University at Cornell Tech and Cornell Law School.

Justin Hendrix:

James, I appreciate you joining me this morning. We're going to talk a little bit about Section 230 and generative AI. There's a debate going on about the extent to which Section 230 may apply to the outputs of generative AI systems like ChatGPT and others and in what cases there might be liability, in what circumstances. And so, I want to get your first thoughts. I know that you are doing some research on this in your group and thinking through this question.

James Grimmelmann:

I think this is one of those questions that's not going to have a simple answer. I can imagine generative AI systems that produce outputs that it really doesn't seem like they should be responsible for; they work more like search engines, and they're just pointing you to things other people have said. I can imagine generative AI systems that are highly culpable and have been engineered to do terrible things. It's not clear that Section 230 should treat those two the same.

Justin Hendrix:

Let's talk a little bit about the way that you understand the technology and the extent to which the way that the technology works and metaphors that we might use from prior technology may impact the way that courts think about the role of generative AI systems in this regard.

James Grimmelmann:

Yeah, so very loosely, generative AI systems are trained on a lot of examples, usually scraped from the internet or from other very large data sets. They learn, in the structure of their networks, the statistical regularities in whatever they've been trained on. So language models learn which words tend to occur together and in what order, and which words are substitutes for each other with different inflections. Image models learn what kinds of patterns of image data show up together, and so they can pick up both the structure of what a house looks like and the distinctive painting style of an impressionist. And then users feed them a prompt, a few words or a source to start from, and these systems complete that prompt with the patterns they find most likely to fit with it in context. So if you start a conversation with a language model, it will continue in the same tone as a respondent having a conversation with you. Or if you use an image model, you can prompt it with some words, and it will try to create an image that fits with those words in its labeled database.
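To make that prompt-completion loop concrete, here is a minimal sketch. It assumes the open-source Hugging Face transformers library and the small GPT-2 model as stand-ins for the much larger proprietary systems discussed in this conversation; it simply shows the mechanic Grimmelmann describes, with the model extending a prompt by repeatedly predicting a likely next token rather than retrieving stored documents.

```python
# Minimal sketch of autoregressive prompt completion (illustrative only;
# GPT-2 stands in for the much larger models discussed above).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Photosynthesis is the process by which"
inputs = tokenizer(prompt, return_tensors="pt")

# The model extends the prompt token by token, sampling from the
# statistical regularities it learned during training.
output_ids = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```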

Justin Hendrix:

So how does this affect or how does this understanding of the technology affect whether we may consider tools like ChatGPT as an information content provider versus an intermediary?

James Grimmelmann:

I can tell you different stories about what the generative AI is doing. One story makes it sound a lot like a search engine. When people use ChatGPT to do research or to summarize information on something, it functions a lot like that. If you ask ChatGPT to explain photosynthesis to you, it will be drawing on lots of people's explanations of photosynthesis from around the web, and the words that it gives back will basically reflect the descriptions that thousands of people have given in its input data. It may look like some of them in particular, or it may be more of a smeared-out average. But ChatGPT has no real views on photosynthesis; it truly is mirroring what's in the training data.

Justin Hendrix:

In that version of the telling, do you regard a system like ChatGPT or other large language model outputs to be essentially remixing the content, or is it developing the content?

James Grimmelmann:

In that version of the story, it's really just a remix. It's a very complicated remix, where it is blurring together lots of different sources into a kind of composite. But that's like taking thousands of people's photographs and blurring them together into a single generic composite face. The output really is just coming from the inputs in a way that brings them together and draws on them all.

Justin Hendrix:

So what's the second telling of the story?

James Grimmelmann:

A second version of the story would be that it's not just finding things that are in the data. The first version makes it sound like a remix or a search engine. The second version is the well-known tendency of AI models to hallucinate: they will make up things that are syntactically plausible but didn't come from their training data. It's not just that they're returning things they've been trained on; they will complete a dialogue or a picture with things that seem likely, even if the particular details are highly implausible or didn't happen. So if you go to ChatGPT and you ask it for information about scandals that a given person has been involved in, it's quite possible that it will invent some scandals, because that's the kind of response you would get in that kind of exchange. It's not that that particular person has actually been involved in these things; it's just that the model is writing out or, as somebody would say, hallucinating a plausible-looking answer to that question even though there's no semantic content behind it.

That's a case in which the information content provider argument is a little harder to make. It is producing a remix of the words that people have used on the web, but it has synthesized them into an allegation against a person that is not present anywhere. The claim that James Grimmelmann beats puppies, that didn't come from any source that said I beat puppies, that was actually synthesized by the model. And so, in the language of some of the 230 cases, it has contributed materially to the illegality of the content.

Justin Hendrix:

Are you imagining different types of ways that these systems could be used? For instance, Microsoft is baking it into Bing, and we might see Google using language models in its search products in the future. How do you think this will play out? What other types of scenarios will you be looking for?

James Grimmelmann:

I think the diversity of contexts this will be used in is part of what makes it so tricky. You are going to have search assistants built into Bing and into Google. You are going to have office assistants built into Microsoft Office and Google Docs that will complete text for you and do research. You are going to have standalone apps that aren't coming from these major companies but are built on models that have been incorporated into other programs. And you can already get apps like that on your iPhone that run locally on the phone. There are going to be customized models based upon these ones, where someone has taken GPT and highly customized it to work for engineering applications or for technical programming-related applications. Each of these settings is going to raise distinct questions. The exact combination of who put together the training data, how the model was customized, what kind of filtering they did afterwards, who actually ran it, these kinds of factual intricacies are going to make the legal analysis more complicated.

Justin Hendrix:

We know that the Supreme Court is considering this question. We heard Justice Neil Gorsuch talk a little bit about the idea that AI may in fact generate content, asking the question about the extent to which that might be incorporated into the recommendations of a search engine or another internet platform. What do you think the Supreme Court decision in Gonzalez v. Google may end up portending for this question?

James Grimmelmann:

It's quite possible that a clear victory for one side or the other will have sweeping implications here. If the court rules for Gonzalez on a very broad theory that almost any recommendation or ranking or ordering is not covered by 230, then that theory would almost certainly leave a lot of room for claims against generative AIs to be allowed. If the court adopts Google's position, in which none of this is outside of 230, then that could wind up being a pretty broad shield for AIs, that they're just an extension of the algorithms that search engines are already broadly using. If the court comes down somewhere in the middle, or if it says very clearly, "We are not judging how this will apply to AIs," I think it'll still leave the terrain open for future cases.

Justin Hendrix:

If Justice James Grimmelmann were writing the decision, and assuming he had the majority, what would you hope would happen?

James Grimmelmann:

I think at this point the 230 case law is just too well-developed to tinker with it now. I would say something like, "The courts that have considered this in the District Courts and Courts of Appeals have settled on a very broad interpretation of 230. It has its critics, but it's also been functioning. We leave changes to Congress."

Justin Hendrix:

What would you recommend to Congress?

James Grimmelmann:

For Congress, I would try to carve out some specific areas from 230 that aren't really at stake in this case. I would want a clearer line between speech and conduct, so to have a clearer rule that marketplaces, so eBays and Airbnbs and Ubers, that they can't just put their entire business operations inside of Section 230. And I would want some kind of clearer rule about obligations to review content upon some kind of notice or knowledge that this specific item is problematic in some way, at least where the platform is already giving that kind of human review to the content, so not the truly automated ones, but in cases where it's artisanal and closely moderated. You have these gossip sites that are already hand-selecting which posts to put on the site, and it's not obvious to me that they really need the shield of 230 in the same way that a really large site like a Reddit does.

Justin Hendrix:

I think a lot of the fears of folks who are concerned about language models and the way that they may roll out in broader contexts on the internet do regard things like discrimination, things like bias in the underlying training data that may be reflected in the outputs, but then also potentially just the insertion or injection of misinformation, even difficult-to-identify misinformation, into information systems that may then have second order effects. Do you think that the types of exceptions you're describing there would cover off those types of concerns?

James Grimmelmann:

I think a lot of the concerns we're having around large language models right now are actually less about the models themselves and more about the way that they're being thrown into all kinds of applications with very little thought about their suitability to them. There are people who think that these models have some kind of special access to truth, and that if we just ask them the right questions we will be able to predict the future or access knowledge that's being kept from us, and that this is in some ways a more reliable search. In fact, that's a pretty dangerous [inaudible 00:41:14] approach to what they do.

You also have people just throwing them into processes and putting out data from them, using them to write papers or to write blog posts with very little concern for reviewing the quality or accuracy of it. I think the bigger problem for our information ecosystem is just the increased flood of very poorly-vetted content. I'm more concerned about how we use it than about how we train it. We could have really good vetting and really good controls on the quality of training data, and if people are still using this to spew out these fire hoses of content, we're still going to have a lot of the same terrible effects.

Justin Hendrix:

What would be an effective governance regime for large language models, or AI more generally, that would perhaps mitigate some of those concerns?

James Grimmelmann:

I may not be super popular for saying this, but I don't think governance is going to be a great framework for dealing with these very rapid, short-term shifts. I think it's really hard at this point to create a framework that manages that well. I think common law, case by case development has something to be said for it, because it can respond to, like I've said, the huge factual diversity in the way these different models work and how they're used. What I really want would just be more responsive, faster-moving courts. Get a lot of cases into the system and start dealing with them. We'll get some of them wrong, but that's the way to have a process that actually learns.

Justin Hendrix:

But of course, that would require that some of these cases can actually come to the court as opposed to just being kicked out under Section 230.

James Grimmelmann:

Yeah, I appreciate the tension there, that Section 230 has created a regime where people rely heavily on it. And so, the case by case learning we want has in some ways been short-circuited by an immense wave of lawsuits that are dismissed on Section 230 without reaching the merits. The stability that provides is in tension with our ability to learn from examining these cases.

Justin Hendrix:

A lot of the folks who have an interest, or seem to have an interest, in offering a temporary amnesty, a period in which we can evaluate both the potential harms and the potential opportunities of these technologies, suggest that that's really the reason we should avoid creating a scenario in which the companies are liable for their outputs in the near term: that we need to see where the tech goes, what it can do, what it can do for society. What do you think of those arguments?

James Grimmelmann:

I don't think we are in any danger of preventing AI from happening if we have some of that liability exposure. I think that there is so much pressure and so much economic interest in making this happen that it's not as though we can turn the switch off at this point and prevent AI from happening over a five or 10 year timeframe. I think it would actually be nice to move slower. And if the companies had more fear of liability, they would not be rushing so aggressively to deploy these models with quite so little vetting. So in some ways I think an environment where they feel more chilled would actually lead to a healthier balance overall.

Justin Hendrix:

I've heard some folks say that maybe at the higher end, the 10 or 12 or so companies that are really investing the most into the development of these models, investing the most into the development of AI more generally, that perhaps there should be special controls on them, compute governance, for instance, some kind of registration for training massive models that may have unexpected capabilities or may pose particular types of risks. Have you given that any thought?

James Grimmelmann:

Not a lot. The thing that I worry about is that it really does seem like they're pouring immense amounts into having the best models, and they have the money to throw at it, but models that are a generation or two back are shockingly good and increasingly, shockingly inexpensive to train. We are not talking about cases where you have to be putting in nine figures to be in the game at all. It really does seem as though you can get competent, surprisingly good models for hundreds of thousands to a few million dollars' worth of compute. So we're paying attention to the current frontier with the OpenAIs and the Googles, but people have gotten surprisingly good models that are open sourced, and so we're going to see widespread availability of these capabilities very soon.

Justin Hendrix:

What do you make of the approaches to trust and safety so far from those major players, Microsoft, Google, Facebook, OpenAI? OpenAI has just put out its technical "paper" around GPT-4 and described various steps that it has taken, including a red teaming process, an alignment process, and the like. What do you make of what we're seeing from these companies with regard to their responsibility?

James Grimmelmann:

On the one hand, it's quite clear that things would be immensely worse if they weren't doing this, that OpenAI has taken a pretty serious approach to thinking about different use cases and has put in place training and reinforcement learning to prevent some of the bad ones. It's also clear that it's woefully inadequate, that they're learning from their exposure to actual users trying different things just how many ways their safety guardrails break. We saw this with the blowout of Bing Sydney, in which people were really able to break out of the limited bounds that Microsoft thought it could be used in. And they shut that one off only by limiting the number of queries you could give to Sydney in any one conversation. I'm glad that they're doing it, but so far they're not inspiring great confidence in their being up to the task in the long run.

Justin Hendrix:

What is your degree of concern around the rollout of these technologies more generally? If you could think back from, say, the end of the decade, 2030, where do you think we're likely to be? I mean, there are some folks, of course, predicting AI apocalypse, others predicting the age of abundance. Where do you think we're likely to end up?

James Grimmelmann:

I don't know because I think these technologies massively increase technical and social instability, that we don't really understand how they work and how they'll be used, especially at the scale of the billions of people already doing things on the internet. And so, I just think they significantly increase the unpredictability of the way that society will work. And that's in some ways what concerns me more than anything else.

Justin Hendrix:

Another set of concerns, of course, is around intellectual property and copyright. We've just seen the Copyright Office in the United States basically say that any output of one of these systems will not deserve copyright. What do you make of what's happening in that space and the current lawsuits that are happening?

James Grimmelmann:

IP isn't necessarily the best way to have all these conversations about how generative AI should and shouldn't be used, but they're the lawsuits we've got, and they do raise real questions. In the past, we've taken the attitude that training machine learning models is almost always a fair use, and that calculus might look different now that we have really good generative AIs that are producing outputs that genuinely compete with the work of the artists they were trained on. So that's a serious question worth asking.

And the, "Let's just gather data from everywhere without much concern about its providence, quality, legality, even consent," that's an attitude that I'm not sure has gotten us to great places. So maybe IP and lawsuits are a way of putting the brakes on or firing some warning shots about the way in which we go about developing these models. It also is a way that gets us to ask, "Where does the output come from? What do we mean when we say this was generated from this training data?" That turns out perhaps to be a useful perspective in thinking about the issues in these other kinds of lawsuits. I don't have a strong view on how these lawsuits should come out, but I'm actually glad that we're having them. They drive conversation in a way that other issues don't.

Justin Hendrix:

The tension seems to be between the view, perhaps from Silicon Valley, that the internet is a corpus of material that represents all of humanity's investment in learning and art and creativity and communication, and that we can take that as a general training set and from it produce anything we might like. Whereas there are others, people who produce culture, people who produce ideas, who look at that and say, "That is expropriation. You're taking our culture. You're essentially flattening it and allowing it to be reproduced with no remuneration, with no consequence for its use." Is there any way to square those things? Can you imagine some kind of copyright regime in the future that pays people for similarities between the outputs of an AI and whatever went into training it?

James Grimmelmann:

One way I think about this is that we might not like the world of culture that these AIs are bringing us into, but copyright may be a bad tool for stopping us from going there or changing it, that the assumptions about how people create and use culture baked into the copyright system may really be hitting their breaking point. We thought that would happen with the internet and file sharing. It may actually really happen with AI. It might be that we want both to have some governance system about how AI is used and how it relates to culture and how we flag things that are synthetically created versus ones that came from people. It may be that we need to have some very serious conversations about how to support artists and writers and creators and to make sure that creativity is something that people in society still broadly share. It may be that those two things have just become completely decoupled, and that copyright, as the regime that unifies them, is no longer the right tool for the job.

Justin Hendrix:

Well, what a time to be alive. As someone who studies these things, do you feel like you are closer to clarity on some of these issues or does it seem daunting?

James Grimmelmann:

It's incredibly daunting. I feel like I am trying to wade upstream in a river that is flooding like the rivers in California with the rainfall and snow coming down on them, like every step I take forward is against a raging torrent.

Justin Hendrix:

One of the things I'm struck by in looking at some of the outputs, especially from OpenAI, the papers that it's done in collaboration with academics on everything from disinformation to the potential impact on labor and the economy of these systems, is that there seems to be a sort of gap between what independent academics and civil society are able to do and understand about the direction of these technologies and what the companies themselves are able to do in terms of the way they are able to resource and even consider the potential implications of the things they're developing. Do you see that gap? Is there a real gap there? Is that something we need to be concerned about?

James Grimmelmann:

I think there is a mismatch. Things are moving so quickly in this space that, on the one hand, people working at these companies do not feel that they have the time or ability to ask these broader questions. They have frameworks they've developed to try to think about them, and those frameworks are under a lot of strain and not keeping up with reality, but there's such a race on right now that they really can't step out of the stream. And those of us looking on are doing our best to think, but it's tough when things are coming at us so quickly, and we're always also just a few steps behind, because we can only react to whatever is happening now while the next generation of model is already in training in secret somewhere.

Justin Hendrix:

James, I hope you'll come back and talk about these things as those developments unfurl.

It's been a pleasure.

Hany Farid:

My name is Hany Farid. I am a professor at the University of California, Berkeley.

Justin Hendrix:

Hany, can you just give the listener a little bit of sense of your expertise and why you are particularly interested in both generative AI as well as Section 230?

Hany Farid:

I am a computer scientist by training. I'm on the faculty here in computer science. I also have a joint appointment in information science. I think about not just developing technology, but how technology interacts with us as individuals with societies and with democracies. I have been concerned over the last now 20 years with how technology is being weaponized in the form of child sexual abuse, in the form of terrorism and extremism, in the form of fraud, in the form of non-consensual sexual imagery, in the form of promoting and celebrating hate and vitriol and general awfulness that is the internet today. I am not anti-technology, I'm very much pro-technology, but this is not the internet that I signed up for 25 years ago, and I think we can do better.

Justin Hendrix:

You recently testified before the Senate Judiciary Subcommittee on Privacy, Technology and the Law in a hearing titled "Platform Accountability: Gonzalez and Reform." Senators there wanted to talk through the arguments in Gonzalez v. Google and to hear from Section 230 critics and, indeed, the counsel who argued the case before the Supreme Court on behalf of Gonzalez. What did you hope to convince the senators of that day in your testimony?

Hany Farid:

What has been frustrating for me, and for many people, is that a lot of the discussion about fixing some of the problems that most people agree we are seeing online has become very partisan. It has become partisan in part because 230 has become partisan, and that has become partisan because Republicans, conservative voices, are fixated on the narrative that content moderation is anti-conservative. It is not true. The evidence is overwhelming, in fact, that conservatives dominate social media. The other side of the aisle, the Democrats, are concerned with how technology is being used to disrupt democracies and impede our efforts to combat global climate change and things like COVID.

And so, we sort of all agree that something's not working, but we disagree on the nature of the problem. And so, my interest was to try to break through that partisanship. One way that I've been thinking about this is to forget about 230. I think there's problems with 230, but forget it. In some ways, my thinking on this has evolved, and I've come to this realization that the problem, in fact, is not the content, the problem isn't that there are awful people doing awful things on the internet, the problem is that the platforms themselves are designing their products to encourage, amplify, and monetize the worst content on the internet because that seems to be what's driving our engagement. In that regard, you don't get protection from a statute that says you are protected against what somebody else does. If you design a product that is unsafe, that is leading to harm, well, you own that the same way you own that in the physical world.

Let me give you a concrete example of that with the Gonzalez case, which was, of course, centered around YouTube. YouTube claims Section 230 protection because ISIS uploaded the videos, and that's not our problem, we have protection for third party content. But if you go over to YouTube and you watch a video, what happens afterwards? Well, you get recommended another video. In fact, even if you don't watch a video, if you just go right now to YouTube, you will be recommended things to watch. That's a feature, that's a design. They didn't have to do that, by the way.

Well, think about the core functionality of YouTube: upload a video, watch a video, maybe search for a video. They chose to design a feature that promotes, that recommends, videos after you watch one. Why? To maximize user engagement and maximize ad revenue. The way the algorithms do that amplifies some of the worst, most vitriolic, conspiratorial content, because that's what engages users. This is not a third party Section 230 issue, this is a product liability issue. You've designed a faulty product.

Let me give you one more example. In Lemmon v. Snap, there was a case where kids were using what was called a speed filter on Snapchat, which would record your speed superimposed on top of your video. Kids were getting rewarded for higher speeds, and kids did what kids do. They got in their car, drove 125 miles an hour, wrapped themselves around telephone poles, and killed themselves. The family sued Snapchat saying, "You are responsible for this." A lower court said, "Nope, they have 230 protection." But the Ninth Circuit said, "Wait a minute, this is not a content issue. Snap designed a speed filter, and they knew or should have known that this was going to encourage bad behavior." In fact, they sent it back down to the lower court. And importantly, and this is incredibly important here, the Ninth Circuit did not find Snap liable, they just said, "You don't get the 230 shield."

And that's what I'm saying too. You don't get the 230 shield for designing a faulty product. People who are harmed by your product should have their day in court, and let the jury decide about liability. It will be messy for a while, but we'll figure it out the way we figured out all other product liability in the physical world.

Justin Hendrix:

You've given some thought to the role of recommendation algorithms, and to other algorithmic transformations these platforms are doing. I know that Google, for instance, is automatically rewriting headlines for videos and search engine result pages. It's possible that in the future a variety of synthetic media approaches could serve to do upscaling of content or other types of transformations we can imagine. Is your mind going in that direction as you think about how to make this distinction?

Hany Farid:

It's hard to look around right now and not see the power and the influence that generative AI will have, whether that's ChatGPT, whether that's image synthesis, video synthesis, or voice synthesis. So let's play out a few scenarios here. ChatGPT, or versions of these interactive generative programs, are now being released. There's a version on Snap. There's a version from Microsoft. Google just released a version today. Imagine you go over to, let's say, Microsoft's version of ChatGPT and you start engaging with it in a conversation and it convinces you that you should go out and commit a crime. Does Microsoft get 230 protection from that?

Well, it's hard to imagine how they could, because there's no third party content. It's your AI system. Yes, the AI system is built on third party content, you scraped the internet and you built it using large language models and all this internet content, but it's your program, and I think you own that. And so I don't think that, the way 230 is written now, and certainly not how it was conceived in the mid-nineties, you get 230 protection from that. So I think these companies are going to have to think very carefully about how they start using generative AI to create content whole cloth, which doesn't get 230 protection.

But also, what if they just start modifying content to make it more engaging? Now it's starting to sound a lot like the Henderson case. In the Henderson case, which you heard about in that Senate Judiciary hearing, a company was taking data, massaging it, drawing inferences from it, and then releasing it with lots of errors that impacted people's lives, and the court found it did not get 230 protection. I think companies have to tread very lightly here with how they start to deploy generative AI, because suddenly it is not going to look like they're a distributor, it's going to start looking like they are a publisher and a creator, and they don't get 230 for that.

Justin Hendrix:

What would you make of the argument, for instance, that services like ChatGPT or Stable Diffusion or other things like that are simply remixing the training data, that they're coming up with something that is born of the parts that they've collected?

Hany Farid:

Yeah, here's what I love about that argument. They will argue that is exactly not the case when they go to court and argue about copyright infringement. So if they argue, "Oh, we are simply mixing up a bunch of third party content, and therefore this is third party content," they have a copyright infringement problem. You can't have it both ways. You can't steal people's content and then claim, "Well, we're doing something completely original. This is fair use," and then when it comes to harm saying, "We're just mixing up a bunch of stuff that other people have done. We're not responsible for it." Pick which one you want. One way or another, you're going to be on the hook.

By the way, it is true that generative AI can reproduce things from the training data, but generally speaking, that is not the case. Generally speaking, it is creating novel images, video, audio, and text. And so, I think that argument doesn't fly with respect to how the technology works. And I guarantee you the tech companies are going to try to have it a different way when the lawsuits for copyright infringement happen.

Justin Hendrix:

When you think about the potential applications of things like large language models, other forms of "generative AI"... I'd be interested, by the way, in what you think of that phrase, it seems to be relatively new.

Hany Farid:

Yes,

Justin Hendrix:

I've been in this world for a little bit, as you've been for many more years working on media forensics and questions around this. And the phrase itself seems relatively new to me.

Hany Farid:

Good. We should talk about that, by the way, I have thoughts.

Justin Hendrix:

What else are you seeing down the pipe when it comes to this question around liability for the outputs of these systems?

Hany Farid:

Yeah. First let's talk about that phrase, generative AI. It is fairly recent. I think ChatGPT is where it came from. There are some things I like about it. I think it's fairly descriptive, if you acknowledge that none of this is AI and it's all machine learning, but let's use that term AI broadly. But here's why I think it came about. I think the industry generated it because, what did we use to call it? Deepfakes. Deepfakes are scary and bad because people are doing awful things with videos and audio and images. I think this is a softer version of "deepfakes," which is very scary, and I think that's where the term came from.

Now, where do I see the liability issues? Well, in a couple of places. First of all, it is absolutely the case that there has been an inflection point in the last few months. We have started to see these large language models do things we haven't seen before. We've started to see image synthesis like Midjourney's version five create images that are unbelievably realistic. ElevenLabs has basically nailed voice cloning. With 60 seconds of somebody's voice you can now clone it very, very easily and just type, and it will generate their voice. Of course, deepfake video continues its trajectory. We are seeing the same harms as we've seen in the past, but accelerated: non-consensual sexual imagery, fraud large scale and small scale. People are now getting phone calls from what they think are their loved ones saying they're in trouble, send money, and it is working. The FTC has been releasing warnings telling people it's phone calls now, not just the texts and the emails, so that is on the rise.

We are absolutely seeing the rise of deepfakes and generative AI and disinformation. Just in the last two weeks, every single day I have gotten an email from a reporter with an audio clip purportedly of Joe Biden saying something inappropriate on a hot mic. I've seen one of Bill Gates. Today I saw a fake image of Putin kneeling before President Xi. Every single day now. Why is that? There are actually a couple of reasons. One is the technology really has gotten very good, but also you need no skill to do this. Just go to Midjourney, go to DALL-E, go to Stable Diffusion, go to ElevenLabs, go to any number of online portals, type something in, and it will create a very realistic piece of content. And then carpet bomb the internet with it.

And so I think there has been an inflection point, and I think that in 2024, here in the US in the national elections, this is going to be a real problem. I think it's going to be a problem for two reasons. One is people are going to create fake content, and it is going to have an impact. But also, when a candidate really does say something inappropriate, who's going to believe it? Go back to 2016, when Trump was caught on the Access Hollywood tape saying what he said about women. Now he has plausible deniability. If that audio were released today, nobody would believe it is true.

Justin Hendrix:

OpenAI has invested quite a lot, they say, into trust and safety. They have done a substantial amount of red teaming, at least per the "technical report" that they released around GPT-4. What do you make of these approaches to trust and safety by OpenAI or by these other firms, by Microsoft, by Google, by Stability AI?

Hany Farid:

Yeah, so two things on this. One is I think OpenAI is doing a reasonably good job. I don't think it's great, but I think it's a reasonably good job. But the problem here is that we're only as good as the lowest common denominator. For example, when OpenAI released DALL-E, the image synthesis network, they put some pretty good guardrails on it. You couldn't create images of Joe Biden or other recognizable people. You couldn't create sexually explicit material. They had put some pretty good guardrails on it, and they were adapting that policy as they saw people abusing it.

But Stability AI comes along, creates Stable Diffusion, open-sources it and says, "Do whatever you want. Non-consensual imagery, Joe Biden, we don't care." And so, the problem is all of this technology is more or less out in the ether. The large language models, the networks for doing image synthesis and deepfakes, a lot of them are just open-sourced on GitHub. And so sure, great, I'm happy to see them at least thinking and talking about it, but the reality is, as long as there's somebody out there not willing to put up those guardrails, that is going to be our weakest link.

OpenAI also recently released these voluntary principles. Fine, but you and I both know that voluntary principles don't work in an industry. No industry, when there are potentially billions at stake, is going to voluntarily slow down, reduce its profits, do something that maybe makes it anti-competitive with others who are not going to be part of the voluntary principles. I think we very much need our regulators to start thinking about these problems, think about how to create a healthy ecosystem for innovation, but also start thinking about how to mitigate some of the harms. And that can't come from inside the industry, it has to come from outside the industry.

Justin Hendrix:

Of course, the threat of litigation liability would be an important countermeasure or adversarial force-

Hany Farid:

Yes.

Justin Hendrix:

... potentially on the companies to try to do their best. Regulation, you are in favor of it. It doesn't look like Congress is going to act very soon, at least that's my assessment. I don't know, how worried are you in the scheme of things that we're not going to get this right in the near term?

Hany Farid:

I think it's more likely than not that we will not get it right. We have a national election coming up, things tend to get pretty crippled. We have the freak show that is going to be a potential Trump indictment this week. It's going to be very hard to get traction here. We're dealing with continuing instability in the financial sector. And by the way, on top of all that, hardly anybody in Congress really understands this technology in any real way, so they're all working in the dark. A little glimmer of good news: the Europeans have been thinking very carefully about AI regulation, both on the predictive side and the generative side, so maybe there's a glimmer of hope coming out of Brussels and the UK. The Australians have also been very thoughtful on these issues.

I think we are crippled here in this country, and I think that there is a lot of money being poured into generative AI and AI in general. I think that there are very powerful lobbying efforts that will absolutely stifle any regulation by saying, "Well, if you regulate, we will be anti-competitive with China." I think that we are going to repeat the same mistakes we made in the first 20 years of the internet. So think about all the mess that is social media today, throw AI on top of that, and I think we're going to end up in a very messy landscape.

Now, there is one glimmer of hope here, and you mentioned it, Justin, so let me repeat it: one body of law that is actually fairly well-established is copyright law. This is the one place where we actually can get something done. So I think it might be interesting to see. Getty, as a matter of fact, has filed a lawsuit against Stability AI for scraping their images illegally and using them to train their network. I think this may be a place where the original copyright holders are able to hold the feet of these big companies, which are leveraging all of their content, to the fire. And maybe that can create a little bit of regulatory pressure. But I don't think we're ready for what is happening, and it is happening very, very fast.

Justin Hendrix:

I can't help but ask you to cast your mind forward a bit. Everyone's wowed by GPT-4, they're wowed by Sydney, they're wowed by all of these new contraptions. But 2, 3, 4, 5 years from now, bigger models, more compute, who knows, people will be spending billions to train models and burning off a good bit of carbon as they do it. What are the mechanisms that we can put in place, perhaps, to prevent these things from getting beyond our ability to govern them or to understand them?

Hany Farid:

Good, I'm glad you asked me that. So I have two thoughts on this. One is, I think, first of all, the burden should be on those generating the content, not on us the consumers, or even on the social media companies for that matter, the effective distributors. So I think if you are in the business of cloning people's voices and creating videos of people saying things they didn't say, you own that, both ethically and, I think, legally. So what can they do? Well, here are a couple of ideas. They can watermark every single piece of content that they create: text, audio, image, video. They can insert multiple robust watermarks that make the content easier to detect downstream. It's not guaranteed, it's not a silver bullet, there are ways of attacking this, we know this, but create some friction in the system, create some speed bumps that make it easier to detect.
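
To make the watermarking idea concrete, here is a minimal, purely illustrative sketch: a toy least-significant-bit watermark embedded in an image array at generation time and read back out later. Real generative-AI watermarks are far more robust than this (spread across frequencies, resistant to cropping and compression), and the function and payload names below are hypothetical, chosen only to show the shape of the approach.

```python
# Illustrative only: a toy least-significant-bit (LSB) watermark on an image array.
# Production watermarks are much harder to remove; this just shows the core idea of
# embedding a detectable signal at generation time.
import numpy as np

def embed_watermark(image: np.ndarray, payload_bits: list[int]) -> np.ndarray:
    """Write payload bits into the least significant bit of the first pixels."""
    flat = image.astype(np.uint8).flatten()
    for i, bit in enumerate(payload_bits):
        flat[i] = (flat[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return flat.reshape(image.shape)

def extract_watermark(image: np.ndarray, n_bits: int) -> list[int]:
    """Read the payload back out of the least significant bits."""
    flat = image.astype(np.uint8).flatten()
    return [int(flat[i] & 1) for i in range(n_bits)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic_image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    payload = [1, 0, 1, 1, 0, 0, 1, 0]  # e.g. a hypothetical generator ID
    marked = embed_watermark(synthetic_image, payload)
    assert extract_watermark(marked, len(payload)) == payload
```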

They can also, particularly if they're generating this content server side, fingerprint every single piece of content and make sure that you can't create it anonymously. So make sure you have an account and you have an email and maybe you have a phone number, and we can associate who created what, so there's some accountability. They can do this. We have the technology to fingerprint and watermark every single piece of synthetic content created. Now, on the flip side, we can also start thinking about how we trust real content. So there's an effort I'm involved in, a not-for-profit, multi-stakeholder effort called the C2PA, the Coalition for Content Provenance and Authenticity. It is being led by Microsoft and Adobe, and Sony's in there, and the BBC is in there, and lots and lots of companies who are thinking about... In fact, we're not just thinking about it, we've done it. We've written a specification that would allow people who record something, a politician, police violence, human rights violations, whatever it is, to authenticate where it was photographed or recorded, when, and what pixels were recorded.

You put all of that, cryptographically signed, onto an immutable, centralized ledger, and then downstream, when the politician says, "I didn't say that," you can go to that ledger and say, "Well, wait a minute, I have a cryptographically signed piece of data that says you did say that." So I really like tackling the problem from the other side, authenticating the real stuff so the liar's dividend doesn't work, and then forcing those on the synthesis side to watermark and fingerprint and keep track of the content that they are creating.
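
To sketch that idea in code, here is a minimal, illustrative example of signed provenance: hash the captured bytes, cryptographically sign the hash together with the capture metadata, and verify both later. This is not the actual C2PA specification, which uses certificate chains and manifests embedded in the media file itself; the key handling, device name, and timestamp below are invented for illustration.

```python
# Illustrative only: the cryptographic gist of signed provenance. A capture device
# signs (content hash + metadata); anyone downstream can verify the signature and
# confirm the content has not been altered since capture.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def make_claim(content: bytes, metadata: dict, key: Ed25519PrivateKey) -> dict:
    """Bind who/where/when metadata to the exact bytes that were recorded."""
    claim = dict(metadata, content_sha256=hashlib.sha256(content).hexdigest())
    payload = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "signature": key.sign(payload).hex()}

def verify_claim(content: bytes, record: dict, public_key: Ed25519PublicKey) -> bool:
    """Check both the signature and that the content still matches the signed hash."""
    payload = json.dumps(record["claim"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["signature"]), payload)
    except InvalidSignature:
        return False
    return record["claim"]["content_sha256"] == hashlib.sha256(content).hexdigest()

if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    recording = b"...raw recording bytes..."
    record = make_claim(
        recording,
        {"device": "press-camera-01", "captured_at": "2023-03-23T14:00:00Z"},  # hypothetical metadata
        key,
    )
    print(verify_claim(recording, record, key.public_key()))   # True: untouched content verifies
    print(verify_claim(b"tampered", record, key.public_key())) # False: altered content fails
```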

Justin Hendrix:

We won't catch all the mice, but perhaps the cats can at least have some advantage.

Hany Farid:

That's exactly right. Think about this like every cyber attack. Spam: we haven't solved the problem, but we've mostly contained it. Even viruses and malware are reasonably contained, right? All of these efforts over decades have made this a more or less manageable problem, a problem that we can sort of live with. There are still threats, we have to adapt, we have to keep thinking about these problems, keep adapting. You can't make it 100%, but right now it's at 5%. We've got to get it up into the 90s, and I think there are mechanisms to do that, legally, through regulation, and technically.

Justin Hendrix:

Well, in the cat and mouse game, always nice to talk to one of the cooler cats. Thank you, Hany, very much.

Hany Farid:

Great to talk to you, Justin. Thanks for that.

Authors

Justin Hendrix
Justin Hendrix is CEO and Editor of Tech Policy Press, a new nonprofit media venture concerned with the intersection of technology and democracy. Previously, he was Executive Director of NYC Media Lab. He spent over a decade at The Economist in roles including Vice President, Business Development & ...
