AI Accountability and the Risks of Social Interfaces

Justin Hendrix / Apr 22, 2023

Audio of these conversations is available via your favorite podcast service.

Last week the U.S. National Telecommunications and Information Administration (NTIA) launched an inquiry seeking comment on “what policies will help businesses, government, and the public be able to trust that Artificial Intelligence (AI) systems work as claimed – and without causing harm.” Assistant Secretary of Commerce and NTIA Administrator Alan Davidson announced the request for comment in an appearance at the University of Pittsburgh’s Institute for Cyber Law, Policy, and Security.

Alongside him was NTIA Senior Advisor for Algorithmic Justice Ellen P. Goodman, who said the goal is to create policy that ensures safe and equitable applications of AI that are transparent, respect civil and human rights, and are compatible with democracy. In this episode, we’ll hear from Goodman, who is at NTIA on leave from her role as co-director and co-founder of the Rutgers Institute for Information Policy & Law.

And, we’ll speak with Dr. Michal Luria, a Research Fellow at the Center for Democracy & Technology who had a column in Wired this month under the headline, Your ChatGPT Relationship Status Shouldn’t Be Complicated. Luria says the way people talk to each other is influenced by their social roles, but ChatGPT is blurring the lines of communication.

Below is a lightly edited transcript of the discussion.

Justin Hendrix:

Is this the first time the NTIA has had a Senior Advisor for Algorithmic Justice?

Ellen Goodman:

Yeah, it's a new position. I'm on an IPA [an Intergovernmental Personnel Act assignment], so I'm sort of in the status of a loaner.

Justin Hendrix:

And how long will you be in the role?

Ellen Goodman:

I hope to be there basically through the end of the calendar year.

Justin Hendrix:

Just last week you announced this request for comment, inviting experts and others in the community to essentially provide NTIA with ideas about AI accountability. Can you explain a little bit about this call, what you're hoping to get, and what might happen as a result of folks contributing ideas?

Ellen Goodman:

Yeah, so let me just back up for anyone who doesn't know what NTIA is, because it's a small agency in the Department of Commerce, but it punches above its weight. It is by statute the President's advisor on technology and communications policy. And I should say NTIA does not have regulatory authority, so the output of this particular request for comment, which we're calling the Request for Comment on AI Accountability Policy, will be a report and recommendations. Our sister agency is NIST, and NIST has developed, as probably many of your listeners know, this really sophisticated AI Risk Management Framework. It's voluntary, and it is descriptive and not prescriptive. It provides tools for measuring and managing AI risk, but it does not purport to have a normative position on what risks are acceptable.

And then in addition to that, and I'm just sort of giving you the landscape before I say more specifically what we're trying to do, there's the White House OSTP's Blueprint for an AI Bill of Rights, which is also guidance, sort of aspirational and hortatory, and it focuses on how AI systems ought to protect fundamental civil rights. And then where we come in is that we are hoping to help policymakers, especially Congress and federal policymakers, but we're also seeing so much activity in the states, so we hope to be helpful there too, to actually drill down on what is needed if we want to see AI systems act responsibly, or, to use the abstraction the federal government now uses, to be trustworthy. Trustworthy AI.

If we want to see that happen, maybe it's going to require regulation, maybe it's self-regulation, maybe it's the market, maybe it's technical solutions, and we're asking about all of that. But what do we actually need? What policy tools, and also what governance tools, are necessary to make that happen? If you want, I can get more specific about the questions we're asking and our theory of the case.

Justin Hendrix:

Well, you've got three main areas that you're interested in, posed with the request. Maybe let's talk a little bit about them specifically. The first question is: what kind of data access is necessary to conduct audits and assessments? Right now that's a hot topic, particularly with this new species of generative AI large language models that we're seeing out in the wild. Folks are wondering: to what extent do we need to understand training sets? To what extent do we need to understand the specifics of how these systems come to the conclusions they come to, in order to be able to test their reliability?

Ellen Goodman:

So let me first say that in addition to audits and assessments, we're also asking about certifications. We're really asking about the life cycle of the system: whether we're looking pre-market, and whether or not there should be required certifications before a tool or a system is introduced to the market, either in a testing phase or in an actual marketing phase. And we're also asking whether there are moments through its life cycle, and these could be required or self-regulatory, for an audit of the system against third party benchmarks or, as we see in Europe with the DSA, against the risk assessment that the company did itself. Everyone's talking about what there ought to be, what we ought to have, but what is actually necessary to make that work? And so with respect to data access, we could say data and information access more broadly.

It might be: how do you get into the training data to examine it if you're a third party auditor? Or if you're a certifier and you want to make sure that data set was complete and representative, how can you assess that? But we could also talk more generally about what sorts of information are necessary. With generative models, my understanding is that in addition to the training data, the reinforcement learning is also very critical to understand, because that creates the refusal space where the system won't return a result. It's critical to understanding where the go territory was, where the no-go territory was, and whether that comports with, we can just put in brackets, standards, whether those are mandatory standards or voluntary standards.

And then in that bucket of data access we can throw, and we ask about all of this: are we talking about auditor access under some sort of NDA? Are we talking about researcher access that's more open? We might also be talking about transparency mechanisms like model cards or system cards, artifacts that would be generated, and maybe they're mandatory, maybe they're voluntary, and in either case, should they be standardized, and how do we make this work most efficiently? So that at the point, whatever that point is, at which a third party or a first party, the company itself, is going to make representations about what this system is and how trustworthy it is, do we have the tools? We're describing this as ecosystem facilitation: are there the personnel, the resources, the data access, et cetera to make that work?
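For readers unfamiliar with the artifacts Goodman mentions, the following is a minimal sketch of the kind of information a model card typically gathers, written as a plain Python structure. The field names follow the general model card idea rather than any mandated schema, and every value is invented for illustration.

```python
# Illustrative only: a minimal model-card-style record as a Python dict.
# Field names follow the general "model card" concept; they are not a
# standardized or mandated schema, and all values here are invented.
example_model_card = {
    "model_details": {
        "name": "example-text-model",   # hypothetical system
        "version": "0.1",
        "developer": "Example Lab",
    },
    "intended_use": "Drafting and summarizing short business emails.",
    "out_of_scope_uses": ["medical advice", "legal advice", "hiring decisions"],
    "training_data": "Description of sources, collection dates, and known gaps.",
    "evaluation": {
        "benchmarks": ["internal accuracy test set"],
        "disaggregated_results": "Results reported by language and domain.",
    },
    "known_limitations": ["may produce confident but incorrect statements"],
}

# A third-party auditor or certifier could at least check such an artifact
# for completeness against whatever fields a standard ends up requiring.
required_fields = {"model_details", "intended_use", "training_data", "evaluation"}
missing = required_fields - example_model_card.keys()
print("missing fields:", missing if missing else "none")
```

Whether artifacts like this should be mandatory, voluntary, or standardized is exactly the open question the request for comment raises.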

Justin Hendrix:

There are a lot of interesting sub-questions in here, everything from the timing of audits and assessments on through to the different factors that should inform how those assessments operate. And an interesting question about the degree to which maybe government disclosures or assessments are different from private sector ones.

Ellen Goodman:

I mean, I think we've seen two examples so far, one is a draft and one's enacted. The draft one is the Washington, DC algorithmic audit requirement. Again, it's just a draft, but it's a requirement that AI systems be audited to the extent that they affect important life decisions, so sort of high risk ones. And my understanding is that that audit would be disclosed only to the regulator, and then the regulator would do what the regulator would do, as opposed, I think, to the enacted New York City Local Law 144, which is for hiring algorithms, where an audit would be made public. So yeah, you could imagine very different requirements. If it's just going to be for the regulator, then that obviously requires regulatory capacity that we need to think about and make sure is there ... I think it's true for both of those laws, actually, that regulatory capacity is a really tricky aspect of all of this.

Justin Hendrix:

IP is another one of those hot button issues right now when it comes to artificial intelligence. What are you looking for here?

Ellen Goodman:

I mean we're really sort of looking for both industry and civil society as well as academics, to help us understand what the landscape looks like and what the sensitivities are. I mean, obviously there are a lot of sensitivities around trade secrets and this is not new. We saw this all the way through the social media battles over access and transparency and accountability. And so we expect to receive a lot of comments that transparency, access to data can only go so far because these are critical trade secret and intellectual property rights that need to be protected. And we also expect to get comments hopefully about where there are mechanisms to sort of deal with that, how you can maybe create synthetic data or kind of workarounds. And we've all heard about these audits that have NDAs attached to them and so they're less useful because we can't sort of peek under the hood of the audit. So anyway, I guess what we're hoping for is just more information so we can begin to make recommendations about that issue.

Justin Hendrix:

There's a specific question in here about potentially what types of activities government should fund to advance a strong AI accountability system. Are there proposals out there that you've seen already that sort of fit the bill of this question or can you say anything more about what types of proposals you're looking for here?

Ellen Goodman:

Yeah, one thing we've heard a lot, especially from academics, and I'm talking about computer scientists and data science academics, is that, let's just say we're focused on the work of audits, assessments and certifications, there's just not the personnel to do that at the scale at which it's imagined. If every jurisdiction had, let's say, something like the EU AI Act, there would be conformity assessments and audits, both pre-market and post-market surveillance, as it's called. There aren't the people to do that. That kind of work right now is not that incentivized in academic programs.

And so one could imagine, and I'm sure there already are NSF grants, but one could imagine a much more robust intervention by the federal government, also in the form of prizes, to make that work be more highly rewarded in academia. Another idea, and of course Rumman Chowdhury did this at Twitter, is the fairness bounty. So you could imagine prizes and bounties that were connected to some sort of accountability regime. Now, I very much doubt that those are a replacement for other approaches ... I mean, those are soft law approaches. I don't know that they're a replacement either for self-regulation or for regulation, but they're definitely an ancillary tool that is used in other areas and that we should be thinking about.

Justin Hendrix:

I mean, there are 34 discrete questions with a lot of sub-questions in this request. To what extent are you looking at the EU AI Act, and some of the machinations that they've had to go through in that process to date, to inform this inquiry or some of the other thinking you're doing?

Ellen Goodman:

I think it's definitely relevant. So we ask about lessons learned. I think the EU AI Act takes this horizontal approach. It's really kind of a product safety approach to AI, which is very different, I think, from what we can expect in the US, which tends to be a much more vertical, sectoral approach. So, as we say in law, it's not precedential, but it's informative. But to me, one of the most interesting questions in the EU AI Act, which is relevant for all AI governance, is that there's a kind of model of standard setting, and then you build to the standard or you audit to the standard.

But it's now well recognized, and you see this all over the NIST AI Risk Management Framework, that these are socio-technical systems, and for a lot of our normative goals and/or trustworthy AI components, it's hard to imagine that they could be reduced to a standard. And even if they could be, even if one of the standard-setting organizations could create a standard for how accurate a response to a prompt in a generative model should be, like what's an acceptable degree of accuracy, those standard-setting bodies are not hugely democratic or democratically accountable.

And so there's a problem there, a mismatch, and this is being recognized in Europe; I'm not saying they don't realize it's an issue. But we ask some questions about the alignment of some of the AI governance ideas, and also obviously of AI systems themselves, with democratic principles. So that's one feature of the EU regulatory regime I'm particularly interested in.

Justin Hendrix:

One of the things that the EU AI Act is doing is trying to make distinctions, of course, between high risk and lower risk categories of AI. I assume the answer to some of the questions you just raised really just depends on the application. In some cases we're probably fine with language models spitting out responses that are accurate less than half the time, if it's in a creative field or something along those lines. But if it's informing some kind of crucial information system, then 99 percent or better is going to be necessary.

Ellen Goodman:

Yeah, absolutely. And so I guess I could say two things about that. One is I think that's in some ways the virtue of a vertical use case specific approach to AI governance.

On the other hand, it also raises ... This is one of the reasons why the question of when you assess the system, whether it's an audit or a certification, is so hard: if we're talking about foundational models, we don't really know the use case, and the use case changes. I mean, it's true that OpenAI has, I think, in its terms of service: don't use this for anything that's too high stakes, don't use it in high risk applications. But we know that it will be. So I think that's absolutely true, and the right way to think about it is in a risk-based way, but it's not quite clear how that maps onto the life cycle or stage of development of an AI system.

Justin Hendrix:

One thing you do give a nod to is the importance of open source potentially in the AI ecosystem. I just want to ask you a little bit about that. Are there kind of complexities to thinking about open source models or open source implementations of AI systems that you think are going to be difficult for regulators to grapple with?

Ellen Goodman:

Well, I think if we're focused on AI accountability policies, you can see pros and cons. So open source really helps out with the opacity, the data access, the getting under the hood.

On the other hand, open source models are much more adaptive and agile and changing. And so when you're talking about accountability, it puts a lot of stress on the question of who. Who is accountable when it's open source and it's being modified? I guess the same thing is true even if it's not open source but it's an adaptive, as opposed to a locked, model. We'll be very interested in hearing responses on this, because someplace among those 34 questions we ask: what is the respective role of, I think we say, courts and legislatures and regulators and industry standard-setting or other industry bodies?

And one thing one might say is that the question we were just discussing, about who is accountable, is something the courts will decide. That's just a liability question, and we've always had complicated questions of contribution to liability. And that might be part of the answer, but I suspect, as we've seen in other areas of tech ... I mean, first of all, there's the whole question of whether or not Section 230 applies here, especially when you're talking about ... I know you had a whole fascinating series on that. So we don't know exactly how liability is going to work in this context, but some of our questions are designed to provoke a discussion of that.

Justin Hendrix:

I think it's fair to say there's a sense in the technology community that the technology is moving very, very quickly, and that there are profound risks to a lot of different aspects of society, to the economy, to democracy more generally. And so far from the federal government, we've got a blueprint for an AI Bill of Rights, we've got a risk management framework, and we'll have this advisory result. Do you think that policymakers can catch up, or that we can move quickly enough? I know that you are obviously pushing ahead as fast as you can in your role, but are we going to be stuck in this slight mismatch between the pace of technological change and the machinations of government for a while?

Ellen Goodman:

I mean, undoubtedly there's a pacing problem, and there's been a pacing problem with law since we entered the digital age. I don't think we've solved it, and I don't think we can solve it without a huge amount of innovation in our policy, in our politics. So I think we definitely have a pacing problem. On the one hand, there's a trope in the press, which I don't think is fair, which is that Congress doesn't get it and the regulators don't get it and they're woefully tech ignorant. That might have been true at one point, but there are actually dozens of proposed pieces of legislation that weren't developed for generative models but that apply, which is the beauty of law: you don't have to have in mind exactly the tech that's coming down the road. Now, none of those passed, and it's over-determined that none of those passed. But I don't think law is helpless. I don't think the pacing problem needs to be as bad as it has been. But I also think invariably there is a pacing problem.

Justin Hendrix:

Well, comments are open until June 12th, 2023. So if folks want to weigh in on this one, they have the opportunity to do so. The listing is at the Federal Register; I'll include it in the show notes so that folks can find it. Ellen Goodman, can we have you back to tell us what you come up with after this process is complete?

Ellen Goodman:

Yeah, I'd love it. And Justin, thank you so much for all that you do.

- - -

Justin Hendrix:

Scholars researching human-computer interaction have been studying chatbots for decades. My next guest draws on the work of Byron Reeves, Clifford Nass, and Sherry Turkle to explore the risks posed by the conversational style of today's systems, such as ChatGPT, which attempt to emulate people as closely as possible.

Michal Luria:

Hi, I am Michal Luria. I'm a researcher at the Center for Democracy and Technology.

Justin Hendrix:

Michal, tell me a little bit about your research and how you got interested in chatbots.

Michal Luria:

Sure. So I started working on social agents from the perspective of robots, in a research lab that I was part of in Israel, trying to understand the empathy impacts that physical objects resembling human characteristics have, how they can impact people's behaviors or feelings, and mostly how people feel empathy towards a thing that is basically a technology that also exists in physical space. So that's kind of where I began. But over the years I went into the broader spectrum of social agents, which also includes conversational agents, chatbots, and more robots. What all of these have in common is the socialness that they portray in the way that they interact with people. So it's technology with this extra layer of socialness that really makes things different. And I started from being very curious about how that could be interesting and helpful and support all kinds of causes or goals.

Later, I came to a more skeptical perspective, because this socialness has these powers over people and how they interact with technology, and that could be used in all kinds of harmful ways. So I think we need to be more cautious when introducing social technology.

Justin Hendrix:

So there are plenty of headlines about this stuff right now. You point to the much discussed Kevin Roose column, his conversation with Bing's chatbot. Of course, lots of folks are interacting with ChatGPT in different ways, exploring how to engage with it to try to get the most out of it, both for work and for entertainment. What are the risks of these interfaces as you see them and why is this moment with ChatGPT different?

Michal Luria:

I think the Kevin Roose example is a good one, because Kevin Roose is an extremely informed individual who knows so much about the technology; he's a tech journalist. So really, if anyone should know not to be overwhelmed by this kind of interaction, it's someone in that role. But that really exemplifies why the knowledge you have doesn't really matter. The way that social technologies, or technologies that have a social element to them, work is that they impact our brain in a way that we can't really control, even if we know more and we understand that this is technology. I think this can be riskier in the context of more vulnerable groups, like teenagers who are more susceptible to being harmed by technology. And just the way that technology can have that social impact on us can be dangerous, because it could really go in many different directions.

It's not only the content itself that the technology can produce, which can be harmful in many ways and I think has been discussed in many different venues. It's also that just the fact that it's social can have mental impacts on people and create these conversations that can be really disturbing. And I think the reason why it's different with ChatGPT than with previous technologies is twofold. First, I think ChatGPT is very, very impressive in how natural the language seems; it seems reliable. A lot of times people interpret chatbots as dumb and not sophisticated, having a bunch of grammar issues and making mistakes, and so it's easier to dismiss them.

But when a technology speaks so well and understands the flow of conversation from one topic to another, it really creates a sense of nuance, of understanding the conversation and the full context of it. And it creates the sense of an entity that knows what's happening, which is not the case, but that's the sense that it creates in people. And that's kind of the way that we naturally perceive the way the conversation goes. And that's why I think it's different than what we've had so far.

So with, let's say, Alexa, conversations are single instances. Every conversation begins and ends with one exchange, with one person and this one entity. But here there's a more continuous interaction. People go back and forth and describe what they didn't like, or the conversation builds up one sentence after another. And so that really makes a huge difference, I think, in the way that people understand the interaction and understand what's happening on the other side of the conversation.
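To make the contrast Luria draws concrete, here is a minimal sketch in plain Python of a single-turn exchange versus a chat that accumulates conversational context. The role-tagged message list mirrors how current chat interfaces are commonly structured; the reply_to function is a hypothetical stand-in for whatever model produces the answer.

```python
# Hypothetical stand-in for whatever model or service produces a reply.
def reply_to(messages: list[dict]) -> str:
    return f"(reply conditioned on {len(messages)} message(s) of context)"

# Single-turn, Alexa-style: each request stands alone, nothing carries over.
def single_turn(user_utterance: str) -> str:
    return reply_to([{"role": "user", "content": user_utterance}])

# Multi-turn, ChatGPT-style: every reply is conditioned on the whole
# accumulated history, so the conversation "builds up one sentence after another."
def multi_turn(history: list[dict], user_utterance: str) -> str:
    history.append({"role": "user", "content": user_utterance})
    reply = reply_to(history)  # sees all prior turns, not just the latest one
    history.append({"role": "assistant", "content": reply})
    return reply

history: list[dict] = []
print(multi_turn(history, "Plan a weekend trip."))
print(multi_turn(history, "Actually, make it kid-friendly."))  # refers back to turn one
```

The second request only makes sense because the first turn is still in the history, which is the continuity Luria argues changes how people perceive what is on the other side of the conversation.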

Justin Hendrix:

What you're arguing for here are boundaries, essentially some sense of the clear role that the AI is meant to play in a particular conversation. But I'm struck, when I interact with these things, by the almost declarative, or in some cases stentorian, tone the thing often takes, which makes it seem perhaps more confident in its output than it should be.

Michal Luria:

Yes, in the Wired article I argue for creating more boundaries and social roles for these kinds of agents. The reason I think that could be a good way to design AI agents is that this technology is trying to emulate human conversation, but the thing humans have that this kind of agent doesn't is that we always have a role, some kind of social role. It's not always fully defined. Let's say this conversation we're having has a professional tone to it, but maybe more friendly. We have these ideas of what kind of conversation we're having, who the participating individuals are, and what the expected content is, more or less. And if one of us said something extremely outside of that scope, it would be alarming and weird, and it would most probably just stop the conversation. With ChatGPT, you can do whatever you want and say whatever you want. You can take it into a more informational conversation, or you could ask it personal questions about how it perceives the world, like the conversation with Kevin Roose.

And so I think not having these boundaries is problematic, both from the designer side and from the user side. From the designer side, it's problematic because it's very difficult to make sure your technology is responsible when you don't have any boundaries on what the interaction will be. Now, they do have some boundaries. They put some boundaries around things like giving people tips on how to harm themselves or how to harm others, things like that. But we've seen in previous articles that there are workarounds for many of those. So these red lines work, but not fully, and they still don't really encapsulate what the interaction should be for a particular conversation. That's also problematic for the user, because the user doesn't really come with a set of expectations. And we know from prior research that expectations really help a technology be effective and the interaction be valuable for the user.

Having agents with a more specific role or a predefined goal they try to fulfill could help users come with adjusted expectations and know what to anticipate from the conversation they're going to have. This can vary depending on the implementation or the desired outcome for a particular instance of ChatGPT, but having this kind of open-ended, potentially do-it-all AI is difficult to get right off the bat. I think that because this is a new technology, we need to start slow, maybe try to define boundaries, social boundaries, more clearly, and then see how this develops and how the technology moves along. That could help people criticize it and try to understand the societal impacts in a more manageable way than the way it is now.

Justin Hendrix:

What you've just said sounds very reasonable to me, but it's totally counter to what OpenAI and other Silicon Valley firms are trying to do. They don't want to have particular agents that are bounded for particular use cases with particular rules around those interactions. They're trying to create generalized platforms that can be used by anybody in any circumstance. And it's incredibly difficult to predict exactly what a person might like to say to one of these systems and what types of response are best going to satisfy their prompt or their query. So I don't know, how do you square your thinking with the underlying modus operandi of Silicon Valley?

Michal Luria:

I think Silicon Valley has a lot of visions that are maybe problematic from other people's point of view. So this vision of a do-it-all entity, an AI agent that can do it all for you, is okay, but even when you say, "Oh, an AI that can do it all," people usually have a specific role in mind, I think. Let's take the example of Iron Man's robot, Jarvis. That's a vision that Silicon Valley people like, I'm guessing. That robot still has somewhat of a defined role. It has some boundaries in what it does or doesn't do. It's a helpful assistant, it helps with physical stuff because it's a robot, it's super attentive, it has these personality characteristics. So in the end, it is still within a specific social role. It's okay for OpenAI to want people to innovate with an AI in all kinds of ways, but they can still provide more ways of doing that in a structured format.

So one thing that I talk about in the article is that OpenAI introduced the system input in ChatGPT-4, where you can add some high level description of what the interaction should be. There was a workaround that people did with ChatGPT-3: let's say they wanted a recommendation for where to travel. You couldn't ask for that directly, but you could say, "Pretend that you're a travel agent in a play. What would you recommend for a vacation for a family of four?" I think that in ChatGPT-4, the system space is supposed to provide a designated area where people can put that kind of input.

And I think that it's a start toward giving some high level guidelines for what this agent should be doing, or what social role it's going to fulfill in the next conversation. And I think more can be done in that direction to make the way people use this technology more constrained and more gradual.
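As an illustration of the mechanism Luria describes, here is a minimal sketch using the OpenAI Python client as it existed around the time of this conversation. The travel agent role, the boundary wording, and the model name are illustrative assumptions, not anything prescribed in the interview.

```python
# Minimal sketch: giving a chat model a bounded social role.
# Assumes the OpenAI Python client circa early 2023 (openai.ChatCompletion);
# the role description, boundaries, and model name are illustrative only.
import openai

# GPT-3-era workaround: the role is smuggled into the user's own prompt.
role_play_workaround = [
    {"role": "user",
     "content": "Pretend that you're a travel agent in a play. What would you "
                "recommend for a vacation for a family of four?"},
]

# System-message approach: the role and its boundaries are declared up front,
# in a designer-controlled slot, separate from whatever the user later asks.
bounded_conversation = [
    {"role": "system",
     "content": "You are a travel-planning assistant. Only discuss destinations, "
                "logistics, and budgets. Decline personal, medical, legal, or "
                "open-ended philosophical questions."},
    {"role": "user",
     "content": "What would you recommend for a vacation for a family of four?"},
]

response = openai.ChatCompletion.create(
    model="gpt-4",  # assumed model name
    messages=bounded_conversation,
)
print(response["choices"][0]["message"]["content"])
```

The point of the sketch is simply that the role lives in a dedicated slot the designer controls rather than in the user's own wording, which is closer to the kind of predefined social role Luria advocates.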

Justin Hendrix:

So if folks at OpenAI, or perhaps at another one of the companies working on generative AI systems, were to read your Wired column and decide, "Michal is going to be our head of interaction design. We're going to give her a call and bring her in and put her in charge," what do you think would be the first thing you'd do?

Michal Luria:

Research. It's not a solved question. I think there is a need, which is providing more structure to these interactions, and I think there is more to be done to understand exactly what that structure should be and how it should be rolled out. If OpenAI is going in the direction of selling its technology to other companies, and other companies will be able to implement it in all kinds of ways that seem right for each particular company, then I think it would make sense to introduce some way of defining boundaries or defining a social role. And I'll give one example, which I think is a good example for this, which is BloombergGPT, where they tried to create an AI that would be some kind of financial advisor. And so it was trained only on financial data.

And so I think this is a really good example, because Bloomberg had a goal. They wanted an AI agent that can give financial advice, and accordingly they designed the system, they designed the input data, they probably designed the way that it converses and the way that it interacts with people. Once you have that idea, you can use the technology in a very intentional way. And I think it's much safer to do it that way than to say, "Oh, here's an AI that can do anything. Just throw things at it and let's see what happens," because we know that hasn't worked well in the past with other technologies that were really groundbreaking. Maybe there's space to open it up later on and to provide more flexibility in what can be done and what the social role is. But I think because we don't know what the impacts are going to be, it's better to start slow and to define the different AIs one step at a time.

Justin Hendrix:

Well, let's hope you get that phone call.

Michal Luria:

I'm not going to be sitting next to the phone waiting, just so you know.

Justin Hendrix:

Well, we'll be waiting for your next column in Wired. Thank you so much for speaking to me.

Michal Luria:

Thank you Justin. It was great being here.

Authors

Justin Hendrix
Justin Hendrix is CEO and Editor of Tech Policy Press, a new nonprofit media venture concerned with the intersection of technology and democracy. Previously, he was Executive Director of NYC Media Lab. He spent over a decade at The Economist in roles including Vice President, Business Development & ...
