Human-in-the-Loop Systems Are No Panacea for AI Accountability

Recent research and reporting by journalists show that automated decisions can lead to discrimination, pre-encoded bias, and reinscription of the past. Automated decision systems nudge police towards over-policing certain areas based on biased historical data. Automated bail hearings replace considered judgements with computerized systems. Automated lending systems may violate equal opportunity legislation.

What is it about automated decisions that leads to this wide variety of issues? For some, the problem is that automated decisions lack scrutability. Scrutiny requires explainability and accountability. Either a party must speak on behalf of the algorithm and faithfully explain how it made a decision, or the algorithm must be transparent and explainable enough to speak for itself. Essentially, an algorithm must be able to communicate with its questioners and engage in the kinds of justification humans naturally offer one another.

In response to the rapid proliferation of automated decision systems across healthcare, corporations, and even government agencies, politicians are asking how to achieve such transparency. Some proposed solutions include auditing and automated safeguards. Others try to program different ethical thinking strategies into machines. Privacy regulation and some mechanisms in antitrust law offer another route towards protecting data and thereby protecting consumers from automated decisions.

Lawmakers are introducing solutions. In the US, the proposed Algorithmic Justice and Online Platform Transparency Act would introduce certain transparency and accountability mechanisms for automated systems, specifically targeting discriminatory algorithmic practices. Across the pond, Article 22 of the European Union’s General Data Protection Regulation (GDPR) grants citizens the “right not to be subject to a decision based solely on automated processing” when the decision may have a significant effect on a user.

The European law has a catch worth exploring, however, because it reveals a misconception about how human judgment is paired with automated systems. While there is no guidance on what constitutes a significant effect or how to judge the quality of ex post human review, the law effectively mandates what is called “human-in-the-loop” automation by giving a user “the right to obtain human intervention” in a decision taken by an automated system. The human-in-the-loop concept refers to inserting a human between the machine and the outcome of its function in the world. Some say this hybrid decision-making model preserves human agency and accountability in a way that fully automated systems do not. Humans are supposed to act as overseers of machines. But does it really work that way?

For example, content moderators for social networks are humans-in-the-loop. Social networks have complex AI systems that screen content for possible violations of their code of conduct. When the AI flags a post, it may pass the post to a human for review. Notice that two machine decisions precede and constrain the content moderators’ control: which posts the AI system flags as suspected violations, and which of those are removed immediately rather than sent to content moderators. Content moderators must evaluate the post presented to them and check a series of boxes corresponding to removal policies. Content moderators are subjected to violent and extreme media, pressured into working faster to keep up with the mountain of content, and systematically silenced for speaking out. While it is hard to tell what the internet might be like without the extraordinary work of content moderators, does their human review of problems surfaced by AI really solve the problems of automated decisions? No, because it misses the issue of scale in an accelerating digital world, and the lack of power that human reviewers have in practice.

First, human-in-the-loop review cannot keep up with its automated counterpart, and cannot make up for the volume of problems automation fails to identify. While computerized systems make decisions at increasingly large scales, human review will forever be at a human scale. It’s a futile endeavor. If there were enough workers to review every decision carefully enough to justify it as their own, there would be no need for automation: the reviewers could make those decisions to begin with.

The more likely case is that automated decision-making’s true ambition is to reduce human labor and the skill level needed for decision oversight. Human-in-the-loop is a balancing act between the aims of automation and authentic human reasoning. Given the history of tech companies choosing profits over other concerns, they will push the scale significantly towards automation, reducing the oversight power of human reviewers to a minimum whenever possible.

Second, human-in-the-loop review shifts the burden from the tech companies to decentralized reviewers with little power. To avoid the consumer protections outlined in GDPR, companies merely need to institute human review departments, rather than seriously consider the use of a technology and its risks. This creates a compliance culture where harms are mitigated, but never halted.

The Facebook Papers highlight that high-quality research into polarization and extremism is not, on its own, enough to force reform: system review by itself does not produce structural change. Research showing that certain algorithms propagate extremism went to Mark Zuckerberg’s desk. He ultimately decided not to make the changes needed to decrease polarization or extremism on the platform. If we were to focus on the individual researchers at Facebook and demand that they produce more and better research, or demand more humans reviewing and monitoring the system, the structural changes would still never happen. The mere creation of a research or review office is not the road to the kind of social network citizens wish for.

Human-in-the-loop reviewers are no different from the Facebook researchers. They are stuck in a structure pre-determined by the tech firms that employ them. While research scientists are afforded flexibility in their papers, human reviewers are given strict instructions on how to code and tag decisions to fit a policy. Humans-in-the-loop are expected to take on the responsibility and authority for decisions, but they are not given any power. Those with power, the executives and technologists who operate the automated system in the world and design the human review process, ought to bear the burden of accountability.

Of course, the human-in-the-loop loophole created by GDPR does not originate from bad intentions; after all, the EU regulation is a strong attempt to safeguard and protect citizens. But the idea fails to achieve its ambition to preserve something uniquely human in decision-making processes. Ultimately, the structures and the barons overseeing them need to be held accountable.

The human-in-the-loop idea and its supporters are chasing after a deeper qualm with an increasingly automated world. People want there to be something human in the coldness of computerized systems. The current idea of human-in-the-loop focuses only on making automated decisions better. But what is unique about humans is not calculative reasoning. Computers are likely much better at this, and generally better at epistemic choices. Would it then make sense to base the rightness or wrongness of automated decisions on the lottery of which human reviewer one gets? We must strive to make automated decisions epistemically right, but the notion of what is right must come from citizens, not from Big Tech or a lowly human reviewer.

The near-miss of human-in-the-loop review stems from an essential human activity: communication. Arguing, convincing, and discussing are fundamental to decision making. They happen everywhere, from any family’s dinner table to heads of state on diplomatic phone calls. We all want humans in the decision-making process because we want to understand and relate to decisions, and we want those decisions to bear authority and be justifiable to one another on a communicative level.

Until automated decisions can deliberate with us on our terms, automated decisions have no place in high-risk environments. Human-in-the-loop review is a tool for reviewing the work of a machine, but it is by no means a framework for addressing the social and political impact automated decision making poses. Human-in-the-loop review does not open up algorithms to the public sphere, where it falls to citizens to decide on the values, norms, and world we want.

In the meantime, we must demand transparent automated decision systems. Automated decision-making can make reasoning explicit and increase transparency, so that citizens can understand and engage with how and why decisions are made. When systems are made public and opened to debate and deliberation about how they operate, public scrutiny and institutional powers can protect citizens from immoral, unjust, and illegal decisions. That responsibility should not fall to the disempowered human reviewer, who is rarely ‘in the loop’ on the decisions that count.