The Denominator Problem in AI Governance
Michael A. Santoro / Apr 24, 2026
Alexa Steinbrück / Better Images of AI / Explainable AI / CC-BY 4.0
With the European Union's AI incident reporting mandate taking effect in August and US states from New York to Colorado enacting their own AI reporting requirements this year, a basic measurement problem threatens to undermine all of these efforts before they begin. The OECD AI Incidents and Hazards Monitor catalogs thousands of documented harms across domains as varied as autonomous vehicles, deepfakes, algorithmic discrimination, and health chatbots.
The AI Incident Database, MIT FutureTech, and Arcadia Impact are collaborating on a project that attempts to move beyond raw incident counting toward something more analytically useful: classifying AI incident types into phases—from rare occurrence through rapid expansion to endemic patterns—by estimating factors such as the rate of harm and the scale of deployment. I became aware of what I call the denominator problem through my advisory work on this project, and I believe it names the single most important unsolved measurement challenge in AI governance today.
The concept is elementary. A numerator is the count of observed harms—incidents, adverse events, failures. A denominator is the total number of opportunities for those harms to occur. A rate is the numerator divided by the denominator. Without a denominator, a numerator is uninterpretable. If reported AI harms double in a year, that could mean systems are failing more often, that reporting has improved, that detection is better, or simply that deployment has doubled. Each scenario has radically different policy implications, and without a denominator, they are indistinguishable in the data.
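The arithmetic behind that indistinguishability can be made concrete in a few lines. This is an illustrative sketch with invented numbers, not data from any real reporting regime: if reported harms double in a year but deployment also doubles, the underlying rate has not moved at all.

```python
def harm_rate(incidents: int, opportunities: int) -> float:
    """Rate = numerator (observed harms) / denominator (opportunities for harm)."""
    return incidents / opportunities

# Year 1: 50 reported incidents across 1,000,000 AI-mediated interactions (hypothetical).
year1 = harm_rate(50, 1_000_000)

# Year 2: reported incidents double to 100 -- but deployment also doubled.
year2 = harm_rate(100, 2_000_000)

# The "alarming" doubling of the numerator is no change in the rate at all.
print(year1 == year2)  # True
```

Without the denominator, only the numerator (50, then 100) is visible, and the doubling looks like an escalating safety crisis rather than a constant failure rate on a growing base.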
Where the denominator works – and where it breaks down
Consider autonomous vehicles, the one AI domain where the denominator problem largely does not exist. The numerator is straightforward: crashes, property damage, injuries, fatalities. The denominator is equally accessible: miles driven, number of vehicles in operation, hours of autonomous engagement. Mandatory reporting regimes required by the National Highway Traffic Safety Administration, along with vehicle registration databases and telematics data (e.g., GPS and braking patterns), make both sides of the equation observable. Regulators can calculate meaningful rates—crashes per million miles driven—and compare them across companies, geographies, and time periods. This is what functional safety measurement looks like.
In nearly every other AI domain, the denominator is elusive. Consider deepfakes. They may be spreading more widely across the internet and social media, but what is the denominator? The total number of synthetic media files generated? The number of people exposed to them? The number of contexts in which a deepfake could cause harm—financial fraud, nonconsensual imagery, election interference? There is no registry of generative AI use, no mandatory reporting of synthetic content production, and no agreed-upon unit of exposure. A rise in deepfake incidents could mean the threat is escalating, that detection tools are improving, or that media attention is driving more reports. Without a denominator, the data cannot distinguish between these scenarios.
AI-driven hiring presents a similar problem, with an inverted twist. According to a Harvard Business School report, an estimated 99% of Fortune 500 companies now use applicant tracking systems, and more than 90% of employers using such systems rely on them for first-cut decisions. Documented cases of algorithmic discrimination in recruitment are accumulating, most prominently in litigation against platforms like Workday. Yet knowing that nearly every large employer deploys these tools does not answer the denominator question. What we need is not a count of adopting companies but a count of the individual hiring decisions AI actually shapes — and that number is unknown. Employers are not required to disclose when or how AI participates in candidate screening; AI hiring vendors do not report volume; and no regulatory body tracks automated hiring decisions at scale. Ubiquity is not the same as visibility.
The hardest case: AI in healthcare
The healthcare domain is where the denominator problem is most consequential because the stakes are highest. Although the institutional infrastructure for measurement is most developed, it still falls short. AI systems are already influencing diagnosis, triage, and treatment recommendations across health systems. Hospitals track adverse events. Governance frameworks require documentation of individual clinical decisions. But current practice stops at counting events. No major regulatory body has yet established a methodology for converting those counts into rates—adverse outcomes per AI-assisted clinical interaction, for instance.
What, exactly, should the denominator be? There are several plausible candidates, and each draws the boundary in a different place.
At the most granular level, the denominator could be the model inference — every instance in which an AI system generates an output, whether a diagnostic prediction, a risk score, or a treatment flag, regardless of whether a clinician ever sees or acts on it. This would capture the full volume of algorithmic activity, but a single hospital stay can trigger hundreds of such outputs, most of which operate invisibly in the background. Tracing which specific inference by an AI model might have contributed to a given patient outcome requires a level of documentation that current health IT systems do not support.
A step up in aggregation is the AI-assisted clinical decision — every documented instance in which an AI system generates a recommendation and a clinician responds. This ties the denominator to a moment where AI actually shapes care, and in principle, it is measurable: if electronic health records consistently flagged AI-informed diagnoses and treatment recommendations, hospitals could count them directly. Today, most EHRs do not, and these decisions happen routinely without being captured as a distinct, countable category — but that gap is, in theory, closeable by better logging. Once closed, the denominator problem for this candidate is solved. A separate problem remains, and it is worth naming as such: a rate of adverse outcomes per AI-assisted decision, even if cleanly measured, cannot by itself tell us whether AI caused those outcomes. That would require controlled comparison against unassisted care — a question of causal inference, not of counting.
Coarsest of all is the patient encounter involving AI—every clinical encounter in which an AI system is active in any capacity, whether or not the clinician is aware of it. This is the easiest to approximate from existing data, but it is also the bluntest: it would count encounters in which AI played a trivial role the same as those where it drove the central clinical judgment.
Each of these denominators would yield a different rate for the same set of harms, and each implies a different theory of where responsibility lies.
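The point can be shown numerically. In this hypothetical sketch, every figure is invented for illustration; the three denominators roughly track the aggregation levels described above, and the numerator is held fixed.

```python
# One hospital, one month, the SAME 12 adverse events counted three ways
# (all figures hypothetical, for illustration only).
adverse_events = 12

denominators = {
    "model inference":               480_000,  # every AI output, whether or not a clinician saw it
    "AI-assisted clinical decision":  35_000,  # clinician documented a response to a recommendation
    "patient encounter involving AI":  9_000,  # any encounter in which an AI system was active
}

# The same harms yield rates that differ by more than an order of magnitude.
for name, n in denominators.items():
    rate_per_100k = adverse_events / n * 100_000
    print(f"{name}: {rate_per_100k:.1f} adverse events per 100,000")
```

Running this, the per-inference rate comes out around 2.5 per 100,000 while the per-encounter rate exceeds 130 per 100,000: the choice of denominator alone determines whether the same hospital's record looks reassuring or alarming.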
Measurement, bias and the stakes ahead
The healthcare domain is the hardest case for the denominator problem — hardest not only because the stakes are highest, with AI-influenced decisions determining whether patients live, recover, or are harmed, but also because the structure is the most complex: AI touches care at multiple levels simultaneously, and no single denominator captures all of them.
Current federal healthcare AI rules — including the Department of Health and Human Services' (HHS) Section 1557 final rule (45 C.F.R. § 92.210), the HTI-1 predictive decision support criteria (45 C.F.R. § 170.315(b)(11)), and the Food and Drug Administration's Good Machine Learning Practice principles for medical devices — are disclosure regimes. They require covered actors to describe training data and bias management; they do not mandate that one dataset or representational method be used over another. Even that disclosure architecture is now the subject of a political fight: HHS's pending HTI-5 proposed rule would roll back core HTI-1 elements, including the AI "model card" disclosures the Biden-era Office of the National Coordinator for Health IT had required.
Whichever way that debate is resolved, a denominator stratified by race, gender, insurance status, and clinical pathway becomes more essential, not less — because disclosure without rate-based measurement cannot be audited for fairness, and a weakened disclosure regime leaves even less to audit.
In healthcare, defining a denominator without requiring stratification by race, gender, insurance status, and clinical pathway would constitute algorithmic bias in a new bottle—not the overt kind that early AI fairness work made legible, but the structural kind embedded in what gets counted and how. This stratification issue is not confined to healthcare. For example, AI-enabled recruitment may be more efficient and surface higher-quality candidates, but research has found that this benefit is accompanied by persistent disparities by race and gender that aggregate performance metrics can mask.
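The masking effect is easy to demonstrate with invented numbers. In this hypothetical sketch, an aggregate harm rate looks uniform and acceptable while one subgroup experiences a rate nearly seven times higher — exactly the pattern an unstratified denominator would hide.

```python
# Hypothetical AI-screening outcomes, stratified by group (all numbers invented).
strata = {
    "group_a": {"harms": 10, "decisions": 90_000},
    "group_b": {"harms": 20, "decisions": 10_000},
}

total_harms = sum(s["harms"] for s in strata.values())          # 30
total_decisions = sum(s["decisions"] for s in strata.values())  # 100,000

# Aggregate rate: 30 per 100,000 -- looks uniform and unremarkable.
aggregate = total_harms / total_decisions

# Stratified rates tell a very different story.
for name, s in strata.items():
    rate = s["harms"] / s["decisions"]
    print(f"{name}: {rate / aggregate:.1f}x the aggregate rate")
```

Here group_b faces roughly 6.7 times the aggregate rate, yet any reporting regime that counts harms against an unstratified denominator would report a single, reassuring number. This is why the choice of what to count, and against what, is itself a fairness decision.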
The urgency of the denominator question is increasing. The European Union AI Act’s incident reporting provisions take effect in August 2026. The Organisation for Economic Co-operation and Development is developing a common reporting framework for AI incidents. In the United States, the Department of Health and Human Services is shaping AI reporting requirements for medical devices through the Food and Drug Administration. None of these efforts has yet solved the denominator problem.
Insurers and corporations are converging on the same question from multiple directions. The Doctors Company, one of the largest physician-owned malpractice insurers in the United States, has flagged AI-related liability as an emerging frontier of risk in the medical malpractice field. Other major insurance carriers, including AIG, Great American, and WR Berkley, have requested regulatory approval to limit their exposure to AI-related claims. A Harvard Law School Forum analysis documents a wave of AI-specific directors and officers (D&O) liability insurance exclusions and disclosure-related securities claims now exposing directors and officers to personal liability — pressures that will ineluctably reward governance frameworks able to demonstrate measurable safety outcomes and penalize those that cannot. Whether the governance community is ready or not, these pressures will demand a resolution of the denominator problem. In healthcare, as in other high-stakes domains, this will require a shift from asserting "we followed protocols" to proving safety rates.
The denominator problem is not a technical detail. It is the foundation on which every subsequent question about AI safety, accountability, and equity depends. Across domains—autonomous vehicles, deepfakes, hiring, healthcare—the pattern is the same: we are counting harms without measuring the opportunities for harm, and attempting to draw conclusions from data that cannot support them. The organizations defining measurement standards today are making choices that will shape who is held responsible for AI failures and who bears the cost when those failures go unaddressed. The question is whether the denominator will be defined in a way that serves accountability and algorithmic fairness.