The Paranoidist | Issue #3 | By Paul Morin | February 21, 2026
Here are two numbers your board has probably seen in the last quarter. And here is what nobody told them those numbers actually mean.
Number one: AI-augmented productivity is up. Way up. The most rigorous study to date, published in the Quarterly Journal of Economics in 2025 by Brynjolfsson, Li, and Raymond, tracked the staggered rollout of an AI assistant across 5,172 customer support agents at a Fortune 500 software firm and found a 14-15% increase in issues resolved per hour, with gains of 30-35% among less experienced workers. A separate set of randomized controlled trials across Microsoft, Accenture, and a Fortune 100 company found a 26% increase in completed pull requests among developers using AI coding tools. The OECD's 2025 review of experimental studies documented average productivity gains of 5-25% across customer support, software development, and consulting. The Penn Wharton Budget Model projects average labor cost savings growing from 25% to 40% over coming decades. The dashboards are green. The quarterly earnings calls are enthusiastic. The headcount projections are coming down.
Number two: The neuroscience of time pressure and cognitive load is unambiguous, and it says the opposite of what those dashboards suggest. When humans operate under compressed decision timelines and sustained cognitive demand, working memory capacity degrades measurably. A 2023 systematic review in Psychoneuroendocrinology by Geissler and colleagues at the University of Trier analyzed decades of laboratory studies on acute stress and working memory, documenting how stress disrupts prefrontal cortex function through noradrenergic and cortisol-mediated pathways. Eye-tracking studies published in Frontiers in Psychology confirm that under time pressure, decision-makers shift from systematic, analytical information processing to heuristic scanning strategies, relying on pattern-matching shortcuts that bypass deliberate evaluation. The degradation is causal, not correlational. Time pressure doesn't just accompany worse decisions. It produces them, through mechanisms that neuroscience can now trace to specific neural pathways.
Now put those two numbers together.
Issue 1 examined why institutional capacity can't keep pace with AI deployment. This is the mechanism by which that failure plays out inside your organization.
We are celebrating productivity gains that are simultaneously hollowing out the last line of defense against AI failure. The entire safety architecture for AI in consequential systems, from healthcare to financial services to cybersecurity to legal adjudication, rests on a key assumption: that a competent human will review the output, catch the errors, and exercise judgment on the edge cases. That assumption was reasonable when the human reviewer had time, context, and a manageable caseload. It is becoming unreasonable at speed, because the same AI systems generating the productivity gains are compressing the conditions under which human oversight occurs into exactly the zone where neuroscience tells us oversight breaks down.
And nobody is measuring it. Not because the data doesn't exist, but because the metrics organizations track (output volume, turnaround time, throughput per employee) are incomplete. They measure production. They don't measure the quality of the human judgment that's supposed to keep production from becoming catastrophe.
The Last Mile Is the Whole Problem
Here is how AI augmentation actually works in practice, as opposed to how it looks on a dashboard.
AI handles the routine. In cybersecurity operations centers, AI-powered SIEM systems monitor network traffic and flag suspicious activity. In financial compliance, AI scans transactions for patterns that might indicate money laundering or fraud. In radiology, AI reads imaging studies and flags anomalies. In emergency departments, AI triages patient data and surfaces potential diagnoses. In each case, the AI does 80% or more of the volume work, faster and cheaper than humans ever could.
The remaining fraction goes to humans. But that fraction is not a random sample. It is often the hardest, most ambiguous, highest-stakes slice of the workload: the cases the AI couldn't resolve, the anomalies that don't fit clean patterns, the edge cases where judgment actually matters. And it arrives at the human reviewer in the same time budget that used to contain the full 100%.
This is the "last mile" paradox, and it is reshaping the cognitive reality of knowledge work in ways that no current risk framework captures.
Consider the specifics. In cybersecurity, the numbers are staggering: according to the 2025 AI SOC Market Landscape report, organizations face an average of 960 security alerts per day, with enterprises of 20,000-plus employees seeing more than 3,000. The Osterman Research Report found that nearly 90% of security operations centers are overwhelmed by backlogs and false positives, while 80% of analysts report feeling consistently behind. The 2025 SANS Detection Engineering Survey found that 64% of SOC teams cited high false positive rates as a common challenge. The human consequence: 70% of SOC analysts with five years or less of experience leave within three years.
In financial compliance, the picture is even more extreme. A PwC Market Abuse Surveillance Survey found that 17 participating banks raised a combined 40 million trade alerts over twelve months. Of those, 99.99% were false positives. Compliance analysts reviewing those alerts are not performing oversight. They are performing a kind of cognitive endurance test in which the signal-to-noise ratio makes genuine judgment almost impossible.
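To make that signal-to-noise problem concrete, here is a back-of-the-envelope calculation from the PwC figures above; the per-bank and per-day splits assume an even distribution across banks and days, which is an illustrative simplification rather than anything the survey reports.

```python
# Back-of-the-envelope signal-to-noise from the PwC survey figures:
# 17 banks, 40 million trade alerts over twelve months, 99.99% false positives.
total_alerts = 40_000_000
false_positive_rate = 0.9999
banks = 17

true_positives = total_alerts * (1 - false_positive_rate)     # ~4,000 genuine alerts in a year
alerts_per_bank_per_day = total_alerts / banks / 365          # ~6,400 alerts per bank per day (even split assumed)
true_per_bank_per_year = true_positives / banks               # ~235 genuine alerts per bank per year

print(f"Genuine alerts across all banks: {true_positives:,.0f}")
print(f"Alerts per bank per day (even split assumed): {alerts_per_bank_per_day:,.0f}")
print(f"Genuine alerts per bank per year: {true_per_bank_per_year:,.0f}")
```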
In healthcare, the pattern takes a different but equally concerning form. AI-enabled triage systems in emergency departments have reduced average radiology report turnaround times from 11.2 days to as low as 2.7 days, and AI tools can reduce radiologist workloads by up to 53% according to a 2025 review in Health and Technology. But the workload doesn't disappear. It concentrates. The radiologist no longer systematically reviews a manageable number of studies. Instead, they evaluate AI-flagged anomalies from a much larger pool, seeing only the cases the algorithm couldn't resolve, in compressed timeframes, with no routine work in between to provide cognitive recovery.
In every one of these contexts, the cognitive load per minute of human decision time has intensified dramatically. But the dashboard shows higher throughput. The quarterly report shows fewer employees processing more volume. The CFO sees margin expansion. Nobody is looking at the variable that actually predicts whether the system will hold: the ratio of complex decisions to available human cognitive bandwidth, and the quality assurance conditions under which those decisions are being made.
The aviation industry learned this lesson decades ago. Autopilot handles routine flight, and pilots handle emergencies and edge cases. But aviation didn't simply hand pilots the hardest tasks and call it efficiency. The industry built an entire discipline around measuring cognitive demands on pilots with granular precision: crew resource management, mandatory rest periods, maximum duty hours under FAA Part 117, sterile cockpit rules under FAR 121.542, workload distribution protocols. Aviation understood that the human in the loop is only as reliable as the conditions under which that human operates.
In AI-augmented knowledge work, we have done none of this. We have handed the hardest cognitive tasks to the human and called it a productivity gain.
The Correlation That Is Causation
Let me be precise about why this matters, because the instinct will be to dismiss it as a "soft" concern, a wellness issue rather than a risk issue. It is not.
The relationship between time pressure and cognitive error is causal, not merely correlational. This is among the most robust findings in cognitive science. Time pressure degrades working memory capacity through measurable neurological mechanisms: elevated cortisol impairs prefrontal cortex function, noradrenergic flooding shifts processing from deliberate analytical pathways to rapid heuristic ones, and sustained cognitive load produces the direct trade-off between processing demands and storage capacity in working memory that Barrouillet and colleagues demonstrated in the Journal of Experimental Psychology. The more time-pressured the processing, the less working memory is available for maintaining the context needed for good judgment.
The practical consequences have been documented across many high-stakes decision environments. In medicine, diagnostic errors in emergency departments are explicitly linked to the "chaotic and high-pressure environment" that "increases the likelihood of these errors, as emergency clinicians must make rapid decisions with limited information, often under cognitive overload," as a 2025 study in Academic Emergency Medicine put it. National trends toward increasing ED patient acuity and comorbidity are intensifying both time pressure and diagnostic complexity simultaneously. According to the same study, ED attending physicians currently devote approximately one minute per patient to electronic health record chart review.
But it is the phenomenon of automation complacency that connects the neuroscience most directly to AI risk. The term was formalized by NASA's Aviation Safety Reporting System as "self-satisfaction that may result in non-vigilance based on an unjustified assumption of satisfactory system state." Parasuraman and colleagues demonstrated in controlled studies that monitoring performance deteriorates after as little as 20 minutes of observing reliable automation. Molloy and Parasuraman found that when automation failures are rare (a single failure in a 30-minute session), detection rates drop significantly if the failure occurs late in the session versus early. The rarer the event that requires human intervention, the worse humans become at detecting it.
Now apply this to AI-augmented workflows where AI handles the routine 80% and humans see only the exception 20%. The human is in precisely the cognitive configuration most susceptible to automation complacency: monitoring a mostly reliable system for rare failures, under time pressure, without the routine task engagement that maintains vigilance. A 2025 study in AI and Ethics warned explicitly that automation complacency in healthcare "risks undermining the role of clinicians" when decision support systems create a false sense of security. The European Data Protection Supervisor, in a September 2025 technical dispatch on human oversight of automated systems, cited the 2018 Uber self-driving fatality as a direct consequence of this dynamic and referenced human factors researcher MC Elish's finding that creating a role "where humans must jump into an emergency situation at the last minute is something humans do not do well."
Most alarming is a finding from radiology research: when an AI system provided incorrect localized explanations in chest X-ray cases, physician diagnostic accuracy dropped from 92.8% to 23.6%. The AI didn't just fail to help. It actively degraded human performance by triggering the automation bias that leads clinicians to defer to machine judgment even when it is wrong.
The dangerous part is that this degradation is invisible. Unlike physical fatigue, where the body sends clear signals, cognitive overload produces no obvious warning. The person experiencing it feels productive. They are producing output. The decisions feel reasonable in the moment. The pattern-matching shortcuts that replace analytical thinking feel like efficiency, not like error. The degradation only becomes visible after the fact, when someone audits the decisions and finds that error rates climbed, that edge cases were miscategorized, that the judgment the system depended on wasn't actually exercised.
The question every board, every CRO, and every CEO should be asking is: how close are our people to that threshold? And the honest answer, in most organizations, is: we have no idea, because we aren't measuring it.
The Headcount Reduction Illusion
Wall Street has a simple story about AI and labor costs: fewer employees, same or better output, margin expansion. This story is driving equity valuations, restructuring decisions, and workforce planning across every sector touched by AI. And for a category of work, it is correct. Routine data entry, initial research and information gathering, report generation, scheduling, and coordination are genuinely being automated, and the headcount reductions in those functions are real.
But for anything mission critical, the story is wrong, and the error has profound implications for how organizations price AI adoption.
For any workflow where the consequences of failure are significant (clinical decisions, financial risk assessment, legal analysis, compliance adjudication, safety-critical engineering), human oversight is not optional. Not yet, and not for the foreseeable future. AI can augment these functions. It cannot replace the human judgment that catches the cases where AI fails, because the failure modes of current AI systems are too unpredictable and too consequential to leave unsupervised.
The Harvard Business School and BCG "Jagged Frontier" study makes this point with uncomfortable precision. Researchers gave over 700 BCG consultants access to AI and found that on tasks within AI's capability frontier, performance improved substantially. But on tasks outside that frontier, consultants using AI actually performed worse than those working without it. They were less likely to catch errors and more likely to defer to AI-generated outputs that were confidently wrong. The frontier is jagged, meaning it is not obvious which tasks fall inside and which fall outside. This is exactly the environment in which human oversight is supposed to function, and exactly the environment in which AI augmentation undermines it.
What this means is that AI doesn't eliminate the need for humans in consequential work. It changes the type of human you need. The entry-level analyst who processed routine work is being displaced. But the senior reviewer who can evaluate AI output, catch subtle errors, exercise judgment on ambiguous cases, and make high-stakes decisions under uncertainty is more necessary than ever, and more expensive. The organizational shape isn't shifting from a pyramid to a leaner pyramid. It's shifting from a pyramid to a diamond: fewer people at the base, but more people in the middle and upper tiers, doing harder work, requiring deeper expertise, and commanding higher compensation.
The CFO who cut 30 junior analysts and reported the savings to the board may discover in 18 months that the organization needs 15 senior risk reviewers at twice the salary, plus the infrastructure and tooling to support them. The "savings" were an accounting illusion created by the time lag between cutting the base of the pyramid and discovering that the middle needs to expand.
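A minimal sketch of that accounting illusion; the salary and tooling figures are purely hypothetical, and the structure of the arithmetic, not the numbers, is the point.

```python
# Hypothetical figures for illustration only.
junior_analysts_cut = 30
junior_salary = 90_000            # assumed fully loaded cost per junior analyst

senior_reviewers_needed = 15
senior_salary = 180_000           # assumed: roughly twice the junior cost
oversight_tooling = 500_000       # assumed: infrastructure to support the review layer

reported_savings = junior_analysts_cut * junior_salary                          # what the board saw in year one
actual_new_cost = senior_reviewers_needed * senior_salary + oversight_tooling   # what surfaces by month 18

print(f"Savings reported in year one:   ${reported_savings:,.0f}")
print(f"Cost that surfaces by month 18: ${actual_new_cost:,.0f}")
print(f"Net position:                   ${reported_savings - actual_new_cost:,.0f}")
```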
This is an unpriced assumption hiding in plain sight. The market is valuing AI adoption as a cost reduction when, for mission-critical functions, it is a cost restructuring. The total cost may go up before it comes down. And the organizations that cut too deep, too fast, harvesting headcount savings before the oversight infrastructure exists, are the ones most exposed to the cognitive overload problem described above. They have fewer humans, handling harder work, at higher speed, with less margin for error. The productivity dashboard says they're winning. The neuroscience says they've built a system that will work perfectly until it doesn't.
The Recursive Solution (And Why It Doesn't Exist Yet)
There is an architectural answer to the cognitive overload problem, and it's worth describing because it illuminates both where organizations need to get and how far most of them are from getting there.
Instead of AI output flowing directly to a human reviewer (which creates the time pressure pathology described above), you insert layers of AI-on-AI oversight. Think of it as a recursive audit architecture. The production AI generates the work. A second AI system audits that output against defined quality and safety criteria. A third audits the audit. Only the exceptions that survive multiple recursive review passes roll up to the human. This does two things simultaneously: it dramatically reduces the volume reaching human reviewers, and it increases the signal quality of what does reach them, so the human is evaluating genuinely ambiguous edge cases rather than drowning in routine flags.
This is not science fiction. The architectural patterns exist. Multi-agent AI systems, recursive review frameworks, and layered validation pipelines are active areas of development. But in practice, almost no organization has implemented them for mission-critical workflows. Most are stuck in what I'll call Phase 1: AI augmentation with direct human oversight, which is where the time pressure pathology lives. Phase 2 (recursive AI audit with human exception review) is the pressure relief architecture that would make the current deployment model sustainable. We are nowhere near Phase 2 in most sectors.
The danger is that organizations are operating in Phase 1 while telling the market they've achieved Phase 2 economics.
But even Phase 2 has problems that need honest acknowledgment.
First, the trust recursion problem. At some point in a recursive AI audit, you're trusting AI to tell you when AI has failed. This works until the failure mode is one the audit layer shares with the production layer, which is a real risk when both are built on similar architectures and trained on similar data. Recursive AI audit is the best available interim architecture. It is not a solution to the fundamental oversight problem. It's a pressure relief valve.
Second, the complacency trap. If the AI audit layer catches 99% of exceptions, the human reviewer sees only the 1% that survived. That's excellent for cognitive load management. But it creates its own cognitive problem: vigilance decrement. Parasuraman's research demonstrated this clearly: high automation reliability increases the chance of complacency rather than decreasing it. Singh and colleagues found that monitoring performance was far worse under higher levels of static automation reliability than under lower levels. The rarer the failure event the human must catch, the worse they become at catching it. The recursive architecture solves the volume problem but potentially creates a vigilance problem.
Third, and this is where corporate governance enters the picture, the question of who controls the audit layer.
The Governance Architecture That Should Exist
If you've spent time around corporate boards, the answer to "who audits the audit" has a familiar structure. In corporate governance, the internal audit function reports to management. The audit committee of the board exists precisely because you cannot trust management to audit itself without independent oversight. That separation of reporting lines is the foundation of audit integrity. It was codified into federal law by the Sarbanes-Oxley Act of 2002, which mandated that audit committees of listed companies be directly responsible for the appointment, compensation, retention, and oversight of the external auditor. Under SOX Section 301 and SEC Rule 10A-3, every member of the audit committee must be independent of management; members may not receive consulting or other fees beyond board and committee compensation, and may not be affiliated persons of the company or its subsidiaries. Section 407 requires disclosure of whether the audit committee includes at least one "financial expert." The audit committee has the authority to engage independent counsel and advisors, and the company must provide appropriate funding for those engagements. These requirements are further reinforced by NYSE and NASDAQ listing rules. Post-SOX, 100% of public companies have fully independent audit committees, up from just 51% before the law was enacted, and nearly half of all audit committee members now qualify as financial experts.
The reason these provisions matter for this discussion is that they establish a precise governance template: independent oversight of management's self-reporting, with the independence guaranteed by law, the expertise mandated by regulation, and the funding protected from management control. This is exactly the template that AI oversight needs.
The architecture that should exist has two independent layers.
The Management AI Audit Layer. This reports to the CRO or equivalent. It audits production AI workflows recursively, catching the vast majority of routine exceptions. This is the operational layer. It exists to reduce the time pressure on human reviewers by filtering volume and improving signal quality before anything reaches a human decision-maker.
The Board AI Audit Layer. This reports to the audit committee, not to management. It operates on separate infrastructure, ideally built on different model architectures or at minimum different training configurations, to avoid sharing systematic blind spots with the management-level system. Its job is not to duplicate the internal audit. Its job is to audit whether the internal audit is working, and to flag anomalies that the management-level layer might miss or, critically, might be incentivized to downplay. It reports exception findings directly to the board: "The internal audit function is not catching this category of error," or "These failure patterns are trending upward and management's system isn't flagging them."
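As a hypothetical configuration sketch, the separation might look something like the following; the field names and values are assumptions meant only to show where the independence has to live, not a reference to any existing framework.

```python
# Hypothetical configuration: the point is the separation of ownership,
# infrastructure, and model lineage, not these particular fields or values.
management_audit_layer = {
    "reports_to": "CRO",
    "model_family": "vendor_a_foundation_model",      # assumed placeholder
    "infrastructure": "shared production cloud tenant",
    "criteria_owner": "management (operational quality and throughput)",
}

board_audit_layer = {
    "reports_to": "audit committee",
    "model_family": "vendor_b_foundation_model",      # deliberately different lineage
    "infrastructure": "separate tenant, separate access controls",
    "criteria_owner": "audit committee and its independent advisors",
    "output": "exception findings delivered directly to the board",
}

# The independence test: no shared element that could produce a shared blind spot.
shared = {k for k in management_audit_layer
          if management_audit_layer.get(k) == board_audit_layer.get(k)}
assert not shared, f"Layers share configuration on: {shared}"
```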
This is not a novel governance concept. It is the existing dual-audit governance model, mandated by Sarbanes-Oxley and enforced by the SEC, extended into the AI layer. The principle is identical. The implementation is new.
Several things make this harder than it sounds.
The independence requirement is non-trivial. If both the management layer and the board layer run on the same foundation model, they may share systematic blind spots. This is directly analogous to the 2008 financial crisis, where every rating agency used similar risk models, so their "independent" assessments were actually correlated. The Financial Crisis Inquiry Commission documented this failure: the appearance of independence without its substance. The Board AI Audit Layer needs genuine architectural independence: different models, different evaluation criteria, possibly entirely different vendors. That is expensive and operationally complex, which is precisely why most organizations won't build it voluntarily.
The incentive alignment matters. The management-level layer is configured by management, which has an incentive to show that AI systems are performing well. The board-level layer must be configured by the audit committee itself (or its independent advisors), with criteria reflecting the board's fiduciary obligations rather than management's operational goals. This raises a capability question: does your audit committee have the technical sophistication to specify what its AI audit layer should evaluate? Right now, most boards do not. That gap is itself an unpriced risk.
And this creates a new board competency requirement. Just as Sarbanes-Oxley required audit committees to include members with financial expertise, this model implies that audit committees will need AI audit expertise: not building the layer themselves, but understanding what it's doing, what it's checking for, and whether its independence is genuine.
The Missing Metric: Quality Assurance for the Oversight Environment
There is one more gap that deserves attention, because it sits between the audit function and the cognitive load problem and nobody is addressing it.
Current risk frameworks measure two things: whether the AI system's outputs are correct (output quality metrics), and whether errors are caught after the fact (the audit function, whether human or automated). What no framework currently measures are the conditions under which oversight is performed: the environment in which the human reviewer makes their judgment calls. Think of it as the difference between inspecting finished products as they come off the line and monitoring the floor conditions that predict defect rates: temperature, machine calibration, worker fatigue, throughput pace.
For AI-augmented oversight, this means tracking a continuous quality assurance metric for the oversight environment itself. Not just: did the reviewer catch the error? But: what was the exception-to-reviewer ratio when that decision was made? How many consecutive complex decisions had the reviewer already processed that shift? What was the time gap between flag and required disposition? Was the reviewer in a cognitive state where analytical processing was likely, or had the conditions shifted into the heuristic-shortcut zone?
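A minimal sketch of what instrumenting those conditions could look like, assuming a simple event log per reviewer; the fields, the complexity scale, and the summary statistics are illustrative assumptions, not an established standard.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ReviewEvent:
    reviewer_id: str
    flagged_at: datetime     # when the AI raised the exception
    decided_at: datetime     # when the human disposed of it
    complexity: int          # 1 (routine) to 5 (genuinely ambiguous); assumed scale

def oversight_conditions(events: list[ReviewEvent], shift_hours: float = 8.0) -> dict:
    """Summarize the conditions under which review happened during a shift,
    not whether the individual decisions were correct. Assumes a non-empty log."""
    reviewers = {e.reviewer_id for e in events}
    exceptions_per_reviewer_hour = len(events) / (len(reviewers) * shift_hours)
    median_window = sorted(e.decided_at - e.flagged_at for e in events)[len(events) // 2]
    complex_share = sum(e.complexity >= 4 for e in events) / len(events)
    return {
        "exceptions_per_reviewer_hour": round(exceptions_per_reviewer_hour, 1),
        "median_disposition_window": median_window,
        "share_of_complex_cases": round(complex_share, 2),
    }
```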
This is not an audit metric. It's a leading indicator: a measure that predicts oversight quality before failures occur, rather than detecting failures after they happen. Aviation has this. The sterile cockpit rule doesn't exist because pilots made errors and were then punished. It exists because the conditions under which errors become likely were identified and regulated in advance.
No equivalent exists for AI-augmented knowledge work. No regulator requires it. No risk framework includes it. No board asks about it. And that absence is arguably the single most important gap between the AI deployment model that organizations believe they have and the AI deployment model they actually operate.
The Pricing Failure
Connect this back to what The Paranoidist is built to identify: risk that isn't priced.
No enterprise risk model currently includes "cognitive load per unit of human decision time" as a monitored variable. No board receives reporting on the ratio of AI-flagged exceptions to human reviewer capacity. No CRO tracks whether the disposition time window has compressed from days to minutes, even though that compression fundamentally changes the risk profile of the workflow. No quality assurance framework measures the conditions under which oversight decisions are made, as opposed to merely auditing whether those decisions were correct after the fact.
The market is pricing AI adoption on the assumption that "human in the loop" is a binary: either there's a human reviewing, or there isn't. In reality, "human in the loop" exists on a spectrum from "thoughtful expert with adequate time and context making a considered judgment" to "exhausted reviewer speed-processing exceptions at a pace set by machine throughput." Most organizations are sliding toward the latter while reporting the former. The productivity metrics can't tell the difference.
This is the same structural error the financial system made before 2008. The risk models showed healthy fundamentals. The instruments were rated investment grade. The dashboards were green. What the models didn't capture was the qualitative degradation of the judgment layer, the ratings themselves, that the entire system depended on. When that layer failed, everything downstream failed with it.
The human oversight layer in AI-augmented systems is this generation's equivalent of the credit rating. It is the judgment layer that the entire deployment model depends on. It is assumed to be functioning. It is not being measured. And it is degrading under conditions that neuroscience tells us are predictably, causally destructive to exactly the cognitive functions it requires.
What to Do About It
The Paranoidist is about productive paranoia, not paralysis. Here's what I'd actually do if I were sitting in your seat.
If you're a board director: Ask management two questions at your next meeting. First: "What is the ratio of AI-flagged exceptions to human reviewers in our mission-critical workflows, and how has that ratio changed in the last 12 months?" Second: "Are our cost projections for AI integration assuming Phase 2 oversight architecture when we're actually operating in Phase 1?" If they can't answer the first question, your organization is flying blind on the variable most likely to determine whether your AI deployment succeeds or fails catastrophically. If the answer to the second question is yes, your financial projections are built on an assumption that doesn't match operational reality. Beyond that, begin treating AI oversight expertise as a required audit committee competency, the way Sarbanes-Oxley Section 407 made financial expertise a required disclosure for audit committees. The governance template already exists: independent audit committees, with independent funding, independent advisory authority, and mandated expertise. Extend that template to AI oversight. Start building toward an independent board-level AI audit function that reports to the audit committee, not to management. You don't need to build it today. You need to be planning it today.
If you're a CRO or risk leader: Four immediate actions. First, mandate cognitive load audits on any workflow where AI handles the routine and humans handle the exceptions. Measure the volume of exceptions per reviewer per hour, the complexity profile of those exceptions, and whether there's a fatigue curve over the course of a shift. Aviation has been doing this for decades under FAA Part 117 rest requirements and crew resource management protocols; the frameworks exist. Import them. Second, track the disposition time window: the elapsed time between "AI flags an anomaly" and "human must make a decision." If that window has compressed from days to minutes, your risk profile has changed fundamentally, even if your error rate dashboard hasn't moved, because the dashboard is averaging across easy and hard calls, and the hard calls are where catastrophic risk concentrates. Third, build a "productivity illusion" indicator into your risk register. When AI-augmented productivity metrics improve by 40% but staffing levels are flat or declining, that is not evidence of efficiency. It is a leading indicator of cognitive overload in your oversight layer. Treat it the way you'd treat a sudden spike in near-miss reports. Fourth, and this is the gap nobody is addressing: build a quality assurance metric for the oversight environment itself. Not just "did the reviewer catch the error?" but "what were the conditions under which that review was conducted?" Exception-to-reviewer ratios, consecutive-decision counts, disposition time windows, shift-level fatigue indicators. This is the leading indicator that predicts oversight failure before it happens; the audit function detects it afterward. You need both. And start designing the recursive audit architecture for your highest-consequence workflows now. Define what exception criteria should look like, what the audit layers should check for, and what the human reviewer actually needs to see. Don't wait for vendors to sell it to you.
If you're a CEO or founder: Resist the pressure to harvest AI headcount savings before the oversight infrastructure exists to support them. The headcount reduction illusion is that you can cut the base of the pyramid without building the middle. For mission-critical functions, AI changes the shape of your cost structure; it does not simply shrink it. You need fewer entry-level executors and more senior oversight, planning, and strategy roles, and those roles cost more. Budget for the transition honestly: the cost curve may go up before it comes down. Before redeploying the headcount that AI "freed up," run the numbers on what happens when your remaining humans are handling only the hardest 20% of decisions at five times the throughput. You may have accidentally created a system where the humans in the loop are operating in exactly the conditions that neuroscience says produce the worst decision-making: high time pressure, high complexity, no routine tasks to provide cognitive recovery, and a pace set by machine throughput rather than human capacity.
If you're a citizen and a thinker: Demand transparency not just about whether there's a "human in the loop," but about what that human's working conditions look like. A cybersecurity analyst drowning in 3,000 alerts per day is not providing the same quality of oversight as one reviewing 50 carefully prioritized threats. A compliance officer reviewing AI-flagged transactions from a pool where 99.99% are false positives is not exercising genuine judgment; they are performing cognitive endurance. A radiologist speed-reading AI-flagged anomalies from 200 studies is not providing the same diagnostic attention as one systematically reviewing 40. The difference between those scenarios is the difference between genuine oversight and the performance of oversight, and it directly affects you if you're a patient, a borrower, a job applicant, an insurance claimant, or anyone else whose case passes through an AI system with human review. The same advocacy that pushed for algorithmic transparency needs to extend to oversight transparency: not just "is there a human reviewing this?" but "under what conditions?"
The Paranoidist's Assessment
Probability that AI-augmented productivity gains are masking cognitive overload in human oversight layers: Very high. The structural incentives (productivity metrics that measure output but not judgment quality, market pressure to demonstrate AI ROI, competitive pressure to match AI-enabled throughput) all push in the same direction. And no countervailing measurement system exists.
Probability that a major AI failure will be traceable to degraded human oversight caused by time pressure: High. Not because any single failure will be dramatic, but because the pattern will become visible in retrospect across healthcare, financial services, cybersecurity, and legal contexts simultaneously. The Verizon 2024 DBIR already documents the pattern in cybersecurity: in 74% of breaches, alerts were generated but ignored, usually because analysts were overwhelmed by volume.
Probability that organizations will build recursive AI audit architectures before being forced to by a crisis: Low. The architecture is expensive, operationally complex, and doesn't show up as a line item on any current risk framework. Most organizations will wait until the cost of not having it exceeds the cost of building it, which means they'll build it after the failure, not before.
Probability that the headcount reduction illusion will produce material financial restatements or risk events in the next 24 months: Moderate to high. Organizations that cut oversight headcount based on Phase 2 economics while operating Phase 1 infrastructure are carrying an unrecognized liability. The reckoning arrives when the first high-profile AI error occurs in a workflow where the human reviewer was overloaded.
Probability that current AI valuations account for the cost of building adequate human oversight infrastructure: Near zero.
What I'm watching: Cybersecurity is the leading indicator, because it already has the data to prove the pattern. The Verizon DBIR, the SANS surveys, and the SOC analyst attrition rates collectively tell a story: alert fatigue in AI-augmented security operations is already causing the exact failure mode this article describes. If regulators begin connecting SOC alert fatigue to breach liability, the implications will cascade to every other AI-augmented oversight workflow. I'm also watching whether any regulator, anywhere, begins requiring disclosure of cognitive load metrics or oversight-condition quality assurance for AI-augmented workflows. If that happens, it will force the measurement that organizations are currently avoiding. If it doesn't happen, the data gap persists until it's filled by a crisis. In healthcare, AI diagnostic errors that go undetected because of the "silent failure" problem (AI misclassifies a condition as benign; the patient doesn't return; no one connects the harm to the AI decision) could surface in malpractice data over the next 12-24 months.
Where I might be wrong: It's possible that AI augmentation, even in its current Phase 1 form, produces net-positive outcomes despite the cognitive load increase, because the volume of routine errors caught by AI outweighs the increase in edge-case errors caused by human overload. In that scenario, the productivity gains are real and the degradation of human judgment is a cost worth paying. I don't think this is right, because it ignores tail risk: the errors that matter most are not the average case but the catastrophic outlier, and those are precisely the cases most affected by cognitive overload. But it's a defensible counterargument that deserves monitoring. It's also possible that workers will adapt, developing new cognitive strategies for operating in compressed-time environments the way air traffic controllers and emergency physicians have. But that adaptation took decades in those professions and was supported by enormous institutional investment in training, protocols, and workload management. Nothing comparable exists for AI-augmented knowledge work.
The Paranoidist publishes weekly. If this changed how you think about one thing, consider subscribing. If it didn't, tell me what I'm missing. The whole point of productive paranoia is that I might be wrong, and I'd rather know now.
Paul Morin is the founder of DeepStrategy.ai and publisher of The Paranoidist, BoardroomRadar, and ScenarioWatch. He has spent more than three decades in entrepreneurship, finance, risk management, and insurance, which is why he worries about the things that keep other people awake at night.
Researched, written, and edited in collaboration with Claude by Anthropic.