The hidden tax nobody is measuring, and four practices for managing it.
A few months back, I shipped four features in two days.
By Friday of that week, I couldn't explain two of them without opening the code.
It wasn't a lazy week. It wasn't a bad week. By every surface metric (pull requests merged, tickets closed, features live in production), it was one of the most productive stretches of my career. My velocity numbers looked beautiful. I should have felt good.
Instead, I closed my laptop on Friday with a specific, hollow kind of tired I hadn't felt in a decade of shipping software. Not the tired of a hard week. Not the satisfied tired of having solved something difficult. Something different. Something closer to:
I don't quite own what I just built.
That feeling nagged at me for weeks before I finally put a name on it.
I've started calling it review debt: the accumulated cognitive cost of shipping code you didn't fully internalize while it was being written. It behaves almost exactly like technical debt, except it doesn't live in your codebase. It lives in your head. And like technical debt, it compounds silently until the day it doesn't.
This is a piece about that kind of debt: how it got there, why nobody's measuring it, what the data actually shows, and the four-practice system I've built to keep it in check.
In 30 seconds
- AI shifts more engineering work from writing to reviewing.
- That shift creates hidden cognitive overhead when you ship code you have not fully internalized.
- I call that overhead review debt, and it compounds like technical debt.
Key takeaways
- Reading unfamiliar code is cognitively heavier than writing it.
- Recent data suggests AI can increase output while still degrading comprehension and ownership.
- The practical fix is not less AI, but stricter review discipline.
Something shifted, and nobody put it into words
AI hasn't made senior engineers faster. I want to be precise about that claim, because it's easy to misread. It's not that AI doesn't work, or that the code it produces is bad, or that we should go back to the old ways. None of those things are true, at least not in the simple version.
What's true is something stranger: AI has shifted where our work happens in a way the industry hasn't yet articulated, and the new location of the work is quietly more expensive for humans than the old one.
Before AI-assisted coding, a typical day as a senior engineer looked roughly like this. Maybe 80% of my active hours were spent writing: designing, typing, debugging small pieces, looking things up, writing again. Maybe 20% was reviewing: reading teammates' PRs, auditing my own diff before commit, scanning logs. The writing was slow. That was the cost. But the writing also built my mental model of the codebase, for free, as a byproduct of the act itself. By the time a feature shipped, I knew the code, because I had lived inside it for days. Review at the end was a light final tax.
Post-AI, that ratio has flipped. I'd estimate my current split is closer to 20% prompting and 80% reviewing. And the writing, the part that used to build comprehension as a byproduct, has been largely automated away. What remains is the reviewing, but now it's reviewing code I didn't write, often spanning twenty files I've never seen, at a pace nobody was ever designed to read code at.
The output looks identical on the outside. The cognitive cost has moved to an entirely different location, and that location happens to be one that fatigues humans disproportionately.
Why reading code is harder than writing it
Here's a piece of industry folklore that turns out to be backed by real cognitive science: we spend far more time reading code than writing it. Robert C. Martin famously put the ratio at well over 10 to 1. That number is approximate, but every working engineer feels it. What matters is the direction: we read far more than we write, even in the pre-AI world.
Why is reading harder than writing? Because working memory is small.
Working memory is strikingly small. Miller's classic estimate put its capacity at seven items, plus or minus two; later research revised the effective limit down to roughly four distinct chunks. Cognitive load researchers, building on the foundational work of John Sweller in the late 1980s, have established that when the cognitive demand of a task exceeds available working memory, comprehension collapses. You can still scan the words. You just stop building an accurate model of what they mean.
When you write code, you construct the mental model yourself, piece by piece, and each new piece slots into a model you've already built. The cognitive load is distributed over time, and every line you type is also a rehearsal of the structure you're building. When you read unfamiliar code, you have to construct that same mental model from scratch, in reverse, while simultaneously absorbing the next few lines that depend on what you haven't yet fully processed. Reading code isn't passive consumption. It's active reconstruction under working-memory pressure.
This is exactly the mode AI-assisted coding forces us into, at volumes we weren't designed to handle. And it's why the fatigue feels different from the fatigue of the pre-AI era, because it is different. It's not "I worked too hard" tired. It's "my working memory has been thrashing for eight hours" tired. Same body, different part of the mind, different kind of cost.
The data is starting to catch up
If this were only subjective, I'd hesitate to make the argument. But the evidence is starting to accumulate, and the numbers are sharper than most people realize.
In July 2025, METR, a research group studying AI's real-world impact on work, published a randomized controlled trial that should have landed louder than it did. They recruited sixteen experienced open-source developers, each with multiple years of contribution history on large, mature repositories. Each developer brought real issues from their own projects. Tasks were randomly assigned to "AI allowed" or "AI not allowed." The developers used the frontier tools of the time, primarily Cursor Pro with Claude 3.5/3.7 Sonnet.
Before starting, the developers predicted AI would make them roughly 24% faster. That prediction matched the expert consensus. Machine learning and economics researchers surveyed ahead of the study forecast even larger speedups.
The actual result: developers were 19% slower when using AI.
The more unsettling finding is what they believed after the study ended. Even having lived through the slowdown, the developers still reported that AI had made them about 20% faster. The gap between how fast we feel and how fast we actually are in AI-assisted work is roughly forty percentage points. We are systematically wrong about our own productivity, and we're wrong in the direction that hides the cost from ourselves.
A separate line of evidence comes from GitClear, which analyzed over 211 million lines of changed code between 2020 and 2024. Their findings track the same trend from a different angle. Copy-pasted lines climbed from roughly 8% of all changes in 2020 to 12% in 2024. The share of refactored ("moved") code, traditionally a marker of thoughtful reuse and good architecture, collapsed from around 24% to under 10% in the same period. Code churn, the percentage of lines rewritten within two weeks of shipping, nearly doubled.
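To make the churn metric concrete, here is a toy version of the calculation under my own simplifying assumptions (GitClear's actual methodology is more involved): treat each shipped line as a record of when it shipped and when, if ever, it was next modified, then count the fraction rewritten inside the two-week window.

```python
from datetime import date

def churn_rate(lines, window_days=14):
    """Fraction of shipped lines rewritten within `window_days` of shipping.
    Each element of `lines` is a (shipped: date, rewritten: date | None) pair."""
    churned = sum(
        1 for shipped, rewritten in lines
        if rewritten is not None and (rewritten - shipped).days <= window_days
    )
    return churned / len(lines)

history = [
    (date(2024, 3, 1), date(2024, 3, 8)),    # rewritten within a week: churn
    (date(2024, 3, 1), None),                # never touched again: not churn
    (date(2024, 3, 1), date(2024, 6, 1)),    # rewritten months later: not churn
    (date(2024, 3, 2), date(2024, 3, 10)),   # rewritten in 8 days: churn
]
# churn_rate(history) -> 0.5
```

The point of the toy is the denominator: churn isn't "bad code exists," it's "code we shipped and then immediately had to unship," which is why a near-doubling of the rate is a comprehension signal, not just a quality one.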
Read those numbers together and the picture sharpens. We're producing more code, understanding less of it, reusing less of it, and rewriting it sooner. The review surface has ballooned. The cognitive budget has not. That gap is where the new exhaustion lives, and where the hidden costs quietly accumulate on the balance sheet nobody is keeping.
Owning the Diff: a four-practice system
None of this is an argument for using less AI. I use AI daily. I'd fight anyone who tried to take it away. The productivity is real, just not in the shape we thought, and not without new disciplines to match the new mode of work.
Over the past eighteen months, I've built a system for keeping review debt manageable while still getting real leverage out of AI. I call it Owning the Diff. It's four practices. Each one is a direct response to a specific failure mode I kept falling into.
1. The Out-Loud Rule
Before I accept any AI-generated block, I explain what it does: out loud, or in a short comment, or in a message I send to myself in Slack. If I can't explain it, I haven't reviewed it. I've just seen it.
This is the most important practice of the four, because it attacks the core illusion of AI-assisted work: that reading equals understanding. It doesn't. Reading is a visual activity. Understanding requires active reconstruction. The old way of working built that reconstruction in automatically, because we typed every character ourselves. The new way doesn't. We have to put it back by hand.
In practice, this looks like: AI generates a function, I glance at it, it looks reasonable, and then I stop. I write a one-line comment above it explaining what it does in my own words. If writing that comment feels effortful, that's diagnostic information. My understanding was thinner than I assumed. Time to actually read the function, not just skim it.
Related tell: if I find myself about to accept a diff with the thought "I'll understand it when I need to," that's always a lie. The moment I need to understand it is when something breaks at 11pm on a Friday. The moment to build the model is now, while the code is fresh and the stakes are low.
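Here's a hypothetical illustration of what the rule produces. The function is the kind of small helper an assistant might generate; the comment block above it is the artifact of the practice, my own words, written after reading, not the AI's description of itself:

```python
# In my own words: collapse duplicate events by id, keeping whichever
# occurrence has the latest timestamp, and return the survivors sorted
# by id. Empty input yields an empty list. If I couldn't have written
# these three lines without re-reading the body, I hadn't reviewed it.
def dedupe_events(events):
    """Collapse duplicate events, keeping the latest occurrence of each id."""
    latest = {}
    for event in events:
        eid = event["id"]
        if eid not in latest or event["ts"] > latest[eid]["ts"]:
            latest[eid] = event
    return sorted(latest.values(), key=lambda e: e["id"])
```

The comment is deliberately redundant with the code. That redundancy is the test: producing it cheaply means the model is in your head; struggling means it isn't yet.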
2. Review as Stranger
I treat every AI diff as if a contractor wrote it, not me. Different psychological mode, different scrutiny.
The trap here is subtle. Your name ends up on the commit. The tool is integrated into your IDE. The workflow feels continuous with your own typing. So your brain, reasonably, tags the output as "mine" and skips the rigor you'd apply to a teammate's PR.
But your brain didn't build it. You didn't work through the edge cases. You didn't struggle against the type system until it made sense. You didn't choose between three implementations and pick one for a reason. The code is, in every way that matters to your comprehension, someone else's. Review it like someone else's. Look for the things you'd call out if your junior teammate had submitted it. You'll find them: the off-by-ones, the swallowed exceptions, the import that pulls in a 400KB dependency for a one-liner. AI makes all of those mistakes. You just stop seeing them because your brain has filed the code under "mine, already approved."
3. Bounded Sessions
I cap any single AI-assisted push at roughly three files or two hundred lines. Then I stop. I run the code. I read what was produced. I make sure I understand it. Only then do I continue.
This is the hardest practice to actually follow, because unbounded generation is fun. You can feel yourself making progress. The code piles up. Features appear on screen. It's only on Thursday afternoon, when you're staring at a change set you can't reconstruct, that the cost becomes visible.
The pre-AI workflow had natural rhythm built in for free. Typing is slow. Compiling forces pauses. Debugging imposes breaks. You couldn't generate 500 lines in 15 minutes even if you wanted to. AI removed that rhythm entirely. If you want it back, you have to put it there deliberately: a hard stop, a rerun, a moment of comprehension, and only then permission to continue.
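Because the cap is easy to blow past in the moment, I find it helps to make the machine enforce it. Below is a minimal sketch of one way to do that, assuming you pipe the output of `git diff --cached --numstat` into it from a pre-commit hook; the thresholds are the ones from the practice above, not any standard:

```python
FILE_LIMIT = 3     # max files per AI-assisted push (my threshold, not a standard)
LINE_LIMIT = 200   # max changed lines per AI-assisted push

def over_budget(numstat_output):
    """Parse `git diff --numstat` output and report whether the change
    exceeds the session budget. Each numstat line has the shape
    'added<TAB>deleted<TAB>path'; binary files show '-' for both counts,
    so they contribute to the file count but not the line count."""
    files = 0
    lines = 0
    for row in numstat_output.strip().splitlines():
        added, deleted, _path = row.split("\t", 2)
        files += 1
        if added != "-":                    # skip binary-file placeholders
            lines += int(added) + int(deleted)
    return files > FILE_LIMIT or lines > LINE_LIMIT
```

Wired into a hook (`git diff --cached --numstat | python check_session.py`, exiting nonzero when over budget), it turns "I should stop here" into "the commit won't go through until I've stopped, run, and read."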
4. The Slow Lane
One day a week, or one project, or one particular module, I write without AI at all. Not out of nostalgia. Not out of principle. For a specific functional reason: I need at least one part of my codebase where my mental model is complete. Not partial. Not approximate. Complete.
That part of the codebase becomes my reference point. When something feels off elsewhere (a weird bug, a surprising behavior, a vague suspicion), I can trust my instincts because I have a calibrated baseline. Without the Slow Lane, you eventually lose the ability to tell sound code from subtly wrong code. You stop being able to smell bugs before they bite. Your judgment goes soft in a way you can't detect from the inside, because the same erosion that's happening to your skills is happening to your ability to notice it.
This practice sounds precious. It isn't. It's the maintenance work that keeps you from slowly becoming dependent on tools you can't audit, and by extension, from becoming an engineer whose judgment can't be trusted on the things that matter most.
The skill that compounds
Here's the reframe I want to leave you with.
The scarce skill in software engineering isn't prompting. It isn't picking the right model. It isn't knowing the latest AI-native framework. All of those skills will commoditize within a year or two, because the tools that provide them are getting easier, cheaper, and more standardized every month. The floor is rising. Anyone can prompt.
The scarce skill, going forward, is reviewing with judgment, at the speed AI now ships things to you. That's the skill that doesn't scale with GPUs. That's the skill that has to be built inside a human brain the slow way, through thousands of hours of paying attention. And it's the skill that separates engineers who compound with AI from the ones who quietly burn out beneath it.
The tired I felt that Friday a few months back wasn't a sign I was doing something wrong. It was a signal of a real asymmetry the industry hasn't yet reckoned with: between what AI can produce in a morning and what a human can meaningfully absorb in one.
Review debt is real. It behaves like technical debt. It compounds. And like all debt, the first step to managing it is refusing to pretend it isn't there.
References & further reading
On AI's measured impact on developer productivity
- Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR. Randomized controlled trial with sixteen experienced developers showing a 19% slowdown while participants still felt faster.
- GitClear. (2025). AI Copilot Code Quality: 2025 Look Back at 12 Months of Data. Analysis of 211 million changed lines showing rising copy-paste, falling code movement, and higher churn.
On cognitive load and working memory
- Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257-285. Foundational paper on cognitive load theory and working-memory limits.
- Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.
On reading vs. writing code
- Martin, R. C. (2008). Clean Code: A Handbook of Agile Software Craftsmanship. Source of the "well over 10 to 1" reading-to-writing ratio commonly cited by engineers.