AI Coding Anti-Patterns: How Teams Waste the Three Levers

Devs were 19% slower with AI yet felt 20% faster. The 7 AI coding anti-patterns that corrupt cost, output, and autonomy, plus the fix for each.

Dennis Pilarinos | Jun 23, 2026 | AI Agents, Engineering Insights

Key Takeaways

• AI coding anti-patterns are not random bad habits; they are specific failures of the three levers every team controls: cost, output, and autonomy.

• The diagnostic signature is a felt-vs-real gap: developers ran 19% slower with AI yet believed they were 20% faster (METR, 2025).

• On cost, teams measure token spend instead of token yield. On output, they count lines and merged PRs while rework climbs. On autonomy, they auto-merge code nobody verified.

• The durable fix across all three is curated context underneath the agent, so it generates less waste in the first place.

In a 2025 randomized controlled trial, experienced open-source developers were 19% slower when they used AI tools, and yet they believed they had been 20% faster (METR, 2025). That 39-point gap between felt and real is the signature of an AI coding anti-pattern: a habit that corrupts one of the three levers a team controls, while feeling like progress.

The pillar post, building with AI effectively, frames those three levers as cost, output, and autonomy. This post is the inverse. Every AI coding anti-pattern here is a team unconsciously breaking one of them: burning tokens that don't produce, shipping volume that gets counted as output, or granting an agent trust its context hasn't earned.

We've sorted seven of them by the lever each one breaks. Every anti-pattern closes the same loop: the habit, the lever it corrupts, and the control that fixes it.

What makes a habit an AI coding anti-pattern, not just a mistake?

A mistake is a one-off; an anti-pattern is a habit that feels productive while quietly corrupting a lever you control. The clearest evidence is the felt-vs-real gap: in a controlled study, developers ran 19% slower with AI yet believed they were 20% faster (METR, 2025).

That gap is the diagnostic. When a habit makes work feel faster but the numbers say otherwise, you're looking at an anti-pattern, not a slip. The three levers it can break come straight from the operating model in building with AI effectively: cost, output, and autonomy.

Cost is what you spend per useful result. Output is durable, working software, not raw volume. Autonomy is how much you let an agent act without a human in the loop. Each lever has its own failure mode, and each failure feels like a win in the moment, which is exactly why it survives.

The seven AI coding anti-patterns ahead each break one of these levers, and each comes paired with the control that fixes it. Two are cost failures, three are output failures, and two are autonomy failures. None of them require a bad engineer. They take only a metric pointed at the wrong number and an agent left to run without the context that would keep it honest.

---

Why does paying for more tokens not buy more output?

Cost is the first lever, and the AI coding anti-patterns here all optimize the wrong number. Goldman Sachs projects global token usage to multiply 24x by 2030, and Jellyfish reports per-developer consumption already up 18.6x in nine months (TechCrunch, 2026). Teams that chase spend instead of yield pay that curve without buying progress.

Anti-pattern 1: Measuring token spend instead of token yield

Spending more does not buy more output, and one dataset makes the point bluntly: a software firm's heaviest token users were roughly twice as productive but spent about ten times the tokens (Jellyfish, via TechCrunch, 2026). Spend rose 5x faster than the result it bought.

The trap is treating "more tokens" as "more progress." With per-developer consumption climbing 18.6x in nine months (Jellyfish, via TechCrunch, 2026), a bigger bill starts to feel like proof of velocity. It isn't.

This corrupts the cost lever. When you optimize cost-per-token, you reward whoever burns the most, not whoever ships the most. A leaderboard of token spend will quietly promote your least efficient workflows. The control is to instrument cost-per-useful-output, what we call token yield, and tie spend to merged PRs and resolved tickets instead of raw consumption. Measure the result, then divide by what it cost to get there.
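As a minimal sketch of that measurement, assuming you can export token spend alongside counts of merged PRs and resolved tickets, cost-per-useful-output might be computed as below. The record fields and dollar figures are hypothetical, chosen only to mirror the "twice the output at ten times the tokens" pattern above. Token yield is the same ratio read the other way around: useful results per dollar.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical export row: one developer (or team) over one period."""
    name: str
    token_cost_usd: float   # what was spent on tokens
    merged_prs: int         # useful results, not raw consumption
    resolved_tickets: int


def cost_per_useful_output(record: UsageRecord) -> float:
    """Dollars spent per merged PR or resolved ticket (lower is better)."""
    useful = record.merged_prs + record.resolved_tickets
    if useful == 0:
        return float("inf")  # spend with nothing shipped
    return record.token_cost_usd / useful


# Illustrative numbers only: the heavy user ships 2x the work for 10x the spend.
typical = UsageRecord("typical", token_cost_usd=200.0, merged_prs=8, resolved_tickets=2)
heavy = UsageRecord("heavy", token_cost_usd=2000.0, merged_prs=16, resolved_tickets=4)

ratio = cost_per_useful_output(heavy) / cost_per_useful_output(typical)
print(f"heavy user pays {ratio:.1f}x more per useful result")  # 5.0x
```

Ranking by this number rather than by raw spend puts the heavy user at the bottom of the leaderboard instead of the top, which is the point of the control.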

The teams with the cleanest yield numbers aren't the frugal ones; they're the ones whose agents waste fewer tokens because they start with the right context. Spend follows context quality, not the other way around. See the token-yield context problem and how to measure AI productivity.

Anti-pattern 2: Letting agent loops run unbounded

An agent with no stop condition spends without producing, and the bills have gotten loud. One firm running with no usage limits faced a roughly $500M Claude bill, Uber exhausted its 2026 AI budget by April, and one engineer hit around $40K a month (TechCrunch, 2026). Treat those as color, not benchmarks.

The pattern underneath is the same in every case: a loop with no exit. The agent retries, re-reads, and re-plans, and each pass costs tokens whether or not it moves the work forward. Nobody decided to spend that money; the loop just kept going. The danger is that an unbounded loop looks like diligence. The agent appears to be working hard on a stubborn problem, when it is really circling a goal it can't reach because it's missing the context to reach it.

This corrupts the cost lever directly. Open-ended autonomy without a budget converts compute into noise. The control is mechanical: cap iterations, set step and token budgets, and add a verify step before the agent re-loops. In our experience, the cheapest fix is the dumbest one, a hard ceiling on steps, because it turns an unbounded cost into a known one. For why these loops get so expensive, see why AI agents burn tokens.
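The three controls can be sketched in a few lines. This is an illustration under stated assumptions, not any particular agent framework's API: `step` and `verify` are hypothetical callables you would supply, and the default limits are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    """Hypothetical result of one agent step."""
    output: str
    tokens_used: int


def run_bounded(
    step: Callable[[int], StepResult],
    verify: Callable[[str], bool],
    max_steps: int = 5,
    token_budget: int = 50_000,
) -> tuple[Optional[str], str]:
    """Run an agent step until it verifies or a hard ceiling is hit.

    Returns (output, reason). Output is None when the loop stopped
    without a verified result.
    """
    spent = 0
    for attempt in range(max_steps):
        result = step(attempt)
        spent += result.tokens_used
        # Verify before deciding whether to loop again.
        if verify(result.output):
            return result.output, "verified"
        # Checked after the step, so the budget can overshoot by one step.
        if spent >= token_budget:
            return None, "token budget exhausted"
    return None, "step limit reached"


# Stub agent that never converges: the ceiling turns it into a known cost.
stuck = lambda attempt: StepResult(output="still failing", tokens_used=4_000)
print(run_bounded(stuck, verify=lambda out: out == "tests pass"))
```

With these defaults the stuck agent costs at most five steps, and the returned reason tells you which ceiling stopped it.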

---

How does AI make your team look faster while shipping more rework?

Output is the second lever, and the failure is counting volume as if it were durable work. Code revised or reverted within two weeks of being written rose from 3.1% in 2020 to 5.7% in 2024, while refactoring fell from 25% to under 10% (GitClear, 2025, 211M lines analyzed). More lines arrived; fewer of them lasted.

Anti-pattern 3: Counting generated lines as productivity

Line count is rising while durability falls, and GitClear's 2025 edition quantifies the split: across 211 million lines analyzed, cloned lines rose from 8.3% to 12.3%, and in 2024 copy-pasted lines exceeded moved lines for the first time (GitClear, 2025). Volume up, reuse down.

Pair that with the churn numbers. Code revised or reverted within two weeks climbed to 5.7% in 2024, and refactoring, the work that keeps a codebase healthy, dropped below 10% (GitClear, 2025). A dashboard counting lines added would show all of this as a win.

This corrupts the output lever. When you measure volume, short-lived and duplicated code inflates the same number as durable work, so the metric goes up while the codebase gets worse. Copy-pasted code is the clearest tell: it adds to the line count today and adds to your maintenance burden forever. The control is to measure rework rate and churn instead of lines added, and to treat code that dies within two weeks as a cost, not a contribution. A line that gets reverted didn't ship anything; it consumed a review and produced a revert. More on the right metrics in how to measure AI productivity.
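A rework-rate metric along those lines could look like the sketch below. It assumes you can attribute each batch of added lines to the date it was written and the date it was first revised or reverted; the record shape and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REWORK_WINDOW = timedelta(days=14)  # "within two weeks"


@dataclass
class AddedLines:
    """Hypothetical record: a batch of lines added in one commit."""
    count: int
    added_on: date
    reworked_on: Optional[date] = None  # first revision or revert, if any


def rework_rate(batches: list[AddedLines]) -> float:
    """Fraction of added lines revised or reverted within the window."""
    total = sum(b.count for b in batches)
    if total == 0:
        return 0.0
    short_lived = sum(
        b.count
        for b in batches
        if b.reworked_on is not None
        and b.reworked_on - b.added_on <= REWORK_WINDOW
    )
    return short_lived / total


batches = [
    AddedLines(400, date(2024, 3, 1)),                     # still standing
    AddedLines(100, date(2024, 3, 1), date(2024, 3, 9)),   # reverted after 8 days
    AddedLines(500, date(2024, 3, 4), date(2024, 5, 20)),  # ordinary later maintenance
]
print(f"lines added: {sum(b.count for b in batches)}, "
      f"rework rate: {rework_rate(batches):.0%}")
```

A lines-added dashboard would report 1,000 for this sample; the rework rate shows that 10% of it did not survive two weeks.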

Anti-pattern 4: Treating throughput gains as free

Throughput rises but stability falls, and the bill arrives later. Across 22,000 developers and 4,000 teams, bugs per developer rose 54%, production incidents per PR roughly tripled, and code churn climbed about 10x (Faros AI, 2026, 22,000 developers / 4,000 teams). Faster shipping, more breakage.

The DORA 2025 research names the same tension directly: AI raises throughput and instability at once, with about 90% of respondents using AI at work (DORA, 2025). Velocity numbers climb. Working software does not climb with them.

This corrupts the output lever, because the speed gain is real but it isn't free, and the stability cost erases part of it. A team can post record throughput and ship less working software than it did the quarter before. The control is to track the stability side of the ledger, incidents and change-fail rate, next to the speed numbers, so the two move together in your dashboards instead of one hiding the other. And this is where curated context pays off: agents that start with the full picture generate less of the rework in the first place.

"My workflow is: here's the Jira ticket, here's the Confluence doc, here are the Slack threads, now build me a plan. Unblocked pulls all of that together so the agent starts with the full picture. Without it, I'd estimate I'm 20 to 30 percent less productive."

— Tushar Kawsar, Software Engineer, UserTesting

See the positive framing in building with AI effectively and the deeper dive in the AI productivity paradox.
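To make the "speed next to stability" control concrete, here is a small sketch that reports change-fail rate alongside deploy count. The quarter figures are invented for illustration and are not drawn from the studies cited above.

```python
from dataclasses import dataclass


@dataclass
class QuarterStats:
    """Hypothetical per-quarter delivery numbers."""
    label: str
    deploys: int          # throughput
    failed_deploys: int   # deploys that caused an incident or rollback


def change_fail_rate(q: QuarterStats) -> float:
    """Share of deploys that failed in production."""
    return q.failed_deploys / q.deploys if q.deploys else 0.0


def working_deploys(q: QuarterStats) -> int:
    """Deploys that shipped without causing an incident."""
    return q.deploys - q.failed_deploys


before = QuarterStats("before AI", deploys=100, failed_deploys=5)
after = QuarterStats("after AI", deploys=130, failed_deploys=39)

for q in (before, after):
    print(f"{q.label}: {q.deploys} deploys, "
          f"{change_fail_rate(q):.0%} change-fail rate, "
          f"{working_deploys(q)} working")
```

In this made-up example throughput rises 30% while working deploys fall from 95 to 91, the "record throughput, less working software" case that a speed-only dashboard would hide.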

Anti-pattern 5: Shipping the "almost right" first draft

The "almost right" draft is the most expensive output trap, because it looks done. In the Stack Overflow Developer Survey 2025, the top AI frustration, cited by 66% of developers, was code that's "almost right but not quite," and 45.2% said debugging AI-generated code took more time than expected (Stack Overflow, 2025).

That hidden debugging tax is the whole problem. The draft ships, the velocity chart looks great, and then the time you saved drains back out in review and rework. The apparent gain and the real gain are different numbers. Worse, "almost right" is harder to catch than plainly wrong. Code that fails loudly gets fixed; code that works in the happy path and breaks on the edge case ships, then surfaces as an incident weeks later.

This corrupts the output lever: what looks like finished output is a draft carrying a deferred cost. The control has two parts. First, budget verification of AI output as part of the work, not as overhead. Second, reduce "almost right" at the source by giving the agent the WHY, the decisions and constraints behind the code, not just the code itself. Most "almost right" output isn't a model failure; it's a context failure: the agent never saw the constraint it violated. More on that in three myths about context for AI agents.

---

When does giving an AI agent more autonomy backfire?

Autonomy is the third lever, and it backfires when you grant it faster than context and review can justify. After AI adoption, median PR review time rose about 5x and 31.3% more PRs now merge with no human review at all (Faros AI, 2026). Trust is being granted by default, not earned.

Anti-pattern 6: Merging AI code without human review

Auto-merge is autonomy without verification, and the trend runs opposite to actual trust. Faros found 31.3% more PRs merging with no human review after AI adoption (Faros AI, 2026), while only 3.1% of developers "highly trust" AI accuracy and 45.7% actively distrust it (Stack Overflow, 2025). Few people trust the output; many merge it unread anyway.

That contradiction is the anti-pattern. The same teams that say they don't trust AI code are quietly letting more of it through without eyes on it, usually because review queues got slow and auto-merge felt like relief. The volume of AI-generated PRs grew faster than human review capacity, and something had to give. What gave was the review, not the volume. That is autonomy granted by exhaustion rather than by trust, and it is the most common way the autonomy lever gets mis-set.

This corrupts the autonomy lever, because autonomy is being granted while the verification that should justify it goes missing. The control is to calibrate autonomy to verified trust: keep a human in the review loop for anything the agent can't justify from its context, and reserve auto-merge for the narrow, well-understood changes that earn it. We cover how to dial that in without micromanaging in stop babysitting your agents.
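One way to write that calibration down is as an explicit merge policy. This is a sketch: the allowed change types, the line ceiling, and the PR fields are all hypothetical, and each team would set its own.

```python
from dataclasses import dataclass

# Assumption: change types a team has decided are narrow and well understood.
AUTO_MERGE_ALLOWED = {"dependency-bump", "formatting", "generated-docs"}
MAX_AUTO_MERGE_LINES = 50  # illustrative ceiling, not a recommendation


@dataclass
class PullRequest:
    """Hypothetical PR metadata available to a merge bot."""
    change_type: str
    lines_changed: int
    ci_passed: bool
    human_approved: bool


def may_merge(pr: PullRequest) -> bool:
    """Human approval always suffices; auto-merge has to be earned."""
    if not pr.ci_passed:
        return False
    if pr.human_approved:
        return True
    return (
        pr.change_type in AUTO_MERGE_ALLOWED
        and pr.lines_changed <= MAX_AUTO_MERGE_LINES
    )


print(may_merge(PullRequest("formatting", 12, ci_passed=True, human_approved=False)))
print(may_merge(PullRequest("feature", 12, ci_passed=True, human_approved=False)))
```

The design choice is that the default answer is "needs a human", and the auto-merge list only grows when a change type has earned it, never because the review queue got long.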

Anti-pattern 7: Trusting AI output you can't verify

Granting autonomy over code nobody is security-reviewing is the highest-risk anti-pattern, and the data is stark. Across 100-plus LLMs evaluated, 45% of generated samples failed security tests against the OWASP Top 10, Java samples failed 72% of the time, and the generated code defended against cross-site scripting only 14% of the time (Veracode, 2025, 100+ LLMs evaluated).

Independent review of merged code shows the same gap. In an analysis of 470 PRs, AI-authored PRs carried 10.83 issues on average versus 6.45 for human PRs, roughly 1.7x, with logic and correctness issues up about 75% and security issues up to 2.74x (CodeRabbit, 2025, 470 PRs analyzed). More issues, riskier ones.

This corrupts the autonomy lever: the agent is acting on code paths no human or scanner is checking. The risk compounds, because the same context gaps that produce insecure code also make it harder for a human to spot the flaw on a quick read. The control is to gate autonomy with automated review and security scanning, and to treat AI-authored PRs as higher-risk by default. Run the scan before the merge, not after the incident. That gate is the calibrated-autonomy lever from building with AI effectively, put into practice.
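A pre-merge gate of that kind might be sketched as follows. The report shape, severity names, and thresholds are assumptions for illustration, not the output format of any real scanner.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScanReport:
    """Hypothetical scanner output: finding counts keyed by severity."""
    findings: dict[str, int] = field(default_factory=dict)


# Assumption: AI-authored PRs get the stricter policy by default.
BLOCKING_SEVERITIES = {
    "human": {"critical"},
    "ai": {"critical", "high", "medium"},
}


def gate(author_kind: str, report: Optional[ScanReport]) -> tuple[bool, str]:
    """Decide before the merge; a missing scan counts as a failure."""
    if report is None:
        return False, "no scan ran"
    blocking = BLOCKING_SEVERITIES[author_kind]
    hits = {sev: n for sev, n in report.findings.items() if sev in blocking and n > 0}
    if hits:
        return False, f"blocking findings: {sorted(hits)}"
    return True, "clean"


report = ScanReport({"high": 2, "low": 7})
print(gate("human", report))
print(gate("ai", report))
```

The same report passes for a human-authored PR and blocks an AI-authored one, which is what "higher-risk by default" means in practice. Treating a missing scan as a block keeps the gate from failing open.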

Which control fixes each anti-pattern?

Here is every anti-pattern, the lever it corrupts, and the control that fixes it.

Anti-pattern	Lever it corrupts	The control that fixes it
Measuring token spend instead of yield	Cost	Instrument cost-per-useful-output; tie spend to merged PRs and resolved tickets
Letting agent loops run unbounded	Cost	Cap iterations; set step and token budgets; verify before re-looping
Counting generated lines as productivity	Output	Measure rework and churn, not lines; treat short-lived code as a cost
Treating throughput gains as free	Output	Track incidents and change-fail rate alongside speed; curate context to cut rework
Shipping the "almost right" first draft	Output	Budget verification as work; give the agent the WHY, not just the code
Merging AI code without human review	Autonomy	Calibrate autonomy to verified trust; keep humans in the loop
Trusting output you can't verify	Autonomy	Gate with automated review and security scanning; treat AI PRs as higher-risk

Which anti-pattern is costing you most?

The three levers fail together, but most teams have one dominant leak, and most AI coding anti-patterns cluster around it. Run the one-line diagnostic. If your bill is climbing faster than your shipped work, the leak is cost. If your velocity chart looks great while rework and incidents climb, the leak is output: production incidents per PR roughly tripled across 22,000 developers in one study (Faros AI, 2026, 22,000 developers / 4,000 teams). If more PRs are merging with no human review, the leak is autonomy.
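The diagnostic can be written as a crude heuristic. In the sketch below, every input is period-over-period growth expressed as a fraction, and comparing the three signals on one scale is a simplifying assumption on our part, not a calibrated score.

```python
def dominant_leak(
    spend_growth: float,
    shipped_growth: float,
    rework_growth: float,
    unreviewed_merge_growth: float,
) -> str:
    """Return the lever with the largest positive signal, or "none"."""
    scores = {
        "cost": spend_growth - shipped_growth,  # bill outpacing shipped work
        "output": rework_growth,                # rework and incidents climbing
        "autonomy": unreviewed_merge_growth,    # more merges with no human review
    }
    lever, score = max(scores.items(), key=lambda kv: kv[1])
    return lever if score > 0 else "none"


# Hypothetical team: spend up 80%, shipped work up 10%, rework up 20%.
print(dominant_leak(0.8, 0.1, 0.2, 0.05))
```

The output is a starting point for a conversation about which lever to fix first, not a verdict.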

Whichever lever is leaking, the durable fix has the same shape: curated context underneath the agent, not more tokens, more PRs, or more trust. An agent that starts with the full picture wastes fewer tokens, generates less rework, and gives reviewers something they can actually verify. That is the difference between a team that feels faster and one that is.

How far along that curve your team sits is something you can place on the context-maturity framework, which maps the stages from ad-hoc prompting to context that travels with the work. If you want a quick read on where you stand before you pick a lever to fix, the readiness assessment scores your current setup in a few minutes.

Start by naming your dominant leak this week, then fix the lever, not the symptom.