Cost per Merged PR: The Unit Economics of AI Coding

AI-assisted PRs run about 18.2% larger and review time can climb 91%, so cost-per-token looks cheap while the cost per merged PR quietly rises.

Brandon Waselnuk | Jun 12, 2026 | Engineering Insights | Context Engineering

TL;DR: Cost-per-token and cost-per-seat measure inputs, not outcomes. The cost per merged PR divides fully-loaded AI cost (tokens, tools, review time, rework) by the PRs that merge and survive a churn window. AI inflates raw PR counts while review and rework rise, so the input metric looks great while the outcome metric climbs. Context is the lever: higher token yield shrinks the numerator and grows the denominator.

Picture the quarterly review. Finance puts a single chart on the screen: AI tooling spend tripled this year. Then the question lands, and it is a fair one: "We 3x'd AI spend, so what shipped?" The engineering leader opens the dashboards. Tokens consumed, up and to the right. Seats provisioned, all of them. Suggestions accepted, PR volume climbing. None of it answers the question, because none of it is denominated in shipped, surviving work.

Here is the fix, and it changes the whole conversation: pick one unit economic, the cost of a merged PR, and measure it. Put fully-loaded cost in the numerator. Put PRs that ship and survive in the denominator. It is the AI coding cost metric that maps spend to outcome, and it is the number finance was actually asking for. Cost-per-token tells you how cheap the input got. Cost per merged PR tells you what the output cost.

What is cost per merged PR?

The metric is your total fully-loaded AI cost over a period divided by the PRs that both merged and survived. Survived means not reverted or substantially rewritten inside a churn window. This is different from cost-per-seat and cost-per-token, which count what you bought, not what shipped. It is the AI coding cost metric that ties spend to outcome.

Why does the denominator's shape matter so much? Because AI changes it. According to Jellyfish (Sep 2025), AI-assisted pull requests run about 18.2% larger, rising from 74.8 to 88.4 additions per PR across millions of PRs at more than 500 companies. Bigger PRs mean each merge carries more code, more review surface, and more places to churn. The unit you are dividing by is not the unit it used to be, and a cost-per-shipped-feature view inherits the same distortion.

Why does cost-per-token make AI look cheap while it gets expensive?

Cost-per-token optimizes the wrong thing. The per-token price keeps falling, but consumption explodes and most tokens never convert into shipped output. A falling unit price on a soaring unit count doesn't save money; it hides a budget line you stopped watching.

The scale is the tell. As reported by Tom's Hardware (2025), a Goldman Sachs analysis projects agent-driven token demand could rise by over 24x in the next few years even as the price per token falls. The token-cost term itself is real and worth modeling: Anthropic publishes per-million-token (MTok) rates you can plug straight into a budget. The trap is treating that input price as the score. Consuming cheaper tokens in vastly greater volume, with no view of what merged, is exactly how a tripled bill sneaks past a healthy-looking cost-per-token chart. For the broader framing, see our AI tokenomics cost framework.
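A minimal sketch of that arithmetic, assuming token spend is simply price times volume. The 24x multiplier is the projection cited above; the 50% price drop in the usage lines is a placeholder assumption, and the function names are illustrative.

```python
def breakeven_price_drop(volume_multiplier: float) -> float:
    """Fraction the per-token price must fall for total spend to stay flat."""
    if volume_multiplier <= 0:
        raise ValueError("volume_multiplier must be positive")
    return 1.0 - 1.0 / volume_multiplier


def spend_multiplier(volume_multiplier: float, price_drop: float) -> float:
    """Change in total token spend for a given volume growth and fractional price drop."""
    return volume_multiplier * (1.0 - price_drop)


if __name__ == "__main__":
    # Price drop needed to hold spend flat at 24x volume.
    print(f"break-even price drop: {breakeven_price_drop(24):.1%}")
    # Hypothetical: the price halves while volume grows 24x.
    print(f"spend multiplier at a 50% price drop: {spend_multiplier(24, 0.50):.1f}x")
```

At 24x volume the per-token price would have to fall by roughly 96% just to keep the bill flat. Halving the price, in this hypothetical, still leaves spend 12x higher.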

Why do raw PR counts lie about AI output?

Raw PR counts lie because AI inflates both volume and size, and a merged PR is not a shipped outcome if it gets reverted or rewritten. You need to separate merged-and-survived from merged-and-churned. Counting merges without a survival test rewards motion over delivery.

The cleanest illustration comes from Faros AI (Jun 2025): high-AI teams merged 98% more PRs and completed 21% more tasks, yet PR review time rose 91% and PR size climbed 154%, with no measurable org-level DORA gain. So volume nearly doubled while the outcome at the organization level held flat. That gap is the whole point. PRs that cost more to review and do not move delivery metrics are spend with a soft denominator, not free output. Survival data sharpens the warning: an arXiv study (Sep 2025) found 83.8% of agent-assisted PRs eventually merged but only 54.9% merged unmodified, so nearly half needed rework before they shipped. This is the cost-side face of the same problem we documented in the AI productivity paradox, where output rose but outcomes did not.

What is the formula for cost per merged PR?

The formula sums four cost terms and divides by surviving merges. Each term is real money or real hours: the token spend uses per-MTok pricing from Anthropic, tool and seat spend is your allocated subscription and agent run cost, review time is loaded reviewer hours, and rework is engineer hours spent fixing or reverting AI output. The denominator is the disciplined part.

Cost per merged PR =

    ( Token spend + Tool & seat spend + Review time cost + Rework cost )
    -------------------------------------------------------------------
              PRs that merge AND survive the churn window

where:
  Token spend        = tokens consumed (in MTok) x per-MTok price (Anthropic pricing)
  Tool & seat spend  = AI subscriptions + agent run costs (period, allocated)
  Review time cost   = reviewer hours x fully-loaded hourly cost
  Rework cost        = engineer hours fixing/reverting AI output x loaded rate
  Surviving merge    = a merged PR not reverted or substantially rewritten
                       within the churn window (e.g., 2 weeks)

The discipline lives in that last line. A merge only counts once it survives the window.
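The formula can be sketched in a few lines of Python. This is one possible shape, not a reference implementation: the `PeriodCosts` type and `cost_per_merged_pr` function are illustrative names, and the inputs are assumed to be already aggregated for the period in a single currency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodCosts:
    """The four numerator terms for one reporting period."""
    token_spend: float       # tokens consumed (MTok) x per-MTok price
    tool_seat_spend: float   # AI subscriptions + agent run costs, allocated
    review_time_cost: float  # reviewer hours x fully-loaded hourly cost
    rework_cost: float       # hours fixing/reverting AI output x loaded rate

    @property
    def total(self) -> float:
        return (
            self.token_spend
            + self.tool_seat_spend
            + self.review_time_cost
            + self.rework_cost
        )


def cost_per_merged_pr(costs: PeriodCosts, surviving_merges: int) -> float:
    """Fully-loaded cost divided by PRs that merged AND survived the churn window."""
    if surviving_merges <= 0:
        raise ValueError("no surviving merges in the period; the metric is undefined")
    return costs.total / surviving_merges


if __name__ == "__main__":
    # Figures from the low-context column of the worked example in this article.
    costs = PeriodCosts(4_000, 2_000, 6_000, 3_000)
    print(cost_per_merged_pr(costs, surviving_merges=30))  # 500.0
```

Raising an error when there are no survivors, rather than returning zero or infinity, is a deliberate choice in this sketch: a period with spend and no durable merges should be visible, not averaged away.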

Worked example: what a merged PR actually costs

A worked example shows the denominator moving, not just the numerator. The numbers below are illustrative, not a customer result, but the arithmetic is internally consistent and grounded in one first-party measurement: an Unblocked controlled test found that better context cut tokens 42%, ran 27% faster, and used 64% fewer tool calls on the same prompt and model.

Same team, same model, one month, two context conditions:

WORKED EXAMPLE (illustrative)                Low context     High context

  Token spend                                  $4,000          $2,320
  Tool & seat spend                            $2,000          $2,000
  Review time cost                             $6,000          $4,200
  Rework cost                                  $3,000          $1,400
  ---------------------------------------------------------------------
  Total cost (numerator)                      $15,000         $9,920

  PRs merged                                       40              42
  PRs that survived churn window                   30              38
  ---------------------------------------------------------------------
  Cost per merged PR (survivors)                 $500            $261

Token spend falls 42%, fewer "almost right" cycles trim review cost from $6,000 to $4,200, less churn lifts survivors from 30 to 38, and the metric drops from $500 to $261, a cut of about 48%. The cheapest token is the one that ships.
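To see how much of that drop comes from each side of the ratio, the table can be recomputed with one side held fixed at a time. The sketch below uses only the illustrative figures from the worked example; the two "only" scenarios are hypothetical counterfactuals, not additional measurements.

```python
# Line items from the illustrative worked example (USD for the month).
LOW = {"token": 4_000, "tools": 2_000, "review": 6_000, "rework": 3_000}
HIGH = {"token": 2_320, "tools": 2_000, "review": 4_200, "rework": 1_400}
LOW_SURVIVORS, HIGH_SURVIVORS = 30, 38


def cost_per_survivor(total_cost: float, survivors: int) -> float:
    return total_cost / survivors


low_total = sum(LOW.values())    # 15,000
high_total = sum(HIGH.values())  # 9,920

scenarios = {
    "low context (baseline)": cost_per_survivor(low_total, LOW_SURVIVORS),
    "cheaper numerator only": cost_per_survivor(high_total, LOW_SURVIVORS),
    "more survivors only": cost_per_survivor(low_total, HIGH_SURVIVORS),
    "high context (both)": cost_per_survivor(high_total, HIGH_SURVIVORS),
}

for name, value in scenarios.items():
    print(f"{name:<26} ${value:,.0f}")
```

Under these numbers, the lower cost alone would bring the metric to about $331 and the extra survivors alone to about $395. It takes both movements together to reach $261.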

How does code churn quietly raise cost per merged PR?

Churn raises the metric because rework is the term teams forget to book. Churned code is spend that adds nothing to the denominator: you paid to generate it, paid to review it, paid to fix or revert it, and shipped nothing durable. It hits both sides of the ratio at once, raising the numerator while removing a merge from the count.

The trend lines support treating churn as a first-class cost. According to GitClear (report published 2025, data window 2020 to 2024), cloned or copy-pasted code rose from 8.3% to 12.3%, refactored lines fell from 25% to under 10%, and code revised within two weeks climbed from 3.1% to 5.7%. More copy-paste and less refactoring mean more lines that get rewritten soon after merge. That two-week revision rate is your churn window made visible. Booking rework keeps the metric honest, a point we expand in how to measure AI productivity.
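One way to apply the survival test to merge records is sketched below. The `MergedPR` type and its `churned_at` field are hypothetical; how a revert or "substantial rewrite" is detected is left to your own tooling, and the 14-day window in the usage lines mirrors the two-week example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class MergedPR:
    number: int
    merged_at: datetime
    # When the PR was reverted or substantially rewritten, if ever.
    # Hypothetical field: detection is assumed to happen upstream.
    churned_at: Optional[datetime] = None


def survived(pr: MergedPR, window: timedelta, as_of: datetime) -> Optional[bool]:
    """True or False once the verdict is known; None while the window is still open."""
    if pr.churned_at is not None and pr.churned_at - pr.merged_at <= window:
        return False  # churned inside the window
    if as_of - pr.merged_at < window:
        return None   # too early to count this merge
    return True


def count_survivors(prs: Iterable[MergedPR], window: timedelta, as_of: datetime) -> int:
    """Denominator of the metric: merges whose churn window closed without churn."""
    return sum(1 for pr in prs if survived(pr, window, as_of) is True)


if __name__ == "__main__":
    window = timedelta(days=14)
    as_of = datetime(2026, 2, 1)
    prs = [
        MergedPR(1, datetime(2026, 1, 1)),                          # survived
        MergedPR(2, datetime(2026, 1, 1), datetime(2026, 1, 10)),   # churned in window
        MergedPR(3, datetime(2026, 1, 1), datetime(2026, 1, 20)),   # changed after window
        MergedPR(4, datetime(2026, 1, 25)),                         # window still open
    ]
    print(count_survivors(prs, window, as_of))  # 2
```

Treating open windows as "not yet counted" rather than as survivors keeps recent merges from flattering the current period's number.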

Why does the review tax belong in the numerator?

The review tax belongs in the numerator because "almost right but not quite" output transfers cost from generation to review, and reviewer hours are real spend on the path to every merge. If the model writes faster but a human spends longer checking, the work didn't disappear; it shifted downstream to review, where an hour usually costs more.

The survey data quantifies the tax. In the Stack Overflow 2025 Developer Survey, only about 29 to 33% of developers said they trust the accuracy of AI output, and 66% reported hitting answers that are "almost right but not quite." Defect density compounds it: as reported by The Register (Dec 2025), review vendor CodeRabbit found AI-authored code averaged 10.83 issues per PR against 6.45 for human-written code, roughly 1.7x, with about 1.4x more critical issues. More issues per PR means more reviewer hours per merge, which is exactly why the time belongs in the cost.
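A rough sketch of how issue density could flow into the review term. The issues-per-PR figures are the CodeRabbit numbers quoted above. The minutes-per-issue and hourly rate are placeholder assumptions rather than measurements, and modeling review time as proportional to issue count is a simplification.

```python
AI_ISSUES_PER_PR = 10.83     # CodeRabbit figure, via The Register (Dec 2025)
HUMAN_ISSUES_PER_PR = 6.45   # same source


def review_cost_per_pr(
    issues_per_pr: float,
    minutes_per_issue: float,
    loaded_hourly_rate: float,
) -> float:
    """Reviewer cost per PR under a simple linear model: time scales with issues."""
    return issues_per_pr * minutes_per_issue / 60.0 * loaded_hourly_rate


if __name__ == "__main__":
    # Placeholder assumptions: 5 reviewer-minutes per issue, $120 loaded hourly rate.
    ai = review_cost_per_pr(AI_ISSUES_PER_PR, 5, 120)
    human = review_cost_per_pr(HUMAN_ISSUES_PER_PR, 5, 120)
    print(f"AI-authored PR:   ${ai:.2f}")
    print(f"Human-written PR: ${human:.2f}")
    print(f"Ratio:            {ai / human:.2f}x")
```

Whatever rate and per-issue time you plug in, the ratio between the two stays at about 1.7x under this linear model, because both inputs cancel out.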

How does context lower cost per merged PR?

Context lowers it by raising token yield, the share of token spend that converts into shipped, surviving output (defined fully in our AI tokenomics cost framework). Higher yield shrinks the numerator (fewer tokens, fewer tool calls, less rework) and grows the denominator (more PRs that merge and survive). It moves both sides of the ratio in your favor.

The first-party signal is direct: in the Unblocked controlled test, better context used 42% fewer tokens on the same prompt and model, ran 27% faster, and made 64% fewer tool calls. Unblocked is the context engine for engineering teams, and the effect it targets is the survival rate of merges. This is the cost-side twin of context-adjusted productivity: that metric adjusts output for context, this one adjusts cost. Read the context-adjusted productivity piece for the output side, and the guide to measuring AI coding tool ROI to wire both into a return calculation.

Frequently asked questions

What does this metric actually measure?

It is your fully-loaded AI cost (tokens, tools, review time, and rework) divided by the PRs that ship and survive a churn window. It is the AI coding cost metric that maps spend to outcome rather than to inputs like seats or tokens. The survival test, typically two weeks, is what separates it from a raw merge count.

Why not just track cost per token?

Cost-per-token measures an input, not an outcome. The per-token price keeps falling while consumption could rise by over 24x in the next few years, per a Goldman Sachs analysis via Tom's Hardware (2025). Most tokens never ship, so a cheap token is not a cheap merge.

How is cost per merged PR different from cost per shipped feature?

They are the same idea at different grain. The PR is the instrumentable unit your tooling already tracks; the cost per shipped feature is the business unit a stakeholder cares about. Both denominate cost by surviving outcome. Start at the PR level because it is measurable today, then roll up to features for executive reporting.

How does context lower the cost?

Context raises token yield, the share of spend that converts into surviving output. In a controlled test, better context used 42% fewer tokens on the same prompt and model, ran 27% faster, and made 64% fewer tool calls. When that carries through to fewer review cycles and more PRs that survive churn, it shrinks the numerator and grows the denominator at once, which is what actually moves the number.

The Only Denominator That Matters

Cost-per-token is an input metric. The cost of a merged PR is an outcome metric, and only one of them answers the question finance asked in the quarterly review. The evidence points the same way from every angle: PRs are getting larger (Jellyfish), review time and PR size are climbing without org-level gains (Faros AI), churn is rising (GitClear), and as the arXiv survival data showed, only 54.9% of agent-assisted PRs merge unmodified. Pick the denominator that survives.

The path to a lower number is higher token yield, which is where context earns its keep. Unblocked raises the share of PRs that merge and survive, and that is the only saving that shows up in the metric that matters. Start by instrumenting cost per merged PR this quarter, then read the AI coding tool ROI reckoning for what to do with the number once you have it.