Bottom line: The official GitHub MCP server is the canonical context-tax offender in 2025-2026. Nebulagg measured it at ~42,000 tokens in tool definitions alone; Piotr Hajdas's more recent count put it at 55,000 across 93 tool definitions. That's roughly 21% of a 200K context window paid before your first prompt. Four concrete fixes apply, in descending order of ROI: (1) enable Claude Code 2.0's tool-search subagent (Joe Njenga measured a 46.9% reduction), (2) allow-list only the MCP tools you actually use, (3) replace MCP calls with the GitHub CLI (OnlyCLI benchmark: 4-32x cheaper per operation), (4) move retrieval out of the agent loop into a context layer. Stack fixes 1 and 2 for a 90%+ reduction.
If you have the official GitHub MCP server installed in Claude Code, roughly 42,000 tokens of your context window are spent before you type your first prompt. That's the cost of tool definitions alone, paid on every model call inside the agent loop. Two independent community measurements bracket the number. Nebulagg's dev.to autopsy pinned the GitHub MCP at around 42,000 tokens in tool definitions (dev.to/nebulagg, 2026). A more recent count from Piotr Hajdas pushed the figure to 55,000 tokens across 93 distinct tool definitions (dev.to/piotr_hajdas, 2026), as the schema has expanded with new GitHub features. Either way, you're paying 21% or more of a 200K context window before any actual work happens. This is the canonical "context tax": the standing schema cost levied by verbose MCP servers on every turn. Below is a short autopsy of where the 42K goes, followed by four ranked, measured fixes you can apply this week.
Where does the GitHub MCP server spend the 42,000 tokens?#
The 42,000 tokens are spent almost entirely on tool-definition JSON Schema, not on any work the agent actually does. Per Nebulagg's 2026 autopsy, the GitHub MCP charges roughly 42,000 tokens; Piotr Hajdas's later measurement found 55,000 across 93 tool definitions (dev.to/piotr_hajdas, 2026), suggesting the schema has grown as GitHub added surface area.
The breakdown looks roughly like this:
| Component | Approximate tokens |
| --- | --- |
| Repository operations (list, get, create, branch, fork) | ~10,000 |
| Pull request operations | ~9,000 |
| Issue operations | ~7,000 |
| Workflow and Actions operations | ~6,000 |
| Code search and file operations | ~5,000 |
| User, org, and team operations | ~3,000 |
| Other (releases, labels, projects) | ~2,000 |
| Total | ~42,000 |
Numbers are illustrative, based on community-reported breakdowns. The exact distribution shifts with each MCP release.
Why so big? GitHub's API surface is one of the largest in the developer ecosystem. MCP requires verbose JSON Schema for every tool, with parameter types, descriptions, examples, and error formats inline. And every definition reloads on every model call inside the agent loop. There is no caching across turns.
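To make the overhead concrete, here's a rough sketch of what even a modest tool definition costs. The schema below is hypothetical (field names and descriptions are illustrative, not copied from the real server), and the 4-characters-per-token heuristic is a crude approximation:

```python
import json

# Hypothetical MCP tool definition, shaped like the verbose JSON Schema
# the protocol requires for every tool. Names and fields are illustrative.
tool_definition = {
    "name": "get_pull_request",
    "description": "Get details of a pull request in a GitHub repository, "
                   "including title, body, state, reviewers, and merge status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner (user or org)"},
            "repo": {"type": "string", "description": "Repository name"},
            "pull_number": {"type": "integer", "description": "Pull request number"},
        },
        "required": ["owner", "repo", "pull_number"],
    },
}

# Crude heuristic: roughly 4 characters per token for English/JSON text.
chars = len(json.dumps(tool_definition))
approx_tokens = chars // 4
print(f"~{approx_tokens} tokens for one small tool definition")
```

Real GitHub MCP definitions carry far more parameters, examples, and error formats than this toy; at a few hundred tokens each, 93 of them land comfortably in the 40K-55K range.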
Scott Spence put it bluntly after profiling his setup: with GitHub MCP installed, he was burning through a third of his context window before doing anything useful (scottspence.com, 2026). Stefano Demiliani independently described the same pattern as "too many tools" (demiliani.com, 2026), and Amzani's tour of the failure mode called it MCP "eating your context window" (dev.to/amzani, 2025). Mert Köseoglu's Context Mode write-up reaches the same conclusion from a different angle: when MCP servers preload everything, the agent's effective working memory shrinks before it starts (mksg.lu, 2026).
The interesting thing isn't the absolute number. It's the slope: GitHub MCP grew from ~42K to ~55K in roughly six months as the server added tool surface. The context tax compounds with feature velocity.
Fix 1: Does Claude Code 2.0's tool-search subagent actually cut the bill?#
Yes, and by the largest margin of any single fix. Joe Njenga's production measurement, published in 2026, recorded main-thread token usage dropping from 51,000 to 8,500, which he reported as a 46.9% cut in overall usage, on a workload where the GitHub MCP was the heaviest installed server (Joe Njenga, Medium, 2026). That's the highest-impact fix on the menu.
How the tool-search subagent works#
Instead of loading every tool definition into the main agent's context, Claude Code spawns a subagent in its own fresh context. The subagent reads only the tool definitions relevant to the operation the agent decides to call. Anthropic's own engineering guidance frames this as progressive disclosure of tool surface: expose what's needed, when it's needed (Anthropic, 2026). The mechanism is documented in the Claude Code sub-agents reference (Anthropic Claude Code docs, 2026).
When it helps most#
GitHub MCP is the textbook case. Most sessions use 3 to 5 GitHub operations out of 50-plus available, so the subagent's gating throws away cost the main thread never needed. The savings show up immediately on cost dashboards via the Claude Code cost endpoint (Anthropic Claude Code docs, 2026).
How to turn it on#
Enable the tool_search subagent in your Claude Code settings. You don't need to rewrite any prompts. The subagent runs transparently and intercepts tool-definition loading on the main thread.
ROI: highest of the four fixes. Effort: minutes.
Fix 2: How does MCP tool allow-listing cut the schema cost?#
Allow-listing cuts the GitHub MCP context tax by roughly 80% of its schema overhead, with no code changes. Most MCP clients (Claude Code, Cursor, and others) let you allow-list which tools the server exposes. The math is simple: if you load 5 of 50 tool definitions, you pay roughly one tenth of the original token cost on every turn.
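As a quick sanity check, here's that arithmetic spelled out, under the simplifying assumption that schema cost scales roughly linearly with tool count:

```python
full_tools = 50              # tools exposed by the full server (approximate)
full_schema_tokens = 42_000  # measured schema preload for the full surface
allowed_tools = 5            # allow-listed subset

# Assume roughly uniform per-tool schema cost.
per_tool = full_schema_tokens / full_tools           # ~840 tokens per tool
allowlisted_cost = per_tool * allowed_tools          # cost of the subset
savings = 1 - allowlisted_cost / full_schema_tokens  # fraction saved per turn

print(f"Allow-listed schema cost: ~{allowlisted_cost:,.0f} tokens per turn")
print(f"Reduction: {savings:.0%}")
```

In practice the per-tool cost is not uniform (PR and repo tools are heavier than label tools), so real savings land somewhere in the 60-90% band depending on which five you keep.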
The honest question to ask is whether you really need all 50-plus GitHub tool definitions, or whether 5 cover 90% of your real workflows: get_repository, list_pull_requests, get_pull_request, list_issues, and search_code.
A concrete mcp.json example#
Most clients accept a tools array inside the server definition. Here's an allow-list that exposes only five core tools:
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      },
      "tools": [
        "get_repository",
        "list_pull_requests",
        "get_pull_request",
        "list_issues",
        "search_code"
      ]
    }
  }
}
```

What this leaves on the table#
You lose access to workflows, releases, labels, project boards, and the long tail of org and team operations until you re-enable them. For most engineering teams, that's a non-issue. For platform teams running Actions or release automation through the agent, plan a second profile with those tools added back.
Allow-listing stacks well with Fix 1. The subagent gates tool loading at runtime; allow-listing reduces the surface area the subagent has to search in the first place. Together they typically yield 90%+ reduction on the GitHub MCP context tax.
Fix 3: Can the GitHub CLI replace the MCP server outright?#
Often, yes. OnlyCLI's 2026 benchmark measured equivalent CLI tools at 4-32x cheaper per operation than their MCP counterparts (OnlyCLI, 2026). Scalekit replicated the experiment on Sonnet 4 at 10,000 operations per month and recorded $3.20 for CLI versus $55.20 for MCP, a 17x gap on identical workloads (Scalekit, 2026).
Why the CLI wins on cost#
The schema preload disappears. The gh CLI is a single executable the agent invokes as a shell command. There's no 42K-token tool definition loaded into context, just one short command and its output. For well-scoped, repeatable operations (clone, list issues, view a PR), this is the simplest possible fix.
Where the CLI loses#
Composition. When the agent needs to chain 5 or more GitHub operations into one reasoning step, per-call shell overhead and output parsing start to dominate. The token-per-operation cost stays low, but the round-trip count climbs. MCP's strength is exactly this multi-step composition.
Community benchmarks suggest the break-even sits around 4 composed operations per agent turn. Below that, the CLI wins on cost and latency. Above that, MCP's batching advantage starts to matter. The exact crossover depends on workload shape and which other MCPs are loaded.
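A toy model makes the crossover visible. The per-operation token costs below are illustrative assumptions, not benchmark figures; they're chosen so the break-even lands near the community estimate of 4 composed operations:

```python
# Illustrative per-turn token costs. These are modeling assumptions,
# not measured numbers; tune them to your own workload.
MCP_SCHEMA_PRELOAD = 4_200  # allow-listed schema, paid once per turn
MCP_PER_OP = 350            # marginal tokens per composed MCP call
CLI_PER_OP = 1_400          # command plus parsed output per shell round trip

def turn_cost(ops: int, use_mcp: bool) -> int:
    """Approximate token cost of one agent turn with `ops` GitHub operations."""
    if use_mcp:
        return MCP_SCHEMA_PRELOAD + MCP_PER_OP * ops
    return CLI_PER_OP * ops

for ops in range(1, 9):
    cli, mcp = turn_cost(ops, False), turn_cost(ops, True)
    print(f"{ops} ops: CLI={cli:>5}  MCP={mcp:>5}")
```

With these assumptions the CLI is cheaper below 4 operations, the two tie at exactly 4, and MCP's amortized preload wins above it, which is the shape the community benchmarks describe.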
The pragmatic split#
Use the GitHub CLI for retrieval and one-shot writes (open PR, view issue, clone). Keep MCP available behind allow-listing for workflows that genuinely need composition. See our MCP vs CLI decision rubric for a workload-by-workload breakdown.
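One way to encode the pragmatic split is a small router in the agent harness. This is a sketch: the operation names, the CLI-friendly set, and the break-even threshold are all assumptions, not part of any real client:

```python
BREAK_EVEN_OPS = 4  # composed ops per turn where MCP starts to win (community estimate)

# One-shot operations that map cleanly onto gh CLI commands.
# Hypothetical internal names, not real MCP tool identifiers.
CLI_FRIENDLY = {"clone_repo", "view_pr", "open_pr", "list_issues", "view_issue"}

def choose_backend(planned_ops: list[str]) -> str:
    """Route a planned batch of GitHub operations to the gh CLI or MCP."""
    if len(planned_ops) >= BREAK_EVEN_OPS:
        return "mcp"  # composition-heavy turn: the schema preload amortizes
    if all(op in CLI_FRIENDLY for op in planned_ops):
        return "cli"  # scoped one-shots: no schema tax at all
    return "mcp"      # long-tail operations only the full server exposes

print(choose_backend(["view_pr"]))                # cli
print(choose_backend(["view_pr", "list_issues",
                      "view_issue", "open_pr",
                      "clone_repo"]))             # mcp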
Fix 4: When does on-demand retrieval via a context layer win?#
It wins whenever the agent is using GitHub MCP for retrieval rather than mutation, which is most of the time. The honest read on most agent transcripts: the GitHub MCP is being asked "what changed", "who reviewed this", "which issue is this PR closing", far more often than it's being asked to create or merge anything. A context layer that pre-aggregates those answers and exposes them on demand removes the need for live GitHub API calls inside the agent loop entirely.
This isn't another MCP server. It's an architectural shift. Instead of giving the agent tools to fetch GitHub state turn by turn, you give it answers to consult about GitHub state. The retrieval cost moves out of the agent's context window and into a separate service. The context tax drops because the schema isn't there to begin with. The same shift fixes the codebase-shape forgetting pattern: agents that re-explain their stack every session are trying to solve a retrieval problem with memory-store tooling.
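A minimal sketch of the inversion, with the context layer faked as an in-process dict (every name here is hypothetical, not a real API): the agent consults pre-aggregated answers instead of carrying tool schemas for live fetches.

```python
# Sketch of the retrieval inversion. In production the store would be a
# separate service that pre-aggregates GitHub state; here it's a dict.
# All identifiers are hypothetical.
PRECOMPUTED_CONTEXT = {
    ("pr", 1423, "reviewers"): ["alice", "bob"],
    ("pr", 1423, "linked_issue"): "#1390",
    ("repo", "main", "last_release"): "v2.8.0",
}

def consult(kind: str, ref, question: str):
    """Answer a retrieval question from the store. No API call, no schema
    in the agent's context window; unknown questions escalate explicitly."""
    return PRECOMPUTED_CONTEXT.get((kind, ref, question),
                                   "unknown; escalate to live fetch")

print(consult("pr", 1423, "reviewers"))
```

The point of the sketch is the shape, not the storage: retrieval questions become lookups against answers computed outside the loop, and only the rare miss pays for a live fetch.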
Andrei Antanovich, Engineering Lead at Waste Logics, described how his team operationalized this pattern:
"The first instruction in every agent project file is: before making any changes, gather context. That pulls from Jira, Confluence, and Slack via Unblocked, because that's where most of our knowledge actually lives, in threaded discussions. I set it up the day it was announced and now I don't even think about it. I just get the relevant information."
Unblocked is the context layer for coding agents. It fetches PR descriptions, commit history, and related issues on demand, and unifies code repos with PRs, Slack, Jira, Notion, and Confluence at the moment of decision, so the agent doesn't pay the 42K context tax just to ask "what happened on this PR last week".
The teams who pick this path generally aren't trying to optimize the GitHub MCP. They've decided the agent's working memory is too valuable to spend on schema, and they want retrieval handled by a service designed for it.
Which fix should you start with?#
Start with Fix 1, then layer Fix 2 on top, before considering 3 or 4. The ROI ranking below assumes you have the official GitHub MCP installed in Claude Code and want measurable token savings this week.
| Fix | Effort | Token savings (typical) | Risk |
| --- | --- | --- | --- |
| 1. Tool-search subagent | Low (config flag) | 40-50% | Low |
| 2. MCP tool allow-listing | Medium (edit mcp.json) | 60-80% of schema | Low |
| 3. Replace with gh CLI | Medium (rewrite prompts) | 90%+ for scoped ops | Medium (loses cross-tool reasoning) |
| 4. Context layer (on-demand retrieval) | High (architectural) | Variable, often largest | Medium (requires new service) |
Stacking compounds#
Fix 1 and Fix 2 compound cleanly. The subagent gates tool loading at runtime; allow-listing shrinks the surface it gates over. Together they routinely deliver 90%+ reduction on GitHub MCP context tax without touching prompts. Verify the drop on the Claude Code cost endpoint before and after each change (Anthropic Claude Code docs, 2026).
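The compounding is multiplicative, not additive. A simplified model, which assumes the two reductions are independent (in reality the subagent gates the already-allow-listed set, so treat this as an upper-bound sketch):

```python
baseline = 42_000  # GitHub MCP schema tokens on the main thread

allowlist_keep = 5 / 50          # Fix 2: expose 5 of ~50 tools
subagent_keep = 8_500 / 51_000   # Fix 1: Njenga's measured main-thread ratio

# Simplifying assumption: the two keep-ratios are independent.
stacked = baseline * allowlist_keep * subagent_keep
reduction = 1 - stacked / baseline
print(f"Stacked residual: ~{stacked:,.0f} tokens ({reduction:.0%} reduction)")
```

Even with generous error bars on the independence assumption, the stacked residual sits well under 10% of the baseline, which is where the 90%+ claim comes from.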
Substitution vs. layering#
Fix 3 partially substitutes for Fixes 1 and 2 by removing MCP from scoped operations entirely. Fix 4 is the structural answer: the context tax exists because retrieval lives in the agent loop. Move retrieval out and the tax disappears.
The fix you choose is really a statement about how you treat the agent's context window. Fixes 1 and 2 say "keep MCP, pay less." Fix 3 says "use the right tool per workload." Fix 4 says "stop paying schema rent for retrieval at all."
For the underlying budget math, the P8 token-budget autopsy walks through the full accounting. For why MCP alone doesn't solve context loss, see MCP isn't a context strategy.
FAQ#
How many tokens does the GitHub MCP server actually use?#
Two independent 2026 measurements bracket it: Nebulagg recorded ~42,000 tokens in tool definitions (dev.to/nebulagg, 2026); Piotr Hajdas measured 55,000 across 93 tool definitions on a slightly later release (dev.to/piotr_hajdas, 2026). Plan for 42K to 55K, paid on every turn until you intervene.
Does Claude Code's tool-search subagent work for the GitHub MCP?#
Yes, and it's the highest-ROI single fix. Joe Njenga's measurement showed main-thread tokens dropping from 51K to 8.5K, reported as a 46.9% overall reduction, on a GitHub-MCP-heavy workload (Joe Njenga, Medium, 2026). The mechanism is documented in the Claude Code sub-agents reference (Anthropic, 2026).
Can I just remove the GitHub MCP and use gh instead?#
For scoped operations, yes. Scalekit's 2026 benchmark put monthly cost at $3.20 for CLI vs $55.20 for MCP at 10,000 ops on Sonnet 4 (Scalekit, 2026). The trade-off: you lose cross-tool composition. Most teams end up using both, with allow-listing on the MCP side. See the MCP vs CLI decision rubric.
Is the official GitHub MCP server the only option?#
No. Several community MCP servers expose smaller subsets of the GitHub API at much lower token cost. The 42K number applies specifically to the official server with its full tool surface. Allow-listing reduces it without switching servers, and is usually the simpler path.
What about the GitHub Copilot extension? Does it have the same problem?#
The Copilot extension is a different surface that doesn't expose MCP-style schema preloads inside the Claude Code agent loop, so it doesn't levy the same context tax. The trade-off is reduced flexibility: you can't compose Copilot operations the way you can with MCP tools. Different tool, different shape, different cost profile.
Choose your fix#
If you take one action this week, enable Claude Code 2.0's tool-search subagent and audit your mcp.json for allow-listing. Together, those two changes typically cut GitHub MCP context tax by 90%+ with no prompt rewrites. Verify the drop on Claude Code's cost endpoint, before and after.
Within the month, pick one workflow (PR review automation is a good candidate) and run it through the gh CLI instead of MCP. Compare cost, latency, and reliability. If it wins on all three, expand the pattern.
Quarterly, ask the bigger question: how much of the GitHub MCP's job is actually retrieval the agent could consult instead of fetch? If the answer is "most of it", evaluate whether on-demand retrieval via a context layer architecture fits your stack. The fixes above buy you breathing room. Moving retrieval out of the loop is the structural answer to the context tax.