Claude Code's Context Window: How It Works and How to Manage It (2026)

Claude Code's context window is 200,000 tokens by default, 1M on Opus 4.6 and later. Size isn't the lever. See where it goes and how to manage it.

Key Takeaways

Claude Code's context window is 200,000 tokens by default; Opus 4.6 and later, plus Sonnet 4.6, can run a 1-million-token window.

A bigger window is not a more usable one: at 1M tokens on the 8-needle MRCR v2 benchmark, Opus 4.6 scores 76% while Sonnet 4.5 manages 18.5%.

Your 200K fills before you start. System prompt, CLAUDE.md, and one MCP server (around 42K tokens) can eat a fifth of the window.

Run /context to see exactly where your tokens go and how much autocompact buffer is reserved.

Use /compact to summarize and keep going on the same task; use /clear to wipe and start a new one.

The 1M window is opt-in on Opus 4.6 and later, plus Sonnet 4.6, but benchmark retrieval at 1M ranges from 76% down to 18.5% depending on the model, which makes a clean 200K the better default.

The Claude Code context window is 200,000 tokens by default, roughly 500 pages of text, and on Opus 4.6 and later, plus Sonnet 4.6, you can switch it to a 1-million-token window (Anthropic Claude Code docs, 2026). That is the headline number every page on this topic quotes. Size is the least interesting part. A single MCP server can burn roughly 20% of your 200K before you type a prompt, and a 1M context is only as good as the model's ability to read it: on Anthropic's own benchmark, that ranges from 76% down to 18.5% (Anthropic, 2026). This post covers what the window is, what fills it, how the 1M option actually works and costs, and the four commands that manage the window day to day.

One framing runs through all of it: curated retrieval over context-dumping. Surface the relevant slice instead of paying to stuff the window, because context degrades as the window fills.

How big is the Claude Code context window?

Claude Code's context window is 200,000 tokens by default, about 500 pages of text, and the 1-million-token option is limited to Opus 4.6 and later, plus Sonnet 4.6 (Anthropic Claude Code docs, 2026). Everything else, including Sonnet 4.5, stays at 200K (Anthropic build-with-claude docs, 2026). The window is a finite attention budget, not free storage.

The number matters less than people expect. A 200K window you keep clean beats a 1M window you let fill with noise. There's also a hard edge to know about: on 4.5 and later models, if your input plus max_tokens exceeds the window, the API returns a stop_reason of model_context_window_exceeded rather than silently truncating (Anthropic build-with-claude docs, 2026).
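A minimal sketch of that hard edge, assuming the response is available as a plain dict with a `stop_reason` key. The function names are hypothetical and this is not an SDK call; it only illustrates the check described above.

```python
CONTEXT_WINDOW = 200_000  # default window size quoted in this article


def fits_window(input_tokens: int, max_tokens: int,
                window: int = CONTEXT_WINDOW) -> bool:
    """True when the input plus the reserved output budget fits the window."""
    return input_tokens + max_tokens <= window


def hit_window_limit(response: dict) -> bool:
    """True when a response reports the window-exceeded stop reason."""
    return response.get("stop_reason") == "model_context_window_exceeded"
```

Checking `fits_window` before sending a request is cheaper than discovering the limit from the stop reason afterwards.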

Per Anthropic's documentation, the default Claude Code context window is 200,000 tokens, with a 1-million-token option on Opus 4.6 and later plus Sonnet 4.6; all other models, including Sonnet 4.5, remain capped at 200K (Anthropic, 2026).

| Model | Context window |
| --- | --- |
| Opus 4.6 and later | 1M |
| Sonnet 4.6 | 1M |
| Sonnet 4.5 | 200K |
| All other models | 200K |

Source: Anthropic Claude Code docs, 2026.

What fills the context window before you even start?

A fresh session is already partly spent. Per Anthropic's docs, the system prompt runs around 4,200 tokens, a project CLAUDE.md around 1,800, a global CLAUDE.md around 320, and auto-memory loads the first 200 lines or 25KB of MEMORY.md (Anthropic Claude Code docs, 2026). None of that is your prompt. It's overhead the window pays before work begins.
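To make the auto-memory limit concrete, here is a rough sketch of a "first 200 lines or 25KB" cut. It assumes whichever limit is reached first wins, that 25KB means 25 * 1024 bytes, and that the cut falls on a line boundary. None of those details are confirmed by the docs cited here, and `auto_memory_slice` is a hypothetical helper, not Claude Code's implementation.

```python
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # assumption: "25KB" read as 25 * 1024 bytes


def auto_memory_slice(text: str, max_lines: int = MAX_LINES,
                      max_bytes: int = MAX_BYTES) -> str:
    """Keep leading lines until either the line or the byte limit is hit."""
    kept = []
    used = 0
    for line in text.splitlines(keepends=True)[:max_lines]:
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break  # the next line would cross the byte limit
        kept.append(line)
        used += size
    return "".join(kept)
```

Whatever the exact cut rule, the practical point stands: put the notes that matter most at the top of MEMORY.md.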

MCP tools are cheaper than people assume by default: they load tool names only, around 120 tokens, with schemas deferred via tool search. Setting ENABLE_TOOL_SEARCH=auto loads schemas upfront only when they fit under 10% of the window. Then file reads take over. Each read runs 1,100 to 2,400 tokens, and the docs are blunt that file reads dominate ongoing context usage (Anthropic Claude Code docs, 2026).
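The 10% rule is easy to check by hand. The sketch below encodes it as described in this article; `schemas_load_upfront` is a hypothetical name and the function is an illustration of the rule, not Claude Code's actual logic.

```python
def schemas_load_upfront(schema_tokens: int, window: int = 200_000,
                         limit: float = 0.10) -> bool:
    """True when tool schemas fit under the given share of the window."""
    return schema_tokens < limit * window
```

On a 200K window the cutoff is 20,000 tokens, so a 15,000-token schema set would load upfront while a 42,000-token one would stay deferred.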

The line items you can see, your prompts and your file reads, are not the ones that hit your budget first. Fixed overhead and the schemas of any fully enabled MCP server are spent before your first request, which is why curated context beats stuffing the window.

A fresh Claude Code session spends thousands of tokens before any prompt: roughly 4,200 on the system prompt and 1,800 on a project CLAUDE.md, while file reads, at 1,100 to 2,400 tokens each, dominate ongoing usage (Anthropic Claude Code docs, 2026).

Where a fresh 200K window goes

| Line item | Approx. tokens | Notes |
| --- | --- | --- |
| System prompt | ~4,200 | Fixed overhead, every session |
| Project CLAUDE.md | ~1,800 | Loaded at session start |
| Global CLAUDE.md | ~320 | User-level rules |
| MCP tool definitions (full server enabled) | ~42,000 | Around 21% of 200K; deferred by default |
| File reads | 1,100-2,400 each | Ongoing; dominates usage |

Figures per Anthropic Claude Code docs, 2026; the MCP figure is Unblocked's own measurement, 2026.
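Adding up the table gives a feel for the starting position. This sketch uses only the approximate figures above and ignores MEMORY.md, your prompts, and the autocompact buffer, so treat the results as rough bounds rather than what /context will report.

```python
WINDOW = 200_000
FIXED_OVERHEAD = {
    "system prompt": 4_200,
    "project CLAUDE.md": 1_800,
    "global CLAUDE.md": 320,
}
MCP_FULL_SERVER = 42_000  # Unblocked's measurement for one full server


def overhead(with_mcp: bool) -> int:
    """Tokens spent before the first prompt."""
    return sum(FIXED_OVERHEAD.values()) + (MCP_FULL_SERVER if with_mcp else 0)


def share_of_window(tokens: int) -> float:
    """Fraction of the 200K window that a token count occupies."""
    return tokens / WINDOW


def reads_remaining(with_mcp: bool, tokens_per_read: int = 2_400) -> int:
    """Whole file reads that fit in what is left, at a given cost per read."""
    return (WINDOW - overhead(with_mcp)) // tokens_per_read
```

With these figures, fixed overhead alone is 6,320 tokens (about 3% of the window). Adding a full MCP server brings it to 48,320 (about 24%), which cuts the number of worst-case 2,400-token file reads that fit from 80 to 63.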

Why does a single MCP server cost you 42,000 tokens?

Enabling a full MCP server's schemas is the single largest controllable line item in your window. In our own measurement, GitHub's MCP server injects roughly 42,000 tokens of tool definitions before your first prompt, about 21% of a 200K window (Unblocked, 2026).

Default loading avoids this. Claude Code loads tool names only and defers schemas through tool search, so the tax appears when you enable a whole server's schemas, not when you connect it. The fix is mundane: enable only the tools a given task needs, and rely on deferred tool search for the rest. Most teams never measure it, so we walk through the full MCP token-tax breakdown server by server.

A single MCP server can be the largest controllable cost in a Claude Code window. GitHub's MCP server injects roughly 42,000 tokens of tool definitions, about 21% of a 200K window, before the first prompt (Unblocked, 2026), and trimming that tax is one of the fastest ways to reduce AI token costs overall.

Is a 1-million-token context window actually usable?

Size and usable size are different numbers. On Anthropic's 8-needle MRCR v2 benchmark at 1M tokens, Opus 4.6 scores 76% while Sonnet 4.5 scores 18.5%, the same window read four times less reliably (Anthropic, 2026). Turning on 1M does not give you 1M usable tokens. It gives you a bigger budget the model reads with model-dependent reliability.

This isn't an Anthropic-only quirk. Chroma Research tested 18 frontier models, including Claude Opus 4, and found every one of them degrades as input grows, in non-uniform ways (Chroma Research, 2025). Coverage of Opus 4.6 noted the same gap between window size and retrieval, alongside the model's context-compaction work (InfoQ, 2026). The headline "a million tokens" weakens the moment you measure whether the model can retrieve from it. This post focuses on the window itself, while our companion piece covers why accuracy drops as input grows and how retrieval recovers it.

A 1M context window is not 1M usable tokens. On Anthropic's 8-needle MRCR v2 benchmark at 1M, Opus 4.6 retrieves 76% while Sonnet 4.5 manages 18.5%, and Chroma Research found all 18 models it tested degrade as input grows (Anthropic, 2026; Chroma Research, 2025).

How do you turn on the 1M window, and what does it cost?

Cost used to be the catch. Until early 2026, requests above 200K tokens billed at a premium long-context tier, roughly $10 and $37.50 per million input and output tokens against the standard $5 and $25. Anthropic removed that surcharge in March 2026, so the full 1M window now bills at standard per-token rates (The New Stack, 2026; Anthropic pricing docs, 2026). That settles the price question, but not the usability one: a model that reads a 1M window at 18.5% is the real reason to reach for it rarely, not by default.
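For a sense of scale, this sketch compares one request under the old surcharge and under standard pricing, using the approximate per-million rates quoted above. It assumes the premium applied to the whole request once input passed 200K tokens. The function name and the example request size are illustrative only.

```python
STANDARD = (5.00, 25.00)       # USD per million input / output tokens
LONG_CONTEXT = (10.00, 37.50)  # approximate premium tier, now removed
THRESHOLD = 200_000            # input size above which the premium applied


def request_cost(input_tokens: int, output_tokens: int,
                 surcharge: bool) -> float:
    """Cost in USD for one request, with or without the old surcharge."""
    premium = surcharge and input_tokens > THRESHOLD
    in_rate, out_rate = LONG_CONTEXT if premium else STANDARD
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Under these assumptions, a request with 500,000 input tokens and 10,000 output tokens would have cost about $5.38 with the surcharge and costs $2.75 at standard rates.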

Turning it on is simple. In Claude Code, select a [1m] model variant on a supported model. On the API, you pass the 1M beta header, and 1M support extends across Anthropic's API, Bedrock, and Vertex (Anthropic build-with-claude docs, 2026). Anthropic first shipped the 1M window for Sonnet in 2025 before extending it (Anthropic, 2025), and current per-token pricing lives in Anthropic's pricing docs (Anthropic pricing docs, 2026). Reach for 1M on genuinely large single-shot tasks running on Opus 4.6 and later. For everyday work, a managed 200K beats an unmanaged 1M.

What do /context, /compact, /clear, and /usage actually do?

Four commands control the window, and most users only know two. Per Anthropic's docs, /context gives a live breakdown by category plus optimization suggestions and shows the reserved autocompact buffer; /compact replaces the conversation with a structured summary, and you can focus it with /compact focus on X; /clear wipes everything so context returns to 0; and /usage (also /cost or /stats) reports session cost and plan limits (Anthropic Claude Code docs, 2026).

Some models also self-report. Sonnet 4.6, Sonnet 4.5, and Haiku 4.5 receive a live token-budget <system_warning> so the model itself knows how full the window is (Anthropic Claude Code docs, 2026). The table below is the quick reference worth bookmarking.

| Command | What it does | Window effect | Use when |
| --- | --- | --- | --- |
| /context | Live breakdown by category plus suggestions; shows autocompact buffer | None (read-only) | You want to see where tokens went |
| /compact | Replaces the conversation with a structured summary | Large reduction; continuity preserved | Same task, window getting full |
| /clear | Wipes the conversation entirely | Context returns to 0 | Switching to unrelated work |
| /usage | Shows session cost and plan limits | None (read-only) | Checking spend against your plan |

Four slash commands manage the window: /context shows a live breakdown and the autocompact buffer, /compact summarizes the conversation, /clear wipes it to 0, and /usage reports cost and plan limits (Anthropic Claude Code docs, 2026).

What survives a /compact, and what gets lost?

Compaction is not lossless, and knowing the boundary saves you from re-explaining your project. Per Anthropic's docs, a /compact preserves the system prompt, the project-root CLAUDE.md, unscoped rules, and auto-memory. It drops paths:-scoped rules and nested CLAUDE.md files until they are re-read (Anthropic Claude Code docs, 2026). Skills get re-injected but capped at 5,000 tokens per skill and 25,000 total.

The practical rule follows directly. Anything that must persist across a compaction belongs in your project-root CLAUDE.md or in unscoped rules, not in a nested or path-scoped file that compaction drops until re-read. This is one of the more common ways Claude Code seems to forget your codebase mid-session.
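The survival rules fit in a small lookup. The sketch below restates them, plus an upper bound on how many skill tokens can come back after a compaction. The names are hypothetical, and how Claude Code picks which skills fill the 25,000-token total is not covered here, so the function returns only a ceiling.

```python
SURVIVES_COMPACT = {
    "system prompt": True,
    "project-root CLAUDE.md": True,
    "unscoped rules": True,
    "auto-memory": True,
    "paths:-scoped rules": False,  # dropped until re-read
    "nested CLAUDE.md": False,     # dropped until re-read
}

PER_SKILL_CAP = 5_000
TOTAL_SKILL_CAP = 25_000


def max_skill_tokens_after_compact(skill_sizes: list[int]) -> int:
    """Upper bound on re-injected skill tokens under both caps."""
    capped = sum(min(size, PER_SKILL_CAP) for size in skill_sizes)
    return min(capped, TOTAL_SKILL_CAP)
```

For example, an 8,000-token skill and a 2,000-token skill come back as at most 7,000 tokens, and six 6,000-token skills are held to the 25,000 total.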

/compact vs /clear: which should you use?

This is the "clear context" question in Claude Code that most people get wrong. Use /compact for continuity: you're on the same task, you still need the thread's reasoning, but the window is filling. Use /clear for separation: a new, unrelated task where you want a clean slate and a full window. Compact to keep going, clear to start over (Anthropic Claude Code docs, 2026).

There's also auto-compaction. It exists and fires automatically as you approach the limit. The exact trigger threshold is not officially published and has shifted across releases, so don't anchor to a number you read in a forum. Instead, watch the autocompact buffer shown in /context. That reserved buffer is the real signal for how much room you have left before Claude Code summarizes on its own.
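One way to read the buffer is as space you cannot spend. The sketch below assumes auto-compaction fires once usage reaches the window minus the reserved buffer. That reading is a working assumption, not a documented threshold, and the example numbers in the note are made up rather than taken from real /context output.

```python
def room_before_autocompact(window: int, used: int, buffer: int) -> int:
    """Tokens left before usage runs into the reserved autocompact buffer."""
    return max(window - buffer - used, 0)
```

With a 200,000-token window, 120,000 tokens used, and a hypothetical 30,000-token buffer, that leaves 50,000 tokens of working room.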

Use /compact to summarize and continue the same task and /clear to wipe context and start a new one; auto-compaction fires automatically near the limit, but the exact threshold is unpublished, so watch the autocompact buffer in /context (Anthropic Claude Code docs, 2026).

How should you manage the context window day to day?

Managing the Claude Code context window beats buying a bigger one. Here is the decision tree, drawn from the command behavior above (Anthropic Claude Code docs, 2026).

Window filling on the same task? Run /compact, or /compact focus on X.

Switching to unrelated work? Run /clear.

An MCP server eating the window? Disable unused servers and lean on deferred tool search.

A genuinely large single-shot task on Opus 4.6 or later? Switch to the [1m] variant.

Not sure where it all went? Run /context first, always.
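The decision tree above can also be written as a lookup, which some teams may find handy in onboarding notes. The situation labels are hypothetical. The actions are the ones listed in this section, with /context as the fallback when the situation is unclear.

```python
NEXT_STEP = {
    "same_task_window_filling": "/compact",
    "switching_to_unrelated_work": "/clear",
    "mcp_server_eating_window":
        "disable unused MCP servers and rely on deferred tool search",
    "large_single_shot_task_on_opus":
        "switch to the [1m] model variant",
}


def next_step(situation: str) -> str:
    """Return the suggested action, defaulting to /context when unsure."""
    return NEXT_STEP.get(situation, "/context")
```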

The durable fix sits underneath all of it: curated retrieval over context-dumping. Surface the relevant slice on demand instead of paying to stuff the window with everything you might need. Teams who treat the window as a budget to defend, not a vault to fill, spend less time on redo loops. One customer put the before-state plainly:

"Before Unblocked, I was manually compiling documentation into a local folder just so Claude Code could reference it. Now it pulls everything directly. I'm getting 90% accuracy on complex data structure questions."

— Austin Rojan, Onboarding Specialist, Subsplash

That persistence across sessions is what turns a clean window into durable institutional memory.

Frequently asked questions

How do I clear context in Claude Code?

Run /clear to wipe the conversation entirely, which returns context to 0, when you're switching to unrelated work. If you're staying on the same task but the window is filling, use /compact instead to summarize the thread and continue with the reasoning preserved (Anthropic Claude Code docs, 2026).

Does Claude Code have a 1M context window?

Yes, on Opus 4.6 and later, plus Sonnet 4.6: select a [1m] model variant. Sonnet 4.5 and every other model remain at 200,000 tokens. Anthropic removed the earlier above-200K premium in March 2026, so the 1M window now bills at standard per-token rates (Anthropic build-with-claude docs, 2026; The New Stack, 2026).

Why does the context window fill so fast?

Overhead and reads, not your prompts. The Claude Code context window is spent partly before you type. A fresh session already runs roughly 4,200 tokens on the system prompt and 1,800 on a project CLAUDE.md, and file reads run 1,100 to 2,400 tokens each. A full MCP server can add around 42,000 more (Anthropic Claude Code docs, 2026; Unblocked, 2026).

What is Claude Code's context window size?

The default context window size is 200,000 tokens, roughly 500 pages of text. On supported models, Opus 4.6 and later plus Sonnet 4.6, you can switch to a 1,000,000-token window (Anthropic Claude Code docs, 2026).

Managing the Window in Practice

The team that wins isn't the one that switched on 1M. It's the one that keeps 200K clean. Size is the least important lever you have. Management is the real one: see the window with /context, shape it with /compact and /clear, prune the MCP servers you don't need, and stop paying for context you never read. Treat the Claude Code context window as a budget to defend, and a 1M budget the model reads at 18.5% stops looking like an upgrade over a tidy 200K.

What makes the difference over a quarter of daily work is the habit underneath the commands: feed the agent the WHY behind the code instead of dumping every file and hoping the model finds the signal. Run /context on your next session to see where your tokens go, then read how context degrades and how retrieval recovers it.