Codex Context Window: How It Works and How to Manage It (2026)

GPT-5.5 lists 1,050,000 tokens, but Codex caps the window at 400,000 and only ~258,000 is usable. Size isn't the lever; curation is.

Dennis Pilarinos | Jul 3, 2026 | Engineering Insights | Context Engineering

Key Takeaways

• The advertised 1,050,000-token figure is GPT-5.5's API ceiling, not your Codex budget; Codex caps the window at 400,000 and roughly 258,000 is usable in a live session.

• Size is not the lever. Accuracy degrades as the window fills, a pattern researchers call context rot.

• The system prompt, tool definitions, AGENTS.md files (32 KiB cap), and file reads spend the budget before real work starts.

• Manage it with /status, /compact, /clear, /new, and auto-compaction, then curate what you load instead of stuffing the window.

OpenAI lists GPT-5.5's context window at 1,050,000 tokens, with a maximum output of 128,000 (OpenAI GPT-5.5 model page, 2026). Inside Codex you get 400,000 of that, and a live session reports only about 258,000 as usable. Even those tokens don't all work equally well. The number on the box isn't the capacity you get. And the capacity isn't the real constraint anyway. Three numbers run through this guide: 1,050,000 advertised, 400,000 in Codex, and roughly 258,000 you can actually fill. Below, we cover how the Codex context window works, why the usable budget is so much smaller than the headline, and how to manage it day to day. Feed the agent only the context that matters, because a clean window usually beats a full one.

How big is the Codex context window, really?

GPT-5.5's API context window is 1,050,000 tokens with a maximum output of 128,000 (OpenAI GPT-5.5 model page, 2026). Inside Codex that drops to a 400,000-token cap, and a live session shows roughly 258,400 usable, about a quarter of the advertised million. That gap is what trips people up.

The arithmetic is simple. The model catalog splits the Codex window into 272,000 input tokens plus 128,000 reserved for output, which sums to the 400,000 cap, and the CLI keeps about 5% headroom, so 272,000 times 0.95 lands near 258,400 (openai/codex #19319, 2026). One dated note, because models move fast: as of June 2026 the models behind Codex are GPT-5.5 (launched April 23, 2026) and the coding-tuned GPT-5.3-Codex, while gpt-5-codex reaches end of life on July 23, 2026. The window number keeps changing. The management discipline doesn't.

Number	What it is	Tokens
Advertised	GPT-5.5 API context window	1,050,000
In Codex	Window cap (272K input + 128K output)	400,000
Usable	Effective input budget (272K x ~0.95)	~258,400

Sources: OpenAI GPT-5.5 model page, 2026; openai/codex #19319, 2026.
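
For readers who want to check the arithmetic, here is a minimal sketch that reproduces the three numbers. The constants come straight from the figures quoted above; the function names are ours, and the 5% headroom is an approximation, so treat the result as an estimate rather than a value the CLI guarantees.

```python
# Figures quoted in this article (openai/codex #19319, 2026).
ADVERTISED_API_WINDOW = 1_050_000  # GPT-5.5 API context window
CODEX_INPUT_TOKENS = 272_000       # input share of the Codex window
CODEX_OUTPUT_TOKENS = 128_000      # reserved for the model's output
HEADROOM_PERCENT = 5               # approximate CLI headroom (assumption: exactly 5)


def codex_window_cap() -> int:
    """Input plus reserved output gives the Codex window cap."""
    return CODEX_INPUT_TOKENS + CODEX_OUTPUT_TOKENS


def usable_input_budget(headroom_percent: int = HEADROOM_PERCENT) -> int:
    """Input tokens left after the headroom is held back (integer math)."""
    return CODEX_INPUT_TOKENS * (100 - headroom_percent) // 100


if __name__ == "__main__":
    usable = usable_input_budget()
    print(f"Codex cap:     {codex_window_cap():,}")
    print(f"Usable input:  {usable:,}")
    print(f"Share of advertised window: {usable / ADVERTISED_API_WINDOW:.1%}")
```

Changing `headroom_percent` shows how sensitive the usable figure is to that one approximate input.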

Why is the usable window smaller than the advertised one?

The gap is by design, not a defect, though it has generated its share of confused bug reports. The Codex model catalog reserves 128,000 of the 400,000-token window for output, leaving 272,000 for input, and the CLI trims about 5% as headroom (openai/codex #19319, 2026). So roughly 258,000 input tokens is the real budget. That reserved output slice is the room the model needs to write its answer, which is why you cannot reclaim it.

There is also a price edge at the top of the window. GPT-5.5 prompts above 272,000 input tokens are billed at 2x the input rate and 1.5x the output rate (OpenAI GPT-5.5 model page, 2026). Standard rates run $5 per million input tokens, $0.50 per million cached input tokens, and $30 per million output tokens (OpenAI Codex pricing, 2026). The scarce top of the window is also the expensive part. So there are two reasons to treat the Codex context budget as something you defend rather than fill: you run out of room, and you pay a premium for the last stretch.
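
To make the price edge concrete, here is a rough cost sketch using the rates quoted above. One assumption is ours, not something the pricing page is quoted as saying here: the multipliers are applied to the whole request once input passes 272,000 tokens, not only to the tokens beyond it. Cached-input pricing is left out, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

INPUT_RATE = Decimal("5")        # USD per million input tokens
OUTPUT_RATE = Decimal("30")      # USD per million output tokens
SURCHARGE_THRESHOLD = 272_000    # input tokens above this trigger the premium
INPUT_MULTIPLIER = Decimal("2")
OUTPUT_MULTIPLIER = Decimal("1.5")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request, ignoring cached-input discounts.

    Assumption: past the threshold, the multipliers apply to every token
    in the request, not just the tokens beyond the threshold.
    """
    over = input_tokens > SURCHARGE_THRESHOLD
    input_rate = INPUT_RATE * (INPUT_MULTIPLIER if over else 1)
    output_rate = OUTPUT_RATE * (OUTPUT_MULTIPLIER if over else 1)
    return (input_tokens * input_rate + output_tokens * output_rate) / MILLION


if __name__ == "__main__":
    for tokens in (200_000, 300_000):
        print(f"{tokens:,} input tokens: ${estimate_cost(tokens, 10_000)}")
```

Under that assumption, a prompt just past the threshold costs more than twice as much as one just under it, which is the practical argument for staying below the line.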

Does a bigger context window make Codex smarter?

No. Chroma Research tested 18 frontier models in July 2025 and found every one degraded as input length grew; focused prompts of about 300 tokens beat full prompts near 113,000 tokens on LongMemEval (Chroma Research, 2025). A bigger window buys more room, not more accuracy.

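The pattern holds across studies. "Hidden in the Haystack" found that smaller gold contexts, meaning shorter relevant passages surrounded by the same amount of other material, degrade performance and amplify positional sensitivity across eleven leading language models (arXiv 2505.18148, 2025). Stuffing the window pushes you toward that regime: the part that matters becomes a smaller share of what the model reads. Even inside your ~258,000-token budget, then, it quietly lowers the quality of the answer.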

Adoption is outrunning trust. Codex passed 5 million weekly active users by June 2026, up from around 600,000 at the start of the year (Constellation Research, 2026), yet only about 33% of developers trust the accuracy of AI tools, down from 43% the year before (Stack Overflow Developer Survey, 2025). What closes that gap is context discipline, not window size. We dig into why a full window lowers accuracy in a companion piece.

Where does the Codex context window actually go?

Before you type a task, the window is already paying rent. The system prompt, the tool definitions, and your AGENTS.md files all load first; AGENTS.md is read root to leaf, with nearer files overriding, and is skipped once the combined size hits project_doc_max_bytes, 32 KiB by default (OpenAI AGENTS.md guide, 2026). By the time work starts, a real chunk of the ~258,000 is gone.
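
A quick way to see whether your AGENTS.md chain is near the cap is to add up the files yourself. The sketch below walks from a repo root down to a working directory and totals the sizes in load order. It is a simplification under stated assumptions: it looks only for files named AGENTS.md, it treats a file as skipped when adding it would push the total past the cap, and the helper names are ours. Codex's actual discovery and cutoff rules may differ in detail.

```python
from pathlib import Path

DEFAULT_MAX_BYTES = 32 * 1024  # project_doc_max_bytes default cited above


def agents_files(root: Path, leaf: Path) -> list[Path]:
    """Return AGENTS.md files from root down to leaf, in load order."""
    root, leaf = root.resolve(), leaf.resolve()
    dirs = [root]
    # relative_to raises ValueError if leaf is not inside root.
    for part in leaf.relative_to(root).parts:
        dirs.append(dirs[-1] / part)
    return [d / "AGENTS.md" for d in dirs if (d / "AGENTS.md").is_file()]


def plan_load(files: list[Path], max_bytes: int = DEFAULT_MAX_BYTES):
    """Return (path, size, loaded) tuples under the simplified cutoff rule."""
    plan, total, capped = [], 0, False
    for path in files:
        size = path.stat().st_size
        if not capped and total + size <= max_bytes:
            total += size
            plan.append((path, size, True))
        else:
            capped = True  # assumption: nothing loads after the first skip
            plan.append((path, size, False))
    return plan


if __name__ == "__main__":
    here = Path.cwd()
    for path, size, loaded in plan_load(agents_files(here, here)):
        print(f"{'loaded ' if loaded else 'skipped'} {size:>7,} B  {path}")
```

Pass your repository root as `root` and the directory you launch Codex from as `leaf` to check a nested project.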

In our own sessions, two line items do most of the damage: every connected MCP tool pays for its schema each turn, and file reads pile up fast as the agent explores. The fix is unglamorous. Keep AGENTS.md scoped and under the cap, and prune tool definitions you never call. That hygiene buys back usable window before you spend a token on the actual task, which is exactly the tool-definition tax we autopsied server by server.

Line item	Loaded when	Effect on budget
System prompt	Every session	Fixed overhead
Tool definitions	Each turn, per connected tool	Grows with tool count
AGENTS.md files	Session start, root to leaf	Skipped past 32 KiB
File reads	As the agent explores	Dominates ongoing usage

Source: OpenAI AGENTS.md guide, 2026.
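
To get a feel for the tool-definition line item, you can estimate what each schema costs before connecting it. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes roughly four characters per token instead of using a real tokenizer, and the tool definitions in the example are made up for illustration.

```python
import json

CHARS_PER_TOKEN = 4        # rough heuristic (assumption), not a real tokenizer
USABLE_BUDGET = 258_400    # usable input budget from earlier in this article


def estimate_tokens(text: str) -> int:
    """Approximate token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def schema_overhead(tools: list[dict]) -> dict[str, int]:
    """Estimate per-tool token cost from each compact JSON schema."""
    return {
        tool["name"]: estimate_tokens(json.dumps(tool, separators=(",", ":")))
        for tool in tools
    }


if __name__ == "__main__":
    # Hypothetical tool definitions, for illustration only.
    tools = [
        {"name": "search_issues", "description": "Search the issue tracker.",
         "parameters": {"query": {"type": "string"}}},
        {"name": "read_wiki", "description": "Fetch a wiki page by title.",
         "parameters": {"title": {"type": "string"}}},
    ]
    costs = schema_overhead(tools)
    total = sum(costs.values())
    for name, cost in costs.items():
        print(f"{name}: ~{cost} tokens")
    print(f"total: ~{total} tokens ({total / USABLE_BUDGET:.3%} of usable budget)")
```

Real schemas are usually far longer than these two, so the useful output is the ranking: it shows which tools to prune first.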

How do you manage the Codex context window day to day?

Per OpenAI's documentation, Codex ships built-in controls for managing the window, and most users touch only two of them (OpenAI Codex docs, 2026). The levers below map to a single command each, so the Codex CLI context budget stays visible and recoverable instead of silently filling until quality drops.

/status        see the active model, token usage, and writable roots
/compact       summarize the conversation to reclaim tokens, keeping key details
/clear         wipe the conversation to start clean
/new           start a fresh session in the same repo
/statusline    surface a live context and token counter in the footer

Two more levers sit outside the command line. Auto-compaction fires automatically near the limit and is tunable through the compaction threshold in config. AGENTS.md hygiene, keeping the file under its 32 KiB cap and pruning unused tools, protects the budget at the source (OpenAI AGENTS.md guide, 2026). The practical rule: start fresh per task, compact before a long detour, and curate what you load.
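
The practical rule can be written down as a small decision helper. This is an illustration of the habit, not Codex's auto-compaction logic: the 80% trigger is an arbitrary assumption of ours, the function name is hypothetical, and you would read `used_tokens` off /status yourself.

```python
USABLE_BUDGET = 258_400  # usable input budget from earlier in this article


def suggest_action(
    used_tokens: int,
    starting_new_task: bool,
    compact_at: float = 0.8,      # assumed trigger, not a Codex default
    budget: int = USABLE_BUDGET,
) -> str:
    """Map current usage to the command the practical rule points to."""
    if starting_new_task:
        return "/new"        # start fresh per task
    if used_tokens >= compact_at * budget:
        return "/compact"    # reclaim tokens before quality drops
    return "continue"


if __name__ == "__main__":
    print(suggest_action(50_000, starting_new_task=False))
    print(suggest_action(230_000, starting_new_task=False))
    print(suggest_action(230_000, starting_new_task=True))
```

Lowering `compact_at` makes the helper more conservative, which fits long sessions where accuracy matters more than continuity.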

What actually fixes the context problem, and where can't Codex help?

The window controls slow the bleed; they don't supply the right context. Codex still can't tell which doc is current, why the code is shaped the way it is, or what your team already tried and rejected. None of that lives in the files it reads, which is why curation beats raw token count every time, the same lesson Chroma's degradation data points to (Chroma Research, 2025).

The durable fix is retrieval: pull only the slice that matters instead of loading the whole window. That is what context engineering is for, and one engineer described his setup plainly:

"My setup tells the agent: before you implement anything, go check Unblocked. It has everything — our repos, Notion, Slack, coding standards — and it surfaces things I wouldn't have thought to look for. GitHub Copilot doesn't have any of that organizational context."

— Justin McCraw, Software Engineer, The Information

A context engine like Unblocked does exactly this: it feeds the agent the relevant, reconciled slice from across your repos, docs, and chat instead of asking the window to hold everything. That habit, feeding the agent the right context on demand, is the lever the window size never was. It is also why "bigger window" lands as myth number three and why context engineering is the discipline underneath it.
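
The idea of pulling only the slice that matters can be shown with a toy example. The sketch below ranks snippets by keyword overlap with the task and packs the best ones into a fixed token budget. It is a deliberately naive stand-in to illustrate the shape of retrieval, not how Unblocked or any production context engine works; the scoring method, the four-characters-per-token estimate, and every name in it are our assumptions.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_context(
    query: str,
    snippets: dict[str, str],
    budget_tokens: int,
    chars_per_token: int = 4,  # rough heuristic, not a real tokenizer
) -> list[str]:
    """Pick the most query-relevant snippet names that fit the budget."""
    query_words = words(query)
    ranked = sorted(
        ((len(query_words & words(body)), name) for name, body in snippets.items()),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, then by name
    )
    chosen, used = [], 0
    for score, name in ranked:
        cost = -(-len(snippets[name]) // chars_per_token)  # ceiling division
        if score == 0 or used + cost > budget_tokens:
            continue  # skip irrelevant snippets and ones that do not fit
        chosen.append(name)
        used += cost
    return chosen


if __name__ == "__main__":
    docs = {
        "auth.md": "token refresh flow for auth service",
        "billing.md": "invoice rounding rules",
        "old.md": "auth",
    }
    print(select_context("fix auth token refresh", docs, budget_tokens=100))
```

Even this crude version drops the billing note entirely, which is the point: what stays out of the window matters as much as what goes in.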

Frequently asked questions about the Codex context window

How big is the Codex context window?

GPT-5.5's API window is 1,050,000 tokens, but Codex caps it at 400,000 and only about 258,000 is usable in a live session (OpenAI GPT-5.5 model page, 2026). The catalog reserves 128,000 of that cap for output, so input is the part you actually fill, and even that fills with overhead before your first prompt.

Why does Codex show ~258K when OpenAI advertises 1M?

The 400,000-token Codex window splits into 272,000 input plus 128,000 reserved output, and the CLI keeps about 5% headroom, so 272,000 times 0.95 leaves roughly 258,000 input tokens to fill (openai/codex #19319, 2026). The 1,050,000 figure is GPT-5.5's API ceiling, not your Codex budget.

How do I free up context in Codex?

Use /compact to summarize the conversation and keep going, or /clear and /new to reset for a fresh task; auto-compaction also kicks in near the limit (OpenAI Codex docs, 2026). Run /status first to see where your tokens went before deciding which control to reach for.

Does a bigger context window cost more?

Yes. GPT-5.5 prompts above 272,000 input tokens are billed at 2x input and 1.5x output, on top of standard rates of $5 per million input and $30 per million output (OpenAI GPT-5.5 model page, 2026; OpenAI Codex pricing, 2026). So the top of the window is both scarce and pricier.

Is GPT-5.5 or GPT-5.3-Codex better for long sessions?

GPT-5.3-Codex is the coding-optimized model, so it is a sensible default for long agent runs, while GPT-5.5 is the general flagship (OpenAI GPT-5.5 model page, 2026). Either way, curating context beats filling the window, since accuracy degrades with input length regardless of model.

Spending the Window Wisely

The Codex context window is smaller and more expensive at the top than the headline suggests, and accuracy drops before you ever fill it. That changes the job. Stop chasing a bigger budget and start defending the one you have: check /status, compact before long detours, clear between tasks, keep AGENTS.md lean, and prune tools you don't use. Those habits hold steady even as the model behind Codex changes again next quarter.

The deeper win is upstream of any command. A context layer decides what's worth a token, surfacing the current doc and the reason the code looks the way it does, so the agent starts with signal instead of noise. If you also work in Anthropic's tooling, the same logic governs the Claude Code context window. Open /status on your next Codex session and watch how fast the window fills; that is where the real savings start.