Key Takeaways
• The "re-explain your stack" tax is the second-most-cited pain on r/mcp in 2025-2026, after raw token cost.
• Memory MCP servers (Stash, MemPalace, Hindsight, agentmemory) charge a 2-5K-token preload tax and only fix fact forgetting. They miss codebase shape, conventions, and the dynamic facts users never thought to save.
• This is a context source problem, not a memory store problem. Codebase shape and team conventions should be retrieved on demand at the moment of need, not preloaded as memory facts.
"Wasting the first 5 minutes of every session re-explaining your stack." That phrase, pulled almost verbatim from the agentmemory project README and echoed across r/mcp threads through 2025 and 2026, names a pain every Claude Code user has felt by Tuesday morning.
You open a fresh session. You type the same six lines. "We use FastAPI 0.110. Postgres 15. Auth lives in /lib/auth/. Error handling follows pattern X. Tests use pytest with fixtures in /tests/conftest.py. Lint config is in pyproject.toml." Only then does the real prompt start. Tomorrow, you do it again.
This isn't laziness. It's a recurring cost the r/mcp community has been pricing into token bills for over a year, the same "context tax" we audited in last week's MCP token budget autopsy. The fix everyone reaches for first is a memory MCP server. The fix most teams actually need is something else entirely.
This piece does three things:
- Gives the pattern a precise name.
- Evaluates the memory-MCP partial fix and where it stops working.
- Points to the structural fix: treat codebase context as something you retrieve at the moment of need, not something you preload as memory.
Why does your AI coding agent forget your codebase?#
Anthropic's own Claude Code documentation is explicit: session memory does not persist across /clear or new sessions, by design. Every session starts with the same blank slate plus whatever lives in CLAUDE.md and project rules. The community calls it the fresh-chat tax or the re-explain tax. It's the cost of starting over.
The mechanics are simple. A session boundary, whether triggered by /clear, a crashed terminal, or just closing the laptop, resets short-term memory to zero. There's no hidden cache. The next prompt loads only the system prompt, your CLAUDE.md, any MCP tool definitions, and the immediate user input. Everything you told the agent yesterday about your stack is gone.
CLAUDE.md is the de facto memory store, and most teams overfill it. Anthropic's context-engineering essay makes the tradeoff explicit: every token in your project rules is a token spent on every turn, in every session, forever. A 4,000-token CLAUDE.md charges 4,000 tokens against every single tool call the agent makes that day.
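That arithmetic is worth making concrete. A back-of-envelope sketch, assuming the common rough heuristic of ~4 characters per token (real tokenizer counts vary by model):

```python
# Rough daily cost of carrying a static CLAUDE.md in every turn.
# The 4-chars-per-token ratio is a heuristic, not a tokenizer.

def preload_cost(claude_md_chars: int, turns_per_day: int,
                 chars_per_token: int = 4) -> int:
    """Tokens spent per day just re-sending static project rules."""
    tokens = claude_md_chars // chars_per_token
    return tokens * turns_per_day

# A 16,000-character (~4,000-token) CLAUDE.md over 50 turns:
print(preload_cost(16_000, 50))  # 200000 tokens/day on preload alone
```

Two hundred thousand tokens a day before the agent has read a single line of the code it's supposed to edit.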
This hits coding agents harder than chat assistants for a structural reason. Codebases mutate. The auth.ts you wrote about last week may have moved. The pattern you documented last month may now be deprecated. Where a customer-support agent can pin "company refund policy" to memory and have it stay accurate for months, a coding agent's memory goes stale with every merged PR. Static memory is a poor fit for a dynamic artifact, and a codebase is the most dynamic artifact most engineers touch. The cost compounds with context rot past 200K tokens — bigger windows don't fix forgetting; they just delay it.
What do memory MCP servers actually do?#
The memory-MCP category has grown fast through 2025 and 2026. Tools like Stash, MemPalace, Hindsight, and agentmemory all sell the same promise: persist facts across sessions so you stop re-explaining. They deliver on that promise, with caveats. The doobidoo/mcp-memory-service project documents preload costs in the 2,000 to 5,000 token range per session, depending on stored memory volume.
Here is the landscape as it stands in May 2026:
| Server | Stores | Token cost per session | What it fixes |
|---|---|---|---|
| Stash | User-defined facts | ~2-3K | Recall of specific saved items |
| MemPalace | Hierarchical facts | ~2-4K | Structured recall by category |
| Hindsight | Conversation context | ~3-5K | Cross-session continuity |
| agentmemory | Vector + KV store | ~3-5K | Semantic recall of past notes |
| mcp-memory-service | Long-term episodic | ~3-7K | Persistent episodic memory |
What they collectively do well: persist user-stated facts across sessions. If you told the agent "we use Postgres 15" yesterday and it's still true today, a memory MCP will keep that recallable. That's real value for stable preferences, code styles, and one-off corrections.
What they don't fix is everything the user didn't think to save. They don't fix codebase shape, which files exist and how they relate. They don't fix team conventions, the unwritten rules a senior engineer learned in code review three years ago. They don't fix the dozens of micro-facts ("auth uses Redis sessions, not JWT, because of the 2024 incident") that nobody writes down because writing them down is itself the friction we wanted to remove.
Memory MCP servers store user-declared facts and recall them on session start, with preload costs of 2,000 to 5,000 tokens per session per the agentmemory and mcp-memory-service project measurements. They solve fact recall. They don't solve codebase shape, team conventions, or any context the user never thought to save. The same critique that applies to standalone MCP servers applies here: MCP servers alone aren't enough for enterprise context, because the transport protocol doesn't make the architectural decision for you.
What are the two kinds of forgetting?#
There are two distinct forgetting patterns, often conflated in product copy and Reddit threads. Telling them apart is the prerequisite to picking the right fix. Joe Njenga's 2025 essay on the memory paradox names the split: agents forget facts, and agents forget structure. A third pattern, decision forgetting (the agent re-proposing rejected approaches because it never saw the rejection thread), gets its own treatment in the institutional-memory companion to this post. Different problems, different tools.
Fact forgetting is the easy case. "I told it yesterday we use Postgres 15, today it suggests SQLite." A memory MCP fixes this. The fact is small, stable, and the user is willing to declare it once. Memory stores were designed for exactly this shape of forgetting, and they work.
Shape forgetting is the hard case, and it's where the r/mcp pain actually lives. The agent doesn't know that apps/web/lib/auth.ts exists. It doesn't know to follow the pattern in apps/api/utils/error.ts. It doesn't know that the team rejected JWT in favor of Redis sessions back in March. None of that is solvable by storing facts, because the relevant facts number in the thousands and change with every PR.
The category mismatch matters: memory MCPs are sold as a fix for fact forgetting, but most engineers are feeling shape forgetting. Installing a memory MCP and expecting it to fix shape forgetting is like installing a contact manager and expecting it to remember the floor plan of your house.
Why can't memory stores fix shape forgetting?#
Memory stores save the facts users thought to save. Per Scott Spence's 2025 MCP context optimization analysis, a 1,000-file repo contains roughly 10,000 relevant facts, and Mert Köseoğlu's Context Mode measurements at mksg.lu show vector-store recall accuracy degrading visibly past 10,000 entries. The math doesn't work. You can't save what you don't know to save, and you can't recall what the index can't rank.
There's a scaling wall. The doobidoo/mcp-memory-service README documents memory footprints exceeding 500,000 tokens after roughly 50 tool uses on a moderately active project. That's a full Claude context window spent on stored memory, with no room left for the actual code the agent is supposed to be editing. The memory becomes the problem.
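Put numbers on that wall. Assuming the ~10K-tokens-per-tool-use growth rate implied by those figures (an assumption derived from 500K tokens over ~50 tool uses, not a measured constant), the back-of-envelope looks like this:

```python
# How quickly stored memory alone can crowd out the context window,
# at an assumed ~10K tokens of accumulated memory per tool use.

def tool_uses_until_full(context_window: int,
                         tokens_per_tool_use: int = 10_000) -> int:
    """Tool uses before stored memory alone fills the context window."""
    return context_window // tokens_per_tool_use

print(tool_uses_until_full(200_000))  # 20 tool uses to fill a 200K window
```

Twenty tool uses on an active project, and the memory has eaten the window the code was supposed to live in.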
There's also a freshness wall. Shape changes with every PR. Yesterday's memory entry that "auth lives in /lib/auth/" becomes a confidently wrong reference the moment someone refactors the monorepo. A memory MCP will happily serve the stale fact with full confidence, because it has no signal that the underlying code has moved. The result is a regression in agent reliability, not an improvement.
The fundamental mismatch: memory stores treat context as write-once, read-many. Codebases are write-many, read-many, with the write side moving faster than any reasonable cache invalidation strategy. Anthropic's context-engineering essay frames the principle as progressive disclosure: load the minimum, retrieve the rest. Memory stores do the opposite. They preload, then hope.
Memory stores require users to declare facts in advance, but codebases contain roughly 10,000 relevant facts per 1,000-file repo (Scott Spence, 2025), and vector recall accuracy degrades past 10,000 entries (Köseoğlu, 2025). Memory MCPs can hit 500K tokens after 50 tool uses on active projects (doobidoo/mcp-memory-service, 2025). The math forces a different architecture.
What does context-source retrieval look like?#
The structural fix is to invert the question. Instead of asking "what should we save in case the agent needs it later?", ask "what do the codebase and team history say at the moment of decision?". Anthropic's context-engineering blog calls this progressive disclosure. When the agent is about to write auth code, that's the moment to fetch the existing auth pattern, the recent PRs that touched auth, and the team decision threads where the pattern was set.
The framing shift is small but consequential:
- Memory MCPs ask: "did you save this?"
- Context-source retrieval asks: "what do the codebase, PR history, and team conversation say right now?"
The second question can be answered without the user thinking to save anything. It doesn't go stale, because it re-reads the source on each call. It scales, because the codebase indexes itself by being a codebase. And it covers the parts of context (conventions, design rationale, the why behind a pattern) that nobody would have written into a memory store, because nobody knew they'd need to. The repo only shows you the WHAT; on-demand retrieval surfaces the WHY, because code is not the full context.
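The contrast fits in a few lines. Everything below (the SAVED_FACTS list, the REPO_INDEX lookup, both function names) is a hypothetical stand-in, not a real MCP API; the point is where the token cost lands, not the implementation:

```python
# Two shapes of context assembly, side by side. All names here are
# illustrative stand-ins, not a real memory-MCP or retrieval API.

SAVED_FACTS = ["We use Postgres 15", "Tests use pytest"]  # user-declared, may be stale
REPO_INDEX = {"auth": ["apps/web/lib/auth.ts uses Redis sessions"]}  # the live source

def build_context_preload(prompt: str) -> str:
    """Memory-MCP shape: pay the preload cost up front, every session."""
    return "\n".join(SAVED_FACTS) + "\n" + prompt

def build_context_on_demand(task: str, topic: str) -> str:
    """Context-source shape: query the live source at the moment of decision."""
    snippets = REPO_INDEX.get(topic, [])  # re-read on each call, never stale
    return "\n".join(snippets) + "\n" + task
```

The preload version charges every session for every saved fact, relevant or not; the on-demand version charges nothing until a task actually touches auth.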
Justin McCraw, a Software Engineer at The Information, described his setup in a recent customer conversation:
"My setup tells the agent: before you implement anything, go check Unblocked. It has everything, our repos, Notion, Slack, coding standards, and it surfaces things I wouldn't have thought to look for. GitHub Copilot doesn't have any of that organizational context."
Unblocked is built as the context layer for coding agents. It fetches answers on demand instead of preloading memory, and it unifies code repos, PRs, Slack, Jira, Notion, and Confluence at the moment of need. The shift Justin's describing is the same one this article is pointing at: stop trying to remember the codebase. Start retrieving it.
Context-source retrieval inverts the memory model: instead of preloading user-saved facts, the agent queries codebase, PR history, and team conversations at the moment of decision, the progressive disclosure pattern Anthropic's 2025 context-engineering guidance recommends. The cost is paid only when the answer is needed.
Where to look on Monday#
Three concrete moves, all measurable inside a week.
Audit your CLAUDE.md. Read it line by line. For each fact, ask: is this retrievable from code, PRs, or chat? If yes, it's preload tax. You're paying that token cost every turn of every session. Move it out. Anything that lives in git, in a Jira ticket, or in a Slack thread is better fetched on demand than carried in the system prompt.
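A minimal audit script makes the heavy lines easy to spot. It assumes a rough ~4-characters-per-token heuristic rather than a real tokenizer, so treat the numbers as relative weights, not billing figures:

```python
# Estimate each CLAUDE.md line's per-turn token weight, heaviest first.
# The 4-chars-per-token ratio is a rough heuristic, not a tokenizer.

from pathlib import Path

def audit(path: str = "CLAUDE.md",
          chars_per_token: int = 4) -> list[tuple[int, str]]:
    """Return (estimated_tokens, line) pairs sorted heaviest-first."""
    lines = Path(path).read_text().splitlines()
    weighted = [(len(line) // chars_per_token, line)
                for line in lines if line.strip()]
    return sorted(weighted, reverse=True)

if Path("CLAUDE.md").exists():
    for tokens, line in audit()[:10]:  # the ten heaviest lines
        print(f"{tokens:>4} tok/turn  {line[:60]}")
```

Anything heavy that's also retrievable from code, PRs, or chat is a candidate for eviction.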
Measure your memory MCP if you've installed one. Use the three metrics from last week's autopsy: tokens-per-tool-call, percent-of-context-consumed-pre-prompt, and first-try alignment on real tasks. If your memory MCP is costing 3,000 tokens per session and improving first-try alignment by less than ten percent, the ROI isn't there. Remove it and re-measure.
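The three metrics reduce to a few lines of arithmetic. A sketch with illustrative inputs, not any real logging schema:

```python
# The three autopsy metrics, assuming you log token counts per tool
# call and a pass/fail boolean per first attempt. Inputs are invented.

def tokens_per_tool_call(call_token_counts: list[int]) -> float:
    """Average token cost across logged tool calls."""
    return sum(call_token_counts) / len(call_token_counts)

def preload_share(preload_tokens: int, context_window: int) -> float:
    """Percent of the context window consumed before the first prompt."""
    return 100 * preload_tokens / context_window

def first_try_alignment(results: list[bool]) -> float:
    """Share of real tasks the agent got right on the first attempt."""
    return 100 * sum(results) / len(results)

print(preload_share(3_000, 200_000))                   # 1.5 (% of window)
print(first_try_alignment([True, True, False, True]))  # 75.0
```

If the preload share is real but the alignment lift isn't, the memory server is cost without benefit.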
For shape forgetting specifically, run a two-week A/B. Half your sessions, work as you do today with whatever memory setup you have. Half your sessions, route shape questions ("where does auth live?", "what's our error pattern?") to on-demand retrieval against your actual repos and team systems. Track first-try alignment as the success metric. The gap, if there is one, will tell you whether your team's pain is fact forgetting or shape forgetting. Most teams find out it's the second one.
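Scoring the A/B needs nothing fancier than a tally. A sketch, assuming you record one boolean per task (did the first attempt land?); the sample data here is invented:

```python
# Compare first-try alignment across the two arms of the A/B.
# One boolean per task; the sample values below are illustrative.

from statistics import mean

memory_arm   = [True, False, True, False, True]  # sessions with the memory MCP
retrieve_arm = [True, True, True, False, True]   # sessions with on-demand retrieval

gap = 100 * (mean(retrieve_arm) - mean(memory_arm))
print(f"first-try alignment gap: {gap:.0f} points")  # positive favors retrieval
```

A persistent positive gap is the signature of shape forgetting; a flat one says your pain really was fact forgetting all along.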
FAQ#
Why does Claude Code forget my codebase between sessions?#
By design. Anthropic's Claude Code documentation confirms session memory resets at every /clear and every new session. The only persistent state is CLAUDE.md and project rules, which charge a token cost on every turn. The community calls this the re-explain tax, and it's the second-most-cited pain on r/mcp in 2025-2026 after raw token cost.
Do memory MCP servers like Stash or Hindsight fix the forgetting?#
Partially. Memory MCPs solve fact forgetting, the case where you declared something yesterday and want it remembered today. They cost 2,000 to 5,000 tokens per session in preload, per the doobidoo/mcp-memory-service measurements, and they don't solve shape forgetting (where files live, what patterns the codebase actually uses). Most r/mcp pain is shape forgetting.
What's the difference between memory and context for AI agents?#
Memory is what the agent saved earlier and recalls later: static, declared, write-once. Context is what the agent needs right now to make a good decision: dynamic, retrieved, recomputed on each call. Anthropic's context-engineering essay frames the distinction as preload versus progressive disclosure. Codebase shape belongs in the second category, not the first.
How do I make my agent remember my team's conventions?#
Don't try to make it remember. Make it retrieve. Team conventions live in code review comments, in design docs, in Slack threads where the senior engineer pushed back on the pattern. A memory store can't catalog that surface area. Index those systems and let the agent fetch the relevant convention at the moment it's writing code in that area. The convention stays fresh because the source stays fresh.
Is CLAUDE.md a good place to store memory?#
For stable, high-signal facts (test framework, language version, deploy command), yes. For codebase shape, file locations, evolving patterns, no. Every line of CLAUDE.md costs tokens on every turn, per Anthropic's own context-engineering guidance. If a fact is retrievable from code or git history, move it out and pay the cost on demand instead.