The Context Layer: What It Is and Why Every AI Agent Needs One

Dennis Pilarinos · May 15, 2026 · Context Engines · Engineering Insights

Bottom line: A context layer is the architectural component that sits between an AI agent and every system where its institutional knowledge lives: code repos, PRs, design docs, Slack threads, Jira tickets, incident postmortems. Without one, every agent invocation either pays a context tax by preloading schemas it might not use, or accrues context debt by making wrong assumptions from missing knowledge. The minimum viable context layer in 2026 has four properties: it ingests heterogeneous sources, surfaces answers on demand rather than preloading them, resolves cross-source conflicts, and enforces permissions per query. This pillar defines each property, names what isn't a context layer, and provides the architectural rubric.

Every AI coding agent shipping in 2026 runs against the same three-part stack, whether the team building it has named the layers or not. There is a tool layer that controls how the agent acts (MCP servers, CLI invocations, Skills, browser automation). There is a reasoning layer that controls how the agent decides (the model itself, plus the agent loop and planner). And in between sits the context layer, which controls what the agent knows at the moment it has to act.

The middle layer is the one most engineering teams underinvest in. The New Stack called context "AI coding's real bottleneck in 2026" (The New Stack, 2026), and Anthropic's own context-engineering guidance frames the problem the same way: "the model can only reason with what's in context" (Anthropic Engineering, 2025). Tool registries are well understood. Frontier models are well understood. The plumbing that decides which institutional knowledge reaches the model on each turn, less so.

Here is the architectural taxonomy this pillar uses throughout, presented as a table because inline SVG architecture diagrams render poorly in most blog renderers.

Table 1. The three-layer agent architecture.

| Layer | Job | Examples |
| --- | --- | --- |
| Reasoning | How the agent decides | The model itself, agent loop, planner, evaluator |
| Context | What the agent knows | Context layer (this article) |
| Tools | How the agent acts | MCP servers, CLI tools, Skills, browser automation |

Read the table top-down, the way a request flows through the stack. The reasoning layer issues a plan. The plan needs facts the model does not already carry in its weights: the team's facts. Who owns this service? Why was the previous rewrite abandoned? Which Jira ticket explains the schema change? What did the postmortem say about retries? The context layer is the component that answers those questions on demand. The tool layer then executes whatever the agent decides to do.
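The flow can be sketched in a few lines. Everything below is illustrative pseudocode made runnable: the function names, the hard-coded knowledge, and the plan shape are assumptions for the sketch, not any real agent framework's API.

```python
# Hypothetical sketch of a request flowing through the three-layer stack.

def context_layer(question: str) -> str:
    """What the agent knows: answer questions from institutional sources."""
    knowledge = {
        "who owns billing-service": "payments team (see CODEOWNERS)",
        "why was the rewrite abandoned": "postmortem: retry storms under load",
    }
    return knowledge.get(question, "no source found; surface the gap")

def reasoning_layer(task: str) -> dict:
    """How the agent decides: produce a plan plus the facts it still needs."""
    return {"task": task, "questions": ["who owns billing-service"]}

def tool_layer(action: str) -> str:
    """How the agent acts: execute once the decision is made."""
    return f"executed: {action}"

# Reasoning asks, context answers, tools act.
plan = reasoning_layer("add retries to billing-service")
answers = {q: context_layer(q) for q in plan["questions"]}
result = tool_layer(plan["task"])
```

Note the division of labour: the context layer never acts and never plans, it only answers, including answering "no source found" rather than guessing.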

This article does three things: it defines what a context layer is, names what isn't one, and gives you a six-criteria rubric for evaluating any vendor or in-house build that claims the label.

What is a context layer?#

Citation capsule. A context layer is the architectural component that sits between an AI agent and every system where its institutional knowledge lives: code repos, PRs, design documents, chat history, ticketing systems, and incident postmortems. The term has circulated informally since 2024 in coverage of "the missing context layer" for AI coding agents, but no single source has staked a precise definition. This pillar does.

The core definition has three parts. First, a context layer aggregates information from heterogeneous sources, not from one source. Second, it surfaces that information at the moment of decision, when the agent is mid-task and needs an answer, rather than preloading every possible fact into a system prompt. Third, it acts as a mediator, sitting between the agent and the source systems so that retrieval, permissioning, ranking, and conflict handling happen in one place rather than smeared across a dozen tool integrations.

That is what distinguishes the context layer from its neighbors. The tool layer beneath it acts on the world. The reasoning layer above it makes decisions. The context layer in the middle does neither: it answers questions. The New Stack's 2026 reporting on context bottlenecks frames the same architectural split (The New Stack, 2026), as does Anthropic's published guidance on agent context engineering (Anthropic Engineering, 2025).

The reason the term has stayed fuzzy is that vendors with adjacent products have all tried to claim it. A code-search tool wants to be a context layer. A vector database wants to be a context layer. An MCP server wants to be a context layer. None of them are, on their own, because none of them aggregate across heterogeneous sources, mediate permissions per query, and resolve cross-source conflict. Those four properties, defined below, are what separate a real context layer from a labelled wrapper around a single source.

How is a context layer different from a context engine?#

Citation capsule. Context layers and context engines are routinely conflated in 2026 vendor marketing. The earlier P5 spoke on this site, "A context layer is not a context engine", drew the line: the layer provides information, the engine provides understanding. The layer returns relevant sources; the engine returns the right answer.

The cleanest way to see the difference is in what the component returns to the agent. A context layer's job ends when it has surfaced the relevant documents, snippets, threads, and tickets, ranked and deduplicated. A context engine's job continues past that point, into resolving contradictions across sources, weighting recency against authority, and synthesising a single decision-grade answer from the raw material.

Table 2. Context layer vs context engine.

| Dimension | Context layer | Context engine |
| --- | --- | --- |
| What it provides | Information | Understanding |
| What it does | Aggregates and surfaces | Reasons and resolves conflicts |
| Output | Retrieved sources, ranked | Synthesised answer, justified |
| Stops at | Returning relevant docs | Returning the right answer |
| Failure mode | Missing or mis-ranked sources | Confidently wrong synthesis |

Most products on the market in 2026 that call themselves "context engines" are actually context layers that have not yet done the reasoning work. Most products that call themselves "context layers" are tool registries that have not yet done the aggregation work. The category is muddier than the marketing suggests, and the six-criteria rubric later in this pillar is meant to cut through it.

The two are not competitors. A context engine sits on top of a context layer, the same way the reasoning layer sits on top of the context layer. You need both to ship an agent that gives correct answers about institutional knowledge.

What are the four properties of a real context layer?#

Citation capsule. A real context layer in 2026 has four non-negotiable properties: heterogeneous source ingestion, on-demand surfacing, cross-source conflict resolution, and per-query permission enforcement. Any system missing one of the four is something else: a code-search tool, a vector store, a memory plugin, or a static config file. Anthropic's published cookbook on context engineering names overlapping concerns (Anthropic Engineering, 2025).

Heterogeneous source ingestion#

The layer must read from code repos and pull requests and issue trackers and chat and docs, not just one or two of them. A "GitHub context layer" is incomplete. A "Slack context layer" is incomplete. The institutional knowledge that prevents agents from making wrong assumptions is distributed across systems by design: the design decisions live in docs, the rationale lives in PRs, the constraints live in chat, and the failure modes live in incidents.

Scott Spence's 2025 write-up on MCP optimisation made the operational version of the same point: connecting an agent to one tool feels productive until the agent confidently does the wrong thing because the constraint lived in the other tool (Scott Spence, 2025).
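The fan-out across sources can be sketched minimally. The connector classes, snippet contents, and the `query_all` helper below are all invented for illustration, standing in for real GitHub, Slack, and Jira integrations:

```python
# Hypothetical sketch of heterogeneous ingestion: one query fans out across
# source connectors so no single system's blind spot becomes the agent's.
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str  # which system the fact came from
    text: str

class RepoConnector:
    def search(self, query: str) -> list:
        return [Snippet("github", "PR review: retries capped at 3")]

class ChatConnector:
    def search(self, query: str) -> list:
        return [Snippet("slack", "constraint: no retries on write paths")]

class TicketConnector:
    def search(self, query: str) -> list:
        return [Snippet("jira", "ticket rationale for the schema change")]

def query_all(connectors: list, query: str) -> list:
    """Aggregate across sources; a single-source 'layer' misses the rest."""
    return [s for c in connectors for s in c.search(query)]

hits = query_all([RepoConnector(), ChatConnector(), TicketConnector()], "retries")
```

The shared `search` interface is the whole point: the agent asks once, and the constraint that lived in "the other tool" comes back alongside the code.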

On-demand surfacing#

The layer must surface answers at the moment of decision, not preloaded into the agent's context window at session start. This is the property that distinguishes a context layer from a CLAUDE.md file and from memory MCP servers. Preloading is a context tax: the agent pays for every token whether it uses the information or not. On-demand retrieval is paid for only when the information is needed.

The cost difference is large. Community measurements from r/mcp through 2025 and Joe Njenga's 2026 write-ups consistently put preload-heavy agent configurations at roughly 71K tokens before the first user message, compared to single-digit-thousands for on-demand retrieval (Joe Njenga, Medium, 2026). The P8 Pillar on MCP token economics breaks down where those tokens go.
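The arithmetic can be made concrete. The per-source token figures below are illustrative round numbers chosen to sum to the roughly-71K preload figure cited above; they are not measurements of any real configuration:

```python
# Back-of-envelope sketch of the context-tax arithmetic: preloading pays for
# every source on every session; on-demand pays only for what a task touches.
# Token figures are illustrative assumptions, not benchmarks.

SOURCE_TOKENS = {
    "schemas": 30_000,
    "runbooks": 20_000,
    "style_guide": 15_000,
    "postmortems": 6_000,
}

def preload_cost(sources) -> int:
    # Paid before the first user message, whether the info is used or not.
    return sum(SOURCE_TOKENS[s] for s in sources)

def on_demand_cost(sources_actually_used) -> int:
    # Paid per query, only for the sources this task touches.
    return sum(SOURCE_TOKENS[s] for s in sources_actually_used)

tax = preload_cost(SOURCE_TOKENS)            # every source, every session
lean = on_demand_cost(["postmortems"])       # just what this task needed
```

Under these assumed numbers the preload path spends 71K tokens before the first turn while the on-demand path spends 6K, which is the shape of the gap the community measurements describe.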

Cross-source conflict resolution#

When the Slack thread says X, the design doc says Y, and the code says Z, which one wins? A real context layer surfaces the conflict explicitly rather than silently picking the most recent source or the highest-ranked source. This is where "context layer" begins to overlap with "context engine": surfacing a conflict is information; resolving it is reasoning.

The three-hard-lessons retrospective covered the operational failure mode in detail: a layer that hides conflicts trains the agent to be confidently wrong, while a layer that surfaces them trains the agent to ask a clarifying question. The Anthropic cookbook treats this as a first-class concern in its examples (Anthropic Engineering, 2025).
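The surfacing behaviour can be sketched as a return-shape contract. The sources and the timeout disagreement below are invented; the point is that disagreement comes back with attribution instead of a silently chosen winner:

```python
# Sketch of explicit conflict surfacing: when sources disagree, return the
# disagreement with attribution rather than picking the newest or top-ranked
# source. Data and shapes here are illustrative assumptions.

def surface(answers: dict) -> dict:
    """answers maps source name -> the fact that source claims."""
    distinct = set(answers.values())
    if len(distinct) == 1:
        return {"status": "agreed", "answer": distinct.pop()}
    # Conflict: hand the agent the full attributed picture so it can ask
    # a clarifying question instead of being confidently wrong.
    return {"status": "conflict", "claims": answers}

result = surface({
    "slack": "timeout is 30s",
    "design_doc": "timeout is 10s",
    "code": "timeout is 10s",
})
```

A layer built this way trains the agent to escalate disagreements; a layer that collapses `claims` into one value trains it to guess.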

Per-query permission enforcement#

Engineers reading the codebase do not have access to legal's HR Slack channel. Sales engineers do not have access to security's incident postmortems. The context layer must respect source-system permissions at query time, not at ingestion time, because the people asking the agent questions vary, and ingestion-time permissioning collapses everyone into a single least-restrictive role.

Otherwise you have not built a context layer. You have built a compliance violation with a chat interface.
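Query-time enforcement can be sketched in a few lines. The users, groups, and source ACLs below are hypothetical; real systems would resolve them against each source's own permission model:

```python
# Sketch of per-query permission enforcement: ACLs are checked against the
# asking user at query time, not flattened into one least-restrictive role
# at ingestion time. Users, groups, and sources are hypothetical.

SOURCE_ACL = {
    "hr-slack": {"legal"},
    "security-postmortems": {"security"},
    "code-repos": {"legal", "security", "engineering", "sales-eng"},
}
USER_GROUPS = {"alice": {"engineering"}, "bob": {"sales-eng"}}

def visible_sources(user: str) -> set:
    """Compute, per query, which sources this user may see."""
    groups = USER_GROUPS.get(user, set())
    return {src for src, allowed in SOURCE_ACL.items() if groups & allowed}

eng_view = visible_sources("alice")  # excludes hr-slack and the postmortems
```

The ingestion-time alternative would precompute one merged index for everyone, which is exactly the collapse into a single least-restrictive role the section warns against.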

What is NOT a context layer?#

Citation capsule. Several adjacent products are routinely sold or self-described as context layers and are not, because each is missing at least one of the four properties above. The Pragmatic Engineer's 2026 AI tooling review names many of the same products and notes the same gap between marketing claim and architectural reality (Pragmatic Engineer, 2026).

  • A single MCP server. One source, one tool surface. No aggregation. MCP is a transport protocol for tool calls, not an aggregation layer.
  • CLAUDE.md or AGENTS.md files. Static text, no retrieval, no permission model, no freshness mechanism.
  • A vector database on its own. No source diversity, no conflict resolution, no permission enforcement per query. Useful as a substrate, not as a layer.
  • A RAG pipeline. Typically single-source, typically no permission model. A starting point, not a finished context layer.
  • A general enterprise-search tool. Strong for breadth, weak on engineering-specific schema, weak on PR and code-history depth.
  • A code-search tool on its own. Strong on code, blind to PR rationale, chat decisions, design docs, incident history.

Naming these as comparison points is fair. Citing their marketing pages as authoritative architectural definitions is not, which is why this section's evidence is the Pragmatic Engineer review and the Anthropic 2026 Agentic Coding Trends Report (Anthropic, 2026), neither of which has a product to sell in the category.

Why does every AI coding agent need a context layer?#

Citation capsule. Anthropic's 2025 context-engineering guidance frames the case directly: "the model can only reason with what's in context" (Anthropic Engineering, 2025). Community measurements add the cost dimension: preloading everything costs roughly 71K tokens before the first turn, while loading nothing leaves the agent making wrong assumptions (Joe Njenga, Medium, 2026). A context layer is the architectural answer to both pressures.

There are three failure modes an agent hits without one.

  1. Wrong assumptions. No retrieval means the agent answers from model weights alone, which do not contain the team's decisions. This is context debt.
  2. Token bloat. Preloading everything to avoid wrong assumptions floods the context window, which Chroma's 2025 context-rot research shows degrades model performance well before the nominal context limit (Chroma, 2025). InfoQ's coverage of Anthropic Opus 4.6 reports the same pattern on the MRCR v2 benchmark, with retrieval accuracy dropping from 76% to 18.5% as context length grows (InfoQ, 2026; Anthropic, 2026).
  3. Stale knowledge. Memory MCPs that don't update when code, docs, or chat update give the agent a confidently outdated picture of the system — the decision-history retrieval problem sits in this category.

A real context layer addresses all three: heterogeneous ingestion closes the assumption gap, on-demand surfacing eliminates the bloat, and freshness commitments cure the staleness.

Customer perspective — Raphael Bres, CTO, Tradeshift.

"You cannot make coding agents work without domain and functional context. We connected and trained Unblocked on our Code repos, Atlassian tools, Internal docs, Product Documentation, KB from Support and Slack history. When an agent asks a question, it gets the full picture, not just the code analysis, but also why decisions were made and what the constraints are. Other tools like Copilot know only the code. That's limited value."

Bres's account names what a context layer does in production. Unblocked is the context layer for AI coding agents: the system that unifies PRs, Slack, Jira, Notion, Confluence, S3, and code repos at the moment of decision, designed to give agents decision-grade context rather than raw retrieval. That is the only mention of the product in this pillar. The rest of the rubric below is vendor-neutral.

How do you evaluate context layers?#

Citation capsule. Use a six-criteria rubric. Score every candidate, in-house or vendor, on the same six dimensions, and a real context layer will separate cleanly from an adjacent product with a similar label. The Anthropic 2026 Agentic Coding Trends Report names overlapping evaluation dimensions for enterprise buyers (Anthropic, 2026).

Table 3. Six-criteria evaluation rubric.

| # | Criterion | What to ask |
| --- | --- | --- |
| 1 | Source coverage | How many heterogeneous source systems, and which ones? Code, PRs, chat, tickets, docs, incidents at minimum. |
| 2 | Retrieval shape | Preload or on-demand? What is the median token cost per query? |
| 3 | Conflict handling | Silent winner-picking, or explicit conflict surfacing with attribution? |
| 4 | Permission model | Per-query enforcement against source-system ACLs, per-ingest only, or none? |
| 5 | Update freshness | Real-time, near-real-time, batch, or static? How is staleness measured? |
| 6 | Reasoning depth | Returns raw docs, ranked snippets, or synthesised answers with citations? |

Walk a candidate through all six. A code-search tool will score on (1) for code, (5) for freshness, and fail (2), (3), (4), and (6). A memory MCP will score on (2) for cheap retrieval and fail (1), (3), (4), and (5). A vector store will score on retrieval mechanics and fail nearly everything else.

The trap is to weight only (1) and (2). Source coverage and retrieval shape are the easy criteria to evaluate, because they are visible from a feature list. Permission enforcement and conflict handling are the criteria that decide whether the agent is safe to put in front of an engineer who does not already know the answer, and those criteria are visible only in an actual integration. Insist on the harder evaluation.
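The uniform walk-through can be sketched as a scoring pass. The 0-2 scores below are illustrative placeholders, not benchmarks of any real product:

```python
# Sketch of applying the six-criteria rubric uniformly: score every candidate
# on all six dimensions and let the gaps show. Scores (0-2) are illustrative.

CRITERIA = [
    "source_coverage", "retrieval_shape", "conflict_handling",
    "permission_model", "update_freshness", "reasoning_depth",
]

def score(candidate: dict) -> int:
    # Criteria absent from the candidate's scorecard count as zero; a
    # feature list cannot hide a missing permission model this way.
    return sum(candidate.get(c, 0) for c in CRITERIA)

# A code-search tool scores on coverage (code only) and freshness, and
# fails the rest; a full context layer scores across the board.
code_search = {"source_coverage": 1, "update_freshness": 2}
full_layer = {c: 2 for c in CRITERIA}

gap = score(full_layer) - score(code_search)
```

Scoring this way forces the hard criteria, permissions and conflict handling, onto the same sheet as the easy ones, which is the whole point of the rubric.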

What's the relationship between the context layer and the rest of the stack?#

Citation capsule. The context layer sits between the tool layer (MCP, CLI, Skills) and the reasoning layer (the model and agent loop). The New Stack's 2026 piece on agentic development trends groups the same components into a similar three-part stack (The New Stack, 2026).

Below the context layer sits the tool layer. MCP servers expose actions and data sources. CLI tools execute commands (Claude Code, the OpenAI Codex CLI, and several peers). Skills package repeatable workflows. The agent uses the tool layer to act on the world after it has decided what to do.

The context layer itself aggregates from heterogeneous sources, surfaces answers on demand, resolves conflicts, and enforces permissions: the four properties defined earlier. It does not act and it does not decide. It answers.

Above the context layer sits the reasoning layer. The model evaluates retrieved context and plans. The agent loop iterates plan-act-observe. The evaluator checks results. This is where context engines start to live, on top of the context layer, doing the synthesis work the layer itself does not.

If you want the cost view of the bottom layer, the P8 Pillar on MCP token budgets is the right next read. If you want the upgrade path from context layer to context engine, the P4 Pillar covers it.

Frequently asked questions#

What's the difference between a context layer and a context engine?#

A context layer provides information, a context engine provides understanding. The layer aggregates and surfaces sources, ranked and deduplicated. The engine reasons over those sources, resolves conflicts, and returns a single justified answer. Most products on the market labelled "engine" are layers that haven't done the reasoning work, and most labelled "layer" are tool registries that haven't done the aggregation work. The earlier P5 spoke walks through the distinction in detail.

Is MCP a context layer?#

No. MCP is a transport protocol for tool calls and data access, not an aggregation layer. A single MCP server exposes one source through a defined interface. A context layer integrates many such sources, applies a single permission model, resolves conflicts across them, and surfaces answers on demand. MCP is a substrate a context layer can use, the same way HTTP is a substrate a web application can use. The protocol is not the application.

What sources should a context layer cover for a typical engineering team?#

At minimum: code repos, pull requests with review comments, issue tracker (Jira or Linear), chat (Slack or Teams), design docs (Confluence, Notion, Google Docs), and incident postmortems. Anthropic's 2026 Agentic Coding Trends Report names overlapping sources as table stakes for enterprise agentic coding (Anthropic, 2026). Anything narrower misses the rationale, constraint, and history that prevents agents from repeating rejected approaches.

How do I evaluate a context-layer vendor?#

Use the six-criteria rubric above: source coverage, retrieval shape, conflict handling, permission model, update freshness, and reasoning depth. Score the vendor on each criterion using their live integration in your environment, not a demo. The criteria most often skipped, permission enforcement and conflict handling, are the criteria that decide whether the agent can be trusted in front of engineers who do not already know the right answer.

Can I build my own context layer?#

Yes, and several large engineering organisations have. The hard parts are not the retrieval mechanics, which a vector store and a few connectors will get you, but the four properties defined above: keeping heterogeneous connectors fresh, enforcing per-query permissions across systems with different ACL models, surfacing cross-source conflicts without silently dropping them, and keeping query-time token cost low. Most in-house builds underestimate the connector and permission engineering and end up shipping a single-source RAG pipeline with a more ambitious name.

What to evaluate when comparing context layers#

Treat this section as a checklist for the next time a product, in-house build, or open-source project claims to be a context layer.

First, demand all four properties: heterogeneous source ingestion, on-demand surfacing, cross-source conflict resolution, and per-query permission enforcement. A system that has three of the four is interesting. A system that has only two is something else with a misleading label.

Second, score against the six-criteria rubric. Score every candidate the same way, on a live integration, not on a slide deck. Pay special attention to permissions and conflict handling, because those are the criteria that determine whether the agent can be safely shown to engineers who do not already know the answer.

Third, map the candidate onto the three-layer architecture in Table 1. If the candidate also wants to act on the world or also wants to make decisions, it is doing more than one layer's job and the abstraction will leak. The components in each layer should compose, not overlap.

The category is young and the labels are still soft. The rubric is what makes the comparison real.