# Permission-Aware Context Retrieval for AI Coding Agents: A Guide for Engineering Teams


URL: https://getunblocked.com/blog/permission-aware-context-retrieval-ai-coding-agents/
Published: 2026-07-01T16:40:51.419Z
Author: Dennis Pilarinos
Categories: Context Engine, Engineering Insights

How to enforce permission-aware access control at the retrieval layer for AI coding agents. Gartner predicts 40% of enterprise apps will feature task-specific AI agents by the end of 2026.

---
Permission-aware context retrieval means filtering what an AI coding agent is allowed to pull into its context window at the moment it asks, scoped to the identity of the person the agent is acting for. It is the difference between "the agent can reach the wiki" and "the agent can only see the wiki pages this engineer could open themselves." Generic identity tools fall short for one reason: they gate whether an application can connect to a source, not what an agent retrieves and assembles into an answer once it is inside. That gap is where permission-aware access control lives, and it is why role-based rules written for human logins do not survive contact with autonomous agents.

For engineering managers, security-conscious CTOs, and platform engineers at regulated companies, the stakes are concrete. An agent that can quietly pull a restricted design doc, an unredacted incident timeline, or a compensation spreadsheet into an answer has just created a data-exposure event that no login screen would have caught. This guide explains the concept, why generic RBAC breaks for agents, how retrieval-time filtering works, the steps to implement it, what your audit trail must capture, how it maps to SOC 2, and how the major platforms compare.

## What is permission-aware context retrieval?

According to Auth0's engineering team, an agent's effective role "can change from moment to moment," so access decisions must be evaluated per request rather than assigned once ([Auth0, August 2025](https://auth0.com/blog/access-control-in-the-era-of-ai-agents/)). Permission-aware context retrieval applies that principle to knowledge, not just actions. When an agent needs context to answer "why did we deprecate the billing service," a permission-aware system gathers candidate material from code, tickets, chat, and docs, then discards anything the acting identity is not entitled to see before a single token reaches the model. The enforcement point is retrieval itself.

This differs sharply from app-level gating, which grants the whole application one broad service identity and trusts it to behave. Under app-level gating, once the integration authenticates, every document it can technically reach is fair game for every user who prompts the agent. Permission-aware access control instead moves the decision to where the sensitive data actually flows: the retrieval call. Each query is evaluated against the entitlements of the specific human on whose behalf the agent is acting, so two engineers asking the same question can legitimately receive different answers, because their underlying access differs. That per-identity resolution is the defining property, and it is what turns a convenient integration into a control you can defend to a security team.

## Why do generic RBAC and identity tools fall short for AI agents?

Oso's research puts the problem bluntly: employees ignore an estimated 96% of the permissions they hold, but "agents won't," relentlessly using whatever access they are given ([Oso, 2025](https://www.osohq.com/learn/why-rbac-is-not-enough-for-ai-agents)). Tools like Okta, SailPoint, and Infisign are excellent at what they do, which is deciding whether an application or a human can authenticate and connect. They govern the front door. They do not sit inside the retrieval path deciding, document by document, what a given query is allowed to surface.

When you wire an AI agent to a wiki or a repository through one integration credential, every user of that agent inherits the same over-broad service account. A junior contractor and a staff engineer prompt the same bot and, under the hood, both queries run as that one privileged identity. RBAC assumes stable subjects with stable roles, but an agent acting on behalf of many people against per-resource scopes would need a distinct role for every user-resource pair, which stops being a role at all. Oso calls this failure mode role explosion: creating narrow roles like "file-reader-for-project-x" quickly proliferates into an unmanageable sprawl. That collapse is why identity-provider RBAC alone cannot enforce who-sees-what once an agent starts assembling context across systems.
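A back-of-the-envelope count shows how fast this collapses. Take a hypothetical organization with $|U| = 200$ engineers and $|R| = 500$ repositories, and assume one narrow role per user-resource pair:

$$
N_{\text{roles}} = |U| \times |R| = 200 \times 500 = 100{,}000
$$

Each new hire adds a row of roles and each new repository adds a column. Reading the groups and ACLs that the source systems already maintain adds no new roles at all.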

## How does retrieval-time filtering actually work?

Retrieval-time filtering evaluates permissions at the instant an agent requests context, scoped to the requesting identity, rather than trusting a one-time gate. The failure it prevents is structural: a language model has no inherent concept of user permissions, so any pipeline that passes large, unfiltered chunks into the prompt lets the model synthesize answers from documents the requester should never have been able to open. Nothing in the model stops it, which is why the enforcement has to happen before the text reaches the context window. Placement is the whole game. If you filter at ingestion, you freeze permissions at index time and drift out of sync the moment someone loses access. If you filter at the model, you have already leaked the data into the prompt.
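The following toy sketch shows the drift problem. It uses one dictionary for the source system's live ACL and a second for the copy an index would store at ingestion time. All names and data are hypothetical. In a real system the "live" check would call the source's API or query a faithfully synced mirror.

```python
# Hypothetical live ACL in the source system: document id -> users who can open it.
live_acl = {"incident-42-timeline": {"alice", "bob"}}

# Ingestion-time permissioning: copy each ACL into the index when the document is indexed.
index_acl = {doc: set(readers) for doc, readers in live_acl.items()}

# Later, bob loses access in the source system; the index copy is not updated.
live_acl["incident-42-timeline"].discard("bob")


def ingestion_time_allows(user: str, doc: str) -> bool:
    """Checks the snapshot frozen at index time."""
    return user in index_acl.get(doc, set())


def retrieval_time_allows(user: str, doc: str) -> bool:
    """Checks the live ACL at the moment of the query."""
    return user in live_acl.get(doc, set())


print(ingestion_time_allows("bob", "incident-42-timeline"))  # True: stale snapshot still grants access
print(retrieval_time_allows("bob", "incident-42-timeline"))  # False: revocation is honored
```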

Filtering at retrieval means the system collects candidate snippets, checks each one against the source system's live access-control lists for the acting user, drops anything unauthorized, and only then assembles context. Consider the sequence concretely: a query arrives carrying the engineer's identity, the retriever pulls fifty candidate passages ranked by relevance, the permission check removes the twelve the engineer cannot open, and the remaining thirty-eight form the context the model reasons over. This is why the retrieval layer is the correct enforcement point: it is the last checkpoint before private data becomes an answer. Retrieval-time filtering keeps the check tied to current, live permissions rather than a stale snapshot.
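The sequence above fits in a few lines. This is a minimal sketch, assuming the retriever has already produced scored candidates and that `can_read` wraps a live check against the originating system. `Snippet`, `retrieve_for_user`, `assemble_context`, and the sample data are hypothetical. The usage at the end reproduces the fifty-candidate walkthrough.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Snippet:
    doc_id: str
    source: str   # illustrative labels such as "wiki", "github", "jira"
    text: str
    score: float  # relevance score assigned by the retriever


# Hypothetical live permission check: (acting user id, snippet) -> may this user open it?
CanRead = Callable[[str, Snippet], bool]


def retrieve_for_user(user_id: str, candidates: Iterable[Snippet],
                      can_read: CanRead, k: int = 50) -> List[Snippet]:
    """Rank candidates, keep the top k, then drop anything the acting user cannot open."""
    ranked = sorted(candidates, key=lambda s: s.score, reverse=True)[:k]
    return [s for s in ranked if can_read(user_id, s)]


def assemble_context(snippets: List[Snippet]) -> str:
    """Only snippets that passed the permission check are ever placed in the prompt."""
    return "\n\n".join(f"[{s.source}:{s.doc_id}] {s.text}" for s in snippets)


# Usage mirroring the walkthrough: 50 candidates, 12 of them restricted.
candidates = [Snippet(f"doc-{i}", "wiki", f"passage {i}", score=1.0 - i / 100) for i in range(50)]
restricted = {f"doc-{i}" for i in range(12)}
staff = {"staff-engineer"}


def can_read(user_id: str, snippet: Snippet) -> bool:
    return snippet.doc_id not in restricted or user_id in staff


print(len(retrieve_for_user("contractor", candidates, can_read)))      # 38
print(len(retrieve_for_user("staff-engineer", candidates, can_read)))  # 50
```

This sketch filters after taking the top k, so a heavily restricted user can end up with fewer than k passages. Over-fetching before filtering is one way to compensate. What matters is that the filter runs before `assemble_context`, never after the prompt is built.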

## How do you implement permission-aware context retrieval?

Implementing permission-aware access control for agents is a sequence of concrete engineering steps, not a single toggle. The arXiv guidance on resilient agents frames least privilege as an "essential complementary control" in a defense-in-depth strategy, which is exactly the posture these steps encode ([arXiv, September 2025](https://arxiv.org/abs/2509.08646)).

1. **Map identities to source-system permissions.** Inventory every connected source (repos, wikis, ticketing, chat, object storage) and confirm how each expresses access: groups, ACLs, project roles, sharing rules. Your retrieval layer can only enforce what it can read from the source of truth, so gaps here become blind spots later.
2. **Propagate the acting user's identity to the retrieval call.** The agent must carry the human operator's identity, not a shared service account, so every query is attributable to a real person and scoped to their entitlements. In practice this often means passing a user token or an identity claim through the agent's context call rather than authenticating once as the application.
3. **Filter candidate context at query time against live ACLs.** For each retrieved snippet, check the current permission in the originating system, or a faithfully synced mirror of it, and discard anything the acting user cannot access before assembly. Never rely on a stale index snapshot, because a revoked permission that still lingers in the index is an active leak.
4. **Log every retrieval.** Record who asked, which source answered, what was returned, and when. This is your audit evidence and your debugging trail, and it is far cheaper to build in from the start than to retrofit under audit pressure.
5. **Review access regularly.** Run periodic access reviews so entitlements, group memberships, and revocations stay current, and so the retrieval layer keeps reflecting reality. Access that made sense at onboarding rarely still makes sense a year later.

Done in order, these steps turn a permissive integration into a defensible control that an auditor and an incident responder can both trust.
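Step 2 is the step that shared integrations most often skip. Here is a minimal sketch of it, assuming the agent framework passes a verified user id with each context call, for example from a user token or identity claim. `AgentRequest`, `SHARED_PRINCIPALS`, `UnattributableRequest`, `context_call`, and the `search` callback are hypothetical names, not any framework's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AgentRequest:
    query: str
    # Verified id of the human the agent is acting for (hypothetical field).
    on_behalf_of: Optional[str]


# Hypothetical shared integration principals that must never act as the requester.
SHARED_PRINCIPALS = {"svc-agent-integration"}


class UnattributableRequest(Exception):
    """Raised when a context call cannot be tied to a real person."""


def context_call(request: AgentRequest, search: Callable[[str, str], List[str]]) -> List[str]:
    user = request.on_behalf_of
    if not user or user in SHARED_PRINCIPALS:
        # Fail closed: no acting human means no retrieval.
        raise UnattributableRequest("context retrieval must run as the human operator")
    # Scope the search to this person's entitlements, not the application's credential.
    return search(user, request.query)


def fake_search(user: str, query: str) -> List[str]:
    """Stand-in search that just reports which identity it ran as."""
    return [f"results for {query!r} as {user}"]


print(context_call(AgentRequest("why did we deprecate billing?", "alice"), fake_search))
```

The key design choice is failing closed. A request with no acting human, or one that arrives as the shared integration principal, is rejected. It is never served with the application's broad access.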

## What should a context-retrieval audit trail capture?

A context-retrieval audit trail should record who asked, what source responded, what content was returned, and when, with each event timestamped and tamper-resistant. Guidance on SOC 2 access controls (control CC6.1) describes logging each access event to establish a continuous, traceable record that meets audit requirements, with every event timestamped and reviewable ([ISMS.online, 2025](https://www.isms.online/soc-2/controls/logical-and-physical-access-controls-cc6-1-explained/)). For AI agents that means logging at the granularity of individual retrievals, not just session logins.

Capture the acting identity, the query, the specific documents or snippets surfaced, the source system, and the exact time. Retention should match your compliance obligations, and the log itself should resist modification, ideally through append-only or write-once storage, so it can stand as evidence rather than a mutable convenience. A well-formed retrieval audit trail answers two questions after the fact: could this agent have seen this data, and did it. Without per-retrieval logging, you can prove an agent had access but never what it actually pulled, which is the question an incident review or auditor will ask first. The same trail also doubles as an operational debugging tool, letting you trace exactly which sources shaped a bad answer.
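Hash chaining is one way to make the trail tamper-evident in application code: each entry stores the hash of the entry before it. The sketch below complements append-only or write-once storage rather than replacing it. `RetrievalAuditLog`, its methods, and the field names are hypothetical. `was_returned` answers the "did it" question for a given user and document.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Iterable, List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class RetrievalAuditLog:
    """Append-only, hash-chained record of individual context retrievals."""

    def __init__(self) -> None:
        self.entries: List[dict] = []

    def record(self, user_id: str, query: str, source: str,
               doc_ids: Iterable[str], at: Optional[datetime] = None) -> dict:
        body = {
            "user_id": user_id,        # who asked (the acting human)
            "query": query,            # what they asked
            "source": source,          # which system answered
            "doc_ids": list(doc_ids),  # what was actually returned
            "timestamp": (at or datetime.now(timezone.utc)).isoformat(),  # when
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain. Editing, reordering, or removing an earlier entry breaks it;
        truncating the tail does not, which is one reason to also use write-once storage."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def was_returned(self, user_id: str, doc_id: str) -> bool:
        """Answers "did it?": was this document actually returned for this user?"""
        return any(e["user_id"] == user_id and doc_id in e["doc_ids"] for e in self.entries)


log = RetrievalAuditLog()
log.record("alice", "why did we deprecate billing?", "wiki", ["billing-deprecation-rfc"])
log.record("bob", "incident timeline", "jira", ["INC-42"])
print(log.verify())                                         # True
print(log.was_returned("bob", "billing-deprecation-rfc"))   # False
log.entries[0]["doc_ids"].append("comp-spreadsheet")        # simulate tampering
print(log.verify())                                         # False
```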

## What do SOC 2 and regulated-industry compliance require here?

SOC 2's Common Criteria CC6 governs logical access controls, and auditors increasingly extend that reasoning to service accounts and AI agents, expecting "distinct identity separation so every privileged action stays attributable to an accountable individual" ([soc2auditors.org, 2026](https://soc2auditors.org/insights/soc-2-security-controls/)). Two capabilities map almost directly onto these controls. Retrieval-time filtering satisfies the access-control requirement: it enforces least privilege on what an agent can read, per user, at request time. Per-retrieval logging satisfies the audit-evidence requirement: it produces the timestamped, attributable trail auditors ask for.

Regulated industries such as finance and healthcare layer additional data-handling and retention rules on top, but the core pattern holds across frameworks like ISO 27001 and HIPAA: prove you control who can access what, and prove you can show what happened. The shared-service-account pattern fails both tests at once, because it makes actions unattributable and access over-broad. Regular assessment pays off beyond the audit, too: Gartner reports that organizations running regular AI-system assessments are more than three times likelier to reach high generative-AI value ([Gartner, November 2025](https://www.gartner.com/en/newsroom/press-releases/2025-11-04-gartner-survey-finds-regular-ai-system-assessments-triple-the-likelihood-of-high-genai-value)). This section is guidance, not legal advice; confirm specifics with your auditor.

## Platform comparison: RBAC for engineering context

The tools below sit at different points in the stack. Some enforce policy decisions, some catalog data governance, and some retrieve engineering context directly. Comparing them head-to-head can mislead, because a policy engine and a context engine are not substitutes; they are layers. When evaluating permission-aware access control for AI coding agents, the decisive column is where enforcement happens and whether filtering is tied to the acting user at retrieval time. A tool that only gates application access, or only synchronizes permissions at ingestion, leaves the exact gap this guide is about. Read the table as a map of enforcement points, not a ranking.

| Platform | Enforcement point | Retrieval-time filtering | Audit log | IdP integration | Deployment |
|---|---|---|---|---|---|
| Unblocked | Context-retrieval layer, per query | Yes, scoped to acting user via Data Shield | Audit trail of context retrievals | Yes, existing identity providers | SaaS, SOC 2 Type 2 |
| Augment Code | Coding-agent context layer | Partial, respects source permissions on ingest | Enterprise logging | Yes, SSO/SAML | SaaS |
| Glean | Retrieval, filtered against synced ACLs | Yes, permissions checked before results reach the model | Yes | Yes, SSO/SAML | SaaS and hybrid |
| Cerbos | Policy decision point (PDP) | No, it decides allow/deny, does not retrieve context | Yes, decision audit logs | Yes, via app integration | Self-hosted or hub-managed |
| Atlan | Data catalog and governance control plane | No, governs data access and lineage, not agent retrieval | Yes, access monitoring | Yes | SaaS |


Cerbos and Atlan are strong at their jobs. Cerbos runs as a policy decision point that answers allow-or-deny for a given subject, action, and resource with sub-millisecond latency, and Atlan governs metadata, lineage, and data-access policy across pipelines ([Cerbos docs](https://docs.cerbos.dev/cerbos-hub/decision-points.html); [Atlan](https://atlan.com/active-data-governance/)). Neither retrieves engineering context for a coding agent, so pairing them with a retrieval-layer control is common rather than redundant: Cerbos can be the engine that answers "may this user see this document," while the retrieval layer is what asks the question at query time. Glean and Augment sit closer to Unblocked in that they retrieve, with Glean checking permissions before results reach the model ([Glean](https://www.glean.com/ai-glossary/rag-retrieval-augmented-generation)). The practical evaluation question is not which vendor is broadly best, but whether the enforcement point in your stack is the retrieval call and whether it is bound to the acting user's live entitlements.
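The split between a policy engine and a retrieval layer can be sketched as follows. `PolicyDecisionPoint`, `InMemoryPDP`, and `filter_with_pdp` are hypothetical stand-ins, not the Cerbos SDK or any vendor's API. In a real deployment, the `is_allowed` call would go to whatever PDP you run, through that product's own client library.

```python
from typing import Iterable, List, Protocol, Set, Tuple


class PolicyDecisionPoint(Protocol):
    """Hypothetical interface: answers allow/deny for (principal, action, resource)."""

    def is_allowed(self, principal: str, action: str, resource: str) -> bool: ...


class InMemoryPDP:
    """Stand-in for an external PDP; holds explicit grants in memory."""

    def __init__(self, grants: Set[Tuple[str, str, str]]) -> None:
        self._grants = grants

    def is_allowed(self, principal: str, action: str, resource: str) -> bool:
        return (principal, action, resource) in self._grants


def filter_with_pdp(pdp: PolicyDecisionPoint, user_id: str, doc_ids: Iterable[str]) -> List[str]:
    # The retrieval layer asks the question at query time; the PDP only answers it.
    return [doc for doc in doc_ids if pdp.is_allowed(user_id, "read", doc)]


pdp = InMemoryPDP({("alice", "read", "design-doc-7"), ("alice", "read", "runbook-3")})
print(filter_with_pdp(pdp, "alice", ["design-doc-7", "comp-sheet", "runbook-3"]))
# ['design-doc-7', 'runbook-3']
print(filter_with_pdp(pdp, "bob", ["design-doc-7"]))  # []
```

One PDP call per document is the simplest shape. With dozens of candidates per query you would likely want batched checks, so consult your PDP's documentation for what it supports.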

Unblocked positions itself as the context engine for engineering, and its permission-aware feature, [Data Shield](/blog/data-shield/), enforces your permissions per query at retrieval time rather than only at ingestion. It keeps an audit trail of context retrievals, integrates with existing identity providers rather than replacing them, and is SOC 2 Type 2. The practical effect is that the same permission model your team already maintains follows the agent into every query it runs, without a separate set of rules to keep in sync. One engineering team described the governance value directly:

> "My biggest use of Unblocked MCP has been AI governance — searching across Slack, fourteen Notion docs, S3, trying to understand where data lives and where the gaps are. There is no other way to humanly accomplish this task. It's an absolute godsend for getting context out of sources that don't talk to each other."
>
> — Gustavo Alvarez, Software Engineer, Sixfold

## Frequently asked questions

**Is permission-aware retrieval the same as RBAC?** No. RBAC assigns permissions to roles and is usually checked when a user or app authenticates. Permission-aware retrieval applies to what an agent pulls into context at query time, scoped to the acting user. It can consume RBAC data from source systems, but it enforces it at a different, later point: the moment of retrieval.

**Why is retrieval-time filtering better than ingestion-time permissioning?** Ingestion-time permissioning freezes access at index time, so it drifts out of date whenever someone gains or loses access. Retrieval-time filtering checks live permissions at the instant of the query, so revocations take effect immediately and the agent never assembles an answer from documents the current user cannot open.

**Does permission-aware retrieval help with SOC 2?** Yes. It maps onto SOC 2's CC6 access-control expectations by enforcing least privilege on what agents read, and, when paired with per-retrieval logging, it supplies the timestamped, attributable audit evidence auditors request. It is a meaningful control contribution, though full compliance spans many controls. Confirm scope with your auditor.

**Can it use our existing identity provider?** Yes. A well-built permission-aware system integrates with your existing identity provider and source-system groups rather than replacing them. It reads entitlements from those systems and enforces them at retrieval, so you keep one source of truth for identity while adding a filtering checkpoint agents did not previously have.

## Putting permission-aware retrieval into practice

Gartner projects that 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from under 5% in 2025 ([Gartner, August 2025](https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025)). That trajectory makes retrieval-layer permissioning a near-term requirement, not a future one. The single step to take this week: audit how each of your AI agents connects to its sources, and find every place a shared service account collapses many users into one identity. Those are your leak points, and they are usually easy to spot once you look for a single credential fanning out to many humans.

From there, decide where retrieval-time filtering and per-retrieval logging belong in your stack, and whether you build that layer or adopt one that already enforces it. If a context engine is part of that plan, Unblocked's [Data Shield](/blog/data-shield/) enforces permissions per query and keeps an audit trail of retrievals. To go deeper, see [what a context engine is](/blog/what-is-a-context-engine/) and how it compares across the [best engineering knowledge platforms for AI coding agents](/blog/best-engineering-knowledge-platforms-ai-coding-agents-2026/). The teams that get this right treat retrieval-time permissioning as table stakes, not a feature to bolt on after the first incident.
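The connection audit suggested above can start as a simple pass over connection logs. This is a minimal sketch, assuming you can export (credential, human) pairs from your agents' integration logs. The function name, data shape, and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_fan_out_credentials(
    events: Iterable[Tuple[str, str]], threshold: int = 2
) -> Dict[str, List[str]]:
    """Return credentials used on behalf of at least `threshold` distinct humans.

    `events` is a hypothetical export of (credential_id, human_user_id) pairs.
    """
    humans_by_credential = defaultdict(set)
    for credential_id, user_id in events:
        humans_by_credential[credential_id].add(user_id)
    return {
        cred: sorted(users)
        for cred, users in humans_by_credential.items()
        if len(users) >= threshold
    }


events = [
    ("svc-wiki-bot", "alice"),
    ("svc-wiki-bot", "bob"),
    ("svc-wiki-bot", "contractor-7"),
    ("alice-oauth", "alice"),
    ("alice-oauth", "alice"),
]
print(find_fan_out_credentials(events))
# {'svc-wiki-bot': ['alice', 'bob', 'contractor-7']}
```

Any credential this flags is a candidate leak point: every human behind it inherits its entitlements.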