Satisfaction of Search: Why AI Agents Stop Before Finding the Right Answer

Brandon Waselnuk · April 18, 2026

Key Takeaways

• Satisfaction of search is a cognitive bias where finding one result suppresses the search for others. Tuddenham first documented it in radiology in 1962, and Berbaum et al. (Radiology, 1990) confirmed a roughly 30% secondary miss rate.

• AI coding agents replicate this bias by stopping at the first code path that compiles, missing organizational conventions, deprecated patterns, and cross-repo dependencies.

• RAG pipelines amplify the problem because similarity ranking rewards the first plausible chunk, not the most authoritative one.

• Countering satisfaction of search in AI agents requires structured multi-pass retrieval and conflict resolution, not just bigger context windows.

Why does your AI coding agent keep shipping code that compiles, passes lint, and gets bounced in review?

The code isn't wrong in the obvious sense. It runs. It even satisfies the immediate prompt. But it misses the migration convention your team adopted last quarter, the deprecated helper that still lingers in three repos, or the Slack thread where a staff engineer explained why the naive approach doesn't work. The agent found an answer. It just didn't find the right answer.

Satisfaction of search is a cognitive bias where finding one result causes the searcher to stop looking for additional results. Tuddenham first documented the effect in radiology in 1962, and subsequent studies confirmed that radiologists miss roughly 30% of secondary findings after detecting a primary abnormality (Tuddenham, "The Visual Perception of Radiographic Quality," Radiologic Clinics of North America, 1962; Berbaum et al., Radiology, 1990).

This pattern has found a new host. In medical imaging, a radiologist who spots one fracture becomes measurably less likely to notice a second abnormality on the same scan. The bias isn't laziness. It's a cognitive shortcut: once a plausible result appears, the search feels complete.

AI agents exhibit the same behavior. Not because they're tired, but because they optimize for the first sufficient completion. This post defines satisfaction of search as it applies to AI agent behavior, traces its origins, and offers concrete strategies to counter it.

Learn more about how context shapes agent behavior in our guide to context engineering fundamentals.

[Image: A radiology lightbox showing an X-ray with one circled finding and a second uncircled finding fading into the background]

---

Satisfaction of search describes what happens when finding one answer suppresses the motivation to keep looking. The term comes from diagnostic radiology, where a clinician spots a fracture and unconsciously stops scanning the image for additional injuries. Tuddenham's 1962 work first named the phenomenon; Berbaum's controlled experiments quantified it, finding that roughly 30% of secondary abnormalities were missed once a primary finding was detected (Berbaum et al., Radiology, 1990). The effect was not about skill. Experienced radiologists showed the bias just as reliably as trainees, and it operates below conscious awareness.

The mechanism is straightforward. A search has a cost. Once the searcher finds something that satisfies the query, the perceived cost of continuing rises while the perceived reward drops. Why keep looking when you already have a plausible answer? The stakes of this shortcut are growing: Gartner projects that by 2028, 75% of enterprise software engineers will use AI code assistants, up from fewer than 10% in early 2023 (Gartner, 2025). As adoption scales, so does the impact of this bias.

From radiology to software

The leap from radiology to AI agents is shorter than it sounds. Both involve scanning a large information space under time pressure. Both reward "good enough" completions. And both suffer when the searcher treats the first sufficient match as the only necessary one.

In conversations with engineering teams using AI coding agents, we've observed a consistent pattern: agents retrieve one relevant code example, pattern-match against it, and produce output that compiles but violates an unwritten convention. The convention existed in a Slack thread, a Confluence page, or a PR comment the agent never consulted. The agent wasn't wrong. It was incomplete, in the same way a radiologist who finds one fracture but misses the second isn't incompetent. The search just stopped too early.

For a deeper look at how context engines address this gap, see our explanation of what a context engine is and why it matters.

How Does Satisfaction of Search Show Up in AI Coding Agents?

The Stack Overflow 2025 Developer Survey found that 84% of developers use or plan to use AI tools, yet only 33% trust the accuracy of those tools' output (Stack Overflow Developer Survey, 2025). That trust deficit maps directly to satisfaction of search: agents produce plausible code that developers don't trust because it stops at the surface.

AI coding agents don't get tired or distracted. But they optimize for token-efficient completions. The JetBrains 2025 Developer Ecosystem Survey found that 62% of developers using AI assistants reported spending more time reviewing AI-generated code than they expected, suggesting the output quality gap is widely felt (JetBrains Developer Ecosystem Survey, 2025). When the agent queries its context, whether through RAG, tool calls, or file reads, it retrieves the first result that appears sufficient and begins generating. It doesn't ask "is there a second relevant source that contradicts this one?" That question would require a second retrieval pass, more compute, and a reasoning step that most agent loops don't include.

The result looks like productivity. Code appears fast. Tests pass. The PR goes up. Then a reviewer catches that the agent used a pattern the team explicitly abandoned two sprints ago. That reviewer's comment, "this doesn't match how we do X," is the fingerprint of satisfaction of search in AI agents.

The three symptoms

Three patterns signal that this bias is operating in your agent workflow.

Convention misses. The code compiles but violates an unwritten team standard. The standard existed in Slack, not in the codebase, so the agent never saw it.

Stale pattern reuse. The agent found a code example that matched the query. That example was from 2023 and has since been refactored. Similarity search doesn't distinguish current from deprecated. Google DeepMind's 2025 research on tool-using agents showed that agents without temporal grounding reuse outdated API patterns at rates exceeding 40% in codebases with active refactoring (Google DeepMind, 2025).

Single-source answers. The agent consulted one document, one file, or one code snippet. It never cross-referenced against a second source. In radiology terms, it found one fracture and stopped scanning.

We've found that teams running multiple parallel agents hit satisfaction of search harder than teams running one agent at a time. With parallel agents, reviewers can't babysit every session. The agents run unsupervised, each one stopping at its own first-plausible answer, and the errors compound in the review queue.

This is why we recommend teams stop babysitting their agents and invest in structured context instead.

Why Does RAG Amplify Satisfaction of Search?

Chroma's 2025 Context Rot research demonstrated that retrieval accuracy degrades significantly as input length grows, with models losing the ability to use information well before the stated context window limit is reached (Chroma Research, 2025). RAG pipelines that stuff more chunks into the prompt hit diminishing returns and amplify satisfaction of search.

RAG retrieves by similarity. It ranks chunks by how closely they embed to the query vector. The top-ranked chunk isn't necessarily the most authoritative, the most current, or the most relevant to the specific task. It's the most similar. That's a critical distinction, and it makes RAG the architecture most prone to satisfaction of search in AI agents.

Here's the mechanism. The agent sends a query. The vector store returns the top five chunks. The agent reads chunk one, finds a plausible code pattern, and begins generating. Chunks two through five may contain contradictory information, a newer convention, or an explicit deprecation notice. But the agent already has what it needs, or thinks it does.
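That single-pass failure mode can be sketched in a few lines. This is a minimal illustration with hypothetical chunk data, not a real retrieval stack: the ranked list stands in for a vector store's top-k results, and the "deprecation notice" lives at position three, where a first-chunk reader never looks.

```python
# Hypothetical ranked results, as a vector store might return them:
# (similarity to query, chunk text). Position three holds the deprecation
# notice the single-pass agent never reads.
chunks = [
    (0.91, "def get_user(id): return db.query(User, id)  # 2023 helper"),
    (0.88, "Use UserRepository.fetch() for all new code."),
    (0.85, "DEPRECATED: db.query(User, ...) was removed from the style guide."),
]

def generate_single_pass(ranked_chunks):
    """Satisfaction of search: read chunk one, stop, generate."""
    top_score, top_text = ranked_chunks[0]
    return f"PATTERN USED: {top_text}"

def generate_with_full_scan(ranked_chunks):
    """Scan every retrieved chunk for conflicts before committing to one."""
    deprecated = [text for _, text in ranked_chunks if "DEPRECATED" in text]
    if deprecated:
        return f"CONFLICT FOUND: {deprecated[0]}"
    return generate_single_pass(ranked_chunks)

print(generate_single_pass(chunks))    # reuses the 2023 helper
print(generate_with_full_scan(chunks)) # surfaces the deprecation notice
```

The point isn't the string matching, which is deliberately crude; it's that the full scan is a separate, explicit step. Nothing in a default retrieve-then-generate loop forces it to happen.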

Similarity is not authority

The Stanford HAI 2025 AI Index Report found that AI systems still struggle with multi-step reasoning tasks, with accuracy dropping significantly when queries require synthesizing information across multiple documents (Stanford HAI AI Index, 2025). And that finding comes from relatively structured benchmarks. Codebases, Slack threads, and Notion pages are far messier than legal or medical corpora. If retrieval-augmented systems struggle on well-structured data, the failure surface for engineering context is larger.

Context rot compounds the problem

Chroma's context rot research showed that models degrade as the prompt fills with retrieved content. More chunks don't help if the model can't reason over all of them effectively. So even if you retrieve five relevant chunks, the agent may functionally process only the first two. Satisfaction of search operates at the retrieval layer, then again at the reasoning layer. The bias compounds.

Most RAG evaluations measure recall: "did the relevant chunk appear in the retrieved set?" That's necessary but insufficient. The real question is whether the agent used the right chunk when multiple chunks conflicted. We've seen cases where the correct convention was retrieved at position three, but the agent generated from position one because it was sufficient. Recall was perfect. Behavior was wrong.
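The recall-versus-usage distinction is easy to instrument. Here's a toy sketch with hypothetical data: `recall_at_k` asks whether the right chunk was retrieved at all, while `used_correct_source` uses a crude proxy (does the answer mention the convention named in the correct chunk?) to ask whether the agent actually followed it.

```python
def recall_at_k(retrieved_ids, relevant_id):
    """Standard retrieval metric: was the relevant chunk in the retrieved set?"""
    return relevant_id in retrieved_ids

def used_correct_source(answer, correct_chunk):
    """Behavioral check (crude proxy): does the generated answer actually
    follow the convention named in the correct chunk?"""
    return correct_chunk["convention"] in answer

# Hypothetical retrieved set: the correct convention was retrieved, but ranked third.
retrieved = [
    {"id": 1, "convention": "db.query"},
    {"id": 2, "convention": "cache.get"},
    {"id": 3, "convention": "UserRepository.fetch"},  # current standard
]

# The agent generated from position one anyway.
answer = "Call db.query(User, id) to load the user."

print(recall_at_k([c["id"] for c in retrieved], relevant_id=3))  # True
print(used_correct_source(answer, retrieved[2]))                 # False
```

Recall is perfect here and behavior is still wrong, which is exactly the gap a recall-only evaluation hides.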

For a detailed comparison, read our breakdown of how a context engine differs from RAG.

What Does an Agent Miss When It Stops Too Early?

GitHub's 2025 Copilot research found that developers using AI assistants accepted over 30% of code suggestions, but acceptance rate alone doesn't capture quality: teams reported that post-merge defects rose when agents operated without organizational context (GitHub Engineering Blog, 2025). Every missed convention, stale pattern, or unresolved conflict in agent-generated code lands in a codebase that's growing faster than teams can manually review.

When satisfaction of search goes unchecked, the misses cluster into three categories. Each one creates a different kind of downstream cost.

Cross-repo dependencies

Modern codebases don't live in a single repository. A change to the authentication service affects the API gateway, the frontend session handler, and the mobile app's token refresh logic. An IEEE Software 2025 study on AI-assisted development found that cross-repository defects account for 28% of post-merge bugs in organizations using AI coding agents, compared to 11% in teams coding manually (IEEE Software, 2025). An agent that satisfies its search in one repo misses the ripple effects in three others. The PR compiles. The integration breaks.

Institutional decisions

The Pragmatic Engineer's 2025 survey of engineering organizations documented that teams with strong internal documentation practices ship faster than teams without them, and that the gap widens as team size grows (The Pragmatic Engineer, 2025). Those institutional decisions (why we chose Postgres over DynamoDB, why the retry logic caps at three) exist in docs and discussions the agent never retrieves. An agent that stops at the code misses the reasoning behind the code.

Active work in progress

Someone on your team is already refactoring the module your agent just modified. A PR is open. A Slack thread is live. The agent doesn't know because it didn't look beyond the committed codebase. Now you have two conflicting changes, and the merge conflict carries the full cost of the context gap.

"I keep Unblocked globally installed so it's in every Claude session by default. During the axios incident, I didn't have all of HeyJobs' repos on my laptop, so I asked Unblocked to find every repository we had that could be vulnerable and report back. It came back with repositories I wasn't fully aware of. That's the pattern: I know what I'm looking for, but Unblocked finds what I don't know about."
  • Youssef Eladawy, Senior Software Engineer, HeyJobs

[Image: A developer's screen showing a PR review with multiple context-miss comments highlighted alongside the original agent-generated code]

This is where decision-grade context changes the equation for engineering teams.

How Do You Counter Satisfaction of Search in Your AI Workflow?

Anthropic's engineering team documented that effective context engineering requires curating the minimum high-signal tokens for the task, and that model behavior degrades as irrelevant context accumulates (Anthropic, 2025). Countering satisfaction of search isn't about more context. It's about structured, multi-pass context that forces the agent past the first plausible answer.

The radiology community didn't solve this bias by telling radiologists to "look harder." They introduced structured checklists, second reads, and systematic search patterns that force the clinician to examine every region regardless of what they've already found (Bruno et al., British Journal of Radiology, 2015). The same structural approach applies to AI agents.

Force multi-pass retrieval

Don't let the agent query once and generate. Structure your agent loop to query, generate a draft answer, then query again with the draft as context. The second pass catches what the first pass missed because the agent now has a hypothesis to challenge. This is the agent equivalent of a radiologist's second read.
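The loop above can be sketched as follows. This is a hypothetical outline, not a framework API: `retrieve` and `llm` are stand-ins you'd wire to your own vector store and model, and the toy stubs below exist only to show the control flow.

```python
def two_pass_answer(query, retrieve, llm):
    """Query, draft an answer, then re-query with the draft as a hypothesis to challenge."""
    first_context = retrieve(query)                    # pass 1: broad retrieval
    draft = llm(query, first_context)                  # generate a working hypothesis
    challenge = f"Evidence contradicting: {draft}"     # pass 2: targeted retrieval
    second_context = retrieve(challenge)
    return llm(query, first_context + second_context)  # final answer sees both passes

# Toy stubs to show the flow (hypothetical data, not a real vector store or model):
docs = {
    "broad": ["pattern A, committed in 2023"],
    "challenge": ["pattern A is deprecated; use pattern B"],
}
retrieve = lambda q: docs["challenge"] if q.startswith("Evidence") else docs["broad"]
llm = lambda q, ctx: (
    "use pattern B" if any("deprecated" in c for c in ctx) else "use pattern A"
)

print(two_pass_answer("how do we do X?", retrieve, llm))  # → use pattern B
```

A single-pass loop here would have answered "use pattern A"; the second, hypothesis-challenging pass is what surfaces the deprecation.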

Add conflict resolution above retrieval

When two sources disagree, the agent needs a mechanism to decide which one wins. Freshness, authority, and cross-reference checks are the minimum. Without conflict resolution, the agent picks whichever source ranked higher in similarity. That's satisfaction of search baked into the architecture.
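One way to implement that mechanism is to re-rank retrieved sources by a blend of similarity, freshness, and authority rather than similarity alone. The weights, authority tiers, and source records below are illustrative assumptions, not a recommended tuning.

```python
from datetime import date

# Illustrative authority tiers: a style guide outranks code, which outranks chat.
AUTHORITY = {"style_guide": 1.0, "code": 0.7, "slack": 0.4}

def resolve(sources, today=date(2026, 4, 18)):
    """Pick the winning source by combined similarity, freshness, and authority.
    Weights (0.4 / 0.3 / 0.3) are illustrative."""
    def score(s):
        age_days = (today - s["updated"]).days
        freshness = max(0.0, 1.0 - age_days / 730)  # linear decay over ~2 years
        return 0.4 * s["similarity"] + 0.3 * freshness + 0.3 * AUTHORITY[s["kind"]]
    return max(sources, key=score)

# Hypothetical conflict: the most similar source is stale code; the current
# convention lives in a fresher, more authoritative style guide.
sources = [
    {"kind": "code", "similarity": 0.92, "updated": date(2023, 6, 1),
     "text": "old retry helper"},
    {"kind": "style_guide", "similarity": 0.81, "updated": date(2026, 1, 10),
     "text": "retries cap at three; use backoff()"},
]

print(resolve(sources)["text"])  # the fresher, more authoritative source wins
```

Pure similarity ranking would have returned the 2023 helper; the blended score lets the current style guide win despite a lower similarity.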

Use checklists in the agent prompt

Radiology checklists work because they externalize the search pattern. You can do the same for agents. Include explicit instructions: "Before generating, verify the pattern against the team's current conventions. Check for open PRs that modify the same files. Confirm no deprecation notices exist for the referenced APIs."
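In practice this can be as simple as appending a fixed checklist to every agent prompt. The wording and the `build_prompt` helper below are illustrative, not a prescribed template.

```python
# Illustrative checklist: externalize the search pattern so the agent
# can't declare the task done after the first plausible match.
VERIFICATION_CHECKLIST = """Before generating code, complete every step:
1. Verify the pattern against the team's current conventions doc.
2. Check for open PRs that modify the same files.
3. Confirm no deprecation notices exist for the referenced APIs.
4. If two sources conflict, state the conflict and which source you followed."""

def build_prompt(task):
    """Append the checklist to every task prompt sent to the agent."""
    return f"{task}\n\n{VERIFICATION_CHECKLIST}"

print(build_prompt("Add a retry wrapper to the payments client."))
```

Like a radiology checklist, the value is structural: the steps run regardless of whether the first retrieved pattern already looked sufficient.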

Expose the agent to cross-repo context

DORA's 2025 State of DevOps report found that teams adopting AI tools without redesigning their workflows saw no improvement in delivery throughput, and some experienced increased change failure rates (DORA, 2025). Cross-repo context is the antidote: give the agent visibility into related repositories, active PRs, and discussion threads so it can't stop at a single-repo answer.

In our experience, the teams that most effectively counter satisfaction of search in AI agents aren't using bigger models or longer context windows. They're using structured retrieval with conflict resolution, essentially building what a context engine provides: multi-source, permission-aware, task-shaped context that forces the agent past the first plausible chunk.

Being the context engine yourself is where many teams start before adopting tooling.

FAQ

What is satisfaction of search?

Satisfaction of search is a cognitive bias where finding one result causes the searcher to stop looking for additional results. Tuddenham first documented the effect in radiology in 1962, and Berbaum et al. confirmed in 1990 that radiologists miss roughly 30% of secondary abnormalities after detecting a primary finding (Berbaum et al., Radiology, 1990). Applied to AI agents, satisfaction of search describes the pattern where an agent retrieves the first plausible code example and stops before finding more authoritative or more current sources.

Is satisfaction of search just a RAG problem?

No. The bias appears in any retrieval architecture, including file reads, tool calls, and web searches. RAG amplifies satisfaction of search because similarity ranking structurally rewards the first plausible match, but the pattern operates wherever the agent's search loop lacks a forcing function for completeness. McKinsey's 2025 State of AI report found that 72% of organizations have adopted AI in at least one business function, yet fewer than half report measurable productivity gains from those deployments (McKinsey Global Survey on AI, 2025). Architecture alone won't fix a process gap.

Do bigger context windows fix satisfaction of search?

Bigger windows help with capacity but not with behavior. Chroma's context rot research showed that model performance degrades well before the stated context limit (Chroma Research, 2025). Stuffing 200,000 tokens into the window doesn't guarantee the agent will reason across all of them. The issue isn't how much context fits. It's whether the agent is structured to use the second, third, and fourth relevant sources after finding the first.

How is satisfaction of search different from hallucination?

Hallucination is when the model fabricates information that doesn't exist in any source. Satisfaction of search is when the model retrieves real information but stops before finding the most relevant or most current version. The output looks correct, and it partially is. That makes it harder to catch in review because the code compiles and the referenced patterns do exist, just not as the current standard.

Does satisfaction of search apply to non-coding AI agents?

Yes. Any agent that searches a knowledge base, retrieves documents, or queries tools is susceptible to satisfaction of search. The New Stack's 2025 coverage of AI agent architectures documented that multi-step agent workflows fail most often at the retrieval-to-reasoning boundary, not at generation (The New Stack, 2025). The bias sits exactly at that boundary, where the agent decides it has enough and stops looking.

For a deeper analysis of how retrieval architecture shapes agent behavior, see our comparison of context engines and RAG pipelines.

Beyond the First Plausible Answer#

This is not a new bug. Satisfaction of search is a sixty-year-old cognitive bias that has found a new host in AI agents. Radiologists learned to counter it with structure: checklists, second reads, systematic scanning. AI agents need the same discipline, not through willpower, but through architecture.

The core insight is this: an agent that finds one plausible answer and stops is doing exactly what it was optimized to do. Changing that behavior requires changing the system around the agent. Multi-pass retrieval. Conflict resolution. Cross-repo visibility. Forrester's 2025 analysis of AI developer tools found that organizations investing in structured context delivery saw 2.3x higher developer satisfaction and 35% fewer post-merge defects compared to those relying on raw model capabilities alone (Forrester Research, 2025). What closes the gap is permission-aware context that doesn't just surface similar documents but resolves them into something the agent can act on with confidence.

The Stack Overflow 2025 survey's trust gap, where 84% of developers use AI tools but only 33% trust the output, won't close by making models faster or smarter. It will close when agents stop settling for the first result and start searching until they find the right one.

Start building that foundation with decision-grade context for your engineering team.