Agentic retrieval vs. traditional RAG

Two retrieval philosophies

Traditional RAG retrieves once and answers. An agent retrieves, reads, decides whether it needs more, searches again, and then answers. That difference (one hop versus many) is the crux of the debate between classic vector RAG and agentic retrieval.

Neither is universally superior. Understanding when each wins is more useful than picking a winner.

Classic RAG: retrieve once, answer once

The original RAG architecture is a single-pass pipeline: embed the query, fetch the top-k most similar chunks from a vector store, inject them into the prompt, and generate a response.

This is fast and predictable. For simple factual lookups against a well-indexed corpus ("what is the refund policy?"), it works reliably. The latency is low because there is only one retrieval call and one generation call.
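The single-pass flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, the chunk list stands in for a vector store, and `generate` is a hypothetical callable representing whatever LLM client is in use.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """The one and only retrieval call: rank every chunk, keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved chunks into the prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer(query: str, chunks: list[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve once, generate once. There is no second chance."""
    return generate(build_prompt(query, retrieve(query, chunks, k)))
```

Note that `answer` has no branch for weak results: whatever `retrieve` returns goes straight into the prompt.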

It fails on hard queries. When the answer requires combining information from multiple locations, when the query is ambiguous and needs clarification, or when the top-k chunks are near-misses rather than true matches, the single-pass model has no recovery path. It answers with what it retrieved, right or wrong.

Agentic retrieval: search, refine, search again

An agentic retrieval system gives the model agency over the search process itself. Rather than being handed a fixed context blob, the agent can:

Issue multiple search queries, not just one
Read a retrieved result and decide it needs a related document before answering
Recognize when retrieval returned weak results and try different search terms
Synthesize across multiple sources and verify consistency before generating a final answer

This is more expensive: more LLM calls, more tokens consumed, more latency. But for complex queries, the accuracy improvement can be substantial, because the agent can triangulate, cross-reference, and verify before committing to an answer.
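The loop behind those capabilities can be sketched as follows. Everything here is illustrative: `search` is a toy keyword matcher, and `next_query` is a hypothetical callback standing in for the model's decision about whether to search again (it returns a new query, or `None` when it has enough). A real agent would make that decision with an LLM call, which is where the extra cost comes from.

```python
from typing import Callable, Optional


def search(query: str, corpus: dict[str, str]) -> list[str]:
    """Toy keyword search: ids of documents containing every query term."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in corpus.items()
            if all(t in text.lower() for t in terms)]


def agentic_retrieve(
    question: str,
    corpus: dict[str, str],
    next_query: Callable[[str, dict[str, str]], Optional[str]],
    max_steps: int = 4,
) -> tuple[dict[str, str], list[str]]:
    """Search, let the policy inspect the results, and search again if asked."""
    gathered: dict[str, str] = {}
    trace: list[str] = []          # every query issued, in order
    query: Optional[str] = question
    for _ in range(max_steps):     # hard cap so the loop always terminates
        trace.append(query)
        for doc_id in search(query, corpus):
            gathered[doc_id] = corpus[doc_id]
        query = next_query(question, gathered)
        if query is None:          # the policy decided it has enough context
            break
    return gathered, trace


def scripted_policy(question: str, gathered: dict[str, str]) -> Optional[str]:
    """Hand-written stand-in for the model's judgment (an assumption)."""
    if not gathered:
        return "refund"            # nothing matched: retry with a broader term
    text = " ".join(gathered.values()).lower()
    if "warranty terms" in text and "warranty" not in gathered:
        return "warranty terms"    # a result points at another document
    return None
```

With a two-document corpus where the refund page points at the warranty terms, the first query finds nothing, the second finds the refund page, and the third pulls in the warranty document it refers to. The `trace` list records each step, and `max_steps` bounds the cost.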

The term adaptive RAG captures one practical formalization of this idea: a routing layer sends simple queries through the cheap single-pass path and routes complex queries through a more expensive agentic loop. The system optimizes cost versus quality based on estimated query complexity, rather than treating every query identically.

When each approach wins

Vector RAG wins on:

Huge, unstructured corpora with millions of heterogeneous documents
Fuzzy semantic matching (synonyms, paraphrases, multilingual content) where exact search fails
Simple factual lookups where top-k is almost always sufficient
Latency-critical applications requiring a single retrieval round-trip

Agentic retrieval wins on:

Complex multi-hop queries that require combining information across documents
Bounded, structured corpora where navigation beats probabilistic similarity scoring
Tasks where answer correctness matters more than response speed
Knowledge bases with consistent structure that the agent can navigate by filename and heading rather than by vector distance

The distinction is not just about scale. It is equally about structure. A corpus of millions of disparate PDFs needs fuzzy similarity search to surface relevant content. A corpus of a few hundred well-organized concept files does not. In the second case, agentic grep-and-read can be simpler, cheaper, and more precise than a vector index, because there is no embedding pipeline to build and every match is an exact, inspectable line of text.
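For a small, well-organized corpus, grep-and-read needs very little machinery. The sketch below assumes the corpus is a folder of markdown files with `#`-style headings; the function names are hypothetical helpers, not part of any particular tool.

```python
from pathlib import Path


def grep(root: Path, term: str) -> list[tuple[str, int, str]]:
    """Case-insensitive search: (relative path, line number, line) per hit."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if term.lower() in line.lower():
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


def read_section(path: Path, heading: str) -> str:
    """Return the text under one heading, up to the next heading of any level."""
    body, inside = [], False
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            # Entering a new heading: keep reading only if it is the one we want.
            inside = line.lstrip("#").strip().lower() == heading.lower()
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()
```

Each hit carries a file name and line number, so the agent can decide what to open next and cite exactly where an answer came from.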

The iterative advantage in practice

Consider a query like: what are the conditions under which a specific clause applies, and how do they interact with the definitions in another section?

A single-pass vector retrieval will return whatever chunks score highest for that query, which may or may not include both the relevant clause and the relevant definitions. If those passages score differently in a large index, one may be cut off by the top-k limit.
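The cutoff effect is easy to see with numbers. The chunk names and similarity scores below are invented purely for illustration; the point is only that a passage the answer needs can rank below the top-k boundary.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Ids of the k highest-scoring chunks, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Hypothetical similarity scores for one query (made-up values).
scores = {
    "clause_7": 0.82,
    "clause_9": 0.79,
    "clause_3": 0.77,
    "definitions_2": 0.61,   # needed for the answer, but scores lower
    "appendix_a": 0.40,
}
needed = {"clause_7", "definitions_2"}

retrieved = top_k(scores, k=3)
missing = needed - set(retrieved)   # what the single pass never saw
```

With `k=3`, `missing` contains `definitions_2`: the clause is retrieved, the definitions are not, and the model answers without them. Raising `k` recovers the passage here, at the price of more context tokens on every query.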

An agent reads the first relevant file, notices a cross-reference to the definitions section, follows that link, reads the second file, and then constructs the answer with both sources in context. The answer is better. The reasoning is transparent. The retrieved sources are explicit.
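Following cross-references and recording what was read can be sketched as a bounded traversal. This assumes cross-references are ordinary markdown links to other `.md` files, which is an assumption made for the example, and it uses an in-memory dict of file contents in place of a real directory.

```python
import re
from collections import deque

# Matches [label](target.md) and [label](target.md#anchor); captures the file name.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def follow_references(start: str, files: dict[str, str], max_files: int = 5) -> list[str]:
    """Read `start`, then the files it links to, breadth-first.

    Returns file names in the order they were read, which doubles as the
    list of sources behind the eventual answer.
    """
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue and len(order) < max_files:
        name = queue.popleft()
        if name not in files:
            continue                 # dangling link: skip it rather than guess
        order.append(name)
        for target in LINK.findall(files[name]):
            if target not in seen:   # avoid re-reading and link cycles
                seen.add(target)
                queue.append(target)
    return order
```

Starting from a clause file that links to a definitions file, the traversal reads both and ignores unrelated files. The returned list is the explicit source trail described above.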

For document-heavy use cases in regulated industries (legal, healthcare, finance, public sector), the ability to trace exactly which files informed which answer is not a nice-to-have. It is often a compliance requirement.

How pdf2okf fits in

pdf2okf produces OKF-compatible bundles structured for agentic retrieval. Each concept in the source PDF becomes a navigable markdown file. An agent (Claude Code, Hermes Agent, Odysseus, OpenClaw, or any MCP-aware tool) can search that bundle iteratively, follow cross-references, and synthesize answers with explicit source attribution.

The bundle is bounded and structured, which is exactly the regime where agentic retrieval beats single-pass vector RAG. Queries that need one file resolve in one read. Queries that need three files traverse three files. In both cases, the answer is grounded in specific, citable text, not in the probabilistic output of a similarity search.

That is the difference between finding the answer and approximating it.