The LLM-wiki pattern: a markdown knowledge base for agents

The pattern in one sentence

An LLM-wiki is a folder of plain markdown files that an agent reads directly: no embeddings, no vector index, no preprocessing step standing between the document and the model.

That is the whole pattern. The sophistication lies in what it enables.

Where it comes from

Knowledge workers have been building personal knowledge bases in plain markdown for years. Obsidian, Logseq, and their predecessors proved that a flat folder of interlinked .md files is a surprisingly powerful thinking tool: searchable, portable, version-controllable, and readable without the application that created it.

The LLM-wiki idea applies the same principle to agent-accessible knowledge. Andrej Karpathy and others in the AI community articulated the intuition: instead of building a vector retrieval pipeline, give the agent well-structured markdown files it can navigate. The agent already knows how to grep and read. The file system is the index.

Why maintainability is the killer feature

The hidden cost of vector-based knowledge systems is slow maintenance. Adding a new document means chunking, embedding, upserting, and verifying that the index updated correctly. Correcting a factual error means finding the right chunk (the passage may have been split across several chunks, each with its own vector), editing the source, and re-running the pipeline.

In a markdown knowledge base, you edit one file. That is the entire update cycle. The agent sees the change on the next query because it reads the files directly. No job to trigger, no index to verify, no cache to invalidate.

For teams maintaining living knowledge bases (internal documentation, regulatory guidance that changes, product specs that evolve), this matters enormously. The gap between making an edit and the agent seeing it is seconds, not the minutes an indexing job can take. And because the files are in git, every change is auditable, reversible, and attributable.

The pattern, made concrete

A well-structured LLM-wiki typically follows a few consistent conventions:

One concept per file. Each markdown file covers one topic completely. Cross-references use standard markdown links to related files.
Frontmatter for machine-readable metadata. Title, tags, related concepts, and a last-updated date in the YAML header at the top of each file.
Consistent internal structure. Sections the agent can scan: a definition paragraph, key properties, examples, known caveats, related concepts. Predictable structure means predictable navigation.
A flat or shallow hierarchy. Deep folder nesting hurts agent navigation. Most LLM-wikis work best with one or two levels, organized by domain rather than by document origin.
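
To make these conventions concrete, here is a minimal Python sketch of one concept file and a way to read its metadata and headings. The field names (title, tags, related, updated), the sample content, and the helper functions are illustrative assumptions, not an official schema. The parser handles only flat key: value pairs and inline lists; a real project would likely use a full YAML parser.

```python
# Hypothetical concept file. Field names and content are illustrative only.
SAMPLE = """---
title: Data retention policy
tags: [compliance, retention]
related: [backup-schedule.md, deletion-requests.md]
updated: 2025-01-15
---
# Data retention policy

## Definition
How long each class of record is kept before deletion.

## Caveats
Legal holds override the default schedule.
"""


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a file into (metadata, body).

    Handles only flat `key: value` pairs and inline `[a, b]` lists.
    """
    if not text.startswith("---\n"):
        return {}, text
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:  # no closing delimiter, so treat the file as body only
        return {}, text
    meta: dict = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if not colon:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta, body


def headings(body: str) -> list[str]:
    """Return the text of every markdown heading line, in order."""
    return [ln.lstrip("#").strip() for ln in body.splitlines() if ln.startswith("#")]


meta, body = split_frontmatter(SAMPLE)
print(meta["title"], meta["tags"], headings(body))
```

Because every file shares the same shape, an agent (or a script like this one) can decide from the metadata and headings alone whether a file is worth reading in full.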

An agent given this structure can answer a query by searching filenames, scanning headings, reading the matching file, and following one or two cross-references if needed. The retrieval is explicit, transparent, and reproducible. There are no opaque similarity scores. Either the file matched or it did not.
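
The retrieval steps described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how any particular agent implements it: the function names find_files and read_with_links, the two sample files, and the one-hop link limit are all hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Matches standard markdown links to local .md files: [text](file.md)
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)\)")


def find_files(root: Path, term: str) -> list[Path]:
    """Return files whose name or any heading contains the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        heads = [ln for ln in text.splitlines() if ln.startswith("#")]
        if term in path.stem.lower() or any(term in h.lower() for h in heads):
            hits.append(path)
    return hits


def read_with_links(path: Path, max_links: int = 2) -> dict[str, str]:
    """Read one file plus up to max_links files it links to (one hop only)."""
    pages = {path.name: path.read_text(encoding="utf-8")}
    for target in LINK.findall(pages[path.name]):
        if len(pages) > max_links:
            break
        linked = (path.parent / target).resolve()
        if linked.is_file() and linked.name not in pages:
            pages[linked.name] = linked.read_text(encoding="utf-8")
    return pages


# Build a tiny two-file wiki in a temporary folder and query it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "retention-policy.md").write_text(
        "# Retention policy\nSee [legal holds](legal-holds.md).\n", encoding="utf-8"
    )
    (root / "legal-holds.md").write_text(
        "# Legal holds\nA hold pauses deletion.\n", encoding="utf-8"
    )
    matches = find_files(root, "retention")
    pages = read_with_links(matches[0])

print([p.name for p in matches], sorted(pages))
```

Every step is an ordinary file operation, so the same query against the same files returns the same result, and a reviewer can see exactly which files were read and why.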

OKF: the standardized, agent-friendly version

The Open Knowledge Format (OKF), published by Google on 2026-06-12, formalizes exactly this pattern. It specifies markdown plus frontmatter as the standard representation for machine-readable knowledge bases: one concept per file, consistent metadata fields, designed to be consumed by agents directly without a preprocessing step.

pdf2okf produces OKF-compatible bundles. The tool is named after the standard: PDF to OKF.

OKFZ extends OKF into a portable, shareable bundle: a self-contained archive that packages the concept files, metadata, and structure so the whole knowledge base can be shared, versioned, and consumed without re-processing the source document. Build the bundle once from the PDF; share the .okfz file; the recipient's agent greps it directly. No vector database to ship, no embeddings to regenerate, no infrastructure dependency.

This is the LLM-wiki pattern with a standardized schema and a shipping format. The standard makes the pattern interoperable across tools.

Why this beats a vector DB for bounded knowledge

For large, heterogeneous corpora (millions of documents, fuzzy domain boundaries), vector search has real advantages: it handles partial matches, synonyms, and semantic drift that exact-match search misses. For that kind of corpus, it is the right tool.

But most organizational knowledge is not that. A compliance manual, a product specification, a regulatory guidance document: these are bounded, structured, and professionally maintained. For this kind of knowledge, the LLM-wiki pattern's advantages dominate: human-readable at every layer, instantly updatable, no infrastructure overhead, exact retrieval by structure rather than approximation by distance.

pdf2okf turns your PDFs into an LLM-wiki

pdf2okf extracts a PDF and structures its content as an OKF-compatible markdown knowledge base: a proper LLM-wiki that any agent can read directly via the CLI (an MCP path is on the roadmap). No embedding pipeline, no vector database, no re-indexing. Just structured markdown files that agents already know how to navigate.

Your PDF library becomes a knowledge base. Share it as an OKFZ bundle. Query it from Claude Code, Hermes Agent, Odysseus, OpenClaw, or, once the MCP path ships, any tool that supports MCP. Update it by editing a file.