pdf2okf

2026-06-12

Google publishes the Open Knowledge Format (OKF)

Stacked markdown files with YAML frontmatter lines, a portable, vendor-neutral open standard.

A standard for knowledge that isn't a database

On June 12, 2026, Google Cloud published the Open Knowledge Format (OKF) v0.1, a vendor-neutral spec for representing the knowledge an AI or agent reads. Its most radical design choice is also its simplest: it's just files. Markdown documents with YAML frontmatter. No proprietary account, no SDK, no managed service you have to call to make sense of your own data.

We've been saying for a while that this is the right shape for AI knowledge. It's good to have a published standard say it too.

What OKF actually is

OKF represents knowledge as plain markdown files plus structured frontmatter. That sounds modest. It's the whole point:

  • Portable: copy the folder and it works anywhere. No export job, no vendor migration.
  • Version-controllable: it lives in git. You can diff it, review it, roll it back.
  • Vendor-neutral: no account, no SDK, no runtime that has to be alive for the data to be legible. A human can open it in a text editor and read it.
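
To make the "just files" idea concrete, here is a minimal sketch of reading one such file with nothing but the Python standard library. The sample document and its field names (`title`, `tags`) are hypothetical and not taken from the OKF spec, and the parser only understands flat `key: value` lines; a real tool would likely use a full YAML parser.

```python
# Sketch only: split a markdown document into frontmatter and body.
# The field names below are illustrative assumptions, not the OKF schema.

SAMPLE = """---
title: Refund policy
tags: billing, support
---
# Refund policy

Refunds are issued within 14 days.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body). Handles flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block present
    try:
        # Index of the closing '---' delimiter.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated block: treat as plain markdown
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


meta, body = split_frontmatter(SAMPLE)
print(meta["title"])
print(body.splitlines()[0])
```

Nothing here needs an account, an SDK, or a running service: the metadata and the body come straight out of the text.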

Contrast that with the default way teams have been feeding documents to AI: chunk the PDF, embed the chunks, load them into a vector database, then ship that database (plus its hosting, its re-embedding bills, and its lock-in) alongside everything else. The knowledge ends up trapped inside infrastructure. OKF makes the opposite bet: the knowledge is the files. If you can read a folder, you can read your knowledge base. If you can copy a folder, you can move it.
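
As a rough illustration of "if you can read a folder, you can read your knowledge base", the sketch below searches a folder of markdown files with a plain text match, with no embeddings or index involved. The folder contents and the `search_folder` helper are made up for the example; this is not how pdf2okf or any OKF tool is known to work.

```python
import tempfile
from pathlib import Path


def search_folder(root: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each case-insensitive match."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if term.lower() in line.lower():
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


# Build a throwaway two-file knowledge base to search (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "refunds.md").write_text("# Refunds\nRefunds take 14 days.\n", encoding="utf-8")
    (root / "shipping.md").write_text("# Shipping\nOrders ship in 2 days.\n", encoding="utf-8")
    results = search_folder(root, "refunds")

for hit in results:
    print(hit)
```

The same folder could just as easily be searched with `grep` or opened in an editor, which is the point of keeping the knowledge in files.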

To be clear: OKF is Google's standard

We didn't invent OKF, and we're not claiming to. Google Cloud published it; we're glad they did, because a published, vendor-neutral spec is worth more to the whole field than any single product's house format. What pdf2okf does is produce it.

Point pdf2okf at a PDF and it builds an OKF-compatible bundle of small, linked concept files: text, tables, and diagrams turned into explicit, greppable structure. The standard validates the format; pdf2okf is the sovereign, self-hosted way to get your documents into it, without uploading a single page to anyone. The conversion happens on your hardware, or against your own key. Compatibility with the standard is the point; ownership of the process is the difference.

Where OKFZ comes in

Our real differentiator sits one step beyond the file format. An OKFZ bundle is the portable, shareable package: build it once from the source PDF, then share the bundle itself.

That changes the economics of sharing knowledge with an AI. The recipient doesn't re-process the original document. They don't stand up a vector database. They don't re-embed anything or pay to index it again. The knowledge travels as a single self-contained artifact: versionable, diffable, and ready to read the moment it lands. Build once, share the bundle, done. No re-processing, no vector DB to ship, no per-recipient indexing bill.
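
The build-once, share-the-bundle flow can be sketched as a round trip. The internal layout of an .okfz file is not described here, so the example assumes, purely for illustration, that a bundle is a plain zip archive of the folder; the file names and the `pack` and `unpack` helpers are likewise hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def pack(folder: Path, bundle: Path) -> None:
    """Write every file under `folder` into one archive, in sorted order."""
    with zipfile.ZipFile(bundle, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(folder).as_posix())


def unpack(bundle: Path, target: Path) -> list[str]:
    """Extract the archive and return the file names it contained."""
    with zipfile.ZipFile(bundle) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source = base / "kb"
    (source / "concepts").mkdir(parents=True)
    (source / "index.md").write_text("# Index\n", encoding="utf-8")
    (source / "concepts" / "refunds.md").write_text("# Refunds\n", encoding="utf-8")

    bundle = base / "kb.okfz"  # assumption: zip container under a hypothetical name
    pack(source, bundle)  # sender: build once
    names = unpack(bundle, base / "received")  # recipient: extract and read
    restored = (base / "received" / "concepts" / "refunds.md").read_text(encoding="utf-8")

print(names)
print(restored)
```

The recipient's side is a single extract step: no re-processing of the source document and no index to rebuild.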

This is what makes "share your documents with AI" a real workflow instead of a chore. A contract, a manual, a research corpus: package it as an .okfz once, hand it to a colleague, a client, or another agent, and it just works. The standard gives the format legitimacy; OKFZ gives it legs.

That's the second thing worth owning. A model you run yourself (Gemma 4) means no one can switch off your intelligence. A format you own (OKF / OKFZ) means no one can lock up your knowledge. And refusing to rent your stack (Fable 5) means no one can pull the plug on your continuity. pdf2okf is built around all three at once: own your model, own your format, own your stack.



pdf2okf is currently in private build: self-hosted and sovereign.