Question 1

What is pdf2okf?

Accepted Answer

pdf2okf turns any PDF into an OKF-compatible knowledge bundle that an AI agent can read. It runs self-hosted on your own hardware with a local open model, or with your own API key (BYOK). With a local model, your documents never leave the building.

Question 2

What is an OKF / OKFZ bundle?

Accepted Answer

It's your document turned into small, linked markdown concept files with YAML frontmatter. The files are compatible with OKF (Google's open standard) and packaged as a portable OKFZ bundle that you own, version, and share. No vector database, no proprietary format.
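To make "markdown with YAML frontmatter" concrete, here is a minimal sketch of reading one concept file in Python. The field names (`title`, `source_page`) and the file content are hypothetical and only illustrate the shape; they are not taken from the OKF schema. The parser handles flat `key: value` lines only, so a real bundle would call for a proper YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a markdown document into (frontmatter fields, body).

    Handles only flat `key: value` lines between two `---` delimiters.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return fields, body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as plain markdown


# Hypothetical concept file; field names are illustrative, not the OKF schema.
EXAMPLE = """---
title: Pump maintenance interval
source_page: 12
---
The pump is serviced every 500 operating hours.
See also [Lubrication](lubrication.md).
"""

fields, body = split_frontmatter(EXAMPLE)
print(fields)
print(body)
```

Because the metadata and the text live in the same plain file, any tool that can read text can use the bundle without a database or a special client.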

Question 3

Does my data leave my machine?

Accepted Answer

No, when self-hosted with a local model: building the bundle and answering questions both run on your hardware, and nothing is uploaded. If you choose BYOK, only the requests you send go to the endpoint you picked, on your own account.

Question 4

Is it GDPR/DSGVO-compliant?

Accepted Answer

Running locally removes the hardest GDPR problems for AI: there's no third-country data transfer and no external processor ingesting your documents, so you stay the sole controller. It doesn't make you exempt from your own data-protection duties, but it removes the parts cloud LLMs can't cleanly solve.

Question 5

Which models can I use?

Accepted Answer

Any open model that runs locally (Gemma 4, Qwen, Mistral, OLMo, or EU-origin models like EuroLLM) or, via BYOK, your own cloud endpoint. The structure lives in the OKF bundle, not the model, so pdf2okf is model-agnostic.

Question 6

Does it run on my Mac?

Accepted Answer

Yes. On Apple Silicon you can run open models locally via MLX/oMLX, Ollama, or LM Studio. A 4-bit-quantized model needs roughly half its parameter count (in billions) in GB of RAM, so a 27B model needs about 14–16 GB, and a modern Mac handles grounded document Q&A comfortably.
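The rule of thumb follows from simple arithmetic: 4 bits is half a byte per parameter. The sketch below works that out. The 15% allowance for context cache and runtime overhead is an assumption chosen for illustration, not a measured figure, and real usage varies with context length and runtime.

```python
def quantized_weight_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def estimated_ram_gb(
    params_billions: float,
    bits_per_weight: int = 4,
    overhead_fraction: float = 0.15,  # assumed allowance, not a measurement
) -> float:
    """Weights plus an assumed allowance for cache and runtime overhead."""
    weights = quantized_weight_gb(params_billions, bits_per_weight)
    return weights * (1 + overhead_fraction)


for size in (7, 14, 27):
    print(
        f"{size}B: weights {quantized_weight_gb(size):.1f} GB, "
        f"with overhead {estimated_ram_gb(size):.1f} GB"
    )
```

For a 27B model at 4 bits, the weights alone come to 13.5 GB, which is where the 14–16 GB range comes from once overhead is added.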

Question 7

Are local models good enough?

Accepted Answer

For grounded document Q&A, yes, in most cases. When answers are tied to source text, small open models stay faithful nearly as well as frontier ones; retrieval quality matters more than raw model size. For the hardest multi-step reasoning, BYOK to a frontier endpoint is the escape hatch.

Question 8

Can I share a bundle once it's built?

Accepted Answer

Yes, that's the point of OKFZ. Build it once from the PDF, then hand the portable bundle to a colleague or commit it to a repo. No re-processing, no vector database to ship, no account required to read it.

Question 9

Is it compatible with Google's OKF standard?

Accepted Answer

Yes. Google Cloud published the Open Knowledge Format (markdown + YAML frontmatter) on 2026-06-12, and pdf2okf produces bundles compatible with it. We didn't invent OKF: we're the sovereign, self-hosted way to produce it from your PDFs.

Question 10

Does it work with my agent / CLI?

Accepted Answer

Yes. pdf2okf is a CLI and produces plain files, so any agentic tool that can run shell commands and read local files works: Hermes Agent, Odysseus, OpenClaw, Claude Code, Cursor, and others. An MCP path is on the roadmap.

Question 11

Do I need to be technical to use it?

Accepted Answer

pdf2okf is a command-line tool, so basic comfort with a terminal helps. But the output (an OKF bundle of plain markdown files) is readable by anyone, and once it is built you can hand it to an agent with a friendlier interface, such as Open WebUI or another chat client.

Question 12

What kinds of documents work best?

Accepted Answer

Anything text-heavy with structure: manuals, contracts, specs, reports, research, financials. pdf2okf turns text, tables, and diagrams into small linked concept files, so structured documents become especially easy for an agent to navigate and cite.

Question 13

What about the US CLOUD Act?

Accepted Answer

The CLOUD Act lets US authorities reach data held by US-owned providers wherever it sits, so EU residency alone does not protect you. Because pdf2okf can run entirely on your own hardware with no provider in the loop, there is no third party that could be compelled to hand anything over.

Question 14

Do I need a data-processing agreement (AVV)?

Accepted Answer

When you self-host with a local model, there is no external processor handling your data, so the usual cloud-AI data-processing agreement is not needed for that step. If you choose BYOK against a third-party endpoint, normal processor rules apply to that provider. This is general information, not legal advice.

Question 15

Can it run fully offline / air-gapped?

Accepted Answer

Yes. With a local model, building a bundle and answering questions both work with no network connection at all, useful for classified, regulated, or simply sensitive material. An air-gapped machine never gets the memo to send anything anywhere.

Question 16

Which model do you recommend?

Accepted Answer

For an EU-facing, sovereign setup, an Apache-2.0 model like Gemma 4 or Qwen3.5 is a strong default, or a European model like Mistral or EuroLLM. Avoid models whose license restricts you: Llama 4's license terms, for example, do not grant rights to its multimodal models to individuals or companies based in the EU. Because the structure lives in the OKF bundle, you can swap models freely.

Question 17

What hardware do I need?

Accepted Answer

Less than you'd think. A 4-bit-quantized model needs roughly half its parameter count in gigabytes of RAM, so a capable 7B–14B model runs on a modern laptop, and a 27B model fits a 32 GB Mac or a consumer GPU. For the heaviest reasoning, BYOK to a frontier endpoint is the escape hatch.

Question 18

Is the format future-proof?

Accepted Answer

It is just markdown and YAML: open, text-based formats that have been readable for decades and need no special software. An OKF bundle will still open in any text editor long after any particular tool or vector database is gone.

Question 19

Am I locked in to pdf2okf?

Accepted Answer

No. The bundle is plain files you own, in Google's vendor-neutral Open Knowledge Format, not a proprietary database. You can read it, edit it, move it, or feed it to a different tool without pdf2okf in the loop. There is nothing to lock you in.

Question 20

Is it cheaper than cloud RAG?

Accepted Answer

For ongoing use, usually yes. Cloud RAG bills you for a hosted vector database, for re-embedding when documents change, and for the tokens you re-send on every query. pdf2okf builds the bundle once and an agent greps only the few concepts an answer needs: a fraction of the tokens, and no database to host.

Question 21

Is there really no vector database to pay for?

Accepted Answer

Correct. pdf2okf's whole approach is grep over plain files, so there is no vector index to host, scale, or keep in sync, and therefore no monthly bill or re-embedding cost for one. That is one of the biggest hidden costs of classic RAG, simply removed.
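As an illustration of what "grep over plain files" means in practice, here is a minimal sketch in Python. It is not pdf2okf's implementation: the function name `grep_bundle`, the demo file names, and their contents are all hypothetical. It simply scans the markdown files in a directory for a term and reports where it appears.

```python
import tempfile
from pathlib import Path


def grep_bundle(bundle_dir: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for case-insensitive matches in .md files."""
    needle = term.lower()
    hits: list[tuple[str, int, str]] = []
    for path in sorted(bundle_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                rel = path.relative_to(bundle_dir).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits


def demo() -> list[tuple[str, int, str]]:
    """Build a tiny throwaway bundle (hypothetical content) and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "warranty.md").write_text(
            "---\ntitle: Warranty\n---\nThe warranty lasts 24 months.\n",
            encoding="utf-8",
        )
        (root / "delivery.md").write_text(
            "---\ntitle: Delivery\n---\nDelivery takes 5 working days.\n",
            encoding="utf-8",
        )
        return grep_bundle(root, "warranty")


if __name__ == "__main__":
    for hit in demo():
        print(hit)
```

Nothing here needs an index to be built or kept in sync: when a file changes, the next search reads the new text.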

Question 22

What does each question cost to answer?

Accepted Answer

With a local model, each answer costs only your own electricity and hardware: there is no per-token charge at all. With BYOK, you pay your provider's token price, but because the agent retrieves only the relevant concepts rather than the whole document, far fewer tokens go into each query.
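The BYOK arithmetic can be sketched as follows. All numbers are hypothetical placeholders (token counts and per-million-token prices alike); substitute your provider's actual prices and your own document sizes.

```python
def query_cost(
    input_tokens: int,
    output_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Cost of one query, in the same currency unit as the prices."""
    total = input_tokens * price_in_per_million + output_tokens * price_out_per_million
    return total / 1_000_000


# Hypothetical figures for illustration only.
PRICE_IN = 3.0             # per million input tokens
PRICE_OUT = 15.0           # per million output tokens
WHOLE_DOCUMENT = 200_000   # tokens if the full document is sent with each query
RELEVANT_CONCEPTS = 3_000  # tokens if only a few concept files are sent
ANSWER = 500               # output tokens per answer

whole = query_cost(WHOLE_DOCUMENT, ANSWER, PRICE_IN, PRICE_OUT)
retrieved = query_cost(RELEVANT_CONCEPTS, ANSWER, PRICE_IN, PRICE_OUT)
print(f"whole document: {whole:.4f} per query")
print(f"relevant concepts only: {retrieved:.4f} per query")
```

With a local model, both prices are zero, so the per-query token cost drops out entirely and only electricity and hardware remain.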

Question 23

Is pdf2okf free?

Accepted Answer

pdf2okf is in private build and not yet released. Join the waitlist to be first in. The design goal is self-hosted and sovereign, so you run it on your own hardware or your own key rather than paying for someone else's cloud to read your documents.

Question 24

Does it have an MCP server?

Accepted Answer

An MCP (Model Context Protocol) server is on the roadmap. It would let any MCP-aware agent read an OKF bundle natively. Today, integration works through the CLI plus plain-file access, which every shell-capable agent already supports.

Question 25

Can I use it through a chat UI like Open WebUI?

Accepted Answer

Yes. Once a bundle is built, you can point a local agent at it and drive that agent from a friendlier interface like Open WebUI, with your own model, on your own machine. The CLI builds the bundle; how you chat with it is up to you.

Question 26

Is this like a self-hosted NotebookLM?

Accepted Answer

In spirit, yes: you get cited answers from your own documents, but without the cloud. NotebookLM runs on Google's servers with your files; pdf2okf produces an OKF bundle that stays on your hardware, read by your own model or your own key.
