Wiki

Everything behind sovereign document AI.

Deep, sourced explainers on the ideas pdf2okf is built on: the Open Knowledge Format, local open models, data sovereignty, and retrieval without a vector database.

Open Knowledge Format & OKFZ

The portable file format your agent reads.

What is the Open Knowledge Format (OKF)?
OKFZ: the portable, shareable knowledge bundle
OKF vs. a vector database: two ways to give an AI your documents

RAG without a vector database

Grep-based, agentic retrieval over plain files.

RAG without a vector database: grep-based retrieval
The LLM-wiki pattern: a markdown knowledge base for agents
Agentic retrieval vs. traditional RAG
Context engineering: the discipline that contains RAG
The hidden cost of RAG: re-embedding, hosting, and token bills
Long context vs. retrieval: should you just paste the whole PDF?

Sovereignty, GDPR & the EU

Why local inference is the clean answer.

Data sovereignty for AI: residency vs. sovereignty vs. digital sovereignty
GDPR-compliant AI: why local inference removes the transfer problem
The CLOUD Act: why EU data isn't safe in US clouds
EU AI Act 2026: what self-hosting does (and doesn't) solve
Air-gapped document AI: fully offline, no network

Industry & compliance

Sovereign document AI for regulated fields: law, health, finance.

Document AI for law firms: §203, client privilege & self-hosting
Self-hosted document AI for healthcare and patient data
On-premise document AI for finance: BaFin, MaRisk & DORA

Local & open models

Run it on your own hardware in 2026.

Running AI locally in 2026: a guide for document Q&A
Open weights vs. open source vs. fully open
Which open model for your documents? Gemma 4, Qwen, Mistral, OLMo, EuroLLM
Local document AI on a Mac: Apple Silicon, MLX & oMLX
Ollama vs. llama.cpp vs. LM Studio vs. vLLM
What hardware do you need for local document AI?

Exactness & determinism

Cited, auditable, hallucination-free answers.

Hallucination-free document Q&A: cited, deterministic answers
The model finds the structure, code does the counting

Agents, CLI & integration

Read your documents from any agent.

Read your documents from any agent: CLI & MCP
Integrating OKF bundles into agentic tools
MCP for documents: what it is and how OKF fits

Comparisons & alternatives

Self-hosted alternatives to the cloud tools.

A self-hosted alternative to NotebookLM
A local alternative to ChatPDF
On-premise vs. "German cloud": EU-hosted is not on-device
A sovereign alternative to AnythingLLM
A document-AI alternative to Open WebUI
A sovereign alternative to GPT4All for documents