Glossary
The vocabulary of sovereign document AI.
Short, plain-language definitions: each term in one place, cross-linked to the wiki.
- Agentic retrieval
An agent that searches files on demand instead of relying on a one-shot vector lookup.
- BYOK (Bring Your Own Key)
Use your own model API key: data and billing stay on your account.
- CLOUD Act
A US law that reaches data held by US-owned providers, wherever it sits.
- Context window
How much text a model can consider at once, measured in tokens.
- Data residency
Where your data physically sits: the weakest of the three guarantees.
- Data sovereignty
Which laws govern your data, not just where it sits.
- Deterministic AI
Letting code compute exact facts so the model only reports them.
- Digital sovereignty
Control over your whole stack: software, operations, and supply chain.
- Embedding
A numeric vector representing a chunk of text, the basis of vector search.
- EU AI Act
The EU's phased AI regulation: self-hosting makes you a deployer; it does not make you exempt.
- Frontmatter
A small YAML metadata block at the top of a markdown file.
- Inference vs. training
Training builds a model once; inference is running it to get answers.
- MCP (Model Context Protocol)
A standard interface for agents to talk to tools and data sources.
- OKFZ
pdf2okf's portable, shareable knowledge bundle.
- On-device / local AI
Running inference on your own machine, so data never leaves it.
- Open Knowledge Format (OKF)
Google's vendor-neutral standard: AI knowledge as markdown + YAML frontmatter.
- Open weights
Downloadable model weights you can run yourself, though not always under an open-source license.
- Quantization
Shrinking a model by storing its weights at lower precision.
- RAG (Retrieval-Augmented Generation)
Feeding a model retrieved document passages to ground its answers.
- Vector database
An index of embeddings used by classic RAG, which pdf2okf doesn't need.
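
To make the Embedding and Vector database entries concrete, here is a minimal Python sketch of the lookup that classic RAG performs: rank stored vectors by cosine similarity to a query vector. The three-dimensional vectors, the `INDEX` contents, and the `nearest` helper are hypothetical and chosen by hand for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "vector database": chunk label -> hand-made embedding.
INDEX = {
    "invoice terms": [0.9, 0.1, 0.0],
    "data residency": [0.1, 0.8, 0.3],
    "model quantization": [0.0, 0.2, 0.9],
}


def nearest(query_vec, index, k=1):
    """Return the labels of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


if __name__ == "__main__":
    # A query vector that points mostly along the second dimension.
    print(nearest([0.2, 0.9, 0.2], INDEX, k=2))
```

A production vector database replaces the full sort with an approximate nearest-neighbor index, but the ranking idea is the same.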