Open weights vs. open source vs. fully open

"Open" is three different promises

The word "open" gets stamped on models that owe you very different things. A model you can download is not necessarily a model you can use commercially, and a model with a permissive license is not necessarily one you can audit. For a sovereign, self-hosted stack the distinctions are not pedantry: they decide whether you are allowed to ship, whether you can prove what the model is, and in one notable case whether you may use it in the EU at all. There are three levels.

Open weights

Open weights means the trained model file is downloadable and runnable. You can pull it, load it into a runtime, and get answers offline. What you do not necessarily get is the training data, the training code, or an unrestricted license. The weights are a finished artifact handed over without the recipe, and the accompanying license may limit how you use them.

Most of the well-known local models are at least open-weights: Gemma, Qwen, Mistral, and Llama all ship downloadable weights. That is enough to run privately on your own hardware. But "I can run it" and "I am allowed to run it for this" are separate questions answered by the license, not by the download button.

Open source

Open source is open weights plus a genuinely permissive license, typically Apache-2.0 or MIT. The license is the part that matters commercially: it grants you the right to use, modify, and deploy the model for business purposes without asking, without paying, and without usage-restriction clauses to trip over.

This is the tier most teams actually want. Gemma 4 is Apache-2.0 (changed from a custom license at its 2026-04-02 release). Qwen3.5 is Apache-2.0. Mistral's small models have historically been Apache-2.0. With these, the legal question is settled before you start, which is why an Apache-2.0 LLM is the safe default for a product you intend to sell.

Fully open

Fully open goes further than the license: it publishes the weights, the training data, the training code, and the intermediate checkpoints. That means the model is reproducible and auditable: you can see what it was trained on, retrace how it was built, and verify claims about it rather than taking them on faith.

The reference here is OLMo 3 from Ai2, built explicitly to be fully open. On the EU side, EuroLLM follows the same philosophy. Fully open is the strongest position for sovereignty and compliance buyers, because "we can show exactly what this model is" is a categorically stronger claim than "trust us."
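The three tiers can be written down as a small decision rule. The sketch below is illustrative only: `Release`, `Tier`, and `classify` are hypothetical names, the permissive set contains just the two licenses named above, and it assumes the tiers are cumulative, so a release counts as fully open only if it is also permissively licensed.

```python
from dataclasses import dataclass
from enum import Enum

# Assumption: only the two licenses named in the text count as permissive here.
PERMISSIVE_LICENSES = {"apache-2.0", "mit"}


class Tier(Enum):
    CLOSED = "closed"
    OPEN_WEIGHTS = "open weights"
    OPEN_SOURCE = "open source"
    FULLY_OPEN = "fully open"


@dataclass(frozen=True)
class Release:
    """Hypothetical summary of what a model release publishes."""
    weights: bool
    license_id: str  # SPDX-style identifier, or "custom"
    training_data: bool = False
    training_code: bool = False
    checkpoints: bool = False


def classify(release: Release) -> Tier:
    """Map a release to the highest tier it satisfies."""
    if not release.weights:
        return Tier.CLOSED
    permissive = release.license_id.lower() in PERMISSIVE_LICENSES
    if not permissive:
        # Downloadable, but the license decides what you may do with it.
        return Tier.OPEN_WEIGHTS
    if release.training_data and release.training_code and release.checkpoints:
        return Tier.FULLY_OPEN
    return Tier.OPEN_SOURCE


if __name__ == "__main__":
    print(classify(Release(weights=True, license_id="custom")).value)
    print(classify(Release(weights=True, license_id="Apache-2.0")).value)
```

The inputs are meant to be filled in by hand from the license and the model card. Nothing here reads a real model repository.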

Why the difference matters

Three practical consequences follow from which tier a model sits in.

Commercial use. Only a permissive license (open source or fully open) clears you to deploy commercially without restrictions. Open-weights-only models may forbid exactly the use you have in mind. Read the card.
Auditability and sovereignty. Only fully open models let you inspect the training data and reproduce the build. For regulated buyers who must explain their systems, that is the difference between a defensible answer and a shrug.
Usage restrictions. This is the sharp edge. Llama 4's license terms withhold the license for its multimodal models from individuals domiciled in the EU and from companies whose principal place of business is there. A model can be free to download and still be off-limits for a team based in the EU. That single clause is why an EU-sovereign stack should prefer models whose licenses say yes: Gemma 4 (Apache-2.0), Qwen, Mistral, OLMo, or EuroLLM.
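The three consequences above can be turned into a pre-deployment check. This is a minimal sketch, not a legal tool: `LicenseTerms`, `Deployment`, and `blockers` are hypothetical names, and it assumes someone has read the license and entered its terms by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical, hand-entered summary of one model's license."""
    license_id: str
    commercial_use: bool
    excluded_regions: frozenset = frozenset()  # regions the license carves out
    training_data_published: bool = False


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of how you intend to use the model."""
    commercial: bool
    region: str
    needs_audit_trail: bool = False


def blockers(terms: LicenseTerms, deployment: Deployment) -> list:
    """Return the reasons this model cannot be used for this deployment."""
    problems = []
    if deployment.commercial and not terms.commercial_use:
        problems.append("license does not permit commercial use")
    if deployment.region in terms.excluded_regions:
        problems.append(f"license excludes region {deployment.region}")
    if deployment.needs_audit_trail and not terms.training_data_published:
        problems.append("training data is not published")
    return problems


if __name__ == "__main__":
    # Illustrative entries only; they do not describe any real model.
    restricted = LicenseTerms("custom", commercial_use=True,
                              excluded_regions=frozenset({"EU"}))
    print(blockers(restricted, Deployment(commercial=True, region="EU")))
```

An empty list means none of the three checks fired. It does not replace reading the license itself.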

A quick way to read any model card: open weights answers "can I run it?", open source answers "can I sell with it?", and fully open answers "can I prove what it is?".

Where pdf2okf fits

pdf2okf is model-agnostic, which means these license tiers are yours to choose, not ours to impose. The OKF-compatible bundle pdf2okf builds is plain Markdown, and the structure lives in the bundle, not in the model, so you can run whichever tier your situation demands and swap later without touching your knowledge. Want maximum legal safety? Pick an Apache-2.0 model like Gemma 4. Need to prove provenance to an auditor? Pick a fully open one like OLMo 3. Building in the EU? Skip Llama 4 and choose a model whose license has no regional carve-out.

For help choosing among the actual candidates, see which open model for your documents; for the runtimes that serve them, see running AI locally in 2026. Own a model whose license says yes, and no one can switch off your right to use your own stack.