Which open model for your documents? Gemma 4, Qwen, Mistral, OLMo, EuroLLM

There is no single "best", and that is the point

The honest answer to "which open model should I use for my documents?" is that several of them are good enough, so pick on license, language, and the hardware you own, not on a leaderboard. For grounded document Q&A, where the model reads a passage you have already retrieved and answers from it, the leading open models have converged in quality. Model size has stopped being the bottleneck: retrieval quality and license terms now matter more than raw model power. Here is the short list and how to choose.

Gemma 4: the safe default

Gemma 4 (Google, released 2026-04-02) is the default recommendation for most people. It is Apache-2.0, so commercial use is settled before you start; it is multimodal, with a 128K–256K context window; and it spans tiny edge variants (E2B/E4B), a 26B mixture-of-experts, and a 31B dense model. The same family runs on a phone-class device and on a workstation, so you can pick the size your hardware allows and keep behavior broadly consistent. It runs offline on consumer GPUs and Apple Silicon. If you want one model and no homework, this is it.

Qwen3.5: the long-context all-rounder

Qwen3.5 (Alibaba, 2026-03) is also Apache-2.0 and a strong local all-rounder, with long context that helps when a document or a retrieved span is large. It comes in dense sizes and a mixture-of-experts variant, so there is a fit for both modest and capable machines. A solid second choice, and often a first one for long, structured documents.

Mistral: the EU-origin name

Mistral (France) carries the EU-origin appeal, and its small models have historically shipped under Apache-2.0. Verify the license on each model card, since it varies across their lineup. For teams that value a European vendor and a permissive small model, Mistral is the familiar pick.

OLMo 3: when you must audit

OLMo 3 (Ai2) is the fully open option: weights, training data, code, and checkpoints all published, so the model is reproducible and auditable. Reach for it when "we can prove exactly what this model is" is a requirement, not a nice-to-have: regulated industries, public-sector procurement, anywhere provenance must be defensible.

EuroLLM and Teuken: the EU-sovereign stack

If the goal is a genuinely EU-sovereign stack, two European models belong on the list. EuroLLM-22B is fully open and covers all 24 EU languages, built in the same auditable spirit as OLMo. Teuken-7B (from the OpenGPT-X / Fraunhofer effort) is Apache-2.0 and sized to run comfortably on modest hardware. Whichever you pick, every EU buyer should remember one license rule: Llama 4's Community License does not grant its rights to individuals or companies based in the EU, so prefer Gemma, Qwen, Mistral, OLMo, EuroLLM, or Teuken. The tiers behind these labels are explained in open weights vs. open source vs. fully open.
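To make the trade-offs concrete, here is a minimal Python sketch of the shortlist as data, filtered by the two hard constraints this article names: EU licensing and auditability. The `Candidate` fields, the `pick` helper, and every flag value are illustrative assumptions drawn only from the claims above; confirm each license on the model card before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    license: str       # as described in this article; confirm on the model card
    audit_ready: bool  # True only where the article calls the model fully open
    eu_ok: bool        # False where the license excludes EU-based deployers


# Built only from this article's claims; Llama 4 is included to show exclusion.
SHORTLIST = [
    Candidate("Gemma 4", "Apache-2.0", audit_ready=False, eu_ok=True),
    Candidate("Qwen3.5", "Apache-2.0", audit_ready=False, eu_ok=True),
    Candidate("Mistral (small models)", "varies; check each card", audit_ready=False, eu_ok=True),
    Candidate("OLMo 3", "fully open", audit_ready=True, eu_ok=True),
    Candidate("EuroLLM-22B", "fully open", audit_ready=True, eu_ok=True),
    Candidate("Teuken-7B", "Apache-2.0", audit_ready=False, eu_ok=True),
    Candidate("Llama 4", "Llama 4 Community License", audit_ready=False, eu_ok=False),
]


def pick(in_eu: bool, must_audit: bool) -> list[str]:
    """Return the candidates that survive the two hard constraints."""
    return [
        c.name
        for c in SHORTLIST
        if (c.eu_ok or not in_eu) and (c.audit_ready or not must_audit)
    ]


# An EU deployer that must prove provenance ends up with the fully open models.
print(pick(in_eu=True, must_audit=True))
```

Whatever survives the filter is then a choice on language and hardware, which is where the remaining sections come in.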

What runs on consumer hardware

All of the above run on hardware you can buy. A useful rule of thumb: 4 bits is half a byte per weight, so a 4-bit model needs roughly half its parameter count (in billions) in gigabytes of RAM, plus some headroom. A 7B model like Teuken therefore fits in about 4 to 5 GB, and a 27B-class model lands around 14 to 16 GB. With a Q4_K_M quantization (roughly 70 to 75% smaller than 16-bit weights, for about 3% quality loss), a capable model fits a well-specced laptop, and Apple Silicon's unified memory makes Macs especially comfortable. The runtimes that serve them (Ollama, llama.cpp, LM Studio, vLLM, MLX/oMLX) are covered in running AI locally in 2026.
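The rule of thumb is simple arithmetic: parameters times bits per weight, divided by eight, gives bytes. The sketch below assumes a 15% headroom factor for the KV cache and runtime buffers (an illustrative figure, not a measured one) and also computes the savings relative to 16-bit weights.

```python
def approx_ram_gb(params_billion: float, bits_per_weight: float = 4.0,
                  overhead: float = 0.15) -> float:
    """Estimate RAM in GB (10**9 bytes) to run a quantized model.

    Weights take params * bits / 8 bytes, so at 4 bits a model needs about
    half its parameter count (in billions) in GB. `overhead` is an assumed
    headroom factor for the KV cache and runtime buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead)


def savings_vs_fp16(bits_per_weight: float) -> float:
    """Fraction of weight memory saved relative to 16-bit storage."""
    return 1 - bits_per_weight / 16


for name, size in [("Teuken-7B", 7), ("27B-class", 27)]:
    print(f"{name}: ~{approx_ram_gb(size):.1f} GB at 4-bit")
```

Plain 4-bit storage is exactly 75% smaller than 16-bit. K-quant formats such as Q4_K_M keep some tensors at higher precision, so at an assumed ~4.8 effective bits per weight their savings land nearer 70%.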

The structure is in the bundle, not the model

Here is why you do not have to agonize over this choice: with pdf2okf, the structure lives in the OKF bundle, not in the model. pdf2okf builds an OKF-compatible bundle of plain Markdown concept files, and any model reads from that same bundle. So the model is swappable. Start with Gemma 4, move to EuroLLM for an EU-only deployment, switch to OLMo 3 when an auditor calls. Your knowledge never changes, and you never re-process the source.
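As an illustration of why the model is swappable, here is a minimal sketch, assuming the bundle is simply a directory of Markdown files (the real OKF layout may differ). Retrieval and prompt assembly depend only on the bundle; the model is just a function passed in at the end. `load_bundle`, `retrieve`, `answer`, and the `Backend` type are hypothetical names, and the keyword-overlap ranking is a placeholder for real retrieval, not what pdf2okf does.

```python
from pathlib import Path
from typing import Callable

# Hypothetical: a backend is any function that maps a prompt to an answer.
# In practice it would wrap a local runtime; the bundle code never cares which.
Backend = Callable[[str], str]


def load_bundle(bundle_dir: Path) -> dict[str, str]:
    """Read every Markdown file under bundle_dir (assumed *.md layout)."""
    return {
        str(p.relative_to(bundle_dir)): p.read_text(encoding="utf-8")
        for p in sorted(bundle_dir.rglob("*.md"))
    }


def retrieve(bundle: dict[str, str], question: str, k: int = 2) -> list[str]:
    """Rank concept files by word overlap with the question (placeholder retrieval)."""
    words = set(question.lower().split())
    ranked = sorted(
        bundle.values(),
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer(bundle: dict[str, str], question: str, backend: Backend) -> str:
    """Build a grounded prompt from the bundle, then hand it to any model."""
    context = "\n\n".join(retrieve(bundle, question))
    prompt = (
        "Answer only from the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return backend(prompt)
```

Swapping Gemma 4 for EuroLLM or OLMo 3 then means passing a different `backend`; the bundle and the prompt built from it stay exactly the same.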

And for the rare question that genuinely needs frontier-grade, multi-hop reasoning, BYOK (bring your own key) is the escape hatch: keep everything local by default, and route just that one query to a frontier model on your own key. Model-agnostic by design means you are never locked to a vendor's model, a vendor's cloud, or a vendor's license. Pick what fits, swap when it changes, and keep your documents yours.
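The BYOK escape hatch amounts to a routing rule with local as the default. This sketch assumes the caller decides which query needs escalation and that the user's key lives in an environment variable; `route`, the `FRONTIER_API_KEY` name, and the backend callables are hypothetical placeholders, not a pdf2okf API.

```python
import os
from typing import Callable, Optional

LocalBackend = Callable[[str], str]          # prompt -> answer
FrontierBackend = Callable[[str, str], str]  # (prompt, api_key) -> answer


def route(
    prompt: str,
    local: LocalBackend,
    frontier: Optional[FrontierBackend] = None,
    needs_frontier: bool = False,
    key_env: str = "FRONTIER_API_KEY",  # hypothetical variable name
) -> tuple[str, str]:
    """Keep a query local unless the caller explicitly escalates it.

    Escalation requires all three: the caller flags this one query, a
    frontier backend is configured, and the user's own key is present.
    Returns (route_taken, answer).
    """
    key = os.environ.get(key_env)
    if needs_frontier and frontier is not None and key:
        return "frontier", frontier(prompt, key)
    return "local", local(prompt)
```

If the flag is set but no key is configured, the query stays local rather than failing, which keeps "local by default" true even when BYOK is not set up.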