On-premise vs. "German cloud": EU-hosted is not on-device

Three terms that sound similar and mean very different things

"Data residency," "data sovereignty," and "digital sovereignty" get used interchangeably in sales decks, procurement questionnaires, and political speeches. They are not the same thing, and confusing them is how you buy a "sovereign cloud" and end up with weaker data protection than you expected.

Data residency is about location: your data is stored and processed within a defined geographic boundary, say Germany or the EU. It's a factual claim about where the server is. It says nothing about who controls that server or what legal jurisdiction the operator answers to.

Data sovereignty goes further: control over data follows the jurisdiction of the data owner, not just the location of the server. True sovereignty means the laws governing the data are yours to invoke, not the provider's to choose.

Digital sovereignty is a policy concept: the ability of a region, nation, or organisation to run critical digital infrastructure independently of foreign providers, foreign capital, and foreign law.

The marketed "German cloud" (German datacenter, German legal entity, German-branded marketing) typically delivers residency, sometimes approaches sovereignty, and falls short of digital sovereignty almost by definition, because it still runs someone else's software stack at scale.

What EU-hosted genuinely improves

When you choose a German or EU-hosted cloud AI service, a few things genuinely get better:

  • Your data crosses fewer legal borders by default.
  • EU law governs the service contract more directly.
  • Some legal exposures (intra-EU data flows, regional jurisdiction for consumer disputes) are simpler to reason about.

These aren't trivial: residency matters. But "EU-hosted" is not the same as "on-device" or "on-premise," and the gap is not a technicality.

An EU-hosted service still involves a third-party processor sitting between you and your data. That processor has system administrators, logging infrastructure, incident-response procedures, and an operator answerable to the laws of the jurisdiction where it is incorporated. If that operator is a subsidiary of a US corporation (or relies on US-origin software infrastructure), the US CLOUD Act casts a shadow regardless of where the datacenter sits.

The CLOUD Act allows US authorities to compel companies subject to US jurisdiction to disclose data in their possession, custody, or control, wherever it is stored. "The server is in Frankfurt" is not a defence against that order. This is not hypothetical: after Schrems II (CJEU, 2020), which invalidated the EU-US Privacy Shield over US surveillance law, European data-protection authorities questioned the use of US-controlled cloud providers even when those providers had EU datacenters. The CLOUD Act's extraterritorial reach feeds the same concern.

What on-premise actually removes

Run the model and the data on hardware you own and operate (whether that's a server in your own facility, a rack in a co-location site you control, or an Apple Silicon workstation), and the third-party processor disappears entirely. There is no data-processing agreement to negotiate. There is no operator answerable to a foreign court order. The data is yours, the compute is yours, and the legal surface is defined by your own jurisdiction and your own governance.

This is a structurally different promise from "EU-hosted." It's not better marketing for the same thing. It's a different architecture producing a different class of guarantee.

The tradeoff is real: you run the hardware. You maintain it. You handle updates, capacity, and security. That's a genuine operational cost. But for organisations handling confidential client data, regulated records, or anything with cross-border transfer restrictions, that cost is often the right trade, especially as capable small models make the compute requirements manageable on modest hardware.

The two questions that separate them

When evaluating a "sovereign" AI offering, two questions cut through the positioning:

  1. Who operates the infrastructure? If the answer is a US company, or an EU company controlled by a US parent, the CLOUD Act exposure survives the EU datacenter.
  2. Is there a processor at all? On-premise and on-device answer this by removing the processor from the picture. EU-hosted answers it by choosing a different processor, which is useful but not equivalent.
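
The two questions above can be read as a small decision procedure. The following is a minimal sketch, not a legal test: the `Offering` fields, the `Guarantee` labels, and the `classify` function are all hypothetical names introduced for illustration, and real ownership structures are rarely reducible to two booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Guarantee(Enum):
    # Hypothetical labels for the three outcomes discussed in this article.
    NO_PROCESSOR = "no third-party processor"
    LOCAL_PROCESSOR = "third-party processor under EU control"
    FOREIGN_REACH = "third-party processor within reach of foreign law"


@dataclass(frozen=True)
class Offering:
    # All fields are assumptions made for this sketch.
    name: str
    self_operated: bool                # you own and run the hardware yourself
    operator_foreign_controlled: bool  # operator is, or is controlled by, a foreign (e.g. US) company


def classify(offering: Offering) -> Guarantee:
    # Question 2: is there a processor at all?
    if offering.self_operated:
        return Guarantee.NO_PROCESSOR
    # Question 1: who operates the infrastructure?
    if offering.operator_foreign_controlled:
        return Guarantee.FOREIGN_REACH
    return Guarantee.LOCAL_PROCESSOR


if __name__ == "__main__":
    for candidate in (
        Offering("workstation in your office", True, False),
        Offering("EU subsidiary of a US provider", False, True),
        Offering("independent EU provider", False, False),
    ):
        print(f"{candidate.name}: {classify(candidate).value}")
```

Checking `self_operated` first reflects the ordering of the argument: once there is no processor, the question of who controls the processor no longer applies.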

Where pdf2okf fits

pdf2okf is built for the on-premise and on-device end of the spectrum. It converts PDFs into OKF-compatible knowledge bundles on your machine (following Google's Open Knowledge Format), and those bundles are then read by a model you also run locally. Nothing in that pipeline touches a cloud API unless you explicitly add one for frontier reasoning, and even then the document is already structured: you can send a precise cited excerpt, not the raw file.
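
To illustrate the "cited excerpt, not the raw file" idea, here is a minimal sketch in plain Python. It does not use pdf2okf's API or the OKF schema, neither of which is described here: the `Passage` shape, the `build_excerpt_payload` function, and the naive term-count ranking are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    # Hypothetical shape of one structured passage; not the OKF schema.
    source: str
    page: int
    text: str


def build_excerpt_payload(passages, query_terms, max_passages=1):
    """Return only the best-matching passages, each with a citation.

    Everything not selected stays local; the raw document is never
    part of the payload.
    """
    terms = [term.lower() for term in query_terms]

    def score(passage):
        lowered = passage.text.lower()
        return sum(lowered.count(term) for term in terms)

    # Highest score first; passages that match nothing are dropped.
    ranked = sorted(passages, key=score, reverse=True)
    chosen = [p for p in ranked[:max_passages] if score(p) > 0]
    return [
        {"citation": f"{p.source}, p. {p.page}", "text": p.text}
        for p in chosen
    ]


if __name__ == "__main__":
    local_passages = [
        Passage("contract.pdf", 1, "Parties and definitions."),
        Passage("contract.pdf", 7, "Termination requires ninety days notice."),
    ]
    print(build_excerpt_payload(local_passages, ["termination", "notice"]))
```

The point of the design is that the selection step runs locally, so an optional cloud call sees at most the chosen excerpt and its citation.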

The distinction drawn in this article (residency, sovereignty, on-device) maps directly onto the guarantee pdf2okf is designed to make. Not "your data stays in the EU." Not "we're a compliant processor." But "your data never left your machine in the first place." That's a different category of claim, and the one that holds up when the others are tested under a court order or a breach.

pdf2okf is in private build: join the waitlist if you're evaluating sovereign document AI for a regulated context or comparing on-premise approaches.

pdf2okf.com