Data sovereignty for AI: residency vs. sovereignty vs. digital sovereignty
Three words, treated as one
Buy "sovereign AI" today and you might be sold any of three different things. Data residency, data sovereignty, and digital sovereignty get used interchangeably in vendor decks, but they answer three separate questions, and the gaps between them are exactly where a compliance story quietly falls apart.
The shortest way to keep them straight: residency tells you where your data sleeps; sovereignty tells you whose laws can wake it; digital sovereignty tells you who can turn off the lights.
Data residency: where your data sleeps
Data residency is a question of geography. It asks one thing: in which country do the bytes physically sit? "Hosted in the EU," "stored in our Frankfurt region," "data never leaves Germany." These are residency claims. They're easy to verify, easy to market, and genuinely useful for latency and for some contractual commitments.
But residency is only about location at rest. It says nothing about who can reach the data, under whose authority, or what happens when a foreign court issues an order. A datacenter in Frankfurt owned by a company in Seattle is still a Frankfurt address with a Seattle landlord.
Data sovereignty: whose laws can wake it
Data sovereignty is a question of jurisdiction. It asks: which legal system actually governs this data, and who can compel its disclosure? This is where residency stops being enough.
The decisive case is the US CLOUD Act (2018). It lets US authorities, acting through valid legal process such as a warrant, compel a provider subject to US jurisdiction to produce data in its possession, custody, or control, regardless of where in the world that data is stored. So a US hyperscaler's German region gives you EU residency while leaving you exposed to US jurisdiction. The data sleeps in Frankfurt; US law can still wake it. Residency without sovereignty is the most common gap in the market, and the one most buyers never notice until a lawyer asks.
Digital sovereignty: who can turn off the lights
Digital sovereignty is the broadest of the three. It's about control of the whole stack: not just where data sits and whose law applies, but who runs the operations, who owns the software, who ships the updates, and whether a foreign vendor or government could degrade or cut off your access. It's the difference between renting a capability and owning it.
A "sovereign" arrangement that still depends on a foreign company's runtime staying switched on isn't sovereign in this sense. If someone abroad can flip a switch (a license revocation, an export-control order, a service withdrawal) and your system goes dark, you have residency and maybe even jurisdictional comfort, but not digital sovereignty.
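Taken together, the three definitions are three independent checks, and a deployment can pass one while failing the others. The following is a minimal sketch of that idea, not a legal test: the `Deployment` class, its field names, and the `assess` function are all hypothetical, and it assumes jurisdiction can be reduced to a set of country codes, which is a heavy simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical, simplified description of where an AI workload runs."""
    storage_country: str                 # where the bytes physically sit
    compelling_jurisdictions: frozenset  # legal systems that can order disclosure
    foreign_kill_switch: bool            # can a party abroad revoke or withdraw it?


def assess(d: Deployment, home: str) -> dict:
    """Answer the three questions separately for a buyer based in `home`."""
    residency = d.storage_country == home
    # Sovereignty: only the buyer's own legal system can compel disclosure.
    sovereignty = d.compelling_jurisdictions <= {home}
    # Digital sovereignty: the first two hold and nobody abroad can switch it off.
    digital = residency and sovereignty and not d.foreign_kill_switch
    return {
        "residency": residency,
        "sovereignty": sovereignty,
        "digital_sovereignty": digital,
    }


# A US-owned provider's German region, seen from a German buyer (illustrative values).
us_region_in_frankfurt = Deployment("DE", frozenset({"DE", "US"}), True)
# Hardware the buyer owns and operates in Germany (illustrative values).
own_hardware = Deployment("DE", frozenset({"DE"}), False)

print(assess(us_region_in_frankfurt, "DE"))
print(assess(own_hardware, "DE"))
```

Under these assumptions the first deployment passes only the residency check, while the second passes all three. Keeping the answers as three separate fields, rather than one "sovereign" flag, is the point: a single label hides which of the questions was actually answered.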
The 2025–26 sovereignty push and "sovereignty-washing"
Europe spent 2025 and 2026 trying to close these gaps. There is a strong sovereign-cloud push across the public sector and regulated industries, and Brussels has gone further. The Cloud and AI Development Act (CADA) was floated on 3 June 2026, but it is a proposal, not binding law, and shouldn't be treated as if it were already in force.
In response, every major US hyperscaler now sells a "sovereign cloud" tier. Critics call much of this sovereignty-washing: a European-sounding wrapper around infrastructure whose ultimate owner, and therefore whose ultimate jurisdiction, hasn't changed. If the controlling company is still US-incorporated, the CLOUD Act exposure travels with it, "sovereign" label or not. Residency you can paint on. Sovereignty you cannot.
Local inference dissolves the question
There's a structurally clean answer that doesn't depend on trusting a label: don't move the data at all. If inference runs on your own hardware (the model reads your documents locally and the bytes never leave the machine), the three questions collapse into one easy answer. Residency: here. Jurisdiction: yours. Stack control: yours, because there's no external provider in the loop to be compelled, switched off, or relabeled. The sovereignty question doesn't get answered so much as it dissolves, because the condition that creates it, your data on someone else's computer, never happens.
Where pdf2okf fits
pdf2okf is built for that structural answer. It turns your PDFs into OKF-compatible knowledge bundles on your own hardware, or against your own key. No page is uploaded to a third party. There's no foreign region to audit, no provider to subpoena, and no runtime someone else can switch off. You get data residency (the files are on your disk), data sovereignty (only your law applies), and digital sovereignty (you own the whole pipeline) for the same reason: the data never left. For the legal mechanics behind that, see GDPR-compliant AI and the CLOUD Act.