The CLOUD Act: why EU data isn't safe in US clouds
What the CLOUD Act actually is
The CLOUD Act (the US "Clarifying Lawful Overseas Use of Data Act," passed in 2018) gives US authorities a clear power: they can compel a US-based or US-owned provider to hand over data in its possession, custody, or control, regardless of where in the world that data is physically stored. The test is control, not location. If a provider can reach the data, a lawful US order can reach it through the provider.
The law grew directly out of the Microsoft Ireland dispute, where Microsoft argued that a US warrant couldn't reach emails stored in its Dublin datacenter. The CLOUD Act settled the question in the government's favor: storing data abroad does not put it beyond US legal reach if the company holding it answers to US jurisdiction.
Why a Frankfurt region doesn't save you
This is the part that surprises people who bought "EU data residency" and thought the matter was closed. Residency tells you where the bytes sit. The CLOUD Act is about who controls them. A US hyperscaler's German region, its EU subsidiary, its "data stays in Europe" commitment: none of that changes the controlling company's nationality. If the parent is US-incorporated, the German-stored data is still within its control, and therefore still within reach of a US order.
So "hosted in Frankfurt" and "safe from US access" are two different claims, and the first does not imply the second. For EU personal data, that gap is the whole problem: you can satisfy a residency checkbox and still be exposed to a foreign jurisdiction you never chose.
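The gap between the two claims can be sketched as two separate checks. This is a simplified model of the argument above, not a legal test: the `Deployment` type, its field names, and the region labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of where data sits and who holds it."""
    storage_region: str                   # illustrative label, e.g. "eu-frankfurt"
    provider_under_us_jurisdiction: bool  # provider is US-based or US-owned
    provider_controls_data: bool          # possession, custody, or control


def residency_satisfied(d: Deployment) -> bool:
    """Residency asks only where the bytes sit."""
    return d.storage_region.startswith("eu-")


def reachable_via_provider(d: Deployment) -> bool:
    """Control test as described above: storage location is not an input."""
    return d.provider_under_us_jurisdiction and d.provider_controls_data


# Assumed example: a US-owned provider's German region.
frankfurt = Deployment("eu-frankfurt", True, True)
print(residency_satisfied(frankfurt))     # True
print(reachable_via_provider(frankfurt))  # True
```

Both checks return True for the same deployment, which is the point: the second function never reads `storage_region`, so changing the region cannot change its answer.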
The Schrems backdrop: valid today, not permanently secure
This isn't a fringe concern; it's the core of more than a decade of transatlantic legal fighting. Two prior frameworks for EU-to-US data transfers were invalidated by the Court of Justice of the EU after challenges led by Max Schrems: Safe Harbor in 2015 and its replacement, Privacy Shield, in 2020. Both fell in large part over US government access to data.
The current arrangement is the EU-US Data Privacy Framework (DPF), in force since 2023. It survived its first legal challenge, but that ruling can still be appealed to the Court of Justice of the EU, and given that its two predecessors were both invalidated, the honest description is: valid today, not permanently secure. Building a compliance posture on the assumption that the DPF will stand forever is a bet against that track record. The DPF is a real legal basis right now; it is not a guarantee.
Why self-hosting sidesteps the whole thing
Every problem above depends on one ingredient: a provider in the loop who can be compelled. Remove the provider and you remove the mechanism. If the model runs on your own hardware and the data never goes to a third party, there is no US-owned company holding your data, no order that can reach it through one, and no transfer framework whose survival you have to track. There's nothing to subpoena, because there's no one in the middle.
That's the difference between mitigating CLOUD Act exposure with contracts and not having the exposure. Self-hosting doesn't make the CLOUD Act go away. It leaves the Act with no provider to act through.
Where pdf2okf fits
pdf2okf turns your PDFs into OKF-compatible knowledge bundles on your own hardware, or against your own key. No page is uploaded to a US-owned provider, so there is no one in the loop for a US order to compel, and no Frankfurt-versus-Seattle question to litigate. The data stays where you put it, under the only jurisdiction you chose. For the legal mechanics on the EU side, see GDPR-compliant AI; for how the three sovereignty terms fit together, see data sovereignty for AI.