Wiki
A self-hosted alternative to NotebookLM
NotebookLM is genuinely good, and that's exactly the problem
Google's NotebookLM is one of the more honest AI tools released in recent years. It stays grounded in the documents you upload, surfaces citations, and avoids the confident hallucination that makes most general-purpose chatbots useless for serious work. If you have a collection of research papers or product manuals, the cited-answer experience is genuinely impressive.
The problem is the word uploaded. Every document you feed NotebookLM travels to Google's servers and becomes part of a query-and-response pipeline you do not control. For a recipe collection or a public whitepaper, that is fine. For a draft contract, a patient summary, a board memo, or anything carrying confidentiality obligations, uploading to any external service is a non-starter, regardless of how good the product is.
What your documents reveal when they leave
The risk isn't hypothetical. When you send a confidential PDF to a cloud service, you hand it to a data processor under terms you probably haven't read closely. For legal professionals in Germany, sharing client documents with a cloud LLM may conflict with the professional-secrecy obligations in §203 StGB. For healthcare providers in the EU, patient records are subject to GDPR Article 9 restrictions that don't dissolve because the destination server is marketed as AI. For any company handling trade secrets, the moment the file is uploaded the "secret" part is under question.
NotebookLM's privacy documentation explains that documents are used only for your session and that Google follows its standard data-processing terms. That is a policy promise. It is not an architectural guarantee. Policies change, terms get updated, and they apply only as long as the provider chooses to honour them.
The self-hosted path: OKF on your own hardware
The same capability NotebookLM offers (grounded answers, cited sources, no hallucination beyond the supplied documents) can be reproduced on hardware you control. The key is the format the knowledge lives in.
pdf2okf converts your PDFs into OKF-compatible knowledge bundles, following the Open Knowledge Format standard Google published in June 2026. An OKF bundle is structured Markdown: each document becomes a set of prose chunks with frontmatter that preserves the source, the section, and the page reference. That bundle lives on your disk, in your network, under your control. No part of it was sent anywhere.
From that point, your own model (running locally via Ollama, llama.cpp, oMLX, or any OpenAI-compatible server) reads the bundle and answers questions exactly as NotebookLM would, except the model is also on your hardware and the answer never touches a cloud API. Citations come from the frontmatter in the bundle: the model references the original source and section, not a hallucinated description.
How the two approaches compare
| | NotebookLM | pdf2okf (self-hosted) | |---|---|---| | Where it runs | Google's cloud | Your hardware | | Who can see your documents | Google (under their terms) | Only you | | Format you own | Google's internal representation | OKF bundle: open standard, plain Markdown | | Offline capable | No | Yes, with a local model | | Cited answers | Yes | Yes, from bundle frontmatter | | Confidential documents | Requires upload to Google servers | No transfer, no external processor |
A word on fairness to NotebookLM
NotebookLM does things pdf2okf doesn't, yet. It has a polished browser UI, audio summaries, notebook sharing, and years of Google engineering behind it. The question isn't which tool is better in the abstract. The question is which tool is even available to you when the documents can't leave the building.
For public research, published whitepapers, or anything where a cloud upload is genuinely unproblematic, NotebookLM is excellent. The self-hosted alternative matters precisely where it isn't: contracts, clinical notes, internal financial models, or anything subject to professional-secrecy rules. That's not a niche. It's most of the work that actually matters in regulated industries.
Where to go from here
pdf2okf is built for the case where uploading simply isn't an option. It turns your PDFs into OKF bundles on your own machine, pairs with any local model or your own API key, and gives you cited answers with zero data leaving your control. It's in private build: join the waitlist to get early access and try the format on your own documents.
The grounding and citation quality that makes NotebookLM compelling doesn't require sending your documents to a cloud. It requires a structured format. That's what OKF is for.