The problem: your data goes to OpenAI
When you paste a client contract into ChatGPT, the text leaves your machine. It crosses the Atlantic, passes through OpenAI’s servers, and falls under US law. For many uses, that’s fine. For others, it’s a real risk.
Regulated professions know this well. A lawyer pasting a restructuring file into a cloud chatbot may violate attorney-client privilege. A DPO sending personal data creates an undocumented transfer outside the EU. A consulting firm running client financials through it risks a data leak it can’t control.
The question isn’t whether AI is useful; it clearly is. The question is how to use it without sacrificing confidentiality.
The sovereign stack: what we deploy for our clients
Here’s the pipeline we actually install. Everything runs on a standard Mac or PC, with no internet connection needed after the initial setup.
Ollama: the local engine
Ollama is a runtime that runs LLMs directly on your machine. No cloud. No outbound requests. The model is downloaded once, then everything happens locally.
Installation takes 5 minutes. A single `ollama pull mistral` and you have a conversational model running on your workstation. On a Mac with 16 GB of RAM, it’s smooth. On 8 GB, it’s usable but slower.
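Once the model is pulled, Ollama also exposes a local HTTP API, which makes it scriptable. A minimal sketch, assuming Ollama is running on its default port (11434) and `mistral` has been pulled; the helper names are ours:

```python
# Query a local Ollama server over its HTTP API. Nothing leaves the machine:
# the request goes to localhost only.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "mistral") -> dict:
    # stream=False asks for one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local(prompt: str, model: str = "mistral") -> str:
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama instance:
# print(ask_local("Summarize this clause in one sentence: ..."))
```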
The models we recommend:
- Mistral 7B: the French sovereign model. Excellent in French, fast, great for writing and text analysis. Our default choice for European clients.
- Llama 3.1 8B: Meta’s model. Faster than Mistral, strong in English, slightly less precise in French. Useful when speed matters.
- Gemma 2 9B: Google’s model. Good balance between quality and speed, supports multilingual use.
Model choice depends on hardware. With 16 GB of RAM, 7-9B models run comfortably. Below that, you need quantized models (lighter, slightly less precise).
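A back-of-the-envelope calculation shows why 16 GB is the comfort threshold. Model weights need roughly parameters × bits per weight ÷ 8 bytes; actual usage is higher (context cache, runtime overhead), so treat these as lower bounds:

```python
# Rough RAM estimate for model weights only (real usage is higher).
def weights_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * bits_per_weight / 8

print(weights_ram_gb(7, 16))  # 16-bit 7B: 14.0 GB, tight even on 16 GB
print(weights_ram_gb(7, 4))   # 4-bit quantized 7B: 3.5 GB, fits in 8 GB
```

This is the whole trade-off behind quantization: a 4-bit model needs a quarter of the memory of the full-precision version, at the cost of some precision.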
AnythingLLM: the user interface
Ollama on its own is a command-line tool. Non-technical users need an interface. AnythingLLM is a web app that connects to Ollama and provides a ChatGPT-like interface, but 100% local.
What AnythingLLM brings:
- Conversational interface in the browser, like ChatGPT
- Workspaces: one space per project or client, with its own context
- System prompts: permanent instructions per workspace (“you are a financial analyst, always respond in English, cite your sources”)
- Built-in RAG: drag and drop documents, the AI answers based on them
Local RAG: querying your documents
RAG (Retrieval-Augmented Generation) is what turns a generic chatbot into an assistant that knows your files. The principle: you drop documents into a workspace, the system chunks them, indexes them, and when you ask a question, it retrieves relevant passages and feeds them to the model as context.
In practice: you drag a 50-page file into AnythingLLM, and you can ask “what are the conditions precedent in the acquisition agreement?” The model answers by citing passages from the document.
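The chunk-index-retrieve loop can be sketched in a few lines. This is a toy illustration of the principle, not how AnythingLLM is implemented: real systems score chunks with embeddings, while this sketch uses plain word overlap just to stay self-contained:

```python
# Toy RAG pipeline: split a document into overlapping chunks, score chunks
# against the question, and prepend the best ones to the prompt as context.
def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    step = size - overlap  # overlap so sentences cut at a boundary survive
    return [text[i:i + size] for i in range(0, len(text), step)]

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Naive relevance score: number of shared words with the question.
    q_words = set(question.lower().split())
    def score(c: str) -> int:
        return len(q_words & set(c.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n---\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The retrieved passages, not the whole file, are what the model actually sees, which is how a 7B model with a short context window can still answer questions about a 50-page document.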
Local RAG limitations:
- Quality depends on the model. Mistral 7B is good but not perfect. It can miss nuances or misinterpret complex tables.
- Large volumes are slow. Beyond a few hundred pages per workspace, indexing takes time.
- No cross-workspace queries. A workspace doesn’t see documents from other workspaces.
When to stay local, when to go cloud
Local isn’t always the right answer. Here’s our decision tree:
Always local:
- Confidential client documents
- Personal data (HR, healthcare, legal)
- Unpublished financial analysis
- Anything under professional secrecy
Cloud acceptable:
- Marketing content writing (no sensitive data)
- Web research, intelligence, public article summaries
- Brainstorming, ideation, non-sensitive drafts
- Open source code
Grey area, evaluate case by case:
- Unclassified internal data
- Internal communications
- Anonymized HR processes
The simple rule: if you wouldn’t put the document in an email to an external contractor, don’t put it in a cloud chatbot.
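The decision tree above can be encoded as a small routing helper, for instance in an internal tool that reminds users where a task belongs. The category names here are illustrative, not a formal taxonomy:

```python
# Route a data category to "local", "cloud", or "review" per the decision tree.
ALWAYS_LOCAL = {"client_documents", "personal_data",
                "unpublished_financials", "professional_secrecy"}
CLOUD_OK = {"marketing_content", "public_research",
            "brainstorming", "open_source_code"}

def route(category: str) -> str:
    if category in ALWAYS_LOCAL:
        return "local"
    if category in CLOUD_OK:
        return "cloud"
    return "review"  # grey area: evaluate case by case

print(route("personal_data"))   # local
print(route("brainstorming"))   # cloud
print(route("internal_comms"))  # review
```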
What it costs
The cost of a sovereign stack is essentially setup time:
- Ollama: free, open source
- AnythingLLM: free, open source
- Models: free (Mistral, Llama, Gemma)
- Hardware: your current machine is enough if it has 16 GB of RAM
- Installation and configuration: that’s where the time goes
The real investment is learning. Knowing which model to use for which case, how to structure workspaces, how to write effective prompts. That’s what makes the difference between a tool that gathers dust and one that saves 2 hours a day.
Honest limitations
Let’s be clear about what a local LLM doesn’t do as well as Claude or GPT-4:
- Complex reasoning: on multi-step tasks or advanced logical reasoning, cloud models are still ahead. A Mistral 7B doesn’t compete with Claude Opus for strategic analysis.
- Code generation: for serious development, Claude Code in the cloud is far superior to local 7B models.
- Multimodality: image analysis, audio transcription, visual generation. That’s still the cloud’s domain.
- Context window: local models have shorter windows. For very long documents, RAG compensates, but it’s not as seamless.
The sovereign stack isn’t a cloud replacement. It’s a complement for cases where confidentiality is non-negotiable.
In summary
| | Local (Ollama + AnythingLLM) | Cloud (Claude, ChatGPT) |
|---|---|---|
| Confidentiality | Total | Depends on provider |
| Response quality | Good (7B) | Excellent (Opus, GPT-4) |
| Speed | Depends on hardware | Fast |
| Cost | Free (excluding hardware) | Monthly subscription |
| Setup | 30 min – 1 hour | Immediate |
| Ideal use | Sensitive data, compliance | Writing, code, research |
Digital sovereignty isn’t an abstract concept. It’s a technical decision: which data stays with you, which data can leave. The tools exist. You just need to install them.
Your data must stay on-premise? We install your local AI pipeline →