The problem: your data goes to OpenAI
When you paste a client contract into ChatGPT, the text leaves your machine. It crosses the Atlantic, passes through OpenAI’s servers, and falls under US law. For many uses, that’s fine. For others, it’s a real risk.
Regulated professions know this well. A lawyer pasting a restructuring file into a cloud chatbot may violate attorney-client privilege. A DPO sending personal data creates an undocumented transfer outside the EU. A consulting firm running client financials through it risks a data leak it can’t control.
The question isn’t whether AI is useful; it clearly is. The question is how to use it without sacrificing confidentiality.
The sovereign stack: what we deploy for our clients
Here’s the pipeline we actually install. Everything runs on a standard Mac or PC, with no internet connection needed after the initial setup.
Ollama: the local engine
Ollama is a runtime that runs LLMs directly on your machine. No cloud. No outbound requests. The model is downloaded once, then everything happens locally.
Installation takes 5 minutes. A single `ollama pull mistral` and you have a conversational model running on your workstation. On a Mac with 16 GB of RAM, it’s smooth. On 8 GB, it’s usable but slower.
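Once the model is pulled, Ollama also exposes a local HTTP API, which makes it scriptable. A minimal sketch, assuming Ollama is running on its default port (11434) and `mistral` has been pulled; the helper names are ours:

```python
# Query a local Ollama server over its HTTP API. Nothing leaves the machine:
# the request goes to localhost only.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "mistral") -> dict:
    # stream=False asks for one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local(prompt: str, model: str = "mistral") -> str:
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama instance:
# print(ask_local("Summarize this clause in one sentence: ..."))
```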
The models we recommend:
- Mistral 7B: the French sovereign model. Excellent in French, fast, great for writing and text analysis. Our default choice for European clients.
- Llama 3.1 8B: Meta’s model. Faster than Mistral, strong in English, slightly less precise in French. Useful when speed matters.
- Gemma 2 9B: Google’s model. Good balance between quality and speed, supports multilingual use.
Model choice depends on hardware. With 16 GB of RAM, 7-9B models run comfortably. Below that, you need quantized models (lighter, slightly less precise).
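A back-of-the-envelope calculation shows why 16 GB is the comfort threshold. Model weights need roughly parameters × bits per weight ÷ 8 bytes; actual usage is higher (context cache, runtime overhead), so treat these as lower bounds:

```python
# Rough RAM estimate for model weights only (real usage is higher).
def weights_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * bits_per_weight / 8

print(weights_ram_gb(7, 16))  # 16-bit 7B: 14.0 GB, tight even on 16 GB
print(weights_ram_gb(7, 4))   # 4-bit quantized 7B: 3.5 GB, fits in 8 GB
```

This is the whole trade-off behind quantization: a 4-bit model needs a quarter of the memory of the full-precision version, at the cost of some precision.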
AnythingLLM: the user interface
Ollama on its own is a command-line tool. Non-technical users need an interface. AnythingLLM is a web app that connects to Ollama and provides a ChatGPT-like interface, but 100% local.
What AnythingLLM brings:
- Conversational interface in the browser, like ChatGPT
- Workspaces: one space per project or client, with its own context
- System prompts: permanent instructions per workspace (“you are a financial analyst, always respond in English, cite your sources”)
- Built-in RAG: drag and drop documents, the AI answers based on them
Local RAG: querying your documents
RAG (Retrieval-Augmented Generation) is what turns a generic chatbot into an assistant that knows your files. The principle: you drop documents into a workspace, the system chunks them, indexes them, and when you ask a question, it retrieves relevant passages and feeds them to the model as context.
In practice: you drag a 50-page file into AnythingLLM, and you can ask “what are the conditions precedent in the acquisition agreement?” The model answers by citing passages from the document.
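The chunk-index-retrieve loop can be sketched in a few lines. This is a toy illustration of the principle, not how AnythingLLM is implemented: real systems score chunks with embeddings, while this sketch uses plain word overlap just to stay self-contained:

```python
# Toy RAG pipeline: split a document into overlapping chunks, score chunks
# against the question, and prepend the best ones to the prompt as context.
def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    step = size - overlap  # overlap so sentences cut at a boundary survive
    return [text[i:i + size] for i in range(0, len(text), step)]

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Naive relevance score: number of shared words with the question.
    q_words = set(question.lower().split())
    def score(c: str) -> int:
        return len(q_words & set(c.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n---\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The retrieved passages, not the whole file, are what the model actually sees, which is how a 7B model with a short context window can still answer questions about a 50-page document.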
Local RAG limitations:
- Quality depends on the model. Mistral 7B is good but not perfect. It can miss nuances or misinterpret complex tables.
- Large volumes are slow. Beyond a few hundred pages per workspace, indexing takes time.
- No cross-workspace queries. A workspace doesn’t see documents from other workspaces.
When to stay local, when to go cloud
Local isn’t always the right answer. Here’s our decision tree:
Always local:
- Confidential client documents
- Personal data (HR, healthcare, legal)
- Unpublished financial analysis
- Anything under professional secrecy
Cloud acceptable:
- Marketing content writing (no sensitive data)
- Web research, intelligence, public article summaries
- Brainstorming, ideation, non-sensitive drafts
- Open source code
Grey area, evaluate case by case:
- Unclassified internal data
- Internal communications
- Anonymized HR processes
The simple rule: if you wouldn’t put the document in an email to an external contractor, don’t put it in a cloud chatbot.
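The decision tree above can be encoded as a small routing helper, for instance in an internal tool that reminds users where a task belongs. The category names here are illustrative, not a formal taxonomy:

```python
# Route a data category to "local", "cloud", or "review" per the decision tree.
ALWAYS_LOCAL = {"client_documents", "personal_data",
                "unpublished_financials", "professional_secrecy"}
CLOUD_OK = {"marketing_content", "public_research",
            "brainstorming", "open_source_code"}

def route(category: str) -> str:
    if category in ALWAYS_LOCAL:
        return "local"
    if category in CLOUD_OK:
        return "cloud"
    return "review"  # grey area: evaluate case by case

print(route("personal_data"))   # local
print(route("brainstorming"))   # cloud
print(route("internal_comms"))  # review
```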
What it costs
The cost of a sovereign stack is essentially setup time:
- Ollama: free, open source
- AnythingLLM: free, open source
- Models: free (Mistral, Llama, Gemma)
- Hardware: your current machine is enough if it has 16 GB of RAM
- Installation and configuration: that’s where the time goes
The real investment is learning. Knowing which model to use for which case, how to structure workspaces, how to write effective prompts. That’s what makes the difference between a tool that gathers dust and one that saves 2 hours a day.
Honest limitations
Let’s be clear about what a local LLM doesn’t do as well as Claude or GPT-4:
- Complex reasoning: on multi-step tasks or advanced logical reasoning, cloud models are still ahead. A Mistral 7B doesn’t compete with Claude Opus for strategic analysis.
- Code generation: for serious development, Claude Code in the cloud is far superior to local 7B models.
- Multimodality: image analysis, audio transcription, visual generation. That’s still the cloud’s domain.
- Context window: local models have shorter windows. For very long documents, RAG compensates, but it’s not as seamless.
The sovereign stack isn’t a cloud replacement. It’s a complement for cases where confidentiality is non-negotiable.
In summary
| | Local (Ollama + AnythingLLM) | Cloud (Claude, ChatGPT) |
|---|---|---|
| Confidentiality | Total | Depends on provider |
| Response quality | Good (7B) | Excellent (Opus, GPT-4) |
| Speed | Depends on hardware | Fast |
| Cost | Free (excluding hardware) | Monthly subscription |
| Setup | 30 min – 1 hour | Immediate |
| Ideal use | Sensitive data, compliance | Writing, code, research |
Digital sovereignty isn’t an abstract concept. It’s a technical decision: which data stays with you, which data can leave. The tools exist. You just need to install them.
Your data must stay on-premise? We install your local AI pipeline →