Open source AI and digital sovereignty: the case for a smarter European strategy

Everyone's talking about Europe needing its own AI models. Macron says it. Brussels says it. Every tech summit in the past two years has had some variation of the same conversation: Europe is falling behind, Europe needs to act, Europe must build its own answer to OpenAI's GPT. The concern is genuine, and the diagnosis isn't entirely wrong. But the prescription, I'd argue, is.

Let me explain why.

The fears driving this conversation are legitimate. We've already seen what happens when a foreign government decides a technology poses a national security risk: a single executive order or piece of legislation can push a platform out of a market almost overnight. If your hospital system, your legal infrastructure, or your financial institutions depend on a service running on servers in California or Beijing, you are one political decision away from a serious problem. That's not paranoia. That's operational risk management.

Data sovereignty is the other side of the same coin. Every prompt sent to a third-party model, every document processed, every query answered: that data travels somewhere, under someone else's jurisdiction, subject to someone else's laws and, more uncomfortably, someone else's business interests. For most consumer use cases, that's an acceptable trade-off. For critical industries, it isn't.

So yes, the problem is real. Where I part ways with the mainstream European response is in how to solve it.

The temptation of the flagship model

There's something politically seductive about the idea of a European large language model. It has a name. It has benchmark scores. You can hold a press conference. You can point to it at a summit and say: we built this. It feels like sovereignty because it looks like what the Americans and Chinese are doing.

But let's be honest about what that race actually involves. The leading closed models, from Anthropic, OpenAI, and Google, represent cumulative investments running into the tens of billions of dollars. They are trained on infrastructure that took years to build, by teams that took a decade to assemble, on data accumulated over the entire history of the consumer internet. The distance between Europe's best-funded efforts and those models is not something you close by doubling a budget. It's a structural disadvantage that compounds with every training run.

Spending European taxpayer money to build a model that trails GPT-5 by six months is not sovereignty. It's an expensive consolation prize dressed up in a flag.

Where Europe's real advantage sits

Here's what the flagship model conversation consistently misses. Europe doesn't need to win the race to the biggest model. It needs to ask a different question entirely: what happens when you take the world's most sophisticated industrial data and do something serious with it?

Consider what Europe actually has. Airbus holds decades of proprietary data on aerodynamics, materials science, and aircraft maintenance that exists nowhere else on the planet. European hospitals, operating within universal healthcare systems that have run continuously for generations, have accumulated some of the most complete longitudinal medical records in the world. The same is true of the German automotive industry, European pharmaceutical companies, and the continent's legal institutions, logistics networks, and financial infrastructure. These are not areas where Europe is catching up to someone else. These are areas where Europe has been setting the global standard for decades.

The data generated by those industries is not just valuable. In many cases it is irreplaceable. No American tech company has it. No Chinese competitor can replicate it quickly. It exists because of specific European expertise, built over specific European decades, in industries where Europe genuinely leads.

Now take an open source model and fine-tune it on that kind of domain-specific data. Run it on your own infrastructure, inside your own borders, under your own regulatory framework. What you end up with is not a worse version of ChatGPT. It is something that does not exist anywhere else in the world, that nobody can switch off, and that reflects genuinely proprietary European knowledge rather than a derivative of someone else's training data.

That is what digital sovereignty actually looks like.

How good are open source models, really?

Two years ago this argument would have been harder to make. The gap between the best closed models and the best open source alternatives was significant enough that "just use open source" felt like a compromise.

That gap has closed with remarkable speed. Meta's Llama models in their larger variants now perform comparably to GPT-4 on a wide range of standard benchmarks. Mistral, a French company as it happens, produces models that perform well above what their size would suggest. The academic and independent research community is releasing capable models at a pace that would have been unthinkable three years ago.

Does a gap still exist at the absolute frontier? Yes. For the most complex reasoning tasks, the most demanding coding challenges, and the most nuanced multi-step analysis, the leading closed models still have an edge. But that edge is measured in months, not years. More importantly, for the vast majority of real business applications (document processing, customer interaction, internal knowledge retrieval, process automation, domain-specific analysis), that frontier gap is already operationally irrelevant.

The Linux parallel is instructive here. When Linux emerged, the argument against it was that it was a compromise, that serious enterprise computing required proper commercial operating systems. Today Linux runs the overwhelming majority of the world's servers, its cloud infrastructure, and its most critical systems. Not because it eventually beat Windows at Windows's own game, but because it became the foundation on which everything serious gets built. Open source AI is on a similar trajectory.

The fine-tuning argument

There is a subtlety here worth dwelling on. The question isn't just closed versus open source at the level of the base model. It's about where value actually gets created in the AI stack.

Training a large foundation model from scratch is extraordinarily expensive. Each iteration costs millions, sometimes tens of millions, of dollars. The companies doing this at scale are essentially making long-term bets that owning the base model will translate into a durable competitive advantage. That may or may not prove true. The commodity pressure on base models is real and growing.

But fine-tuning an existing open source model on domain-specific data is a fundamentally different proposition. The costs are orders of magnitude lower. The specialisation is dramatically higher. And crucially, the resulting model reflects knowledge that you own, trained on data that you control, producing outputs calibrated to your specific context in a way no general-purpose model can match.

A European medical AI fine-tuned on anonymised data from a network of public hospitals, with clinical validation built into its development process and operating under GDPR and European legal frameworks, is a more defensible, more trustworthy, and ultimately more valuable asset than access to a slightly better general-purpose model hosted in Virginia.

The same logic applies across sectors. Legal AI trained on decades of European case law. Manufacturing AI trained on proprietary process data from firms that have been refining those processes since before Silicon Valley existed. Agricultural AI trained on European soil science, climate patterns, and crop data. In each case, the value comes not from the underlying model architecture but from the domain knowledge baked into the fine-tuning.

What Europe should actually invest in

If the goal is genuine digital sovereignty combined with genuine economic benefit from AI, the investment priorities look quite different from building a flagship European LLM.

First, shared infrastructure for running open source models inside European borders. Not one model, but the computational and organisational capacity to host, run, and update models domestically across sectors. This keeps data within European jurisdiction and eliminates the single-point-of-failure risk that comes with dependence on foreign platforms.

Second, domain-specific fine-tuning programmes for key European industries: public funding mechanisms that incentivise data sharing between European firms within sectors, under clear governance frameworks, to produce specialised models that encode genuinely European expertise.

Third, research investment in model efficiency rather than model scale. This is an area where European academic institutions have strong foundations. The question of how to get more from smaller, more efficient models is arguably more important for Europe's position than the question of who can train the largest one.

Fourth, regulatory frameworks that make European AI trustworthy rather than just European. The AI Act is already shaping global norms in ways that create real soft power. Building on that through clear standards for transparency, auditability, and data governance turns a perceived disadvantage into a structural asset.

Digital sovereignty doesn't mean building your own version of something someone else already does better. It means ensuring that the critical processes of your economy don't depend on another party's goodwill, business model, or geopolitical calculations.

Open source models, running on European infrastructure, fine-tuned on European industrial data, operating under European law. That is a sovereignty strategy. Spending the next decade trying to close a parameter gap with companies that have structural advantages Europe cannot replicate in that timeframe is not.

The value was never in owning the model. It was always in knowing what to do with it. And on that question, Europe has more to work with than it seems to realise.