Open source vs proprietary: control, dependency, and the real trade-off

The “open source vs. proprietary” debate on AI is often framed as an ideological question. That is the wrong lens. It is a strategic question: what do you control, and what don’t you?

Let’s start by cleaning up the vocabulary.

Terminological confusion

Open source in the strict sense (Open Source Initiative): the code, the weights, and the training data (or at minimum a detailed description of it) are public and freely reusable, including commercially. Virtually no major AI model reaches this level today.

Open weights: the model weights are published, but not necessarily the training code or data. You can download the model, run it, modify it, and, depending on the license, deploy it commercially. This is the case with Llama 3, Mistral, Mixtral, DeepSeek, and Qwen.

Proprietary: the model runs on the supplier’s servers, and you access it via API. You have no control over weights, inference or future versions. GPT-4o, Claude 3.5, Gemini 1.5 Pro are in this category.

What you really control with open weights

With an open weights model deployed locally:

You control the version (no updates are imposed on you)
Your data never leaves your infrastructure
The marginal cost of inference is your own infrastructure cost, not a per-token fee
You can fine-tune the model on your proprietary data
You can audit model behavior on your test cases
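
The cost point in the list above comes down to simple arithmetic. Here is a minimal sketch, assuming a fixed monthly infrastructure cost and a flat per-token API price; the function names and all figures are hypothetical placeholders, not real vendor prices:

```python
def break_even_tokens(monthly_infra_cost: float, api_price_per_million: float) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API.

    Both inputs are assumptions for illustration: a fixed monthly cost for
    your own infrastructure and a flat API price per million tokens.
    """
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    return monthly_infra_cost / api_price_per_million * 1_000_000


def cheaper_option(monthly_tokens: float, monthly_infra_cost: float,
                   api_price_per_million: float) -> str:
    """Return which option costs less at the given monthly volume."""
    api_cost = monthly_tokens / 1_000_000 * api_price_per_million
    return "self-hosted" if monthly_infra_cost < api_cost else "api"


# Hypothetical figures: 4,000 per month for a GPU server, 5 per million tokens.
threshold = break_even_tokens(4_000, 5.0)
print(f"Break-even at {threshold:,.0f} tokens per month")
```

This deliberately ignores engineering time, hardware utilisation, and depreciation, which would need to be added to the infrastructure side for a realistic comparison.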

What you don’t control: the quality of the base model (you depend on Meta, Mistral or whoever published the weights). If Meta decides to stop publishing weights, you keep the current version, but no longer have access to subsequent ones.

What you really control with a proprietary API

Essentially: the interface. You choose which prompt to send and how to handle the response. Everything else is under the provider’s control.

What this means in practice:

OpenAI retired older GPT-3.5 model snapshots during 2024, forcing developers to migrate
Model behavior changes with updates (a prompt that worked well may start producing worse results)
Prices may change (they have fallen overall, but there’s no guarantee that the trend will continue)
Providers may decide to restrict certain uses (content filters may evolve)
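
One way to catch the behavior drift described above is to replay a fixed set of prompts after every model update and check each response. The sketch below is illustrative only: `run_regression`, `fake_model`, and the test cases are hypothetical names, and `fake_model` stands in for whatever client you actually use, so no real provider API is called.

```python
from typing import Callable

# Each test case pairs a prompt with a check on the response text.
TestCase = tuple[str, Callable[[str], bool]]


def run_regression(call_model: Callable[[str], str],
                   cases: list[TestCase]) -> list[str]:
    """Return the prompts whose responses fail their check."""
    failures = []
    for prompt, check in cases:
        response = call_model(prompt)
        if not check(response):
            failures.append(prompt)
    return failures


# Hypothetical stand-in for a real model client, so the sketch runs offline.
def fake_model(prompt: str) -> str:
    canned = {
        "Reply with the single word OK.": "OK",
        'Return the JSON object {"status": "done"}.':
            'Sure! Here it is: {"status": "done"}',
    }
    return canned.get(prompt, "")


cases: list[TestCase] = [
    ("Reply with the single word OK.", lambda r: r.strip() == "OK"),
    # Strict format check: the response must start with '{'.
    ('Return the JSON object {"status": "done"}.',
     lambda r: r.strip().startswith("{")),
]

failed = run_regression(fake_model, cases)
print(f"{len(failed)} of {len(cases)} cases failed")
```

The same harness applies to a locally deployed open weights model; the difference is that there you decide when the model behind `call_model` changes.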

The real trade-off

It’s not “open source = good, proprietary = bad”. It is:

For critical, long-term uses involving sensitive data: on-premise open weights offer greater control and predictability, at the cost of internal resources and slightly lower performance on general tasks.

For rapid prototyping on non-sensitive data with limited technical resources: a proprietary API is more accessible, quicker to deploy, and often performs better on general tasks.

The hybrid strategy (most common in practice): proprietary API for non-sensitive uses and development, open weights on-premise for sensitive or high-volume uses.
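
The hybrid strategy can be expressed as a small routing rule. This is a sketch under stated assumptions: the `Request` fields, the backend labels, and the volume threshold are all hypothetical, and a real deployment would classify sensitivity with far more care.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    contains_sensitive_data: bool
    expected_monthly_tokens: int


# Hypothetical threshold above which per-token fees are assumed to outweigh
# the cost of running your own infrastructure.
HIGH_VOLUME_TOKENS = 500_000_000


def choose_backend(request: Request) -> str:
    """Route sensitive or high-volume requests on-premise, the rest to the API."""
    if request.contains_sensitive_data:
        return "on_premise_open_weights"
    if request.expected_monthly_tokens >= HIGH_VOLUME_TOKENS:
        return "on_premise_open_weights"
    return "proprietary_api"


# Example: a prototype on public data goes to the API.
print(choose_backend(Request("Summarize this press release", False, 2_000_000)))
```

Checking sensitivity first means that sensitive data is never sent to the external API, whatever the volume.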

DeepSeek and global competition

In December 2024, DeepSeek released V3, a Chinese open weights model that achieves performance comparable to GPT-4o on several benchmarks, with a reported training cost of about $6 million, roughly 10 to 30 times less than estimates for an equivalent US frontier model.

This release illustrates that competition for foundation models is global. US players’ claims to a durable technological lead are fragile. And the open weights ecosystem can progress rapidly, improving the alternatives available.

The trade-off: questions of trust and governance raised by relying on a model from a Chinese company are legitimate in certain contexts (defense, public institutions, sensitive data). This is not a reason to rule it out in all contexts, but it is a factor to evaluate explicitly.
