Llama Paper Authors "Flee": Only 3 of the 14-Person Team Remain at Meta, and French Unicorn Mistral Is the Biggest Winner


Produced by Big Data Digest

By 2025, most of the creators of Llama have left Meta.

Many of them have gone to Mistral, the Paris-based AI startup that is now counter-attacking, at "open-source speed," in the very field Meta opened up.

Llama was once Meta's most ambitious AI project: in 2023, when ChatGPT and PaLM dominated the discourse, Meta unexpectedly pushed the open-source camp onto the main stage with a groundbreaking paper and a set of open-weight large language models. At that time, Meta's AI research team FAIR (Fundamental AI Research) was also at its peak.


The industry-shaking paper: https://arxiv.org/pdf/2302.13971

But two years later, this path seems to have reached a crossroads.

Meta has not yet officially responded to the "talent drain," but it is already being discussed on the social media platform X. Only 3 of the 14 researchers credited on the Llama paper remain at Meta. One comment read: Meta opened a path to an open-source future, but watched as the builders of that path turned around and started anew elsewhere.


01 A Continuous Talent Migration

According to LinkedIn records, the exodus of the Llama team was not sudden; it quietly began in early 2023 and was nearly complete by early 2025.

Among the first to leave were Meta senior researchers Guillaume Lample and Timothée Lacroix, two of the principal architects of Llama. Lample left in early 2023 and Lacroix in June of that year; they then co-founded Mistral AI in Paris.

Image

Caption: Timothée Lacroix, Arthur Mensch, and Guillaume Lample, the co-founders of Mistral AI. Lacroix and Lample were co-authors of Meta's original Llama paper. Image via Business Insider; photo by Khanh Renaud/ABACAPRESS.COM.

Over the next year and a half, several other Llama authors, including Marie-Anne Lachaux, Thibaut Lavril, and Baptiste Rozière, joined the startup one after another. Today, Mistral's core research team includes what amounts to an entire former Meta team.

The others have not strayed from the forefront of AI either. Some went to Anthropic, DeepMind, and Microsoft AI, while others joined second-tier research institutions like Kyutai and Cohere.

They averaged more than five years at Meta, so this is far from routine job-hopping. It looks more like a realignment of conviction: people who were deeply involved in designing Meta's AI systems are signaling their choice of direction by leaving.

02 Meta's Open Source Ideal, Moving Faster Than Corporate Strategy

When Meta launched Llama, it made a significant strategic leap: instead of keeping its models closed, it released the weights, allowing developers to run cutting-edge models on a single GPU. This was a direct challenge to the closed commercial approaches of OpenAI and Google at the time.

Technically, Llama's design was indeed lighter and more efficient. It kept resource consumption in check, did not rely on large amounts of private data, and ran faster. This "pragmatic" engineering aesthetic matched the idealistic vision of the open-source community.

But the problem is: when ideals move too fast, corporate strategy may not keep up.

Llama was widely praised by developers, and Llama 2 became one of the most popular models on Hugging Face. From Llama 3 to Llama 4, however, industry sentiment began to shift: "not new enough" and "slow progress" became increasingly common feedback. Meta gradually fell behind, especially once newer players such as DeepSeek and Qwen began iterating at explosive speed.

A more serious warning sign is that Meta has been slow to release model versions with "reasoning capabilities," similar to GPT-4 Turbo or Gemini Pro. This means it has fallen behind in next-generation language model directions such as multi-step reasoning, chained calls, and external tool integration.


The Wall Street Journal even reported that Meta is delaying the release of its largest internal model, Behemoth, due to team disagreements over its performance and leadership direction.

With its product pace slowing on one side and core researchers who knew its technical roadmap leaving in large numbers on the other, Meta finds itself squeezed from both directions.

03 The Retreat of FAIR and the Establishment of a "New FAIR"

Another key change at Meta over the past year: Joelle Pineau, who led FAIR for eight years, announced her resignation, and Robert Fergus took over her position. Fergus was one of FAIR's early co-founders and had spent five years at DeepMind.

FAIR was once the core of Meta's research confidence: founded in 2014, it published influential results in cutting-edge fields such as graph neural networks, machine translation, and multimodal learning. Llama was FAIR's crowning achievement.

But today, the core team of this group has dispersed, and its direction is also changing.

In the past, FAIR's ethos was "open + shared"; now, Meta's focus on "applications" and "efficiency" seems to outweigh its enthusiasm for scientific exploration. Given that tension, it is not hard to understand why many researchers chose to leave.

Viewed purely as a personnel matter, this wave of departures from Meta could be called "normal team turnover," but the reality clearly goes beyond that.

Mistral is not just a company that absorbed former Meta employees; it is already a direct competitor to Meta. Across multiple model evaluations, Mistral's Mixtral and Tiny Mistral have balanced parameter scale against effectiveness and met the market's demand for "deployable models." Most of these achievements were led by former Meta teams.

This puts Meta in an awkward position: it defined the first chapter of open-source large models, but the second chapter is being written by others.

04 Mistral: A Team That Left Meta

Image

Caption: Screenshot of Mistral AI's official website

Mistral AI's explosive growth began in 2023: it raised over $100 million in seed funding just one month after its founding, then rapidly launched multiple large model families over the following year.

Pixtral handles multimodal tasks, Medium 3 targets STEM and programming tasks, and "Les Ministraux" is optimized for edge deployment.

The recently launched OCR API and the Arabic model Saba indicate that its product strategy is no longer limited to the English context or research models, but is actively expanding into broader scenarios.

However, challenges are also evident behind this expansion.

Image

Caption: TechCrunch report on Mistral's $6 billion valuation

First is the mismatch between its influence and its ability to monetize. Although the chat assistant Le Chat briefly topped the App Store download charts in France, Mistral's revenue, according to various sources, remains in the tens of millions of dollars. For a company valued at $6 billion, this is far from enough to support an IPO or to quiet acquisition speculation.

Second is the tension built into its own stance on "openness." Mistral was initially known for open-sourcing its models under the Apache 2.0 license, but as it entered the commercialization phase, it stopped publicly releasing the weights of its main models, leaving only some "research versions" free to use. While this "two-track" strategy balances revenue and reputation, it has also drawn criticism from parts of the open-source community: is Mistral becoming increasingly "closed-source"?

A third hidden concern is its international expansion capability. Although Mistral has formed strategic partnerships with the French military, AFP, Stellantis, IBM, and Helsing, and is establishing an AI campus in Paris with NVIDIA and Bpifrance, its user base and ecosystem development remain primarily focused on the European market. In contrast, OpenAI and Google have built complete API platforms, developer toolchains, and consumer product matrices globally, possessing stronger stickiness and moats.

In summary, Mistral's team size, funding, and model capabilities have reached first-tier levels, but it needs more time to prove itself in global operations, infrastructure development, and long-term ecosystem building.
