The $113M Mistral AI Seed Round — Why So Much Hate? And Why Europe Should be an AI Powerhouse

I’ve seen a lot of frustration towards Mistral AI and their $113M seed round on social media lately. Here are some of my reflections about it. But before we start I have to, of course, point out that we wouldn’t be here without Transformer models (first described in the Google Brain paper in 2017), which changed the game for language models.

Oliver Molander
14 min read · Jun 22, 2023


Let’s dive right in. I’ve decided to write this article mainly in bullet-point format to make it snappier. I also recommend looking through the 7-page Mistral AI investment memo published by Sifted below:

Yeah — sure — the investment memo above is very high-level. Some people on social media have argued that it’s a “pedantic simplistic thesis that would barely get you a B+ in undergrad”. Pretty heavy shots were fired right there, to say the least.

But as noted by e.g. Matt Turck, the memo is high level, sure. Then again, do we think, say, OpenAI had a long, detailed deck that Sam Altman presented slide after slide to Satya Nadella?

OpenAI raised $1B in 2015 in its first funding round (when it was still claiming to be a non-profit org) without a business or monetization plan. I’m all about cost-efficiency, frugality and finding PMF before raising substantial amounts of capital, but not all startups are created (nor should be created) from the same mold, albeit there are certain playbooks in venture. This is something we especially need to absorb in Europe (from the LP, VC and founder perspectives alike) in order to realize the continent’s potential to become a tech and AI/ML powerhouse.

If one looks at e.g. Sweden, it seems we have the courage to invest large sums upfront in ‘DeepTech’ ventures that are very tangible and have an industrial angle; take Northvolt and H2 Green Steel as concrete examples. I want to underline that I have huge respect for these companies. BUT we need to become better at doing the same for transformative software companies with unique teams, access to talent, and secular tailwinds.

In pre-seed and seed, the team is everything, whether you raise $1M or $100M. I saw one person comment the following on Twitter about Mistral:

“Instead of funding a slew of startups with traction and revenue, the $113M went to guys who met 1 month ago, have no product, no users, and no experience running startups”

First of all, I don’t think that they met 1 month ago. Further, the team behind Mistral AI is truly A+++ for building the European version of OpenAI. I’ve heard amazing things about what Guillaume Lample has built at Meta and how pivotal he has been in their LLM efforts — and the same goes for Timothee Lacroix. People I know from Google DeepMind have also spoken highly about Arthur Mensch.

We need more of this in Europe: individuals who have built transformative technologies and large-scale systems jumping from their high-paying jobs in big tech to start companies (naturally, not all of them should raise $100M right away; it depends on the dynamics of the business they’re starting).

Founding teams don’t always have to have prior startup or business building experience if they have rare technical insights and operational (or research) experience. This usually holds especially true in more technical categories.

Here are two concrete examples from the data infrastructure world:

  1. Snowflake was founded by 3 engineers with no startup or business background. They had worked on database architectures at Oracle as the organization’s top engineers, and the insights they gathered there made them realize the need for a cloud-first data warehouse architecture, which led them to start Snowflake. Snowflake’s first backer in 2012, Mike Speiser of Sutter Hill Ventures, also had to take the CEO role for the first two years, until Snowflake hired Bob Muglia from Microsoft as CEO in 2014. Snowflake launched its first product, the cloud data warehouse, to the public in 2015 after raising more than $70M. Later in 2015 they struggled to raise their C-round, and the $79M round was saved by existing investors Sutter Hill Ventures, Redpoint Ventures and Altimeter. Fast-forward to 2020: the Snowflake IPO became the largest software IPO in history, and the company is today valued at $56B. I recommend reading the Medium article from Redpoint Ventures with excerpts from their 2014 Snowflake investment memo.
  2. Databricks was founded in 2013 by 7 engineers and academics, six of whom have Ph.D.s in computer science and worked together at Berkeley. The current Databricks CEO & co-founder Ali Ghodsi (who actually has strong ties to Sweden and took his Ph.D. at the KTH Royal Institute of Technology in Stockholm) joined forces in 2009 at Berkeley with Matei Zaharia, then a 24-year-old Ph.D. student, on a project to build a data-processing software engine that they dubbed Spark. This would lead to the founding of Databricks in 2013. Ali Ghodsi has famously said that in 2013 they looked to raise a couple of thousand dollars in their seed round, with his six engineering/academic co-founders having no real experience from the startup or business world. Famously, Ben Horowitz from a16z came in and said that for it to be worthwhile for him to be involved, he needed to invest $13M in the seed round. This was after Horowitz had spoken with a Berkeley professor friend of his, who said that Zaharia was ‘the best distributed systems person I’ve ever seen’. During the first 3 years of its existence, Databricks struggled to commercialize its offering and had no real revenue, despite the strong adoption of Spark in the open source community. In 2016 Ghodsi took the CEO role, despite being an engineer and academic by background. Fast-forward to 2023: Ali Ghodsi is praised as one of the top enterprise software CEOs in the world, and Databricks just crossed $1B in annual revenue and was valued at $38B in its latest funding round.

Every business and market has different underlying dynamics — this is especially important when looking at Mistral. I saw the following comment on Twitter recently regarding Mistral’s round and the dilution:

“If it’s such a potentially massive company, why dilute yourselves like this so early? Very weird round”

First of all, Mistral operates in a really resource-intensive category: training and building Foundation Models/LLMs. This is reflected in the seed round structure. The $113M round was raised at a $260M post-money valuation, meaning that the founders ‘gave up’ 43% of the company in the seed. This led to reactions like this on social media:

“If they’re really an A+++ team, why do they need $113M to build out their MVP? Shouldn’t a team that brilliant be able to compete with OpenAI, Hugging Face, etc. (not saying we need another tbh) with just their wits and some pocket change like the rest of us?”

Why did they do this? Well, GPUs/H100s aren’t cheap… To give some context, ByteDance (the parent company behind TikTok) just gobbled up $1B of Nvidia GPUs for AI this year. A large chunk of the $113M Mistral seed funding will go towards hardware to train these models, and accessing and gathering training data is not cheap either. So a very noticeable part of the investment goes to training models and GPUs, versus the usual people cost that is the main line item in other startups.

The GPU shortage and supply chain issues are a very real problem in the tech and startup world. A massive shortage of GPUs has been hobbling OpenAI’s planned roll-out of many features and services, Sam Altman reportedly told a group of entrepreneurs and developers in a closed-door meeting in London a few weeks ago. And accessing high-quality training data is becoming increasingly expensive, as recently highlighted by e.g. the WSJ.

So when you think about it: the Mistral AI founders left their high-paying jobs at Meta and DeepMind, where they had pretty much unlimited resources, to start a company that, relatively speaking, has no resources in comparison. And to make it really clear: no secondaries were sold in the seed round, so the founders have not made a buck. They may have made paper money, but anyone who has worked in the startup world knows that that doesn’t mean anything. It’s like Matthew McConaughey said in The Wolf of Wall Street: “Fugayzi, fugazi. It’s a whazy. It’s a woozie. It’s fairy dust”.
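To put the oft-quoted dilution figure on firm ground: the ownership sold in a priced round is simply the investment divided by the post-money valuation. A minimal sketch using the reported numbers (the formula is the standard one; nothing here reflects Mistral’s actual term sheet):

```python
def round_dilution(investment: float, post_money: float) -> float:
    """Fraction of the company sold in a priced round:
    investment divided by post-money valuation."""
    return investment / post_money

# Mistral's reported seed: $113M raised at a $260M post-money valuation
print(f"{round_dilution(113e6, 260e6):.1%}")  # prints "43.5%"
```

Equivalently, the implied pre-money valuation is $260M − $113M = $147M.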

Now they have to prove that they can deliver in a highly capital-intensive category with all eyes on them. I think that quite many of us would have stayed at Meta and DeepMind earning seven-figure salaries with unlimited resources. And that’s why we need more founders like this in Europe.

When it comes to dilution in ‘mega’ seed rounds — I recommend reading this recent blog post by Alex Izydorczyk. As highlighted by Alex, there appears to be a misconception that this is relatively expensive capital: VCs and founders alike have commented on the dilution the Mistral founders took. A common refrain has been that the founders have taken an undue risk or that they would suffer less dilution if they raised a smaller round. This is incorrect. While it is true that startups traditionally raise incrementally larger rounds every 12–24 months from series seed to IPO and take only 15–20% dilution each round, there is nothing necessarily optimal about that strategy.
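Alex’s point, that there is nothing necessarily optimal about 15–20% per round, can be made concrete with a toy comparison. The per-round dilution figures below are my own illustrative assumptions, not anyone’s actual cap table:

```python
def ownership_after(dilutions: list[float]) -> float:
    """Founder ownership remaining after a sequence of rounds,
    each given as the fraction of the company sold in that round."""
    remaining = 1.0
    for d in dilutions:
        remaining *= 1.0 - d
    return remaining

# Traditional path: four rounds (seed, A, B, C) at ~18% dilution each
traditional = ownership_after([0.18] * 4)          # ~45.2% left

# Front-loaded path: one 43% seed, then two later rounds at ~18%
front_loaded = ownership_after([0.43, 0.18, 0.18])  # ~38.3% left

print(f"traditional: {traditional:.1%}, front-loaded: {front_loaded:.1%}")
```

Under these made-up assumptions the front-loaded path ends with somewhat less founder ownership, but with far more capital available far earlier; whether that capital was ‘expensive’ depends entirely on what it buys, which is exactly the point.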

There are many roads to Rome, in the European startup ecosystem too (Americans seem to have realized this a while back).

I’ve also seen discussions claiming that there’s no need for a ‘European OpenAI’, questioning the very category they’re entering. Here is an example from Twitter:

“When you are the first, you get to be visionary. When your entire case is ‘build it better for EU’, you are setting yourself up for a different comparison. Our seed deck was very high level too. But we were claiming to create a new category. I don’t think EU-LLM is a category.”

I agree that ‘EU-LLM’ is not a category. At the same time, you don’t have to be a category creator to justify a large seed round — we must remember this in Europe. Many have said that we had the ‘iPhone moment’ in AI when ChatGPT was launched in November ’22. The truth is that all of this truly started with Google’s ‘Attention is all you need’ paper in 2017 and then BERT later in 2018. Another ‘holy shit’ moment was in 2020 when OpenAI made GPT-3 available.

A vast majority of the value gets created in the 2–3 years after a major platform disruption. For example: Uber, Airbnb, and Instagram were all created within three years of the iPhone launching.

As noted recently by Brad Gerstner, founder of Altimeter Capital and an early investor in e.g. Snowflake, in an interview with Jason Calacanis:

“In 1998 we knew search was going to be big. That doesn’t mean that you should have invested in Lycos, or AltaVista, or Go, or Infoseek. But you wanted to be prepared in 2003 to invest in Google.”

Image source: Altimeter

I also agree with Bessemer’s notion that a David and Goliath story is shaping the core of the LLM/Foundation Model infra market. To many, it appears an oligopoly of a few key players, including OpenAI, Google, Microsoft, and Anthropic, is solidifying to capture the bulk of the developer activity generated by this AI wave. But the long tail of AI models and enablers is blooming, driven in part by the open source community. Some open source models, like Stable Diffusion and Whisper, at times outperform centralized players, particularly in areas like speech-to-text and image generation. Training AI models is also getting cheaper, leading to the emergence of a fragmented ecosystem of low-cost, smaller, or verticalized models.

If you’re a pre-seed or early seed startup, your team is your make-it-or-break-it, so don’t bury it as the last slide in a pitch deck. Early-stage investors will back you and your team, so tell them why you are exceptional and uniquely positioned to tackle a huge problem. That said, if you have strong open source or revenue traction, don’t leave it until the end, after spending slide after slide talking about “how big the market will be”. Instead, bring it right at the start and hit investors in the face with it. Then explain how you’re doing it. Unless the market itself is genuinely non-obvious (which is rare), there is no need to remind investors that “Quantum will be big” or “AI/ML will be in most software”. No one really cares, because it’s obvious. Sometimes, like with Mistral AI, less is more. Especially if the background and story make sense.

Think about e.g. Snowflake and Databricks. Your story is more important than any template. The truth is that we need more individuals like the Mistral founders in Europe, who jump away from their high-paying low-risk jobs at big tech companies — where they have an abundance of resources — to start new category-defining or leading companies from Europe.

Final thoughts: Europe should be an AI powerhouse

The general narrative is that the US and China are leading the pack in AI/ML while Europe has taken its ‘usual’ laggard position. Someone cynical might say that Europe is where ChatGPT gets regulated, not invented.

However, this shouldn’t be the case when taking a closer look at the impact the European educational and research ecosystem has had on global AI development.

When talking about the influence that Europeans and the European educational system have globally, it’s hard not to mention France. The French AI “mafia” is quite unique:

  • One of the founding fathers of deep learning — Yann LeCun — is French (today VP & Chief AI Scientist at Meta)
  • Almost all the members of the Meta AI LLaMA research team are located at FAIR-Paris; 11 of the 14 creators of the large language model LLaMA come from Polytechnique & Normale Sup
  • The founders of Hugging Face, one of the largest platforms globally for deploying AI models, are French (Clem Delangue, Julien Chaumond & Thomas Wolf)
  • Dataiku, one of the leading enterprise AI/ML platforms (backed by e.g. Google Capital and valued at $3.7B in their latest round in December ’22) was founded in Paris in 2013 by Florian Douetteau, Clément Stenac, Thomas Cabrol and Marc Batty
  • And you guessed it, Mistral AI is French

It’s really impressive to look at how much influence France has on the global and European AI ecosystems.


But there are plenty of other examples in Europe beyond France. E.g. DeepMind has strong European roots being founded in the UK. Google recently merged Google Brain and DeepMind into one unit to drive their AI leadership.

Now called Google DeepMind (creative, I know), the new group is led by the British DeepMind CEO and co-founder Demis Hassabis — as Google continues its AI push as one of the absolute leaders in the market. Hassabis’s elevation also means that the heart of Google’s AI operation shifts to London, a move with profound implications.

And there are more examples from other countries in Europe. The illustration below is the evolutionary tree of modern Large Language Models.


As already mentioned, what we’re witnessing now in terms of LLMs started with Google’s “Attention Is All You Need” paper published in 2017, where the Transformer architecture (powering e.g. all of OpenAI’s GPT models) was introduced.

However, without Tomáš Mikolov (a Czech computer scientist working in the field of machine learning), the evolutionary tree above would most likely look like a weed. Mikolov is primarily known to the scientific community and the general public for his dramatic improvements to language recognition and processing applications, such as Google’s machine translator. He achieved this by creating new neural network models that considerably surpassed previous approaches to language modeling. The improvements that followed in natural language processing were the greatest seen in recent decades. He was named a Neuron Award laureate for major scientific discovery in AI and computer science at the start of 2019.

Also, the underground roots of the evolutionary tree above should have included e.g. the Ronan Collobert & Jason Weston ‘NLP from scratch’ JMLR paper from 2011. That was the first paper to show good word embeddings, and it had a profound impact on NLP architectures. Ronan and Jason have strong European roots.

Ronan Collobert received his master’s degree in pure mathematics from the University of Rennes in France in 2000 and his Ph.D. in 2004 from the University of Paris. Today he’s a Research Scientist at Apple, after spending over 7 years on the Meta AI team.

Jason Weston earned his Ph.D. in machine learning at Royal Holloway, University of London in the UK. Today he is a Research Scientist at Meta.

And the list goes on… Peter Welinder — who’s the VP of Product at OpenAI and part of the management team — is Swedish.

These are just a couple of examples of how Europeans are driving AI development in some of the most influential players in the global AI/ML ecosystem today.

Luciana Lixandru, a partner at Sequoia, recently noted how it’s an exciting time for European tech. Europe is home to some of the best universities in the world, the pool of quality engineers has never been deeper, and the pace of innovation continues to escalate with fields like AI/ML taking the world by storm.

In Europe, we have world-leading universities and nearly 3 million engineers (but talent is more distributed than ever). I strongly recommend having a look at Atlas, Sequoia’s interactive guide to Europe’s technical talent to understand it better.

What’s my point here?

My point is that we have incredibly impactful AI knowledge and experience in Europe, with strong roots in European universities. BUT more often than not this knowledge gets scooped up fast by American tech giants.

This is one of the reasons why Mistral AI is such a good thing, sending powerful signals across Europe. Further, after a time of exuberance and cheap capital, with frankly too many funds chasing too many deals in order to deploy capital, it’s good for VCs to truly embrace their role as representatives of an asset class that is power-law driven (compared to e.g. private equity) and has been behind five of the six most valuable public companies in the world today (Apple, Google, Amazon, Microsoft and Nvidia).

As noted by Founders Fund:

“We believe that the shift away from backing transformational technologies and toward more cynical, incrementalist investments broke Venture Capital”

And when it comes to Mistral, I can personally only agree with this quote from Twitter:

“Whatever people say about Mistral, the founders have fire experience to actually pull a rabbit out! I’m cheering them for sure”

Many, if not most, talents in e.g. the field of LLMs originate from Europe. It’s time for the new generation of European entrepreneurs to show what an AI/ML powerhouse we can be.
