DeepSeek R1 Revolutionizes AI with Open Source Surprise
A Chinese firm has released DeepSeek R1, an open-source AI model competitive with top proprietary models, sending shockwaves through the AI industry with its reported $5M training cost.
DeepSeek R1 - The Chinese AI Side Project That Shocked the Entire Industry
Added on 01/29/2025

Speaker 1: DeepSeek R1 was released just a few days ago, and it has sent shockwaves through the AI industry. R1 is an AI model that has the ability to think, just like OpenAI's cutting-edge, state-of-the-art o1 and o3 models. But here's the thing: it's completely open source and open weights. DeepSeek, a small Chinese company, gave all of it away for free, and they even detailed how to reproduce it. But that's not even the craziest part. It was trained for just $5 million, as compared to the tens or even hundreds of millions of dollars that most people in the AI industry thought were required to train a model of this caliber. And it has sent everyone in the AI industry scrambling to understand the ramifications. DeepSeek has been called everything from the downfall of major US tech companies like OpenAI, to the greatest gift to humanity, to a Chinese psyop meant to shake the US to its core. This story is wild, so buckle up. Just about a week ago, President Trump, Sam Altman (the founder and CEO of OpenAI), Larry Ellison (the co-founder of Oracle), and many others got together to announce Project Stargate, a $500 billion investment in AI infrastructure built in the US. That is on top of the billions, and potentially trillions, that have already been spent on GPUs, mostly coming from NVIDIA. Right after that, Mark Zuckerberg doubled down on how much his company, Meta, is going to spend on AI infrastructure, stating that they will continue to spend many billions of dollars building out energy and AI infrastructure. So the theme among the biggest tech companies in the world is: spend as much as we can to win at AI. And then something happened.
On January 20th, 2025, a small Chinese research firm called DeepSeek released DeepSeek R1, a completely open source, open weights AI model that has the ability to think (also known as test-time compute) and that is directly competitive with, if not slightly better than, the o1 model by OpenAI, which cost hundreds of millions of dollars to train. And just like that, the AI world was flipped upside down. All of a sudden, we had a completely open source version of a state-of-the-art model that we didn't think we were going to have so soon, let alone one that is completely open and essentially free. The initial reaction was extremely strong. I've made multiple videos about it; I'll drop them in the description below. People looked at this and were stunned. The biggest names in the AI industry realized we now had a completely open source, state-of-the-art model. And as everybody was taking this in, excited to play around with it and reproduce it, suddenly the tone shifted. In the technical paper released alongside DeepSeek R1, it was noted that the model was trained for just $5 million. That is a fraction of what every other state-of-the-art model costs to train. Now think about what this means. Meta, Microsoft, and the rest of the Magnificent Seven, basically the seven biggest tech companies in the world, along with OpenAI, have been investing hundreds of billions of dollars building out AI infrastructure. And then this little Chinese company comes along and open sources a model that's comparable to the best models out there. Not only did they make it completely free, but they said it only cost $5 million. All of a sudden, a lot of analysts are looking at these big companies spending billions of dollars per year and thinking, do we really need that? And a lot of people are pointing at these big companies saying, you guys are about to lose. You've invested so much money, and it wasn't even necessary.
Now, I will tell you, I do not agree with that whatsoever, but that is a theme going on right now in the AI industry. Then somebody on Twitter asked, how is DeepSeek going to make money? They're giving the model away for free, the API endpoint to run it is really, really cheap, and you don't even need the API, because you can run it on your own hardware. And then this tweet went viral: DeepSeek's holding company, the Chinese quant firm High-Flyer, employs mathematicians tasked with building trading algorithms simply to make money. That's it. In the tweet's words: "Many years already, super smart guys with top math background happen to own a lot of GPU for trading/mining purposes. And DeepSeek is their side project for squeezing those GPUs." Essentially, this is not even the main function of the company. This was a side project. So a handful of smart people got together, figured out how to build a state-of-the-art model incredibly cheaply, upended the entire AI industry, and it was their side project. That's insane to think about. And this went viral, and the memes were strong. Let me show you a few of the reactions from people in the industry. Here's one from Simp for Satoshi: "Sam spent more on this than DeepSeek did to train the model that killed OpenAI," referencing a multimillion-dollar automobile with Sam Altman at the wheel. Now, again, I don't really believe this; I will explain what I think is going on in a little bit. Here we have Neil Khosla, son of Vinod Khosla, saying: "DeepSeek is a CCP state psyop plus economic warfare to make American AI unprofitable. They are faking that the cost was low to justify setting the price low, hoping everyone switches to it to damage US AI competitiveness. Don't take the bait." There was a community note saying there is zero evidence of this. And that wasn't even the craziest take.
In Davos, Alexandr Wang, the CEO of Scale AI, basically called out DeepSeek, claiming they actually have many more GPUs than they are letting on, because US export controls prevent cutting-edge chips from being shipped to China at scale, and if DeepSeek admitted in the research paper that they had a bunch of those GPUs, obviously the US would be pretty pissed. In this clip, Alexandr Wang says DeepSeek probably has 50,000 H100s, which are NVIDIA's top-of-the-line GPUs, and that they can't talk about it because it goes against the export controls the US has in place. Maybe that's true. Although, remember, everything is open sourced, and DeepSeek went into deep detail about how they actually produced this model for so cheap. And the company Hugging Face is reproducing it right now. Now, let me show you some posts from Emad Mostaque, the founder of Stability AI, who basically ran the numbers and figured out that what they're saying is actually legit: "DeepSeek are not faking the cost of the run. It's pretty much in line with what you'd expect given the data, structure, active parameters, and other elements, and with models trained by other people. You can run it independently at the same cost. It's a good lab working hard." But that wasn't enough; he hadn't put up any numbers, so of course he followed up and did. Check this out. He basically says, for those who want the numbers, here it is: optimized H100s could do it for less than $2.5 million. And he actually used ChatGPT's o1 model to figure it out. I'm not going to go through it here; it's a bit too technical for this video. And again, now all of the focus is back on the major tech companies, Anthropic, Meta, OpenAI, Microsoft, who have raised and spent billions and billions of dollars to build out AI infrastructure, only to have the rug pulled out from under them by this tiny Chinese company. Listen to this.
"DeepSeek goes mega viral and they can handle the demand on the two Chromebooks they use for inference. Meanwhile, Anthropic cannot handle the load of their paying customers with billions in funding. Do I get this right?" And that seems to be the sentiment across the board. Here's another one: "I've made over 200,000 requests to the DeepSeek API in the last few hours. Zero rate limiting, and the whole thing cost me like 50 cents. Bless the CCP. OpenAI could never." Now, here's the thing. We've been talking on this channel a lot about test-time compute. A lot of the scaling happening in AI right now is not at pre-training, not the $5 million it reportedly cost to build the model. Since these models can now think, and the more thinking they do, the better the results, that thinking is itself just compute. So what's interesting is that even at test time, they're hitting the API 200,000 times with zero rate limiting and at extremely low cost. Unless DeepSeek is just losing tons of money and has a bunch of GPUs we don't know about, they've figured out something about efficiency that the US companies have not. Alexandr Wang follows up with a post: "DeepSeek is a wake-up call for America, but it doesn't change the strategy. USA must out-innovate and race faster, as we have done in the entire history of AI, and tighten export controls on chips so that we can maintain future leads. Every major breakthrough in AI has been American." And continuing: "China's DeepSeek could represent the biggest threat to US equity markets, as the company seems to have built a groundbreaking AI model at an extremely low price and without having access to cutting-edge chips, calling into question the utility of the hundreds of billions of dollars' worth of CapEx being poured into the industry." That's a huge, huge claim. Now, it's one thing to be able to train the model originally at a very cheap and efficient price.
But it's another thing to actually be able to run inference at an extremely cheap and efficient price. Now, I said earlier that I don't believe it, and let me tell you why. There are two possibilities. First, let's assume they really did figure out how to train this model extremely cheaply. Then we're going to be able to replicate that. Awesome, right? Everybody wins. That's the power of open source. And at inference time, at thinking time, if this model is able to run extremely cheaply, then we get to Jevons paradox: as the cost per unit of a technology decreases, total usage and total spend actually increase. We've talked about that on this channel. That is because as the unit cost of any tech decreases, the number of use cases where it can be applied with positive ROI increases dramatically. That's what we've seen with every technology throughout history. Now let's think about the other path: they actually do have a bunch of GPUs powering it, and they're simply faking how efficient it is. Well, first of all, we're going to figure that out, because AI companies around the world are replicating DeepSeek R1 right now. But let's just assume they're doing that. Then that's fine; all of this investment is still very valid. And even if it really is that efficient, the huge investment by these AI companies in infrastructure is still valid, because at the end of the day, whoever has the most compute will have the smartest model. It doesn't matter if it costs $100 per token or a fraction of a penny per token. The more compute, the better. Whoever has the smartest AI will win. And here's Garry Tan, the president of Y Combinator, basically saying the same thing, in reference to the chart we just talked about claiming a big threat to US equity markets: do people really believe this?
If training models becomes cheaper, faster, and easier, the demand for inference (actual real-world use of AI) will grow and accelerate even faster, which assures the supply of compute will be used. Yes, that is the way to think about it; I agree wholeheartedly. But not everybody agrees. Chamath Palihapitiya, billionaire investor, early Facebook employee, and All-In podcast bestie, takes the exact opposite view, and he actually broke it down pretty well. In his first point, he's saying that in the 1% probability that the CCP has all of these chips that they shouldn't, we need to go investigate that. So that's point one. Next, he talks about training versus inference: "We are in the era of inference right now. We always knew this day would come, but it probably surprised many that it would be this weekend. With a model this cheap, many new products and experiences can now emerge trying to win the hearts and minds of the global populace. Team USA needs to win here. To that point, we may still want to export control AI training chips, but we should probably view inference chips differently. We should want everyone around the world using our solutions over others." Now I'm going to jump down to point four, because this is interesting, and it's the part I really disagree with: "There will be volatility in the stock market as capital markets absorb all of this information and reprice the values of the Mag 7" (the Magnificent Seven companies, like Tesla, Meta, and Microsoft). So keep that in mind. "Tesla is the least exposed. The rest are exposed as a direct function of the amount of CapEx they have publicly announced." Translated, that basically means these companies' stocks might go down because of how much they have invested in AI infrastructure; if everything is cheaper now, why did they spend so much? I do not agree with that at all. Again, look at Jevons paradox: the cheaper the tech, the more it's going to be used.
The more inference will be used, and thus all of that GPU supply is going to be used. He continues: NVIDIA is the most at risk, for obvious reasons. That said, markets will love it if Meta, Microsoft, Google, etc. can win without having to spend $50 to $80 billion per year. The markets might love that, but I don't think that is going to be the case. Again, whoever has the smartest AI will win. Eventually, when we reach artificial superintelligence, it is literally a battle of who has the smartest AI. And what does that take? The most inference, or the most compute in general. And what does that take? The most chips, the most spend on chips. If we find really efficient ways to use these chips, great, everybody wins. But ultimately, the cumulative amount of chips, or compute, is what's really going to matter. He goes on to criticize the US, saying that we've been asleep, and I'll just read it because it's an interesting take: "The innovation from China speaks to how asleep we've been for the past 15 years. We've been running towards the big money shiny object spending programs and have thrown hundreds of billions of dollars at a problem versus thinking through the problem more cleverly and using resource constraints as an enabler." A key concept here is that when people face bigger restrictions and constraints, they tend to get more creative; they extract more efficiency out of less. That's what he's really referring to. I think the quote is "constraint is the mother of innovation," something like that. But not everybody thinks it's just conspiracy theories and the end of US tech companies. Yann LeCun, Meta's Chief AI Scientist and a big proponent of open source, has this to say: to people who see the performance of DeepSeek and think China is surpassing the US in AI, you are reading this wrong. The correct reading is: open source models are surpassing proprietary ones.
DeepSeek has profited from open research and open source, e.g., PyTorch and Llama from Meta. They came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source. And I could not agree more. This is a huge win for open source. It is going to allow many companies to start competing with the closed frontier models by having open source state-of-the-art models. This story is still unfolding. It has been crazy to watch the AI industry react to the news that essentially everything they thought they knew might actually be changing right now. So what do you think? Do you think they have more GPUs than they're letting on? Do you think they were able to come up with this amazing efficiency with just a handful of people as a side project? Did China just jump into the lead in AI? Or is this just a great gift to the world because it is open sourced? I'm going to continue following this story. I am enthralled by it. I am absolutely fascinated by what's happening right now in the world, and I hope I broke it down for you well. If you enjoyed this video, please consider liking and subscribing, and I'll see you in the next one.
