AI Turmoil: DeepSeek's Impact & Global Market Shifts
DeepSeek's rise sparks AI market chaos, causing NVIDIA's massive stock drop. Explore implications, geopolitical tensions, and future AI developments.
DeepSeek DESTROYS the AI Industry. r1 model is UNSTOPPABLE.
Added on 01/29/2025
Speaker 1: The AI apocalypse is here, brought to you by DeepSeek. NVIDIA is down 17% today, the largest single-day market value loss in US stock market history, roughly half a trillion dollars. Basically, the entirety of the investment promised to us by the Stargate project is erased in an afternoon. DeepSeek is riding high, topping all the charts, being one of the most used AI apps right now. It must be feeling pretty good, until it gets hit by a massive, large-scale cyber attack, limiting registrations, limiting users' ability to use it, and so on. And meanwhile, the global stock market loses over one trillion dollars. A trillion-dollar sell-off in one day. So what does this mean? Did the AI bubble burst? Is this the end of AI as we know it? Was Gary Marcus right all along? I was wondering if he's doing a victory lap on Twitter. Yep, yes. Yes, he is. So in this video, let's take a look at what's actually going on and what the new DeepSeek model is and isn't. Are the claims made about the model actually true? And most importantly, and I think this is what people are missing, what are the actual implications? So let's get started. A quick refresher for those of you who may not have been following along. DeepSeek is a company, and they have multiple DeepSeek models: the V3, the R1, the R1-Zero. Each of them is mind-blowing for very different reasons. V3 shows that you can distill information from a larger teacher model and make the smaller student model very smart and great at reasoning, specifically about things like programming, math, et cetera. Here V3 goes up against GPT-4o, which is a good apples-to-apples comparison. They're both non-reasoning models: they don't have a chain of thought, they don't think before they answer, they just spit out the answer. This is different from something like OpenAI's o1 model or DeepSeek's R1 model; those are the same class of models, so to speak. But notice the AIME here, the high-level math benchmark they're tested on. GPT-4o gets 9% accuracy. DeepSeek V3 gets almost 40%. A massive, massive leap. They also have the R1. This is their answer to OpenAI's o1: it's the reasoning model. It's able to think about problems before answering, and that thinking time, that test-time compute, very often allows it to produce better answers. In fact, you get better and better answers at exponentially higher costs of test-time compute, of the model thinking through the problem before answering.
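To make that test-time compute idea a little more concrete, here is a minimal sketch in the spirit of self-consistency sampling: the more reasoning chains you pay to sample, the more reliable the majority answer tends to be. The sample_chain_of_thought stub below is a hypothetical stand-in, not DeepSeek's or OpenAI's API, and real reasoning models mostly spend the extra compute on one long chain of thought rather than literal voting.

```python
# Toy illustration of trading test-time compute for answer quality.
# The "model" here is a random stub that is right 60% of the time per chain;
# sampling more chains and majority-voting (self-consistency) raises accuracy.
import random
from collections import Counter

def sample_chain_of_thought(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled reasoning chain; returns an answer."""
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 99))

def answer_with_test_time_compute(question: str, n_chains: int, seed: int = 0) -> str:
    """Sample n chains and return the majority answer; cost grows linearly with n."""
    rng = random.Random(seed)
    candidates = [sample_chain_of_thought(question, rng) for _ in range(n_chains)]
    return Counter(candidates).most_common(1)[0][0]

for n in (1, 4, 16, 64):  # more chains = more "thinking" paid for at inference time
    print(n, answer_with_test_time_compute("toy AIME-style problem", n))
```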
So DeepSeek, this model out of China, coming out and basically replicating the most advanced OpenAI model, putting it out as open source, doing it for just pennies on the dollar, all of that was a big, big deal, and I'll explain why in just a second. But really fast, what you have to know about what's happening here. Number one, the U.S. is trying to limit the access that China and other countries have to AI chips. The U.S. is trying to limit how many AI chips are exported to China, Russia, Iran, and North Korea, while giving close U.S. allies unrestricted access. Part of the reason is that the U.S. and China have been in this sort of influence war. China wants to have its own influence, the U.S. wants to have its own influence, and the latest battleground for that is the question of whose tech everything will be running on, specifically here, whose tech AI will run on. So who will be the hardware provider? Who will provide the rest of the ecosystem for it? The U.S. has been leading with its closed, proprietary models from OpenAI and the other major companies, Amazon and all the companies that provide the cloud services, NVIDIA and their chips, et cetera. Those are the companies the U.S. was focusing on to make sure that it provides the AI ecosystem across the world to the people that are aligned with the U.S. So number one, China is throwing a big wrench in that plan by saying, here's an open-source model that's as good as the best AI models available in the U.S. That's the geopolitical lens. The other lens is, of course, the U.S. stock markets. Tons of companies announce how much money they're investing in their AI efforts. Microsoft and Google all have their large budgets that they spend on various AI infrastructure, on running these models, inference, various upkeep, et cetera. This model seems to suggest that a lot of that is just being completely wasted. How is it possible that this small company in China is able to build these models for so much cheaper? Six million dollars was, I think, the cost to train this model. Also, it sounds like this was kind of a side project for them. They're a quant fund managing money, and I think they bought up a bunch of NVIDIA chips to do some side crypto projects. So they kind of had the chips sitting there as leftovers, and they were able to use that infrastructure to create this model as almost a side project. So those are the broad strokes. But let's get to the heart of the issue, and that is: is it true? Is it true that this company did what they say, so much cheaper, faster, better than the U.S. companies spending massive amounts of money? Are the U.S. tech firms just setting all this money on fire? Or, as some people have suggested, maybe this company, and China in general, are trying to push this stuff out and not being honest about it. Maybe they have a lot more chips than they let on. So the Scale AI CEO, Alexandr Wang, says DeepSeek has about 50,000 NVIDIA H100s that they can't talk about because of the U.S. export controls that are in place. The idea being that they have massive amounts of compute but are saying they're using just a little bit, to make this model seem better than it is and to throw shade at the U.S.-based tech firms. Here's Gavin Baker, who talks a lot about the various investments in the space. Here's his take on this situation, which I found to be one of the more nuanced takes so far. For all the people asking whether the claims by DeepSeek are true, the answer is probably yes, but there's a little more nuance to the conversation. So first of all, R1 is about 93% cheaper to use than o1 per API call. We'll get into what they did there to make it that much cheaper, that much more efficient. It can be run locally on a high-end workstation and does not seem to have any rate limits, which is wild. Also, obviously there are geopolitical dynamics at play here, and it's not a coincidence that this came right around the Stargate announcement. It might have been announced a little bit prior to that, but right around the same time, and certainly there are various geopolitical dynamics here.
The U.S. and China are competing to win this AI race, AI war if you will, although I don't like that term. There's certainly a very competitive race on to control the global AI infrastructure and, of course, to produce the best AI models internally. So the R1 is very similar to OpenAI's o1, although it lags o3, the as-yet unreleased model. By the way, the Chatbot Arena is where all these chatbots get matched up against each other and people vote blindly on which one is better. The DeepSeek R1, as you can see here, is sitting right near the top, just a few points behind ChatGPT-4o latest. Interestingly, it's just above o1. They're all a few points apart. Obviously, the DeepSeek R1 has far fewer votes, so it can still get shuffled up and down. And of course, this isn't the end-all be-all quality assessment of these models, and neither are the benchmarks; we have to look at it holistically. And really, at the end of the day, it's about your own use cases. So coming back to Gavin's post, he's saying there were real algorithmic breakthroughs that led to it being dramatically more efficient, both to train and to inference. Again, we'll come back to exactly what they did in just a second. But I wanted to get to the part about the chips, because that seems to be the thing at the center of this whole issue we're trying to figure out: did they actually manage to train this model with much less training compute than their US counterparts? So he's saying it is easy to verify that the R1 training run only cost $6 million. So here's Emad Mostaque. He was the founder of Stability AI; you've probably seen him talking in a lot of different interviews about the future of AI. Here, he breaks down the math behind why DeepSeek isn't lying about their compute and what the actual training costs are for V3 and R1, the two models we're talking about here. And after running the numbers, he's saying that it looks like the way they ran it, the chip efficiency was actually lower than expected, and that optimized H100s could do it for less than $2.5 million. He's stating DeepSeek are not faking the cost of the run. It's pretty much in line with what you'd expect given the data structure, active parameters, and other elements, and other models trained by other people. You can run it independently at the same cost. It's a good lab working hard. By the way, this is Neal Khosla. He's the son of Vinod Khosla, the massive, well-known, very successful Bay Area Silicon Valley investor: an Indian American billionaire businessman and venture capitalist, co-founder of Sun Microsystems, and founder of Khosla Ventures. So Neal, his son, is one of the people pushing the idea that DeepSeek is a CCP, that is, Chinese Communist Party, psyop and economic warfare to make American AI unprofitable. They are faking that the cost was low to justify setting the price low, hoping everyone switches to it to damage AI competitors in the U.S. Don't take the bait. And of course, this created a lot of controversy, because it seems the father is a major OpenAI shareholder; they, of course, are invested in a lot of this U.S. tech. By the way, really fast: like a lot of the takes we see on the internet, this tends to be very polarizing. It's either one way or another.
Very often people say things that are true enough on their own, but because in your mind you're thinking, are they on my team or the opposite team, we tend to either agree or disagree with those statements based on that. The Chinese state, the CCP, is likely supporting a lot of these companies to a great degree. Just like you see Sam Altman, the founder of OpenAI, on stage with the U.S. president, talking and being super nice and friendly and chummy, it shouldn't come as a surprise that China, on the other side of this, also has a lot of influence, probably a lot more influence, over these Chinese startups. So the point I'm making is that the first statement could very well be true, that this is a CCP state psyop and economic warfare, while at the same time they're not faking the cost. So just keep that in mind; these are not necessarily mutually exclusive things. There's probably some state influence, there are probably a lot more chips than we realize, but it also doesn't mean that the R1 numbers are just made up. And here's why. So, it's easy to verify that the R1 training run cost $6 million. This is, again, Gavin Baker, and confirmed by Emad Mostaque. So let's, for the time being, assume that this is accurate. I think based on the data we have available right now, based on what we know, it seems reasonable to assume that those numbers are accurate. With that in mind, while this is literally true, it's also deeply misleading. The nuance here is that the $6 million does not include costs associated with prior research and ablation experiments on architectures, algorithms, and data. So this means that it is possible to train an R1-quality model with a $6 million run if a lab has already spent hundreds of millions of dollars on prior research and has access to much larger clusters. So that $6 million could be the real cost to train that one particular model on the amount of compute that they state. And DeepSeek has way more than 2,048 H800s; one of their earlier papers referenced a cluster of 10,000 A100s. So the real question is: can an equivalently smart team, say you get people who are just as smart and just as knowledgeable, you give them $6 million and however many H800s, the 2,000-plus that they said they used to train this model, can that team now train up a similar model from scratch for $6 million? Is it that easy to replicate? Or are there massive costs in architecture and a lot of other stuff behind the scenes that we have no idea about? And finally, this. This is another big thing. As we've talked about on this channel, what is distillation? Distillation is the idea that we take a large, smart model, let's say o1, and we use it to create reasoning, these chains of thought for solving any given problem. Then we take those thoughts, that synthetic data, that reasoning, that rationale, whatever you want to call it, it goes by different names depending on where you look, and we use it as synthetic data to train the next generation of models. Again, not a new idea. The STaR paper from 2022, out of Stanford and Google, literally explains this technique, or rather I should say this technique is STaR. That's literally what they called this technique of taking the thoughts, the rationales, as they call them.
So they use a different word for it, rationales, and then fine-tune the next generation of models on those rationales. Notice one of the authors: Noah D. Goodman. So this is from Reuters: OpenAI is working on new reasoning technology under the code name Strawberry, prior to that known as Q*. And Strawberry has similar ideas to a method developed at Stanford in 2022 called Self-Taught Reasoner, or STaR. That's what we were just talking about. And here's Noah D. Goodman, who is not affiliated with OpenAI; he's at Stanford, I believe. He's saying STaR enables AI models to bootstrap themselves into higher intelligence levels via iteratively creating their own training data, and in theory, it could be used to get language models to transcend human-level intelligence. So again, that idea we're talking about goes by different names; nowadays, I think more and more people are calling it distillation, as in knowledge distillation.
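For readers who want that bootstrapping loop spelled out, here's a rough sketch of the STaR-style distillation idea just described: a stronger teacher model produces rationales, only the ones that reach the known correct answer are kept, and that filtered synthetic data becomes the fine-tuning set for the next model. The teacher_generate and fine_tune names are hypothetical stand-ins, not anything from the DeepSeek, OpenAI, or STaR codebases.

```python
# Sketch of a STaR-style rationale-collection step (not a real training pipeline).
from typing import Callable

def build_rationale_dataset(
    problems: list[tuple[str, str]],                      # (question, gold_answer) pairs
    teacher_generate: Callable[[str], tuple[str, str]],   # question -> (rationale, answer)
    samples_per_problem: int = 4,
) -> list[dict]:
    """Keep only rationales whose final answer matches the gold answer."""
    dataset = []
    for question, gold in problems:
        for _ in range(samples_per_problem):
            rationale, answer = teacher_generate(question)
            if answer.strip() == gold.strip():            # verified reasoning only
                dataset.append({
                    "prompt": question,
                    "completion": rationale + "\nAnswer: " + answer,
                })
    return dataset

# In a full loop you would then call something like fine_tune(student, dataset)
# and, in the self-taught variant, use the improved student as the next round's teacher.
```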
So Gavin continues: there was a lot of distillation, so it's unlikely that they could have trained this without unhindered access to GPT-4o and o1. In fact, on the first day that I was testing the DeepSeek model, and I have a video of this, if you typed in, what model are you running on, guess what it said? Do you think it said, oh, I am DeepSeek R1 or V3, whichever one I was testing at the time? No, it said, I am running on the GPT-4 architecture. And right there in that video, I explain that the reason it says that is because all these models use this distillation process. A lot of them were built on the back of GPT-4 and then later GPT-4o, maybe not o1, but we'll get to that in just a second. So that's also an important point to understand: we have these tight export controls on NVIDIA chips, although maybe they're not as tight as we think, because obviously China, and various startups in China, are seemingly getting their hands on a lot of them. And number two, while we have export controls on the hardware, we have no such thing on the outputs, the synthetic data. So we can say with near 100% certainty that these Chinese models were trained on synthetic data from these US models; I'd bet money on it. We know this because there are certain telltale signs in those models that really strongly hint that this is happening. So the big point here is that a lot of this is, in a sense, subsidized by OpenAI, by all the other research that's been published, and probably by some other stuff that's not published and was acquired by other means, via this synthetic data, et cetera. So does that mean this is just a ripoff, that there's no innovation happening here, nothing new? Well, no, because again, they nailed the algorithmic breakthroughs. So here's Jared Friedman, partner at Y Combinator. He's saying there are lots of hot takes on whether it's possible that DeepSeek made training 45 times more efficient, but this person, and we'll get to him in just a second, wrote a clear explanation of how they did it. That's the article we're going to be looking at. So Jeffrey Emanuel, a former quant investor, wrote this: The Short Case for Nvidia Stock. Basically, he's saying that NVIDIA is going to take a beating and lose a lot of money, and he wrote it on January 25th. And if he put his money where his article is, if he did in fact short NVIDIA before that, let's say on the 24th, well then, as he said this morning, NVIDIA is down 10% pre-market. So what's in the paper? Here's Jared Friedman quickly going over it, and we'll look at the article as well. They use 8-bit instead of 32-bit floating point numbers, which gives massive memory savings. They compress the key-value indices, which eat up much of the VRAM, getting 93% compression ratios. They do multi-token prediction instead of single-token prediction, which effectively doubles inference speed. And a mixture-of-experts model decomposes a big model into smaller models that can run on consumer-grade GPUs. That's been around for a bit; we believe GPT-4 is running on that. So that was a big breakthrough, and now more and more models are going in that direction. By the way, if you're into this, it's a great read. It's a 60-minute read, a hefty article, a big boy of an article, but this guy seems pretty smart; it might be a worthwhile 60 minutes. The part about the theoretical threat really breaks down how DeepSeek did it and why it's so shocking. You guys give me so much crap for using that word, shocking. If you can use it, I should be able to use it. He mentions that this DeepSeek company started as a quant trading hedge fund, but who knows if any of that is really true or if they're merely some kind of a front for the CCP or the Chinese military. Again, like I said before, there very well could be some overlap. Does it matter? Does it not matter? It doesn't mean that what they did was completely false. And he starts with: one, this model is absolutely legit. Yes, there's a lot of BS with the AI benchmarks, but this model stands up. The two big wins are in model training and inference efficiency. They were able to train these models using GPUs in a dramatically more efficient way, shockingly, stunningly more efficient, some may say, over 45x more efficient. Again, he does the full deep dive into the 8-bit versus 32-bit numbers, et cetera. Then there's another post by Dr. Jim Fan, a senior AI researcher at NVIDIA, talking about some of the other things the DeepSeek model did. So instead of using very heavy, big models as judges, DeepSeek came up with their own way of doing that, which is much faster and much lighter; we already covered that in a previous video. Major advances in GPU communication efficiency. They use the mixture of experts, which again is not the newest thing, but it certainly helps. And he talks about the DeepSeek API, how it's able to charge something like 95% less for inference requests. He also briefly mentions that some people are saying they're lying about their GPUs, et cetera. And again, while it's certainly possible, I think it's more than likely that they are telling the truth and have simply been able to achieve these incredible results by being extremely clever and creative in their approach to training and inference. They explain how they are doing things, and I suspect it's only a matter of time before the results are widely replicated and confirmed by researchers at various other labs.
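To put rough numbers on the four levers just listed, here is some back-of-the-envelope arithmetic using publicly stated DeepSeek-V3 figures (671 billion total parameters, about 37 billion active per token) and the compression ratios quoted above. The 40 GB uncompressed KV-cache figure is an assumed example rather than a measurement, so treat this as illustration, not a benchmark.

```python
# Back-of-the-envelope arithmetic for the efficiency levers described above.
PARAMS_TOTAL = 671e9    # DeepSeek-V3 total parameters (mixture of experts)
PARAMS_ACTIVE = 37e9    # parameters active per token

# 1. Lower-precision weights: FP8 vs FP32 storage for the same parameter count.
fp32_gb = PARAMS_TOTAL * 4 / 1e9    # 4 bytes per parameter
fp8_gb = PARAMS_TOTAL * 1 / 1e9     # 1 byte per parameter
print(f"weights: {fp32_gb:.0f} GB in FP32 vs {fp8_gb:.0f} GB in FP8 (4x smaller)")

# 2. KV-cache compression: a 93% compression ratio keeps ~7% of the key/value memory.
kv_cache_gb = 40.0                  # hypothetical uncompressed KV cache for a long prompt
print(f"KV cache: {kv_cache_gb:.0f} GB -> {kv_cache_gb * 0.07:.1f} GB after compression")

# 3. Multi-token prediction: emitting 2 tokens per forward pass halves decode passes.
tokens, tokens_per_pass = 1000, 2
print(f"decode passes: {tokens} -> {tokens // tokens_per_pass} with multi-token prediction")

# 4. Mixture of experts: compute per token tracks active, not total, parameters.
print(f"MoE: {PARAMS_ACTIVE / PARAMS_TOTAL:.0%} of parameters active per token")
```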
So I think one of the big points here is that everybody who publishes their research contributes to this field. Everybody else is able to look at that research, whether it's from Google and the Transformer, or Stanford and STaR, or, in this case, DeepSeek publishing these algorithmic breakthroughs. What one company or one organization discovers now becomes the standard for everyone. That's an important point, because people are asking, who's ahead, the U.S. or China? Well, China did just publish everything open source, published the research. There are other organizations and universities in China publishing papers on how to replicate o1 and explaining a lot of what goes into it. Was a lot of that possible due to U.S. tech and OpenAI, getting the synthetic data, the distillation, out of the OpenAI models? Of course. And will the U.S. tech firms now do the same with this research published by these Chinese companies? Yes, of course. If this works, they're going to take that, apply it to their models, and probably see the same results. So at the end of the day, what does this all mean? Well, it's important to understand that in terms of AI progress, I feel like there are two states it can be in: it's so over, and we are so back. And right now it's both. It's over and we're so back, because the AI progress keeps going forward. People are saying that scaling has hit a wall. It's like that joke that I've already used too many times: yeah, it did, it's a vertical wall, and AI progress is going straight up it. On top of everything that we're seeing, the progress with o1 and everything else, at the same time we're seeing Chinese researchers contributing this new algorithmic improvement that is just insane: 50x improvement in inference, probably more, 45x improvement in training. So if AI were a stock, it would be going like this; it just went skyward. OK, so AI is blowing up. It's getting faster, cheaper, better, smarter. It's over; we're so back. So AI is doing great. Who does this affect? Well, first of all, we've got to look at the different companies. Of course, NVIDIA is down 15-plus percent because people are concerned that they're going to be losing their cash cow, selling the chips for training the models. I don't know the chip hardware side quite as well, so take what I'm saying with a grain of salt. But if I understand correctly, NVIDIA has the lead in producing the AI training chips. As far as inference goes, there are a lot of other things that can be used. We have the Groq chips that are extremely fast for large language models; they're actually chips built for large language models, I believe they call them LPUs. So instead of GPUs, graphics processing units from NVIDIA, or the tensor processing units out of Google, they're language processing units. There are rumors that Meta now has a war room set up, because obviously they're the ones trying to go after the open-source ecosystem, and here's a Chinese company releasing the best open-source model. And so apparently the Llama project within Meta has attracted a lot of attention internally from high-ranking technical executives. Of course, it's the hottest, latest, coolest thing.
And as a result, they have something like 13 individuals working on the Llama stuff who each individually earn more per year in total compensation than the combined training cost for the DeepSeek V3 models that outperform it, which kind of goes to show you that the saying is true: necessity is the mother of invention. When you limit access to resources, that sparks creativity; you have to figure out how to do stuff with less. Do you need 13 individuals pulling in five-plus million a year, executives, working on this thing? Well, apparently not. And of course, next, we're going to see how a lot of these companies react. Keep in mind, Elon is doing a Tesla earnings call in a couple of days. Elon, of course, is very interested in his AI startup, xAI. I haven't heard him make too many statements, nothing official yet, but we'll see how he reacts to it, because certainly people are waiting for him to make a statement. He probably doesn't want his company, the valuation of xAI, to take a hit. They've been doing very well raising money for it, but of course, stuff like this can negatively affect the perception of these startups. So for a lot of these companies, this is certainly bad on the surface, but at the same time, assuming that they're able to replicate these results and add those algorithmic improvements, it would help their ability to produce better AI models cheaper. So again, if you set aside the valuations and the capital expenditures, in the long term this seems like a good deal. I mean, have you ever bought a bunch of stuff just to make sure you have it for a rainy day, and the next day everything goes on sale? You have mixed feelings about it: you're upset because you just wasted a bunch of money, but the fact that it's cheaper moving forward certainly isn't in and of itself a bad thing. This is similar to that. We've got to keep in mind that if they're able to replicate it, that's good for their future costs of developing these models. And as for NVIDIA, keep in mind this person is likely shorting NVIDIA stock, and he created this short case for NVIDIA; that's what he called it. So he's putting it out there and likely made a lot of money on it. Of course he's going to present the negative case; that's kind of the point of doing this. So if you're interested, certainly read it. I'm not saying that it's either right or wrong. I'm just saying that you've got to understand how the game is played. People see a shorting opportunity, they short a specific company, then they put out their short case. If other people read that short case and think, I agree with this guy, then they either sell or short the company themselves. The person that's out first makes a lot of money. So putting out cases like this is part of the strategy to get the word out there. That doesn't mean it's wrong or right, but it is a bit of a conflict of interest; I think that's fair to say. He does talk about the optimistic perspective, that potentially the next big wave is the robotics wave, and NVIDIA certainly seems well positioned for that. But none of this is investment advice; this is not an investment channel. There are people, of course, that are very bullish on NVIDIA. Again, Emad Mostaque: DeepSeek R1 is bullish for NVIDIA. The plan is to build their LLM operating system platform with Digits. This has 5070 performance and 128GB of unified memory as an intelligence box. Two of them can fit R1 quantized and run it fast without fans. I know that we already have an OS with WSL, the Windows Subsystem for Linux; guess what's next, R1 quantized. And quantizing R1 is basically a kind of compression, making the model smaller without losing too much of its capability. And Digits is the Grace Blackwell AI supercomputer on your desk.
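As a rough sanity check on that claim that two Digits boxes could fit R1 quantized, here is the weight-memory arithmetic, assuming R1's roughly 671 billion parameters and a few common quantization bit widths. This counts only the weights, ignoring KV cache and activation overhead, so it's an illustrative lower bound rather than an official spec.

```python
# Weight memory needed for a ~671B-parameter model at different quantization levels,
# compared against two 128 GB unified-memory boxes.
PARAMS = 671e9
UNIFIED_MEMORY_GB = 2 * 128   # two Project Digits machines, 128 GB each

for bits in (16, 8, 4, 3):
    weight_gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if weight_gb <= UNIFIED_MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit weights: ~{weight_gb:,.0f} GB -> {verdict} in {UNIFIED_MEMORY_GB} GB")
```

By this arithmetic, only a roughly 3-bit or more aggressive quantization squeezes the weights under 256 GB, which is why the quantization, the compression step mentioned above, matters for running R1 on desktop-class hardware.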
By the way, for people that are deeply into the hardware aspects of this, NVIDIA chips and so on, a lot of you, I'm sure, know more than I do about the hardware side of things; I am 100% aware of that. So please comment below: what's your take on this? The R1, DeepSeek, all the algorithmic breakthroughs, are they good for NVIDIA or bad for NVIDIA? Is this the mother of all buying opportunities, or are we going to be trying to catch a falling knife here? But I think the really big point here is that if you're playing the markets, there's obviously a lot of fluctuation, and you might come out really well or perhaps lose some in the process. But if we're talking about AI, AI is still here. It's still doing really well. It's still accelerating and moving forward. Progress is continuing and nothing has changed. And there's one very clear and massive winner in this whole thing, and that's open-source AI. That's one thing that I don't think I fully appreciated a year or two ago: just how amazing it's going to be, how big, how powerful, how fast it's going to catch up to everything. Open-source AI is winning. And if you're in the United States, where a lot of the AI development is, the government could swing one way or another every four years when the leadership changes; it could say, we're against open AI, and start clamping down on it. But if these innovations are driven from China, there's largely not much you can do. So open-source AI is bigger, better, faster. It's more secure than we realize, secure in the sense that it's not going to go away; it's unlikely to go away, it's going to keep continuing, it's shockproof. But whatever the case is, AI stock is going up. Open-source stock is going up. If you enjoyed this video, hit thumbs up. Make sure you're subscribed. My name is Wes Roth. Thank you so much for watching and I'll see you next time.
