Chinese AI's Impact on Tech Stocks
A new AI model, DeepSeek, shook tech markets by slashing the compute needed to train frontier models, hitting NVIDIA and other tech stocks. Discover why it's being called a game-changer in AI.
DeepSeek - The Chinese AI That Crashed The Markets
Added on 01/29/2025

Speaker 1: There's one new AI advancement that has the tech world spinning right now. We're getting articles like this: a shocking Chinese AI advancement called DeepSeek is sending US stocks plunging. We can see that on January 27th, the day I'm recording this video, NVIDIA lost over 17% of its value. According to Lior here, the release of DeepSeek made NVIDIA stock crash by 17%, or $465 billion. Marc Andreessen, one of Silicon Valley's most prominent investors, said DeepSeek R1 is "one of the most amazing and impressive breakthroughs I've ever seen, and as open source, a profound gift to the world." So in this video, I want to do my best to explain what DeepSeek is, why the stock market is freaking out about it, talk about some of the speculation around it, talk about what I believe the long-term end result of something like DeepSeek will be, and even show you how you can use it yourself if you want. I want to give you the whole picture of DeepSeek so that you're looped in on the biggest story in the AI world right now and have all of the necessary context to speak intelligently about it and form your own opinions.

To really understand DeepSeek R1 and why people are freaking out about it, we need to go back to a research paper that came out last month, in December of 2024, called DeepSeek-V3. Now, DeepSeek-V3 was a large model with 671 billion parameters. However, it used what's called a mixture-of-experts architecture, meaning it didn't use all of those parameters every single time it was prompted. In fact, it only used 37 billion activated parameters for each token (a toy sketch of this routing idea follows a bit further down). What made this model really special is this line right here: despite its excellent performance, DeepSeek-V3 requires only 2.788 million H800 GPU hours for its full training. To put that into perspective, according to Perplexity, GPT-4's training required approximately 60 million GPU hours, compared to just 2.788 million H800 GPU hours here. And when OpenAI trained GPT-4, they were using really high-end A100 GPUs from NVIDIA. DeepSeek was using these H800 GPUs because the US actually has restrictions on what GPUs can be exported to China. NVIDIA developed the H800 to be compliant and actually allowed to be sold into China, but it's not nearly as powerful as the GPUs the US-based AI companies have access to. So DeepSeek was able to train this model with roughly 95% less compute than something like GPT-4 was trained with, and on less powerful GPUs than what companies like OpenAI had access to.

And then when we look at the benchmarks here, we can see DeepSeek-V3 is this blue dashed line, GPT-4o is this darker yellow line, the second-to-last line, and Claude 3.5 Sonnet is this final line. In things like math and coding, DeepSeek-V3 scored right up there with the top models. On MMLU, which tests a variety of tasks for large language models, it was the second highest, only behind Claude 3.5 Sonnet. In math, it did a lot better than GPT-4o, was about on par with Claude, and scored higher than pretty much everything else. On the Codeforces benchmark, it crushed the other models. And on SWE-bench, which tests how well AI does at solving problems on GitHub, it was only barely outscored by Claude 3.5 Sonnet. So this DeepSeek-V3, again, required about 95% less compute to train and got results on par with GPT-4o and Claude 3.5 Sonnet, all while being open source and publicly available.
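To make that mixture-of-experts idea a bit more concrete, here's a minimal, illustrative sketch of top-k expert routing in Python with PyTorch. This is not DeepSeek's actual architecture or code; the layer sizes, expert count, and routing scheme are made-up placeholders, just to show how a router can activate only a small slice of a model's total parameters for each token.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer: each token is routed to only
    k of the n experts, so only a fraction of the parameters run per token."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.k = k

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, n_experts)
        weights, chosen = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e      # tokens that picked expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)                     # a batch of 16 token embeddings
print(ToyMoELayer()(tokens).shape)               # torch.Size([16, 64])
```

In this toy layer, each token only ever touches 2 of the 8 experts. DeepSeek-V3's reported numbers are the same idea at a vastly larger scale: 671 billion total parameters with roughly 37 billion activated per token.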
However, around the same time this came out in December, we got access to models like o1 and o1 Pro, and OpenAI showed off their next model, o3. So some of these benchmarks didn't look as exciting, because people were considering o1 and o3 the new state-of-the-art models, and DeepSeek-V3 was being compared against the previous generation of state-of-the-art models. So why are people freaking out about it all of a sudden now, this week? Well, last week DeepSeek released some new research with DeepSeek R1. Now, DeepSeek R1 uses DeepSeek-V3, the model we were just looking at that was really fast and a lot less expensive to train on lesser GPUs, as its underlying model. However, it went through a new fine-tuning method on top of the existing V3 model. If we read the abstract here, it says DeepSeek R1, a model trained via large-scale reinforcement learning without supervised fine-tuning as a preliminary step, demonstrates remarkable reasoning capabilities. Basically, they asked it a whole bunch of questions they already knew the answers to and had it check itself against the existing answer key. Now, this is a big oversimplification, but that's kind of how the reinforcement learning worked: it was reinforcement learning without supervised fine-tuning as a first step. So they would ask the model, for example, a math question; the model would try to figure out the answer on its own and then check its response against essentially an answer key of known answers. It did that with math, it did that with coding, and it did that with the various skills they wanted it to specialize in (there's a toy sketch of this answer-checking idea right after this section).

Now, the other thing that makes R1 stand out specifically is that it actually uses chain-of-thought reasoning at inference time. So when you put a prompt in, you'll actually see it think through the problem and even correct itself. It might think through an illogical way of doing something and then go, actually, this might be the better way to do it, and then think through that. So it has this reasoning process it goes through right after you ask it a question. That's not laid out in huge detail in the research paper, but we can see here that the template requires DeepSeek-R1-Zero to first produce a reasoning process, followed by the final answer. And as I mentioned, we saw o1 from OpenAI and demos of o3 that made DeepSeek-V3 feel a little less impressive. But when you combine the fact that V3 was very inexpensive to train on lesser hardware than what we have in the US with this new R1 model, which uses reinforcement learning to fine-tune the model plus chain-of-thought reasoning when you give it a prompt, we're now getting an open-source model that gives us results just as good as, if not better than, what we're seeing out of OpenAI's state-of-the-art closed models. That's why people are freaking out.

So check out the benchmark comparison here. Once again, the blue bar with the little lines through it is DeepSeek R1, this dark gray bar here is OpenAI o1, and this light blue bar on the very right is DeepSeek-V3, the model that preceded R1. We can see that in pretty much every single benchmark, it either outperformed or performed just as well as OpenAI's o1 model. It's about on par with o1 for code, it's beating o1 in math, and for general-purpose use it's about on par with o1.
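Here's a toy sketch, in Python, of that "check the answer against a known answer key" idea, plus a small bonus for producing a reasoning section before the final answer. This illustrates the general rule-based-reward concept described above, not the paper's actual reward code; the tag names, score values, and exact-match comparison are assumptions made purely for the example.

```python
import re

# Assumed template for this example: reasoning inside <think>...</think>,
# final answer inside <answer>...</answer>.
TEMPLATE = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def reward(completion: str, known_answer: str) -> float:
    """Toy rule-based reward: small bonus for following the reasoning/answer
    format, bigger bonus when the final answer matches the answer key."""
    match = TEMPLATE.search(completion)
    if match is None:
        return 0.0                                   # no recognizable structure
    format_bonus = 0.2
    answer = match.group(2).strip()
    accuracy_bonus = 1.0 if answer == known_answer.strip() else 0.0
    return format_bonus + accuracy_bonus

sample = "<think>6 workers * 7 hours = 42 worker-hours</think> <answer>42</answer>"
print(reward(sample, "42"))                          # 1.2
print(reward("just 42", "42"))                       # 0.0 -- no reasoning section
```

A reinforcement learning loop then nudges the model toward completions that score higher on rewards like this, which is how reasoning behavior can be trained without a human grading every individual step.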
Coming back to the benchmarks: in its ability to solve GitHub problems, R1 is leading the pack. OpenAI's o1 is a closed-source model, minimum 20 bucks a month to use, trained on thousands of A100s or H100s from NVIDIA. And now we've got this DeepSeek R1 model that was trained on lesser GPUs in way less time and does just as well. And that freaks people out. Now, coming back to these headlines, a shocking Chinese AI advancement called DeepSeek is sending US stocks plunging, and we saw NVIDIA take a big drop today. The thinking behind the drop is that maybe we don't need nearly as many GPUs as we thought we did to train these next-level AI models. If all of the big research companies training AI models can now do it at 5% of the time and cost as before and still get o1-level results, why would people need to buy as many GPUs? And this had a ripple effect: Meta, Google, Oracle, and most of the big tech companies also dropped today as a result.

What makes this even more fascinating is that this is basically a side project for the company behind it. According to this tweet from Han Xiao, and I'm not sure if I'm mispronouncing that, the company that owns DeepSeek is a quant company. They've been working together for many years already, they're super smart guys with a top math background, and they happen to own a lot of GPUs for trading and mining purposes; DeepSeek is their side project for squeezing those GPUs. So they bought the GPUs for crypto and quant trading and things like that, and apparently they had more compute than they needed, so they started training their own models. Han followed up saying that nobody in China even takes them seriously; so it's not that Chinese AI teams in general are lean and mean and can do such great things, it's only DeepSeek that's lean and mean, and Chinese AI companies are just as fat and heavy on marketing as their American counterparts.

But I actually do think NVIDIA is going to recover. My personal opinion is that people were starting to get worried NVIDIA was overvalued and saw this news as a chance to get out. A few people probably started getting out, it caused a little bit of a panic, and more people panicked and got out. Not financial advice, but I think it's going to recover. There are a few things I want to share about DeepSeek that make the story a little more ambiguous, and some counterarguments that make me think selling NVIDIA right now was probably a little misguided. Let's start with some of the controversy. According to this Investopedia article, analysts at Citibank expressed doubt that DeepSeek had achieved its results without the most advanced chips. They maintained their buy rating on NVIDIA stock and said they don't expect major US AI companies to move away from using its advanced GPUs. Alexandr Wang, the CEO of Scale AI, also disputes whether DeepSeek actually used as few GPUs as they said, and whether they actually used H800s, the dumbed-down versions of the H100, or something else. He says he thinks it's closer to 50,000 of the more powerful NVIDIA Hopper GPUs, the H100s, but believes the company can't disclose the truth due to US export controls on AI chips.
Again, I mentioned this earlier, but the reason they claim they used H800s is that the US limits the amount of compute power available in the chips we sell to China. NVIDIA built the H800, which is less powerful, in order to still be able to sell chips into China. But according to Alexandr Wang here, and also the Citibank analysts, DeepSeek was likely using a lot more GPUs than they claim, and a lot more powerful GPUs than they claim; they're just saying they used H800s so they don't get themselves in trouble. There have also been rumors floating around that maybe they didn't start from scratch; maybe they used something like a Llama model as a starting point and then trained on top of that. From all of my digging and research, there's no real weight I can find to those claims, other than the fact that sometimes the model will claim it was created by OpenAI, or, if you ask it to troubleshoot something for you, it might give you instructions for how to troubleshoot it in ChatGPT. But the reality is it was likely trained on a ton of open internet data, and there's a lot of ChatGPT and OpenAI instruction content publicly available online, so a lot of that was probably in its training data by default, just from the way they collected the data. There's this site, manifold.markets, where you can bet on random things, and they asked: did DeepSeek lie about the number of GPUs they used in training V3? Right now it says there's a 38% chance they did. So seemingly most people don't believe they cheated, but as of right now, everybody's just taking DeepSeek's word for it; we haven't seen receipts or anything.

Now, I summed up my overall thoughts in this X post here. Most people are saying the dip is because models can be trained with way less compute now, and that's not good for NVIDIA, and that's most likely the reason for the dip. But here are my counterarguments. I just went over the first one: many analysts claim that DeepSeek either trained on much more powerful GPUs but can't talk about it due to restrictions, or started with a different set of model weights, like Llama, where the expensive part of the training had already been done. That's just speculation, but it's fairly widespread speculation. I also believe that if we know we can use less compute to train fairly powerful models, people will still throw way more compute at it to train even more powerful models. So my second point: if in fact it did just get much cheaper to train o1-level models with far less compute, many companies will likely still throw more compute at it. If we can train this level of model with this little compute, imagine what we can train if we 10x or 100x it. And then finally, the point I think is the most important, and after I said this I noticed a whole bunch of other people likely said it before I did: if it really is a whole lot cheaper to train new foundation models, that means many of the big companies like OpenAI have even less of a moat than we thought. It opens the door for many new companies and many new open-source models to pop up, which all need compute.
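To put toy numbers on that last point, here's a quick back-of-the-envelope sketch in Python. Every figure below is made up purely for illustration (only the rough "95% cheaper per model" ratio echoes the discussion above); the point is just that cheaper training per model can still mean more total GPU demand if far more players enter the game.

```python
# Back-of-the-envelope sketch with made-up numbers: cheaper per-model training
# can still increase total compute demand if the number of model builders grows.
old_cost_per_model = 60_000_000   # GPU hours per frontier model (illustrative)
new_cost_per_model = 3_000_000    # roughly 95% cheaper per model (illustrative)

old_builders = 5                  # only a handful of labs could afford it before (assumed)
new_builders = 150                # lower barrier to entry -> many more entrants (assumed)

total_before = old_builders * old_cost_per_model
total_after = new_builders * new_cost_per_model
print(f"total GPU hours before: {total_before:,}")   # 300,000,000
print(f"total GPU hours after:  {total_after:,}")    # 450,000,000
```

Under those assumed numbers, total demand goes up even though each individual model got far cheaper, which is exactly the Jevons-paradox-style argument picked up next.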
The lower compute needed to train a single model seems like it will be counterbalanced by more companies buying GPUs, because now they too can create their own foundation models specifically tailored to their needs. Essentially, maybe each company buys fewer GPUs, but that could be offset by a lot more companies getting into the game thanks to the lower barrier to entry. And of course, after I posted this, I saw this post from Garry Tan, posted several days before mine, saying: do people really believe this? If training models gets cheaper, faster, and easier, the demand for inference, actual real-world use of AI, will grow and accelerate even faster, which assures the supply of compute will be used. And that was in response to somebody saying China's DeepSeek could represent the biggest threat to US equity markets, as the company seems to have built a groundbreaking AI model at an extremely low price and without having access to cutting-edge chips. Satya Nadella, the CEO of Microsoft, pointed out that Jevons paradox strikes again: as AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of. That's the same point number three I was making here. If we take a peek at what Jevons paradox is on Wikipedia: the Jevons paradox occurs when technological advancements make a resource more efficient to use, thereby reducing the amount needed for a single application; however, because the cost of using the resource drops, overall demand increases to the point where total resource consumption actually rises rather than falls. So basically, if we can do more with less compute, that doesn't mean people are going to buy less compute. They're going to buy more compute to do even more with it. And the barrier to entry just got lower for more companies to develop their own models. So I feel like in the end this will be a net win for NVIDIA. But again, this is not financial advice; take everything I'm saying with a grain of salt. I'm just digging through all the resources I've come across and trying to put the puzzle pieces together for you.

Now, anybody can use DeepSeek right now, and there are multiple ways to do it. You can go to deepseek.com and play with it straight on their website. You click start, and it'll log you in through a Google account. If you want to use the R1 model, you click this button that says DeepThink (R1); that makes sure you're using R1. I'll tell it to invent a complex logic problem and then solve it. And when I do that, you can see it actually says it's thinking, and you can watch it think in real time: okay, I need to invent a complex logic problem and then solve it, let me start by brainstorming, et cetera, et cetera. Wait, I remember there's a classic puzzle type where there are three types of people. Another angle. Wait, here's another idea. Let me outline the problem, but make it more complex. Alternative. You can just see it thinking through all of this as I'm talking here. This is what makes R1 different from V3. Again, the underlying model this was built on is the V3 model I talked about at the very beginning; R1 is the one that introduced this extra thinking through the problem, as well as the reinforcement learning fine-tuning process. Okay, so that was wild.
It actually thought through this process for a good few minutes. You could see all of the thinking it did here, and it thought for a long time, 208 seconds, so closer to three and a half minutes. It then created its own logic problem and went on to solve it. But that's not the only way to get it. As of right now, DeepSeek is also the number one free app in the iPhone App Store, so if you want to use it on mobile, you can get it there as well. All of this news and everybody talking about it has actually caused it to pass ChatGPT. Now, if you do run into issues trying to use DeepSeek, this article also came out on Business Insider: DeepSeek temporarily limited new signups, citing large-scale malicious attacks. I don't know if that's still ongoing; I didn't see any errors or messages when I tried to log in, but the article does say DeepSeek said only users with a China-based phone number could register for a new account, a measure taken because it had recently faced large-scale malicious attacks. Apparently the issue has mostly worked itself out as of the time of this recording, but just know it could be up or down a little if you're trying to log in and use it.

There are also a couple of ways to use distilled versions of DeepSeek. A distilled version uses a smaller underlying model: instead of DeepSeek-V3, it might be built on something like Qwen 7B, Qwen 14B, or one of the Llama models. If you head over to console.groq.com, you can use DeepSeek in Groq. They're running a distilled Llama 70B model, so the underlying model is Llama 70B, but it has R1's thinking ability distilled on top of it. And because Groq is insanely fast, you get results really, really quickly. So if I ask the same question, tell it to invent a complex logic problem and then solve it, and submit it through Groq, we can see that it is thinking, but it's just going really, really fast, because that's what Groq does; running on Groq's cloud hardware, it blows everything else out of the water in terms of speed. Now that it's done, we can see all of the thinking it did here. It's a little less formatted than the main web version, but you can see where the thinking starts, and if we scroll all the way down, we can see it gave us a complex problem and solved its own problem, and it did the whole thing in just a few seconds.

And then finally, you can run it completely locally if you want. I recommend a tool called LM Studio for this. It's a free tool that makes it really easy to download and add models. So I downloaded LM Studio here. If I want to add new models, I just go to this Discover button and type DeepSeek in at the top. It'll find all the different versions of DeepSeek that you can download and use. You find the one you want, click the little download button to grab it onto your computer, and then you can run the model locally on your own machine. Right now I'm using the DeepSeek R1 Distill Qwen 14B model, so the underlying model for this one is Qwen 14B. Let's give it the same prompt: create a complex logic problem and then solve it. And we can see it's got its whole thinking box, where it goes through and thinks through the whole problem on its own. We can see it thought through the whole process.
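If you'd rather hit these distilled models from code instead of a chat UI, both Groq's cloud API and LM Studio's local server expose OpenAI-compatible chat endpoints, so a minimal sketch looks roughly like the following. The base URLs, API key handling, and model IDs shown here are assumptions for illustration; check the Groq console and LM Studio's server tab for the exact values before relying on them.

```python
# Minimal sketch of calling an R1 distilled model through an OpenAI-compatible
# endpoint. Model IDs and URLs below are assumptions -- verify them in the
# provider's UI or docs.
from openai import OpenAI

# Option A (assumed): Groq's hosted DeepSeek R1 Distill Llama 70B.
# client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_API_KEY")
# model = "deepseek-r1-distill-llama-70b"

# Option B (assumed): LM Studio's local server with a downloaded Qwen 14B distill.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
model = "deepseek-r1-distill-qwen-14b"   # use whatever identifier LM Studio shows

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user",
               "content": "Invent a complex logic problem and then solve it."}],
)
print(response.choices[0].message.content)   # includes the model's thinking trace
```

The nice part of the OpenAI-compatible convention is that switching between the hosted option and the fully local one is just a change of base URL and model name.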
Going back to that local run in LM Studio: it actually thought for one minute and 55 seconds to get through the whole problem. Now, I am using an NVIDIA 5090 GPU, so I'm pretty much using the most top-of-the-line consumer GPU you can get, and it did it all in about two minutes. We can see how much it thought through before finally, at the end, giving us the logic problem as well as its answer, and it did it at a rate of 63.42 tokens per second. The nice thing about using LM Studio is that you only need to be connected to the internet to download the model; once the model is on your computer, you can unplug the internet, turn off your Wi-Fi, and it would have given me the same response in the same amount of time. I can be completely offline, and it's sending nothing to the cloud. If I were worried about privacy or data protection or anything like that, I could run these models completely offline with no issues using something like LM Studio, and I would know for certain that none of my information is getting back to any sort of cloud provider. So that's how you can use DeepSeek right now if you want to: you've got deepseek.com, you've got the DeepSeek mobile app, you can run it straight from Groq using a distilled Llama version, or you can use LM Studio and run any variation of the distilled models on your own local computer.

And if you think the story for DeepSeek ends there, well, the day I'm recording this, January 27th, that same company dropped new research, this time on an AI image generation model. This new model is called Janus Pro 7B. So not only are they creating pretty much state-of-the-art new large language models at cheaper cost and doing it a lot faster, they now appear to be doing the same with AI image generation. Now, I haven't played with this model myself yet, so I don't know too much about it; it literally came out while I was in the process of recording this video. But if we look at their benchmarks here, we can see this new Janus Pro is the blue bar with the white lines on it, and it pretty much outperformed SDXL, Stable Diffusion 1.5, PixArt-alpha, DALL-E 3, SD3 Medium, and Emu3-Gen, which I believe is Meta's AI image model, in both of these benchmarks. So not only are they disrupting the large language models, they're now also trying to disrupt the AI image generation models. As I learn more about this Janus model, we'll talk about it in future videos. I just wanted to add it to the mix because it's that same DeepSeek company that has people freaking out right now.

But there you have it; that's the lay of the land. You're going to hear a lot of people talking about DeepSeek and DeepSeek R1. It's going to be in the news more and more; there are going to be a lot of videos and a lot of X posts about it. I wanted to break down the facts and what we know, along with some opinions from other people, and make sure you have the lay of the land, know exactly what it's about, and can speak intelligently on it. There are probably a few things I've left out; I'm sure they'll get mentioned in the comments if I did. But that is DeepSeek R1, and now DeepSeek Janus, and that's why NVIDIA and the stock market were affected by it; at least, that's why people are claiming the stock market was affected. I think it's just a short-term thing, but we'll see how it all plays out.
Hopefully you enjoyed this video, hopefully you learned something new, and hopefully you feel more looped in. If you like breakdowns like this and you want more AI news, more tutorials, and to learn about more cool AI tools, make sure you like this video and subscribe to this channel, and I'll make sure a lot more of this kind of stuff shows up in your YouTube feed. And of course, if you haven't already, check out futuretools.io. This is the site where I curate all the cool AI tools I come across, and I keep the AI news up to date on a daily basis there. I also have a free newsletter where I share just the coolest tools and the most important news from the week; it hits your inbox twice a week, and if you sign up, you'll also get free access to the AI Income Database, a database of cool ways to make money using various AI tools. Again, it's all free; you can find it over at futuretools.io. Thank you so much for hanging out with me. Thanks for nerding out with me. Really, really appreciate you. Hopefully I'll see you in the next one. Bye-bye.
