Speaker 1: Hello all, my name is Krish Naik, and welcome to my YouTube channel. So guys, recently the talk of the town is all about DeepSeek, and I hope you have heard about the DeepSeek-R1 model and the kind of buzz it is making. All the American AI companies are worried: Google, OpenAI, so many big companies that have spent huge amounts of money to train these amazing MoE models, other LLMs, and reasoning models. The DeepSeek model that has just come out is very cost-efficient, both for inference and especially for training; the cost of training has come down a lot. I know everybody has been reading about this on the internet, in blogs, in LinkedIn posts and so on, but I really wanted to make a dedicated video to help you understand all of it. We'll talk about who DeepSeek is, which company this is, how they trained this model, and why the training was so much more efficient compared to other models. We'll also discuss the techniques they used to train it. Along with this, we'll see a demo of DeepSeek Chat, and I'll also show you some of the problems my developers ran into. As soon as we understood that this LLM is quite efficient in terms of inference cost, we started using it to develop the Gen AI product we are working on. One of our developers, Mahendra, started exploring it and found a lot of questions DeepSeek was not answering properly. When I say not answering properly, I mean it simply keeps quiet, okay? I'll show you that too.
Please make sure you watch this video till the end, because there is a lot to discuss. So let me quickly share my screen. Here is the DeepSeek website. You can get access to DeepSeek-V3 here; just click on "Start Now". There are various models you'll be able to see over here: DeepSeek-V3, DeepSeek-V2.5, Qwen 2.5, Llama 3.1. The DeepSeek ones are developed by them, and this chart compares the performance of all these models. The blue bars are DeepSeek, and if you look at the metrics, it scores clearly higher than the other models here. Now that this DeepSeek model has arrived, I'd definitely like to show you this one thing, which is like a message to all the American companies. Okay, jokes apart, guys. Now let's go ahead and understand these specific models. And in case you don't know, today itself DeepSeek has also announced a new multimodal model called Janus-Pro. This model is mainly for image generation, and according to the reported metrics it is better than DALL-E, which OpenAI came up with. So let's understand each and everything. What exactly is DeepSeek? It's a Chinese AI research lab established in 2023, and they actually took up this project as a side project. It comes out of a quant company. A quant company basically means a firm of people who are experts in mathematics, physics, and all these hardcore problems, people who have done PhDs and so on. That is the reason they could crack this problem. And here you can see it has rapidly emerged as a competitor to giants like OpenAI with this DeepSeek-R1 model.
Despite being a newcomer, it challenges established players through remarkable cost efficiency and innovation. We'll try to understand how it managed to be so cost-efficient. If you have been watching my videos on generative AI and LLMs, I've already made videos on how an LLM is trained. There is a very important step called supervised fine-tuning (SFT), right? And in their first experiment, they skipped this supervised fine-tuning step completely. Why am I telling you this? Let me go to DeepSeek's GitHub page. The best part is that they have open-sourced all the techniques they used. This is a major blow to companies like OpenAI, right? The name is OpenAI, but most things are closed over there. So here they have announced DeepSeek-R1, and you can see its performance. The blue bars are DeepSeek, and then you have the OpenAI models, o1 and o1-mini. You can see the benchmarks: AIME 2024 (competition math), Codeforces (competitive coding), GPQA Diamond, MATH-500, MMLU, and SWE-bench Verified. It is on par with or better than OpenAI's models on most of these metrics. MMLU is a little bit lower than OpenAI's, but overall it is very good. Now, how were they able to do this? They have made this announcement, and the research paper has also been uploaded. Here you can see the paper link. Just click it and you get the entire PDF. You don't even have to look for it elsewhere; they have put it directly on GitHub. The title is "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning."
This is the most important phrase: reinforcement learning. Now let's understand what they did. Before this, when companies trained an LLM, they applied supervised fine-tuning on top of the pretrained base model. But here, instead of that, they applied reinforcement learning directly. If you know about reinforcement learning, an agent gets better and better by interacting with its environment and learning from the rewards it receives. Here, this approach allows the model to explore chains of thought for solving complex problems, and it resulted in DeepSeek-R1-Zero. That model demonstrated capabilities such as self-verification, reflection, and generating long chains of thought (CoTs). Chain of thought basically means the model works through a problem in intermediate steps, building on what it has already worked out, before giving the final answer. Because of this, the reasoning capabilities of these LLMs are amazing. So that is the first point: post-training. You have stages like pre-training and post-training, right? In post-training, they applied large-scale reinforcement learning directly to the base model. See, we build the base model up to one specific stage, and then on top of it they applied reinforcement learning. Because of this, the performance increased by a lot, and, as I understand it, training also became more efficient. Then they say: "We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities." So this is the pipeline they used. Again, I'll repeat it.
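To make "applying reinforcement learning directly" a bit more concrete: the R1 paper describes rule-based rewards (an accuracy reward that checks the final answer and a format reward that checks the reasoning is wrapped in tags such as `<think>`) and an algorithm called GRPO, which scores each sampled answer relative to the other answers sampled for the same prompt. The toy sketch below shows only those two ideas. The reward values, the exact `<think>`/`<answer>` regex, and the function names are assumptions for illustration, and the actual policy-gradient update is left out entirely.

```python
import re
import statistics

# Assumed response format: <think>reasoning</think> <answer>final answer</answer>
THINK_ANSWER = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def rule_based_reward(response: str, reference: str) -> float:
    """Format reward plus accuracy reward (the 1.0 values are illustrative assumptions)."""
    match = THINK_ANSWER.match(response.strip())
    if match is None:
        return 0.0  # malformed output earns nothing
    format_reward = 1.0
    accuracy_reward = 1.0 if match.group(1).strip() == reference.strip() else 0.0
    return format_reward + accuracy_reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward by the mean and std of its group (same prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]  # all samples equally good: no learning signal
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    reference = "42"
    group = [
        "<think>6 * 7 = 42</think> <answer>42</answer>",
        "<think>6 + 7 = 13</think> <answer>13</answer>",
        "The answer is 42",  # correct but untagged: fails the format check
        "<think>7 * 6 is 42</think><answer> 42 </answer>",
    ]
    rewards = [rule_based_reward(r, reference) for r in group]
    advantages = group_relative_advantages(rewards)
    for response, reward, adv in zip(group, rewards, advantages):
        print(f"reward={reward:.1f} advantage={adv:+.2f} {response[:30]!r}")
```

Samples that beat their group's average get a positive advantage and would be reinforced; the rest get pushed down. Because the baseline comes from the group itself, no separate value model is needed, which is part of why this style of RL is cheap.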
The pipeline incorporates two reinforcement learning stages aimed at discovering improved reasoning patterns, plus two SFT stages. So for the final DeepSeek-R1 model, it is not replacing SFT; it adds these reinforcement learning stages on top of it, okay? The second point: smaller models can be powerful too. They have also applied distillation. As you can see here: "We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance." In other words, what the larger model has learned is transferred into smaller models. All of these models are available on Hugging Face, and you can also try them with Ollama. I've already done it, and I'll create a dedicated video on it later. So those were the major points. Now, coming back to the key innovations and strategies. First, cost efficiency: they say they spent somewhere around 5 to 6 million dollars to train the foundation model. On the other hand, companies like Google, Facebook, and OpenAI have spent many times that amount. Some say 100 million, some say a billion; I won't claim exactly 100 times, so let's just say it is many times more. Inference-wise, operational costs are also significantly lower, enabling scalable deployment. I use it myself, and my developers are using it too; they say it is super fast. And if I talk about inference cost, I think for 1 million tokens OpenAI charges somewhere around 50 to 60 dollars, whereas DeepSeek charges in cents, around 60 to 70 cents. That is what I saw in some of their documentation. Next point: hardware constraints as a catalyst.
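In the R1 paper, "distillation" is fairly simple: the large model generates reasoning samples, and smaller open models such as Qwen and Llama are fine-tuned on those samples with ordinary SFT. The sketch below shows only the data-building half of that idea, under heavy assumptions: `fake_teacher` is a hypothetical stand-in for calling the large model (here it just adds two numbers and is wrong 20% of the time), and the keep-only-correct filter is a simple form of rejection sampling, not necessarily the exact curation DeepSeek used.

```python
import random
from dataclasses import dataclass


@dataclass
class SFTExample:
    prompt: str
    target: str  # full teacher response, reasoning included, used as the SFT label


def fake_teacher(prompt: str, rng: random.Random) -> tuple[str, str]:
    """Hypothetical stand-in for the large model: returns (reasoning, answer)."""
    a, b = (int(x) for x in prompt.split("+"))
    answer = a + b if rng.random() < 0.8 else a + b + 1  # wrong about 20% of the time
    return f"I add {a} and {b}.", str(answer)


def build_distillation_set(prompts, references, samples_per_prompt=4, seed=0):
    """Sample the teacher several times per prompt; keep only correct responses."""
    rng = random.Random(seed)
    dataset = []
    for prompt, ref in zip(prompts, references):
        for _ in range(samples_per_prompt):
            reasoning, answer = fake_teacher(prompt, rng)
            if answer == ref:  # rejection sampling: drop wrong answers
                dataset.append(SFTExample(prompt, f"<think>{reasoning}</think> {answer}"))
    return dataset


if __name__ == "__main__":
    data = build_distillation_set(["2+3", "10+5"], ["5", "15"])
    for example in data:
        print(example.prompt, "->", example.target)
```

The small "student" model would then be trained on `data` exactly like any other supervised fine-tuning set, so it learns to imitate the teacher's reasoning style rather than discovering it through RL on its own.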
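To put the quoted inference prices in perspective, here is a small calculation. The per-million-token prices are the rough figures quoted from memory in the video, not official price lists, and the 10-million-token volume is an arbitrary assumption. With those ranges, the implied gap is roughly 71x to 100x.

```python
# Per-million-token prices (USD) as quoted from memory in the video; not official pricing.
OPENAI_PER_MILLION = (50.0, 60.0)
DEEPSEEK_PER_MILLION = (0.60, 0.70)


def cost_range(tokens: int, per_million: tuple[float, float]) -> tuple[float, float]:
    """Low and high cost in USD for a given number of tokens."""
    low, high = per_million
    scale = tokens / 1_000_000
    return scale * low, scale * high


def price_ratio_range() -> tuple[float, float]:
    """Smallest and largest OpenAI/DeepSeek price ratio implied by the quoted ranges."""
    smallest = OPENAI_PER_MILLION[0] / DEEPSEEK_PER_MILLION[1]  # cheapest OpenAI vs priciest DeepSeek
    largest = OPENAI_PER_MILLION[1] / DEEPSEEK_PER_MILLION[0]  # priciest OpenAI vs cheapest DeepSeek
    return smallest, largest


if __name__ == "__main__":
    tokens = 10_000_000  # hypothetical monthly volume
    for name, prices in (("OpenAI", OPENAI_PER_MILLION), ("DeepSeek", DEEPSEEK_PER_MILLION)):
        low, high = cost_range(tokens, prices)
        print(f"{name:<8} ${low:,.2f} to ${high:,.2f}")
    lo_ratio, hi_ratio = price_ratio_range()
    print(f"OpenAI costs roughly {lo_ratio:.0f}x to {hi_ratio:.0f}x more")
```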
Now, how were they able to do this? It is all about innovation and strategy, guys. Due to US export restrictions, Chinese firms like DeepSeek could not access Nvidia's top-tier H100 GPUs; that is what is being reported. Many bigger companies are building bigger and bigger models on those GPUs. Instead, DeepSeek used H800 and A800 chips. So how could they train such a model on these chips? Because they brought real innovation into how the model is trained. Next, architectural breakthroughs: Mixture of Experts (MoE), which activates only a subset of the model for each token, and Multi-head Latent Attention (MLA). These are the techniques they used, and that is why they could train it even though their GPUs were not as powerful. The next thing is that they have open-sourced every detail of how they did it in that research paper. Now just imagine the kind of competition that will come. Other companies will study it too. Today, Sam Altman also said that DeepSeek-R1 is an impressive model, but don't worry, we are coming up with something even better. Because of this, there will be more research, more competition, and better models. And the best part is that this will help all the users and companies that use these services, because costs will come down. That is what I expect: in the future, costs should keep decreasing. These are the remaining points, which you can see here. I'll put this GitHub link in the description of this video, okay?
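The phrase "activates only a subset of the model" is the core of Mixture of Experts: a small router scores every expert, but only the top-k experts actually run for a given token. The sketch below is a generic top-k softmax gate in plain Python; the toy experts, router weights, and k=2 are assumptions, and DeepSeek's own MoE adds refinements (such as shared experts) that this leaves out.

```python
import math
from typing import Callable, Sequence

Vector = list[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: list[Vector], experts: list[Expert], k: int = 2):
    """Route x to the top-k experts and mix their outputs by the gate weights."""
    # Router score for each expert: dot product of x with that expert's router row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # softmax over the selected experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only the selected experts are ever evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Four toy "experts" that just scale their input by different factors.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
    out, chosen = moe_forward([1.0, 0.0], router, experts, k=2)
    print("experts used:", chosen, "output:", out)
```

With, say, 64 experts and k=2, each token pays for only about 2/64 of the expert compute, which is how a model can have a very large total parameter count while staying cheap per token.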
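Multi-head Latent Attention mainly saves memory at inference time. As described in DeepSeek's papers, instead of caching a full key and value vector for every head, it caches one compressed latent vector per token (plus a small positional key) and re-projects keys and values from it when attention is computed. The arithmetic below compares cache sizes; all dimensions are illustrative assumptions, not figures from the video.

```python
def mha_cache_per_token(n_layers: int, n_heads: int, head_dim: int) -> int:
    """Standard multi-head attention: one key and one value vector per head, per layer."""
    return n_layers * 2 * n_heads * head_dim


def mla_cache_per_token(n_layers: int, latent_dim: int, rope_dim: int) -> int:
    """MLA-style cache: one compressed KV latent plus a small positional key, per layer."""
    return n_layers * (latent_dim + rope_dim)


if __name__ == "__main__":
    # Illustrative sizes only (assumed).
    layers, heads, head_dim, latent, rope = 60, 128, 128, 512, 64
    mha = mha_cache_per_token(layers, heads, head_dim)
    mla = mla_cache_per_token(layers, latent, rope)
    print(f"MHA: {mha:,} elements/token  MLA: {mla:,} elements/token  ratio: {mha / mla:.1f}x")
```

With these assumed sizes, the per-token cache drops from 1,966,080 to 34,560 elements, roughly 57x smaller. That leaves more GPU memory for longer contexts and larger batches, which matters most on less powerful hardware.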
Now, the next thing: let's try it out. I have access here; you can just go to chat.deepseek.com. Honestly, I don't think I'll be using ChatGPT much anymore, because this is pretty good. Let's see how it does the reasoning; this is the beautiful part. Let me ask: "Please write me a blog on agentic AI, around 500 words." Now watch how the reasoning happens. I'll also be making follow-up videos on this soon. Our team is already exploring it for the Gen AI product we are building. We have some concerns; we'll test them out and then show you those kinds of videos too. Okay, now the thinking has started: "First, I should define agentic AI clearly." Then, "Next, I should outline this," then this, this, and this. See, it is reasoning by itself, and this is the power of reasoning. Just imagine using this kind of model in my agentic AI application, and what kind of work it would be able to do. "Ethical considerations"... And about this model: I know it was created by a Chinese AI company, so my guess is the Chinese government has told it, just keep your mouth shut, don't say more than what is required. Keep that in mind, because I am going to show you one example. My developer Mahendra showed me this; he sent me the screenshot and told me, "Krish sir, please make sure you mention these points too." Now here comes the output: "The dawn of autonomous decision-making," autonomy, adaptability, all these things are there.
You can see how nicely it goes through this whole process. So the thinking part, the reasoning part, is quite efficient. See, over here: "I should also touch on the future of agentic AI." I did not ask it to talk about the future of agentic AI, but it makes sure to give more than what was asked. So this is good. Now let me ask one more question: "Mention all the states of India." I don't know whether you have tried this or not, but let me press Enter. And here you go: "Okay, I need to list all the states. It starts with..." Now watch what happens. "There were 29 states and 7 union territories, then after the reorganization..." and so on, which is fine. But one state is not showing up. And suddenly: "Sorry, I'm not sure how to approach this type of question. Let's chat about math, coding, and logic problems instead." Everybody knows the issue regarding Arunachal Pradesh, right? When that kind of question comes up, it is not going to give you an answer, because it is a sensitive topic in China-India relations. Similarly, if you ask about certain leaders, it is not going to answer. So this is clearly controlled by China. It is basically saying, this is my limit, I can only speak up to here. So I hope you liked this video, guys. All the information will be given in the description. There will be more tutorials, and in my agentic AI batch I am going to include creating agentic AI applications using this DeepSeek-R1 model. We will try to do that with Ollama.
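For anyone who wants to try DeepSeek-R1 locally with Ollama, here is a minimal sketch. It assumes Ollama is installed and running on its default port, that a distilled R1 model has been pulled (the `deepseek-r1:7b` tag is an assumption; use whichever size you pulled), and that the model's reasoning comes back inline in `<think>...</think>` tags, which may vary between Ollama versions. `split_reasoning` and `ask_local_r1` are hypothetical helper names.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think>...</think> block from the final answer."""
    match = THINK.search(text)
    if match is None:
        return "", text.strip()
    answer = text[: match.start()] + text[match.end():]
    return match.group(1).strip(), answer.strip()


def ask_local_r1(prompt: str, model: str = "deepseek-r1:7b") -> tuple[str, str]:
    """Send one non-streaming request to a local Ollama server; return (reasoning, answer)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["response"]
    return split_reasoning(reply)


if __name__ == "__main__":
    reasoning, answer = ask_local_r1("Write a 3-sentence intro to agentic AI.")
    print("REASONING:\n", reasoning)
    print("ANSWER:\n", answer)
```

Keeping the reasoning separate from the answer is handy in an agentic application: the agent can log or inspect the thinking while passing only the final answer to the next step. Running locally also means prompts never leave your machine.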
I'm also planning to include DeepSeek-R1 in the batch, because this is what innovation is all about. But there are still some concerns with DeepSeek: when you use their hosted service, all the information you enter is stored on servers in China. That's one. And we don't know how they are going to use that data. So that was it from my side. I hope you liked this video. I'll see you in the next video. Thank you, take care, bye-bye.