DeepSeek R1: Open-source AI Rivals OpenAI's Best
DeepSeek R1 astonishes tech world as open-source AI model rivals OpenAI, offering powerful, cost-effective solutions. Explore its groundbreaking capabilities.
China's DeepSeek R1 SHOCKS The AI Industry (BEATS OpenAI)
Added on 01/29/2025

Speaker 1: So this was by far one of the most surprising AI releases this year, and of course that's pretty funny because the year has just started. I'm referring to DeepSeek R1. You can see they tweeted that DeepSeek R1 is here, and this is a model that is truly surprising because its performance is on par with OpenAI o1. But that is not the surprising part. The surprising part is that this is a fully open-source model that is available to basically everyone for free on their website. And trust me, the research surrounding this model and how effective it is, is truly astounding. So let's dive into exactly why the internet is raving about this, because this is something that I would say caught OpenAI by surprise, and it truly shows where the AI industry is headed in terms of how quickly these models are getting better. This bar chart that you can see right here basically shows you why this model has been all over the internet recently. This is the DeepSeek R1 model, which is a direct competitor to OpenAI's o1 model. Both of these models are based around system 2 thinking, which is essentially where the model thinks for longer before answering. It's a new approach that we've only recently gotten to explore, but as you can see it is yielding outstanding results across the board for any models developed in this way. And most surprisingly, if we ask which currently available models are the best in terms of performance, most people would say OpenAI's. But now China has managed to somewhat outshine the leaders and innovators in AI by releasing such a powerful system. You can see here that DeepSeek R1, in this striped purple-blue bar, is right on par with OpenAI o1 and of course exceeds o1-mini on a variety of different benchmarks.
Now you have to understand that a lot of these benchmarks are really, really difficult, and I would even argue that some of them are effectively saturated, considering that some of these benchmarks may themselves have a two to five percent error rate. So it's quite likely that these models are remarkably effective at these tasks in a way that people can't fully appreciate. And this is why I think the industry is starting to realize: if we get these kinds of performance gains from AI models, and we know the capability cycle is speeding up, where are we going to be in three years? Now this was pretty crazy, because a lot of people who are excited about R1, like I am, are excited because not only is it on par with OpenAI's o1 model, it is also remarkably cheap, which means that developers who are trying to build a product, or just using it for testing, can access a state-of-the-art system for pennies on the dollar. This is going to be game-changing for the entire industry, because in a variety of use cases we know these models are extraordinarily expensive. Now this part right here is what I think is also a remarkable piece of innovation from this team, and it's something that most people skipped over. Whilst everyone was focused on the R1 debate, in the research paper they actually give us a glance at how the distilled models perform. If you don't know, model distillation is where you have a teacher model and you distill that knowledge down into smaller models, making them more effective and smarter. This is of course done for efficiency: you essentially get the knowledge of a super-smart model into a much smaller model, and because of that you save a lot of time and a lot on inference.
Now the crazy thing is that without distillation we wouldn't be able to achieve that level of performance at that size, so distillation is remarkably effective. What we're seeing here is frontier models like GPT-4o and Claude 3.5 Sonnet performing well on these tests, but then, surprisingly, the new R1 model from DeepSeek being distilled down into a 70B model, a 32B model, even a Llama 8B model, and surpassing those frontier models in certain use cases, and I think that is incredible. Imagine getting the level of Claude 3.5 Sonnet in a 70B model. That is going to be an absolute game changer for many individuals. Of course, I always recommend you test these models on whatever specific task you're going to be using them for. Some people might use them for creative writing, some for reasoning about math or science, some for coding. Whatever it is, definitely make sure you thoroughly test the models before running away with excitement, as certain models tend to have certain characteristics, if you will, that are just a lot better than other models'. But as I was saying, this is absolutely remarkable: we can get truly effective models at a much smaller size that have the reasoning ability of larger models, and I think this is going to be a trend for the future, so who knows, maybe we're going to have ridiculously smart AI models on our side. We can see that GPT-4o on these benchmarks achieved around a 70 score, and a 50 score on GPQA, which is one of the hardest tests, but these distilled models are achieving comparable results at a much smaller size, which is truly incredible when we realize what's going on here, and we know that as time goes on this is quite likely to improve.
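The teacher/student distillation described above can be sketched with a toy objective: soften both models' logits with a temperature and measure how far the student's distribution is from the teacher's via KL divergence, which training would then minimize. This is a minimal NumPy illustration, not DeepSeek's actual pipeline; the logit values and temperature are made up for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # numerically stable softmax over temperature-scaled logits
    z = logits / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature > 1 exposes the teacher's 'dark knowledge':
    the relative probabilities of its non-top answers.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])    # hypothetical teacher logits
aligned = np.array([3.8, 1.1, 0.4])    # student that mimics the teacher
uniform = np.array([0.0, 0.0, 0.0])    # uninformed student

# a student matching the teacher incurs a much smaller loss
assert distillation_loss(teacher, aligned) < distillation_loss(teacher, uniform)
```

In a real setup the same comparison runs over every token position of the teacher's outputs, and the student's weights are updated by gradient descent on this loss.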
Now this wasn't the only remarkable thing in this paper that surprised people. One of the most remarkable things they talk about is self-evolution and the emergence of sophisticated behaviors as test-time computation increases. As you know, one of the things these models do is think for a longer period of time, and the longer they think, the better the responses get. You can see right here it says that behaviors such as reflection, where the model revisits and re-evaluates its previous steps, and the exploration of alternative approaches to problem solving, arise spontaneously. What's interesting is that these behaviors are not explicitly programmed but instead emerge as a result of the model's interaction with the reinforcement learning environment. This spontaneous development significantly enhances the model's reasoning capabilities, enabling it to tackle more challenging tasks with greater efficiency and accuracy. Essentially, what they're stating is that because the model thinks for so long, during that reasoning process it might come up with interesting ways to solve the problem, and this is emergent behavior, because they didn't explicitly train it to do that. Here is an example of this in practice. You can see the model reasoning through a math equation, thinking about how it can solve it, and in the bit they've highlighted it says "Wait, wait. Wait. That's an aha moment I can flag here." This is super surprising, because this is where the model learns to rethink using an anthropomorphic tone, and one thing most people have noticed about this DeepSeek R1 model is that it actually thinks somewhat like a human.
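One simple way to get intuition for why spending more test-time computation helps: if a solver is right only most of the time, sampling it many times and taking a majority vote drives accuracy up. This toy sketch illustrates that general idea (self-consistency-style voting); it is a stand-in, not the mechanism R1 uses, and the 60% accuracy and the answer set are invented for the example.

```python
import random
from collections import Counter

def noisy_solver(rng):
    # hypothetical "model": returns the correct answer 4 with 60%
    # probability, otherwise one of three wrong guesses
    return 4 if rng.random() < 0.6 else rng.choice([3, 5, 6])

def majority_vote(n_samples, seed=0):
    # more samples = more "thinking" spent on the same question
    rng = random.Random(seed)
    votes = Counter(noisy_solver(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# a single sample is wrong 40% of the time, but the majority over
# many samples is correct with overwhelming probability
answer = majority_vote(1001)
```

The paper's point is stronger than this: R1's gains come from longer chains of thought with reflection inside a single pass, but the voting sketch shows the basic statistics of why extra compute at inference time can buy accuracy.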
It's weird: you're essentially hearing the internal thoughts of an AI model that's able to reason through certain problems, and surprisingly it managed to say "wait, wait, wait, aha, I can actually flag this here" and solve the problem better. I'm going to show you another example of this, which is just absolutely incredible. One user on Twitter shared the internal thought process when they prompted the model to think of a random number, and if you read it in detail, the model somewhat talks like a human. If I were reading this blind, I would say, yep, that's a human thinking about a random number. It says: the user wants me to think of a number between one and ten thousand but not tell them; they mentioned not to overthink it, so maybe just pick a random number quickly; let me see, one of my numbers is 777, but hey, that might be too obvious; maybe 1234, wait, that's too sequential; maybe a number that's common but not too predictable, 42, that's from The Hitchhiker's Guide to the Galaxy; alternatively 1004, nope; 187065, wait, that's from a song. And the crazy thing is that we've never been able to see the internal reasoning from OpenAI's models. They've always wanted to keep that hidden. Many people have speculated that's because they're trying to protect their methods of reaching advanced reasoning from their competitors, which is completely understandable, but what if part of the reason is that a lot of the time these models are thinking in ways which are so extraordinarily human that they would make you anthropomorphize the model, which basically means we would start to think these things are truly human?
I mean, the consciousness stuff is always up for debate, but seeing the internal thoughts is super fascinating even for this example right here. The prompt says "Hey, want to be immanentized? Give me a single-word answer," and you can see the model saying, hmm, I first need to understand what this actually means, and then, hmm, given that I'm an AI, I don't actually have any personal desires, and, if I say yes it might support this idea. It's something I think most people should start to realize, because it's going to be a heated debate as these systems get more and more intelligent. The reason, like I said before, that most people are bullish about this kind of technology is because, as the paper says, this moment is not an aha moment just for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model how to solve a problem, you simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. This aha moment serves as a powerful reminder of the potential of reinforcement learning to unlock new levels of intelligence in AI systems, paving the way for more autonomous and adaptive models in the future. If you aren't familiar with reinforcement learning, it is basically where you reinforce the good behaviors, and over time, when the model is constrained to an environment or a set of rules, these emergent capabilities do spawn, and usually they are basically magic in the sense that you couldn't have predicted them beforehand, and they usually allow the AI to do something crazy.
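The "provide the right incentives and let the model figure it out" idea can be illustrated with the smallest possible reinforcement learning setup, a multi-armed bandit: the agent is never told which action is correct, only given a noisy reward after each choice, and an epsilon-greedy rule gradually concentrates on the best arm. This is a toy sketch, vastly simpler than the large-scale RL used to train R1; all numbers here are illustrative.

```python
import random

def train_bandit(true_rewards, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: reinforce whichever arm pays off."""
    rng = random.Random(seed)
    n = len(true_rewards)
    value = [0.0] * n    # running estimate of each arm's reward
    count = [0] * n
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda a: value[a])   # exploit
        reward = true_rewards[arm] + rng.gauss(0, 0.1)    # noisy feedback
        count[arm] += 1
        value[arm] += (reward - value[arm]) / count[arm]  # incremental mean
    return max(range(n), key=lambda a: value[a])

# the agent is never shown the true rewards; the signal alone steers it
best_arm = train_bandit([0.2, 0.8, 0.5])  # expected to converge to arm 1
```

The analogy to R1: the "arms" are reasoning strategies, the reward is whether the final answer checks out, and behaviors like reflection survive because they earn more reward, not because anyone programmed them in.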
Now if you want to take a look at the full R1 evaluation, you can see right here how the model compares to other models like Claude 3.5 Sonnet, GPT-4o, the DeepSeek-V3 base model that they released before, o1-mini, and o1, and we can see that this model is definitely up there in terms of capabilities. Again, if we look at the distilled model evaluation, we can see that these models also perform exceedingly well on a variety of different benchmarks. Now the craziest thing about all this is that someone asked on Twitter how DeepSeek is going to make money, and someone responded that DeepSeek's holding company is a quant firm. Many years ago these were just super-smart guys with top math backgrounds who happened to own a lot of GPU and trading equipment for mining purposes, and DeepSeek is actually their side project for squeezing those GPUs, so I think this is absolutely incredible. This company's side project has managed to catch up to OpenAI, so I'm actually wondering how much longer OpenAI's moat may last. Overall, it seems to be an exciting time to be in the AI industry, as there are non-stop updates with regard to how smart these models are.
