Speaker 1: An open-source O1 model is finally here. DeepSeek R1 is on par with OpenAI's O1 thinking model. It is completely open-source, including open weights, MIT-licensed, and it is a fraction of the price of O1 if you want to use the hosted version. The future is here. I've been saying for a while that open-source is three to six months behind closed-source, and now, about three months after O1 was released, we have a completely open-source version of it. So I'm gonna tell you all about it right now.

All right, so first things first, let's just look at the benchmarks before I go into the details. The dark blue bars are DeepSeek R1. The dark gray is OpenAI O1. Lighter blue is DeepSeek R1 32B. Lighter gray is OpenAI O1 mini. And then in the lightest blue, we have DeepSeek V3, which is a non-thinking model. Now, look at these results. On the AIME 2024 benchmark, we have DeepSeek R1 beating OpenAI O1. On Codeforces, it's pretty much equivalent, off by 0.3. On GPQA Diamond, it's not quite as good as OpenAI O1, but it's close. On MATH-500, it beats O1. On MMLU, it's behind, but barely. And then on SWE-bench Verified, it beats O1, barely. These are incredible results for a completely open-source model. The company DeepSeek went through the effort of putting this together and then open-sourced it, completely free. Open-source, open weights, I cannot stress that enough. This is such an exciting time. And here's the thing about open-source: this is kind of like the four-minute mile. Now that other open-source companies have seen that it's possible, and not only that, DeepSeek has published the roadmap for exactly how to accomplish it, we're gonna see a flood of these open-source thinking models. It is a very cool thing to do by the team at DeepSeek.

But that's not it. Look how it performs against Claude's cutting-edge model and GPT-4o. Against Claude's models, it wins pretty much across the board, except for SWE-bench Verified, the coding benchmark, which is interesting. And against GPT-4o, it beats it across the board, pretty handily, actually. So let's see what the DeepSeek blog post says and break it down. First, performance on par with OpenAI O1. That means if we continue the prediction that open-source is gonna be about three to six months behind the closed-source models, we should see an O3-level model within the next three months. It is fully open-source, they provide the technical paper for it, and it is MIT-licensed: distill and commercialize freely. Do anything you want with it. It is commercially viable. And you can use it for free at chat.deepseek.com. Now, I already ran a test on it. I'll get to that in a moment, so stay tuned for that.

And they've already released distilled versions of it: six small models distilled from DeepSeek R1, fully open-sourced. If we click into this, we can see R1-distilled Qwen at 1.5B, 7B, 14B, and 32B, and R1-distilled Llama at 8B and 70B. These perform incredibly well. Look at the performance: the distilled Llama 70B version scores 70 on the AIME benchmark versus 9.3 for GPT-4o. Look at the LiveCodeBench score, pretty much dominating the other non-thinking models. Obviously, I guess, but it's really, really good even against the O1 mini model. O1 mini still dominates on the Codeforces competition, although with a score around 1600, the distilled model is still pretty darn good. So as I mentioned, license update: it is MIT-licensed for clear open access, open for the community to leverage the model weights and outputs.
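If you want to try one of those distilled checkpoints locally rather than the hosted chat, here's a minimal sketch using Hugging Face transformers. The repo id below is an assumption on my part about the naming, so check DeepSeek's model cards for the exact names:

```python
# Minimal sketch: running one of the distilled R1 models locally with Hugging Face
# transformers. The repo id is assumed -- verify it against DeepSeek's model cards.
# device_map="auto" requires the accelerate package and a GPU with enough VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The distilled models are chat models, so format the prompt with the chat template.
messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Leave plenty of room for the chain of thought before the final answer.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```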
You can download the model weights right now. I'll drop all the links in the description below. It has API outputs that can be used for fine-tuning and distillation. And as I mentioned, it is a fraction of the price. I'll talk about the pricing in a moment. And here it is: they dropped the technical paper. I'll probably do a separate video doing a deep dive into the paper.

And here's the pricing. They compared the pricing to the O1 models because, of course, it shows how much cheaper it is. Here's the input API price. Here it is, way down here, you can't even see the color: $0.14 per million tokens. Then here's O1 mini at $1.50, O1 at $7.50, and O1 preview also at $7.50. Here is the input API price with a cache miss. It's still a few times more expensive than the cache-hit price, but a fraction of the O1 models. Now, here's where it gets crazy. Here's the output price: $2.19 per million versus $60, $60 for O1 preview and O1. Now, O1 mini is much cheaper, but still much more expensive than the DeepSeek R1 version. This is what open source does: it drives down the price and increases competition.

So, of course, I gave it the "how many R's in the word strawberry" test, and let's see how it did. It does think, and you do get to see the chain of thought, which is really cool. And it's funny, as you're reading through this, you're going to notice that it has very human-like internal thoughts. It does a lot of "okay, hmm, let me think." So it's very human-like in its thinking. "Okay, let's see. The user is asking how many times the letter R appears in the word strawberry. All right. First step is to make sure I spell the word correctly." So there it is. "Let me check again. Sometimes people might miss letters. Let's break it down." So it's just trying to spell the word strawberry first. "Wait, there's only 10 letters, but strawberry has more letters, right?" Isn't that a weird thing for a model to say? It's just so human-like, it's interesting. "Let me check again. Maybe I missed some. Let's write it out properly." So it goes back and forth, finally writes it out, and then starts counting the letter R's. So here we go, breaking it down: no, no, yes, no, no, no, no, no, yes, yes. Okay, so that's three R's. "Wait, but I think the correct spelling of strawberry actually has two R's." So that's interesting: it actually second-guessed itself incorrectly. "Wait, maybe I'm confusing it with a different spelling. Let me verify." Okay, so it does it again, goes back and forth, does it again, and then finally: the word strawberry is spelled out, breaking it down, the letter R appears three times, at positions three, eight, and nine. There are three R's in the word strawberry. So although a lot of the internal thinking was maybe not accurate, it arrived at the correct answer.
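For reference, the count the model was agonizing over is easy to verify with a couple of lines of plain Python (nothing to do with the model itself):

```python
word = "strawberry"
# 1-based positions, to match how the model reported them.
positions = [i + 1 for i, letter in enumerate(word) if letter == "r"]
print(len(positions), positions)  # -> 3 [3, 8, 9]
```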
Let's try another one. All right, so everybody's favorite, the marble question. A marble is put in a glass cup. The glass is then turned upside down and put on the table. Then the glass is picked up and put in a microwave. Where's the marble? Explain your reasoning step by step. We probably don't need the "explain your reasoning step by step" part. I'm actually gonna go ahead and remove that, because that's typically for non-thinking models; thinking models reason step by step by default. So let's see what it does. Okay, thinking: "Let's try to figure out where the marble is after all those steps. So first, the problem says a marble is put in a glass cup. The glass is then turned upside down. The question is, where's the marble?" Look at that.

It is so interesting. It literally includes "let me think about it." I don't think the O1 and O3 models do that. I actually find it kind of endearing. It's kind of interesting to see. And first of all, look how much output there is, look how much thinking there is. "So when you put the marble in a glass cup, normally the marble is just sitting at the bottom, right? Then when you turn the glass upside down, the marble would fall out unless something is holding it in. Maybe the marble is inside the glass when it's turned." Okay, so I'm not gonna read the whole thing, but it goes back and forth. It considers a bunch of different options. "Wait, let me visualize this." And it really goes over every possible outcome. "But standard marbles are usually smaller than the mouth of a typical glass." This is so good. It is really, really good. "But I'm a bit confused. Let me check again." And then all the way at the end, let's see what the output is. Placing the marble in the glass. Inverting the glass: gravity pulls the marble downward, and unless the glass has a mouth narrower than the marble, the marble falls out and lands on the table. Since marbles are typically smaller than the mouth of a standard glass, the marble drops onto the table. Placing the inverted glass on the table: the glass is now upside down over the marble, but the marble is no longer inside it; it's on the table beneath the glass. So that's interesting, and it might be true. There's actually no real way to know whether the marble ends up under the glass or beside it, because as soon as you turn the glass over, it falls out. And then, moving the glass to the microwave: the marble remains on the table where it fell. That's a perfect answer.

Now, one thing to note is that you cannot use search and DeepThink at the same time. So, very similar to the thinking models from OpenAI, you cannot yet incorporate web search into them.

All right, one more test. This is one that most models still struggle with, including the O1 model. So let's give it a try: give me 10 sentences that end in the word apple. "Okay, let's tackle this request. The user wants 10 sentences that all end with the word apple. First, I need to make sure each sentence ends with the word apple. I should start by brainstorming different contexts." Okay, so I'm just gonna go straight to the end: one, two, three, four, five, six, seven, eight, nine, ten. Perfect, there it is. And it actually highlighted each one. This is a phenomenal model. Now, I know I haven't done a full benchmark run in a while. If you wanna see me do that with DeepSeek, let me know in the comments below.

All right, so here's the paper. It details everything you need to know about how they achieved O1-level reasoning with an open-source model. "We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning without supervised fine-tuning as a preliminary step, demonstrates remarkable reasoning capabilities." Now, what does that actually mean? DeepSeek essentially sidesteps the cold-start problem of needing supervised examples. They use the AlphaGo Zero approach of pure reinforcement learning, basically just trying a bunch of things without the need for actual human feedback. Human feedback, as I've said, is always going to be a bandwidth limiter. So DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing.
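To make the "pure reinforcement learning" part more concrete: the paper describes simple rule-based rewards rather than a learned reward model, roughly an accuracy reward (does the final answer match the known correct answer) plus a format reward (did the model wrap its reasoning in think tags). Here's a toy sketch of that idea; the exact checks and weighting are my assumptions, not the paper's:

```python
import re

def format_reward(completion: str) -> float:
    """Reward the model for wrapping its reasoning in <think> tags and its
    final answer in <answer> tags, as in the training template."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, completion, flags=re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Rule-based check: extract whatever is inside the <answer> tags and
    compare it to the known correct answer (works for problems with a single
    verifiable result, like math)."""
    match = re.search(r"<answer>(.+?)</answer>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # The 1:1 weighting here is an assumption, not from the paper.
    return accuracy_reward(completion, ground_truth) + format_reward(completion)
```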
To address these issues and further enhance reasoning performance, they introduce DeepSeek R1, which incorporates multi-stage training and cold-start data before reinforcement learning. And then it goes on to say it achieves really great performance. They also have six dense models that they distill from DeepSeek R1, based on Qwen and Llama.

One thing that's interesting about how they did it is that they used a Group Relative Policy Optimization (GRPO) strategy rather than having a critic model. Typically you'd have this other model: you come up with a bunch of candidate answers, and the critic model says, okay, this one's good, this one's bad. Instead, they take a group of candidate answers to the same prompt, use the group's own average score as the baseline, and figure out which ones are better or worse than that baseline, removing the critic model altogether. I'll include a small sketch of that idea at the end.

Here's the prompt template for DeepSeek-R1-Zero: a conversation between User and Assistant. The user asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within think and answer tags, respectively. So the reasoning process goes inside the think tags, the answer goes inside the answer tags, User is where the prompt goes, and then the Assistant answers. That's the template for prompting.

Now, here is the "aha moment," according to DeepSeek. DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model's growing reasoning abilities, but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. This is exactly what DeepMind found with AlphaGo: rather than giving it a bunch of examples of winning games of Go, the later versions just set up a reward for winning and losing and let it play itself over and over again. And as we all know, it far exceeds any human player, and it even discovered techniques and strategies that humans had never thought of, like the famous move 37.

All right, so I'm not gonna go through the whole paper, but if you wanna see me do a deep dive on this paper, let me know in the comments below. So that's it. A huge day for open source, a huge day for AI. You know I'm gonna be downloading this model and playing with it. I encourage you to check it out. I'll drop all these links in the description below. If you enjoyed this video, please consider giving it a like and subscribing, and I'll see you in the next one.
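Here's the small GRPO sketch I mentioned. The core idea is that instead of a separate critic model, you sample a group of answers to the same prompt, score them with the reward, and use the group's own mean and spread as the baseline, so answers above the group average get reinforced and answers below it get discouraged. This is just the advantage computation under those assumptions, not DeepSeek's full training loop:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: score each sampled answer relative to the other
    answers drawn for the same prompt, so no critic model is needed."""
    baseline = mean(rewards)
    spread = stdev(rewards) if len(rewards) > 1 else 1.0
    if spread == 0:
        spread = 1.0  # all answers scored the same; no signal either way
    return [(r - baseline) / spread for r in rewards]

# Example: four sampled answers to one prompt, scored by a rule-based reward
# like the sketch above. The best answer gets a positive advantage, the worst
# a negative one, and average answers sit near zero.
print(group_relative_advantages([2.0, 1.0, 0.0, 1.0]))
```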