Speaker 1: One of the hottest things in AI right now is reasoning models. These are LLMs that think to themselves before giving a final answer to the user, and you can get some insane results with this kind of setup. There are already a lot of reasoning LLMs out there, like o1 and o3 from OpenAI, and QwQ, an open-source one from Qwen. And now DeepSeek is getting in on this game. They have just released an incredible new open-source reasoning LLM called R1, and they have a bunch of different versions, so you can go run a variation yourself, or use something that is literally more powerful than OpenAI's o1, Claude 3.5 Sonnet, basically every LLM out there.

There's so much hype about this model right now, so I had to go check it out myself. I was actually planning on making a different video for today, but I tried out R1, I'm now bought into the hype, and I'm going to show you exactly why. And this isn't just going to be another video like other YouTubers are making. I'm going to show how you can run R1 yourself, and we're going to use it in Bolt.diy, our open-source AI coding assistant, to show its power by building an incredible app really, really fast. So let's dive right in.

So here we are in the official blog release for DeepSeek's R1, on their website. I'm going to quickly cover this, just to talk about the benchmarks and the pricing, and then we'll dive into how to run DeepSeek R1 yourself and test it out as an AI coding assistant with Bolt.diy.

First of all, we see immediately that this is a fully open-source model, and it is MIT licensed, so you can use it commercially, completely freely, which is just absolutely incredible. And it is a thinking model. In this example from their chat platform, you can see that it thinks through its response before it gives us the final answer. You can go play with this right now at chat.deepseek.com, a ChatGPT-like interface for playing around with R1. I'll have a link to this, as well as all the other resources I'm covering, in the description. You just go to their chat platform, enable DeepThink, and when you ask it a question, the grayed-out text is the model thinking to itself before it gives the final answer in the bolder text below. And this kind of reasoning approach almost always gives better answers. That's why we're seeing so much hype right now around models like o1, o3, QwQ, and now R1. But R1, honestly, seems to be performing better than all of those other reasoning models, and also better than any regular LLM. Absolutely fantastic.

And yeah, we can look at the benchmarks right here. Take these with a grain of salt, because benchmarks aren't always the most trustworthy, and these are provided on the DeepSeek blog, so they obviously have a reason to be a bit biased. But just look at how incredible this is. We'll also see later on that it's very much backed up when we test it with Bolt.diy. I'll zoom in here so you can get the full picture: we've got DeepSeek R1 on the left-hand side, and it's being compared to o1, o1-mini, and DeepSeek's previous best model, DeepSeek V3. And just look at the performance here.
It performs as well as o1 in every benchmark, sometimes a little better, sometimes a little worse. Absolutely incredible, because it is a smaller model, it's open source, they've got versions you can run yourself, and it is way cheaper. It's just so cool to see how insanely powerful it is.

So let me get out of this and go back over to the blog, because the other thing that is extremely cool with R1, and this is what actually makes it possible to run the model yourself, is what they did with their competitors' models. Let me zoom in on this a little bit, because it is just so cool. They took their competitors, Qwen and Llama, and made smaller versions of R1 based on those models. These are essentially distilled models: Qwen and Llama fine-tuned to reproduce R1's reasoning. You can download them yourself on Ollama or Hugging Face, and I'll show you how to do that in a bit, so you have all the power of this insane reasoning model right on your computer.

And the coolest thing is, even their 14 billion parameter model, the Qwen version, is on par with o1-mini, which is incredible, because a 14 billion parameter model is generally pretty underwhelming and usually nowhere near something like o1-mini. And this is a model almost anyone can run on their computer. 14 billion parameters is, of course, a lot in the grand scheme of things, but relative to so many other models we see, like Llama 70B, Claude Haiku, or Claude Sonnet, it's small; those models are so much bigger. So this is just incredibly cool.

The other thing to note is the pricing: just look at how cheap this is. $0.55 for every million input tokens, and $2.19 for every million output tokens. To put that in perspective, a call that uses 2,000 input tokens and 5,000 output tokens works out to roughly a cent. And if those numbers don't mean a ton to you, just look at this graph. The benchmarks can be a little biased, I'll give them that, but with the pricing, the numbers cannot lie; this is very clear-cut. It is just so much cheaper to use R1 than even o1-mini, while performing about as well as o1, which is dozens and dozens of times more expensive. And compared to other models like Claude 3.5 Sonnet, it is still cheaper; Sonnet is $3 for every million input tokens. Such an affordable model. I show Claude 3.5 Sonnet here because we're going to be comparing R1 to it as a coding assistant in a bit. I'll also have a link in the description to this benchmark; it's one I haven't seen thrown around on YouTube or the rest of the internet as much, but it's very cool, going into a bit more detail than the one on the DeepSeek blog.

All right, so let's talk about the different versions of DeepSeek R1 and how you can run most of them on your own computer. We're here in the Ollama page for DeepSeek R1; this is one of the platforms where we can run these models ourselves, and I'll show Hugging Face in a little bit as well. To install Ollama, you just go to their homepage, ollama.com. It's super easy to install on any operating system, and then within a terminal, as I'll show in a little bit, you can run any of the models we're talking about here. When you go to the DeepSeek R1 page, you can see the tags for all the different versions of this model. It comes in so many different sizes, and once you've pulled a tag, you can also call it from code (a minimal sketch follows below).
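As promised, here's a minimal sketch of chatting with one of these local tags from Python, using the official ollama client library. This is an assumption-laden example, not something shown in the video: it assumes Ollama is installed and running, and that you've already pulled the 14B tag.

```python
# Minimal sketch: chatting with a locally pulled R1 distill through the
# `ollama` Python client (pip install ollama). Assumes Ollama is running
# and you've already pulled the tag with: ollama pull deepseek-r1:14b
import ollama

response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Make me a FastAPI endpoint."}],
)

# The reply contains the model's <think>...</think> reasoning inline,
# followed by the final answer (more on separating the two in a moment).
print(response["message"]["content"])
```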
Back to the tags: the biggest one, the 671 billion parameter version, is the most powerful, and it's the one you're using if you go through the DeepSeek or OpenRouter API. It is a whopping 400 gigabytes, super, super big, and that's a Q4 quantization, so it's not even the biggest version of this model you could download. Even at 400 gigabytes, it's still way too big to run on your computer unless you're investing tens or hundreds of thousands of dollars into your setup. So, generally, you want to go with some of the smaller versions. But like we saw, even the 14 billion parameter version, which almost anybody can run on their computer, is on par with o1-mini. It is absolutely insane the kind of power we can get, and we can download and run it right here through Ollama.

If you scroll down on the Ollama page, you can see what each size is based on: the 1.5 billion parameter version is based on Qwen, the 8 billion on Llama, Qwen for most of them, and Llama again for the 70 billion. And then they've got the benchmarks, just like we saw in the blog post.

So if I want to run the 14 billion parameter version on my computer, I just select it, like I already have here, and copy this command. Then, assuming I already have Ollama installed, I go into a terminal and paste the command. It installs the model if I'm running it for the first time; otherwise, it drops me right into the chat. So I can ask it a question like "make me a FastAPI endpoint," just something super generic, because I just want to show what the output looks like.

The output starts with a thinking tag, so everything we see until the closing thinking tag is not the final response to the user; it's just the model thinking to itself. If you're building an AI agent where you only want to display the final output, you would take the output from the LLM and remove everything within these thinking tags. You can see the closing tag right here, and everything after it is the final response from the model. (I'll drop a small code sketch of that tag-stripping at the end of this section.) It's very similar to what we saw on the DeepSeek chat platform, and to what you'll see with other reasoning models like o1 or QwQ.

So that's running it on Ollama; it was that easy to bring it onto my computer, and now I can use this model right within Bolt.diy, which we're going to use later to test R1 as an AI coding assistant. You can also install it through Hugging Face. I know a lot of people love Ollama, but plenty of others prefer platforms like LM Studio and llama.cpp, and you can get the model for all of those through Hugging Face. It actually wasn't the most intuitive for me to find how to install it for LM Studio and llama.cpp through Hugging Face, so I want to show you that really quick. If you go to the main R1 page on Hugging Face, which I'll have linked in the description, and scroll down a little, you'll see all the distilled versions we just saw on Ollama. I can click on the download link to bring me to that model's page on Hugging Face. So let me do that. And before I even click on "Use this model," you can see that for local apps, there's only vLLM.
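Here's that small sketch of the tag-stripping step. It assumes the model wraps its reasoning in <think>...</think> tags, the way the R1 output above does; the sample string is made up for illustration.

```python
import re

def strip_thinking(text: str) -> str:
    """Remove the model's <think>...</think> reasoning block so only the
    final answer remains; useful when an agent should show just the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Example with a made-up R1-style response:
raw = "<think>The user wants a simple endpoint...</think>Here is a FastAPI example."
print(strip_thinking(raw))  # -> "Here is a FastAPI example."
```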
Back on Hugging Face: what I can do is click on "quantizations," and there are 43 models available here. The one I'd recommend selecting is the one from Unsloth. It doesn't really matter a ton; you could use the LM Studio one as well, but I'll click on this one. It's the quantized version of the Qwen 14 billion parameter model, the same one I just installed with Ollama. I'll click on it, and now if I click on "Use this model," boom, there we go: we instantly have access to install it for LM Studio and llama.cpp as well. The process is very similar to installing it for Ollama, just like we saw running it in the terminal. So that's how you install it through Hugging Face, and that can also work with Bolt.diy.

But now it is time to test this out in Bolt.diy. I'm going to take the most powerful version of the model, the 671 billion parameter one, through the DeepSeek API; you could also use OpenRouter. (For reference, a hedged sketch of what calling the DeepSeek API directly looks like is at the end of this section.) And I'm going to build a relatively complex app with it to show how powerful it is, and to show that it actually kicks Claude 3.5 Sonnet in the butt. We're going to build the same thing with Claude 3.5 Sonnet as well, and you'll see that the DeepSeek reasoner is a good amount better. It's so cool.

All right, so at this point, we've talked a lot about R1 without actually showing how powerful it is, and that's what I'm going to do right now with a simple demonstration in Bolt.diy. This is our open-source AI coding assistant based on Bolt.new; it's essentially the open-source version of Bolt.new, and you can use pretty much any large language model you could possibly want. I can even go through Ollama and use all of the models I've downloaded, including the distilled versions of R1 that I just showed downloading through Hugging Face and Ollama. But in this case, because I want to use the most powerful version of R1, I'm going right through the DeepSeek API.

What we're going to build today is a chat interface for an n8n AI agent that I built for my oTTomator Live Agent Studio. This is kind of a base agent, a super simple example of an agent I can hit through an API endpoint. With a single prompt, I'm going to pretty much one-shot this: build a full front end with conversations and chat history management, talking to that API endpoint in n8n. And we're going to compare it to Claude 3.5 Sonnet as well, so you can see how well R1 does, and how much even really good LLMs like Sonnet can fall apart on more complex requests like this.

As for this chat app I'm going to build with R1: I actually built something very similar in one of the previous videos on my channel, using Lovable. Generally, when you use a platform like Lovable or Bolt.new, you get somewhat better performance than with Bolt.diy, because they're optimizing for a single LLM, Claude 3.5 Sonnet, and Sonnet is usually the king of AI coding. So I created this app that works really, really well, and it has all the integrations with Supabase that I'll show a little of when I show you the full prompt we're going to use. But honestly, it doesn't look the best. It has all the functionality; I can send messages and go through the conversation history, but it doesn't look the best. I mean, you can't even distinguish between my messages and the messages from the agent.
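Here's that hedged sketch of hitting the full 671B R1 through the DeepSeek API. It assumes DeepSeek's OpenAI-compatible endpoint and the "deepseek-reasoner" model id as documented on their platform; verify the base URL, model id, and the reasoning_content field against their current docs before relying on them.

```python
# Minimal sketch: calling the full R1 (671B) through DeepSeek's
# OpenAI-compatible API. Assumes DEEPSEEK_API_KEY is set in the environment
# and that the base URL and model id below match DeepSeek's current docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Make me a FastAPI endpoint."}],
)

message = response.choices[0].message
# Per DeepSeek's docs, the reasoner returns its chain of thought separately
# from the final answer, so no tag-stripping is needed here.
print(message.reasoning_content)  # the "thinking" portion
print(message.content)            # the final answer
```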
Back to that Lovable app: the toggle doesn't really look the best either, and just leaves kind of a blank bar. Things don't look the best. What we're about to build with R1 is going to look even better than what we built with Lovable, and I've never really been able to truly say that with Bolt.diy. This is the first time we have a model impressive enough, with R1, that I can say it with confidence.

So this prompt right here is what we're going to use to build our chat interface for the n8n agent; I'll have a link to it in the description of the video. It's very similar to the prompt from my last video, which built what we just saw with Lovable. The only thing I changed is the URL, because this time I'm using an n8n agent instead of something I had running locally with Python; the rest is generally the same. I'm telling it how to work with the API for the n8n agent. I'm giving it my Supabase public credential and the project URL. Never, ever give sensitive information to an LLM; this is my public key in Supabase, which is why I can show it here and feed it into the LLM. Then I'm telling it how to understand the conversations through the messages table, and giving it some features that I want. I go into this prompt in a lot more depth in my other video; here I'm just showing you very quickly what I have. We're building something that's actually reasonably complex, something a lot of LLMs are going to completely fall apart on, but R1 is going to kick it in the butt. And at the end I'm even asking for Supabase authentication.

So I'm going to copy all of this, go into my Bolt.diy, full-screen it again, and paste it in. I'll take out the header at the top, and then use this right away as my prompt. This is going through DeepSeek; you could use OpenRouter as well, and I could run things locally, like I showed earlier with Ollama, but I want the best model here. So let's send it in right away.

I also have some settings selected here. If you go into Bolt.diy and open the settings pane (it's in the bottom left, by the way; I think my face is covering it), then go to "Features," you'll see I have all of these features enabled. You can copy my settings if you want. Using the auto-selection of a code template is usually a good way to get a better project right out of the gate, so I recommend these settings. Anyway, you can see that under the hood it has already started creating our app, so I'm going to pause and come back once R1 is done creating the full application.

Just a minute later, we now have our full application ready to try out. And as promised, this is a single shot. This is the prompt I just showed you in the Google Doc, the template it chose, and then it just added a few files, installed all the dependencies, and ran the application for me. This entire thing took only about 9,000 tokens, which at the pricing we saw earlier works out to only a cent or two. It was incredibly cheap to make this whole thing. And I haven't even tested it yet, because I'm that confident that with R1 we can build something that works right out of the gate. So I'm going to try this with you right now for the very first time. (Before we open it, one quick aside: a hedged sketch of the kind of API call this front end makes to the n8n agent is below.)
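Here's that sketch. To be clear, the webhook URL, request fields, and response key are all hypothetical placeholders; the real values come from the prompt in the Google Doc and your own n8n webhook setup. It's only meant to show the shape of the interaction.

```python
# Hedged sketch of the front end's call to the n8n agent endpoint.
# The URL, JSON fields, and response key below are hypothetical placeholders;
# the real ones are defined in the n8n workflow and the prompt from the video.
import requests

N8N_WEBHOOK_URL = "https://your-n8n-instance/webhook/invoke-agent"  # placeholder

def ask_agent(query: str, session_id: str) -> str:
    """Send one user message to the agent and return its reply."""
    resp = requests.post(
        N8N_WEBHOOK_URL,
        json={"query": query, "session_id": session_id},  # hypothetical fields
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("output", "")  # hypothetical response key

print(ask_agent("Hello", session_id="demo-session-1"))
```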
So I'm going to click on this button on the top right to open it in a new window so we get a full screen. I already have a test account created in my Supabase, so we'll make sure the authentication is working. And boom, there we go, it is working, and we've got all of the conversation history. Wow, look at this. This looks so good. Compared to what we saw with Lovable, this is so much nicer, and we can toggle the sidebar; it looks really good. We can even see what it looks like in mobile view. Look at that. Maybe I need to make it a little bigger here... there we go. This looks good even on mobile. This is beautiful.

And I can start a new conversation here... well, I guess there isn't a button for that, so maybe I have to add one. It's not going to be perfect; I can't expect it to be absolutely perfect. Let me refresh, because I want a blank conversation... I guess I can't do that, so let me just open this back up and say something like "hello." Boom, let's see what we get back. All right, it's getting a response. It looks like it duplicates my message, which is a little iffy, so there are a couple of things to fix, but overall this is working so well. We instantly got a response from our agent. So I can say, "what can you help me with?", and this updates in real time. As long as I fix a couple of little glitches, like the duplicate message and being able to start a new conversation, this is going to be absolutely perfect. All of this with a single shot from R1.

Also, really quickly, before we go over to Claude: with a couple more messages, I was able to fix all of the issues we just saw. So with R1, it's easy not just to one-shot something really awesome, but to keep iterating with it to fix the things AI coding assistants are always going to mess up a little. Let me open the full-screen window and log back into my test account, and I'll show you everything it fixed. We've still got all the conversations on the left-hand side, and everything still looks good, but we had those duplicates earlier. Now, first of all, I can click on "New" to get a new conversation, which we were missing before. Then I can say something like "hello," and we'll see that the duplicate user message is gone, so that's fixed as well. And there's a logout button, which we were also missing before, so I can log out and get back to this screen.

I didn't want to just show you a final, perfect result. I want to be very frank: it's not perfect. R1 is not perfect, and I want to be honest about that. But we were still able to do a ton with just one shot, and the couple of issues I showed you were very quick to fix. It's not going to be that easy with Sonnet. So let's dive right back in and try building this with Sonnet as well.

So here we are, the moment of truth. Let's see how R1 in Bolt.diy compares to the model that basically every AI coding assistant out there uses: Windsurf, Bolt.new, Lovable, they all use Claude 3.5 Sonnet. I have the exact same prompt, working with the same Supabase and the same agent; nothing has changed. So we're going to give it a go and see what it can do. And remember, I showed you exactly what R1 gave me in one shot, on my very first try.
I did some fixes later, but I specifically showed you what it looked like after one shot, because that's what we're comparing it to right now with Claude 3.5 Sonnet. So I'm going to let it rip through everything here; I'll pause and come back once we have our app, and we'll compare the two.

Here we go, Claude 3.5 Sonnet finished. This is our result after a single-shot prompt, and it actually looks decent overall, but it is definitely not as nice-looking as R1's. And we're missing some things, like creating a new conversation and logging out, just like we were initially with R1. I can give it a chance with a couple more prompts afterward, like I did with R1, but let's test this one shot first. I can click through the conversations, and it really doesn't look as good, though it's still working overall. Let's try sending a message and see what we get. I'll say "hi"... it's spinning, getting a response back... and nothing happened. It did not work. With R1, we were able to communicate with our agent on the first shot, and here with Claude, it didn't work. And trust me, I tested this a few times beforehand, just to make sure this wasn't an unlucky run for Claude, and I can confirm that in general it does mess up these kinds of things quite often. I could definitely start prompting it, saying "hey, you messed this up," and have it add all the other things R1 got that it missed, but in general, it's very clear that R1 did much, much better. This model is absolutely insane compared to everything else out there.

So I know that was just a super quick demonstration of R1, but I hope you can see how powerful this model is. It's just so cool that it's open source, and that we're not stuck using APIs from Anthropic or OpenAI; we can actually run these models ourselves, or use them through platforms like OpenRouter, and get access to all this power for super, super cheap. Let me know in the comments what you think. Try out this model yourself and tell me what kind of really cool things you're able to do with it in Bolt.diy and in the AI agents you build. If you appreciated this content, I would really appreciate a like and a subscribe. And with that, I will see you in the next video.