Speaker 1: Just 4 months ago, the o1-preview model shocked the entire AI world with its groundbreaking chain-of-thought ability, something no other model had done before. Now DeepSeek has dropped R1, a fully open-source model claiming to rival o1's power and even compete with Sonnet 3.5's coding capability. Big claims, right? So I put R1 to the test as a coding assistant in Roo Cline to see if it really stacks up against Sonnet 3.5, or if it's just another overhyped release. The results? Let's find out. There are multiple ways to find and use this model for free. The first one is really straightforward: just go to chat.deepseek.com, click on DeepThink, and all you have to do is create an account. But as an API, there are two ways right now, because this model is completely new. The first one is the DeepSeek API provider itself, which is really cheap and very easy to use. The second is using OpenRouter.ai, which is what I'm going to do for this video; they list DeepSeek R1, routed through the DeepSeek API provider. I'm pretty sure in a few days we will have more options beyond the DeepSeek API, providers like Hyperbolic, DeepInfra, and others. Well, the DeepSeek R1 benchmark they published looks impressive: they're comparing OpenAI's latest o1 model to DeepSeek R1, with very, very good results. I'm going to completely ignore it and look at the LiveBench.ai benchmark instead, which is honestly a more trusted benchmark than the one the provider published. In the global average, it seems DeepSeek R1 is just behind the o1 model and above o1-preview, the Gemini experimental model, and the Gemini 2.0 Flash Thinking experimental model. And Sonnet is down here, okay. I'm going to sort by the coding average. Sonnet is still holding on as the state-of-the-art coding model, by just one point.
In my opinion, that one point doesn't matter, because if you look at the Aider leaderboard, you'll find there is a new actual state-of-the-art model called DeepSeek R1, dethroning Sonnet 3.5. And look at this: DeepSeek R1 is number two, Claude 3.5 Sonnet is number three, and DeepSeek Chat V3 is number four. They're making a hamburger sandwich over here, with Sonnet 3.5 stuck in the middle. Before we start, let's make sure Roo Cline is updated. It seems I have a new update today, 3.2, which renames Roo Cline to Roo Code; I'm going to talk about that later. So in my Roo Cline, or its new name, Roo Code, I created a new profile configuration called DeepSeek R1: the provider is OpenRouter, I gave it the API key, selected the DeepSeek R1 model, and hit done. I didn't change anything else; I just created this new profile. With that, I'm ready to start. I'm going to start nice and easy, asking only for a terms-and-conditions page UI. This is my prompt, using DeepSeek R1 through OpenRouter: create a new page for terms and conditions, make it responsive using this project's theme switcher, make sure you can switch between the dark and light modes inside this project, and add that to the terms-and-conditions page. This is a very simple request, so we shouldn't have issues. And this is the result I got eventually. I ran into a couple of errors. For example, the first one: it imported the wrong header, which is very odd. I didn't ask it to import a header, but it seems aware that every page like this needs some sort of header to connect it to the website. I simply told it, please switch to this component's header. And it did. Then it got confused and implemented the theme switching the wrong way. I told it, please look at the main page of this website to understand how it handles theme switching, and the translation switching also.
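As an aside, the OpenRouter profile configured above boils down to an OpenAI-style chat-completions call. Here's a minimal sketch, assuming the model slug `deepseek/deepseek-r1` as listed on openrouter.ai and an API key from there; the helper name `buildR1Request` is mine, not from any SDK:

```typescript
// Minimal sketch of an OpenRouter chat-completions request for DeepSeek R1.
// OpenRouter exposes an OpenAI-compatible endpoint; "deepseek/deepseek-r1"
// is the model slug shown on openrouter.ai (assumption: slug unchanged).
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

function buildR1Request(prompt: string, apiKey: string) {
  return {
    url: OPENROUTER_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "deepseek/deepseek-r1",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sending it is a single fetch call (left commented to keep the sketch offline):
// const { url, init } = buildR1Request("Create a terms-and-conditions page", key);
// const json = await fetch(url, init).then((r) => r.json());
```

Roo Cline handles all of this for you once the profile is saved; the sketch just shows what's going over the wire.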
I have two languages in this app. And surprisingly, it understood this very fast. Believe it or not, sometimes Sonnet got this wrong when I was working with it, but R1 handled it easily. Then I got another error, which is a very normal error to get: it added a new translation key to the page that doesn't exist inside the translation files. So I gave it the error and told it we just need to add the translation to the Arabic file and the English file. It went ahead and did that very fast, and I didn't have to worry about wiring up the translation, which is amazing. First of all, a huge improvement: DeepSeek R1 is amazing from what I have seen so far. I started simple, but I will get to a much more complicated task in a moment. But let me talk about something I found out first. There is something important I want to cover before continuing this video: the pricing. Pricing is a very important factor for me as a developer who uses large language models every single day while coding. Let's take a look at the state-of-the-art 3.5 Sonnet model: $3 per 1 million input tokens, which is already higher than DeepSeek R1's pricing, and $15 per 1 million output tokens. If you add roughly $4 more for caching ($3.75 per million for cache writes and $0.30 for cache reads), that brings it to around $22 across the rates, per million tokens each. Okay, now let's take a look at DeepSeek R1: the pricing is $0.55 per 1 million input tokens and $2.19 per 1 million output tokens. And if you calculate caching, it's about $0.14 per million on a cache hit and $0.55 on a miss, which, trust me, is nothing; it's not a number to worry about. And if you're one of those lucky people who can run DeepSeek R1 locally, there are multiple ways to find it. The first is on Ollama, which is the go-to for downloading these kinds of models. We have the 7B, which is about four gigabytes, and the 8B, about five gigabytes.
And there is the 32B, which I have heard performs like o1-mini, and there is the 70B model. There is also the 671 billion parameter model, which is the biggest of them all; that one is no small download. And if you go to the DeepSeek R1 page on Hugging Face, you will find that every one of these variations is actually a distillation onto a large language model we already know: for example, the 70B uses the Llama model and the 32B uses the Qwen model. And all of them are 100% open-source models that you can use commercially. Let's test R1's UI capability. This page will be a good example. It's very basic; it was made with the Haiku model a couple of weeks ago. Let's see if R1 can design a very good Contact Us page, which I think is slightly harder than the last task. I'm going to add the header, the footer, and improve the UI. So let's get started. This is my prompt: I want to improve the look of this page, the Contact Us page; it's empty in terms of design. I want to use the header and the footer, and I want R1 to look at the logic of the main page, which has the theme switching and the translation logic, so it can understand how my project works in the background. I'm going to hit enter. All right, I noticed something weird. I don't know if it's the UI or the model itself, but when I give it a slightly large task (this is not a huge task), it hits an error applying the code to the file it's working on, and it seems stuck at that error. When I fixed it, it got stuck on the same error again. So what I did was break this task into smaller tasks: adding the footer is a small task, then improving the UI design of this page by adding certain icons. And I noticed that its capability of analyzing what it needs to do is far more impressive than Sonnet 3.5. And this is the result I got. I don't like it, but it's 100% functional.
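Back to the pricing for a second: plugging those list prices into a quick cost helper shows how wide the gap is. A sketch assuming the rates quoted above ($3/$15 per million for Sonnet, $0.55/$2.19 for R1 on cache misses) and a made-up heavy day of assisted coding, 2M input and 0.5M output tokens:

```typescript
// Back-of-the-envelope API cost comparison. Prices are in USD per 1M tokens,
// taken from the providers' pricing pages at the time of the video;
// treat them as assumptions, since prices change.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

const SONNET_35: Pricing = { inputPerM: 3.0, outputPerM: 15.0 };
const DEEPSEEK_R1: Pricing = { inputPerM: 0.55, outputPerM: 2.19 }; // cache miss

function cost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM;
}

// Hypothetical heavy day: ~2M input tokens, ~0.5M output tokens.
const sonnet = cost(SONNET_35, 2e6, 0.5e6); // 6.00 + 7.50  = 13.50
const r1 = cost(DEEPSEEK_R1, 2e6, 0.5e6);   // 1.10 + 1.095 ~= 2.20
console.log(`Sonnet: $${sonnet.toFixed(2)}, R1: $${r1.toFixed(2)}`);
```

Roughly a 6x difference at the same usage, before caching discounts are even factored in.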
It supports the dark theme and light theme without me even asking. There is nothing broken here, which is good. So for the next task, I should tell it to make a more creative and modern design, like I used to do with Sonnet 3.5, and see where that takes us. After some time, the UI result improved, giving me a colorful title. The icons changed over here; look, there is more of a hover effect. But this form is still the same; I will change it. And I noticed something very annoying about this model, the only thing so far: how slow it is to work with. I usually get responses very fast, but sometimes I have to wait like half a minute for a response, which is a lot for me. The latency, the average time for the provider to send the first token, is five to almost six seconds, which is, to be honest, a lot for me. And I noticed something else: your prompts and completions, code included, are going to be used to train their models, something I didn't notice at first until I hovered by accident over this icon. That's slightly annoying, but I'm not coding something that important, to be honest. Still, DeepSeek should give us a clear warning when we're using their API that they're going to use your prompts and your code to improve their models, right? This is the final result I got, and I honestly like the new design. I mean, the labels aren't the same, but that can be fixed if I tell it to continue, because I hit another error over here. Now I want to look at something else: its capability of analyzing code and providing good feedback. I will give it this question: what do you think about the server folder, and how could it be improved? The mode I am using is called Ask, and the model is DeepSeek R1. The Ask mode basically lets you ask questions about your code; it's built into Roo Cline, or its new name, Roo Code. I'm going to hit enter and see what we get.
And this is the response I got. It looked at the entire folder and realized I'm using the MVC pattern with Express.js, which is the typical pattern you use with an Express server. It looked at everything: the validation layer, the error handling. It wants me to make some improvements. Here it wants me to adopt Zod for validation, which I'm already doing, but it wants me to build it out in a schemas directory; centralize the error handling; use an RBAC middleware, basically a custom middleware to configure the RBAC system; and add Swagger, which is the most important piece of feedback I got, because I don't have Swagger documentation in this project yet. The reason is that it's an MVP project we have to ship by the end of this month to the client, so they can see it and give us feedback. Then security hardening, then testing structure. And Prisma best practices: it wants me to improve the current Prisma connection that I have. It's not a lot, to be honest; I expected more, but it seems I'm doing all right in terms of the MVC structure and the logic I wrote, which is good. But yeah, the feedback I got is really decent. There are a few things I feel I need to add, like Swagger and the security hardening; I need to improve the security right now. But that's it; I feel that's a good response. I found that up to the 32 billion parameter model, which is about 20 gigabytes in size, you need something like an RTX 3080; it's reasonable, you can run it on a GPU, but beyond that, I really don't know what you would need to run those models. My final opinion about this model: it's amazing. I mean, the o1 model was released in preview mode in September.
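For what it's worth, the "centralize validation in a schemas directory" suggestion from R1 looks roughly like this in an Express app. To keep the sketch dependency-free it uses a tiny hand-rolled checker instead of Zod, but the shape is the same: one schema per route kept in a schemas directory, wrapped by a single validation helper. All names here (`ContactSchema`, `validateBody`) are illustrative, not from my project:

```typescript
// Sketch of centralized request validation, in the spirit of R1's feedback.
// Real code would use Zod (z.object({...}).safeParse); this stand-in checker
// keeps the example self-contained.
type Check = (v: unknown) => string | null; // null = valid, string = error message

const isString: Check = (v) => (typeof v === "string" ? null : "expected string");
const isEmail: Check = (v) =>
  typeof v === "string" && v.includes("@") ? null : "expected email";

type Schema = Record<string, Check>;

// schemas/contact.ts: one schema per route, all living in one directory
const ContactSchema: Schema = { name: isString, email: isEmail, message: isString };

function validateBody(schema: Schema, body: Record<string, unknown>) {
  const errors: Record<string, string> = {};
  for (const [field, check] of Object.entries(schema)) {
    const err = check(body[field]);
    if (err) errors[field] = err;
  }
  return { ok: Object.keys(errors).length === 0, errors };
}

// Hypothetical Express wiring (commented out; assumes an `app` instance):
// app.post("/contact", (req, res, next) => {
//   const result = validateBody(ContactSchema, req.body);
//   if (!result.ok) return res.status(400).json(result.errors);
//   next();
// });
```

The payoff is that every route's validation rules live in one predictable place instead of being scattered through the controllers.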
And just months later, we have an open-source model that's almost at the o1 level, which is crazy: open-source models are catching up to the closed-source models in a very short amount of time, which is good for us. It's kind of bad for the companies, which I honestly don't care about. Right now we have a cheaper o1-class model; instead of buying OpenAI's $200 subscription, you can get the alternative for less than $5 per 1 million tokens of input and output combined. The o1 model is really expensive: it's $60 per 1 million output tokens, $15 per 1 million input tokens, and caching is $7.50. That's the o1 model, which is extremely expensive. I know OpenAI is trying to squeeze a lot of money out of the people using their models, but right now we have an open-source, o1-like alternative that's very good and also very cheap. In terms of coding, it's really good if you're ready to sacrifice speed: you will get a good result along with a very good analysis process. Yeah, go ahead, use R1 instead of Sonnet 3.5. But the response speed is currently the only thing preventing me from completely switching to DeepSeek R1. I will stick with Sonnet 3.5 for now, until R1's response speed improves, but I'm pretty sure I'm going to use it. On the backend, I didn't test it yet; I will test it tomorrow on the backend in the next video. So that's it for this video. If you found it respectful of your time and it provided you with valuable information, please hit the like and subscribe buttons; it helps my channel a ton and pushes this video to other people. Finally, thank you for watching, and see you in the coming video.