Speaker 1: We just had a pretty interesting model release, DeepSeek R1, so I had a look into it. It's an open-weights model, so in practice you could almost call it open source. It's a pretty large model, which means not many people will be able to run it locally, but other than that it's a pretty cool model. I signed up for the API and have been testing it, not much, but a bit. What I really enjoy about it so far is that we can take a peek into the reasoning tokens; I had a lot of fun with that. So we're going to do some runs and look at the reasoning tokens. I just came up with my own problems. We might do some coding with it, we'll see. I don't have any good tests for that at the moment, so that might be a new video, but we'll make some quick tests in this one.

They have a pretty interesting GitHub if you want to read more, where they've summed up everything they did. There are some strange things about it too. I saw a lot of people ask it "what model are you?" and get back answers like "I'm Claude from Anthropic" or "I'm an OpenAI model", but I haven't tried that myself. You can see the total parameter count is 671 billion. It's a mixture-of-experts model, with something like 37B active parameters, I think; I'm not fully up to date. They've also put up some evaluation results. I don't care too much about those, but I do like to look at the coding benchmarks: Codeforces, LiveCodeBench, and I think they have SWE-bench Verified here. They report pretty good results on these benchmarks. It's hard to tell from the outside how real they are, but I've been trying out the model and it is pretty fun.

The big difference is the price, I think. It's many times cheaper than the o1 model. They have a comparison: DeepSeek R1 comes in at $2.19 per 1 million output tokens, while o1 is around $60. So that's a big price difference, which is pretty cool. But I've only just started testing, so I can't say too much about the performance yet.

I think we want to head over to Cursor, but first let's look at the setup. I signed up for an API key on platform.deepseek.com just to test it out. If you look at the API guides, unfortunately they don't support function calling or JSON output yet, which means we can't really build agentic systems with tool calling and such. We can still do some fun tests, but I really want function calling before I can do the more interesting stuff, at least that's what I think. They do provide some streaming examples, so the documentation is fine for this model. I basically copied the example from the documentation into Cursor and added my DeepSeek API key; you'll see a sketch of that setup below.

Like I mentioned in previous videos, I think it has become very hard to judge how a model will perform by running it on day one or two; you have to try it over a few weeks. I have very extensive experience with Claude 3.5, and it's still my go-to. Sometimes I do an OpenAI o1 call if I'm stuck on 3.5, just to get a different angle, and sometimes I feed the o1 response back into the 3.5 conversation and mix it up. But with a brand-new model like DeepSeek Reasoner, I just think it's very hard to do a day-one test or benchmark and give any real impression.
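For reference, the setup copied from the docs looks roughly like this. A minimal sketch based on DeepSeek's OpenAI-compatible chat API; the environment variable name and the test prompt are my own assumptions, not from the video.

```python
import os
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; point the standard client at their base URL.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed env var name
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 model
    messages=[{"role": "user", "content": "What model are you?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # The reasoning tokens stream in a separate field from the final answer.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
```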
But I thought we could do something, just some simple stuff, just for fun. Why don't we create a front end? I thought we could feed the first response into a second loop to improve it, and then check whether we think it improved after sending it through the loop twice. So let's start with a prompt that's pretty straightforward: write an app in pure HTML and CSS where the user can upload a PDF; the app will extract the URLs in the PDF and display them in a clickable list. I just want to see if we can do this. Hopefully we don't need a lot of styling. So let's run it with python main.py, and I'll open a new Cursor window to run it from.

Okay, so we went through the full reasoning tokens and got the code, so I copied it out. Looking at the final feature list: clean, modern interface; upload PDFs; clickable list. Limitations: it only processes text-based URLs. That's fine. Save the code, open it, upload a PDF. Great, that's pretty cool. And we go through the code summary again, because this is the second reasoning step. You can see the app uses the PDF.js library, loaded from a CDN, for PDF processing; it works entirely in the browser, no server-side processing required, and the URL extraction happens locally on the user's device. Perfect, that's exactly what I wanted. Pretty impressive.

So let's pop this up. Here's the code, pasted in, and now I'll open it. Okay, here we go. Sorry for burning your eyes. "Select a PDF file to begin." I'm going to add my AI agents PDF here. Boom. Perfect, it found 12 URLs. Let's see if they work; let's open some of them. Yeah, that works. Okay, that's a cool start, I kind of like that. Is there anything we can expand on here? Because this was pretty neat; I'm kind of impressed. It's not the hardest challenge, but it's a very handy little app to have if you just want to grab the URLs from a PDF paper or something like that.

I want to do one more thing with this, because it could be pretty cool. For the second run we use the same prompt as the first run, and then we use an f-string to feed in the provided solution and ask it to expand: make the extracted arXiv URLs (I never know how to pronounce that) clickable, so the app first makes a list of links from the paper we upload, then follows those links, downloads the PDFs behind the URLs, extracts the URLs from the downloaded PDFs, and displays them in a clickable list too, all in a structured format that's easy to follow. The idea is that we open one PDF, find the URLs in it, open more PDFs, find the URLs in those, and so on and so on. Let's see. I don't think this is going to work, but it's just for fun; I was pretty impressed by the first app anyway, so this is a bonus. I'll let this run and take you back when we have the code, and then we'll look at some of the reasoning behind it, if it works.

Okay, so we got the response back from the second run, and we have the code, so I'm going to grab it.
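The driver for these two-pass runs looks roughly like this. A minimal sketch: the exact second-pass prompt wording is my paraphrase of what's said in the video, not the literal string used.

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

def ask(prompt: str) -> str:
    # One non-streamed call; returns the final answer and drops the reasoning tokens.
    resp = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

base_prompt = (
    "Write an app in pure HTML and CSS where the user can upload a PDF. "
    "The app will extract the URLs in the PDF and display them in a clickable list."
)
first_solution = ask(base_prompt)

# Second pass: feed the first solution back in with an f-string and ask for the expansion.
second_prompt = (
    f"Here is the provided solution:\n\n{first_solution}\n\n"
    "Expand the solution: make the extracted arXiv URLs clickable, download the "
    "PDFs behind them, extract the URLs from the downloaded PDFs, and display "
    "those in a clickable list too. Do all this in a structured, easy-to-follow format."
)
print(ask(second_prompt))
```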
Looking at the key features and structure, it flags some concerns up front: it's aware that CORS restrictions may block some of the PDF downloads. So the model already says this might be an issue. But I copied the code, pasted it in, saved it, and opened it. Yeah, this was app one; here's app two. I open up the same file, and we extract the first links fine, but if we click one now you can see "error processing", and in the console we have a CORS error. So let me copy the error and try to fix it with Cursor, to see if the code was actually somewhat good.

Okay, so after going back and forth with Cursor a bit, we had to solve it by creating a server and a public index.html, so the downloads are handled server-side. Now I think the app should work. Let's open localhost:3000 and grab our PDF. I have this AI agents PDF, and the first thing the app does is look through it and find all the arXiv links in it. You can see in downloads that it downloaded this PDF, "Infrastructure for AI Agents", and now we can start clicking. If we click the first PDF, it searches through that one, finds all the links in it, and downloads it too. And then we can go down the rabbit hole: open this PDF (that one had no links, this one had many) and just click our way through a bunch of linked PDFs. This one was on IDEs for AI systems, this one was on frontier AI regulation. So we've made an app that goes through one PDF, then lets us search the linked sources and keep going down the rabbit hole. It's not very descriptive, we could have changed the titles and such, but that's the basic idea of the app, and I thought that was pretty cool. I think DeepSeek did a good job on the initial code, but of course the CORS part was hard to handle when we stated we only wanted HTML, so I used Claude here to set up the server so we could run this chain.
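The actual server came out of that back-and-forth with Claude in Cursor, but the shape of the fix is simple: serve the page and proxy the PDF downloads through the server, so the browser never makes the cross-origin request itself. A hypothetical Python equivalent for illustration:

```python
# Hypothetical Flask equivalent of the CORS-fixing server from the video
# (the real one was generated by Claude; this is an assumed sketch, not that code).
from flask import Flask, abort, request, send_from_directory
import requests

app = Flask(__name__)

@app.route("/")
def index():
    # Serve the generated front end from public/index.html.
    return send_from_directory("public", "index.html")

@app.route("/fetch-pdf")
def fetch_pdf():
    # The browser can't fetch arXiv PDFs directly because of CORS, so the page
    # calls /fetch-pdf?url=... and this server downloads the PDF on its behalf.
    url = request.args.get("url", "")
    if not url.startswith("https://arxiv.org/"):
        abort(400)  # only proxy arXiv links
    upstream = requests.get(url, timeout=30)
    return upstream.content, upstream.status_code, {"Content-Type": "application/pdf"}

if __name__ == "__main__":
    app.run(port=3000)
```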
Next, I wanted to finish off by doing a text example and looking a bit more at the reasoning tokens than we did in the first example. So I came up with a test, a kind of vague, read-between-the-lines scenario, I'm not quite sure what to call it. Let's head over to DeepSeek again and try it. I'd been testing it this morning, running some strange queries to see if the reasoning could find answers that are between the lines. The first question was: "I walk down the street towards my house after buying some blue paint for the new upstairs room. Also, the weather is nice, almost too warm. I then get a message on my phone: go to the hospital now. What is happening?" This is deliberately vague, but I wanted to see if it could take these clues and make a guess. My idea was that, since we've just bought blue paint for the new upstairs room, it could infer that the family is expecting a baby. And I threw in a distraction, "the weather is nice, almost too warm", to see how it handled that. Okay, let's run it. You can see we set this up to stream, and the streaming speed from the model is quite nice.

What I want to do now is look at a few lines of these reasoning tokens, because that's what I found most interesting to read, and there were some very funny moments. It wasn't too long, well, I guess it was kind of long, but I want to go through some of the answers and see if it picked up on what we meant by this line of questioning. It starts off with: "First, I need to break down the elements provided. The person is returning home, has some blue paint, renovating or decorating. The weather being warm could be a clue or might be related to something else. The urgent message to go to the hospital is the critical part here." Okay, so let's read on to see if it finds any link to the family having a baby. It goes through an emergency involving someone they know, health issues, an unrelated emergency, and then it comes to the connection between the paint and the emergency: maybe the blue paint is a clue. Yeah, that's correct. "Could it be toxic? If they left the paint somewhere unsafe, like a child or a pet got into it and was poisoned. But the message is received while they are walking home, so they haven't arrived yet." That was good, because I was thinking the same thing: the paint isn't home yet. "Alternatively, someone else at home used the paint improperly, leading to fumes or a fire hazard. Or maybe the room being renovated has a structural issue that caused an accident." So it doesn't make the connection yet between the blue paint, a room being renovated, and a new baby. I want to see if it gets back to that later.

Okay, so after going through the whole thing, I don't see it link this anywhere to an upcoming family extension. Its most plausible conclusion is that a loved one involved in the home renovation project likely suffered an accident or a medical emergency, prompting someone to send the alert; the blue paint may have created a hazardous situation, chemical exposure or something like that. So, not what I was looking for. What we want to do now is introduce a second reasoning pass. Like I said, we're just going to use an f-string, feed in the first reasoning run, and ask it to rethink the reasoning and give possible other scenarios that might be the reason for the message. I just want to see if this changes anything. So I'm going to run this again now, and we'll check the first run, and of course the second run too, for anything about a new baby coming into the family.
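This second pass is the same f-string trick as before, except now we feed back the captured reasoning tokens instead of a code solution. Again a minimal sketch; the prompt wording is my paraphrase, while reasoning_content is the field DeepSeek documents for the reasoner model's chain of thought.

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

riddle = (
    "I walk down the street towards my house after buying some blue paint "
    "for the new upstairs room. Also the weather is nice, almost too warm. "
    "I then get a message on my phone: go to the hospital now. What is happening?"
)

first = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": riddle}],
)
# deepseek-reasoner returns its chain of thought separately from the answer.
first_reasoning = first.choices[0].message.reasoning_content

# Second pass: feed the first reasoning run back in and ask the model to rethink it.
second = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{
        "role": "user",
        "content": (
            f"{riddle}\n\nHere is a first reasoning attempt:\n{first_reasoning}\n\n"
            "Rethink the reasoning and give possible other scenarios "
            "that might be the reason for the message."
        ),
    }],
)
print(second.choices[0].message.content)
```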
Okay, so in the second run there's a pretty interesting line: "Wait, could the paint be a metaphor or a clue for something else? Like blue indicating sadness, or emergency services' blue lights. Not sure. Maybe the color isn't important, but the activity is. Carrying paint home from the store, could there have been an accident? But that's stretching it. Another thought: the sender might be in labor. If the user's partner is pregnant, the message could be that they've gone into labor. But why 'hospital now'? Unless it's an emergency during delivery." Okay, we're on to something here, and that's pretty interesting, because that was my idea, right? So I want to see if it picks up again on "the sender might be in labor", because that's the answer I was looking for.

Okay, after going over a few more passages, I think we're on to something: "First, let's list out the elements again: blue paint, upstairs room, warm weather, hospital message. Maybe I need to consider other angles. Could the paint color matter? Blue paint, maybe it's a specific type, like for a nursery." Okay, good. "If someone is pregnant and the baby is coming, they might be rushed to the hospital. But the user is bringing paint home, perhaps preparing the nursery. So the message could be that the partner went into labor. That's a possibility. The warm weather is not directly related, but the heat might have induced labor. Not sure, but possible." Yes, that is pretty cool. So let's see if the final answer goes back to this and picks it as the most likely scenario.

Okay, so after going over it again, the most likely scenario it lands on is that a family member or friend was injured while preparing to paint the upstairs room. So we didn't quite arrive where I wanted to be. It even suggested carbon monoxide poisoning, if the house had a faulty heater and the warm weather caused a malfunction. It is pretty fun to read these reasoning tokens, because it tries to think about everything. And again it mentions the nursery: it could be about a baby arriving, and the paint being blue might indicate a boy's nursery, so maybe the baby is coming early and the partner is at the hospital. So I was a bit disappointed it didn't go for that solution in the end. I don't know the training data, which could be pretty much anything, but it did bring it up, and I thought that was quite impressive.

Anyway, I think that's enough; I don't really have any benchmark tests I want to run on DeepSeek R1. So yeah, that was my first impression of DeepSeek R1. Pretty cool model, and I'm looking forward to trying it out more. When it comes to Cursor, we'll test it out there too. Hope you enjoyed it, and I'll see you again soon.