Speaker 1: Hello everyone, and welcome to Machine Learning and AI Tutorials. In this tutorial, we explain how to install and run the DeepSeek-R1 model locally on a Windows computer. DeepSeek-R1 belongs to the class of reasoning models. Reasoning models perform better on complex reasoning problems and tasks than classical large language models; complex reasoning problems are the kind that appear in mathematics, science, and coding. According to the information given on DeepSeek's GitHub page, the performance of DeepSeek-R1 is comparable to the performance of OpenAI o1. However, DeepSeek-R1 is released under the MIT license, which is a very liberal license that allows even commercial use. Over here you can see the graph showing the performance of DeepSeek-R1 compared with other models, and also a few distilled models, which I will explain in a moment. You can see that DeepSeek-R1 has performance similar to OpenAI o1, which is really interesting.

So what are the distilled models? To run the full DeepSeek-R1 model locally, you need more than 400 gigabytes of disk space and a significant amount of CPU, GPU, and RAM resources, which might be prohibitive on consumer-level hardware. However, DeepSeek has shown that it is possible to reduce the size of the original DeepSeek-R1 model while largely, though of course not completely, preserving its performance. Consequently, DeepSeek has released a number of compressed, or as they call them, distilled models, whose sizes range from 1.5B to 70B parameters. To install these models you need from roughly 1 to 40 gigabytes of disk space, which is quite reasonable. In this tutorial, we will explain how to install and run the distilled DeepSeek-R1 models. In a future video tutorial, once I free up space on my disk, I will also try to install the full DeepSeek-R1 model.

Okay, let's start with the installation. In this video tutorial, we are going to use Ollama to run DeepSeek-R1. Ollama is a very simple-to-use framework for running large language models locally. So let's start. First of all, go to the Ollama website; here it is. Click Download, select Windows, and finally click Download for Windows. Save the file in the Downloads folder and wait until the download finishes; the file is around 700 or 800 megabytes. Once the file is downloaded, we need to execute it, so go to the Downloads folder and run the Ollama installer. The installation should take no more than a minute or so: click Install, and you can see that Ollama is being installed. Here you need to be a little patient. After Ollama is installed, you will see the notification "Ollama is running"; in fact, Ollama runs in the background. If you expand the system tray, you will see the cute Ollama icon. You can also click "View logs" to open the log files and track what is happening.

Now let's verify that Ollama is running in another way. Open the Start menu, type "command prompt", and launch the Command Prompt. Over here, type "ollama", and if Ollama is properly installed, you should see its help response.
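For reference, the verification step in the Command Prompt looks like this (the installer normally adds Ollama to the PATH, so the command works from any directory):

    ollama

Running "ollama" with no arguments prints the usage text listing the available subcommands (run, pull, list, and so on). If you get a "not recognized" error instead, opening a fresh Command Prompt after the installation usually fixes it, since the PATH change only applies to newly started shells.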
Good. We can continue and install a DeepSeek-R1 model. To do that, go back to the Ollama website, search for deepseek-r1, and open the model page. Let's explain what's written over here. This is the official Ollama page for downloading the different DeepSeek models, and over here you can see the options. If you click here, you can see all the posted models. This model has 671b parameters and its size is 404 gigabytes; this is the largest model, that is, the full model. Another thing to consider is to click on Tags, where you can see the additional tagged models. There are different models over here, and some of them are quantized. For example, you can see the 671b fp16 model, a full-precision 16-bit variant, and you can see how large it is: 1.3 terabytes, so you would need a 2-terabyte disk to run it.

In this video tutorial, we are going to install one of the distilled models. To pick something in between the smallest and the largest model, let's click on the 14b model. Okay, here it is. After you click, you will see that an installation command is generated. I'm not going to execute this command directly, because if you do, Ollama will download and immediately run the model, and I don't want that. So I'm just going to copy it, go to the Command Prompt, and paste it there; however, instead of "run" I'm going to type "pull", which only downloads the model. Let's press Enter and hope that everything goes fine. So what's happening over here? The download process has started. You can see how much we need to download, around 9 gigabytes, and you can see the elapsed time; it's going to take around 2 minutes to download the model, so you need to be patient and wait.

After the model is downloaded, let's learn how to run it. First of all, you can type "ollama list" to list all the installed models, and you can see our model over here. To run the model, you just need to type "ollama run" followed by the name of the model. So let me copy the name, paste it here, and press Enter.

Now I'm going to open the Task Manager, look at the Performance tab, and analyze what's happening. First of all, the model is being loaded. Here you can see the CPU usage, here the memory consumption, and over here the GPU consumption. A nice thing to observe is that GPU resources are actually being used, which means that this model and Ollama will use my GPU.

Once the model is loaded, let's ask a question: "Who are you?" That's the first question. Wow, this is super fast, super, super fast. Next: "How to solve a quadratic equation?" Let's type something like this, plus 3x plus 90, and see what happens. Okay, this looks really good, and it's super fast. Let's check the GPU again: you can see the GPU resources are being used, which is super nice, and as the model runs, everything works as expected. This is truly amazing, to be honest.

Let's do one more test; this might be a slightly tricky question: how to solve the equation sin(x) + cos(x) = 0.45x? So let's type it in and see how the model solves this equation.
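An equation like this generally has no closed-form solution, so it must be solved numerically, for example with the bisection method that the model brings up next. For illustration, here is a minimal Python sketch of bisection applied to sin(x) + cos(x) = 0.45x. This is a hand-written sketch, not the model's actual output, and the bracket [0, 5] is an assumed starting interval:

    import math

    def bisection(f, a, b, tol=1e-10, max_iter=200):
        # Find a root of f in [a, b] by repeatedly halving the interval.
        # Assumes f is continuous and f(a), f(b) have opposite signs.
        fa = f(a)
        if fa * f(b) > 0:
            raise ValueError("f(a) and f(b) must have opposite signs")
        for _ in range(max_iter):
            m = 0.5 * (a + b)
            fm = f(m)
            if fm == 0.0 or 0.5 * (b - a) < tol:
                return m
            if fa * fm < 0:
                b = m           # root lies in the left half
            else:
                a, fa = m, fm   # root lies in the right half
        return 0.5 * (a + b)

    # The equation from the video, sin(x) + cos(x) = 0.45*x,
    # rewritten in root-finding form f(x) = 0.
    f = lambda x: math.sin(x) + math.cos(x) - 0.45 * x

    root = bisection(f, 0.0, 5.0)
    print("root:", root, "residual:", f(root))

Note that bisection finds only one root per sign-change bracket; since the line 0.45x crosses sin(x) + cos(x) more than once, each intersection needs its own bracket.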
This is a so-called transcendental equation in mathematics. Let's see whether the model can solve it. Okay, wow, this is very fast. Nice. It is even proposing the bisection method, which is really amazing. Let me close this window and analyze the reasoning. The model writes something like: "I need to find the value of x that satisfies this. I know that the sine and cosine functions are periodic, so it seems like there might be multiple solutions. Graphing both sides and looking for their intersections..." Okay, good.

So next, let's write: "Write a Python code implementing the bisection method for solving nonlinear equations," and see how well it does. You will often find this task as an interview question if you apply for a job. And indeed, it is writing Python code; over here you can see that the model is using my GPU, and it's super fast. Amazing.

Finally, let's try to run a slightly bigger model. You can exit the current session by pressing Ctrl+D. Then let's go back to the website and try, for example, the 32 billion parameter model, and see how fast it is. Let me copy the command, go to the Command Prompt, and execute it with "ollama pull". Again, it's going to take maybe five, maybe even eight minutes to download this model, so let's wait.

Now let's run this model. Type "ollama list" to see the installed models, and then "ollama run" followed by the model's name. Let's look at the Task Manager: you can see the memory consumption over here, and the GPU consumption. Wow, this model of course has a larger memory footprint on my GPU, which is expected. Let's try it: how to solve a quadratic equation... or actually, let's do something more interesting. Let's construct a polynomial equation set equal to zero and see. Okay, now we can see that the generation is a little bit slower; however, the model is still working, and you can see the GPU consumption over here. Okay, not bad. This is actually very acceptable.

Okay, that's all for today. I hope you liked this video. If you like the videos I'm creating, please press the like and subscribe buttons, and see you in the next video tutorial.
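For quick reference, here is the full command sequence used in this tutorial. The model tags (deepseek-r1:14b and deepseek-r1:32b) are the ones generated on the Ollama model page, and "ollama pull" downloads a model without immediately starting it:

    ollama pull deepseek-r1:14b
    ollama list
    ollama run deepseek-r1:14b
    ollama pull deepseek-r1:32b
    ollama run deepseek-r1:32b

An interactive session started with "ollama run" can be exited with Ctrl+D or by typing /bye.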