Speaker 1: Open source AI has finally caught up, so it's time to save money and run these models locally. I've got four big subscriptions I want to cancel. I'm going to spend a week migrating the cloud models I use over to a local setup, and I'm going to show you how. It may happen over three or four videos, so stay tuned. I've got Claude Sonnet 3.5 on desktop, and as part of that, Sonnet 3.5 on mobile, at $18 US per month. I used to run ChatGPT but I switched to Sonnet 3.5 a while ago. For coding I run Cursor Pro, which is $20 per month, and I also use Grok Premium for live news data on X, which is $8 a month. So in total I could be saving $46 a month if I can get all of this running locally.

As you've probably heard, DeepSeek R1 has finally come out. It's a strong competitor to OpenAI's o1 model and to Sonnet 3.5, it's fully open source, and you can run the whole thing locally. Heading over to deepseek.com you can check out the comparison stats; as you can see, it's competitive with the other models. Switching over to their Twitter, they've got a good chart showing the model is strong at maths and coding, so that's what I want to test out in today's video.

To run it locally, all you have to do is head over to ollama.com/download and grab the Ollama installer for your operating system. A good companion to Ollama is Open WebUI, which lets you run the models and talk to them through a browser. There's good documentation on their docs site: go to the quick start guide and run it with Docker. I recommend Docker Desktop. Once you run the commands, you'll see Open WebUI pop up in Docker Desktop as a server running in the background.

Then you go back to Ollama and install the model. All you need to do is select the model with the number of parameters you want to run on your local machine. I'm running an old MacBook Pro M1 Max, and I can run the 1.5 billion, 7 billion and 8 billion parameter models locally pretty easily. When you get up to the bigger sizes like 14, 32 and 70 billion it starts to struggle and overheat, but I'm showing you that older machines can still run this, even if the response times are a bit slower. Make sure you have enough hard drive space as well: the smallest model is 1.1 GB, going all the way up to 43 GB for the 70 billion parameter model. You just select the model you want to run, it tells you the command to run in the terminal, and once you run that command it downloads the model and opens up a prompt.

The 1.5 billion parameter model is pretty quick to respond. DeepSeek R1 is a reasoning model; it loves complex problems. If you throw a complex problem at it, it will iterate through several possible answers before it settles on the correct one. You'll see this in the think tags, an opening think and a closing think, and then it spits out the answer. So let's try giving it a harder problem. As you can see, the think section was quite short for the 1.5 billion parameter model and the response was really quick, but it still gives a detailed answer. I did a comparison against other models, and it's a good example of the model working through a complex problem and spitting out an answer. Alright, so let's try the biggest model that will work on my machine, which is the 70 billion parameter model. I just grabbed the ollama run command from there.
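For reference, here's a minimal sketch of those setup steps as shell commands. It's based on my reading of the Ollama and Open WebUI quick start docs, so treat the exact Docker flags, port mapping and model tags as assumptions and check the current documentation before copying them.

```sh
# 1. Install Ollama: download the installer for your OS from
#    https://ollama.com/download (on macOS, Homebrew is an alternative).
brew install ollama          # optional alternative to the GUI installer

# 2. Start Open WebUI as a background container.
#    Adapted from the Open WebUI quick start guide; the port mapping and
#    volume name are the defaults I'd expect, so double-check the docs.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

# 3. Pull and chat with a DeepSeek R1 size that fits your machine.
#    The 1.5b/7b/8b tags run comfortably on an M1 Max; 14b/32b/70b need
#    far more RAM and patience (the 70b download is roughly 43 GB).
ollama run deepseek-r1:8b

# Open WebUI should then be reachable in the browser at http://localhost:3000
```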
I've exited the 1.5 billion parameter model, so let's start the 70 billion parameter model. This takes a little while to load on an old M1 Max. I'll do some serious testing once I open up the web UI, which is a lot easier to read, especially for coding problems; what we'll do there is get it to build a JavaScript game and test it out. So this is the 70 billion parameter model. It spends a lot more time reasoning and thinking through a complex problem and gives a better quality answer. Here's a complex maths problem, and you can see how it thinks about it: first it breaks down the left side of the equation, then it moves over to the right-hand side.

Running the 70 billion parameter model on an old laptop is not a good idea. The laptop's almost overheating, I've got a fan blowing on it, and it's taken about ten minutes to get the answer. What I'd recommend is a machine with a decent GPU, a gaming machine, to run the 70 billion parameter model at home, or maybe the latest M4 MacBook. What I'm going to do is switch down to a smaller model after this and we'll get some better response times. Yay, the answer's finally here. It's not rendered as LaTeX, so you can barely read it. So let's get the web UI up and running so we can see it properly, and we'll run the same test with a lower parameter model: I'll just scroll up to the top, copy and paste the question, and then launch the web UI. Okay, the smaller parameter model was able to respond more quickly, though it still took about two minutes, and here's the answer from the 70 billion parameter model for comparison. So we've just compared the 8 billion parameter answer to the 70 billion parameter answer, and they both reach the same final answer, so for this complex maths question the result is the same for both parameter sizes; one just takes a lot longer to get there.

With the 8 billion parameter model we can also upload PDFs, so let's do that. Because the model's knowledge cutoff was the middle of last year, I've found some relevant news from the last 24 hours: Donald Trump, the new US president, has signed a whole bunch of executive orders. The one I'm going to ask the 8 billion parameter DeepSeek R1 model about is the 75-day TikTok ban delay, and we'll see if it can answer correctly. There's no way it should know the answer to any of these questions on its own. So let's upload the PDF and ask: how many days was the TikTok ban extended by? Let's see if it can answer based on the information inside the PDF.

Okay, you can really easily see how long the thinking section goes for. It was pretty quick, around 30 seconds to answer, on an old MacBook Pro M1 Max with the 8 billion parameter model running locally, no cloud. In the think tags it goes through the whole document assessing what it's about; TikTok is only mentioned twice in the whole document, and it quickly found the 75-day suspension. After the think tag it gives the answer, which is that the TikTok ban was suspended for 75 days. That's the correct answer. So you can upload your knowledge, PDFs and spreadsheets, into the DeepSeek R1 8 billion parameter model running on an old laptop, get a web interface through Open WebUI, and have the ability to upload documents or knowledge and ask questions of them.
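If you'd rather script that 8 billion versus 70 billion comparison than copy-paste the question by hand, Ollama also exposes a local REST API on port 11434. This is only a rough sketch: the /api/generate endpoint and its fields follow the Ollama API docs as I understand them, the sample question is just a stand-in for whatever maths problem you're testing with, and jq is assumed to be installed for pulling the answer text out of the JSON.

```sh
# Send the same question to the 8B and 70B DeepSeek R1 models and save
# each answer (including the <think>...</think> section) to a text file.
QUESTION="Solve 3x^2 - 12x + 9 = 0 and show your working."   # stand-in prompt

for MODEL in deepseek-r1:8b deepseek-r1:70b; do
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"$MODEL\", \"prompt\": \"$QUESTION\", \"stream\": false}" \
    | jq -r '.response' \
    > "answer-${MODEL//:/-}.txt"        # e.g. answer-deepseek-r1-8b.txt
done
```

Diffing the two output files is an easy way to check whether the smaller model really lands on the same final answer, as it did in my test.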
So locally, I'm pretty much achieving most of what I'm paying ChatGPT or Claude for on my desktop, and replacing that with a locally run model. That's pretty good; that's a win.

There are a few benefits to running it locally. It's a lot less risky: the model provider can't spy on your questions, answers or the knowledge you upload, because it's all running locally. Compare that with running it in the cloud, where you're using their deployment and talking to their website, and there's the potential for them to see what you're saying. Running it locally is a lot more secure; any PDFs or spreadsheets I upload aren't going into the cloud, they stay on my machine. I'm also not paying monthly fees or a SaaS subscription, so I'm saving a bunch of money there. The downside is you need a reasonably powerful machine to run it. My computer is about three years old, so it's getting on a bit and overheats on the bigger models, which means I have to run the smaller ones, and they may not be as accurate.

So I'm pretty confident I can replace ChatGPT or Claude Sonnet 3.5, but that's only on the desktop. It's going to take a bit of effort to replace the app on my mobile phone, which I sometimes use on the go, so let's try to figure that out in the next video. It's great to see that fully open source AI models are out there on Hugging Face and on Ollama, where you can quickly download and run them locally, and that they're actually competitive with cutting-edge models like OpenAI's o1 or Claude Sonnet 3.5.

Over the next few videos I'll be trying to run DeepSeek R1 on my mobile phone. I don't know if I can achieve that; I might have to run a server somewhere, either locally or with my own bare metal cloud provider. In the video after that I'll try to integrate DeepSeek into Cursor; there may be an easy plug-in or extension for that, so that would be another video. And in the video after that, I'll try hooking it up to live news data feeds to summarize what's happening in the news and on Twitter. I hope this video has helped you look at other options instead of paying AI cloud services anywhere from $18 up to $200 per month in cloud fees. You can cut that cost down, take control of your AI, and run it locally on your machine at home. Thanks for watching, stay tuned for the next one, and I'll see you later. Bye.