Speaker 1: Okay Whisper, start listening for commands. Would you look at that! What I really, really found interesting about this is that it picks up sounds. Okay, now, I am NOT messing around, I'm not kidding, this picks up sounds. And there you go guys, there you have it: Whisper running 100% offline. Hello there my gorgeous friends on the internet, welcome back to another video. So today we are going to be talking about Whisper. Essentially, Whisper is an open source project by OpenAI, and if you want to learn about Whisper, I'm going to put the link to the repo down in the description below. What Whisper is, it's pretty much a general-purpose speech recognition model. It is trained on a large dataset of diverse audio, and it's also a multitasking model that can handle multilingual speech. This is really an amazing project. Now, this project is built in Python, you can see here, 100% Python. Honestly, I have tried the Python library and all of that, and it is amazing, it really is. But my problem with the Python version is that it's really slow for inference, which brings me to whisper.cpp. Shout out to the guy who is working on whisper.cpp, I'm not sure how to pronounce his name, Georgi, Georgi Gerganov, I'm sorry if I'm butchering your name buddy, thank you so much for building this. For those of you who use LLaMA, there's also llama.cpp here, where you can easily quantize your large language models, pretty straightforward and all of that, but that is not the objective of this video. So I'm going to talk about this project, and this is a very interesting project because it's pretty much a port of OpenAI's Whisper model in C/C++, and what I like about this one is it's really, really fast. And when I say fast, I mean blazing fast, no jokes. I have been looking into this project.
I am thinking of probably integrating this project into Jarvis for speech-to-text recognition, because right now we are using the SpeechRecognition library from Python, and it's really, really not that good. So I've been looking forward to integrating something like this. This is really an amazing project, guys, you should check it out. Now, a quick overview of what Whisper is. Like I mentioned earlier, you don't need to pay for this; this is completely free, 100% free, by OpenAI of course. I know some people use the API version, where you get your transcriptions from the OpenAI servers, but again, you don't need any of that. You don't need to pay to do this. So in this video, we are going to be doing this on our own computer. Yes, that's right, we're going to be working on it locally: downloading it locally, installing it locally, running it locally, and running a test. So we're going to head straight to the installation, and we're going to go to this link right here. I'm going to put this link down in the description of this video so you can find it easily. We're going to head straight to the README, and if you look at this, it's really, really amazing: you can run it on iOS, Android, Linux, WebAssembly, Docker, Raspberry Pi, and macOS, which is where we're going to be running it today, on an Apple Silicon M1 chip. And we have a couple of demos here and all of that. Essentially, we are going to be trying it out today. So first things first, the quick start: we're going to clone the repository. Right, going to clone the repository here, git clone, that's right. So we're going to wait for it to clone the repository, and then we're going to open it and work with it. Now we've just downloaded the repository. I'm going to head straight to opening it in VS Code here. Bear with me, I'm going to stretch this out a little bit.
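For anyone following along, the clone step described above looks like this (the repository URL matches the whisper.cpp project on GitHub; this is a command fragment, not something the video shows verbatim):

```shell
# Clone the whisper.cpp repository and move into it
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
```
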
So once we download the project, we have all of these files that it comes with, and don't be intimidated, okay? You don't need to understand everything that's in here. For the sake of this tutorial, for the sake of this video, for what you're trying to do, you don't actually need to understand everything going on in here. So what we are going to do is follow his steps right here. After downloading, we're going to download the model. Essentially, for the transcription to take place, there needs to be some sort of model. This is like the brain, you know; that's what actually does the transcription. So we need to download this model to be able to do the transcription. And again, it says here: "then download one of the Whisper models converted to ggml format". For example, we're just going to go with the base model here. So, before downloading the base model, okay, sorry about that, before downloading the base model, we have to cd into the whisper.cpp folder, and then we paste the command here, and it's going to download the base model to our computer. While that is downloading, I'm going to explain a couple of things. Now, I was mentioning the models. You don't necessarily have to use the base model. Honestly, I think the base model, or the default small English model, is really good. You don't need the huge model. So here is the documentation; again, I'm going to put the link to all of this down in the description below where you can check it out by yourself. Here are the available models: you have the tiny model, the tiny English model, the base model, the base English model, the small model, and so on up to the large model. You can see the sizes here are not actually that huge, you know, and in most cases I've found that the base model is actually pretty impressive. You don't need to get the large, huge model or a quantized version.
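As a reference for the download step described above, the README's model download script, run from the repository root, is roughly this (the model name is the only argument; "base" is what the video uses):

```shell
# Download the ggml-converted "base" model into ./models
# (swap "base" for tiny, tiny.en, base.en, small, medium, large, etc.)
bash ./models/download-ggml-model.sh base
```

The script saves a single `.bin` file, e.g. `models/ggml-base.bin`, which is the "brain" the video refers to.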
Yeah, that's just pretty much about it. So we're still downloading here, we're downloading the base model into our models folder. So if you come here, this is it here; it's not yet done downloading. This is a .bin file, so it's downloading that to our computer. Now we just have to wait. I'm going to pause the video, and once it's done downloading, I'm going to play it again. We're almost done downloading, by the way. I'm just going to pause the video. All right, nice. I don't even know why I paused the video, but there we go. Now the model has been downloaded and we have it on our computer right here. For simplicity, I'm going to be running things in the default VS Code terminal, for ease, so we can all see what we're doing here. Let me just increase this a little bit. So we're going to do two things here, okay? The first thing we're going to do is build. So now we have downloaded the model successfully, and right here we have our model, a .bin file like I mentioned earlier; you can open it. After downloading that, you just need to build the project with the command make. You type make, you hit enter, and it's going to build the project. You run that command in the root directory and it builds the project itself. So I'm going to pause this, and when it's done, we're going to resume. Amazing, amazing. So we are done building, and after building, you're pretty much ready to go out of the box. There are two things, two folders which we're going to be working with. We're going to be working with examples. When we go into examples here, we have server. From the server folder here, you can start up a local server which does the transcribing and returns the text. Pretty easy, easy peasy, lemon squeezy. And here we have the command example. So this is what we're going to be working with today.
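The build step and the server example mentioned above can be sketched like this. Treat the server flags and binary names as assumptions rather than exact instructions: they have varied between versions of the repo, so check the examples/server README or `./server --help` for your checkout:

```shell
# Build the project from the repository root
make

# Build and start the transcription server example
# (flags are indicative; -m points at the downloaded ggml model)
make server
./server -m models/ggml-base.bin
```
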
So the reason I wanted to present the command part, the reason I'm trying to focus on this part, is because currently we are thinking of integrating Whisper into Jarvis for better speech-to-text. So we're going to try it out together here. I'm going to clear this real quick. And before we try that, I'm going to run the default example that he gives so we can try it out. So we're going to type ./main right here and then hit tab. So it's transcribing right here. We can see the log, what model it's using, and then here we have the transcription. And what I really like about this is that it gives us timestamps, and this is really, really amazing. Again, you can achieve this with the Python library from OpenAI, but guys, just look at how fast this was. In a couple of seconds, it was able to transcribe an audio file. And this is what it says: "And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country." Now, after doing this, now that we know it works, we're going to head straight to the command example. We need to run make to make that available, of course; we're going to build command, and after building it, we're going to have it here, right? So if you look here, after running this command, you should have something like this. Again, guys, I'm not a C++ coder, I'm a Python lover instead. I don't do C++, so bear with me here. Now, after running that command, that's pretty much it: you just type ./command and hit enter. And that's it. Right now you have the devices here, these are my devices, and you see it's transcribing what I'm saying, out of the box. If I keep quiet and don't speak, it's not going to transcribe; it's just going to be listening. This is kind of like Google Home, you know, where you use a wake word. Look at this, I'm going to stop this real quick. Oopsie, no, not now.
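The two runs described above correspond roughly to these commands. The sample file path follows the repo's README; the exact model path is an assumption based on downloading the base model earlier, so adjust it to whatever `.bin` file you fetched:

```shell
# Transcribe the bundled JFK sample with the base model
./main -m models/ggml-base.bin -f samples/jfk.wav

# Build the live voice-command example, then run it
# (it listens on the default microphone and prints what it hears)
make command
./command -m models/ggml-base.bin
```
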
So I just wanted to stop that real quick before we try another thing here. And what I want us to try, what I really, really found interesting about this, is that it picks up sounds, okay? Now, I am not messing around, I'm not kidding, this picks up sounds. And what I mean by sounds is, if you clap, if you make sounds, it knows what kind of sound you're making. So we're going to try it out real quick. What was the command again? It's up here, bear with me. Okay Whisper, start listening for commands. Would you look at that? No way.
Speaker 2: Hahaha. No way.
Speaker 1: And there you go guys, there you have it: running 100% offline. This is really, really amazing, and I don't know about you, but I see the potential here. Thank you so much for watching this video. I really appreciate you coming back here and watching our videos. If you enjoyed watching this video, please give it a thumbs up and hit that subscribe button down there. Please do, as it helps us grow our channel and reach a lot more people. And of course, the likes help us reach a lot more people too. That's pretty much it.