Run Faster Whisper Locally for Quick Transcriptions
Explore the setup of the Faster Whisper model for speedy local transcriptions and a voice chat project leveraging Opus and WebSocket streaming.

Speaker 1: In this video, we're going to talk about how to run the open source Whisper model locally using the faster-whisper library, which transcribes much faster than, say, OpenAI's Whisper API. This is part of a larger project I'm building: a two-way voice chat with Opus, which works very well but has a pretty complex setup and installation process. So I wanted to make a separate video just on the local faster-whisper, and probably tomorrow or the next day I'll make the video on the full project. Just to show you: when we run this two-way chat, which is powered by the local Whisper, you can see we loaded the large-v2 model (there's a v3 model as well) into the VRAM on my GPU. The large-v3 works too; I was just testing different things. There are also different ways of loading the model, with int8 or float16 quantization, and you can run it on your CPU if you don't have a GPU; of course, inference will be slower in that case. The larger project, the two-way voice chat, is going to feature the local Whisper model, Claude 3 Opus, and ElevenLabs real-time WebSocket-powered streaming responses. So it's going to work like this: I'm going to say something to the model.
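For reference, here is how those loading options look in code. This is a minimal sketch following the faster-whisper README, not the exact code from my project:

```python
# A minimal sketch of the model loading options, per the faster-whisper README.
from faster_whisper import WhisperModel

# GPU with FP16 precision:
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# GPU with mixed INT8/FP16 quantization:
# model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")

# CPU with INT8 (no GPU required, but inference is slower):
# model = WhisperModel("large-v2", device="cpu", compute_type="int8")
```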

Speaker 2: Hey there, how are you doing? See, we got transcription very quickly.

Speaker 1: We're waiting on Opus. Sorry, I actually ran the wrong file.

Speaker 3: But this is it.

Speaker 1: Let's try again.

Speaker 3: Hi there.

Speaker 1: How are you doing? The transcription is super fast, and Opus starts typing.

Speaker 3: I'm doing well. Thanks for asking. Excited to dive into some engaging conversation, and we can actually interrupt the conversation.

Speaker 1: I'm smart.

Speaker 3: I'm always eager to hear you.

Speaker 2: Can you tell me about large language models? Language models are fascinating computational systems that can process, generate and understand

Speaker 3: human language.

Speaker 1: As you can see, the audio starts playing immediately, before the streaming from Opus has even completed. That's what I was trying to accomplish: responses as fast as possible. This works really well, though it does have a few hiccups. It's a complex project, and I just wanted to showcase it; but underlying it, one of the models powering it, running locally, is the faster-whisper model. So I'm going to talk about that and how to install it, because the installation process does require a few tricks.

Code files for this project will be available on my Patreon at the Connoisseur level. I also have two openings at the AI Virtual 3 level, which comes with one-on-one meetings, one hour each; if you want to get in touch with me one on one or need help with your project, you might consider that. I've also lowered the Devin-like Opus auto-coder code down to the Connoisseur level, if that's something you'd like to check out: it uses Opus as an auto-coder and also executes code in both Docker and local mode. You can watch that video on my YouTube channel.

I've set this code up so that you start recording your voice by holding the spacebar down; when you're done, you lift it up, the transcription happens, and we print it. Once you have the full transcription from the transcribe_audio method, you can do whatever you like with it. So let's run and test this. We're using the v3 model currently, I believe. Sometimes it has difficulty loading into VRAM, especially if I exited the program and got back in, because some residual memory still remains. Let's see... we were able to load it. So now we can just talk: I'm going to press the spacebar and say, "I'm testing this for my YouTube viewers. Let's see how this goes." When I stop, it detects English and prints exactly that.

It also detects the language automatically. For example, I can say a phrase in French, and it's detected as French. Let's try German: "Wie geht es Ihnen?" Okay, I believe that means "how are you?" So it's pretty good, and it's very fast; that's the beauty of it. This would normally take a few seconds with the OpenAI Whisper API, especially depending on your internet speed.

You can luckily also run it on the CPU. I've actually never tried that, so let's go ahead and try it; I'm curious to see what will happen. Okay, we were able to load the model, but into system RAM this time, I believe, and the CPU fired up, which is why I wasn't able to unpause to record my audio at first; the GPU memory is empty. So let's transcribe something: "Hi there. How are you doing? One, two, three." The recording has stopped, and as you can see, this is a bit slower; the CPU on this somewhat older laptop is gearing up. It took seven or eight seconds, so that's not great. If you have a great CPU, perhaps it will be faster, maybe two or three seconds, similar to getting a transcription through an API, so I suppose you can still use it; just keep that in mind. So we've tested it on the CPU, and we also ran it as float16 on the GPU.
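Before diving into the code, here is roughly what that basic call with automatic language detection looks like. This is a minimal sketch following the faster-whisper README; the audio file name is just a placeholder:

```python
# A minimal sketch of a transcription call with automatic language detection,
# per the faster-whisper README; "recording.wav" is a placeholder path.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# transcribe() returns a generator of segments plus an info object that
# carries the detected language and its probability.
segments, info = model.transcribe("recording.wav", beam_size=5)
print(f"Detected language '{info.language}' with probability {info.language_probability:.2f}")

for segment in segments:
    print(segment.text)
```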
So let's talk about the code, and then we'll talk about how to install it. The requirements for this are faster-whisper, and the thing is, you're also going to need CUDA and cuDNN, which we'll get to. There are installation instructions, but for some reason I ran into an issue and had to downgrade CTranslate2: faster-whisper installs CTranslate2 4.0, I believe, and I had to downgrade it to get cuDNN working the right way. We also need pynput to capture keystrokes, in this case the spacebar to start recording, and sounddevice to capture the audio.

We define a WhisperTranscriber class. You can decide on the model size (there's v3 and v2), and there's a sample rate for recording audio; those are the attributes. You can initialize the model with float16, int8_float16, or, for CPU, just int8. We declare an is_recording attribute for the class, initialized to False. Then we have an on_press handler: if we're not recording when the key is pressed, we set is_recording to True and print "recording started". When the key is released, if we are recording, we set is_recording to False and print "recording stopped".

Recording audio requires NumPy to be installed (I guess I should add that to the requirements). We initialize an empty NumPy array, and there's a frames-per-buffer value; I believe you can play around with it, but I'm not an audio expert, so I'm not sure; this value works in our case. Then, with a keyboard listener running, if we are recording, sounddevice (imported as sd) starts recording: sd.rec takes the frames per buffer and these other parameters, then waits and does some computation. This sits in an infinite while loop that continually checks is_recording; when we release the spacebar, we're no longer recording, we break out of the loop, join all the recorded chunks, I believe, and return the recording.

Then we have a save_temp_audio method that saves the recording to a temporary file, and a transcribe_audio method that takes that temporary file and calls model.transcribe on the file path. I'm not sure what beam_size does, but the default was set to 5. We print the detected language with its probability: transcribe returns segments and an info object, and info has a lot of interesting information in it. I haven't looked into it deeply, but it definitely has the language and the language probability, which we were printing earlier; it detects the language for you, which is nice. We initialize full_transcription as an empty string and loop over the segments returned by model.transcribe, get each segment.text, print it, and append it to full_transcription. Then we remove the temporary audio file and return the full transcription.

Finally, there's a run method that calls all these methods, record_audio, save_temp_audio, and transcribe_audio, and prints the information; when the script is run, we just call run. So this is it; you can use this locally. My video card has six gigabytes of video RAM and it works fine, so if you have at least that much, you should be okay. And if you don't, you can still run it in CPU mode.
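Here is a condensed sketch of a class along those lines. The names (WhisperTranscriber, record_audio, and so on) and the scipy-based WAV writing are my assumptions for illustration; the actual project files are on Patreon and may differ:

```python
# A sketch of the push-to-talk transcriber described above; class and method
# names, and the use of scipy for writing the temp WAV, are assumptions.
# Requires: faster-whisper, pynput, sounddevice, numpy, scipy.
import os
import tempfile
import time

import numpy as np
import sounddevice as sd
from pynput import keyboard
from scipy.io import wavfile  # assumed here for saving the temporary WAV
from faster_whisper import WhisperModel


class WhisperTranscriber:
    def __init__(self, model_size="large-v3", sample_rate=16000):
        self.sample_rate = sample_rate
        self.is_recording = False
        # Swap in device="cpu", compute_type="int8" if you have no GPU.
        self.model = WhisperModel(model_size, device="cuda", compute_type="float16")

    def on_press(self, key):
        # Holding the spacebar starts recording.
        if key == keyboard.Key.space and not self.is_recording:
            self.is_recording = True
            print("Recording started...")

    def on_release(self, key):
        # Releasing the spacebar stops recording and ends the listener.
        if key == keyboard.Key.space and self.is_recording:
            self.is_recording = False
            print("Recording stopped.")
            return False  # returning False stops the pynput listener

    def record_audio(self):
        frames_per_buffer = 1024  # tweakable; this value worked in my tests
        recording = np.empty((0, 1), dtype=np.float32)
        with keyboard.Listener(on_press=self.on_press,
                               on_release=self.on_release) as listener:
            while listener.running:
                if self.is_recording:
                    chunk = sd.rec(frames_per_buffer, samplerate=self.sample_rate,
                                   channels=1, dtype="float32")
                    sd.wait()
                    recording = np.vstack((recording, chunk))
                else:
                    time.sleep(0.01)  # idle until the spacebar is held
        return recording

    def save_temp_audio(self, recording):
        fd, path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        wavfile.write(path, self.sample_rate, recording)
        return path

    def transcribe_audio(self, file_path):
        segments, info = self.model.transcribe(file_path, beam_size=5)
        print(f"Detected language '{info.language}' "
              f"with probability {info.language_probability:.2f}")
        full_transcription = ""
        for segment in segments:
            print(segment.text)
            full_transcription += segment.text
        os.remove(file_path)  # clean up the temporary file
        return full_transcription

    def run(self):
        print("Hold the spacebar to record; release it to transcribe.")
        recording = self.record_audio()
        path = self.save_temp_audio(recording)
        return self.transcribe_audio(path)


if __name__ == "__main__":
    WhisperTranscriber().run()
```

Holding the spacebar accumulates fixed-size chunks from sounddevice; releasing it stops the pynput listener, which ends the loop and hands the buffered audio to faster-whisper.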
Before we start talking about how to install it, I just want to say that if you're enjoying my content, you can visit my website at echohive.live and watch all 250-plus coding videos I've made, almost always dealing with large language models, building interesting projects across many different domains, whatever I find most interesting at the time. If you're a patron, you can find the code download links with a single click, and you'll have access to over 200 project files you can download, modify, and make your own. It really is a great convenience, and you'll be supporting me at the same time; if you do become a patron, I appreciate that very much. You can also check out some of my other projects, such as Auto Streamer and Code Hive.

Okay, so let's talk about faster-whisper; I'll put the link in the description. As you know, Whisper is an open source model by OpenAI, and this repository offers a very fast version of it that you can run locally. You can read all about it there. The README covers installation: you pretty much install the library with pip install faster-whisper, and there are some example code snippets you can use. But here's the thing: GPU execution requires the following NVIDIA libraries to be installed, cuBLAS and cuDNN, both part of the CUDA stack. You can follow the official installation instructions, but I'm on Windows, and I don't think a proper cuBLAS installer was available at the time. There are alternative instructions, though: the README says you can download the libraries from Purfview's repository. There are also instructions for using Docker on Linux, for example, so just read through those. Purfview's whisper-standalone-win is another interesting project you might want to check out; it's a standalone executable, but what I clicked on is its single-archive download of the libraries. Right there you can find downloads for Linux and Windows; I downloaded the Windows one.

Once you download and unzip it, you'll see a set of DLL files. You just have to point your system's PATH variable to wherever they are, or you can put them in the Windows System32 folder, which is what I did; the reason is that System32 is automatically on the PATH. Otherwise, you'd open Environment Variables (I won't open that window here since it has my API keys), find the Path entry in the bottom section, and add the folder containing those DLL files to it. Then you'll be good to go.

One more thing: when you pip install faster-whisper, you'll most likely have to downgrade CTranslate2 to 3.24.0. What might happen is that if you install faster-whisper and then try to install the older CTranslate2, pip may say the requirements are already satisfied and not install it. So you may have to run pip install with the library name pinned to the version number, along with --upgrade, I believe.
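Putting those steps together, the install sequence looks roughly like this on Windows. This is a sketch of what's described above: the CTranslate2 version comes from the GitHub issue mentioned below, and --force-reinstall is one way to force the pinned version if pip claims the requirement is already satisfied:

```bash
pip install faster-whisper pynput sounddevice numpy

# faster-whisper pulls in CTranslate2 4.x; pin it back if you hit cuDNN errors
pip install ctranslate2==3.24.0 --force-reinstall

# Then download the cuBLAS/cuDNN single-archive release from Purfview's
# whisper-standalone-win repository, unzip it, and either copy the DLLs into
# C:\Windows\System32 or add their folder to your PATH environment variable.
```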
This will force it to downgrade to the version that's needed. I found this in an issue on their repository, and I've put the link in the code file if you want to check it out; that's where I realized CTranslate2 4.0 might not work for you, and that CTranslate2 3.24.0 works best. The download link for the cuBLAS and cuDNN archive is in there as well. So that's it: if you can get this working, you'll have super fast local transcription, and you can use it to control all sorts of things and build cool new user experiences, like I'm trying to do with this two-way chat. I hope you enjoyed this. I'll be making the video on the two-way voice chat, which works really well and is an exciting implementation, hopefully very soon. Thank you for watching, and I'll see you in the next video.
