Exploring OpenAI Whisper: Install and Test on Ubuntu

Learn to install and test OpenAI's Whisper for transcription and translation using Ubuntu (running under WSL on Windows) and Python's pip. Ideal for AI enthusiasts.

Transcribe and Translate Audio with Whisper Using Ubuntu

Added on 01/29/2025

Speaker 1: Hello out there in YouTube land, welcome to the channel and welcome to the video. Today I want to get my feet wet. I have been slacking so much on the AI front, and it's kind of time for me to actually take some steps forward and play around with some of these tools. So the first tool that I'm going to play around with is Whisper by OpenAI. Whisper allows you to do audio translation and transcription. So I'm going to install this the hard way. I'm going to download all the packages using Ubuntu, using WSL on my Windows machine. So hopefully you find this video helpful, and let's get into it. I'm going to do a sudo apt update and then a sudo apt install python3-pip. This is just to make sure that my operating system, in this case Ubuntu, has the correct references to the repositories where packages reside, and that I'll be able to actually download pip for Python 3. Again, pip is just a package manager that allows you to install open source libraries for projects.

Speaker 1: Next thing I'm going to need is PyTorch, or it's just called torch when you install it with pip. This is going to be the machine learning library that Whisper is actually built on top of, or it's using in that case. So I'll then go to Python 3, actually I can just call pip3: pip3 install torch.

Speaker 2: All right, so it is done with that.

Speaker 1: I do notice there is an error here, well, a warning. There is a script that's used with PyTorch that exists in this directory, and that directory is not on my system path. So I want to add that to my path, so I'm going to go ahead and grab this right here. I'm going to do a vim ~/.profile. This is a special file that's in my home directory, hence the tilde character there. All right, and I'm just going to drop down to the end of the file, and I'm just going to say export PATH=$PATH: and then this right here, this path in that case. And I'll save this, and I'm going to start a new session so my path gets picked up. And there we go. It's at the end of my system path in that case. All right, next up, let me change to my home directory here. I'm then going to install FFmpeg. FFmpeg is what allows you to convert between file formats. So next we're going to do a sudo apt install ffmpeg.
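
The PATH change described in this step can be checked from Python. This is only a minimal sketch: the transcript never names the directory that pip warned about, so `~/.local/bin` below is an assumption (it is the usual location for user-level pip scripts), and `is_on_path` is a hypothetical helper written for illustration.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        # Fall back to the PATH of the current session.
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    ]
    return target in entries


# Assumed directory: pip usually installs user-level scripts here.
scripts_dir = "~/.local/bin"
print(f"{scripts_dir} on PATH: {is_on_path(scripts_dir)}")
```

If this prints False in a new session, the export line in ~/.profile was probably not picked up yet.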

Speaker 2: All right, perfect. That's been installed.
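
At this point pip, PyTorch, and FFmpeg should all be in place. A quick way to confirm that before installing Whisper is sketched below. `check_prerequisites` is a hypothetical helper, not part of any of these tools; it only looks for the `pip3` and `ffmpeg` executables on the PATH and for an importable `torch` package.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which of the tools installed so far can be found."""
    return {
        # Executables are looked up on the current PATH.
        "pip3": shutil.which("pip3") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # torch is a Python package, so check that it can be imported.
        "torch": importlib.util.find_spec("torch") is not None,
    }


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'missing'}")
```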

Speaker 1: Next, the last thing we're going to install is actually the OpenAI open source software. So we'll do a sudo apt install openai-whisper. Did I get that wrong? OpenAI. Oh, this is with pip. So it's actually pip3 install openai-whisper. All right. So now that's been installed. So I should be able to just call whisper directly: whisper --version. Very nice. That's installed. And there are plenty of options for whisper. So you can do --help. All right. And yeah, obviously I'm not going to be able to go through all these. But notice there is a language option. So the software is actually going to try to detect the language. So you basically give it an audio file. It will basically scrub through it and it'll try to detect the language that it is. But if for any reason it's having an issue, or you need to explicitly set the language, you do have an option for that, and these are the different languages as well. And it has several different output formats. By default, it's going to output all of these: text files, files for actual captions, JSON, and so on. And you have different models as well. The models just trade off how accurate it is against how much time it's going to take to process. So yeah. So the simplest form is going to be whisper, and we give it an audio file. And then we have different options. So let's go ahead and clear this. So the first thing I need to do is actually have an audio file. So I'm going to open up Audacity. And let's go ahead and just record something. All right. "In today's video, I am trying to install and use OpenAI's Whisper. I have already installed Python 3, pip, FFmpeg. And I'm using Ubuntu as my operating system so we can see how this works on a Linux machine. I look very forward to testing this out and seeing what it can do." All right. Let's go ahead and export this. All right. So let's go whisper, and then the name of my file. All right. I have a Whisper test WAV file. The default model is small.
So I actually don't have to specify that. If I wanted to use another model, I can choose one such as tiny, medium, or large. The small model and lower work well for English, from what I'm seeing on the Whisper GitHub page. So I'm just going to add nothing else. Just whisper and then the name of my file. And hit enter. All right. We can see that it's using the first 30 seconds to try to automatically detect the language. It found out it was English. I can tell this audio was under 30 seconds because it does give you these timings right here. So it's just below 24 seconds. And there's a transcription of my audio right there. And it did a really good job as far as I'm concerned. I'd probably maybe add some commas here between the Python 3, pip, and FFmpeg. And I do know if I use a different model such as medium, I think it will use that. And it could have also been the cadence of my speech as well. But as far as a rough draft of just saving an audio file, just sending the audio file to Whisper and having it output, amazing. If I do an ls, well, ls, sorry. If I just list the contents of my directory, I get these different file formats, because I didn't specify a specific output format. So by default, Whisper is going to output it in all these different formats. I can also do translation here. I'm going to try to do a little bit of Spanish. I am not great, by the way, Duolingo. So let me just record a little bit of Spanish and see if it can translate. "Hola, ¿cómo estás? Mi nombre es Trey. Yo necesito un pizza. Muchas gracias." All right. Let's go ahead and export this as a WAV. Let's do Whisper Spanish. And first, we'll just go ahead and do a whisper. We'll do the Whisper Spanish. We'll go ahead and clear the screen out. Let's see, does it detect? Oh, and it did detect. "Hola, ¿cómo estás? Mi nombre es Trey. Yo necesito un pizza. Muchas gracias." Ah, well played. And now I think it should have a translate option. English. If I'm doing it correctly.
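
The directory listing mentioned above can be predicted ahead of time. The sketch below assumes a recent Whisper release, which writes txt, vtt, srt, tsv, and json files named after the audio file into the current directory when no `--output_format` is given; older releases may name or select the files differently. `expected_outputs` is a hypothetical helper and `whisper_test.wav` is a placeholder for the recording's file name.

```python
from pathlib import Path

# Formats written by default (assumed from recent releases;
# older ones may not include tsv).
DEFAULT_FORMATS = ("txt", "vtt", "srt", "tsv", "json")


def expected_outputs(audio_path, output_dir="."):
    """List the files a default run should leave behind for one audio file."""
    stem = Path(audio_path).stem  # file name without directory or extension
    return [Path(output_dir) / f"{stem}.{ext}" for ext in DEFAULT_FORMATS]


# "whisper_test.wav" is a placeholder name for the recording.
for path in expected_outputs("whisper_test.wav"):
    print(path, "exists" if path.exists() else "not found")
```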

Speaker 2: Okay, so I did not do that correctly. I'll put --task. There we go.

Speaker 1: That's what I was looking for. So not --translate. It'll just be --task translate. Yeah, --task translate. I don't need to specify English.
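
The commands used in this walkthrough can be assembled programmatically, which makes the difference between plain transcription and `--task translate` easy to see. This is a sketch, not the speaker's exact commands: `whisper_command` is a hypothetical helper, `whisper_spanish.wav` is a placeholder file name, and only the `--model`, `--task`, and `--language` options of the Whisper command line are used.

```python
import shlex


def whisper_command(audio_path, model=None, task=None, language=None):
    """Build the argument list for a whisper command-line call."""
    cmd = ["whisper", audio_path]
    if model:
        cmd += ["--model", model]  # e.g. tiny, small, medium, large
    if task:
        cmd += ["--task", task]  # "transcribe" or "translate"
    if language:
        cmd += ["--language", language]  # skips automatic detection
    return cmd


# "whisper_spanish.wav" is a placeholder for the Spanish recording.
translate = whisper_command("whisper_spanish.wav", task="translate")
print(shlex.join(translate))
```

This prints `whisper whisper_spanish.wav --task translate`. To actually run it, the list can be passed to `subprocess.run(translate, check=True)` once Whisper is installed.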

Speaker 2: All right.

Speaker 1: Thank you very much. Nice. And yeah, so that's just a very simple translation. All right. And we're done. As you can see, that was pretty simple to get set up, even manually. There are other ways to do it where you don't have to do it the hard way. I'm a glutton for punishment. So I just really like to have my hands on the keyboard, installing tools. But there are other ways to do it, such as Google Colab, or Colaboratory, where you have a virtual machine and it's very easy to get that set up as well. But I decided to do it this way. That's just what works for me to have a better understanding of these tools and what's required. I hope to take a further step into AI, to understand more about it, to understand the models and why some data sets are larger than others. So I really hope this video was helpful for anyone who managed to go all the way through and sit through all my stumbling over words, even now. But I hope to make more videos and more content. So please like and subscribe if you found this helpful or if you would like more videos in the future. Feel free to leave a comment. Thank you very much. Take care. Have a great day, great evening, wherever you are in the world. Take care. Peace.