Install and Use Whisper AI for Free Transcriptions
Learn how to set up Whisper AI on your PC for transcribing audio files easily. Utilize this free, lightweight AI tool for text conversions without costly services.
FREE OFFLINE Audio to Text Whisper Install Guide OpenAI Whisper ASR
Added on 01/29/2025

Speaker 1: What's up, I'm Troubleshoot. Welcome back to another video. In this one, I'll be showing you how to install, set up, and use Whisper AI on your PC. Essentially, you can take any MP3 file and transcribe it into text. It works really well, especially for English, but the best part about it is that it supports many more languages and even translation.
If you've ever been on YouTube, in say a random video, and you've clicked the settings button down here, you've got the captions that are automatically generated by YouTube. On top of this, they're now adding an audio track button that allows you to change the language on videos, which is really useful. Of course, creators will need to upload their videos with multiple different languages. YouTuber Theo Joe mentions it here and later on says that even though YouTube's transcribing is good, it can be better. And that's exactly what Whisper is.
It's an incredibly lightweight AI that runs on the GPU in your PC and allows you to transcribe audio completely for free. There's no need to pay for tokens to use Microsoft's, Google's, or Amazon's transcribing services. You literally just give it an MP3 file or whatever you want in your command line, and out comes some text. Assuming you have a relatively okay PC setup, you really don't need anything fancy for this, though of course having a graphics card does help a lot, especially if it's an Nvidia one.
In the description down below, you'll find a link to an article that I wrote about installing and setting this up. Everything that I cover in this video will be available there if you prefer it in text format. So, without further ado, how do we install Whisper and get it working on our PC? Well, there are a few things we need beforehand. The first being Python.
This is super easy to install, and you can check to see if you already have it by holding Start and pressing R to bring up the Run dialog, typing cmd, and hitting Enter to open Command Prompt. Inside of here, we can type in python, space, hyphen, capital V, and hit Enter. Then you should see a response like this telling you what version of Python you have. If you don't get a response here and it says the command is not recognized, you don't have Python installed at all, or not properly for that matter.
If you see anything other than Python 3.9, unfortunately, you'll need to either upgrade or downgrade, but you can install multiple Python versions, and once again, the guide linked down below explains that. I've got 3.11 running for my main projects, and even though OpenAI's Whisper is designed around 3.9.9, it can work with a few later versions. I just found that 3.11 was too far ahead, and I wasn't able to install some of the necessary components.
Anyways, to download Python 3.9.9, you'll find a link in the description down below and in the aforementioned article. Scrolling down to the bottom, you'll find a bunch of different packages. We'll be downloading the Windows installer (64-bit), marked as recommended. Then we'll click on it to open it up, click Run, and you should see something similar to this. I've already installed it, but regardless, at the bottom over here, you should see something about adding Python to PATH. You need to make sure that this is ticked in order for this to work properly if you don't already have a different Python version installed. If this is the second Python version that you're installing, you can simply install it alongside your existing one in a different folder. Just make sure you remember where you install it to. When you've clicked through the installer and Python is successfully installed, you can check with python -V to see what version it is.
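The version check described above can also be done from inside Python itself. This is a minimal sketch, assuming the 3.9 target recommended in the video; the helper name `matches_target` is made up for illustration.

```python
import sys

# Assumption: 3.9 is the target series, as recommended in the video.
TARGET = (3, 9)


def matches_target(version_info=sys.version_info, target=TARGET):
    """Return True when the interpreter's major.minor equals the target."""
    return tuple(version_info[:2]) == tuple(target)


if __name__ == "__main__":
    # Same information that `python -V` prints on the command line.
    print("Running Python", sys.version.split()[0])
    print("Matches 3.9:", matches_target())
```

Running this file with each interpreter you have installed shows which one is the 3.9 install.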
And if it's not 3.9.9, assuming you have multiple Python versions installed, navigate to the other installation. What we'll be doing here is renaming python.exe to, say, python3.9.9, for example. That way we can refer to it in the command prompt and it'll pick the right installation. So here in this Python installation, which is the second one on my PC, I've now got python3.9.9.
Now, assuming that this is a second installation of Python that you've done for a different version, we'll need to copy where we are here, hit Start, type in path, and open Edit the system environment variables. Then click Environment Variables, and up at the top, select Path, Edit, scroll down the list here, New, and we'll be adding where we have Python installed. For example, C:\Python3.9.9, and we'll be adding the Scripts folder as well, at the very bottom of the list, or at least after your other Python installations. When you've done so, click OK, OK, OK, and restart your command prompt if you have one open. Then you can run python3.9.9, or whatever you renamed it to, hyphen V, and you'll see the correct Python version. Sweet.
Now that we have Python correctly installed, let's make sure that we have Nvidia CUDA installed. If you don't have an Nvidia graphics card, you can run Whisper on just a CPU instead of a GPU. And as far as I understand, it doesn't even have to be an Nvidia GPU either. In the description down below, you'll find a link to the PyTorch website, which is the project used to power this. Scrolling down, we can select Stable, then Windows, then Pip as the package over here, Python as the language, and finally the compute platform we can leave as 11.6. If you don't have an Nvidia GPU, select CPU here instead, and skip the next step about installing CUDA. To download CUDA 11.6, you'll find another link in the description down below to the Nvidia developer page. Select Windows, then your Windows version, then exe (local) or exe (network).
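With two Python installs, it helps to see which PATH folders actually contain a given executable and in what order they are searched. The following is a rough sketch using only the standard library; the function name `dirs_providing` and the file name `python3.9.9.exe` are illustrative assumptions, not something shown in the video.

```python
import os


def dirs_providing(command, path_value=None):
    """List PATH directories, in search order, that contain a file named `command`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries left behind by stray separators.
        if directory and os.path.isfile(os.path.join(directory, command)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Hypothetical name: use whatever you renamed python.exe to.
    for name in ("python.exe", "python3.9.9.exe"):
        print(name, "->", dirs_providing(name))
```

The first directory in each list is the one the command prompt will use, which is why the video adds the second install after the existing ones.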
They both give you the same result; the network one is just a smaller download, and when you open it up, it then downloads all of the files. You'll be clicking through the installation as you would with any other program. It's relatively big. When it's done installing CUDA, we can head back to the PyTorch website and copy this command here. Right click, copy, and now we'll open a command prompt window where we'll paste it in and hit Enter. But if you have multiple Python versions, you'll need to start with Python, whatever we named it, say 399, space, hyphen, lowercase m, space, and then paste in the command that we copied. Just rename pip3 to pip, then hit Enter, and all of these will be installed as necessary.
When we're done installing PyTorch and everything necessary for AI on our PC, the last thing we need to download is FFmpeg, to actually turn audio into something that Whisper can process. You'll find a link for this in the description down below. Head across to the Windows icon here, just hover over it, and select one of these downloads. I'll choose, say, BtbN here. Now we'll have a whole bunch of downloads. We're looking for win64, and I usually select the bigger one, in this case, GPL. We'll click the zip, open it up, open the folder inside of it, then the bin folder if necessary, and you'll see ffmpeg.exe.
What we'll do here is extract it to a folder that we'll add to PATH. You'll already know how to do this if you have multiple Python versions and you followed the steps previously. I'll open up a new file browser, navigate to, say, the C drive, and over here I've made a new folder called PATH. Opening it up, we have nothing in this folder, obviously. Simply drag and drop the FFmpeg .exe files out of the zip that we just downloaded and into this folder on our C drive. Now that we've extracted them, click at the very top and copy the path here, then hit Start, type in path, and open Edit the system environment variables.
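Once PyTorch and FFmpeg are set up, a quick sanity check can confirm what the interpreter actually sees. This is a hedged sketch, not part of the video: `check_prerequisites` is a made-up helper, and it only reports on `ffmpeg` being on PATH, `torch` being importable, and whether `torch.cuda.is_available()` returns true.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which prerequisites from the walkthrough this interpreter can see."""
    report = {
        # shutil.which searches PATH the same way the command prompt does.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # find_spec checks for the package without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch

        # False here means Whisper would fall back to the CPU.
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run it with the same interpreter you installed PyTorch into, otherwise `torch_installed` will report no.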
Once again, Environment Variables, scroll down here, click Path, Edit, New, and paste it in at the very bottom of the list. OK, OK, and we can close the file browser and look at our PC once more. Now, if we open up a new command prompt window and run ffmpeg, you'll see a response like this, meaning FFmpeg's successfully installed. Sweet.
Now we can finally continue with installing Whisper. Once again, refer to the article linked down below for the command reference. These are the commands we'll be copying here, so copy them. All we need to do is paste them into our command prompt, and it'll install setuptools-rust, which is required for Whisper, then it'll install the actual Whisper project itself. Now, once again, if you have multiple Python installs, we'll be running, say, python399, whatever you called it, space, hyphen m, space, and then the rest of the command as follows: pip install setuptools-rust, and of course, the actual Whisper project. So for me, I'll be running the second one that I've referred to here: copy, paste, paste anyway, and when it's done running through, we'll have Whisper successfully installed.
Now, all we need to do is simply run whisper, assuming you can spell whisper. There we go. We have a whole bunch of text in response here about the project, as well as all of the supported languages, of which there are a ton. And of course, some other options. You can run whisper, hyphen h, for way more information and help. There are tons of things here, but if you see something like "is not recognized as an internal or external command", make sure that the Python Scripts directory is added to your PATH. Wherever Python's installed, we'll navigate there, Python 399, and then the Scripts folder here. Inside of it, you'll see whisper.exe. This is Whisper here, and this is exactly what we've installed.
So how do we actually use Whisper? Well, it's really simple. OpenAI has the Whisper project here on their GitHub as just plainly whisper.
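The install step boils down to running pip through a specific interpreter with `-m`. The sketch below only builds the command lines rather than running them. The video points to an article for the exact commands, so the two package sources here are an assumption taken from the Whisper README, and `python399` is just the example name used in the video.

```python
import sys

# Assumption: package sources as listed in the Whisper README, not read out in the video.
WHISPER_PACKAGES = (
    "setuptools-rust",
    "git+https://github.com/openai/whisper.git",
)


def pip_install_commands(python_exe=sys.executable, packages=WHISPER_PACKAGES):
    """Build one `<python> -m pip install <package>` argument list per package."""
    return [[python_exe, "-m", "pip", "install", package] for package in packages]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before pasting into a prompt.
    for command in pip_install_commands("python399"):
        print(" ".join(command))
```

Using `-m pip` ties the install to the interpreter you name, which is the reason the video prefixes every command with the renamed executable.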
Simply scrolling down, they tell you some information about how it works, and some installation steps that are somewhat helpful. They're just a bit confusing if you've never done this before. And finally, available models and languages. We can select from these models here, each of which requires a certain amount of VRAM. Of course, if you're running it on your CPU, that's just the same amount of normal RAM. They also list the speed relative to the largest model they offer here, large.
Now, I've tried all of these for converting MP3s of me speaking and other people speaking to text files, and they all work really well. I started on base and worked my way up. The higher you go, the more nuances it seems to understand. It also seems to get punctuation and things a bit better the higher up we go. Medium is about as high as I went, and it was more than fine, but they do have large. And in fact, they have a large-v2 as well, which works really well.
Scrolling down further, they have some information about word error rates. You can see certain languages are much harder to understand. Apparently, Spanish is the easiest for the AI to understand. Below that, there's some command line usage. You'll see the model here. In this case, they suggest medium, but we can choose from tiny, base, small, medium, large, and large-v2. So let's actually do this. This is for multiple audio files, but we can run it on one. We can specify a language for it. If it gets a bit confused, you may sometimes need to do this. And of course, if it's a different language, you can translate it into English, which is great.
On top of this, you can also use it in Python code to transcribe MP3 files and things like that. And you can also do it live with just audio input, which is crazy. They don't have any information about that here, but as far as I understand, that is something you can do. Anyway, let's go ahead and transcribe a simple MP3 file.
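For the "use it in Python code" part, here is a minimal sketch based on the calls shown in the Whisper README (`whisper.load_model` and `model.transcribe`). The wrapper `transcribe_file`, its defaults, and the list of model names (including the `.en` variants) are assumptions for illustration. The `whisper` import is kept inside the function so the name check works even where Whisper is not installed.

```python
# Model names mentioned in the video, plus the English-only variants (assumed list).
MODEL_NAMES = (
    "tiny", "base", "small", "medium", "large", "large-v2",
    "tiny.en", "base.en", "small.en", "medium.en",
)


def transcribe_file(audio_path, model_name="medium", language=None, translate=False):
    """Transcribe (or translate into English) one audio file and return the text."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"Unknown model {model_name!r}; expected one of {MODEL_NAMES}")

    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)  # downloads the model on first use
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language  # skip auto-detection if it gets confused
    result = model.transcribe(audio_path, **options)
    return result["text"]
```

A call such as `transcribe_file("audio.mp3", model_name="base")` would return the same text that the command line tool writes to its text file, assuming a file named `audio.mp3` exists in the current folder.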
So of course, me being a YouTuber, I've got many hours of me speaking recorded. I'll open up a new folder in my downloads and paste in the audio from my last video, for example. All we need to do is open a command prompt window in this folder. So I'll click up here, type cmd, and hit Enter to open a command prompt window in my downloads folder, as you can see. Then we'll simply type whisper, space, followed by the name of the file. If it has spaces in it, you will need to put quotation marks around it. Hitting Tab should cycle through the file names in the folder.
Now, if I simply hit Enter here, you'll see it do some code magic, and pulling up my Task Manager, you'll see some things happening on my GPU here as well. The RAM goes up quite a bit. And if this is your first time using this program, you'll see that it downloads a couple of files, some of them rather large, based on what model you give it. Now, by default, I think it's using the tiny or base model, so things may not be too accurate here, but you've already got an idea of what kind of quality it has. You can hit Ctrl+C to cancel it at any time.
Anyways, this is what we're looking at here. It's picked up my name as "tropical shoot", close enough. But besides that, the words that it does know, it seems to know very well. If we use hyphen model, space, tiny, for example, it'll use the tiny model. Excuse me, that's two hyphens: --model tiny. It'll download as such, then it'll load into your VRAM or RAM, and after trying to guess what language it is, it'll simply transcribe as you'd expect. Well, there we go. "Trouble shoot." Crazy.
So right now it's transcribing my file. It's keeping timestamps, which is great, especially if you want to use this somewhere else. There we go. It's now done. And when it's done, you'll see multiple files spat out right next to the actual audio file. The first one we have is the text file, and this is probably what you're looking for.
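The command typed above can also be assembled in a script, which sidesteps the quoting issue for file names with spaces. This is a small sketch: `build_whisper_command` and `run_whisper` are hypothetical helpers, and they only use the `--model`, `--language`, and `--task` options that the Whisper command line documents.

```python
import subprocess


def build_whisper_command(audio_file, model=None, language=None, task=None):
    """Build the argument list for the `whisper` command line tool."""
    command = ["whisper", audio_file]
    if model:
        command += ["--model", model]        # e.g. tiny, base, medium
    if language:
        command += ["--language", language]  # skip automatic detection
    if task:
        command += ["--task", task]          # "transcribe" or "translate"
    return command


def run_whisper(audio_file, **options):
    """Run whisper on one file; needs whisper.exe to be reachable through PATH."""
    # Passing a list means a name like "my last video.mp3" needs no manual quotes.
    return subprocess.run(build_whisper_command(audio_file, **options), check=True)
```

For example, `run_whisper("my last video.mp3", model="tiny")` would be the scripted equivalent of typing the command with quotation marks around the file name.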
It's just a huge text dump of everything that was said in this file. If we open, say, the SRT next to it, I'll just drag it into Notepad here to skip the Open with dialog, and you can see how this file is laid out. It's the different paragraphs, or rather speaking sections. It counts up, with the timestamp and what was said. Ultimately, I prefer the VTT format here. It's just a bit better laid out. It's got a time as well as what was said. Sweet. So everything's laid out relative to the video. You can search and jump around and do whatever. If you're transcribing voice notes of lectures and things like that, it's great. Everything is here for you and it's all on your PC.
Once again, it really isn't something that requires a huge graphics card, an expensive graphics card, like, say, Stable Diffusion for generating images. It's designed to run light on many devices. It's a great project released as open source by OpenAI. They've done tons of great work, and this is definitely an example of really good work. I've needed to transcribe a lot of files myself, and this has come in really clutch. It's completely free to use compared to other services that you may need to pay for. And it's actually really, really good.
If you need any more quality, just pump up the model. Use a larger model and things really do improve. You can see here that most of these models have an English variant, for English only, which should be a ton more accurate on English content. If you type in hyphen hyphen model tiny, it'll use that one here, or any of these. But if we use tiny.en or base.en, it'll be an English-only model, which should make it far more accurate with just English. Anyways, that's about it for this super quick guide. Once again, I've been Troubleshoot. Thank you all for watching and I'll see you all next time. Ciao.
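Because the SRT file is just numbered sections with a timestamp line and the spoken text, it is easy to read back into a script for searching or jumping around. Below is a rough sketch of such a reader; the sample cues in the usage part are made up, and the parser assumes the common `HH:MM:SS,mmm --> HH:MM:SS,mmm` SRT layout rather than the VTT one.

```python
import re

# Matches SRT-style stamps such as 00:01:02,500 (a dot is also accepted).
STAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an HH:MM:SS,mmm stamp to seconds as a float."""
    match = STAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Expect: counter line, timing line, then one or more text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (to_seconds(part) for part in lines[1].split("-->"))
        cues.append((start, end, " ".join(lines[2:])))
    return cues


if __name__ == "__main__":
    # Made-up sample in the layout described above.
    sample = "1\n00:00:00,000 --> 00:00:02,500\nWhat's up.\n\n2\n00:00:02,500 --> 00:00:05,000\nWelcome back.\n"
    for start, end, spoken in parse_srt(sample):
        print(f"{start:7.2f} to {end:7.2f}  {spoken}")
```

With the cues in a list, finding where a word was said is a simple loop over the third element of each tuple.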
