Free Audio to Text Conversion with Whisper: A Guide
Learn how to transcribe audio and video using Whisper on Google Colab with troubleshooting tips and model selection guidance. Free ebook included!
Convert Audio/Video to Text for FREE with Whisper AI (Updated Tutorial eBook freebie)
Added on 01/29/2025

Speaker 1: Hello everyone, and welcome back to my channel, Jennifer Marie. In today's video, I'm going to show you how you can convert audio or video files into text for free using Whisper. I have a popular tutorial on this already; however, this tutorial is going to show you more troubleshooting tips for those of you who had difficulties with the last tutorial. I'm also going to show you how to specify the language and model type that you want to use. In order to use this method, all you need is a free Google account, and you're going to go to Google Drive. So drive.google.com and click here on new. The first thing you need to do is install Google Colab, if you haven't already done so. Scroll down to more. And if you have installed Google Colab or Google Colaboratory, you'll see it on the list here. If you haven't, simply click connect more apps. And from here where it says search apps, just type in Colaboratory and click on that. And you're going to install Colaboratory by Colaboratory team. Just click on it, select install and follow the process. I've already done this, so I'm going to click the X, go back to new, then more, and then click on Google Colaboratory. So what we're going to do is install Whisper and FFmpeg onto Google Colaboratory. And that way we do not have to install Whisper on our computer. And this is great for those of you who do not have a super fast computer. So you'll want to start a new notebook. It should open up like this. If not, you can just go file, new notebook. So the first thing we're going to do is hover your mouse where it says untitled one, double click on that and rename the file. So audio sample one, for example, then press enter. Now we have to change the runtime type. So click here on runtime, then select change runtime type and change it from CPU to T4 GPU and press save. So now we're ready to install Whisper and FFmpeg. So Whisper is what we're using for transcribing audio and video files. 
And FFmpeg is a multimedia framework that will help us recognize these files. What you're going to do is click here on this code cell and then paste this code. You have to copy and paste this code exactly as it is. Exactly. Okay. So you'll have this code in the description below. I've also created a free ebook that you can download completely for free if you sign up to my mailing list. So if you go to transcribe.jennifermarievo.com, again, you can click on the link in the pinned comment or in the description below and click sign up today. If you fill in your name and email address, I'll send you this ebook I created. I'm really proud of it. It has all the steps you need to follow along with this tutorial, with pictures and the code that you can copy and paste. So make sure to get that as well. In the last video I did, I saw people were having trouble installing this code. If you're having trouble, before you press run cell, make sure there are no spaces after or before any of these lines. So make sure there's not a space before the first exclamation mark. You want it to be exactly like this. Okay. No spaces before or after each code line. Then click here on run cell. So this could take a few minutes. Once again, it's not installing on your computer. It's installing temporarily on Google Colab. So just leave it. It shouldn't take more than a few minutes. And you'll see that it's completed when there is a green check mark. You can see this only took a minute and 38 seconds to install. And we have a green check mark here and here. So now we're ready to upload our files. To do that, click here on the files icon and you can drag and drop the audio or video file you want to transcribe here or click right here on upload to session storage. Now, another very important tip: before you upload your file, make sure to rename it on your computer to something simple. For example, audio one, video two. 
That's because later on you have to type out the file name, and many people have trouble with this process because the file name is complicated and they make a mistake. So just rename your file before you do this to make it simple. You'll then get this warning that pops up. This is just letting you know not to delete the file off your computer because when you upload a file to Google Colab, it doesn't remain there forever. It's just here while we're doing this transcription process. So just press okay. So now this is very important. Make sure to wait until your file finishes uploading. You can see here this little icon that's blue. This here is the upload progress. So my file is still uploading, so I cannot start the transcription process until this has completed. So just be patient. So once the file has finished uploading, you'll see it on the side here, and now we're ready to run Whisper, and that's what's going to transcribe our file. So click here on the plus code icon and down here you're going to paste this code. So Ctrl+V to paste. Again, I have this code and all the code you need in my free ebook. What you're going to do is replace "your file name" with your actual file name, exactly as it is. If it has spaces, put spaces. If it doesn't, don't put spaces. So mine is audio file one.wav. You can see that it popped up here, so I know I have typed it correctly. It doesn't always do that, but once again make sure to type the file name exactly as it is, with the extension. So you can see here we have model medium. There are different models that you can use. I'm going to talk a little bit more about the models later on, but medium is a good one to use because it's highly accurate but not as slow as the large model. When you're done with that, just click here on run cell and you're just going to wait until it finishes. It's going to begin executing and then it will start transcribing down here. So there are five different models available in Whisper. 
Tiny, base, small, medium, and large. And the tiny model is the least accurate while the large model is the most accurate. However, the larger the model, the more space it takes on the disk. Usually medium works just fine, so we're going to use that one for this file. If you wanted to try a different model, you would replace medium with tiny or base or small, for example. Right now it is automatically detecting the language as English. However, sometimes you may want to specify what language it is, so I'm going to do a demo on that after we finish transcribing this file and downloading the file types. So you can see that took two minutes to finish transcribing and you can see there's a green check here and a green check here. So these different files should pop up in this navigation menu. If they don't, right click and just click refresh to force them to pop up. So here you might want to download the subtitles file, .srt. Hover your mouse over it, click on more options, then just click download. And that will be a timestamped file. Let's look at that. So you can see here we have the subtitles for this audio file. You might also choose .txt, which will be the transcription without timestamps. So once again, hover your mouse, click on the three dots, click download. And here we have it. Now one thing I noticed was that at the beginning the transcript had punctuation, and then closer to the end it stopped. So I actually am not that happy with the medium model for this file. Usually I don't have any issues. So all we'd have to do to try the large model is once again paste this code, type in our file name with the extension, and I'm going to change the model to large this time and then click run cell. And let's see if it gets a better result with the better model. So as you can see here, it looks much better using the large model. It only took four minutes versus two minutes using the medium model. And let's download the text file here just to preview it. 
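
[Editor's sketch] The cell pasted on screen is not captured in this transcript. As a rough illustration of the model-selection step described above, the following Python helper builds the kind of one-line Colab cell the speaker describes. It assumes the `whisper` command-line tool from the openai-whisper package and its `--model` option; the helper name `whisper_cell` and the sample file names are hypothetical, and the exact code in the video or ebook may differ.

```python
import shlex

# Model names mentioned in the tutorial, from least to most accurate.
MODELS = ("tiny", "base", "small", "medium", "large")


def whisper_cell(file_name: str, model: str = "medium") -> str:
    """Return a one-line Colab cell that transcribes file_name with the given model.

    Hypothetical helper for illustration; not taken from the video.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    # shlex.quote wraps names containing spaces in quotes. The leading "!"
    # makes Colab run the line as a shell command, so nothing may precede it.
    return f"!whisper {shlex.quote(file_name)} --model {model}"


print(whisper_cell("audio1.wav"))
print(whisper_cell("audio file one.wav", model="large"))
```

The printed lines can be pasted into a new code cell. Switching from medium to large, as done above, only changes the value after `--model`.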
And yeah, this is much better. You can see it has proper punctuation and capitalization right till the very end. So I definitely recommend you try the large model if the medium model isn't transcribing it the way you would like. So now I want to show you how you can specify the language of your file. So I'm going to once again click upload to session storage and upload a file that is in Spanish. Once again, make sure you wait until your file has finished uploading. This is just a short file. So now it has finished uploading. So what we're going to do is press code, and you're going to paste this code. Again, this code is in my free ebook. I have instructions on how to specify languages. So there are different language codes depending on the language you want to specify. We have your file name, which we will replace with our file name plus extension. I'm going to put model large for this, or you could just leave it as medium. And then we have language ES. Remember, every character matters with this. Don't erase spaces or add spaces. Copy it exactly as it is. There should be no spaces before the exclamation mark and no spaces at the end. So I am telling Whisper to transcribe this in Spanish using the large model. I just have to click run cell. So once again, Whisper will automatically identify the language, but some people have had difficulties. So it's better for you to tell it exactly what language it is if it's not English. So while we wait for this, I also want to say if you are still having trouble with using this method to transcribe your files, do a test file with a standard extension like .mp3, .wav, .mp4. If you are using very strange audio file extensions that aren't standard, you might have more difficulty with this method. So I would recommend you convert your file into .mp3 or .wav or .mp4 and then do this process. Also, if you have a very large file, maybe two hours long, split it into two different files or four different files and do piece by piece. 
That will just make the process easier and faster for you. Okay, so you can see it has transcribed this little Spanish sample perfectly. And once again, the files are on this side here and to download it, you would just click here on the file you want to download and then click download. And there you have it. So once again, make sure to download my free ebook. I'm really proud of this ebook. I think it's going to make things much easier for you to be able to transcribe audio and video files for free. You can click on the link in the description below. And if you have any other questions or you want more help with this, let me know in the comment section below. I hope you enjoyed this tutorial and I'll see you all in my next video.
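
[Editor's sketch] The cells pasted during the video are not part of this transcript. The Python below is a minimal, hedged reconstruction of three steps from the tutorial: the setup cell, a transcription cell with an explicit language, and a cell that splits a long file. The install lines follow the openai-whisper README and are assumed, not confirmed, to match the video. The helper names, sample file names, and the 30-minute piece length are hypothetical, and splitting with FFmpeg's segment muxer is one possible approach that the speaker does not name.

```python
import os
import shlex

# Setup cell: installs Whisper and FFmpeg into the temporary Colab session.
# Assumed from the openai-whisper README; the video's cell may differ.
INSTALL_CELL = (
    "!pip install -U openai-whisper\n"
    "!sudo apt update && sudo apt install -y ffmpeg"
)


def language_cell(file_name: str, language: str, model: str = "medium") -> str:
    """Return a Colab cell that transcribes file_name in a stated language (e.g. "es")."""
    return f"!whisper {shlex.quote(file_name)} --model {model} --language {language}"


def split_cell(file_name: str, minutes: int = 30) -> str:
    """Return a Colab cell that cuts file_name into pieces of about `minutes` each.

    Uses FFmpeg's segment muxer with stream copy, so the audio is not re-encoded.
    """
    stem, ext = os.path.splitext(file_name)
    pattern = f"{stem}_part%03d{ext}"  # numbered pieces: _part000, _part001, ...
    return (
        f"!ffmpeg -i {shlex.quote(file_name)} -f segment "
        f"-segment_time {minutes * 60} -c copy {shlex.quote(pattern)}"
    )


print(INSTALL_CELL)
print(language_cell("audio2.mp3", "es", model="large"))
print(split_cell("audio1.mp3"))
```

Each printed line starts with an exclamation mark and has no leading or trailing spaces, which is the formatting the tutorial stresses when pasting cells.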
