Speaker 1: Hey YouTube, in this video I'm going to show you how you can quickly convert any audio into text using Whisper, a free, open-source Python package. I'll show how I installed it, walk through an example of how I ran it, and compare it to an existing library. Starting off, you'll probably want to go to the Whisper GitHub repository that we're looking at here, where they give instructions on how to install it. One thing to keep in mind: if you pip install just the name "whisper", it's not going to install the right package. We want to install from the Git repository, so take that pip install command and run it in the environment where you're running Python. They also mention that you need FFmpeg installed; there are instructions for that, but I already had it on my computer. Now that I have Whisper installed, let's make some audio I can test it on. I'm going to say some idioms, since idioms are usually hard for models to understand, and even though this is just speech-to-text, this will be kind of fun: "I would love to be on cloud nine as a one-trick pony that wouldn't hurt a fly. I'd be like a fish out of water and as fit as a fiddle to be under the weather." Let's save this off as a WAV file. They do have instructions for running Whisper straight from the command line once it's installed, but I'm going to show you the Python API, which they also document. It's really simple: we import whisper, load the model called "base", and then call transcribe on our audio file using that model object. I named the file "idioms", so let's use the WAV version and have it return the result. Now, I noticed when I ran this before that I got an error about a mismatch between CUDA's HalfTensor and FloatTensor types, which I was able to solve, so that's something to keep in mind.
If it doesn't work for you, you might need to set fp16 to False. You can see after it's run that it already detected the language as English. The result object has a few different fields in it, but what we want is just the text, and we can see the result looks good: "I would love to be on cloud nine as a one-trick pony that wouldn't hurt a fly. I'd be like a fish out of water..." It did mess up a little bit on "fish out of water and as fit as a fiddle"; maybe I didn't say it clearly enough. Another thing to know is that when you first run this, it has to download the base model, so you might see a progress bar going across while that downloads. And the docs say that when you run transcribe, it's actually taking 30-second chunks of your audio file and running predictions on each one. There's also another, lower-level approach you can take, where you create the model yourself, create the audio object, and pad or trim it. What that does is make sure the audio chunk is exactly 30 seconds, either trimming it down or padding it out, since that's the input length the model expects. Then it makes a log-Mel spectrogram, detects the language, and decodes, and at the decoding step we can provide a lot more options if we want to. If I run this cell, I again get that error, which I can now fix by setting fp16=False in the DecodingOptions. And this time it actually looks like it got everything correct: "I'd be like a fish out of water and as fit as a fiddle." So that's it for Whisper. I just want to compare it to an existing kind of model, and a popular library for doing this is the SpeechRecognition library. The way we run it is we import it and create a Recognizer object, which we then load our audio file with. After that, the recognizer object has a few different recognize methods you can call.
And we're going to use recognize_google, so let's see what the result is. It looks like it didn't add any punctuation, and the "cloud nine" part came out differently: "I would love to be on cloud nine as a one trick pony that wouldn't hurt a fly." The one thing to keep in mind is that this is actually calling out to the Google speech recognition API, whereas with the Whisper library you have the model downloaded locally and it's yours to use. I also recommend you take a look at the Whisper paper, which was released with this code; it goes into detail about how the model was trained and the architecture it uses. Whisper works on a bunch of different languages, though they say the performance varies by language. You can go to the GitHub repo, where they have a plot showing which languages it performs best on; smaller bars are better and larger bars mean worse performance. So it's still pretty impressive how many languages this model works on.