Transcribe Audio with OpenAI Whisper Effortlessly

Learn how to transcribe audio to text easily and for free using OpenAI Whisper, and how to improve recognition of technical terms with an initial prompt.

Python Audio-to-text OpenAI Whisper Accuracy Test

Added on 01/29/2025

Speaker 1: Hello guys. Welcome to Python and Machine Learning Daily. Today I want to demonstrate how easy it is to transcribe audio into text with the OpenAI Whisper library. Although it comes from OpenAI, it's actually free and open source: you don't have to pay for it like you do for ChatGPT or other services. It's on GitHub as openai/whisper. OpenAI does offer speech-to-text as a paid API on the OpenAI platform, but they also released Whisper as open source back in 2022, and it's widely adopted in the Python community and very easy to use. All you need to do is pip install openai-whisper, and then you write four lines of code, that's it, to transcribe an MP3 file into text.

I've tested it on a few of my videos and it was almost perfect, even though English is not my native language. The only problem was technical terms, and I will show you how to overcome that as well.

For this video I already installed OpenAI Whisper behind the scenes. My requirements.txt also includes the certifi library, and Whisper relies on ffmpeg, which you need to install on your system. I did that on my MacBook with brew install ffmpeg, and that's it. I took an MP3 file from one of my YouTube Shorts, so it's one minute long. Let's see how quick it is. Four seconds.

There's a warning that FP16 is not supported, and you can suppress it by passing fp16=False (and of course False has to be capitalized in Python). FP16 and FP32 are half-precision and single-precision floating-point formats used in deep learning; if you're interested, you can google FP16 versus FP32. The point here is that it did produce the text: a one-minute video was transcribed in four seconds on my MacBook Pro with an M3 processor, and this is the actual text. It transcribes not only the words but full sentences, with things like NVIDIA and CSV in uppercase, so you could almost publish the same text as a blog post with just a few minor changes.
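A minimal sketch of the four-line approach described above, assuming the openai-whisper package and ffmpeg are installed. The wrapper name transcribe_mp3 and the "base" model size are assumptions for illustration, not taken from the video; whisper.load_model and model.transcribe are the package's own entry points, and fp16=False is the option mentioned above for silencing the FP16 warning when running on CPU.

```python
def transcribe_mp3(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with the open-source Whisper package."""
    # Imported inside the function only so this file can be loaded
    # where Whisper isn't installed. Requires: pip install openai-whisper,
    # plus ffmpeg available on the system PATH.
    import whisper

    # "base" is an assumed model size; the checkpoint downloads on first use.
    model = whisper.load_model(model_name)
    # fp16=False requests FP32 decoding, which avoids the
    # "FP16 is not supported on CPU" warning on CPU-only machines.
    result = model.transcribe(path, fp16=False)
    return result["text"]
```

Usage would be something like print(transcribe_mp3("short.mp3")), where the file name is hypothetical. In a plain script, a top-level import whisper works just as well.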
The only problem with such a transcription is, of course, that Whisper doesn't know your technical topic. If you are transcribing an MP3 about pandas, for example, you may want Pandas capitalized. Or, another example: I work with the Laravel framework for PHP web development, and Whisper misspelled Laravel. So how do you get the model to recognize technical terms and transcribe them correctly, without fine-tuning or retraining it? It's actually very easy. It's not guaranteed to work in 100% of cases, but in most cases all you need to do is provide an initial prompt: a short intro text for the video that introduces those terms, spelled exactly as they should be written. For example: "In this video we will talk about Pandas and DataFrames, or a DataFrame." Then you pass it as an additional parameter, initial_prompt=initial_prompt. I needed that because the original transcription said "data frame", which is fine as two correct English words, but in the context of pandas it should be one word.

If we repeat the transcription with the initial prompt, this is the result. I split it into sentences: Pandas is capitalized, and DataFrame is written exactly as I put it in the initial prompt. So that's how easy, and free, it is to transcribe MP3 audio into text with the OpenAI Whisper library. If you want more tips and tool recommendations like this one, subscribe to the channel; I will keep shooting videos on YouTube. That's it for this time, and see you guys in other videos.
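The initial-prompt trick can be sketched the same way. This is a sketch under assumptions: the helper names (transcribe_with_prompt, split_sentences, missing_terms) and the "base" model size are hypothetical, chosen for illustration. initial_prompt is a real keyword argument of Whisper's transcribe; Whisper feeds it to the decoder as if it were the preceding transcript, which nudges it toward those spellings without any retraining. The two small helpers mimic splitting the output into sentences and checking whether the glossary terms came back spelled as expected.

```python
import re


def transcribe_with_prompt(path: str, initial_prompt: str, model_name: str = "base") -> str:
    """Transcribe audio, biasing Whisper toward the spellings in initial_prompt."""
    # Requires: pip install openai-whisper, plus ffmpeg on the system PATH.
    import whisper

    model = whisper.load_model(model_name)  # "base" is an assumed model size
    # The prompt is a hint, not a guarantee: terms can still be misspelled.
    result = model.transcribe(path, fp16=False, initial_prompt=initial_prompt)
    return result["text"]


def split_sentences(text: str) -> list[str]:
    """Hypothetical helper: split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def missing_terms(text: str, terms: list[str]) -> list[str]:
    """Hypothetical helper: glossary terms not found verbatim (case-sensitive) in text."""
    return [t for t in terms if t not in text]
```

For example, with the hypothetical file "pandas_short.mp3", you could run text = transcribe_with_prompt("pandas_short.mp3", "In this video we will talk about Pandas and DataFrames, or a DataFrame."), print each item of split_sentences(text), and then check that missing_terms(text, ["Pandas", "DataFrame"]) returns an empty list. An empty list suggests the prompt worked for that file; a non-empty one shows which terms still need attention.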