Speedy Windows Audio Transcription with Whisper

Discover a fast Windows tool for audio transcription and translation using OpenAI's Whisper, optimized for GPU use. Get started with Whisper Desktop now.

A Faster Windows Audio Transcription and Translation App That Uses Whisper AI Models and Supports GPU

Added on 01/29/2025

Speaker 1: In this video, I'll introduce you to a faster audio transcription and translation tool, Windows-only at the time of recording this video, that is powered by OpenAI's Whisper. And it supports GPU, so if you have a graphics processing unit, this is for you. I'll leave the link to the GitHub repository for this particular application. I believe it's called Whisper Desktop. I don't think there's any other name. It's a high-performance GPU inference of OpenAI's Whisper automatic speech recognition (ASR) model. And if we scroll down, you'll see that this project is a Windows port of the whisper.cpp implementation, which in turn is a C++ port of OpenAI's Whisper automatic speech recognition model. So, the quick start guide: you can download WhisperDesktop.zip from the releases section of this repository, unpack the zip, and run WhisperDesktop.exe. On the first screen, it will ask you to download a model. You can either get the model from Hugging Face, from some fellow who has stripped it down a little bit to make it a little bit better, or get OpenAI's Python models. So I'll show you how to download it. I've already downloaded it, but I'll take you to the releases section. Actually, the release I want is version 1.61 from last week, so I'll click on that. It comes here, and you see that there are minor changes to the desktop app. The DLL is still 1.60. That's okay. Better performance of the C++ samples on laptops with two graphics cards. Added the .m4a file extension to the browse dialog. The text-with-timestamps output option is now available. So this is a very simple, small application. Click on that. It's 403 KB. It's done. Now, let's just go to the folder. It's in the downloads. And I can just create, let's say, a new folder: Whisper desktop. I've created this folder so that I can extract the contents of this file here. So here we go. I'll just open this and then bring up my zip file here. Just copy these files, drag and drop them right inside there.
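Editor's note: the unpack step described above is done by drag and drop in the video. As a rough scripted equivalent, the minimal Python sketch below extracts a downloaded zip into a fresh folder. The function name `unpack_release` and any paths passed to it are hypothetical and are not part of the Whisper Desktop project.

```python
import zipfile
from pathlib import Path


def unpack_release(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and list what landed there.

    Both arguments are placeholders chosen by the caller; nothing here is
    specific to Whisper Desktop beyond "unpack the zip into a folder".
    """
    dest = Path(dest_dir)
    # Create the target folder (for example a "Whisper desktop" folder) if needed.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Return the names found in the folder so the caller can confirm the result.
    return sorted(entry.name for entry in dest.iterdir())
```

Calling `unpack_release` with the path to the downloaded archive and a new folder name returns the extracted file names, which should include the application executable the speaker opens next.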
So I have the zip file here and I have the extracted files here. We need to open this particular Whisper Desktop app, so just double-click on it. Once you do that, you're going to get a quick notification: "Please provide a Whisper model in GGML binary format." Now, it's recognizing that I already had this, because I did this to test. But if you want to download the models, you can click on Hugging Face. Click on that, and it takes you to this particular Hugging Face page here. What you need to do is click on Files and versions. And then, for the recommendation, I'll just go back to the code. The creator says, "I recommend ggml-medium.bin," the medium version, 1.42 GB in size, "because that's what I've tested." But it's pretty fast. Now, you can download it. Let's say we want the medium .bin for all languages. This one: click on that and download it. I've already downloaded the files, so they should be somewhere here: this file and this file. So what I'm going to do now, with the app open, it's open right here, is navigate to where this particular file is. I'll just click on these three dots, then go to the downloads, and then select the file that I want. Let's say I want the medium one. Click on that, click on Open. So the model path is going to change. For model implementation, I want GPU. You also have Hybrid or Reference, but I want GPU. I'll not change anything else here. The advanced tab has different compute shader options, etc. I'll not change anything there. Once I do that, I need to click on OK. That loads the model and then takes you to the next screen, where you transcribe your audio. It shows you which model you're using, ggml-medium.bin. Then you can choose the option to transcribe audio, and you can also select translate. But for this video, we're just doing a transcription. Now, let's do what? Let's come here and say we go with this particular file.
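Editor's note: the app asks for a model "in GGML binary format." A quick sanity check on a downloaded `.bin` file can be sketched as below. This assumes the file starts with the 32-bit magic value `0x67676d6c` stored little-endian, which is what the whisper.cpp conversion script writes; whether Whisper Desktop performs the same check is not stated in the video. The function name `looks_like_ggml` is hypothetical.

```python
import struct

# Assumption: GGML model files produced for whisper.cpp begin with this magic number.
GGML_MAGIC = 0x67676D6C


def looks_like_ggml(model_path):
    """Return True if the first four bytes of the file match the assumed GGML magic."""
    with open(model_path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4:
        # Too short to contain a header at all (for example an interrupted download).
        return False
    (magic,) = struct.unpack("<I", header)  # little-endian unsigned 32-bit integer
    return magic == GGML_MAGIC
```

This only inspects the header, so it can catch a wrong or truncated download early but says nothing about whether the rest of the model is intact.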
So we go with "How to quickly summarize YouTube transcripts using YouTube Summary with ChatGPT." Click on Open. That's a video. Then choose the output format: None, text file, text with timestamps, SubRip subtitles, WebVTT. Let's assume we want the output of this to go to, let's say, our downloads. Click Save. But if you'd like it to be in the same place as the source, you can just select this. Really well done here. So once you're satisfied, click on Transcribe. This is really, really fast. Super fast. The video, or the audio from that video, is about five minutes long. Let's see how long it takes to automatically transcribe the audio from that video into WebVTT subtitles. So here we go. I believe it's almost done now. We'll see the output and how long it's taken to transcribe that particular file. It should probably be about 40 seconds, if I'm not mistaken, for a five-minute file using the GGML medium language model. So there we go. Let's see. Yeah, it's about 42 seconds, 0.88, 844. So it's doing a really good job. Let's see what the file looks like in the downloads. Here it is. Let's double-click on it. It's a WebVTT file. And you'll notice, and it's probably the only downside I have with this particular application, that it's setting the subtitles at the same interval. Subtitles at intervals of five minutes, five seconds, sorry. Five, five, five: it looks like five-second intervals, which doesn't feel right. I wish there was a way to limit that from this particular option, and say you want X number of characters, or how many lines per subtitle block. That would be really, really awesome. Also, it would be awesome to have multiple file input. I'm not sure if it supports that, but let's try and select a couple of files. No, it doesn't. It doesn't support multiple audio or video transcription. I wish that was also available, because if you have a GPU, you can take advantage of this, because it's super, super fast.
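Editor's note: the speaker observes that the WebVTT cues appear to fall at fixed five-second intervals. One way to check that on any `.vtt` file is to parse the cue timing lines and compute each cue's duration. The sketch below relies only on the standard WebVTT timestamp syntax (`hh:mm:ss.ttt` or `mm:ss.ttt` on either side of `-->`); the function name `cue_durations` is hypothetical, and the sample text in the usage note is made up for illustration, not taken from the app's output.

```python
import re

# One WebVTT timestamp: optional hours, then minutes, seconds, milliseconds.
_STAMP = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
# A cue timing line: start timestamp, arrow, end timestamp.
CUE_TIMING = re.compile(_STAMP + r"\s+-->\s+" + _STAMP)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to seconds as a float."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def cue_durations(vtt_text):
    """Return the duration in seconds of every cue found in a WebVTT document."""
    durations = []
    for line in vtt_text.splitlines():
        match = CUE_TIMING.match(line.strip())
        if match is None:
            continue  # header, blank line, cue identifier, or cue text
        parts = match.groups()
        start = _to_seconds(*parts[:4])
        end = _to_seconds(*parts[4:])
        durations.append(round(end - start, 3))
    return durations
```

For a hypothetical file containing the cues `00:00:00.000 --> 00:00:05.000` and `00:00:05.000 --> 00:00:10.000`, `cue_durations` reports two durations of five seconds each, which is the pattern described in the video.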
But simply put, that's just a quick introduction to this particular Whisper Desktop application for Windows. It's really awesome, so just try it out and see what you can do. You can also use it to transcribe live, and you can do this by going to Audio Capture, etc., and all that kind of stuff. But simply put, that's a faster way to automatically transcribe audio and video files using Whisper on Windows, with an application that supports GPU. I hope this video is of value to you. Thanks for watching.