Comparing CPU and GPU Versions of Speech Translate for Fast Transcriptions

Explore the speed and efficiency of Speech Translate's CPU and GPU versions for transcribing audio and video files using OpenAI's Whisper technology.

Speech Translate GPU Version for Audio and Video Transcription and Translation Powered by Whisper

Added on 09/06/2024

Speaker 1: I recently did a video on four free apps, powered by OpenAI's Whisper, that automatically transcribe audio and video files into text or subtitles. Since then, I got this question from Smith: "Which of these leverage GPUs? A comparison of speeds and required computer resources in the conclusion would have been nice. Subtitle Edit uses a lot of CPU." I do understand, but I just wanted to introduce the apps and make sure people know they exist.

I covered Speech Translate in detail in one of those videos, but there I used the CPU version. So the answer to your question is that Speech Translate has a GPU version. If we scroll down the README (I'll leave a link to this section below), there's a section on using GPU for Whisper. It notes that this process can be handled automatically by running dev setup.py. To use the GPU, you first need to uninstall PyTorch (torch), then go to the official PyTorch website and install the correct PyTorch build with GPU support for your system.

If I scroll back up and click on Releases, there are two releases at the time of recording. Among the assets you can download is Speech Translate 1.1.0 CPU version for Windows. It's a zip file of 121 MB, and once extracted it's about 985 MB, almost 1 GB. Then there's Speech Translate 1.1.0 GPU version for Windows, which is about 1.15 GB zipped and about 4.4 GB extracted.

I've already downloaded and extracted both. Here's the CPU folder, and here's the GPU folder. If I open each one, the contents look almost the same; there may be a few differences, but I won't go into them here. What we're interested in is testing the GPU.
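If you run Speech Translate from source and follow the README's PyTorch reinstall step, one quick way to confirm that the new install can actually see your GPU is to ask PyTorch directly. The sketch below is an illustration, not part of Speech Translate: it assumes a standard PyTorch install and relies on torch.cuda.is_available(); the helper names cuda_is_usable and whisper_settings are hypothetical. The device and FP16 pairing mirrors Whisper's own behavior, which falls back to FP32 when running on CPU.

```python
import importlib.util


def cuda_is_usable() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed at all
    import torch

    # A CPU-only PyTorch build (or a machine without a CUDA GPU) returns False here.
    return bool(torch.cuda.is_available())


def whisper_settings(cuda_available: bool) -> dict:
    """Hypothetical helper: pick a device and precision for a Whisper run.

    FP16 is only used on a CUDA GPU; on CPU, Whisper itself switches to FP32.
    """
    if cuda_available:
        return {"device": "cuda", "fp16": True}
    return {"device": "cpu", "fp16": False}


if __name__ == "__main__":
    print(whisper_settings(cuda_is_usable()))
```

With the openai-whisper package installed, these settings could be passed as whisper.load_model("base", device=settings["device"]) followed by model.transcribe(path, fp16=settings["fp16"]); the model name and path are placeholders, and this is not how Speech Translate wires it internally.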
So we open the Speech Translate application itself. If we scroll down, we'll see it in the other folder as well. One executable is about 18 MB; the other is about 17, so more or less 18 MB. Let's begin with the GPU version. Double-click on it. As always, a command-line window opens first. It checks for updates, finds none, and once everything is set, we can use it to transcribe.

In the GPU version, click on Import audio or video, then Video file, and select this file. It's 5 minutes and 52 seconds long. When I click Open, it automatically starts transcribing, and we can see how long it takes. For a 5-minute-52-second file it shouldn't take long, especially on the GPU. It takes 19.30 seconds, and the output files are here, including the SRT file. The first subtitle reads, "Can you edit a PDF file in Canva?" Maybe my pronunciation of "Canva" is off, but that's not a big issue. It's done a stellar job, super fast: a 5-minute-52-second file automatically transcribed and subtitled in 19.30 seconds with Speech Translate's GPU version.

That's probably why Smith asked the question: GPUs do the heavy lifting and tend to run Whisper faster. So let's close this and open the CPU version of Speech Translate to see how long it takes to transcribe and subtitle the same video. We keep the same transcription settings (the base model), so everything is the same. Click Import audio or video, then Video file, select the same file, click Open, and it gets to work.
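To put 19.30 seconds in perspective, it helps to express it as a speed relative to the clip's length. This is a minimal sketch that uses only the figures from this test (a 5:52 clip transcribed in 19.30 seconds); the function names are illustrative. Note that some tools define a real-time factor the other way round, as processing time divided by media length.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an "M:SS" duration string to whole seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def times_real_time(media_seconds: float, processing_seconds: float) -> float:
    """Seconds of media transcribed per second of processing."""
    return media_seconds / processing_seconds


clip = to_seconds("5:52")  # 352 seconds of video
gpu_speed = times_real_time(clip, 19.30)  # GPU run as timed in the video
print(f"{clip} s of video in 19.30 s -> about {gpu_speed:.1f}x real time")
# prints "352 s of video in 19.30 s -> about 18.2x real time"
```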
Right away there's a warning: "FP16 is not supported on CPU; using FP32 instead." Let's see how long the CPU version takes on the same 5-minute-52-second file. It has already passed the GPU's time and is still counting: 28 seconds, 29, 30. By now the GPU version is already twice as fast. Let's keep going. It's done. If we open the SRT, the results look the same, but it took longer. Click OK, and in the command-line window the total time is about 57.58 seconds, roughly three times what the GPU version took.

So if you're looking for a Whisper-powered option that uses the GPU, I'd recommend Speech Translate 1.1.0 GPU version. If your computer has a compatible GPU, choose it over the CPU version; in this test it was about three times faster. So, in answer to your question: yes, Speech Translate can leverage the GPU through its GPU version. Thanks for asking.
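Applying the same arithmetic to both runs turns "about three times" into numbers. This sketch uses only the two timings reported here (one clip, one machine, one run each), so treat the ratio as indicative for this setup rather than a general benchmark; the GPU model used in the test is not stated.

```python
CLIP_SECONDS = 5 * 60 + 52  # the 5 min 52 s test video
GPU_SECONDS = 19.30         # GPU build, as timed in the video
CPU_SECONDS = 57.58         # CPU build, total time from the command-line window

speedup = CPU_SECONDS / GPU_SECONDS  # how many times faster the GPU run was

print(f"GPU: {CLIP_SECONDS / GPU_SECONDS:.1f}x real time")  # 18.2x
print(f"CPU: {CLIP_SECONDS / CPU_SECONDS:.1f}x real time")  # 6.1x
print(f"GPU build was {speedup:.1f}x faster on this clip")   # 3.0x
```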