Speaker 1: In this video, I'd like to introduce a new speech transcription and translation application that's powered by OpenAI's Whisper. The application is called Speech Translate, and it's a real-time speech transcription and translation application using OpenAI's Whisper and free translation APIs. The interface is made using Tkinter and the code is written fully in Python.
Now, if we scroll down, if you want to download it, you can click on this. And if I click on this, it brings us to the page where you can see the releases of this application. This was released four days ago, and you can see the different options available. So it's really, really awesome. At the time of recording this video, it's available only for Windows. We've got two versions: the CPU version, which is a little bit smaller, and the GPU version, which is 1.15 GB. In this video, I'll be testing out the CPU version.
So we can actually minimize this. And because I've already downloaded it, it's right here. It's a zipped file, and I already have another folder where I want to extract the zip contents. So I'll just double click on it. It's going to open using the free version of WinRAR. And then I'll just drag and drop this particular folder right inside there and let it extract in its entirety, so that we can now access the application itself. So let's give it a couple of seconds so that we can see the good, the bad, the ugly, whatever you want to call it. And it's done.
So what we can do now is close this, minimize this, and then bring Speech Translate here. Just double click. And we've got a bunch of files here. Although the download was 121 MB or thereabouts, it extracts to about 1 GB. Let's see the properties; it should be almost 1 GB. That's just so that you know. What is of importance to us is just one specific file here.
And it's called the Speech Translate application, right here. If I just double click on it, it's going to open up a background window with a bunch of output, and then open the application interface. So remember, don't close the background window, this one, so that we can see what is going on in the background.
And this is what the Speech Translate application looks like. We've got a bunch of options right here. Let's maximize this. We have the menus where you see everything: the settings, detaching the transcribed speech window, detaching the translated speech window, help, etc. And then we've got the mode, what you want to do: transcribe, translate, or transcribe and translate. This looks really awesome. And then we've got the models for Whisper. If you hover over this, you'll get a simple explanation of how OpenAI's Whisper uses your RAM based on the model that you choose. Normally I'd go for something like base; even in Subtitle Edit, it does the job. And then we've got a couple of translation engines here. You can go for Whisper, Google, LibreTranslate, or MyMemory Translator. And then there's the option to auto-detect the language of your file, which is important.
So the number one thing I'll let you know is: always set the settings first. This is what I found a little bit awkward with Speech Translate: if you don't set your settings initially and you import a file, it just begins translating based on the settings up here. So set what you want to do first, before you make any other move.
So we've got all that, and then you can swap this so that it goes from Indonesian to whatever language you want. You can swap that right there. You can choose that if you want to, but it's not really relevant here. And because this also supports real-time automatic transcription, you can choose your input mic. There are a bunch of options here, and you can see the notes.
And then we've got the speaker that you maybe want it to be picking up or listening to. And then you can export results to a text file, etc. You can import a file that you want to transcribe or translate, whether audio or video. And then we've got record PC sound and record from mic.
So let's say we want to import a file. Remember, if you want to import files, ensure that this is set first so that it doesn't pick a model that you don't want or a mode that you don't want. So let's go for import file. Let's say we go with something like a video, and we use this video and then click on open. Once we do that, it automatically starts to transcribe our file. I wish it was possible to just bring in the file, then make your settings, and then click on transcribe or even transcribe and translate. That would be much easier and a little bit clearer.
So let's give it a couple of seconds. It's almost done. It's really, really fast. And there we go, it's done. If you just move this aside, I want us to see what's in the background. What's happening in this particular window is that it shows everything it's working through. So it took about 32 seconds to transcribe that four-minute file right there. I believe it's a four-minute file, if I'm not mistaken. But it's really doing a quick job on this. It's really, really awesome.
And then we can see what the transcript or the exports look like. There are two options: we have the SRT file and the text file. And the other thing that I don't like about this is that I wish it was possible to set the configuration for how many characters per line we want. That is, for the subtitles: 37 characters per line, 42, whatever, just a section to input a number. And then maybe add word-level timings right inside here. What else can we think of?
Yeah, and also maybe a line break, maybe breaking to a new line at punctuation or whatever. Also setting the number of lines we'd like per subtitle block. So if I just double click on the subtitle, what you'll notice is that it's going from zero seconds to five, five to nine, nine to 16. So you can see there's a very big difference here: seven seconds. If it was possible to predetermine that based on the characters per line, that would probably help. 16 to 24, that's a lot. So I hope you get the gist of this.
So that's that. And then we've got our transcript right inside there. It's really awesome, and I believe it picks up everything as we want it to. But simply put, this is a new application powered by OpenAI's Whisper. It's a good application, and it could maybe use a few changes here and there. An iteration of this, maybe making somebody set these settings as the last step and then click a button that says transcribe, or transcribe and translate, or translate, would probably be welcome. Because initially, when I tried this application, I was shocked that once you import a file, it starts transcribing, or even transcribing and translating, right away. Also, somebody who's not into a lot of code, or somebody who's skeptical, may wonder what's going on in the background, so it would be best if this background window were not seen scrolling around.
So everything looks good. Simply put, that's a quick introduction to Speech Translate, a speech transcription and translation application that is powered by OpenAI's Whisper. Thanks for watching this video. I'll see you next time.
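The characters-per-line and lines-per-block settings wished for in the video can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of Speech Translate: the `Cue` class and the `format_timestamp`, `split_cue`, and `to_srt` helpers are hypothetical names, and sharing a cue's time out in proportion to character count is a simple stand-in for real word-level timings.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle block (hypothetical helper type)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def split_cue(cue: Cue, max_chars: int = 42, max_lines: int = 2) -> list[Cue]:
    """Wrap a cue's text to max_chars per line and split it into blocks
    of at most max_lines lines each.

    Assumption: time is divided in proportion to character count,
    because no word-level timings are available here.
    """
    lines = textwrap.wrap(cue.text, width=max_chars)
    if not lines:
        return []
    blocks = [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
    total_chars = sum(len(line) for line in lines)
    duration = cue.end - cue.start

    result = []
    cursor = cue.start
    for block in blocks:
        share = sum(len(line) for line in block) / total_chars
        block_end = cursor + duration * share
        result.append(Cue(cursor, block_end, "\n".join(block)))
        cursor = block_end
    result[-1].end = cue.end  # guard against floating-point drift
    return result


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    parts = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        parts.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(parts)
```

With 42 characters per line and two lines per block, a long cue like the 16 to 24 second one in the demo, if its text runs past 84 characters, would come out as several shorter blocks that together still cover the same eight seconds.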