Speaker 1: The open-source tool I'm about to share with you instantly saved me $99 a year in transcription and translation services. Welcome back to the channel where we discuss the creative uses of AI. Today I'm going to share with you the Whisper Web UI. What is Whisper, Bob? What, you got a secret from me? What is it with you and secrets? Whisper is a technology that converts speech to text, and if you've ever interacted via a microphone with some of the more recent AI-driven apps, you're probably interfacing with Whisper on some level even if you don't know it. But since the core of its functionality is turning words into text, it makes a great transcription tool.
Why would we need transcription tools in the creative AI space? Oh my, I use them all the time. First of all, for brain dumping ideas. If ever I'm pondering any kind of a creative idea, I always go to some transcription service, do an audio brain dump, and then send it in to ChatGPT or something else to make some sense of it. Maybe I'm just riffing on some character dialogue and I don't want to have to type it out. I want to have it come out more organically, but I also still want the option of editing it. Because then maybe I can take those transcriptions, send them back into a text-to-speech editor, and use any character I want. It's a long list of reasons to use a transcription service in this type of creative endeavor.
But add the ability to translate into other languages and now you open up a whole new world of possibilities: you take those transcripts, send them through a text-to-speech editor, and now you have your ideas out there in multiple languages. And you didn't have to pay for an extra transcription service. Further, if the timing of your text-to-speech is crucial, for example, if you're doing captions for a video, Whisper will also generate an SRT file that you can drag into any video editor. And I'm going to show you how to do all of that.
As I said, this is open source, which means there's no charge for the software itself. And if you have a computer that will run this software and the willingness to download and install it yourself, I will leave a link in the description on how you can do that. So there's nothing that you necessarily have to buy in order to use this solution. But if you don't have a computer that will run it, or you're just not ready to clutter up your hard drive with it, this is another great use case for our ongoing sponsor, Mimic PC, which allows you to run these high-end applications on remote computers so you don't have to invest in the technology or the hardware yourself. Let me show you how we get into it on Mimic PC, and then I'll show you the functionality which will apply no matter where you run it. You start by logging into your Mimic PC account. You'll just click on Add a New App. You'll click Whisper Web UI right here. Click Get Started. As always, I choose a large or a large pro machine so I can use the best of the models that they have there without having to worry about my VRAM. If I'm going to be using the tool for more than 30 minutes, I click the automatic extension and then Create and Start. Now having said that, if you've just got something you need to transcribe real quick, it only takes seconds. Once your machine starts up, you can close this right panel over here, which is default in Mimic PC, but we don't need that here. We're not going to be accessing any of the folders right now. So like a lot of these AI apps, if you were to open up some of these advanced parameters or VAD or diarization, you might want to run screaming because you don't know what any of it means. But one of the great things about running things on Mimic PC is right out of the box, the settings are there for you to start playing with it. So I'm not going to mess with any of these advanced parameters because I don't have to. 
So we have three tabs up here, File, Mic, and Text to Text Translation. So File, as you might guess, allows you to upload an audio file and get a transcription. So let's just do that. So I've got this 28 second clip from my friend Rob. Let me just let you listen to a second of it so you know what's going on here.
Speaker 2: He was 5'0". He was maybe 5'1", and he was about 100 pounds, and he was the 98-pound wrestling champ in the state of Wisconsin.
Speaker 1: So it goes on like that for about 28 seconds.
Speaker 1: I can just drag that file right here. It uploads. I'm going to go with the defaults. Use the large model. There are many models you can choose from for this. You don't have to worry about the ins and outs of it unless you really want to go learn that stuff. Language in this case, I can use auto detection, but I will go ahead and choose English just to make it easy. But you can see all the languages here that are supported. It is a lot. So I'm going to click on English.
The file format refers to what we're going to get on the other end of this. An SRT file is for captioning, which means the time code information is all going to be embedded. We have WebVTT, which I really can't tell you too much about, but with the text file, of course, you'll get a text file output of it. So why don't we take a look at what they both look like. So without changing anything here, I'm just going to click on Generate Subtitle File. Now the first time you do it, it has to initialize the model. So it does take a little bit longer, but once that's done, these conversions generally just take seconds. So there you go. Now we have the output, which was in the SRT format. It was done in 15 seconds, and we have all the time code. He was five foot nothing. He was maybe five one. He was about 100 pounds, and he was the 98-pound wrestling champ in the state of Wisconsin, which is exactly what he said. So we've got that.
So let's do it again. Let's do a file format this time of text file. Generate the subtitle file. Loading the audio. Transcribing. Boom. You saw how fast that was. He was five foot nothing. He was maybe five one. He was about 100 pounds. There you go. So now if I wanted to take this and have somebody else say all that stuff, I could go over to maybe Eleven Labs, go to their text-to-speech function, or for that matter, I could use my locally installed voice cloning and conversion software. But in this case, I'll use Eleven Labs. Choose a voice. Generate speech.
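To make the difference between the SRT and text file outputs in the walkthrough above concrete, here is a minimal Python sketch. It is not the Whisper Web UI's own code. It assumes a transcription step has already produced timed segments; the `(start, end, text)` tuple shape, the function names, and the demo timings are hypothetical and used only for illustration. What is standard is the SRT layout itself: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time code line, the caption text, and a blank line between cues.

```python
from typing import List, Tuple

# Hypothetical segment shape for this sketch: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT time code HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render timed segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(cues)


def segments_to_text(segments: List[Segment]) -> str:
    """Render the same segments as plain text, dropping all timing."""
    return " ".join(text.strip() for _, _, text in segments)


if __name__ == "__main__":
    # Made-up timings; only the wording comes from the clip in the video.
    demo = [
        (0.0, 2.5, "He was five foot nothing."),
        (2.5, 4.0, "He was maybe five one."),
    ]
    print(segments_to_srt(demo))
    print(segments_to_text(demo))
```

The SRT version keeps a time code for every cue, which is what lets a video editor place each caption on the timeline, while the text version is just the words, ready to paste into a text-to-speech tool.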
Gratitude is riches. That sounds too much like him. How about I use me? Let's check out this conversion. He was five foot nothing. He was maybe five one, and he was about 100 pounds, and he was the 98-pound wrestling champ in the state of Wisconsin. So the smallest weight class. He was state champion wrestling. So that's interesting because it's almost like it's using the same cadence of the original audio file, although this has no information about the audio file. That's just a kudos to Eleven Labs, I suppose. One of the powerful features of this tool is that we can take what we input and translate it into several languages for captioning of videos that already exist. So again, let's generate that SRT file one more time. It should only take like half a second. Boom. There it is. We can now download that file. So now we have an SRT file. We now click over to the text-to-text translation tab. We can take that SRT file, drag it right here. For the model, we're not going to use the DeepL API. You need an authorization key for that, and we don't need that. We just click on NLLB. Again, we don't need to change anything, but we do need to choose that the source language is English. Then we just decide which language we want to translate into. Let's say Greek. Again, I'm not going to change any other values. Just click on translate subtitle file. And there you go. I'd read it to you, but it's all Greek to me. So once again, we can download that SRT file and just drag it into our favorite editing software, drop it on a timeline, and look at that. The captioning's all done right down there at exactly the time it's supposed to happen. The third way to get your information up there for transcription is just to record it directly through the web interface via a microphone. So first you choose your microphone source over here, and then you just click record. 
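The reason the translated captions in the walkthrough above still land at the right moment is that only the caption text changes while the cue numbers and time codes carry over. The Python sketch below illustrates that idea under stated assumptions: it is not the Whisper Web UI's implementation, and `fake_translate` is a hypothetical stand-in (it just upper-cases text) for a real translation model such as NLLB.

```python
import re
from typing import Callable

# An SRT timing line looks like "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the caption text; keep cue numbers and time codes as-is."""
    out_blocks = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Expected cue shape: number line, timing line, one or more text lines.
        if len(lines) >= 3 and TIMING.match(lines[1]):
            caption = " ".join(line.strip() for line in lines[2:])
            out_blocks.append("\n".join([lines[0], lines[1], translate(caption)]))
        else:
            out_blocks.append(block)  # leave anything unexpected untouched
    return "\n\n".join(out_blocks) + "\n"


def fake_translate(text: str) -> str:
    """Hypothetical placeholder for a real translation model call."""
    return text.upper()


if __name__ == "__main__":
    # Made-up timings; only the wording comes from the clip in the video.
    source = (
        "1\n00:00:00,000 --> 00:00:02,500\nHe was five foot nothing.\n\n"
        "2\n00:00:02,500 --> 00:00:04,000\nHe was maybe five one.\n"
    )
    print(translate_srt(source, fake_translate))
```

Passing the translator in as a function keeps the timing logic separate from whichever model does the translating, so swapping the placeholder for a real model would not touch the time codes.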
Again, we're not going to mess with any of these settings unless your native language, the one you're speaking into this, is not English and you want the output in English. In that case, you would click on Translate to English. But for right now, I'm just going to click record. Hi, I'm clicking record to show how simple it is to do something like that. You know, it just says record. It's a button. I got a mouse. I know how to move it. I hovered over it and I pressed the button. That simple. Click stop, and now I can preview this. Hi, I'm clicking record to show how simple it is. All right, that worked. I can also trim it so I can take out the part at the beginning where there is silence and bring this all the way to the end. Click on trim. Now there's no extra space. And then click on generate subtitle file. There you go. Done in 13 seconds. Hi, I'm clicking record to show how simple it is to do something like that. You know, it just says record. That's the SRT file. By the way, even though I said you can close this right panel over here, just know that there's an output folder for just about everything here in Mimic PC where you can find what you've done and download it that way.
So it's essentially a free open source transcription service that takes seconds to run. And depending on how much you use it, something like this could save you a lot of money in subscriptions for transcription services over the course of time. I know it did for me immediately. If these are the types of tools that you like to learn about to help you in your AI creative endeavors, well, why not subscribe to this channel? Because this is the kind of stuff we talk about all the time. If you subscribe now, I will not look for you. I will not
Speaker 3: pursue you. But if you do not, I will look for you. I will find you. And I will.