Effortless Voice-to-Text Conversion with Whisper AI
Discover how to convert voice to text using Whisper AI, an OpenAI tool that's free, open source, and supports 96 languages even in noisy environments.
File
A Handy and Free Speech-to-Text Tool: Whisper AI (allenlow)
Added on 01/29/2025

Speaker 1: Hello everyone, I am AL. Today I will share how to use AI to convert voice into text. What is surprising is that it is even more accurate than humans. It works with English and 96 other languages, and it works normally even with a lot of background noise. Best of all, it is completely free and open source.

Today I will demonstrate how to use it to convert voice into text. We will use an AI tool called Whisper. Whisper was developed by a company called OpenAI. You have probably heard of this company. That's right, they are the well-known company behind ChatGPT.

You can install Whisper on your own computer, but that requires a fairly powerful machine, so I suggest you use it on Google Colab. This tool is called Google Lab in Chinese. If you want to use Google Colab, you must open and log in to Google Drive. First, you need a Google account. If you don't have one, you can register right away; it's completely free.

In Google Drive, click New on the left. Choose the last item, More, then choose Connect more apps. In the app search box, enter Google Colaboratory. This is it; click on it. Then click Install, click Continue, and choose your account. OK, now it's done. Click Confirm, then click Finish. Now you can close the Marketplace.

Now go back to your Google Drive. Click New on the left and choose the last item, More. Now you will see the option Google Colaboratory; click it. OK, this opens Google Colab. Don't think this setup is very complicated. It is actually very simple, and the results are very good.

OK, first of all, double-click here on the top left and give it a name. For example, I put "module to text", so that you can easily find it the next time you want to use it. The next step is to click Runtime in the menu bar above and select Change runtime type. Here we need to set the hardware accelerator: select T4 GPU, then click Save.

Next, we need to input some code. OK, this is it. I will put it
in the description of this video. You can copy and paste it from there. First, we need to install Whisper from GitHub, which hosts all the code related to Whisper. Next, we need to install a tool called FFmpeg, which lets us work with audio and video files. Don't worry, these programs will not be installed on your computer, so it won't be affected. When that is all done, click the Run button on the left. After you click Run, it will start installing Whisper and FFmpeg. The installation takes about 20 seconds. OK, the installation is done.

Next, click the folder icon on the far left. Now you can drag in the audio or video file you want to convert to text to upload it. OK, I have a WAV audio file here. Drag it over here. You will see a message pop up saying that the uploaded file will be deleted when the runtime is terminated. This is not very important; click OK. Now you will see the audio file uploading. When it finishes, it will show that the upload succeeded, and you will see your audio file listed here.

Now we are going to convert this audio file to text. Click + Code above. Then we need to enter a command. I will also put this command in the description of this video. This part is the file name of the audio file you uploaded. For example, mine is recording.wav. You have to match it here, so just make sure that the name here is the same as the name of the file you uploaded.

You can also choose which model you want to use. I am using the medium model now. There are 5 models to choose from. At the low end is the tiny model. It is the fastest and takes up the least space, but its accuracy is very low. At the high end is the large model. It takes up a large amount of space, measured in GB, and it also takes longer to convert, but its accuracy is the highest. So for this part, I think the best one to use is the medium model. After everything is done, click the Run button next to it. OK, after the conversion is
completed, you will see this screen. At the bottom here, you can see the text that has been successfully converted from the audio. On the left-hand side, if you don't see the new files, click here to refresh. You should then see the .txt, .srt and .vtt files. The .txt file contains all the text generated from the audio. The .srt and .vtt files are subtitle formats, so they contain time points. If you want to download these files, just click the three dots next to one, then click Download. I will download the .txt and .srt files.

Now let's take a look at the TXT file. The accuracy of Whisper's conversion is very high. It even added these timestamps for you. Next is the SRT format. You will see that it contains the time points, that is, the time at which you say each sentence.

OK, if you want to convert another audio file, just drag it here to upload. Then go to the code here. You just need to change the file name to the name of the audio file you uploaded. Then click Run, and you get another converted file. We only need these simple commands to run an audio-to-text conversion.

You can also look at the other command parameters. Click + Code above, then enter the whisper command, which I will also put in the description box, and click Run. This shows all the parameters that can be modified. For example, you can set where the generated files are saved. You can set it to output the transcribed text or the translated text. With the language parameter, you can choose the language. There are many other parameters you can use.
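
The commands themselves are in the video description and do not appear in this transcript, so the following is only a sketch of what they probably look like. The Colab cells in the comments are assumptions based on Whisper's public README. The helpers `build_whisper_command` and `as_colab_cell` are hypothetical names that just assemble a `whisper` command line from the parameters the speaker mentions (model, language, task, output folder).

```python
import shlex

# Assumed Colab cells (the real ones are in the video description):
#   !pip install git+https://github.com/openai/whisper.git
#   !sudo apt update && sudo apt install ffmpeg
#   !whisper "recording.wav" --model medium


def build_whisper_command(audio_file, model="medium", language=None,
                          task=None, output_dir=None):
    """Assemble the argument list for the whisper command-line tool.

    Hypothetical helper for illustration; it does not run Whisper.
    """
    command = ["whisper", audio_file, "--model", model]
    if language is not None:
        # Language of the speech in the audio, e.g. "zh" or "en".
        command += ["--language", language]
    if task is not None:
        # "transcribe" keeps the spoken language, "translate" outputs English.
        if task not in ("transcribe", "translate"):
            raise ValueError(f"unknown task: {task!r}")
        command += ["--task", task]
    if output_dir is not None:
        # Folder where the generated .txt, .srt and .vtt files are saved.
        command += ["--output_dir", output_dir]
    return command


def as_colab_cell(command):
    """Render the argument list as a single Colab cell line."""
    return "!" + shlex.join(command)


print(as_colab_cell(build_whisper_command("recording.wav")))
print(as_colab_cell(build_whisper_command(
    "recording.wav", model="tiny", language="zh", output_dir="out")))
```

Pasting one of the printed lines into a new code cell would run the conversion, provided the file name matches the uploaded file exactly.
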
If you don't know what these parameters mean, you can scroll down. The function of each parameter is explained there.

When you close Google Colab, all these files will be deleted, so I suggest you save all the generated files before you close it.

This is definitely a must-have tool for editing videos. I usually use it to make subtitles for my videos. I find it more accurate than automatically recognized video subtitles, and it can also recognize more languages.

Finally, if you want to see more related videos, remember to subscribe to this channel. See you in the next video.
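
For readers who want to check or hand-edit the subtitle files, here is a minimal sketch of how the time points in those formats are written. SRT timestamps use a comma before the milliseconds (`HH:MM:SS,mmm`), while WebVTT uses a period. The function names and the sample cue are made up for illustration; this is not Whisper's own code or the actual output from the video.

```python
def format_timestamp(seconds, decimal_marker=","):
    """Format seconds as HH:MM:SS<marker>mmm (',' for SRT, '.' for WebVTT)."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def srt_cue(index, start, end, text):
    """Build one numbered SRT cue: index line, time range line, text line."""
    return (f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n")


# Hypothetical cue, not taken from the video's actual output.
print(srt_cue(1, 0.0, 3.5, "Hello everyone, I am AL."))
```

Each cue in an .srt file follows this three-part shape, with a blank line between cues.
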
