Step-by-Step Guide: Convert Speech to Text with Whisper AI
Learn how to use Whisper AI in Google Colaboratory to turn spoken language into text with ease, supporting 97 languages for free.
File
Sprache zu Text KOSTENLOS umwandeln - Whisper AI (Convert Speech to Text for FREE - Whisper AI)
Added on 01/29/2025

Speaker 1: Hey, and welcome to a new AI tutorial from me. Today I'll show you step by step how to turn spoken language into text. To convert speech into text I use Whisper AI. This AI is extremely powerful, free of charge and really easy to use. And the best thing about it is that it works with 97 spoken languages. Let's find out how it works.

To be able to use Whisper AI, we use Google Colaboratory. This allows us to execute code directly in the web browser. To use Google Colaboratory, we jump to Google Drive, so drive.google.com. For Google Drive you need a Google account. So if you don't have one, you have to set up a free Google account beforehand.

Once in Google Drive, we click on New, then on More, and here at the bottom we find more apps. At the top we type in Colaboratory. Here we find the program and click on Install. Then we get a message that Google Colaboratory has been connected to Google Drive. We click on Finish and can close the window. Then we click on New again, and under More we now find Google Colaboratory. If we select it, we end up directly in the program.

In my opinion, this looks pretty scary. But believe me, it's super easy. First we give our file a name, here at the top in the first field. We call the whole thing "audio to text" so that we can find our file again later. Then we click on Runtime in the menu bar. Here we find the option Change runtime type. A window opens where we can select the hardware accelerator. Here we select GPU, that is, a graphics card, because this tool runs extremely well on a graphics card. And of course, don't forget to save.

The next step is to install Whisper AI. To do this, we go to the field where we can type in code, and here I enter this code. You can of course find this code in the video description; simply copy and paste it. The first line says: install Whisper from the website named GitHub. This is the place where the entire code is stored.
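[Editor's note: the transcript does not reproduce the code from the video description. The following is only a minimal sketch of what such a setup cell might do, written in Python. It assumes that Whisper is installed with pip from the openai/whisper repository on GitHub and that FFmpeg, which the speaker introduces next, comes from the apt package manager of the Colab machine. The function names are hypothetical and not taken from the video.]

```python
import shutil
import subprocess
import sys

# Assumption: the GitHub repository meant in the video is openai/whisper.
WHISPER_REPO = "git+https://github.com/openai/whisper.git"


def pip_install_command(package: str) -> list[str]:
    """Build a pip install command for the interpreter running the notebook."""
    return [sys.executable, "-m", "pip", "install", package]


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def setup() -> None:
    """Install Whisper and make sure FFmpeg is present on the remote machine."""
    subprocess.run(pip_install_command(WHISPER_REPO), check=True)
    if not ffmpeg_available():
        # Assumption: the machine is Debian/Ubuntu based, so apt-get exists.
        subprocess.run(["apt-get", "install", "-y", "ffmpeg"], check=True)
```

Calling `setup()` in a notebook cell would then run both steps on the remote machine, not on your own computer.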
As soon as we have Whisper, we install something called FFmpeg. This allows us to work with audio and video formats. And don't worry: although I constantly say "install", we don't install anything on our own computer here. Everything is installed in Google Colaboratory. Then we click on the run icon. This gives the command to install Whisper and FFmpeg. When both are installed, we get a green checkmark here. The installation took me 26 seconds.

On the left side we find a folder icon. Here we can now drag in an audio or video file for the conversion from speech to text. In my case, I drag in an MP3 file, which I already mentioned in another video. Here we see that our file has been uploaded successfully.

Now I'm ready to extract the text from this audio file. For this we need a new code cell. Click on + Code here and enter the following code to apply Whisper to the file. Then you have to type in the name of your file. In my case it is called audiotext.mp3. And then you can choose the model, so to speak the quality level. I want to use the medium model here. There is also the large model, which takes the longest to process the files. Below that there are the models small, base and tiny. These models work the fastest, but with lower accuracy. That's why I use the medium model. When we have entered our code and checked our file name again, we click on the run icon.

And ta-da, the language was detected automatically, and here we see the text spoken in the audio file. More files have been stored in our folder. If they are not visible for you, just click on refresh. Here we find an SRT, a TXT and a VTT file. The TXT file contains only the text of the audio file. SRT and VTT also contain the timestamps, so you know when something was said. To download one of these three files, click on the three dots and then on Download. I will download the TXT and SRT files here. Here we see the TXT file.
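[Editor's note: the transcription code itself is only shown on screen in the video. As a hedged sketch, assuming the cell calls the `whisper` command-line tool with its `--model` option, the step could be wrapped in Python as follows. The file name audiotext.mp3 and the model names come from the video; the function names are hypothetical.]

```python
import subprocess

# Model names mentioned in the video, from smallest to largest.
MODELS = ("tiny", "base", "small", "medium", "large")


def whisper_command(audio_path: str, model: str = "medium") -> list[str]:
    """Build the argument list for the whisper command-line tool."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    return ["whisper", audio_path, "--model", model]


def transcribe(audio_path: str, model: str = "medium") -> None:
    """Run Whisper on one file; requires Whisper and FFmpeg to be installed."""
    subprocess.run(whisper_command(audio_path, model), check=True)
```

With Whisper installed, `transcribe("audiotext.mp3")` would run the medium model on the uploaded file. Which output files are written by default may depend on the installed Whisper version.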
If we skim the text, it looks like Whisper AI has done a perfect job, even with punctuation at the end of sentences and correct upper- and lowercase letters. The same goes for the SRT file. Here we see the same text, this time with timestamps as well. I have to make a few small corrections, but all in all it is a really strong program and a real time saver. I hope you enjoyed the video, and see you next time.
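[Editor's note: to illustrate what the timestamps in an SRT file look like, here is a small Python sketch that formats one subtitle entry by hand. It is not Whisper's own code, and the sample segment at the end, including its timings, is made up for illustration. VTT files use a similar layout, with a period instead of a comma before the milliseconds.]

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_block(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT entry: index, time range, then the spoken text."""
    time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
    return f"{index}\n{time_range}\n{text.strip()}\n"


# Hypothetical segment, invented for illustration (not the video's output).
print(srt_block(1, 0.0, 3.5, "Hey, and welcome to a new AI tutorial."))
```

Each entry therefore tells you when a sentence starts and ends, which is the extra information the TXT file leaves out.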
