Learn Google Colab with Whisper AI Transcription Project
Explore building a no-code transcription project using Whisper AI in Google Colab. Enhance your tech skills with our easy-to-follow guide. Stay tuned!
File
Audio to Text project on Google colab WHISPER AI
Added on 01/29/2025

Speaker 1: Hello everyone. This is Prashant, your friend and host at iNeuron for your learning journey. Today we are going to work on Google Colab for a small project where we will understand how to use Google Colab, and along with that we'll be making something very useful. So let's get started.

Right now we are in our Google Drive. To get started, we will go to this option where we search for Colaboratory so that we can install Google Colab. So here we go. Yeah, you have to install this. I've already installed it, the first option, so you can skip this if you have already installed it. Otherwise, you can also search for the same in Google Search and you'll get it. Once you have installed this, just go to the New option, click More, and then Google Colaboratory.

Now what we're going to perform over here is to transcribe an audio file into text. This project you can actually use in your resume as well once you build it, because this is going to be a kind of no-code approach where, with just two to three lines of code, we will be pushing out the entire project. So let's get there.

So this is my Google Colab. Firstly, I'll rename this to transcribe; that is going to be my file. And the extension you can see over here is ipynb, that is, a Python notebook. Once you get over here, make sure that you go to Runtime and check the runtime type. Over here, we'll be using the runtime type as Python 3, and we'll be using the default GPU, because GPUs are generally considered very versatile for these tasks.

Let's get started with this. Now over here, I've shared a code, this code. So I'm just pasting it over here. Technically, what we are going to do is use a model named Whisper AI. That is a model from OpenAI. The link for the same is github.com/openai/whisper.git.
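The code pasted in the video is not shown in the transcript, so the following is only a minimal sketch of what that first step could look like, assuming the package is installed with pip directly from the repository URL mentioned above. The helper names `build_install_command` and `gpu_tool_present` are hypothetical, and looking for `nvidia-smi` is just a rough heuristic for a GPU runtime.

```python
import shutil
import sys

# Repository URL as read out in the video.
WHISPER_REPO_URL = "https://github.com/openai/whisper.git"


def build_install_command(repo_url=WHISPER_REPO_URL):
    """Return the pip command that installs a package straight from a Git repository."""
    return [sys.executable, "-m", "pip", "install", f"git+{repo_url}"]


def gpu_tool_present():
    """Rough check (assumption): the nvidia-smi tool is on PATH when a GPU runtime is active."""
    return shutil.which("nvidia-smi") is not None


# Only print the command here; nothing is installed by running this block.
print(" ".join(build_install_command()))
print("nvidia-smi found:", gpu_tool_present())
```

In a Colab cell the same install is usually written as a single shell line, `!pip install git+https://github.com/openai/whisper.git`. From plain Python, `subprocess.run(build_install_command(), check=True)` would run the command built above.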
So we will be cloning the same repository in our Colab. We'll be running this; it usually takes around 15 to 20 seconds, depending upon the GPU power. So let's check how much time it takes. It is working, it is cloning. Yes. So now this is done; it took 21 seconds for the same to be completed.

Now we'll be writing another code for executing the step. But before that, I have downloaded a file, and I'll just upload that file. You can also just drag and drop it. I'll go here, where I'm clicking on file upload, and I'll select the audio, that is harvard.wav. So over here, I have a file in the .wav format. Instead of this, you can also use mp3 audio files. You can do that and refresh, and it will reflect over here. The file is being uploaded right now, so let's wait a second. Yes, that has been uploaded.

Now, can you see over here the plus Code option? This is to write another code. So I'm clicking on that, and the second step is to do this, where I'm initiating the process to use the Whisper model on this file, that is harvard.wav, and we'll be using the medium model. Why are we using the medium model? There are five types of models, like large, small, medium and so on. The large model is pretty accurate, but at the same time it is very intensive. If you use the small model, it gives the output very soon, but the accuracy is much lower. So to stand in between, we'll be using the medium model. So what we have done is we have written the code: whisper, the file name, model, and medium. We'll be running this.

Now the file we have uploaded, I downloaded that from the internet, and it was around five to 10 seconds of audio with five to six sentences. And yes, the process has been started. It took around five to six seconds for the same to be processed on a GPU.
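The second cell described here ("whisper, the file name, model, and medium") can be sketched as a small command builder. This is an illustration under assumptions, not the exact code from the video: `build_whisper_command` is a hypothetical helper, and `MODEL_SIZES` lists only the five basic sizes, while the Whisper project also publishes other variants.

```python
import shlex

# The five basic model sizes, smallest to largest (other variants exist too).
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def build_whisper_command(audio_path, model="medium", output_dir=None):
    """Build the argument list for the whisper command-line tool."""
    if model not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model!r}")
    command = ["whisper", audio_path, "--model", model]
    if output_dir is not None:
        # --output_dir tells the CLI where to write the transcript files.
        command += ["--output_dir", output_dir]
    return command


# Show the command as it would be typed; nothing is executed here.
command = build_whisper_command("harvard.wav")
print(shlex.join(command))
```

In Colab this corresponds to a cell such as `!whisper "harvard.wav" --model medium`. Swapping `"medium"` for `"small"` or `"large"` is how the speed versus accuracy trade-off described above would be explored.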
If you remember, we changed the runtime to a T4 GPU. So the 1.42 GB model has already been run, and now it might give the answers within seconds. So just wait a second. Come on, come on, come on. Let's see how much time it takes. Yes. So here is the output. It states: "The stale smell of old beer lingers. It takes heat to bring out the odor," something like that. So don't judge me on this. The file that I downloaded was from Google, of a conference in Harvard school. So this is what it does, and it's pretty good. The accuracy is pretty good. If you want to listen to the audio, you'll get the audio file in the description. If you want to download it, you can just go through that.

Along with that, we can see over here that there are timestamps too. So if you are willing to use this in your YouTube transcription, like for subtitles, this is going to be your go-to place. One more additional thing: if you refresh over here, we get these files with extensions like JSON, TSV, TXT and so on. So whenever you are working on a web development project, you can just use this format. Now I'll just open the response. Here you can see it is written in JavaScript the same way. If you click on .txt, it will show the entire text. So this can be used in any kind of project. Even if you are working in Python or Java or any other language, it is going to work for you.

So I hope this is very easy for you. My main thought behind building this project was to give you the confidence that yes, you can also do anything without the knowledge of coding. I hope you liked it, and also don't miss our classes that are coming up very soon in the no-code AI series. Stay tuned for that, and make sure to subscribe and like at the same time. And that's it for today. Thank you so much. Have a good day.

Transcribed by https://otter.ai
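To illustrate how the JSON output mentioned above could be reused in another project, here is a minimal sketch that turns the timed segments into readable lines. It assumes the JSON file contains a `segments` list whose entries carry `start`, `end` and `text` fields. The sample data is made up for illustration: the sentences come from the video, but the timestamps are invented, and the helper names are hypothetical.

```python
import json

# Illustrative stand-in for the contents of the generated JSON file.
# The timestamps below are invented, not real output from the video.
SAMPLE_JSON = """
{
  "text": " The stale smell of old beer lingers. It takes heat to bring out the odor.",
  "segments": [
    {"start": 0.0, "end": 3.5, "text": " The stale smell of old beer lingers."},
    {"start": 3.5, "end": 6.0, "text": " It takes heat to bring out the odor."}
  ]
}
"""


def format_timestamp(seconds):
    """Format a number of seconds as MM:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def segments_to_lines(result):
    """Turn each timed segment into one '[start --> end] text' line."""
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result["segments"]
    ]


result = json.loads(SAMPLE_JSON)
for line in segments_to_lines(result):
    print(line)
```

With a real run, the string would be replaced by reading the generated file, for example `json.load(open("harvard.json"))`, assuming the output file is named after the audio file.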
