Run OpenAI Whisper in Lightning AI: Quick Setup Guide (Full Transcript)

Learn to run OpenAI Whisper with Lightning AI's free GPU hours. Get transcription results quickly with this easy-to-follow guide.


Speaker 1: Hey guys, in this video I'm going to show you how you can run the OpenAI Whisper speech recognition model inside Lightning AI. Lightning AI is a bit like Google Colab, but better in my opinion. You can get a free account here and you get 22 free GPU hours monthly. There's a free tier where you get a single GPU, a T4 or an L4. I guess you use more credits if you use the higher-end ones, but the T4 is quite okay for this Whisper use case. So just choose the free tier, and then if you log in, for example, with a Google account, I think it will take a few days before they approve your account. If you use an educational account from an organization, it might get approved quicker. Once your account is activated, you can log in, and in here under studio templates you can find different ready-made templates to get you started quickly. For example, if you search for Flux, you will find different ways to run the Flux image generator, for example inside ComfyUI. But in this case we want to run Whisper, so let's search for Whisper and start the Faster Whisper one. Here we have some guidance on running it, but just click on open in studio. Once you click on open in studio, it says duplicating for a moment and then starting studio app. Let's just pause the video until it has loaded to 100%. So once it's at 100%, you will see readme.md here on the right-hand side, and here you have the guidance. So using the Streamlit plugin on the right-hand sidebar, make sure you have an app that runs, and then there is a command which needs to be run. So here on the right-hand side we have the Streamlit data apps, this kind of red crown here. Let's press on that and then let's press on new app. I don't know why it says port 8502, so let's just set that back to 8510, which is the same as here. Let's press on run

Speaker 2: and it's starting up the web UI.

Speaker 1: Okay, we were able to get it started. So now it says here in the UI that we can select the model. Let's just leave it at the default, large-v3.

Speaker 2: And we can find the logs here about settings. And here we could also

Speaker 1: open it in full screen, or there's a public link so we could just access it via a browser as well. Let's drag an audio file here. So I just found an audio file, a Barack Obama speech

Speaker 2: from some time ago. So I'm dragging it in here. And now it has downloaded or uploaded that file in here.

Speaker 1: And then we can select files to transcribe. We just have this one and then we have a button for transcribing. But here on the right hand side, we can still select the output,

Speaker 3: change that to text, use GPU. So let's press transcribe selected files. It started the transcription process. And here at the top, we can see it's using some CPU and now it's using GPU. So it started to work on it. I think here on the right

Speaker 1: hand side, we can see what kind of... We have 24 gigabytes of VRAM currently. Now it's actually already done. So that was quite quick. I think it was like a 16-minute speech.

Speaker 3: And here from the download button, we can get the file.

Speaker 2: And here we have the results.

Speaker 3: It looks pretty good to me. I'm sure it made some mistakes, but this is a really fast way of transcribing audio files if you have a need for it.
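
Editor's note: the video drives everything through the Streamlit UI, but the same steps (pick a model, choose GPU or CPU, transcribe a file, save text output) can also be scripted. The following is a minimal sketch, assuming the `faster-whisper` Python package is installed in the studio; the helper names (`Segment`, `format_timestamp`, `segments_to_text`, `transcribe_file`), the output layout, and the choice of `float16` on GPU and `int8` on CPU are illustrative assumptions, not what the template in the video actually runs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for a transcription segment (times in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments: Iterable) -> str:
    """Join segments into a plain-text transcript, one timestamped line each.

    Works with any objects exposing .start, .end and .text attributes.
    """
    lines = [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]
    return "\n".join(lines)


def transcribe_file(audio_path: str, model_size: str = "large-v3", use_gpu: bool = True) -> str:
    """Transcribe one audio file and return a timestamped plain-text transcript."""
    # Imported lazily so the formatting helpers above work without the package.
    from faster_whisper import WhisperModel

    device = "cuda" if use_gpu else "cpu"
    # Assumption: half precision on GPU, 8-bit on CPU, to keep memory use modest.
    compute_type = "float16" if use_gpu else "int8"
    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    # transcribe() returns a lazy generator of segments plus an info object;
    # the actual decoding happens while the generator is consumed.
    segments, _info = model.transcribe(audio_path)
    return segments_to_text(segments)
```

A call such as `transcribe_file("speech.mp3")` (the file name is a placeholder) would return the transcript as a string that can then be written to a `.txt` file. The first run downloads the model weights, so it is slower than later runs.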
