Streamlit & Whisper: Build Speech Recognition Apps
Learn to set up Streamlit and Whisper for automatic speech recognition, including installing dependencies and creating user-friendly app features.
Building an Audio Transcription App with OpenAI Whisper and Streamlit
Added on 01/29/2025

Speaker 1: What's up guys? Welcome back to the channel. In this video we're going to learn how to set up a Streamlit and Whisper app to do automatic speech recognition.

The first thing we're going to do is import Streamlit and then also import Whisper. Of course, both have to be installed, and the setup for Whisper requires a few steps, which you can do separately. I've outlined the setup steps for different machines here: you have to have PyTorch installed, then Whisper, and then Streamlit. Once you have all that done, we import streamlit as st, and then we import whisper. Now we can start working on our app.

The first thing I'm going to do is set up a title for our page; let's call it the Whisper App. I already have Streamlit running here in the background, so this is what it looks like right now. Okay, perfect. Now let's upload an audio file with Streamlit; I want to be able to load WAV files, MP3 files, and M4A files.

Next, we have to feed that audio file to the Whisper model, so first things first, we have to load the model. The best thing to do here is to set up a function called load_whisper_model, and in it we're going to load the base model. If you don't know what that is: Whisper comes in a few different model sizes, like tiny, base, and so on, and I'm using the base model. Okay, perfect. Now that we have a function called load_whisper_model, I'm going to cache this function with st.cache, and I'm going to associate the loading of the model with a Streamlit button in the sidebar: if st.sidebar.button("Load Whisper Model"), then model = load_whisper_model(). Perfect, and this is what our app looks like right now. Very nice.

Now we can test this button. I come here, I click Load Whisper Model, and as you can see the model is running here and, boom, the model is loaded. Perfect, so that part is working. Now what I want to add is the prediction part. We're going to set up another button: generate... no, Transcribe Audio. I'm just here correcting GitHub Copilot's completions: audio_file.read()? No, that's not how it works; GitHub Copilot hasn't worked with Whisper yet. So now we're going to set up a variable called transcription, equal to model.transcribe(audio_file). One problem here is that if no audio file is loaded, I might run into trouble, so this is how I'm doing it: if the audio file is not None, transcription = model.transcribe(audio_file). First it's going to say "Transcribing audio", and then after, st.sidebar.success("Transcription complete"). Okay, let's see what this looks like and whether it works.
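
To make the steps above concrete, here is a minimal sketch of the app as narrated so far. It assumes the openai-whisper package and Streamlit's st.cache decorator as used in the video (newer Streamlit versions replace st.cache with st.cache_resource); labels and messages are approximations, and the sketch deliberately keeps the two issues the video debugs next, flagged in the comments.

    import streamlit as st
    import whisper

    st.title("Whisper App")

    # accept the three formats mentioned in the video
    audio_file = st.file_uploader("Upload audio file", type=["wav", "mp3", "m4a"])

    # allow_output_mutation is an assumption: it was the usual way to cache
    # unhashable objects like torch models with st.cache
    @st.cache(allow_output_mutation=True)
    def load_whisper_model():
        # "base" is one of several model sizes (tiny, base, small, medium, large)
        return whisper.load_model("base")

    if st.sidebar.button("Load Whisper Model"):
        model = load_whisper_model()
        st.sidebar.success("Whisper model loaded")

    if st.sidebar.button("Transcribe Audio"):
        if audio_file is not None:
            st.sidebar.success("Transcribing audio")
            # two problems surface here later in the video: `model` is a local
            # variable from a previous rerun (NameError), and transcribe() does
            # not accept Streamlit's UploadedFile object
            transcription = model.transcribe(audio_file)
            st.sidebar.success("Transcription complete")
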
I would also like to do st.text(transcription["text"]). Perfect. Okay, so first I'm going to load the Whisper model... perfect, the Whisper model is loaded. Now let's transcribe the audio. The app seems to be working, but of course I need to upload an audio file first. I have an audio file right here... perfect, my audio file is loaded. Let's now transcribe the audio.

So I'm getting an error: a StreamlitAPIException saying the message is not valid. Okay, cool. I tried to pass the message in a different way, but actually it was just success. So I try again, and now it says "model is not defined". Okay, so I load my model and then run the transcription... and it's still saying model is not defined.

Speaker 2: Interesting.

Speaker 1: No... I did Load Whisper Model and then Transcribe Audio.

Speaker 1: Okay, so we're having an issue with this part: apparently it's saying the model is not defined when it actually is defined, so I need to figure out why that is. I return the loaded model and then I use it right here...
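
The likely cause: Streamlit reruns the whole script on every interaction, so a variable assigned inside one button's if-block no longer exists on the rerun triggered by the next button, even though the function returns the model correctly. One way to keep the two-button layout working, sketched here as an assumption rather than what the video settles on, is to park the model in st.session_state, continuing the sketch above:

    if st.sidebar.button("Load Whisper Model"):
        # session_state survives reruns, unlike a local variable
        st.session_state["model"] = load_whisper_model()
        st.sidebar.success("Whisper model loaded")

    if st.sidebar.button("Transcribe Audio"):
        if "model" not in st.session_state:
            st.sidebar.error("Load the Whisper model first")
        elif audio_file is not None:
            # what exactly to pass to transcribe() is its own problem,
            # which the video hits next
            transcription = st.session_state["model"].transcribe(audio_file)
            st.sidebar.success("Transcription complete")
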

Speaker 1: All right, so this is the change I'm going to make; let me set this up instead and see if it works like this. Transcribe Audio... okay, so now it's getting an error saying it expected an np.ndarray but got an UploadedFile. That makes sense: the uploaded file is not a path. So let's figure out how to get the path.
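
For context, model.transcribe() accepts a file path, a NumPy array, or a torch Tensor, but not the UploadedFile object that st.file_uploader returns; hence the "expected np.ndarray" error. Independent of the fix the video lands on below, one general workaround is to write the uploaded bytes to a temporary file and hand Whisper that path (a sketch, assuming the model is already loaded):

    import os
    import tempfile

    if audio_file is not None:
        # persist the upload to disk so ffmpeg, which Whisper calls
        # internally, can read it; keep the original extension
        suffix = "." + audio_file.name.rsplit(".", 1)[-1]
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(audio_file.read())
            tmp_path = tmp.name
        transcription = model.transcribe(tmp_path)
        os.remove(tmp_path)  # clean up the temporary copy
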

Speaker 1: Perfect, okay. The thing I was missing here: I need to use audio_file.name to get the name of the file, which has to be inside the same folder as the script I'm writing, because the file uploader in Streamlit doesn't return a file path. But that's okay. So I'm getting the path with audio_file.name and then running the transcription. Now when we test this, as you can see, the model is loaded, and when I run Transcribe Audio it's transcribing the audio; when it finishes it shows "Transcription complete" here, and... perfect: "I don't like taking lecture notes, but I love it too much." Yeah, that's the audio that I have.

All right, that's a perfect first step. One cool thing would be to improve how the text output is shown. Let's see if changing it to st.markdown looks better... let's test this out. All right, perfect: as we can see, the output looks pretty good on the screen.

The last thing to add to this app is a button that can actually play the original audio file, so I can compare the transcription to the original. To do that, I'm going to say: if st.sidebar.button("Play Original Audio File"), then st.audio(audio_file), with some text saying to play the original audio file; I'd prefer that in the sidebar. Perfect, this is what it looks like right now: Play Original Audio File. I can run Transcribe Audio and it will be transcribing... and then, beautiful. Now I can play the original, and you guys can't hear it, but my original audio is playing, which is great, and this is the model's transcription.

All right, if you liked this video, don't forget to like and subscribe, and see you next time. Cheers!
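
Putting the pieces together, here is a sketch of the finished app as narrated. The audio_file.name trick only resolves correctly when the uploaded file also sits in the directory the script runs from, as noted above; the session-state handling and exact labels are assumptions layered on the video's description, not its verbatim code.

    import streamlit as st
    import whisper

    st.title("Whisper App")

    audio_file = st.file_uploader("Upload audio file", type=["wav", "mp3", "m4a"])

    @st.cache(allow_output_mutation=True)
    def load_whisper_model():
        return whisper.load_model("base")

    if st.sidebar.button("Load Whisper Model"):
        st.session_state["model"] = load_whisper_model()
        st.sidebar.success("Whisper model loaded")

    if st.sidebar.button("Transcribe Audio"):
        if "model" not in st.session_state:
            st.sidebar.error("Load the Whisper model first")
        elif audio_file is None:
            st.sidebar.error("Please upload an audio file")
        else:
            st.sidebar.success("Transcribing audio")
            # .name is only the filename, so this works because the file
            # also lives in the script's working directory
            transcription = st.session_state["model"].transcribe(audio_file.name)
            st.sidebar.success("Transcription complete")
            st.markdown(transcription["text"])

    # play the uploaded audio for comparison with the transcription
    if st.sidebar.button("Play Original Audio File") and audio_file is not None:
        st.sidebar.audio(audio_file)

One caveat with this layout: pressing the play button triggers another rerun, which clears the transcription from the page; persisting the transcription in st.session_state as well would keep both on screen.
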
