Setting Up Streamlit and Whisper for Automatic Speech Recognition
Learn how to set up a Streamlit and Whisper app for automatic speech recognition, including importing libraries, loading models, and transcribing audio.

Speaker 1: What's up guys? Welcome back to the channel. In this video we're going to learn how to set up a Streamlit and Whisper app to do automatic speech recognition. The first thing we're going to do is import Streamlit and also import Whisper. Of course, both have to be installed, and the setup for Whisper requires a few steps, which you can do separately. I've outlined the setup steps for different machines: you have to have PyTorch installed, then Whisper, and then Streamlit. Once you have all that done, we import streamlit as st, and then we import whisper. Alright, now we can start working on our app.

The first thing I'm going to do is set up a title for our page. Let's call it the Whisper App. I already have Streamlit running in the background, so this is what it looks like right now. Okay, perfect. Now let's upload an audio file with Streamlit. I want to be able to load WAV, MP3, and M4A files. We have to feed that audio file to the Whisper model, so first things first, we have to load the Whisper model. I guess the best thing to do here is to set up a function called load_whisper_model. We're going to load the base model. If you don't know what that is, Whisper has a few different model sizes, like base, tiny, et cetera, and I'm using the base model.

Okay, perfect. Now that we have a function called load_whisper_model, I'm going to cache it with st.cache, and then I'm going to associate the loading of the model with a Streamlit button. So let's say st.sidebar.button("Load Whisper Model"), and if it's clicked, we say model equals load_whisper_model(). Perfect, and this is what our app looks like right now. Very nice. Now we can test this button: I'm going to come here and click on Load Whisper Model, and as you can see the model is running, and boom, the model is loaded. Perfect, so that part is working.

Now what I want to add is the prediction part. We're going to set up another button, Transcribe Audio. I'm just here correcting GitHub Copilot's completions; it suggests audio_file.read(), but no, that's not how it works. GitHub Copilot hasn't worked with Whisper yet. So what we're doing is setting up a variable called transcription equal to model.transcribe on the audio file. The one problem here is that if the model button hasn't been clicked, I might run into a problem, but then it will tell me. So this is how I'm doing it: if the audio file is not None, transcription equals model.transcribe(audio_file), and while it runs it says "Transcribing audio", and after it succeeds it calls st.sidebar.success. Then I'd like st.text(transcription["text"]) to show the output.

Okay, let's see what this looks like and whether it works. First I'm going to load the Whisper model. Perfect, the model is loaded; now let's transcribe the audio. Okay, the app seems to be working, but I need to upload an audio file first, of course.
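For reference, since the full file is never shown on screen, here is a minimal sketch of the app up through the model-loading step, which is the part confirmed working at this point. The install commands, button label, and uploader label are assumptions; the transcribe button added next runs into issues that get debugged below.

```python
# Assumed install steps (PyPI package names): pip install torch openai-whisper streamlit
# Whisper also needs ffmpeg available on the system to decode audio.
import streamlit as st
import whisper

st.title("Whisper App")

# Accept the three formats mentioned in the video.
audio_file = st.file_uploader("Upload audio", type=["wav", "mp3", "m4a"])

# The video uses st.cache; st.cache_resource is its modern replacement
# for caching unhashable objects like models across reruns.
@st.cache_resource
def load_whisper_model():
    # "base" is one of several Whisper model sizes (tiny, base, small, ...).
    return whisper.load_model("base")

if st.sidebar.button("Load Whisper Model"):
    model = load_whisper_model()
    st.sidebar.success("Whisper model loaded")
```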
So I have an audio file right here. Perfect, my audio file is loaded. Let's now transcribe the audio. And I'm getting an error, a StreamlitAPIException: the message method is not valid. Okay, I tried to use a "message" method, but actually it's just success. So now I try again, and it says model is not defined. Okay, so I have to load my model and then run the transcription... and it's still saying model is not defined.

Speaker 1: so okay so we're having an issue with this part apparently he's saying that model is not defined when it's actually is defined so I need to figure out why that is I return the loaded model and then I use it right here as extremely partners

Speaker 1: Alright, so here's the change I'm going to make: instead of that, I'll just check whether the model is loaded. Okay, let's see if it works like this. Let's transcribe the audio. So it's giving an error: expected ndarray, got UploadedFile. That makes sense; the uploaded file is not a path, so let's figure out how to get the path.
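For reference, Whisper's transcribe() accepts a file path, a NumPy array, or a Torch tensor, while Streamlit's file uploader returns an in-memory UploadedFile. One robust bridge between the two, sketched below as an alternative to the file-name approach used in a moment, is to write the upload to a temporary file; the uploader label is an assumption, and the model is loaded uncached here for brevity.

```python
import os
import tempfile

import streamlit as st
import whisper

model = whisper.load_model("base")  # uncached here for brevity
audio_file = st.file_uploader("Upload audio", type=["wav", "mp3", "m4a"])

if audio_file is not None:
    # transcribe() wants a path on disk (or an array), not an
    # UploadedFile, so persist the uploaded bytes to a temp file first.
    suffix = os.path.splitext(audio_file.name)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(audio_file.getvalue())
    result = model.transcribe(tmp.name)
    os.remove(tmp.name)
    st.text(result["text"])
```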

Speaker 1: Perfect, okay. The thing that I was missing here was that I need to use audio_file.name to get the name of the file, which then has to be inside the same folder as the script I'm writing, because the file uploader in Streamlit doesn't return a file path. But that's okay: I get the path with audio_file.name and then run the transcription. Now when we test this, as you can see, the model is loaded, and when I click Transcribe Audio, it's transcribing the audio. When it finishes, it shows "Transcription complete" here, and perfect: "I don't like taking lecture notes, but I love it." Yeah, that's the audio that I have.

Alright, that's a perfect first step. One cool thing now would be to improve how the text output is shown. Let's see if changing it to markdown makes it look better. Let's test this out. Alright, perfect. As we can see, the output looks pretty good on the screen.

The last thing I'd add to this app is a button that can actually play the original audio file, so I can compare the transcription with the original. To do that, I'm just going to come here and say: if st.sidebar.button("Play Original Audio File"), then st.audio(audio_file). I think I'd prefer that in the sidebar, so let's put it there. Perfect, okay, so this is what it looks like right now: Play Original Audio File, and I can play it. Perfect, this is the original audio. Then I can run Transcribe Audio, and it will be transcribing, transcribing, and then, beautiful. Now I can play this, and you guys can't hear it, but my original audio is playing, which is great, and this is the model's transcription. Alright, if you liked this video, don't forget to like and subscribe, and see you next time. Cheers!
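Putting it all together, here is a reconstruction of the complete app as a sketch, since the finished file is never shown on screen. It swaps in the temp-file approach from earlier (so the app also works when the upload isn't sitting next to the script) and st.session_state for the model; button labels and status messages are assumptions.

```python
import os
import tempfile

import streamlit as st
import whisper

st.title("Whisper App")

audio_file = st.file_uploader("Upload audio", type=["wav", "mp3", "m4a"])

@st.cache_resource  # modern equivalent of the st.cache used in the video
def load_whisper_model():
    return whisper.load_model("base")

if st.sidebar.button("Load Whisper Model"):
    st.session_state["model"] = load_whisper_model()
    st.sidebar.success("Whisper model loaded")

if st.sidebar.button("Transcribe Audio"):
    if audio_file is None:
        st.sidebar.error("Please upload an audio file first")
    elif "model" not in st.session_state:
        st.sidebar.error("Please load the Whisper model first")
    else:
        st.sidebar.info("Transcribing audio...")
        # Whisper needs a real path, so write the upload to a temp file.
        suffix = os.path.splitext(audio_file.name)[1]
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(audio_file.getvalue())
        result = st.session_state["model"].transcribe(tmp.name)
        os.remove(tmp.name)
        st.sidebar.success("Transcription complete")
        st.markdown(result["text"])  # markdown renders nicer than st.text

if st.sidebar.button("Play Original Audio File"):
    if audio_file is not None:
        st.audio(audio_file)
```

Assuming the file is saved as app.py, launch it with: streamlit run app.py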
