Creating Real-Time Transcription and Sentiment Analysis with Faster Whisper
Learn how to set up near-zero-latency real-time transcription and sentiment analysis using Faster Whisper and GPT-4. Explore various use cases and code examples.
File
SUPER Fast AI Real Time Speech to Text Transcribtion - Faster Whisper Python
Added on 09/08/2024

Speaker 1: In today's video I thought we could take a look at how I created this almost zero-latency real-time transcription that you can actually see on the screen here. We're gonna go through some use case ideas I have for this and how you can create this too. So yeah, I think we're just gonna get started. You can also see when I stop this now, we get a log of everything we said. So yeah, perfect.

Okay, so before we take a look at the code here, I want to show you one more example, just a use case I thought of. So I have this MrBeast YouTube video here. I'm gonna bring up the terminal. So what I'm gonna do now is I'm gonna take off my headset. I'm gonna put it around my mic here, right? Hopefully that's fine. And yeah, let's start the script and let's fire up the video.

Yeah, so you can see, I think that worked out pretty good. And you can still see we are streaming this and we are actually real-time transcribing this. So yeah, just a cool use case. I have more planned for this. So I'm gonna do like music videos, test it out, different stuff. But yeah, pretty cool, right? And you can see when I stop this now, we kind of get the log here too. So yeah, that was just basically a very simple use case.

Okay, so before we move on to the next use case, let's take a look at the Python code. So this is built on something called Faster Whisper. This is kind of a sped-up version of Whisper from OpenAI. And yeah, we are actually using our GPU here. That's why we can get such low latency. But I am running on a kind of new 4-series NVIDIA. So it's nothing insane, but it's working pretty good. So this is very easy to set up. I'm gonna leave a link to the GitHub here. So basically just pip install faster-whisper and follow the instructions here. And you can have this set up in no time.

If you go back to our code, you can see we have a function that is actually recording from our microphone and creating chunks.
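
Sketch (not the code shown in the video): the setup described above, a GPU-backed model that transcribes short recorded chunks and keeps a log, might look roughly like the following with the `faster-whisper` Python package. The helper names `load_model`, `transcribe_chunk`, and `TranscriptLog`, the `float16` compute type, and the idea that each chunk is saved to a file path are assumptions for illustration; the microphone recording itself is left out.

```python
from dataclasses import dataclass, field


def load_model(model_size="large-v3"):
    """Load a Faster Whisper model on the GPU (needs faster-whisper and CUDA)."""
    # Imported lazily so the helpers below also work on machines without a GPU.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device="cuda", compute_type="float16")


def transcribe_chunk(model, chunk_path, language="en"):
    """Transcribe one recorded audio chunk and return its text."""
    segments, _info = model.transcribe(chunk_path, language=language)
    # `segments` is lazy: iterating over it is what actually runs the model.
    return " ".join(segment.text.strip() for segment in segments)


@dataclass
class TranscriptLog:
    """Accumulates transcribed chunks so the full log can be printed at the end."""

    lines: list = field(default_factory=list)

    def add(self, text):
        # Skip empty results, e.g. chunks that contained only silence.
        if text:
            self.lines.append(text)

    def dump(self):
        return "\n".join(self.lines)
```

A recording loop would call `transcribe_chunk` once per chunk, print the result, pass it to `TranscriptLog.add`, and print `dump()` when the script is stopped.
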
And here we can adjust the chunk length, yeah, we can set it to like one. I think that's one second. So the shorter this is, the faster it streams in the terminal, right? And Faster Whisper has, of course, all the models. So we can have small, medium. We can have medium in English. We can have large-v3. That's the best one. And we can have auto-detect language. I just set mine to English for now, right? And then we come into this while True loop here that is actually taking what we are recording. And it's printing it. And it's accumulating this into like a log file, too, so when this breaks, we can print the log here. Other than that, it's pretty simple, pretty straightforward. It's not a big script. You can see we are using CUDA cores here from my GPU. And there's a few things you can adjust here to make it even quicker. But I think we're just going to keep it as is for now. And I want to move on to kind of our next use case.

Like always, if you want to support me, become a member of the channel, you can follow the link in the description below. I will be putting this up on our community GitHub. You will also get access to the community Discord. That is if you want to support me. Other than that, just like the video. If you like this kind of content, maybe leave a comment or something. But yeah, let's move on to another use case I created for this.

OK, so the next example is going to be a real-time sentiment analysis. So you can see we have a get chat response function here that uses GPT-4. And we have set the system message to "you are an expert sentiment analysis". If you scroll further down here, we have basically all the same. But here we have something that is more of a sliding window because we always want to keep the prompt to a set number of characters. So this is going to be 100 characters. That means that the prompt will always be 100 characters long and it changes over time. You will see how this works when we run it.
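
Sketch (not the code shown in the video): one simple way to keep a prompt to a fixed number of characters is to append each new piece of transcribed text and then keep only the most recent characters. The class name `SlidingWindow` and its `add` method are hypothetical; only the 100-character limit comes from the video.

```python
class SlidingWindow:
    """Keep only the most recent `max_chars` characters of transcribed text."""

    def __init__(self, max_chars=100):
        self.max_chars = max_chars
        self.text = ""

    def add(self, new_text):
        # Append the newest chunk, then drop the oldest characters so the
        # window never grows beyond max_chars.
        combined = f"{self.text} {new_text}".strip()
        self.text = combined[-self.max_chars:]
        return self.text
```

Until enough speech has been transcribed the window is shorter than the limit, and because it is cut by characters rather than words, the first word in the window can be truncated.
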
And we create a simple UI that is actually going to display the sentiment. And yeah, basically all is the same. You can see we have another prompt here. So this takes the sliding window as an input, and then: "What is the sentiment of the conversation above? Answer only with positive, neutral or negative." And yeah, kind of what this does is that it looks at what is going on in the conversation now and gives a sentiment analysis. So let's just fire up the terminal and you will see how this works in action.

OK, so when we start this now, we should get like a pop-up of this UI window. So let's place it over here. And when we continue to talk now, we can see that the sentiment is changing. So let's talk about some happy stuff. So yeah, I'm very happy. I'm looking forward to my vacation. I have won a lot of money. So you can see we are changing to positive now. So if we turn this into, yeah, I'm going to a funeral. I'm feeling pretty depressed. I've lost a lot of money. I'm broke. Yeah, you can see it's changing to negative. So this is basically like a real-time sentiment analysis. And if we just keep talking now, you can see this is probably going to change over time. I think we're going to go back to neutral now. Yeah, so when we're neutral, it's just going to keep it like this. It gets a bit messy, but I guess, yeah, it works. So yeah, pretty happy with this. And it's a very simple UI and I think it works pretty good. But yeah, nothing else to say. I think this works good.

Okay, so the final example is actually going to be a preview of Wednesday's upcoming video. So I'm not going to spoil anything about the code or anything or actually how I'm doing this. But I'm just going to show you how it works because I'm going to work a bit more on it for the upcoming video. But this is basically up the same alley. So if we zoom in a bit here now, we fire up the terminal. This is going to be a bit different, but it kind of takes off from the same workflow. So let's start this now.
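
Sketch (not the code shown in the video): for the sentiment step demonstrated earlier, the prompt can be built from the current sliding-window text and the model's reply reduced to one of the three labels. The video names a get chat response function that uses GPT-4, but its signature is not shown, so here `get_chat_response` is assumed to be any callable that takes a system message and a user prompt and returns the reply as a string. The system message wording and the fallback to neutral are also assumptions.

```python
# Assumed wording, paraphrasing the system message mentioned in the video.
SYSTEM_MESSAGE = "You are an expert at sentiment analysis."
LABELS = ("positive", "neutral", "negative")


def build_sentiment_prompt(window_text):
    """Put the sliding-window text above the question, as described in the video."""
    return (
        f"{window_text}\n\n"
        "What is the sentiment of the conversation above? "
        "Answer only with positive, neutral or negative."
    )


def parse_sentiment(reply, default="neutral"):
    """Map a free-form model reply onto one of the three labels."""
    cleaned = reply.strip().lower()
    for label in LABELS:
        if label in cleaned:
            return label
    # Assumption: anything unrecognised is treated as neutral.
    return default


def analyze_sentiment(window_text, get_chat_response):
    """Ask the chat model for a label; `get_chat_response` is a hypothetical callable."""
    reply = get_chat_response(SYSTEM_MESSAGE, build_sentiment_prompt(window_text))
    return parse_sentiment(reply)
```

Passing the chat function in as an argument keeps the sketch independent of any particular API client, and a UI would simply redraw the returned label each time the window changes.
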
Okay, so basically what is going to happen here now is when I start to talk, you're going to see images start popping up here. So let's start just talking about some red cats and maybe some white flowers. We have some green nature. We have some trees. We have some architecture and we can actually see a blue deer running over the hill. That is going to be very special. And some UFOs and astronauts are actually walking into the scene. The deer is fighting the astronaut. So what is going to happen next? Well, no one knows. You can see there are some aliens coming and landing on Earth. People are scared. People are running away and no one knows what to do. People are very scared. People are screaming. People are running and they are very afraid.

Okay, so sorry about that rant. But yeah, you can kind of see here. So what happened here is actually we are creating images from our sliding window prompt. So this goes on in pretty much real time. So the series was kind of this, right? So it's a bit strange, but I'm going to work a bit more on it. And this is also using Faster Whisper. But we also have something else that is kind of mixed into this to make this happen. I want to create some kind of UI to display the images better because this was not a good solution. But I just wanted to leave it as a preview for the upcoming video on Wednesday.

Okay, so that was basically what I had for today. I hope you found it interesting. And like I said, if you want to get access to this yourself, just follow the link in the description. You can join my YouTube channel. You will get access to this private GitHub here where I will be uploading this. Other than that, have a lot of fun with this. And I'm going to make some improvements. Like I said, watch out for Wednesday's video. I think it's going to be pretty cool. And yeah, have a great day and I'll see you again soon.
