Innovative Hack Transcribes Lecture Videos for Easy Study and Search

Discover a hack that converts lecture videos into searchable text files using Google Cloud's speech-to-text API, enhancing study efficiency.

WinHacks NoNotes

Added on 09/07/2024

Speaker 1: Hello, WinHacks. NoNotes is a hack that transcribes videos. The inspiration behind this hack is to let students record their class lectures and generate a text file that they can use later to study and to search content within their lectures. Traditionally, you cannot search, or as we know it, Control-F, lecture videos. With this hack, you're able to pinpoint text.
So that was a little video I had to record in order to demo our hack. I'm not going to play the rest of it because it was the first time I was recording, and to be honest, I got a little better after recording a few more times. I was very boring in that video.
So what I want to show is how our hack works. That video is stored in this folder right there. For our hack to work, the Google speech-to-text API requires an audio file, so I'm going to show you guys a command that extracts the audio from the video. We found a package called FFmpeg, which is used for audio and video handling. If I go to a shell and run the ffmpeg command with the input file and a designated output file, it will convert the video to an audio file. For the sake of this demo, that process takes too long, so I'm not going to show it extracting the audio file. I have it right here.
The next step is that once you've generated an audio file, you have to upload it to a Google Cloud bucket. Anything longer than a minute needs to be in a bucket for the speech-to-text API to work. So I'm going to show you guys how to push the audio file into the bucket. Let me show you the bucket. This is the Google Cloud console over here. I'm going to refresh my bucket to demonstrate there's nothing inside of it. Now I'm going to go ahead and push that audio file into the bucket. If I go back over to my bucket now, as you can see, that audio file is right there.
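The extraction and upload steps described above can be sketched roughly as follows. This is a minimal illustration, not the commands used in the demo, which the talk does not show. The file names, the bucket name, the FLAC output, the mono 16 kHz settings, and the use of `gsutil cp` for the upload are all assumptions. The sketch only builds and prints the commands, so it runs without FFmpeg or the Google Cloud SDK installed.

```python
import shlex
from typing import List


def build_extract_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that keeps only the audio of a video file."""
    return [
        "ffmpeg",
        "-i", video_path,  # input video
        "-vn",             # drop the video stream
        "-ac", "1",        # assumption: downmix to a single (mono) channel
        "-ar", "16000",    # assumption: resample to 16 kHz
        audio_path,        # ffmpeg infers the output format from the extension
    ]


def build_upload_command(audio_path: str, bucket: str) -> List[str]:
    """Build a gsutil command that copies the audio file into a bucket."""
    # Assumption: the talk does not say which upload tool was used.
    return ["gsutil", "cp", audio_path, f"gs://{bucket}/"]


def gcs_uri(bucket: str, object_name: str) -> str:
    """Return the gs:// URI that identifies an object inside a bucket."""
    return f"gs://{bucket}/{object_name}"


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    video, audio, bucket = "lecture.mp4", "lecture.flac", "my-lecture-bucket"
    print(shlex.join(build_extract_command(video, audio)))
    print(shlex.join(build_upload_command(audio, bucket)))
    print(gcs_uri(bucket, audio))
```

To actually run the commands, each list could be passed to `subprocess.run(command, check=True)` on a machine where `ffmpeg` and `gsutil` are installed and authenticated.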
And then it's going to generate something called a URI, which is pretty much like a URL for that specific file. With this URI, I'm going to use the Google Cloud speech-to-text API to transcribe that audio file. So this is where the really cool part is. I'm going to delete this last bit of code; I'll explain in a second what it did. What's happening right now is I've sent a request to the Google speech-to-text API to transcribe the audio file in my bucket. The little bit that I just deleted at the end was just redirecting the output of the API to a file called transcribe results.txt.
So this is what the transcription says: "Hello, when hacks, no notes is a hack that transcribes video inspiration. The inspiration behind this," blah, blah, blah. It's pretty much what I said in the video. It isn't 100% accurate, but it does come with a confidence level. The confidence level right here is about 96%, which as you can see is very high. It's incredibly impressive.
So what I'm going to do now is show how to send the output to results.txt. This is going to store it in a file called results. Unfortunately, I didn't put a path, so it's going to be saved in my default path, which I can still pull up as soon as it's done. It'll take a quick second. Another nice thing about this transcription API is that it's pretty quick. If you have a 40-minute lecture, you can almost instantly transcribe it. So instead of sitting there in class and typing out your notes, you can have a transcription done quickly from just a video. So if I go to results.txt, here are the results of the transcription.
I'm going to keep this video very short, pretty much just a TL;DR. One future implementation we could do with this hack is to build a mobile app where users can instantly upload videos from their phone. Another implementation that we want to do is text summarization, to quickly give you a kind of snapshot of what the text says.
Instead of having to read through it all, you could have a whole lecture summarized in about one or two paragraphs of the good info. This is a future implementation that we want to bring in, and we found online services that enable that. Another implementation that we see with this: imagine all the videos on YouTube or Google or any other video platform being transcribed, so that when you search for something, the search can pull in the contents of those videos. Right now, the only way the search algorithm suggests videos is by the tags, the description, or the title of the video. This goes one level deeper and gets at the actual contents of the videos, so you'll be directed to a lot more information that is better suited to your search.
The technology we used is mainly Google Cloud technology. We used the speech-to-text API and a Google Cloud bucket, and we also found a library called FFmpeg for audio and video handling, as I demonstrated earlier.
We ran into a lot of challenges during our hack. We initially wanted to use something called UiPath to automatically upload the video to a transcription service, like some website that does the transcription for you. But we decided that doing the hack ourselves would be a lot more rewarding. Building an RPA, or robotic process automation, to do it was kind of the quick, dirty, and cheating method. It's not really cheating; it's just that we wanted to build our own hack that leverages the APIs ourselves, and it was really rewarding. The biggest struggle we had was properly unpacking and connecting libraries to the Google Cloud API. Their guides were informative. Unfortunately, they weren't extremely detailed, and as noobs we had a lot of trouble figuring out information that was assumed to be common knowledge. For example, we don't know how to use Gradle or Maven, and their guides weren't really explicit on how to use them, and we struggled for a very long time on this.
Fortunately, we decided not to use a client-side library. We just went straight to the Google Cloud API, which is very impressive. The amount of power you can leverage with one line of code is incredible. Anyways, guys, thank you for watching this. I hope you enjoyed our hack, and catch you guys on the flip side. Take care.
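As a rough illustration of going straight to the API without a client library, the sketch below builds the JSON body for the Speech-to-Text REST method `speech:longrunningrecognize` and pulls the transcript and confidence out of a response. The talk does not show the actual request, so the encoding, sample rate, language code, bucket URI, and the sample response are all assumptions made up for this example. Sending the request also needs an access token, which is left out here.

```python
import json
from typing import Any, Dict, List, Tuple

# REST endpoint for audio stored in a bucket (audio longer than about a minute).
ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_request_body(
    gcs_uri: str,
    language_code: str = "en-US",    # assumption: English lecture
    encoding: str = "FLAC",          # assumption: audio was extracted as FLAC
    sample_rate_hertz: int = 16000,  # assumption: 16 kHz audio
) -> Dict[str, Any]:
    """Build the JSON body that points the API at a file in a bucket."""
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


def extract_transcript(response: Dict[str, Any]) -> Tuple[str, float]:
    """Join the top alternative of each result and average the confidences.

    For long-running requests, the same structure is found under the
    finished operation's "response" field.
    """
    pieces: List[str] = []
    confidences: List[float] = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        top = alternatives[0]  # alternatives are ordered best first
        pieces.append(top.get("transcript", "").strip())
        if "confidence" in top:
            confidences.append(top["confidence"])
    mean = sum(confidences) / len(confidences) if confidences else 0.0
    return " ".join(pieces), mean


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    body = build_request_body("gs://my-lecture-bucket/lecture.flac")
    print("POST", ENDPOINT)
    print(json.dumps(body, indent=2))

    # Made-up response, shaped like the API's JSON output.
    sample = {
        "results": [
            {"alternatives": [{"transcript": "hello when hacks", "confidence": 0.96}]},
            {"alternatives": [{"transcript": " no notes is a hack", "confidence": 0.94}]},
        ]
    }
    text, confidence = extract_transcript(sample)
    print(text)
    print(round(confidence, 2))
```

Writing the returned text to a file such as results.txt then gives the searchable transcript the demo ends with.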