Exploring Whisper AI and Audio Search Innovations
Discover Whisper AI from OpenAI for multilingual transcriptions and Orama Search, a tool for intelligent audio search via JavaScript frameworks.
Episode 58 - Whisper AI and searchable audio
Added on 01/29/2025

Speaker 1: Hey, welcome back to another Just 5 Mins. Only seems like five minutes since the last one. Anyway, jokes aside, what have we got this time? Well, actually, it's a couple of things, or even three. Goodness me, the things I give out for free. No, no, it's terrible. Okay, so what we really have is three things. Whisper, Whisper AI, most importantly, from our good folks that brought us ChatGPT, OpenAI. There's also a search tool that looks really quite interesting, something called Orama Search. I'll put the links below, as usual. That one's interesting in that you've got a JavaScript sort of way of doing some really quite intelligent search. And if you want to follow along with this one, I'll put the link to where I read this from, but it's also using Astro, which is a JavaScript framework. I have heard about it, I haven't actually done a Just 5 Mins on that, but I think it warrants one. But the key bit, in fairness, is Whisper AI, and I have had some experience of that. I used it a little while ago as a bit of a test. Just to put some context, it's been out for a while, but it hasn't had the publicity that ChatGPT has had, or all the Azure stuff, or any of the other multitude of AI stuff that's been going around. But Whisper AI, in its essence, takes audio, and it can take audio from anything, from a video, from a podcast, from whatever it might be, and does a rather interesting job of transcribing it, because it can handle multiple languages. You can go and check this out, I'll put the links below, but there are lots of different languages it can handle. And, of course, it ends up just producing text, okay? But with this example, it's then making audio searchable. So the idea is you're first transcribing your audio using Whisper AI.
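To make the transcription step concrete, here's a minimal sketch in Python. The segment shape below mirrors the kind of timestamped JSON Whisper's transcribe step produces (a "segments" list with start/end times in seconds plus the text for each chunk), but the sample data is made up and this should be treated as illustrative, not the definitive API:

```python
# Sketch only: Whisper-style timestamped segments (sample data is invented).
segments = [
    {"id": 0, "start": 0.0, "end": 4.2, "text": " Hey, welcome back to another Just 5 Mins."},
    {"id": 1, "start": 4.2, "end": 9.8, "text": " This time we're looking at Whisper AI from OpenAI."},
]

def segments_to_text(segments):
    """Flatten timestamped segments into one searchable transcript string."""
    return "".join(seg["text"] for seg in segments).strip()

transcript = segments_to_text(segments)
print(transcript)
```

Keeping the segments around (rather than only the flattened text) is what makes the audio-linking trick later on possible, because each piece of text still knows where it sits in the recording.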
The text can then be searched, and that's where Orama Search comes in. Now, to be honest, I've been using Azure Cognitive Search, and I'm probably going to tap this into there. But Orama Search, I think I'm going to look at it anyway, and it may well be its own episode, as I've mentioned, as well as Astro. But the idea is that as you search the text that Whisper AI has produced, it can obviously find the text, but it can then actually link back to that piece in the audio. And it does it by giving you a few seconds before the search hit, so you get a little bit of context. Really quite interesting. Now, Whisper AI, again, has been out for a while, and you've generally needed something like Python and PyTorch to run it. I did this with Google Colab, which I've probably referenced in a couple of podcasts; that's where you can use Google's massive infrastructure to run some of these AI processes without trying to run them locally. Now, the real kicker here is that to run Whisper AI yourself, you'd usually need a reasonably powerful PC with a decent GPU. But there is essentially a Docker container for this now. So where I was using Colab, or even just trying to run it via Python, you can now interact with it through Docker. When you spin up the container, and again, my poor little laptop is going to be on fire, and I'll probably be cooking eggs or something on the top of it, but if you do run it, you essentially get a web API. It's Whisper as a service, so it's a web API. There's no actual front end, but it's really easy to pass audio files up to it and then get the transcriptions back. Now, there are loads of transcription services out there, but I'm really curious to see what Whisper AI from the folks at OpenAI has provided.
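The search-and-link-back idea described above can be sketched in plain Python, with no search library at all, just to show the principle: find the query in the timestamped segments, then start playback a few seconds before the matching segment so the listener gets some context. The segment shape and the five-second context window are assumptions for illustration:

```python
def find_in_audio(segments, query, context_seconds=5.0):
    """Return (playback_offset, segment_text) for the first segment whose
    text contains the query, rewound a few seconds for context.
    Returns None if the query isn't found anywhere."""
    q = query.lower()
    for seg in segments:
        if q in seg["text"].lower():
            # Rewind for context, but never before the start of the audio.
            offset = max(0.0, seg["start"] - context_seconds)
            return offset, seg["text"].strip()
    return None

# Made-up Whisper-style segments, purely for illustration.
segments = [
    {"start": 0.0, "end": 4.0, "text": " Welcome back to the show."},
    {"start": 4.0, "end": 9.5, "text": " Today we cover searchable audio."},
]

print(find_in_audio(segments, "searchable"))
```

In a real setup the matching itself would be handled by a proper full-text engine like Orama or Azure Cognitive Search, with the segment start time stored alongside each indexed chunk; the rewind-and-play step stays exactly this simple.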
And as I say, when I tested this a little while ago, it did work really well. And having that text searchable, and pinpointing the place in the audio where it found that search term, is pretty awesome. Again, I think I'm still swaying towards Azure Cognitive Search because I've used that in the past, and I'm actively working on something at the moment. But again, an amazing way to look at transcribing audio from videos or any other audio source. It supports quite a lot of the usual formats, MP3 and so on; you can check that out with the links below. But that's what I've got this time. It'll be interesting to make podcasts searchable. There are a lot of services out there, but I think something like this that you can adapt is going to be quite an interesting little adventure. Okay, that's what I've got for this time, and take care until the next time.
