Transcribe Podcasts 22x Faster Using AI and Elixir
Discover how to build a local podcast transcriber using OpenAI's Whisper model, Elixir, and Bumblebee for efficient audio-to-text conversion.

Speaker 1: With a podcast URL and a little code, I managed to transcribe this episode locally over 22 times faster than I could even listen to it. Want to build your own? Stick around.

In 2022, OpenAI, back when they were actually open, introduced a transcription model called Whisper, which is still one of the best available models for extracting text from speech. The model is available on Hugging Face and is truly impressive. And it got me thinking: I listen to way too many tech podcasts. Some of my favorite moments don't end up in the description, and it's hard to reference them later. A few podcasts publish their transcripts, but most don't. For example, there's an interesting discussion on epics and story points in a random episode of the Elixir Outlaws podcast. Unfortunately, it's 30 minutes into an episode called Little's Law. Not very obvious. Now, I'm lazy and would hate to re-listen to hours of episodes to find that one discussion, but I can definitely write some code to find that clip. This is the story of that code. As always, the code from this video is available on GitHub with the link in the description.

There are a million videos showing how to hack together AI and Python, but that's not the only option. I personally love the Elixir programming language for its concurrency, Ruby-inspired syntax, high-quality abstractions, and its simple on-ramp called Livebook. Livebook is Elixir's answer to Jupyter notebooks. It has a bunch of features I can't get into for time, but you should check it out at livebook.dev.

For this project, I'll need a few dependencies: first, a library called Req for HTTP requests; then FastRSS, because nobody wants slow RSS; Bumblebee, to load Hugging Face models; EXLA, an accelerated ML runtime backed by Google's XLA; and finally Kino, for formatting and displaying the result as Markdown. With these dependencies listed, I can run my setup cell and everything's in place.

So let's scrape that podcast feed. Podcasts run on a standard called Really Simple Syndication, RSS for short. It involves an endpoint that a client can poll to get a feed of content. We'll grab the RSS link from the podcast's website, make a request for the file, and then parse it. As a result, we get a usable representation of the RSS feed's XML. Before transcribing the episodes, I'm going to trim this down to the fields I care about: the title and the audio URL.

With those fields extracted, I'm ready to download the files to my computer. For this demo, I'm going to limit the episodes I process to two. In reality, we could download the full feed, and maybe use a cache to keep track of the episodes already in our system. To do this, I'll establish a temporary directory and then loop through the episodes, limiting them to the number I want to process. I grab the name of the file from the URL, in this case a UUID-named MP3 file, and as a result, I put the local path onto the map I'm accumulating. In Elixir, it's very common to shadow a variable like this as you process and enrich it.

With the files on disk, it's time for the main course: AI transcription. At the beginning of this video, I installed a library called Bumblebee. It's a high-level interface to AI models, similar to the libraries Hugging Face distributes to the Python community. Here, I'm loading the Whisper Tiny model and creating a serving. This handle automatically batches and chunks its input to optimize processing time.
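Here's a minimal sketch of the notebook up to this point, assuming recent versions of Req, FastRSS, Bumblebee, EXLA, and Kino. The feed URL is a placeholder, and the exact map keys FastRSS returns for items (like "enclosure") can vary slightly from feed to feed.

```elixir
# Livebook setup cell: install dependencies and use EXLA as the Nx backend.
Mix.install([
  {:req, "~> 0.4"},
  {:fast_rss, "~> 0.5"},
  {:bumblebee, "~> 0.5"},
  {:exla, "~> 0.7"},
  {:kino, "~> 0.12"}
])

Nx.global_default_backend(EXLA.Backend)

# Fetch and parse the RSS feed (the URL here is a placeholder).
feed_url = "https://example.com/podcast/feed.rss"
{:ok, feed} = FastRSS.parse_rss(Req.get!(feed_url).body)

# Trim each item down to the fields we care about.
episodes =
  for item <- feed["items"] do
    %{title: item["title"], url: item["enclosure"]["url"]}
  end

# Download the first two episodes into a temporary directory, shadowing
# `episodes` with an enriched version that includes each local path.
tmp_dir = System.tmp_dir!()

episodes =
  episodes
  |> Enum.take(2)
  |> Enum.map(fn episode ->
    filename = Path.basename(URI.parse(episode.url).path)
    local_path = Path.join(tmp_dir, filename)
    File.write!(local_path, Req.get!(episode.url).body)
    Map.put(episode, :local_path, local_path)
  end)

# Load Whisper Tiny from Hugging Face and build a serving that batches
# and chunks its input automatically.
{:ok, model_info} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text_whisper(
    model_info, featurizer, tokenizer, generation_config,
    chunk_num_seconds: 30,
    timestamps: :segments,
    defn_options: [compiler: EXLA]
  )
```

The timestamps: :segments option is what gives each transcript chunk a start and end time, which the rendering step later relies on.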
Bumblebee also automatically distributes AI processing across connected Elixir nodes, which could be cloud instances with GPUs or a cluster of machines on your home network. If you're interested in learning more about distributed AI processing in Elixir, leave a comment on the video and I might make a follow-up. It's actually really neat.

Since we're working with media files, I have to mention that you need to install FFmpeg on your machine. Installation instructions vary across operating systems, so you'll need to find the right one for yours. On my Mac, I had to add the Homebrew binary directory to my path.

Now it's time to transcribe. I'm going to start this running, and as I do, I'll walk through what the code does. We use Nx.Serving.run, the function that passes a job to the Bumblebee Whisper serving; I give it the local path to the file and let it rip. To see how much faster Whisper gets through the file than I would, I'm using a simple DateTime diff to count the processing seconds. And now we wait. Two very boring minutes later, I have a list of transcribed episodes, along with their timestamps.

What you choose to do with this data is entirely up to you. Put it in a database, index it for vector search, send it to your enemy, whatever floats your boat. In my case, I'm going to use simple string interpolation to create a Markdown document and display each segment as a list item (sketched below). The code itself isn't very exciting, but the result is: this transcript was generated 22.65 times faster than the regular playback rate of the audio, and I didn't even use a GPU for acceleration. Now I can search the transcripts and confirm that this episode is where they went on a tangent about epics, around the 31-minute mark.

I hope this has sparked your curiosity and shown that AI models don't need fancy cloud-hosted APIs to provide real value and efficiency in your own life. This could very easily spin off into a ton of different side projects. Maybe you'll build a new podcasting platform that transcribes episodes as you upload them. Maybe you'll analyze trends across podcasts within an industry using embeddings and k-means clustering. Generate captions to play on top of videos, synchronized with the audio. Create better summaries of episodes than the creators do, based on their actual content. The list goes on. Seriously, you can build anything. Let me know what interests you, and I just might build it in a future video. Shout out to the Elixir community for building the infrastructure that made this technology so approachable. This has been Code & Stuff. Thanks for watching.
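For reference, here's a minimal sketch of the transcription and rendering step described above, continuing from the earlier snippet. The chunk fields (text, start_timestamp_seconds) match Bumblebee's Whisper serving output when timestamps: :segments is set; the timestamp formatting is my own.

```elixir
# Time the whole run so we can compare it against real-time playback.
started_at = DateTime.utc_now()

transcribed =
  Enum.map(episodes, fn episode ->
    # `{:file, path}` tells the serving to decode the audio with ffmpeg.
    %{chunks: chunks} = Nx.Serving.run(serving, {:file, episode.local_path})
    Map.put(episode, :chunks, chunks)
  end)

elapsed_seconds = DateTime.diff(DateTime.utc_now(), started_at)
IO.puts("Transcribed #{length(transcribed)} episodes in #{elapsed_seconds}s")

# Render each segment as a Markdown list item with its start timestamp.
markdown =
  Enum.map_join(transcribed, "\n\n", fn episode ->
    items =
      Enum.map_join(episode.chunks, "\n", fn chunk ->
        seconds = trunc(chunk.start_timestamp_seconds)
        mm = div(seconds, 60)
        ss = seconds |> rem(60) |> Integer.to_string() |> String.pad_leading(2, "0")
        "- [#{mm}:#{ss}] #{String.trim(chunk.text)}"
      end)

    "## #{episode.title}\n\n" <> items
  end)

Kino.Markdown.new(markdown)
```

Dividing an episode's audio duration by elapsed_seconds gives the speed-up factor quoted in the video.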
