Effortless Podcast Transcription with Deepgram
Learn how to transcribe podcasts using Deepgram from direct URLs, RSS feeds, or local files, and get formatted, human-readable transcripts.
Create Readable Transcripts for Your Podcast with Python
Added on 01/29/2025

Speaker 1: Hello there, my name is Kevin Lewis and I'm a developer advocate here at Deepgram, and today I'm going to show you how to get nicely formatted, human-readable transcripts for your podcast. We'll start by providing a link to the direct media file and getting a transcript for that. Then we'll go ahead and provide the RSS feed URL and get a transcript for just the latest episode, and then finally I'll show you how to get a transcript from a media file on your computer, so you can transcribe it before you're ready to upload and publish it.
We'll be using Python today. I've done a few things ahead of time. Firstly, I've created and activated a virtual environment called virtual_env. I've created a .env file with one property in it, Deepgram API key, which includes, shock horror, a Deepgram API key, which you can get for free at console.deepgram.com, link in the description. And I've also downloaded an episode of my favourite podcast, so when it comes time to transcribe the local file, we already have one locally; that's this file here.
So with that, let's get ready. The first thing you need to do is install some dependencies for today. You'll need to import the Deepgram SDK, asyncio, the python-dotenv package, and also feedparser, which you will only need if you are transcribing from the RSS feed, but we'll install them all anyway. The next thing to do is go ahead and import all of our dependencies, so that's all the ones we've just seen, plus the native os package. Then we'll go ahead and get our Deepgram API key from that .env file, like so.
The final thing to do before we really get into the guts of this project is to actually set up a main function, so let's go ahead and do that. And so we're going to do everything inside of this main function here. Now the first thing we're going to do is initialise the Deepgram SDK. So we'll go ahead and do that, like so, fantastic. And now we'll go ahead and get the direct URL and transcribe it.
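
The setup described so far might look roughly like the sketch below. It is not the code shown on screen. It assumes a Deepgram Python SDK release older than 3.0, where the client class is `Deepgram` (newer releases use a different interface), and it assumes the .env property is spelled `DEEPGRAM_API_KEY`. The helper name `read_api_key` is hypothetical.

```python
# Install first (shell), assuming a pre-3.0 SDK release:
#   pip install "deepgram-sdk<3" python-dotenv feedparser
import asyncio
import os


def read_api_key(name: str = "DEEPGRAM_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"Set {name} in your environment or .env file")
    return key


async def main():
    # Imported here so this file can still be loaded before the packages
    # are installed.
    from dotenv import load_dotenv  # provided by python-dotenv
    from deepgram import Deepgram   # client class in SDK releases before 3.0

    load_dotenv()  # copies entries from .env into the process environment
    deepgram = Deepgram(read_api_key())
    # The transcription calls from the rest of the video go here.
    return deepgram


# Uncomment once the packages are installed and the key is set:
# asyncio.run(main())
```

Keeping everything inside `main` matches the video: the transcription call is awaited, so it needs to live in an async function.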
So the way we're going to do that is as follows. We're going to store the URL in a variable, and in here you'll provide a direct URL to a media file, generally a .mp3 file. Then we'll go ahead and create a source object here, with one property, url, with the value of this string here. We'll set up all of the transcription options that we want from Deepgram. To start, we'll just turn on punctuate, always a word that gets me even though I type it basically every day, and this is a boolean, like so. And then we'll go ahead and ask for a transcript from Deepgram: deepgram.transcription.prerecorded, pass in the source, which is this object here, pass in the transcription options object, like so. We'll await this, so this will take as long as it needs to take, and then return into the response variable, and then we'll print response, like so. So let's go ahead and run this. It's quite a long episode, so I'll be right back.
A few moments later. Hello, it's me from the future. That took about 10 seconds, but I didn't need you to sit through it. Now this is all the data that's returned by Deepgram. You'll see it's a huge JSON object that contains not only the words that were said, but a bunch of metadata like confidence, punctuation, and when words start and end, which is super useful.
Now we're going to go and use a couple more Deepgram features to enable us to generate a much nicer human-readable version of this transcript. This is great for computers. This is great for our applications, but not so great to read. So as well as punctuate, we're going to turn on diarise, which is speaker detection, and we're also going to turn on paragraphs, like so. So now we will get back all of this data plus more. Once we enable diarise and paragraphs, instead of just printing the response, we can print a pretty version of the transcript complete with paragraphing. Let me show you how to do it.
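
As a rough sketch of the request described above, again assuming the pre-3.0 Python SDK, where `deepgram.transcription.prerecorded(source, options)` is awaitable: the function names `build_request` and `transcribe_url` are hypothetical. Note that the API option is spelled `diarize`.

```python
def build_request(url: str):
    """Build the source and options objects for a hosted media file."""
    source = {"url": url}       # direct link to the audio, e.g. an .mp3
    options = {
        "punctuate": True,      # add punctuation and capitalisation
        "diarize": True,        # speaker detection (American spelling)
        "paragraphs": True,     # split the transcript into paragraphs
    }
    return source, options


async def transcribe_url(deepgram, url: str):
    """Ask Deepgram for a transcript of the file at `url`.

    `deepgram` is an initialised client from a pre-3.0 SDK release.
    """
    source, options = build_request(url)
    response = await deepgram.transcription.prerecorded(source, options)
    return response
```

Inside `main`, this would be used as `response = await transcribe_url(deepgram, url)` followed by `print(response)`.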
So what we'll do is create a new variable here called transcript, and in here we will extract just the value we want. Okay, I think when I edit that, I'm going to speed that up, you don't need to see me type all of that slowly. But here we are, we're dipping into this big JSON object here. So results, channels, which is an array, first item, alternatives, which is an array, get the first item, this should actually say paragraphs, like so. And then inside of that, we'll just grab the transcript. There's more data inside of this paragraphs object, which breaks down each individual paragraph in more detail, but we want the whole thing. And then we'll go ahead and print the transcript, like so. So let's rerun that. And again, I'll cut to when this is ready.
Okay, and we're back. And we see here there's a much nicer transcript here that's been presented. I noticed speaker labels are missing. Why is that? Because I misspelt diarise. That is how you spell diarise. You know what, let's run this once more, and I'll be right back.
Okay, and we're back. And I've scrolled through this transcript in order to find a nice example here. But you'll see that now, some paragraphs are appended, are prepended rather, with the number of the speaker, starting at zero. So this will be the seventh detected voice. And you'll see this person says a few paragraphs, and then it moves on to another speaker. And so if you don't have diarise, as we just saw when we made a typo, we'll just get the paragraphs nicely presented. Or if we have diarisation turned on, we'll also get these paragraphs, when there's a new speaker, a change in speaker, appended with the name of that speaker.
Now we can go ahead and write this to a file. So instead of just printing it here, we can go ahead and say with open transcript.txt, or whatever it is you want your final file to be called. We'll open that in write mode, as f. We'll go ahead and write the transcript variable to it.
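
The extraction and file-writing steps can be sketched as follows. The key path is the one read out above (results, channels, first item, alternatives, first item, paragraphs, transcript). The function names are hypothetical, and the `paragraphs` key is only expected in the response when the paragraphs option is enabled.

```python
def formatted_transcript(response: dict) -> str:
    """Pull the paragraph-formatted transcript out of the response object."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    return alternative["paragraphs"]["transcript"]


def save_transcript(response: dict, path: str = "transcript.txt") -> None:
    """Write the formatted transcript to a text file, replacing any old copy."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(formatted_transcript(response))
```

Opening the file in `"w"` mode is why a later run overwrites the earlier transcript file.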
So again, I'll run that once more and show you what that looks like in practice. And there we have it. That's a transcript for that podcast, which was the final episode of Gimlet Media's Reply All, which I'm very sad has ended. So that is how you transcribe a directly hosted URL for a podcast episode, or indeed any other audio file.
Now let's talk about how we are going to get an RSS feed URL, because every podcast has an RSS feed that accompanies it, get the latest episode, and transcribe that. So what we'll do here, I think we'll do it just here, is we'll create a new variable called rss, and we'll go feedparser.parse. And in here, we provide a URL. Here is one I made earlier. And I think this is of NPR's morning show, forgot the name of it off the top of my head. So that will represent the RSS feed. And then we go in, we're going to grab the latest episode and just grab the URL. This is how we do it: url equals rss.entries, which is an array, first entry, .enclosures, first entry, .href. So this is now the URL, and everything else stands as it always was. So let's go ahead and run this. It will overwrite this transcript file, and I'll be right back when it's done. And here we go. Here is a transcript for NPR's Up First, which I'm glad was written right here in the transcript.
Now for the final step in this video, I'm going to show you how to transcribe a podcast that's right here on your computer, not necessarily yet uploaded or published. So we'll go ahead and delete these three lines here, and we'll open this file as a buffer, and then we will create a new source object and send that to Deepgram instead. We do this with open. We provide the name of the file, ICYMI, in case you missed it. Excellent podcast, by the way, on internet culture from Slate Podcasts. We'll go ahead and open that in read mode as audio, and then we create a new source. This source is going to have two properties.
The first is a buffer, which is how this is loaded, because of the B here, R for read, B for buffer. And we also want a MIME type, audio/mp3. And that is actually all we need to change, other than to indent all of the following lines in this main function, because Python is an indentation-sensitive language. So we'll go ahead and save that, and we'll rerun this once more. Now I am uploading this entire, reasonably large mp3 file, so there might be a little bit of a delay here while it uploads. So I imagine this will take a minute or so just to upload, then the transcription is pretty quick. So I will cut to when it is done.
Hello, welcome back. I've been sitting here for like three or four minutes expecting it to be really slow, but eventually I got my prompt back, so it's done. Here we are. Yeah, this is a podcast. This is ICYMI, in case you missed it. This is a very good episode on the Gentleminions meme, which is reasonably recent, which will age this video when you come and see it later.
So in summary, we've done it. We have transcribed audio in three different ways here for your podcast. Firstly, by providing the URL directly; secondly, by passing an RSS feed and getting the latest episode's mp3 file, and then transcribing that; and then finally, by uploading, quicker than I expected, a local media file, and then Deepgram transcribes that pretty quickly and returns it to us. And then we saved the returned formatted transcript with paragraphs and speaker labels with this code here. If you have any questions at all, please get in touch. We love helping you build awesome things with voice. Bye for now.
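
The RSS and local-file variations from the second half of the video can be sketched as below, under the same pre-3.0 SDK assumption. The function names are hypothetical. `latest_episode_url` expects the object returned by `feedparser.parse(feed_url)` and assumes the newest episode is the first entry, as the video does. In Python's `open`, the `"rb"` mode means read, binary, which is what yields the raw bytes sent as the buffer.

```python
def latest_episode_url(feed) -> str:
    """Return the media URL of the first entry in a parsed RSS feed.

    `feed` is the result of feedparser.parse(feed_url). Key-style access
    is used here; feedparser also allows feed.entries[0].enclosures[0].href.
    """
    return feed["entries"][0]["enclosures"][0]["href"]


async def transcribe_file(deepgram, path: str, options: dict,
                          mimetype: str = "audio/mp3"):
    """Send a local audio file to Deepgram as a buffer.

    `deepgram` is an initialised client from a pre-3.0 SDK release.
    """
    # "rb" = read, binary. The request is made inside the with block so the
    # file is still open while it uploads.
    with open(path, "rb") as audio:
        source = {"buffer": audio, "mimetype": mimetype}
        return await deepgram.transcription.prerecorded(source, options)
```

For the RSS route, the URL returned by `latest_episode_url` is used as the `url` value of the source object, and the rest of the program is unchanged.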
