Learn Speech Recognition with Python API
Explore how to transcribe audio to text using AssemblyAI's API and Python. Discover step-by-step code implementation and tips for efficient results.
How to Transcribe Audio Files with Python
Added on 01/29/2025

Speaker 1: Hey, and welcome. In this project we are going to learn how to do speech recognition in Python. It's going to be very simple. What we're going to do is take the audio file that we recorded in the previous project and turn it into a text file. Let me show you how the project works. So here is the audio file that we recorded in the previous project.

Speaker 2: Hi, I'm Patrick. This is a test, one, two, three.

Speaker 1: And if you run our script, we get the text transcription of this audio file, like this: "Hi, I'm Patrick. This is a test, one, two, three." So let's learn how to implement this in Python. For this project we are mainly going to need two things: AssemblyAI's API to do the speech recognition, and the requests library from Python to talk to AssemblyAI's API. So let's first go ahead and get an API token from AssemblyAI. It's very simple: you just need to go to assemblyai.com and create a free account. Once you have an account, you can sign in and copy your API key just by clicking here, and right away I'm going to create a configuration file and put my API key there. Once I've done that, I have a way of authenticating who I am with AssemblyAI's API, and now we can start setting up how to upload, transcribe and get the transcription from AssemblyAI's API. The next thing that I want to do is to have a main file that is going to have all my code. What I need to do in there is import the requests library so that I can talk to the AssemblyAI API. So this project is going to have four steps. The first one is to upload the file that we have locally to AssemblyAI. The second one is to start the transcription. The third one is to keep polling AssemblyAI's API to see when the transcription is done. And lastly, we're going to save this transcript. Uploading is actually quite simple. If we go to the documentation of AssemblyAI, we will see here "uploading local files for transcription". So I can just copy and paste this and change the code as we need it. So basically, yeah, okay, we are importing the requests library already. The file name we are going to get from the terminal, so I will set that later. Just a couple of things that we need to pay attention to here. Basically, there is a way to read the audio file from our file system, and then we need to set up the headers. These headers are used for authentication.
So we can actually already set this. This is not going to be "your API token"; we set it to be API_KEY_ASSEMBLYAI, right? And we need to import it here, of course. All right, that's done. We also have an upload endpoint for AssemblyAI, and this one is api.assemblyai.com/v2/upload. But, you know, this might be something that we need later too, so I'm just going to put this into a separate variable and then just call this one here. So, as you remember, when you're uploading a file to AssemblyAI, we are doing a POST request. You need to send this POST request to the upload endpoint, you need to include your API key in the headers, and of course you need the data, so the file that you read. We are reading the data through the read_file function in chunks, because AssemblyAI requires it to be in chunks, in chunk sizes of five megabytes. Basically, this is the number of bytes that are in there. While we're at it, we can already get the file name from the terminal too, right? For that I just need to import sys, and inside sys, the first, not the zeroth, argument is going to be the file name. And here, let's clean up a little bit. All right, now we should be able to just run a command on the terminal, include the name of the file that we want to upload, and it will be uploaded to AssemblyAI. And let's also print the response that we get from AssemblyAI to see what kind of response we get. Again, this is the file that we are working with.
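The upload step described above could look roughly like the following. This is a minimal sketch, not the exact code from the video: it assumes the caller passes in a `requests.Session`-like object (named `session` here, a hypothetical parameter) whose headers already carry the `authorization` key, so the functions can be read and tried without a network connection.

```python
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"
CHUNK_SIZE = 5242880  # 5 * 1024 * 1024 bytes, the "five megabytes" chunk size


def read_file(filename, chunk_size=CHUNK_SIZE):
    """Yield the contents of a file in chunks of at most chunk_size bytes."""
    with open(filename, "rb") as audio_file:
        while True:
            data = audio_file.read(chunk_size)
            if not data:
                break
            yield data


def upload(session, filename):
    """POST the file to the upload endpoint and return the raw response.

    `session` is assumed to behave like a requests.Session whose headers
    already contain {"authorization": <your API key>}.
    """
    return session.post(UPLOAD_ENDPOINT, data=read_file(filename))


# Hypothetical usage from main.py (needs the requests package and a real key):
#   import sys, requests
#   session = requests.Session()
#   session.headers["authorization"] = API_KEY_ASSEMBLYAI
#   print(upload(session, sys.argv[1]).json())
```

Passing a generator as `data` lets requests stream the file chunk by chunk instead of loading it into memory at once.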

Speaker 2: Hi, I'm Patrick. This is a test 1, 2, 3.

Speaker 1: And what we need to do right now is to run python main.py and the name of the file, in this case output.wav. All right. So we uploaded our file to AssemblyAI successfully. And the response we get is the upload URL, so where your audio file lives right now. And using this, we can start the transcription. So for the transcription, let's again cheat by getting the code from the docs. Here is the code that we need, starting from here. So this is the transcription endpoint. You can see that it ends differently than the upload endpoint: this one ends with "upload", this one ends with "transcript". I will call this the transcript endpoint. Headers: we already have the headers, so we don't really need this anymore. The endpoint is the transcript endpoint. JSON is the data that we are sending, the data that we want AssemblyAI to transcribe. So we are going to need to give it the audio URL. We already have the audio URL, right? We got the response, but we did not extract it from the response. So let's do that. The audio URL is in response.json(), and it was called upload_url. So we're going to give this audio URL here, because what was there was just an example. OK, and this way we will have started a transcription. Let's do this and see what the result is. I will run this again, same thing. All right, so we got a much longer response. In this response we have a bunch of information about the transcription that we've just started. You do not get the transcript itself immediately, because depending on the length of your audio it might take a minute or two, right? So what we get instead is the ID of this transcription job. By using this ID, from now on we can ask AssemblyAI: hey, here is the ID of the transcription job that I submitted to you, is it ready or not? If it's not ready, it will tell you it's still processing. If it's ready, it will tell you it's completed, and here is your transcript.
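As a rough illustration of starting the transcription, the sketch below pulls `upload_url` out of the upload response, posts it as `audio_url` to the transcript endpoint, and returns the job's `id`. The `session` parameter is again a hypothetical `requests.Session`-like object with the `authorization` header already set, and the function signature is an assumption rather than the code shown in the video.

```python
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def transcribe(session, upload_response):
    """Start a transcription job for an uploaded file and return the job id."""
    # The upload response carries the location of the uploaded audio.
    audio_url = upload_response.json()["upload_url"]
    transcript_request = {"audio_url": audio_url}
    transcript_response = session.post(TRANSCRIPT_ENDPOINT, json=transcript_request)
    # The transcript is not ready yet; what comes back is the job's id.
    return transcript_response.json()["id"]
```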
So that's why the next thing that we want to build is the polling. We're going to write the code that will keep polling AssemblyAI to tell us if the transcription is ready or not. But before we go further, let me first clean up this code a little bit so that, you know, everything is nicely packed in functions and we can reuse them if we need to. So this one is the upload function, yes, and what it needs to return is the audio URL. We do not need to print the response anymore; we've already seen what it looks like. And we need to put the headers separately, because we want upload, transcribe and basically everything else to be able to reach this variable called headers. For transcription, again, I will create a function called transcribe. And what I need to return from the transcribe function is the ID. So I will just say job ID, and that would be the "id" field of response.json(). Again, we don't need this anymore. I'll just call this transcriptResponse to make it clearer. This will be uploadResponse. Let's call this transcript request so everything is nice and clean. This is this, and this goes here, and for upload response, we use it here, and we need to return the job ID. All right, so now we have them nicely wrapped up in different functions and everything else looks good. Let's run this again to see that it works. Oh, of course, I'm not calling the functions, so let me call the functions and then run it: upload and transcribe. But of course I also need to pass the file name to the upload function, so let's do that too. "Audio URL is not defined." Of course: I also need to pass the audio URL to transcribe. Good thing we tried. So this will be returned from the upload function, and then we will pass it to the transcribe function. As a result we will get the job ID, and then I can print the job ID to see that it worked. Let's see. Yes, I do get a job ID. So okay, things are working.
The next thing that we want to do is to set up the polling function. So the first thing we need to do for that is to create a polling endpoint. As you know, we had the transcript endpoint and the upload endpoint here. That's how we communicate with AssemblyAI's API. The polling endpoint is going to be specific to the transcription job that you just submitted. So to create it, all you need to do is to combine the transcript endpoint with a slash and add the job ID. But job ID is a bit weak, so I'm just going to call this the transcript ID. By doing that, you now have a URL with which you can ask AssemblyAI if your job is done already or not. And again we're going to send a request to AssemblyAI. This time it's going to be a GET request. Well, I'll just copy this so that it's easy. Instead of POST it's going to be a GET request. We're going to use the polling endpoint instead of the transcript endpoint. And we just need the headers for this, because we are not sending any information to AssemblyAI; we're just asking for information. If you're familiar with requests, this might be very simple for you, but all you need to know is that when you're sending data to an API you use the POST request type, and if you're only getting some information, as the name suggests, you use the GET request type. So the response that we get is going to be called the polling response. Let's see, it's not job ID, I called it transcript ID, so that it works. Then we get the polling response, and I can also show you what the polling response looks like. Looks good. Okay, let's run this. All right, so we got response 200. That means things are going well, but actually what I need is a JSON response. So let's see that again. Yes, this is more like it. So again, we get the ID of the response, the language model that is being used, and a bunch of other information.
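A single polling request, as described above, might be sketched like this: the polling endpoint is the transcript endpoint plus a slash plus the transcript ID, and the request is a GET that sends only the headers. The `session` parameter is a hypothetical `requests.Session`-like object that already carries the `authorization` header.

```python
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def poll(session, transcript_id):
    """Ask for the current state of one transcription job and return its JSON."""
    polling_endpoint = TRANSCRIPT_ENDPOINT + "/" + transcript_id
    # GET, not POST: nothing is sent, we only ask for information.
    polling_response = session.get(polling_endpoint)
    # The response object itself only shows the status code (e.g. 200);
    # the job details, including "status", live in the JSON body.
    return polling_response.json()
```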
But what we need here is the status. So let's see where that is. Oh yeah, there it is. So we have status "processing". This means that the transcription is still being prepared. So we need to wait a little bit more, and we need to ask AssemblyAI again soon to see if the transcription is done or not. What we normally do is wait 30 seconds or maybe 60 seconds, depending on the length of your audio file. And then when it's done, it will give us status "completed". So let's write the bit where we ask AssemblyAI repeatedly if the transcription is done or not. For that, we can just create a very simple while loop: while True, we do the polling. And if the "status" field of pollingResponse.json() equals "completed", we return the polling response. But if the status is "error", because it is possible that it might error out, then we will return "error". I'll just wrap this into a function. I can call this get transcription result URL. And while we're at it, we might as well also wrap the polling into a function. Do we need to pass anything to it?
Yes, the transcript ID. You need to pass a transcript ID to it, and instead of printing the response we will just return the response. So instead of doing the request here, all we would need to do is to call this function with the transcript ID. We can pass a transcript ID here, or, might as well, I will just call the transcribe function in here, and the result would be the transcript ID from the transcribe function. Then I'm going to pass this transcript ID to the polling function, which is going to return the polling response to me. I will call this polling response the data. And inside this data... so this is not needed anymore. Yeah, so the polling response's JSON is what is being passed, and I called that the data, so I change this to data here and also data here. Then I'll just pass the data. If it's an error, I can still pass the data, just to see the response and what kind of error we got, and here then we just say None. All right, let's do a little cleanup. So we have a nice upload function and a transcribe function. What we did before was we were calling the upload function, getting the audio URL, and then passing it to transcribe. But I'm running transcribe here, so I do not need this anymore. I still need to pass the audio URL to transcribe, so I would need to pass it in here. So instead of this, I just need to call this function with the audio URL. Yeah, let's put these here. Actually, to make it a bit more understandable, maybe instead of passing the string "error", I can just pass whatever error happened in my transcription. Then, you know, we'll be able to see what went wrong. All right, so what we get as a result from get transcription result URL is the data and, if there is any, the error. So then why not run this and see what the data is going to look like.
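The loop described above could be sketched as follows. To keep the example self-contained, the `transcribe` and `poll` steps are passed in as callables (hypothetical parameters; in the video they are module-level functions), and the loop returns the data together with the job's `error` field, or `None` when the job completed.

```python
def get_transcription_result_url(audio_url, transcribe, poll):
    """Start a job for audio_url and poll until it completes or fails.

    `transcribe(audio_url)` is assumed to return a transcript id and
    `poll(transcript_id)` the job's JSON as a dict with a "status" key.
    Returns a (data, error) pair; error is None on success.
    """
    transcript_id = transcribe(audio_url)
    while True:
        data = poll(transcript_id)
        if data["status"] == "completed":
            return data, None
        if data["status"] == "error":
            # Hand back the error message itself rather than the string "error".
            return data, data["error"]
```

As written, this loop asks again immediately when the job is still processing, so it sends requests as fast as it can until the status changes.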
All right, so we get something really, really big. Let's see, maybe I'll just clear this and run it again, just so that, you know, we can see it more clearly. All right, so we get the ID again, the language model that is being used, etc. Now we want the results. Yes, it is under "text": "Hi, I'm Patrick. This is a test 1, 2, 3." That's what we get. And we also get the breakdown of words, when each word started and when each word ended in milliseconds, the confidence of this classification, and much more information. What we want to do, though, even though we have all this information, is write the transcript generated by AssemblyAI into a text file. So in this next step, that's what we're going to do. All right, let's come up with a file name for this file. Actually, we can just call it the same thing as the file name plus .txt. So the file name, okay, we were using the variable file name already, so maybe let's find something else. I will just call this text file name, and it will be the file name plus ".txt". We can also just, you know, remove the .wav or .mp4 or whatever, but well, let's not deal with that for now. Once I have this, I will just open it. I will open it in write mode, and inside I will write data["text"], because that's where we have the text of the transcript, if you remember here. This was the response we got, and "text" includes the transcription. And I can just tell the user that the transcription is saved: "Transcription saved." We're happy. Of course, there is a possibility that our transcription errored out, so we want to cover that too.
If you remember, we return data and error. What we can do is say: if data is returned, this happens, but if it errored out, I will just print "Error", so that we know it didn't work out, and the error itself, so that we see, you know, what went wrong. Okay, let's do a little cleanup again. I want to wrap this all up in a function; we can call it save transcript. Data and error will be returned from get transcription result URL. It needs the audio URL, so I will just need to pass the audio URL here. And with that we're actually more or less ready. So let's run this and see if we get what we need, the transcript saved in a file. For that, after calling the upload function... I can move this one here, calling the upload function here. I call the upload function and then I call the save transcript function. And let's quickly follow that: I call the save transcript function. It calls get transcription result URL. Get transcription result URL calls transcribe. Transcribe is here; it starts a transcription process. Then get transcription result URL also calls polling, so it keeps polling AssemblyAI, and when it's done it returns something, and then we deal with it in the save transcript function: we either save a transcript or, if there is an error, we display the error. So let's run this and see if we get any errors. "Transcription saved." All right, let's see: output.wav.txt. If I open it up, it looks quite small. Maybe if I open it like this. Yes. "Hi, I'm Patrick. This is a test 1, 2, 3" is the result that we're getting. So that's awesome. We actually achieved what we wanted to do. So in the next couple of minutes, I actually want to clean up the code once again, because you're going to build a couple more projects and we want to have a Python file that has some reusable code, so we don't have to reinvent the wheel all the time.
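A minimal sketch of the saving step follows. It differs from the video in two labeled ways: the `(data, error)` pair is passed in directly instead of being fetched inside the function, and the error is checked first, because the polling loop described earlier returns the data even when the job failed.

```python
def save_transcript(data, error, filename):
    """Write data["text"] to <filename>.txt, or report the error.

    Returns the name of the text file, or None if nothing was saved.
    """
    if error:
        print("Error:", error)
        return None
    # The extension is appended as-is, so output.wav becomes output.wav.txt.
    text_filename = filename + ".txt"
    with open(text_filename, "w", encoding="utf-8") as text_file:
        text_file.write(data["text"])
    print("Transcription saved")
    return text_filename
```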
So let me first go here. Actually, when we're doing the polling, if we just have a while True loop, it's going to keep asking AssemblyAI for results and, you know, that might be unnecessary. So what we can do is include some waiting time in between, so if it's not completed yet, it can wait, let's say, 30 seconds to ask again. So we can inform the user, "waiting 30 seconds". What I need is the time module, so let's call time.sleep with 30, and I will just import time here, and this way it should be waiting 30 seconds in between asking AssemblyAI if the transcript is ready or not. And okay, let's create that extra file; API communication, I'll call it. Yes, so I will move all of the functions that communicate with the API there. So I need to move the upload function, I need to move transcribe, poll, all of these actually. So just remove that, yeah. Let's see, did we miss anything? No. I'll just remove these from here. File name can stay here, of course. Headers and the upload and transcript endpoints need to live here, because they are needed by the functions. In here we have to import the requests library, so we don't need it anymore here. We need to import the AssemblyAI API key. sys needs to stay here, time needs to go there. And we also need to import from API communication; we'll just say import all. And that way we can use these functions in our main Python script. I will run this again to make sure that it is still working. So I will delete the text file that was created. I will keep the output. Nice, so we also get the prompt that the program is waiting 30 seconds before asking again. Oh yeah, we passed the file name, but of course it might not exist there, so let's go and fix that. The file name is here, we only pass it to the upload function, and the upload function is here now. And in save transcript we do not pass it, but we are actually using it, so what we can do is to just also pass the file name here, and that should be fine.
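One way to picture the waiting step is as a small wrapper around a single poll, sketched below. The helper name `poll_with_wait` and its parameters are hypothetical and not from the video, where the pause is written directly inside the while loop; the `sleep` parameter defaults to `time.sleep` and exists only so the pause can be swapped out when trying the code.

```python
import time


def poll_with_wait(poll, transcript_id, wait_seconds=30, sleep=time.sleep):
    """Poll once; if the job is still running, tell the user and pause.

    `poll(transcript_id)` is assumed to return the job's JSON as a dict
    with a "status" key.
    """
    data = poll(transcript_id)
    if data["status"] not in ("completed", "error"):
        print(f"Waiting {wait_seconds} seconds")
        sleep(wait_seconds)
    return data
```

Calling this in place of the plain poll inside the while loop gives the 30-second gap between requests; a finished or failed job is returned without any delay.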
It should fix the problem. "Transcription saved." All right, let's see: output.wav.txt. Like this: "Hi, I'm Patrick. This is a test 1, 2, 3." So this is a very short audio file, and we've actually been using it over and over again. So I also want to show you that this code is working using another audio file. This is the audio of one of the latest short videos that I made for our YouTube channel. I was just talking about what natural language processing is. So this time, maybe if I add underscores, it will be easier to call. Yes, I'll just copy its name, and when I'm calling the script I will use its name. This will probably take a little bit longer, because the audio file we've been using is only a couple of seconds and this one is one minute. So we will see what the results are going to show us. Right, here we go. The transcription is saved. We find it here. Right, this is exactly what I was talking about. Let's listen to it while the transcription is open. "Can Alexa be your best buddy? Well, not now, but probably very soon. We have been seeing gigantic leaps over the last couple of years in terms of how computers can understand and use natural language." All right, you get the idea. So our code works. This is amazing. I hope you've been able to follow along. If you want to have the code, don't forget that you can get it from the GitHub repository, using the link in the description.
