Master Postman and AssemblyAI for Audio Transcription
Learn how to use Postman with AssemblyAI's API for transcribing audio, leveraging conversational intelligence models, and working with large language models effortlessly.
How to use Postman to test LLMs with audio data (Transcribe and Understand)
Added on 01/29/2025

Speaker 1: Hello. Today, let's learn how to use Postman with AssemblyAI's API. I will show you how to transcribe an audio file, but also how to use some of the conversational intelligence models AssemblyAI offers, like speaker labels. And then we're going to see how we can use LLMs directly through AssemblyAI's API to get answers to questions about your audio files, create an action item list, or even generate a summary, right in Postman, without having to code at all. I already have an example here: I had recordings of two meetings, and I asked what the main issue the data team is facing is and what the proposed solution is, and I got the answer directly from LeMUR, which is a framework for LLMs by AssemblyAI. It says the main issue the data team is facing is lag in getting updated metrics from the database, and then it mentions a proposed solution. So let's start and see how we can use Postman to achieve this.

So this is the Postman website. There is also a desktop app, I believe, that you can download and use, but it's much easier to use it in the browser, so I'm going to show it to you here. Let's start by creating a new workspace. That will be a blank workspace, and I will call it Assembly AI Test. Postman is actually a great tool when you want to start using an API you don't know yet: you don't know how it works, you don't know what the response looks like, and you want to test it out before you integrate it into your codebase. So I would 100% recommend that you use it beyond this tutorial, too, if you are working with APIs in general.

So let's start with transcribing a file with AssemblyAI. I'll start a new HTTP request. That's going to be a POST request, and the first endpoint we're going to want to use is the transcript endpoint. I'll just copy it from AssemblyAI's documentation. This is the full API reference of all the parameters we have, every little customization and everything that you can do. So if you want to learn beyond what I'm showing you in this tutorial, or if there's something I miss or don't fully explain, you can always go and consult the documentation. It's complete and very comprehensive.

All right, so that is the endpoint I want to use, the transcript endpoint. Next, I need to set up my API key. You can find the API key on your AssemblyAI dashboard. If you don't have an account yet, just go create one at assemblyai.com. It will take you two seconds, and then you can copy your API key here. You get the API key for free, and you also get 100 hours of transcription for free when you first create your account, so you don't have to add your credit card information or upgrade or anything. This you can use for free through AssemblyAI.

And then we need to set our content type. That's going to be application/json, so we're just letting the API know that we're passing it JSON data. The next thing we need to do is tell the API where the audio lives. There are two ways you can pass audio or video files to AssemblyAI. We're going to do the first one, where you just pass it a publicly accessible URL pointing to an audio or video file; a video file in our case this time. The second way is to upload your audio file from your local system, or wherever it is, to AssemblyAI, and I will show you how to do that too in a second. So right now I have a recording of a team meeting; I think this is from GitLab.
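As a quick reference, here is a minimal sketch of the start-transcription request as it is built in Postman, assuming AssemblyAI's standard v2 transcript endpoint; the API key and audio URL are placeholders:

    POST https://api.assemblyai.com/v2/transcript

    Headers:
      authorization: <YOUR_API_KEY>
      content-type: application/json

    Body (raw JSON):
    {
      "audio_url": "https://example.com/meeting-recording.mp4"
    }

Sending this does not return the transcript itself; it creates a transcription job whose status you poll, as shown next.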
We can watch it for a second; it's basically a group of people discussing some things in their team. I already have it uploaded, so I'm just going to copy the URL in here, and I think that's all I have to do. Then we can send this request to AssemblyAI.

All right, what we get immediately is a summary of our request. We get an ID for the transcription job that was created, the model that is being used, and the language this is going to be transcribed in. You can set the language yourself: if you have a Spanish audio file, for example, you can set it to Spanish, and it will be transcribed in Spanish. We have the audio URL listed again, and the status right now is queued. There are a couple of different statuses this can return: queued, processing, and completed. Once it's completed, you also get the transcript. Or it can error out for a couple of reasons, in which case the status will say error. The other fields are all things we are not using right now: all the different ways you can use a conversational intelligence model with AssemblyAI, or customizations you can do. Let's not go into the details of that, to not make it confusing. I will just save this, call the request "start transcription", and save it here.

All right, I will just duplicate this to make it easier, so I still have my headers. Now that the transcription has started, the next thing I want to do is get the transcription. For that we need to send a GET request to the same endpoint, with the same headers. We're not going to use a body this time, so I can delete it, but it's okay even if you don't. Then I just need to copy the transcript ID that was returned to me and paste it at the end of the transcript endpoint. When you do this, we're sending a request to AssemblyAI asking whether this transcript is finished yet. If it is already completed, it returns the text to us. And it is completed: as you can see, the status is completed, and we get the text.

Like I said, there's another way to get the transcription (let me save this one first), and that is to upload an audio file to AssemblyAI. I will again duplicate the initial request I sent and move it here. For this one we're going to need the headers again, of course, but the content type, as you can see here in the documentation (maybe I can make it bigger), needs to be different: application/octet-stream. So let's change that. The endpoint will be the upload endpoint, so we can copy it from here; instead of transcript, you're basically sending it to the upload endpoint. The headers are otherwise the same. For the body, we're not going to use raw JSON; instead, we're going to say binary. I had to change browsers because my other browser, for some reason, was not working with the file upload, but anyways, now we've uploaded our file: a GitLab meeting about the key metrics of their engineering team, or something like that. So I'm going to use the upload endpoint to upload the file to AssemblyAI, and as a response, what I get is the URL where this file is uploaded. Then you can basically do the same thing we did before.
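To make these two steps concrete, here are minimal sketches of the polling request and the upload request, assuming the standard v2 endpoints; the transcript ID and file are placeholders:

    GET https://api.assemblyai.com/v2/transcript/<TRANSCRIPT_ID>

    Headers:
      authorization: <YOUR_API_KEY>

    POST https://api.assemblyai.com/v2/upload

    Headers:
      authorization: <YOUR_API_KEY>
      content-type: application/octet-stream

    Body: binary (select the local audio or video file)

The upload response contains an upload_url field pointing to the stored file; that URL is what you pass as audio_url when starting the transcription in the next step.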
To start the transcription, we just need to call the transcript endpoint, attaching the audio URL. Once you send this transcription request, a transcription is started. Then I can again go to the next step with just the headers, because it's a GET request, paste my transcription ID at the end of the endpoint URL, and send the GET request. As you can see, now the status is processing. We'll just need to wait a couple more seconds and then send this request again, and then we will get the transcription. Now it is completed; as you can see, we get the text. This is the January 2021 engineering key review and the discussion in that meeting.

I want to show you one more thing before we get into the large language models, and that is how to use the conversational intelligence models from AssemblyAI. Let me go to the documentation. Here in the audio intelligence section is everything that is available through AssemblyAI. You can get a summary of the audio file, content moderation, sentiment analysis, entity detection, topic detection, and many more things. As part of the speech-to-text section, you can also get speech recognition, which is the transcript, or speaker diarization. And streaming is real-time transcription, or real-time speech-to-text, basically.

So let's try and see how we can do speaker diarization. I will go back to the API reference, and we can search for speaker labels. To get labels of speakers, we just need to set speaker_labels to true. So let's go back to Postman. In the body, in addition to the audio URL, I will set speaker_labels to true. This is the file that we uploaded, so let's start this transcription again. This is the transcription ID we're getting, and I will paste it here. All right, now it's completed. Let me show you what the speaker labels results look like. We have the audio URL listed here, and then the text; this is just the plain transcription. If you want to see the speaker labels, you can go to the utterances section of the response. Here, for every sentence or group of sentences, you get the speaker that uttered those words. Additionally, you also get each word with its start and end timestamps. Then for the next group of sentences, you see that speaker B was speaking, then speaker A is speaking again, and this is what they said. So this is super useful. We're not going to use it immediately right now, but like I said, if you want to use any of the conversational intelligence features or models of AssemblyAI, you just need to set them to true while you're starting your transcription.

All right, so let's see how we can use LeMUR to get answers to questions, get summaries, or do basically any other task on these audio files using an LLM. For that, let's go to the API reference again, and I will go to LeMUR. Maybe let's start with action items, since these are meetings. So let's start a new HTTP request. It's going to be a POST request, and this is going to be our endpoint; I'm just clicking it to copy it. I will save it already and call it "create action items". We need to set the authorization again in the headers. Let's see what else we need to do: the content type is going to be application/json again. All right.
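Here is a sketch of the diarization request and the rough shape of the completed response; the upload URL is a placeholder, and the utterances are abridged with example timestamps (in milliseconds):

    POST https://api.assemblyai.com/v2/transcript

    Body (raw JSON):
    {
      "audio_url": "<UPLOAD_URL_FROM_PREVIOUS_STEP>",
      "speaker_labels": true
    }

    Completed response (abridged):
    {
      "status": "completed",
      "text": "...",
      "utterances": [
        { "speaker": "A", "text": "...", "start": 1280, "end": 7460, "words": [ ... ] },
        { "speaker": "B", "text": "...", "start": 7900, "end": 12340, "words": [ ... ] }
      ]
    }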
And next, all of this needs to go in the body, so I'm just going to copy it from the documentation so that we can edit it here. The body is going to be raw JSON. You can get the answer in basically any format that you want; here it asks for bullet points, so we can keep that for now. The context will be that these are some meetings where the GitLab engineering team is discussing their key metrics, as far as I understand. This is an example transcript ID, so I'll remove it; we need to paste our own transcript IDs here. As you can see, you can use more than one transcript ID. In this case, we have two meetings, so I'm going to pass both of their transcripts to LeMUR. So basically now I have recordings of two meetings from the GitLab engineering team, and I'm passing them to LeMUR. This is an extra thing I accidentally copied, so I'll remove it, and one more curly bracket to close it. All right, I think that's all I need to pass to LeMUR, and it's going to give me some action items that came out of these meetings.

All right, here are the potential action items based on the transcript: update the NPS target metric to specify it is out of 100 points and has no units, add certification logos to marketing materials, and so on and so forth. If you integrate this into your codebase, you can of course format this response to look much nicer, but as you can see, we got a list of action items out of these two meetings. See the request sketches after this section for what the bodies look like.

The next thing I want to show you, after I save and duplicate this request so it's faster to move forward, is asking questions. Let's see how we can do that. We have the headers again, and the body is going to look a little bit different. One thing to point out before we go forward: as you can see here, there are a bunch of things you can do with LeMUR. We used the action items endpoint, and I'm going to show you how to get answers with the question answering endpoint, but there's also a generic endpoint where you can write your own task, whatever that task may be. We can also take a look at that one in a second, but let's see how to use the question answering endpoint first.

So I will change the endpoint; it's still a POST request. Then I will copy the data that needs to be passed again. You can pass more than one question; to make it easier, for now I'll just pass one. You can, again, give LeMUR a format in which you want your response, and you can even give options of what the answer could look like. I don't need that one for now, so I'll delete it. The answer format I might fill in in a second, and the question, too. The context is going to be the same, so I'll just copy it from before. Now let's take a look at some of the other parameters. The final model is the model that you ask AssemblyAI to use; if you go to the API reference, you're going to see all the options you have for the final model. We are also using max output size, which is the maximum number of tokens that can come back as a response to this request. And temperature, let's take a look here, is basically how creative you want the response to be: if it's zero, it's going to be very conservative, and the higher you make it, the more creative the responses to your questions might be.
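For reference, here are minimal sketches of the two LeMUR requests used here, assuming the v3 generate endpoints from the API reference; the transcript IDs and model name are placeholders (pick a model from the final model options in the documentation), and the parameter values are examples:

    POST https://api.assemblyai.com/lemur/v3/generate/action-items

    Body (raw JSON):
    {
      "transcript_ids": ["<FIRST_TRANSCRIPT_ID>", "<SECOND_TRANSCRIPT_ID>"],
      "answer_format": "Bullet Points",
      "context": "Meetings where the GitLab engineering team discusses their key metrics"
    }

    POST https://api.assemblyai.com/lemur/v3/generate/question-answer

    Body (raw JSON):
    {
      "transcript_ids": ["<FIRST_TRANSCRIPT_ID>", "<SECOND_TRANSCRIPT_ID>"],
      "context": "Meetings where the GitLab engineering team discusses their key metrics",
      "final_model": "<MODEL_NAME>",
      "max_output_size": 3000,
      "temperature": 0,
      "questions": [
        {
          "question": "What are the common topics discussed in both of these meetings?",
          "answer_format": "A list of topics with a short explanation of what they are"
        }
      ]
    }

Both requests use the same authorization header and content-type: application/json as before; the question shown is the first one asked in the next step.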
For the transcript IDs, let me copy them again from here. As the first question, let's ask: what are the common topics discussed in both of these meetings? And for the answer format, I will say: give a list of topics, maybe with a short explanation of what they are. All right, let's send this request. One caveat of using Postman here: if the request takes more than 30 seconds, I think it's going to error out in the browser. So you might need to download the desktop agent, not the app, but an agent that runs in the background on your laptop. It makes it possible to still use the browser even for requests that take more than 30 seconds, and it will help you get responses without erroring out. The icon looks something like this.

The common topics discussed in both meetings include infrastructure and security metrics like the production risk index, bugs and remediation metrics, and employee satisfaction metrics like NPS. Both meetings involve reviewing key metrics and discussing next steps. That sounds about right. Another thing I can ask: who is speaking in each of these meetings? Let's see if we can get this. I don't really need an answer format for that; I'll just ask it to give me a list of names. So let's send this. All right, we got the response: in the first meeting, the main speakers are Craig, Jonathan, Sid, Eric, and Steve; in the second meeting, the main speakers are Eric, Johnson, Lily, Sid, and Matt. That's pretty good if you need to do any analysis of your meetings.

Another thing we can ask, for example, is: what is the main issue faced by the database team, and what is the proposed solution? All right, let's see. The database team is facing issues with replication lag on secondary database hosts due to conflict and query traffic. So this is a pretty good, detailed answer about what the database team is facing. Let's say you didn't join these meetings, but you still wanted to know what's up with the database team, because you know they're having some issues and you know it was discussed in this video. You can quickly ask this question, get a response, and be informed even if you haven't joined the meeting.

But like I said, there are many other ways you can use AssemblyAI, whether it's LeMUR, transcripts, or any other conversational intelligence model available through AssemblyAI. You can put it in Postman and try it out before you completely integrate it into your application or your workflow. I hope this tutorial was helpful and that you learned something. If you have any questions, please leave a comment below and we will do our best to get back to you. Thanks for watching and I will see you in the next video.
