How to Convert Speech to Text Using Amazon Transcribe: A Step-by-Step Guide

Learn to convert speech to text with Amazon Transcribe. Ideal for call centers, classrooms, and more. Follow this guide for real-time and file-based transcriptions.

Convert Speech to Text with Amazon Transcribe Step by Step Tutorial

Added on 09/05/2024

Speaker 1: Hello everyone. In this video, I'm going to show you how to convert speech to text using Amazon Transcribe. This is great for a whole bunch of different use cases. Some that initially came to mind: if you're in a call center and you want to transcribe what your agents are saying, or maybe you're in a school class and you want to convert what your professor is saying into notes so you can review them later. There's a whole bunch of reasons to use speech-to-text converters.
So in order to do this, we're going to be using a service called Amazon Transcribe. If it's not under your recently visited services, you can just search for it by typing in "transcribe" here and clicking on Amazon Transcribe. And I want to show you, over on the left-hand side here, that there are two ways of doing this. You can do a real-time transcription: I can just click this Start streaming button here and you should see that it's going to convert what I am saying into words. There you go, and let's just click on Stop streaming now, so I don't confuse myself. This is great if you're sitting in front of your computer with your microphone available and you just want to transcribe what you're saying. It works perfectly, and then you can download the full transcript here. There's a whole bunch of additional settings that you can take a look at as well.
But what I wanted to show you was a use case where you have a file, and that could be an audio-only file or it could be a video file. Maybe you downloaded it from, I don't know, YouTube or something. So this will be a way for you to provide files as the input and convert them into a text transcription. In order to do that, we want to go to the left-hand navigation menu here and click on Transcription jobs. And yes, we want to leave this page. That's totally fine.
So this is a job that I ran previously, but to create a new job you want to go to the orange button here in the top right and just click on Create job. It's going to ask you for a name. This can be anything you want. Let's just call this demo here. And there's a whole bunch of settings that are available. You can choose a specific language if you know the language that is spoken in the audio file, or you can use automatic language identification, or automatic multiple language identification. It works better if you provide the specific language that's being spoken, but you can experiment with this to try it out.
Some additional settings are available if we take a look here. You can use a general model, or a custom model if you want to train your own for a different language use case. And then if you go into the additional settings, you can add this to a job queue. So you can submit many, many jobs all at once, up to 100 by default, and it'll basically complete them whenever it gets resources to do so. We're not going to do that in this case.
What we need to do now is provide it with some input data. And what you can see here is that it's asking for an input file location in S3. So what we need to do is go into S3, create a bucket really quick, upload our file, and then select it here. So I'm going to do that really quick. We're going to have a second tab open here. I'm just going to go into S3 and I'm going to create a really quick bucket and upload a video file. So we're going to go to Create bucket here and let's just call this PBDV-demo-bucket. Hopefully that name isn't taken. Let's just scroll down and create that bucket. Let's just filter now and that should be the one. We're going to go to Upload here and I'm just going to add a file now. All right, so I selected the file there and now you can see it's edited.mp4. Now, this is actually a video file.
It's one of my previous videos that I made for the channel on AWS dark mode. But that doesn't really matter because Amazon Transcribe can take either video or audio files. So that's good. We're going to go ahead and click on Upload now, and because this is a rather large file, this may take a few moments. So I'll just fast-forward this for you really quick.
All right, so the file successfully uploaded. Let's just go and click on this file now because we do need the path to this file to provide it as input to Amazon Transcribe. So what we need is this text over here, the S3 URI. I'm just going to copy this to my clipboard, go back to the Transcribe tab, and paste it in there, and that should be good to go.
Now you have an option for output data: you can have it go to an Amazon Transcribe managed S3 bucket, or you can have it go to your own bucket. Using a customer-specified one is great if you want to store the file for later on. A service-managed one is more for short-term, temporary use, where you may just upload a file, copy the results out of the console, and then move on your way. So I'm just going to leave it as the service-managed default option for now. We're going to scroll down and click on Next in the bottom right here.
Now there are some additional settings that you can provide. So there's audio identification. You can click on this button to get more information, but if you have multiple different speakers in your audio file, this will account for that and show, like, speaker one says this, speaker two says that. And if you have different channels for different speakers, say in the case of a call center where you're recording on different channels of your audio file, you may want to use this as well. You can also use alternative results.
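The upload step and the S3 URI that gets pasted into the job form can also be handled from code rather than the console. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are already configured; the function names `s3_uri` and `upload_media` and the bucket name in the usage note are illustrative and do not come from the video.

```python
from pathlib import Path


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3://bucket/key string that the S3 console shows as the S3 URI."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_media(local_path: str, bucket: str) -> str:
    """Upload a local audio or video file and return its S3 URI.

    Assumption: the bucket already exists and the caller has write access.
    """
    import boto3  # imported lazily so s3_uri() can be used without the SDK

    key = Path(local_path).name  # store the object under its file name
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)
```

Calling `upload_media("edited.mp4", "example-demo-bucket")` would return `s3://example-demo-bucket/edited.mp4`, which is the value the Transcribe job expects as its input location. The bucket name here is a placeholder; S3 bucket names must be globally unique and lowercase.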
The reason alternative results are an option is that Transcribe comes up with different transcriptions at different confidence levels. By default, it gives you the most confident one, but you can get all of the different ones. If you want, you can say, give me five of the different options that you produce, but we're not going to use that here.
And then in terms of content censoring or content removal, you can automatically redact personally identifiable information, which is what PII means, from your transcripts. If you click on this, it gives you a whole bunch of different options. You can strip out the person's bank account number, credit card details, expiry, PIN codes, phone numbers, address, social security number, name, email. All this kind of information you can strip out from the transcript. This is great from a security perspective. And in addition, if you really want to, you can include the unredacted version as well, so you'd get two versions in this case. I don't have any PII in my use case, but this is great if maybe it's a medical transcript, or if you're providing bank account numbers because you're communicating with a bank or something like that. This may be appropriate, but not for our use case.
Now, there's also this notion of vocabulary filtering. If you want to remove, mask, or tag specified words in the final transcript, you can do that as well. And custom vocabulary can be useful as well. If you use things like acronyms, Transcribe doesn't do very well with those. You can provide your own acronyms and your own vocabulary, and this will improve the accuracy of recognizing those particular words in your transcription. You'll see, because I'm not going to be doing this in this demo, it's not going to recognize the word IDE, which stands for integrated development environment. It's going to get a little bit confused. We'll see that when we create the job and read the transcript.
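The console settings walked through so far (job name, language, speaker identification, PII redaction, custom vocabulary) map onto parameters of the `start_transcription_job` call in boto3. The sketch below builds that request as a plain dictionary so the mapping is easy to inspect. It assumes boto3 and configured credentials for the submit step; the helper names `build_job_request` and `start_job` and their defaults are illustrative, not something shown in the video.

```python
from typing import Any, Dict, Optional


def build_job_request(
    job_name: str,
    media_uri: str,
    language_code: Optional[str] = "en-US",
    max_speakers: Optional[int] = None,
    redact_pii: bool = False,
    include_unredacted: bool = False,
    vocabulary_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Translate the console choices into start_transcription_job arguments."""
    request: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language_code:
        request["LanguageCode"] = language_code  # the "specific language" option
    else:
        request["IdentifyLanguage"] = True  # automatic language identification

    settings: Dict[str, Any] = {}
    if max_speakers:
        # Speaker identification needs both the flag and a speaker count.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name:
        # Assumption: a custom vocabulary with this name already exists
        # and matches the chosen language code.
        settings["VocabularyName"] = vocabulary_name
    if settings:
        request["Settings"] = settings

    if redact_pii:
        # Redaction is only offered for some languages; check the docs.
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": (
                "redacted_and_unredacted" if include_unredacted else "redacted"
            ),
        }
    return request


def start_job(request: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the request; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    return boto3.client("transcribe").start_transcription_job(**request)
```

For the demo in this video, something like `start_job(build_job_request("demo", "s3://example-demo-bucket/edited.mp4"))` would be the rough equivalent of leaving every optional setting off. Leaving out an output bucket corresponds to the service-managed output option.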
So let's go ahead now and click on Create job. And this is going to queue the job. As you can see here, it's currently in progress. You may need to wait a couple of minutes for this to finish; it definitely does depend on the size of your input file. If it's a really, really large file, it can take longer. If it's really short, it should only take a few moments. I'll let you know how long it takes and fast forward it for you really quick.
All right. So after about a minute or so, we can see that this is complete. Now, before we take a look at the results, I just wanted to show you really quick the file that we provided to Amazon Transcribe. So I'll just play a couple of seconds of it for you now. "A lot of developers prefer using dark mode in their IDEs. But what about the AWS console? Well, today I have some good news for you." All right. I'm not going to plague your ears here and make you listen to that whole thing. Let's move on now and show you what that transcription looks like.
All right. So if we go to demo and we look at the results here, this just tells us a little bit about the job. Nothing really special here. And if you scroll down, you can see the transcription. As you can tell, this is pretty accurate: "A lot of developers prefer using dark mode in there." You can see here I said IDEs in the video, but it's not picking that up well. A way that you can correct this is by providing a custom vocabulary so that you can tell it that IDE means something, and it'll hopefully pick up that acronym a lot more easily. And from here, you can just select everything if you really want. You can also download this as a JSON file so you can get all those results that way. But this is an easy way to transcribe any video or audio file into text, provided you use a supported format. And it's very quick and easy to do. So I hope this video was helpful.
Thanks so much for watching. I'll see you next time.
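The last two steps of the demo, waiting for the job to finish and reading the downloaded JSON, can be sketched in code as well. This is a hedged example, not the video's own method: `wait_for_job` expects a boto3 Transcribe client (or any object with the same `get_transcription_job` method), and `read_transcript` assumes the standard batch output layout with `results.transcripts` and `results.items`. The function names and the 0.8 confidence threshold are arbitrary choices for illustration.

```python
import json
import time
from typing import Any, Callable, Dict, List, Tuple


def wait_for_job(
    client: Any,
    job_name: str,
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_transcription_job until the job is COMPLETED or FAILED."""
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        sleep(poll_seconds)  # still queued or in progress


def read_transcript(raw_json: str, min_confidence: float = 0.8) -> Tuple[str, List[str]]:
    """Return the full transcript text and the words scored below min_confidence."""
    results = json.loads(raw_json)["results"]
    text = results["transcripts"][0]["transcript"]
    uncertain: List[str] = []
    for item in results["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation items
        best = item["alternatives"][0]
        # Confidence values are stored as strings in the output file.
        if float(best["confidence"]) < min_confidence:
            uncertain.append(best["content"])
    return text, uncertain
```

Listing the low-confidence words is one way to spot terms such as IDE that may be worth adding to a custom vocabulary before re-running the job.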