Boost Transcription Accuracy with Custom Language Models in Amazon Transcribe
Learn how to enhance transcription accuracy using custom language models in Amazon Transcribe. Discover the steps to create and apply CLMs for domain-specific needs.
Using Custom Language Models (CLM) to supercharge transcription accuracy | Amazon Web Services
Added on 09/28/2024

Speaker 1: Greetings, I'm Vishesh Jha, and I'm a Senior Solutions Architect at AWS. Today, I'm here to tell you how to supercharge transcription accuracy using custom language models in Amazon Transcribe. Amazon Transcribe is a fully managed automatic speech recognition service that makes it easy to add speech-to-text capabilities to voice-enabled applications. As our service grows, so does the diversity of our customer base, which now spans domains such as insurance, finance, law, real estate, media, hospitality, and more. Naturally, customers in different market segments have asked Amazon Transcribe for more customization options to further enhance transcription performance. Amazon Transcribe can help you boost the transcription accuracy of your content with custom vocabulary, or supercharge it with custom language models. Custom vocabulary, which has been covered in another video snack, allows you to quickly teach Transcribe to recognize proper nouns or domain-specific words and phrases that it isn't recognizing. Custom language models, or CLMs for short, allow you to leverage domain-specific data that you already have to train custom speech models tailored for your specific transcription use case.

A CLM is built using text data; there's no requirement for acoustic data to build CLMs. Most businesses actually have existing text data that can be used to train CLMs. Some examples of this training data include instruction manuals, website content, brochures, textbooks, marketing materials, or even existing reference transcripts of audio files. What matters is that the text represents the language and vocabulary that's spoken in your business domain.

Let's now look at how we can create a custom language model and analyze its transcription results. For the purposes of this demo, we'll be transcribing biology lectures. As we know, biology contains a lot of terms that have very specific pronunciations. We'll be gathering training data from Wikipedia articles on some key terms in the biology domain. Using a blog authored by a friend of mine, I use the sample code to fetch the Wikipedia articles and save them in the desired format for training the custom language model. The link to the blog post and code sample can be found in the video description area.

Let's jump into the console now. We ran the script as per the instructions in the blog. This script pulls various biology-related articles from Wikipedia and stores them as text files, as required to create CLMs. I've uploaded all those text files to an S3 bucket in my AWS account, as you see here. Now, let's go to the Amazon Transcribe console to create our CLM. Let's navigate to Custom language model from the hamburger menu and then hit Train model. I'll give this model a name and leave the language set to US English. For the base model, I'll select wide band. The training data location is the S3 location where we uploaded the text files, so I'll just paste in the location prefix. We are not using any tuning data, so we'll keep this field blank. Here, I'll select Create an IAM role, limit the access permissions to the training and tuning S3 bucket, and give this role a name. After that, let's hit Train model. The model creation process can take a few hours to complete. I do have another model that I created earlier using the same data and following the same steps, so we can use that for our testing. So for our testing, let's go to real-time transcription. We will test by speaking some lines with biology-related jargon or terms.
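Editor's note: the console steps above can also be scripted. The sketch below is a minimal boto3 example of the same model-training step, assuming hypothetical names for the model, bucket, prefix, and IAM role ARN (none of these come from the video); it is an illustration, not the exact setup used in the demo.

```python
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Train a custom language model from the text files uploaded to S3.
# Model name, bucket, prefix, and role ARN are placeholders.
transcribe.create_language_model(
    ModelName="biology-lectures-clm",
    LanguageCode="en-US",          # US English, as selected in the console demo
    BaseModelName="WideBand",      # wide band base model, as in the demo
    InputDataConfig={
        "S3Uri": "s3://my-clm-training-bucket/biology-articles/",
        # "TuningDataS3Uri": "s3://my-clm-training-bucket/tuning/",  # optional tuning data
        "DataAccessRoleArn": "arn:aws:iam::123456789012:role/TranscribeCLMAccessRole",
    },
)

# Training can take a few hours; check the status until it completes.
response = transcribe.get_language_model(ModelName="biology-lectures-clm")
print(response["LanguageModel"]["ModelStatus"])  # IN_PROGRESS, COMPLETED, or FAILED
```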
First, we'll use standard Transcribe and then compare the transcription results with the CLM we created. So let's start streaming. Another unique feature in some cells is flagella. Some bacteria have flagella. A flagellum is like a little tail that can help a cell move or propel itself. As you see here, standard Transcribe wasn't able to recognize flagella, which is a technical biology term. So now let's test this out with CLM. For that, let's expand Customizations and enable custom language model. For the model selection, let's select the model that we created from the drop-down menu. Now let's start streaming. Another unique feature in some cells is flagella. Some bacteria have flagella. A flagellum is like a little tail that can help a cell move or propel itself. As you see here, with CLM, Transcribe was able to recognize flagella. This is how we can use CLM in real-time transcriptions using the Amazon Transcribe console.

The data you provide to train your custom language model can be of two types, namely training data and tuning data. It's important to distinguish between these two. Broadly, training data is domain-specific. In the examples we previously mentioned, that would include text data from sources like sales and marketing collateral, website content, or even textbooks. Tuning data is use case-specific. In the examples we previously mentioned, that would include text from human-annotated audio transcripts of actual phone calls or media content. Training data is required in order to generate any CLM, while tuning data is optional but recommended.

Earlier in this presentation, we touched upon custom vocabulary. So when should you choose custom vocabulary versus CLM? Custom vocabularies are great, but there's an optimal number of terms that can be added before impacting performance. CLMs require more training data than custom vocabularies, but they can generate significant accuracy improvements over custom vocabularies. Along with that, CLMs can recognize individual terms in context and automatically add words to their recognition vocabularies; custom vocabularies can't do that. Finally, there's no additional charge for custom vocabulary. For CLM, there's no additional charge for building the model, but you do incur an additional charge for the transcription jobs or streaming sessions where a custom language model is applied. To achieve the highest transcription accuracy, use custom vocabularies in conjunction with your custom language model. This wraps up our demo for today. For more information, please refer to some of the links mentioned in the description below. Thank you.
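Editor's note: as a companion to the console walkthrough, here is a hedged boto3 sketch of applying a custom language model together with a custom vocabulary to a batch transcription job. The job name, media URI, output bucket, and vocabulary name are hypothetical placeholders, not values from the video.

```python
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Apply the CLM (and, optionally, a custom vocabulary) to a batch transcription job.
# Job name, media URI, vocabulary name, and output bucket are placeholders.
transcribe.start_transcription_job(
    TranscriptionJobName="biology-lecture-01",
    LanguageCode="en-US",
    Media={"MediaFileUri": "s3://my-media-bucket/lectures/lecture-01.mp3"},
    ModelSettings={"LanguageModelName": "biology-lectures-clm"},  # the CLM trained earlier
    Settings={"VocabularyName": "biology-terms"},                 # optional custom vocabulary
    OutputBucketName="my-transcripts-bucket",
)
```

For streaming use cases like the real-time demo above, the StartStreamTranscription API accepts the analogous LanguageModelName and VocabularyName parameters.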
