Exploring Azure Cognitive Services: Speech to Text Implementation Guide

Learn how to implement Azure's Speech to Text service, from setup to code integration, enabling transcription of real-time and pre-recorded audio.

Getting Started with Azure Speech Services - Convert Speech to Text

Added on 09/08/2024

Speaker 1: Hello everyone. Today I'm talking about Speech Services, which is another offering from Azure Cognitive Services. It works across multiple speakers and lets us communicate in different languages. Speech Services has several offerings, and as you can see on my screen, they include speech to text, text to speech, voice identification, and real-time translation. All of these let us add natural interaction to our apps and let our users communicate in whatever way they find convenient. As a developer, you can go with the SDK or with the REST APIs. These APIs enable speech and translation capabilities in just a few lines of code, which makes it more economical to add them to existing as well as brand-new applications. In today's video I will talk about speech to text, and I will cover the rest of the offerings in subsequent videos.

Okay, so starting with speech to text. If you go the traditional way, it takes hours of time and may require specialized equipment for a human to transcribe speech into text, which can be very expensive. Hence the Azure offering, which is quite economical and works with real-time streamed audio as well as pre-recorded audio files. It's the same underlying technology that is used in Cortana, so it has been proven across a wide range of conditions, with many accents and in multiple languages. It's a well-tested service and we can safely use it. The service is primarily intended to be used with streamed data, and with the SDK that is very easy, because it gives direct access to audio streams, including device microphones and pre-recorded audio files.
Another advantage I can mention is that if you have already built custom language understanding models using LUIS, this service can easily be integrated with them to extract speaker intents and much more related information.

So let's take a slightly deeper dive and understand how we can make a call to speech to text. I will switch to my Azure portal. Here you can search for Speech Services, or you can go directly with Cognitive Services. It's one and the same thing; we can use an instance created with either of them. Click on Create. Here you can select the subscription, and then you can either go with an existing resource group or create a new one. For the region, pick the one closest to you, so I'm going with West US. For the instance name, we can provide any unique name. For the pricing tier, you can go for F0 or S0, depending on which one suits you. You can get the complete information by clicking on this hyperlink. If you are just experimenting with this service or trying it out, I would recommend the free one. Once that is done, click on Review + create, then click on Create. It will take just a few seconds to get this instance deployed. Perfect, it's done. Click on Go to resource, and on the left-hand side click on Keys and Endpoint. If you can see that these values have been generated, we are good to go.

The next thing we need to do is create an application so that we can try out how things work. I have already created an empty console application, which is .NET Core 3.1 based, and here we are going to write our few lines of code. The very first thing we need is to add all the required dependencies, so for that I will use the NuGet package manager.
Go to Browse, and here we can search for Microsoft.CognitiveServices.Speech. This is the first one, so I am going to install it. Okay, it is installed. Next, I will create a new function in which we will make the call to the SDK. Let's go ahead and create a static RecognizeSpeech. Okay, we need to add a few more references: System.Threading.

Now the very first thing we need to do is create an object that will consume the region and the key. For that I am going to create a configuration object, SpeechConfig. I need to add a reference for this one. SpeechConfig.FromSubscription. Here it asks for two parameters, the subscription key and the region. I will quickly jump over to my Azure portal and grab this key. We will paste it over here, and next is the region, which you can grab from here; it is West US for me.

Now we need to use this configuration: var recognizer = new SpeechRecognizer, and it takes the configuration as the parameter. Okay, now there are just two more lines to write. First I will write some text for the user, so we know that the system is ready to listen. RecognizeOnceAsync. Note that here we are using RecognizeOnceAsync, and the reason is that I am going to read just one utterance of whatever I am speaking. When I say one utterance, I mean either the first 15 seconds of audio or the silence that occurs after the speech, whichever happens first. That's why I am using RecognizeOnceAsync here. Okay, so we are done: if result.Reason equals RecognizedSpeech, then we can go ahead and print this. Yeah, I think we are done.
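The steps dictated above can be put together roughly as follows. This is a minimal sketch rather than the exact code typed in the video: the key and region constants are placeholders, the prompt text is an assumption, and the `Main` method is included only so the file compiles on its own. It assumes the `Microsoft.CognitiveServices.Speech` NuGet package is installed and a default microphone is available.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    // Placeholders (assumptions): use the values shown under "Keys and Endpoint".
    private const string SubscriptionKey = "YOUR_SUBSCRIPTION_KEY";
    private const string Region = "westus"; // region identifier for West US

    static async Task RecognizeSpeech()
    {
        // Configuration object that carries the subscription key and the region.
        var config = SpeechConfig.FromSubscription(SubscriptionKey, Region);

        // Given only a config, the recognizer listens on the default microphone.
        using (var recognizer = new SpeechRecognizer(config))
        {
            Console.WriteLine("Say something...");

            // Recognizes a single utterance, then returns.
            var result = await recognizer.RecognizeOnceAsync();

            if (result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"Recognized: {result.Text}");
            }
        }
    }

    static async Task Main()
    {
        await RecognizeSpeech();
    }
}
```

Hard-coding the key keeps the sketch close to the walkthrough; for anything beyond a quick experiment it is usually safer to read it from configuration instead.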
This is the bare minimum code we need. When you are writing for production, you may have to deal with a lot of error handling, try-catch blocks, and so on, but let's go with this, because my intention is just to give you an idea of how you can make a call to the Speech API. So let's quickly run it. Oh, we need to make a call to this method. So here I will await, and here I am going to call this. I will convert it to Task, and we are good to go.

I am sure it's going to work. See, you can see that I said "I am sure it's going to work", and it has recorded exactly what I said; there are no spelling mistakes, nothing. Let's try one more thing. This is going to be the best application ever. And you can see how perfectly it has converted speech to text. So this is how easy it is to deal with these APIs.

I hope you enjoyed this video. In my next video I will show how we can recognize more than one line and how we can convert a pre-recorded audio file to text. Till then, keep watching. Thank you.
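The speaker mentions that production code needs error handling that the demo leaves out. One way that might look is sketched below; it is an illustration under stated assumptions, not code from the video. The environment variable names `SPEECH_KEY` and `SPEECH_REGION` are hypothetical, and the messages are placeholders. It uses the SDK's `ResultReason` values and `CancellationDetails.FromResult` to tell a recognized utterance apart from silence or a failed request.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        // Hypothetical variable names; set them to your own key and region.
        string key = Environment.GetEnvironmentVariable("SPEECH_KEY");
        string region = Environment.GetEnvironmentVariable("SPEECH_REGION");
        if (string.IsNullOrEmpty(key) || string.IsNullOrEmpty(region))
        {
            Console.WriteLine("Set SPEECH_KEY and SPEECH_REGION first.");
            return;
        }

        try
        {
            var config = SpeechConfig.FromSubscription(key, region);
            using (var recognizer = new SpeechRecognizer(config))
            {
                Console.WriteLine("Say something...");
                var result = await recognizer.RecognizeOnceAsync();

                switch (result.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        Console.WriteLine($"Recognized: {result.Text}");
                        break;
                    case ResultReason.NoMatch:
                        // Audio was captured but no speech could be recognized.
                        Console.WriteLine("No speech could be recognized.");
                        break;
                    case ResultReason.Canceled:
                        // For example a wrong key or region, or a network problem.
                        var details = CancellationDetails.FromResult(result);
                        Console.WriteLine($"Canceled: {details.Reason}");
                        if (details.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"Error details: {details.ErrorDetails}");
                        }
                        break;
                }
            }
        }
        catch (Exception ex)
        {
            // Catch-all for unexpected failures, such as invalid configuration.
            Console.WriteLine($"Unexpected failure: {ex.Message}");
        }
    }
}
```

Reading the key from the environment keeps it out of source control, and checking `Canceled` explicitly surfaces authentication and connectivity problems that the bare-minimum version would silently ignore.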