Azure Speech Service Tutorial for Beginners
Learn how to set up Azure Speech Services, create a project in Visual Studio, and convert speech to text using Azure, both live and from WAV files.
Capture spoken words with Azure Speech in a few steps
Added on 01/29/2025
Speaker 1: The quick brown fox jumps over the lazy dog. This video was made with the help of AI. To start this tutorial, you'll need an Azure subscription. The first thing we'll do is create a new Speech service resource in Azure. You'll want to specify an existing resource group or create a new one. Choose a region near the other resources you may be using with your application, like Blob Storage, App Services, or Cosmos DB. Choose your pricing tier, either free or standard; you can have one free tier per region.

Next, create a new project in Visual Studio 2022. Once you have your project, in this case a new console app, add the NuGet package Microsoft.CognitiveServices.Speech. Once your NuGet packages are loaded, add using statements to your main class: Microsoft.CognitiveServices.Speech and Microsoft.CognitiveServices.Speech.Audio. The Audio namespace will be used for creating WAV file inputs.

Next, we'll add a new SpeechConfig object. The FromSubscription method takes a key and a region as inputs. You'll get these from your Azure Speech service resource, under Resource Management > Keys and Endpoint. In a production system, you'll want to store this key in a secure manner. The SpeechConfig object defines the client connection and the input and output settings for speech operations, including text-to-speech and speech-to-text.

Next, set the language for your speech input. The SpeechRecognitionLanguage setting takes a text value in the language-country locale format, such as en-US. There are currently over 140 language-plus-country variants available for Azure Speech services; the default is en-US. You should pick a language variant that matches your speaker. For example, given Spanish as the language for a South American speaker, you could choose es-CO for a Colombian Spanish speaker. You can see a list of all of the available language variants on the Microsoft site; see the link in the description below.

Now we'll create a SpeechRecognizer client using the SpeechConfig that we set up earlier. The recognizer connects to the Azure Speech service to handle the conversion and returns the text from the service along with the result of the operation. Remember to wrap your client so that it's disposed of properly.

The RecognizeOnceAsync method sends the audio stream to the Azure Speech service for conversion. The method will accept up to 30 seconds of audio, in this case from the default microphone, and the default silence timeout is 5 seconds. Once the operation completes, the outcome is returned. You can see the result of the conversion using the result object returned from RecognizeOnceAsync; it includes the converted text or any errors from the conversion process. Because a text conversion is not guaranteed, the code should handle the result. We'll check the result reason in a function: the OutputSpeechResult function takes the SpeechRecognitionResult and checks its Reason, with RecognizedSpeech for success, NoMatch for silence or nonverbal sounds, or Canceled for interruptions, and writes the result to the console.
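A minimal sketch of the live-microphone steps just described, assuming a .NET console app with the Microsoft.CognitiveServices.Speech NuGet package installed; the key, region, and language values are placeholders for your own resource settings:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        // Key and region come from your Speech resource under
        // Resource Management > Keys and Endpoint (placeholders here).
        var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        speechConfig.SpeechRecognitionLanguage = "en-US"; // e.g. "es-CO" for Colombian Spanish

        // The 'using' declaration ensures the recognizer is disposed of properly.
        using var recognizer = new SpeechRecognizer(speechConfig);

        // Accepts up to 30 seconds of audio from the default microphone;
        // recognition ends after the default 5-second silence timeout.
        var result = await recognizer.RecognizeOnceAsync();
        OutputSpeechResult(result);
    }

    static void OutputSpeechResult(SpeechRecognitionResult result)
    {
        switch (result.Reason)
        {
            case ResultReason.RecognizedSpeech: // success
                Console.WriteLine($"Recognized: {result.Text}");
                break;
            case ResultReason.NoMatch:          // silence or nonverbal sounds
                Console.WriteLine("No speech could be recognized.");
                break;
            case ResultReason.Canceled:         // interruptions or errors
                var details = CancellationDetails.FromResult(result);
                Console.WriteLine($"Canceled: {details.Reason}");
                break;
        }
    }
}
```

The using declarations handle disposal, matching the video's reminder to wrap the client properly.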
For the next conversion, we'll load a WAV file and output the resulting text. The audio should be spoken in the language that you've chosen in your config; non-Latin languages are supported. We'll use the AudioConfig class to target a WAV file input on the local machine. Remember to wrap your service so it's disposed of properly. Now we'll create another SpeechRecognizer, this time using the existing SpeechConfig and the new AudioConfig. The AudioConfig class defines both input and output channels for the speech services provided by the SDK. Call the RecognizeOnceAsync method to send the audio for conversion as before. Once the operation is complete, the outcome is returned, and the result is displayed using the OutputSpeechResult function as before.
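A companion sketch for the WAV file demo, reusing the speechConfig object and OutputSpeechResult helper from the sketch above; the file path is a hypothetical placeholder:

```csharp
using Microsoft.CognitiveServices.Speech.Audio;

// ...inside Main, after the live-microphone demo above:

// Target a WAV file on the local machine instead of the default microphone.
using var audioConfig = AudioConfig.FromWavFileInput(@"C:\audio\sample.wav");

// Reuse the existing SpeechConfig and pass in the new AudioConfig.
using var fileRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

var fileResult = await fileRecognizer.RecognizeOnceAsync();
OutputSpeechResult(fileResult);
```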
And that's it: speech-to-text conversion using the Azure Speech service, both live and pre-recorded. Thanks for watching. Subscribe for more tutorials on Azure technologies.