Transcribe Speech with Azure API in Python Tutorial
Learn to use Azure's Speech API in Python for speech-to-text via a specific microphone. Set up environment variables, handle inputs, and process text effectively.
Real-Time Audio Transcription Using Azure Speech to Text API in Python
Added on 01/29/2025

Speaker 1: Hey, how's it going, guys? Alright, so in this video, I'm going to show you how to transcribe speech to text from a specific microphone using Azure's Speech API in Python. This is actually a video I get a lot of requests for from patrons, members, and a few subscribers. And to be honest, figuring out how to select a microphone and stream the speech to text using Azure's API actually took me a day. Alright, so let's do this. First, I'm going to add a speech service to my resource group. If you're a new user, simply create a new Azure account; it's free. Then you want to create a subscription, so in the search field, simply navigate to Subscriptions and provide your payment information. Then you want to create a resource group, so simply search for Resource Groups. I've already created a few resource groups, and for this demo, I'm going to use my development resource group. Alright, so here, click on Create, and we want to add a speech service to the resource group.

Speaker 2: It should be the first item, so click on Create under Speech. Now here, simply give the instance a name; I'll name the instance test1. And for the pricing tier, choose the free tier. Then click on Review and Create. And it looks like my instance name is duplicated.

Speaker 1: Let me go back.

Speaker 2: And I'll change the instance name to test123. What's going on here? Oh, right. So let's try testx123 and see what happens. Okay, it looks like the name is valid, so I can click on Create to create the service. Once the deployment is complete, click on Go to Resource. Now, here, let's go ahead and create a blank Python script.
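If you prefer the command line over the portal, the same resource can be created with the Azure CLI. This is a sketch only, assuming you are already logged in with `az login`; `testx123`, `my-resource-group`, and `eastus` are stand-ins for your own instance name, resource group, and region:

```shell
# Create a free-tier (F0) Speech resource in an existing resource group.
# The name must be unique, just like in the portal.
az cognitiveservices account create \
  --name testx123 \
  --resource-group my-resource-group \
  --kind SpeechServices \
  --sku F0 \
  --location eastus \
  --yes

# Retrieve the API keys for the environment variables used later.
az cognitiveservices account keys list \
  --name testx123 \
  --resource-group my-resource-group
```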

Speaker 1: Now, to use a specific microphone on Windows, you want to go to Device Manager. I'm not sure about macOS; you might need to figure out how to get the device ID there on your own. But on Windows, I can simply go to Device Manager, then navigate to Audio inputs and outputs. Now, make sure that you choose a microphone input. I'm using a third-party microphone, which is this one right here. Here, I'm going to right-click on the microphone device, then click on Properties, and go to the Details tab. From the drop-down here, you want to choose Device instance path, and I'm going to copy the value and paste it into my script. I'm going to remove the first two components, and that leaves me with the device ID. When we use Azure's Speech-to-Text API and want to use a specific microphone, we need to provide this device ID.
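As a sketch, removing the first two components of a Device instance path can be done with a plain string split. The path below is a made-up example, not a real device:

```python
# Hypothetical "Device instance path" copied from Device Manager.
device_instance_path = r"SWD\MMDEVAPI\{0.0.1.00000000}.{a1b2c3d4-0000-0000-0000-000000000000}"

# Drop the first two backslash-separated components; what remains is the
# device ID that the SDK's AudioConfig expects as its device_name.
device_id = device_instance_path.split("\\", 2)[2]
print(device_id)  # {0.0.1.00000000}.{a1b2c3d4-0000-0000-0000-000000000000}
```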

Speaker 2: Now, I want to create environment variables.

Speaker 1: Or you can directly create the environment variables on your PC. Now, in my environment variables file, I created two variables, API_KEY and REGION. The API_KEY value is going to come from the Speech API key.

Speaker 2: And REGION is going to be the location where the instance is deployed.
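A minimal sketch of that setup, with placeholder values rather than real credentials. In the actual script, python-dotenv's `load_dotenv()` would copy the `.env` entries into the process environment; here the sketch sets them directly so it is self-contained:

```python
import os

# Contents of the hypothetical .env file (placeholders, not real credentials):
#   API_KEY=your-speech-api-key
#   REGION=eastus
#
# load_dotenv() would populate os.environ from that file; we set the
# variables directly here so the example runs on its own.
os.environ["API_KEY"] = "your-speech-api-key"
os.environ["REGION"] = "eastus"

api_key = os.getenv("API_KEY")
region = os.getenv("REGION")
print(region)  # eastus
```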

Speaker 1: Now, for the pip command to install the required Python libraries, there are two libraries that we need to install. The first one is going to be python-dotenv.

Speaker 2: And the other library is going to be azure-cognitiveservices-speech. Hit enter and wait for the installation to finish. Alright, so once you install the libraries, we can dive into the Python script.

Speaker 1: Let's go ahead and import the os module. And from dotenv, we're going to import the load_dotenv function. Then we're going to import azure.cognitiveservices.speech, and I'm going to alias it as speechsdk. Then I'm going to create a function called speak_to_microphone, and the function takes two parameters, api_key and region. Inside the function, I'm going to create a SpeechConfig object, and I'll provide the API key and region; this is going to be our connection to the API. Then I'm going to set the speech recognition language to English. Next, I'm going to configure my device using speechsdk.audio.AudioConfig. Now, for the device_name parameter, this is going to be the device ID,

Speaker 2: which is going to be the ID that I copied earlier. Then we need to create a SpeechRecognizer object.

Speaker 1: We'll provide the speech configuration from the SpeechConfig object and the audio setting from the AudioConfig object. Now, the default timeout duration is actually relatively short; I think it was about five seconds. If we want to increase the timeout duration, we can call speech_recognizer.properties.set_property, and from speechsdk.PropertyId, for the initial silence timeout, I'm going to set that to 60 seconds, and for the end silence timeout, I'm going to set that to 20 seconds.

Here I'm going to insert a print statement: "Speak into your microphone, say stop session to end." So the phrase "stop session" is going to stop the program. Now, to stream the audio from speech to text, we're going to insert a while loop to keep the process running. Using speech_recognizer.recognize_once_async().get(), this call is going to wait for audio input. Then we're going to insert three conditions to check the result. If speech is detected, we're going to print "Recognized" followed by the text from the audio, and if the speech contains the phrase "stop session", we're going to terminate the program. Otherwise, we're going to check the other two conditions. If the audio is not valid speech, we hit the no-match condition and print the message "No speech could be recognized" followed by the no-match details. Now, if the reason is canceled, this can happen if you press the shortcut to stop the program, or, if you integrate the API with a web application or front end, you can implement your own cancel operation to trigger this condition. Inside this block, I'm going to print the cancellation details followed by the error details.

Now, to use the function, this is going to be the entry point. First, we need to load the API key and the region. Then we're going to run the speak_to_microphone function, provide the API key and region, and that's it. And for testing, I'm going to run the script.
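Putting the walkthrough together, a sketch of the full script might look like the following. It assumes the azure-cognitiveservices-speech and python-dotenv packages are installed, a `.env` file with `API_KEY` and `REGION` entries, and a placeholder device ID, so it only runs against a live Speech resource with a real microphone attached:

```python
import os

import azure.cognitiveservices.speech as speechsdk
from dotenv import load_dotenv

# Placeholder device ID; replace with the value copied from Device Manager.
DEVICE_ID = "{0.0.1.00000000}.{a1b2c3d4-0000-0000-0000-000000000000}"


def speak_to_microphone(api_key, region):
    # Connection to the Speech service.
    speech_config = speechsdk.SpeechConfig(subscription=api_key, region=region)
    speech_config.speech_recognition_language = "en-US"

    # Bind recognition to the specific microphone.
    audio_config = speechsdk.audio.AudioConfig(device_name=DEVICE_ID)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    # Lengthen the default silence timeouts (values are in milliseconds).
    speech_recognizer.properties.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "60000")
    speech_recognizer.properties.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, "20000")

    print("Speak into your microphone. Say 'stop session' to end.")
    while True:
        # Block until one utterance has been recognized.
        result = speech_recognizer.recognize_once_async().get()
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print(f"Recognized: {result.text}")
            if "stop session" in result.text.lower():
                break
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print(f"No speech could be recognized: {result.no_match_details}")
        elif result.reason == speechsdk.ResultReason.Canceled:
            details = result.cancellation_details
            print(f"Canceled: {details.reason}; error details: {details.error_details}")


if __name__ == "__main__":
    load_dotenv()  # reads API_KEY and REGION from the .env file
    speak_to_microphone(os.getenv("API_KEY"), os.getenv("REGION"))
```

The "stop session" check is done on the recognized text itself, which is why the loop only ends after a full utterance containing that phrase has been transcribed.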

Speaker 2: Oh, I forgot to load the environment variables.

Speaker 1: Alright, let me try again. At the beginning, we're going to see the message "Speak into your microphone. Say 'stop session' to end." Now, as you can see, as I speak, more text gets displayed in my terminal. Now I'm going to stop the program by saying "stop session", and it'll terminate the session. Alright, so this is going to be everything I'm going to cover in this tutorial. I hope you guys find this video useful. If you do, please don't forget to like the video and subscribe to the channel, and I'll see you guys in the next video.
