Azure AI: Convert Text to Speech with SDK and Python

Learn how to use Azure AI services and Python SDK to convert text into speech, with a step-by-step demo and script walkthrough for easy implementation.

Text to Speech Conversion with Azure AI Service Azure Python SDK

Added on 01/29/2025

Speaker 1: Now we will see how to use Azure AI services to convert text into speech. For that, you need an instance of Azure AI services created in your subscription, something like this. I have created an Azure AI services resource, and if I go to its Overview page, that's where I get the keys that I'm going to use in my script. I have created a script for this demo, which I'm going to walk you through and then run. So here, let's copy the key that is needed for this demo. Azure AI services is basically a pack of Azure-provided cognitive services that do cognitive jobs such as speech to text, text to speech, computer vision, and so on. Right now we are looking at text to speech. You can do this using the APIs, but I want to show you how to do it with the Azure SDK for Python. So this is my text-to-speech conversion Python script, and it is ready for the demo. Before you execute the script, your system needs the Python package azure-cognitiveservices-speech installed. You can do that with pip install azure-cognitiveservices-speech. Once that is installed, you are ready to use the script. Now I'm going to walk you through the script, and then we'll see the demo. Here we import the module we just installed, azure.cognitiveservices.speech, as speechsdk, and this is where you set the speech key and the service region. That's why I was talking about copying the key. I'm going to copy the key and paste it in here as the value of the key, like this. This is my speech key; yours will be different, so change it accordingly.
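
The setup step described above (install the package with pip install azure-cognitiveservices-speech, then supply the key and region) might look like the following minimal sketch. In the video the key is pasted directly into the script; reading it from environment variables named SPEECH_KEY and SPEECH_REGION, and the helper name load_speech_settings, are assumptions made here for illustration so the key stays out of the source file.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) for the Azure AI services resource.

    Assumption: the values live in SPEECH_KEY and SPEECH_REGION.
    These variable names are illustrative, not taken from the video.
    """
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    if not key or not region:
        # Fail early with a clear message instead of a confusing SDK error later.
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running the script.")
    return key, region
```

The returned pair can then be passed to the speech configuration in place of hard-coded values.
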
Using that speechsdk module, we configure the client here: the speech key and region are passed in, and since the output is a voice, we choose a particular voice. That is this setting, speech_synthesis_voice_name. Microsoft Azure provides a huge list of voice names, so you can try changing the value. For now we are using the en-US Aria neural voice, en-US-AriaNeural. Then, using the same module, we create an audio config object and configure it with use_default_speaker set to True. What is the default speaker? My laptop has one speaker, so it is going to use that to output the voice. With the audio config set, we use the speech SDK to create the SpeechSynthesizer, passing in the configuration we have just seen. Now you have an object called speech synthesizer, and on that object we call one more function, speak_text_async. That's where you feed in the text you want to convert into speech, and then you call get on what it returns, which gives you the voice output. If everything is good, it will tell you that the speech was synthesized; otherwise we go into the error branches, which we're going to see now. To run the script, I'm going to go to this path and run it. My script here is the Python file, text to speech one. As you can see, right now I have one sentence of text here: "Good morning, this is an AI quick lab." Let me change it to "an AI quick labs demo".
So I'm going to save this text, call the script, and let's see if this works.
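
The script itself is not shown in the transcript, so the following is only a sketch of the steps the speaker walks through (speech config, voice name, default-speaker audio config, synthesizer, speak_text_async followed by get, then the result check). The voice name en-US-AriaNeural is a best reading of what is said in the video, and the SPEECH_KEY / SPEECH_REGION environment variables and the pick_text helper are assumptions added for illustration.

```python
import os
import sys

# Assumption: the voice named in the video is taken to be en-US-AriaNeural.
VOICE_NAME = "en-US-AriaNeural"
DEFAULT_TEXT = "Good morning, this is an AI quick labs demo."


def pick_text(argv, default=DEFAULT_TEXT):
    """Hypothetical helper: use command-line words as the text, else the default."""
    words = argv[1:]
    return " ".join(words) if words else default


def synthesize(text, key, region, voice=VOICE_NAME):
    """Speak `text` through the default speaker and return a status message."""
    # Imported here so the file still loads when the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    # Client configuration: resource key, region, and the voice to use.
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = voice

    # Send the audio to the machine's default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # speak_text_async returns a future; get() waits for the synthesis result.
    result = synthesizer.speak_text_async(text).get()

    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        return f"Speech synthesized for text: {text}"
    if result.reason == speechsdk.ResultReason.Canceled:
        details = result.cancellation_details
        message = f"Speech synthesis canceled: {details.reason}"
        if details.reason == speechsdk.CancellationReason.Error:
            message += f" ({details.error_details})"
        return message
    return f"Unexpected result: {result.reason}"


if __name__ == "__main__":
    # Assumption: credentials come from the environment rather than the source file.
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if key and region:
        print(synthesize(pick_text(sys.argv), key, region))
    else:
        print("Set SPEECH_KEY and SPEECH_REGION to run the demo.")
```

Run with no arguments, the sketch speaks the demo sentence; any words given on the command line are spoken instead. Changing VOICE_NAME to another entry from the Azure voice list changes the voice.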

Speaker 2: Good morning, this is an AI quick labs demo.

Speaker 1: So, as you just heard, the text is now converted into speech and we were able to hear that voice. This is how you can convert text into speech. You can take the script, experiment with it, and build applications out of it. All right, thank you very much for watching my videos.