Transform Text to Speech with Azure Cognitive Services
Learn to turn text into speech using Azure's Text-to-Speech service. Create an app with Visual Studio and harness lifelike speech synthesis capabilities.
File
How to get started with neural text to speech in Azure | Azure Tips and Tricks
Added on 01/29/2025

Speaker 1: Learn how to turn text into speech in this episode of Azure Tips and Tricks. You can use the Azure Neural Text-to-Speech service to turn text into human-sounding speech in many languages and voices. Let's create a small app to use the service. To follow along, you need the latest version of Visual Studio, though you can also do the same in Visual Studio Code. I'll start by creating a Text-to-Speech service in Azure. Here, let's search for speech. There it is. Create. This creates a Cognitive Services Speech resource, which includes API endpoints like the Text-to-Speech service. Let's start by giving it a name. Next, I'll pick a location and a pricing tier, and then select a resource group. That's it. Create it, and I'll skip ahead to when it is done. Here it is. This is the Cognitive Services Speech resource. Let's take a look at the Keys and Endpoint blade. We need this information for our application: the access key and also the location, West Europe in my case. Let's use the service in an app. This is Visual Studio, and I've already created a simple console application. The first thing I changed was to add a NuGet package. Let's take a look. I added this one for the Cognitive Services Speech service. Let's go back to Program.cs. In here, I've added a using for the Cognitive Services Speech namespace, and here I create a new config with the access key and location that we saw in the portal. Next, I use this config to create a speech synthesizer. Finally, I invoke SpeakTextAsync with the text that I want turned into speech. Let's try this out, and there it goes. Synthesizing directly to speaker output. See, the audio comes directly from the default audio device. Cool. These are the default settings, which means the service detects the language of the text and uses the default voice to synthesize it. It uses US English for this. You can change these defaults, for instance, by inserting this.
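The steps walked through above can be sketched roughly as follows, assuming the Microsoft.CognitiveServices.Speech NuGet package is installed; the key, region, and text are placeholders you would replace with your own values from the Keys and Endpoint blade:

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        // Access key and region, as shown on the Keys and Endpoint blade
        // of the Speech resource in the Azure portal (placeholders here)
        var config = SpeechConfig.FromSubscription("<your-access-key>", "westeurope");

        // Optional: override the default language (US English), for
        // example with British English
        config.SpeechSynthesisLanguage = "en-GB";

        // Without an explicit audio config, output goes to the
        // default audio device (the speakers)
        using var synthesizer = new SpeechSynthesizer(config);
        await synthesizer.SpeakTextAsync("Hello from Azure Tips and Tricks!");
    }
}
```

Running this requires a live Azure subscription and a provisioned Speech resource, so treat it as an outline of the flow shown in the video rather than a drop-in program.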
This configures the speech service to use British English instead. Let's see. Synthesizing directly to speaker output. Yep, that works. There are multiple voices per language, so I can change this to use another voice. Let's try this one. Synthesizing directly to speaker output. See, that sounds very different. Very cool. Also, by default, the audio is played through the default audio output of your device. You can change this to return the audio in a memory stream or write it to a file, like this. This outputs the audio into a file called audio.wav, and you can tweak the parameters of the audio that goes into that file as well. The Azure Neural Text-to-Speech service enables you to convert text into lifelike speech that is close to human parity. Go and check it out.
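The voice and file-output changes described here could look something like the sketch below. The voice name en-GB-SoniaNeural is just one example; the SDK's voice gallery lists many others, and the key and region are again placeholders:

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("<your-access-key>", "westeurope");

        // Pick a specific neural voice instead of the language default
        // (en-GB-SoniaNeural is one example of a British English voice)
        config.SpeechSynthesisVoiceName = "en-GB-SoniaNeural";

        // Route the synthesized audio into a WAV file instead of the
        // default speaker output
        using var audioConfig = AudioConfig.FromWavFileOutput("audio.wav");
        using var synthesizer = new SpeechSynthesizer(config, audioConfig);
        await synthesizer.SpeakTextAsync("This goes into audio.wav instead of the speakers.");
    }
}
```

AudioConfig also offers stream-based outputs (for example via push or pull audio streams) when you want the bytes in memory rather than on disk.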
