Getting Started with Google Text-to-Speech in Python

Learn how to use Google Text-to-Speech API in Python, including setup, voice options, and pricing tiers. Create engaging, accessible content today.

Convert Text To Real Human Speech With Google Cloud Text-To-Speech API In Python

Added on 01/29/2025

Speaker 1: In this tutorial, we are going to learn how to get started with the Google Text-to-Speech API in Python. Google Text-to-Speech is a cloud service by Google that turns text into natural-sounding speech. Using advanced machine learning, it offers a variety of voices and languages, making digital content more engaging and accessible to a wide audience. Before we dive into the tutorial, let's look at a quick demo of the service itself. If you go to Google's Cloud Text-to-Speech homepage and scroll down to the demo, you can play around with the Text-to-Speech API service. There are some options we can choose, such as language: we can choose from English to a variety of languages like Turkish, Chinese, Japanese, even Korean, and a couple of other less well-known languages. And today, the Text-to-Speech service has a lot more voices available. In the past, we only had Basic, Standard, and WaveNet. Today, we also have Neural2, Studio, Journey, and this one is new, still in the experimental stage, Polyglot, and I think, oh, there's one more, News. The difference between these voices is basically the quality of the voice. Studio, Journey, and Neural2 are the premium voices, which are going to cost a little bit more. Since Journey is still in the experimental stage, I'm going to choose Studio, which is the most premium voice. For Studio, we have male and female voices available today. So the first one is going to be, I believe this one is the male voice, and the Q is the female voice. I'm going to choose the first one, and for the profile, based on your use case, you can choose different audio profiles for the Text-to-Speech output. I'm going to choose a small home speaker, and here you can set the speech speed and the pitch level.

Speaker 2: And I'm going to click on Speak, and let me, okay, alright, so let's wait.

Speaker 1: Now I don't know if you can hear it, but on my speaker it sounds very human-like, very natural. If you cannot hear the voice, I'll share the link, and you can play around with the demo yourself.

Speaker 2: Alright, so let's look at the pricing. Now currently we have five different pricing tiers.

Speaker 1: Neural2, Polyglot, Studio, Standard, and WaveNet voices. Now these two, Standard and WaveNet, are the standard voices. Every month, you get 4 million characters for free using the Standard voice, and 1 million characters using the WaveNet voice. The premium voices, which use machine learning to train the voice, are a little more expensive, especially the Studio voice, which costs $0.00016 per byte. That's approximately 160 dollars per million bytes, compared to Polyglot, which is 16 dollars per million bytes. The Neural2 voice is also 16 dollars per million bytes. And with the free tier, you can use up to 1,000 bytes per month for free for the Studio voice, and 1 million bytes for both the Neural2 and Polyglot voices. So that's what I wanted to cover in terms of Google Text-to-Speech pricing. Now let's dive into the tutorial. Alright, so first, I want to create an account. If you don't have an account, you can navigate to console.cloud.google.com, and that will take you to the GCP console. If you already have a project and a service account created, then you can enable the Text-to-Speech API and skip to the script development section. If you are a new GCP user, then you can follow along, and I will go through every single step, from creating a project, creating a service account, and enabling the API, all the way to Python script development. Before we can use any Google Cloud service, we need to create a cloud project first. On the top, you're going to see a drop-down. Click on the drop-down and click on New Project.
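
The byte-based prices quoted in the pricing walkthrough can be turned into a rough monthly estimate. The following is a minimal sketch, assuming the rates and free allowances are exactly the ones stated in the video (Studio, Neural2, and Polyglot only); the dictionary names and the `estimate_monthly_cost` function are hypothetical, and the figures should be checked against the current pricing page before relying on them.

```python
# Rates and free allowances as quoted in the video; they may be out of date.
PRICE_PER_MILLION_BYTES = {
    "Studio": 160.0,
    "Neural2": 16.0,
    "Polyglot": 16.0,
}
FREE_BYTES_PER_MONTH = {
    "Studio": 1_000,
    "Neural2": 1_000_000,
    "Polyglot": 1_000_000,
}


def billable_bytes(text: str) -> int:
    """Count the UTF-8 bytes in the text (assumed here to be the billing unit)."""
    return len(text.encode("utf-8"))


def estimate_monthly_cost(voice_type: str, bytes_used: int) -> float:
    """Return an estimated cost in dollars after subtracting the free allowance."""
    chargeable = max(0, bytes_used - FREE_BYTES_PER_MONTH[voice_type])
    return chargeable * PRICE_PER_MILLION_BYTES[voice_type] / 1_000_000


# Example: two million bytes of Neural2 speech in one month.
print(estimate_monthly_cost("Neural2", 2_000_000))
```

Counting bytes rather than characters matters for non-ASCII text, where a single character can occupy several bytes.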

Speaker 2: Now give your project a name, so I'm going to name my project Tutorial GCP Text-to-Speech and click Create. Once your project is created, we want to make sure that we set the target project. Next, we're going to enable the Text-to-Speech API by clicking on the navigation menu, then go to APIs and Services, then we want to click on Library. Now here, search for Text-to-Speech. Now here, we want to click on Cloud Text-to-Speech API. Enable the API.

Speaker 1: For the next step, we need to create a service account by going back to the navigation menu, APIs and Services, and clicking on Credentials. A service account is a special type of account used by your Google Cloud project or by different applications to interact with different GCP services on your behalf. To create a service account, we want to click on Create Credentials, then choose Service Account. Now here, give the account a name, and I'll name this SA underscore Text-to-Speech Demo. Then click on Create and Continue. Now here, we need to grant the appropriate permission to the service account. For demonstration purposes, I'm going to choose Owner, but if you want to limit what the service account can do, I believe you can use the Text-to-Speech permission set right here. You can choose Cloud Speech-to-Text service agent, and this will allow this account to perform only text-to-speech related activities or operations. Because I'm going to create a follow-up video to show you how to do a couple of other things, I'm going to choose Owner just to make things a little bit simpler. Click on Continue. Now this step is optional. You can grant other users access to this account as well, but I'm going to leave that blank. Then click on Done to create the account. Now to allow our Python script to connect to the text-to-speech service, we need to provide the credentials. To do that, we're going to download the account file, or the token file, from the service account that we just created. So I'm going to click into the account.

Speaker 2: On the top, you want to click on Keys. Click on Add Key, Create New Key, and for the key type, you want to choose JSON.

Speaker 1: Now it's going to prompt you to save the file, and I'll navigate to my project folder. Now here I can name the file to something more descriptive, and I'll name this as

Speaker 2: DemoServiceAccount.
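
Before moving on, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library; the function name `check_service_account_file` and the file name `demo_service_account.json` are hypothetical, and the list of fields checked is a small subset of what a key file normally contains.

```python
import json

# A few fields a service account JSON key is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path: str) -> str:
    """Validate the key file's basic shape and return the account's email."""
    with open(path, "r", encoding="utf-8") as handle:
        info = json.load(handle)

    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"Key file is missing fields: {missing}")
    if info["type"] != "service_account":
        raise ValueError(f"Unexpected credential type: {info['type']!r}")
    return info["client_email"]


# Usage (hypothetical file name):
# print(check_service_account_file("demo_service_account.json"))
```

The key file grants whatever access the service account has, so it should stay out of version control.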

Speaker 1: Alright, so that's everything we need to do in terms of creating a project, creating a service account, enabling the Text-to-Speech API, and downloading the service account file. Now we can dive into the Python script development. The first thing we need to do is install the Python dependencies. To install the Google Text-to-Speech SDK, we want to type pip install google-cloud-texttospeech

Speaker 2: and enter. Once the Google Text-to-Speech package is installed, we're going to create a file. I'll name the file Demo.py, and we open the file. For the import statement, from google.cloud, we're going to import the texttospeech module. And we increase the font.

Speaker 1: Alright, so the first thing we need to do here is construct a Text-to-Speech client instance. So from texttospeech.TextToSpeechClient, I'm going to create the client and name it client.

Speaker 2: And I want to go back to the documentation again.

Speaker 1: Alright, so here's a list of all the available voices that you can use. The reason why I want to show you this page, which is also linked in the description below, is that depending on the voice that you want to use, you need to reference the voice ID, which is the voice name, as well as the language code. So a voice can speak multiple languages. It can be English, Chinese, Korean, Japanese, Russian, Turkish, and so on.
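
The same voice list can also be fetched from the API instead of the documentation page. The following is a minimal sketch, assuming the `google-cloud-texttospeech` package is installed and credentials are already configured; the helper names are hypothetical, and `language_code_from_voice_name` relies on the assumption that a voice name such as `en-US-Studio-O` begins with its language code.

```python
def language_code_from_voice_name(voice_name: str) -> str:
    """Take the first two hyphen-separated parts, e.g. 'en-US' (naming assumption)."""
    return "-".join(voice_name.split("-")[:2])


def list_voice_names(language_code: str = "en-US"):
    """Return (voice name, supported language codes) pairs for one language."""
    # Imported here so the helper above works without the package installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    return [(voice.name, list(voice.language_codes)) for voice in response.voices]


# Usage (needs network access and valid credentials):
# for name, codes in list_voice_names("en-US"):
#     print(name, codes)
```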

Speaker 2: Alright, so I don't know if you can hear it. Yeah, I guess not.

Speaker 1: Now to authenticate my account, or to connect to the service, we actually need to provide the credential file, which is the service account file here. So going back to the script, I'm going to import the os module. The way GCP, or the Text-to-Speech client, authenticates with the service is by using the GOOGLE_APPLICATION_CREDENTIALS environment variable. Then we're going to assign the file path where the service account JSON file is located. You can also set the file path in your system's environment variables instead of creating the environment variable in the Python script. Alright, so next I'm going to create my text block. This is going to be the text I want to convert into a speech file. To load the text block, from texttospeech we want to reference the SynthesisInput class. Then we're going to provide the text block to the text parameter. I'll name the output synthesis_input. Next we're going to specify the voice properties. So from texttospeech.VoiceSelectionParams, I want to set my language code to English. And for the voice name, I'm going to use en-US-Studio-O, which is this one here.
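
The steps described in this turn can be sketched as follows. This is a minimal sketch, not the exact script from the video: the key file name `demo_service_account.json` and the function names are hypothetical, and the `texttospeech` import is deferred so that the credential helper can run even where the package is not installed.

```python
import os

# Hypothetical file name; point this at the JSON key downloaded earlier.
KEY_PATH = "demo_service_account.json"


def configure_credentials(key_path: str = KEY_PATH) -> str:
    """Expose the key file through the GOOGLE_APPLICATION_CREDENTIALS variable."""
    absolute_path = os.path.abspath(key_path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = absolute_path
    return absolute_path


def build_input_and_voice(text: str):
    """Build the text input and the voice selection described in the video."""
    from google.cloud import texttospeech

    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Studio-O",  # voice name copied from the voices page
    )
    return synthesis_input, voice


# Usage (requires the package and a real key file):
# configure_credentials()
# synthesis_input, voice = build_input_and_voice("Hello from Text-to-Speech.")
```

Setting the variable in the shell or system settings, as mentioned above, keeps the key path out of the script entirely.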

Speaker 2: Let me copy and paste. And this is going to be a female voice.

Speaker 1: And to control the voice profile, or the audio profile, from texttospeech.AudioConfig, we can use this class to specify the settings. So here I'm setting the audio output to MP3 format. And for the audio style, which in this case is the profile, I'm going to set that to small-bluetooth-speaker-class-device. For the speaking rate, which is the speed of the speech, I'm going to set that to 1, which is the normal speed. And also set the pitch to 1, which is also the default. Once we have the text block input, the voice, and the audio configuration settings, we can now make the API call to convert our text into speech. To do that, we're going to reference the client object's synthesize_speech method. Here we're going to provide the text block to the input parameter, voice to the voice parameter, and audio config to the audio_config parameter. And I'll save the response as response. Now the response object is going to be returned as a speech content object. I forgot the actual data type name. But from the response object, we can retrieve the audio bytes by referencing the audio_content attribute. And to save the audio as an actual file, here we can use the with open statement, providing the file path or file name.

Speaker 2: I want to open it in write-binary mode as output, then call output.write, and we're going to provide the audio binary. And that's it. Alright, so let's do a test run. I'm going to run the script. And here's the audio file. Let me open the file. Again, I don't know if you can hear it.
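
Putting the audio settings, the API call, and the file write together, the final part of the script might look like the sketch below. It assumes credentials are already configured and the `google-cloud-texttospeech` package is installed; the function names and the `output.mp3` file name are hypothetical. The response returned by `synthesize_speech` is a `SynthesizeSpeechResponse`, and its `audio_content` attribute holds the raw audio bytes.

```python
from pathlib import Path


def save_audio(audio_bytes: bytes, file_name: str = "output.mp3") -> Path:
    """Write the audio bytes to disk in binary mode and return the path."""
    path = Path(file_name)
    with open(path, "wb") as output:
        output.write(audio_bytes)
    return path


def synthesize_to_file(text: str, file_name: str = "output.mp3") -> Path:
    """Convert text to speech with the settings from the video and save it."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Studio-O",
    )
    # The video also sets a pitch value; it is left out here so the voice's
    # unmodified pitch is used (support for pitch may vary by voice, assumption).
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        effects_profile_id=["small-bluetooth-speaker-class-device"],
        speaking_rate=1.0,
    )
    response = client.synthesize_speech(
        input=synthesis_input,
        voice=voice,
        audio_config=audio_config,
    )
    return save_audio(response.audio_content, file_name)


# Usage (needs network access and valid credentials):
# print(synthesize_to_file("Hello from Google Text-to-Speech."))
```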

Speaker 1: Right, so we have a 14-second audio file. Let me pause this, because I don't know if you can hear it. But yeah, so this is going to be everything for the Google Text-to-Speech API in Python tutorial. Feel free to post your questions or feedback in the comment section below. If you enjoyed this video, please don't forget to give this video a like and click on the subscribe button. And I'll see you guys in the next video. Bye bye.