Guide to Using Google Cloud Speech-to-Text API

Learn how to convert speech to text using Google Cloud Speech API with a step-by-step tutorial on setup, installation, and code execution.

Python Speech to Text with Google Cloud Speech

Added on 01/29/2025

Speaker 1: Hi friends, I'm Paroiz, and in this video I'm going to show you how you can use Google Cloud Speech and how you can convert speech to text using Google Cloud Speech. Now if you see the documentation of Google Cloud, you can see the Speech-to-Text client libraries. And if I come down, you can see that you can install this library for different programming languages, for example, C#, Go, Java, Node.js, PHP. So I'm interested in Python, so I can just use this. And you can see that we can just use pip and install google-cloud-speech. And so first we need to install this. I already have created a project in the PyCharm IDE. Now the first thing is that we need to install this library. So you can just open your PyCharm terminal, and in here I can just say pip install. And after that, we can just use this google-cloud-speech. So let me paste that in here. Now let's just wait for the installation. And now it's installing the packages and also the required dependencies. And you can see that after the installation of this library, we need to set up the authentication. And first you need to create a service account name, and after that you need to just create a service account ID. So to do this, first of all, you need to have an account in Google Cloud Console. So I already have opened my Google Cloud Console, cloud.google.com. Now I already have created a project in here. So you can see in the Google Cloud Platform, and I have created a new project, test project. You can create a project from here. You can just click on the new project, and you can create a project. Also make sure that you have added your billing information in here. Now for the first time, there will be no charges for you. It will give you three months of free trial.
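
To confirm that the pip install step above actually worked, a quick check from Python can help. This is only a minimal sketch, not part of the video: the helper name installed_version is hypothetical, and it assumes the package was installed under the distribution name google-cloud-speech used in the pip command.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("google-cloud-speech")
    if version is None:
        print("google-cloud-speech is not installed; run: pip install google-cloud-speech")
    else:
        print(f"google-cloud-speech {version} is installed")
```

Running this in the same interpreter that PyCharm uses for the project shows whether the library landed in the right environment.
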
So if I come down in here, now it's not showing me in here, but right now you can see that I don't have any charges in here because after adding your credit card information, you will receive a trial for three months, and you can use that without any payment. So now make sure that you have added the billing information in your Google Cloud Platform. So now the first thing is that we need to create a credential, so we can just click on this. And also you can add the billing from here. So we can just go to APIs & Services, and we can just click on Credentials. And in here, we can create our credentials. So now in this section, we can just click on Create Credentials, and we want to create a service account. Let's just click on service account, and let's just wait. Okay, so we need to give a name for the service account. For example, I can just give it my service, and after that, click on Create and Continue. And also we need to select a role. So I'm going to just make it Owner, and after that, we need to click on Continue. So now let's just wait, and after that, we can just click on Done. So now the service account credential is created. After that, we need to create our JSON key, so we can just click on this. And after that, from here, we can just click on Keys, and let's create our key. So we can just say create a new key, and from here, we can choose JSON. Let's click on Create. And now it's created, and you can see that it's downloaded in here. And let's just copy this. Let me copy this, and let me paste this. Now it's also in here, in my working directory. So let's just wait. Now it's indexing. Now let's paste. Let me just change the name to key.json. Okay, so now after that, and after creating your credentials, make sure that you have enabled the Cloud Speech-to-Text API. Now, you can just search that in here.
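
Before using the downloaded key, it can be worth checking that key.json really is a service account key. The following is a minimal sketch under stated assumptions: the helper check_key_file and the list of expected fields are illustrative choices, not something shown in the video, and the check only looks at the file's shape, not at whether Google accepts the credentials.

```python
import json
from pathlib import Path

# Fields a service account key file is expected to contain (illustrative subset).
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in the key file; an empty list means it looks fine."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


if __name__ == "__main__":
    issues = check_key_file("key.json")
    print("key.json looks fine" if not issues else "\n".join(issues))
```
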
You can just click on this, and you can just search for Cloud Speech-to-Text, and after that, this page will open, and in here, you need to enable that. Now, when you're enabling this Google Cloud Speech-to-Text API, it will require the billing account information, and if you have added the billing information, then you can just choose your billing account, and it will be enabled. So you can see I have already enabled this, and the API is enabled for my test project. So now let's write our code. I'm going to just open my PyCharm IDE. Now, let's create our code. Now, the first thing is that I have opened my PyCharm IDE, and in here, I can create a new Python file. Let me just call that text-to-speech, for example, and after that, make sure that you already have added an mp3 file in your working directory. So I already have added one. You can see this is myfile.mp3. Now, if you see, let me just run this.

Speaker 2: In this lesson, we want to talk about Google Cloud Speech, and also, we are going to learn that how you can convert your speech to text using Google Cloud Speech.

Speaker 1: Okay, now we want to convert that audio to text using Google Cloud Engine. Now, we already have installed it, and now let's just import that. So I can just say from google.cloud, we are going to import speech, and after that, we need to just instantiate the client. So a client, so I can just say client = speech.SpeechClient.from_service_account_file. Now, this is the file that we already have created, and also, we have added that in here. So that's key.json, and in here, I can just say key.json. So after doing that, we need to just create our file_name. So that is our myfile.mp3. So I can just say myfile.mp3, and after doing that, now, let's just read that file, our myfile.mp3. So I can just say with open, we need to give the file name in here. So file_name like this, and also, we need to give the mode as rb, and I can just say as f. And now, we can just say, for example, mp3_data, and let's just say f.read(). So after doing that, now, we need to create our recognition audio. So I can just say audio_file, and in here, we need to just add this mp3_data as the content. So I can just say speech.RecognitionAudio, and I can just add the content, and the content is mp3_data. Now, after doing that, we need to configure the media file output, and I can just say config and speech.RecognitionAudio like this. Now, we can add the sample rate in here. So I can just say sample_rate_hertz, and I can just use it for 400. And also, for example, if you want to enable the punctuation, you can also just do that. So we can just say enable automatic punctuation, and we can just make this True, and okay. So now, after that, also, we need to just give the language code. So I can just say language_code, and that is en-US. So we need to give the sample rate. You can also give a lot of, for example, options in here, but we are interested in these, so enable automatic punctuation and also language code.
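
The setup described so far can be sketched as the code below. It is a minimal sketch, not the exact code from the video: the helper names read_audio_bytes, config_options and build_request are hypothetical, the sample rate constant is an assumption (the value spoken in the video is unclear, so it should be matched to the actual audio file), and the config uses speech.RecognitionConfig, which is the class the speaker switches to shortly afterwards. Depending on the audio format, the config may also need an encoding value, which the video does not set.

```python
KEY_PATH = "key.json"        # service account key saved in the working directory
FILE_NAME = "myfile.mp3"     # audio file in the working directory
SAMPLE_RATE_HERTZ = 44100    # assumption: adjust to the real sample rate of the file


def read_audio_bytes(file_name):
    """Read the whole audio file as raw bytes (mode "rb", as in the video)."""
    with open(file_name, "rb") as f:
        return f.read()


def config_options():
    """Options discussed in the video, as keyword arguments for RecognitionConfig."""
    return {
        "sample_rate_hertz": SAMPLE_RATE_HERTZ,
        "enable_automatic_punctuation": True,
        "language_code": "en-US",
    }


def build_request(key_path=KEY_PATH, file_name=FILE_NAME):
    """Create the client, the audio payload and the recognition config."""
    # Imported here so the helpers above can be used before the package is installed.
    from google.cloud import speech

    client = speech.SpeechClient.from_service_account_file(key_path)
    audio_file = speech.RecognitionAudio(content=read_audio_bytes(file_name))
    config = speech.RecognitionConfig(**config_options())
    return client, audio_file, config
```

Calling build_request() needs key.json and myfile.mp3 in the working directory; it does not contact the API until recognize is called on the client.
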
So now, after doing that, let's just recognize our audio file or detect speech in the audio file. So I can just say response, and I can say client.recognize, and in here, first, we need to add the configuration. So config=config, and after that, we need to just give the audio. So audio=audio_file, okay? So now, in here, if I print response, so I can just say print(response). Now, let's just run this. sample_rate_hertz. It's saying unknown field for RecognitionAudio. Oh, sorry, it's not RecognitionAudio, but it should be RecognitionConfig. So like this, okay? So now, this is RecognitionAudio, but this should be the configuration. So let's just run this. Enable automatic, so we have a typo. So enable_automatic_punctuation. Let's run this. And unknown field for enable automatic punctuation. enable_automatic_pun, oh, sorry, in here, we need to just give a "ct". Okay, like this. Now, let's run it again. Let's just wait for this. So now, we can see that we have received the response, and you can see that this is the transcript: "In this lesson, we want to talk about Google Cloud Speech, and also, we are going to learn that how you can convert." Now, in here, it's "you're", but we are assuming "your". Now, make sure that you have a good pronunciation in here, but there is no problem. "Speech to text using Google Cloud." Now, this is the response, and also, we have the language code, and also, like this. Now, let's just use this, so we can just say for. Now, we can access response.results, so I can just say for result in response.results, and in here, I can print, for example.
I can just say transcript, and I can just say .format, and in here, we can use result.alternatives, so I can just say alternatives[0], and after that, I can use .transcript, so .transcript, because we want to get the actual transcript information or the actual, for example, text, so we can just use it like this, and now, let's just run this again, and let's just wait, and now, you can see that we have received the actual transcript: "In this lesson, we want to talk about Google Cloud Speech, and also, we are going to learn how you can convert your speech to text using Google Cloud." Now, it's more accurate, but just in here, we have a problem. Maybe that's the problem from my pronunciation, but there's no problem, and you can see how accurate it is and how we have used Google Cloud Speech, and we have converted our speech to text.
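
The recognize call and the loop over response.results described above can be sketched as follows. This is a minimal sketch: extract_transcripts and transcribe are hypothetical helper names, and the stand-in response in the demo is built by hand so the loop can be tried without calling the API. A real response comes from client.recognize(config=config, audio=audio_file).

```python
from types import SimpleNamespace


def extract_transcripts(response):
    """Collect the first alternative's transcript from each result in a response."""
    transcripts = []
    for result in response.results:
        if result.alternatives:  # skip results that carry no alternatives
            transcripts.append(result.alternatives[0].transcript)
    return transcripts


def transcribe(client, config, audio_file):
    """Send the audio for recognition and return the transcripts as a list of strings."""
    response = client.recognize(config=config, audio=audio_file)
    return extract_transcripts(response)


if __name__ == "__main__":
    # Hand-built stand-in for a response, used only for a dry run of the loop.
    fake_response = SimpleNamespace(
        results=[
            SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello world")]),
        ]
    )
    for text in extract_transcripts(fake_response):
        print("Transcript: {}".format(text))
```

Taking only alternatives[0] mirrors the video; each result can carry more than one alternative, and the first one is the one the video prints.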