Explore Google Cloud Speech API Commands and Uses
Dive into using the Google Cloud Speech API for audio-to-text conversion, its functionalities, and practical tips with Mr. White in this engaging podcast.
File
Speech-to-text conversion with the Google Cloud Speech API (Google Cloud Platform)
Added on 01/29/2025

Speaker 1: Welcome to Mr. White's podcast, the Mr. White vs. AI podcast. In this short demo we are going to see how to convert speech to text with the Google Cloud Speech API, as part of the Mr. White vs. AI podcast playlist. So let's get started.
How does Cloud Speech work? Basically, it uses machine learning for speech transcription, converting audio into text by applying powerful neural network models through an API that is very easy to use. The API recognizes more than 120 languages and variants, to serve a huge international user base. Among other things, it also lets you enable voice command control or transcribe call center conversations from speech to text. In the previous demo, I showed you how to use natural language processing for entity analysis and sentiment analysis, so we could combine those features with Cloud Speech: transcribe calls to text and then analyze that text with natural language processing. In addition, it uses Google's machine learning technology to process audio in real time or from a recording, so we could even do speech-to-text analysis in real time.
What are we going to learn? First, setting up a Google Cloud service to create an API request. We will use the Speech API with audio stored in Google Cloud Storage. It can be stored elsewhere, but we are going to use a file in Cloud Storage. And finally, analyzing the result returned by the API. So let's get started.
Well, friends, we are back here in the Google Cloud console. This is the dashboard that I hope we already know; in the other videos I have explained a little about the Google Cloud Platform. And we are going to use, as I was saying, speech-to-text processing. For this, we need an API key, because we are going to use the REST API.
If we were to use other methods, such as an SDK, we could authenticate in other ways, but in this case we need an API key. We go to the main menu, APIs and Services, and there we find the section called Credentials. In Credentials, we are going to create a new credential. There are three types of credentials: API keys, OAuth 2.0, or service accounts. In this case, we are going to use an API key so we can call the API over REST. We copy it.
To show you the demo, I am going to do it right here in Cloud Shell. You can activate it with this button that you see here next to the question mark. There it is, Cloud Shell. It is a very, very small virtual machine, and it is volatile: any information or any environment variable that you save here can easily be deleted. So this is just for running tests. It already has the Google Cloud SDK and other important tools for using these Google Cloud services, which is why it is quite useful for this type of demo.
Now, with the API key copied, we are going to create an environment variable here so we can use it in the requests we are going to send to the API. We are going to call it API key, and we save the value. Very well.
Now we are going to open the editor. This terminal also has an embedded editor that makes it easier to work with code files. We are going to create a new file here, under File, New File, and name it request.json. In it we save the JSON structure that we have to send to the API so that it can do the processing. In this request.json file, we are going to put some basic information that we have to pass to the API: a JSON object containing a couple of objects. One is config, which holds the configuration. The first setting is the encoding.
The encoding is the format of the audio file that I am passing to Google Cloud, or in this case to the processing API, so that it can do the processing. In this case, it has an encoding of type FLAC. There are several options, right? We can pass an MP3, we can pass a WAV; we can pass different audio formats or encodings. Second, we have to pass the language code of the audio. In this case, we are going to use a sample that Google Cloud already has in a Cloud Storage bucket, to make this demo easier. The language code is English because that is the language of the audio. You could perfectly well pass your own audio from your own Google Cloud bucket, or from another place that is accessible so that it can be processed. The second object is audio, and we have to pass at least the URI, which in this case is a Google Cloud Storage URI. It is important that you pass it that way. And that's it. That's really all it needs to do the processing.
Before we run it, I want you to listen to the audio that it is going to process, so I'm going to play it right now. "How old is the Brooklyn Bridge?" Pretty simple, pretty clean. That is, it doesn't have background noise, so it's going to be simple for the API to process. Could it process something a little more complex? Of course it could. And in a moment we are going to see how to tell how reliable the results from the API are.
Let's go back to the terminal with that file saved, and we're going to make a call to the API. Let's see what I did wrong. Okay, I see what I did wrong: the name of the file was wrong. Here: that file is request.json. So now, curl. You could use Postman; you could use any method to make an HTTP call.
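As a rough illustration of the request.json file described above, here is a minimal Python sketch that builds the same structure and writes it to disk. The field names (config, encoding, languageCode, audio, uri) follow the Speech-to-Text v1 REST API. The values are assumptions: the recording only says the sample is clean English FLAC audio in a Google-owned Cloud Storage bucket, so the en-US language code and the gs:// URI below are placeholders for it, and build_request is a hypothetical helper name.

```python
import json


def build_request(gcs_uri: str, language_code: str = "en-US",
                  encoding: str = "FLAC") -> dict:
    """Build the JSON body for a speech:recognize call.

    The body has two objects: "config" (how to read the audio)
    and "audio" (where the audio lives).
    """
    # The demo stresses that the audio is referenced by a Cloud Storage URI.
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI like gs://bucket/object")
    return {
        "config": {"encoding": encoding, "languageCode": language_code},
        "audio": {"uri": gcs_uri},
    }


# Assumption: placeholder for the public sample used in the demo.
# Replace it with an object in your own bucket.
SAMPLE_URI = "gs://cloud-samples-tests/speech/brooklyn.flac"

if __name__ == "__main__":
    body = build_request(SAMPLE_URI)
    # Save the body so it can be sent as a file, as in the demo.
    with open("request.json", "w", encoding="utf-8") as fh:
        json.dump(body, fh, indent=2)
    print(json.dumps(body, indent=2))
```

Keeping the body in a file, rather than inline in the command, matches the choice made in the demo for readability.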
In this case, to do it right here in the window we're working in, I'm going to use curl. I'm going to make it a little bigger so you can see it. The first thing I pass is a Content-Type header, to say that the information I'm sending is JSON. Then --data-binary, which here is used to upload a file, because that's where we have our JSON. We could perfectly well pass the JSON inline, but it would be less readable, so for this example it's easier to keep it in a file. And finally, the URL I want to send the request to. In this case, it's speech.googleapis.com, /v1/speech, and we use the recognize method. And we have to pass the API key that we previously saved in an environment variable, so that it can do the recognition.
With this, we can make the request. See that it has already done the processing. It gives us alternatives: see that the result is an array of alternatives. In this case, as you heard, the audio is very simple, very clean, easy to understand. And it comes with a confidence value. This is important because it tells us how confident the API is, that is, how accurate its understanding was in going from audio to text. This value can range from 0 to 1, and see that it is very close to 1: it's 0.98. That confidence is very high, so we can say this transcription is very, very accurate. If it were a longer audio, or an audio with background noise, or if the person did not speak in a way that was easy for the API to understand, it would return several alternatives with their confidence values. Then we can decide whether any of these alternatives really makes sense, or we could even combine the alternatives that make the most sense to get a more faithful result.
So you see, this API is that simple. This is the example; I'm not going to go deeper. With this we can transcribe audio to text. This API also lets you process streams in real time. This is very useful, for example, if you want to integrate a real-time video stream, maybe for a client who needs real-time audio-to-text transcription to make subtitles, or to use the translation APIs to turn that text into different languages. You could do it in real time with the streaming API. What you do is open a connection and send chunks of bytes, the audio stream, which it processes, and it sends back the transcription in real time. So that also has great utility and business value.
So that is the first thing we are going to do. I invite you to stay on the channel. We are going to have more examples of the different machine learning and API services that Google Cloud Platform has for us. The next video is about TensorFlow: how to create our own learning models and use them within AI Platform on Google Cloud Platform. So thank you very much, and see you next time. Bye. Mr. White vs. AI podcast.
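The curl call and the confidence check described in the transcript can be sketched in Python with only the standard library. This is an approximation, not the exact command typed in the demo. The endpoint and the key query parameter follow the Speech-to-Text v1 REST API; the API_KEY environment variable name, the helper names, and the fallback response are assumptions made for illustration. The fallback reuses the 0.98 confidence mentioned in the recording but is hand-written, not real API output.

```python
import json
import os
import urllib.parse
import urllib.request

ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request to the recognize method."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def best_alternatives(response: dict) -> list:
    """Return (transcript, confidence) for the top alternative of each result."""
    picked = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Confidence ranges from 0 to 1; treat a missing value as 0.
        top = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        picked.append((top.get("transcript", ""), top.get("confidence", 0.0)))
    return picked


if __name__ == "__main__":
    api_key = os.environ.get("API_KEY")  # assumed variable name
    if api_key:
        with open("request.json", encoding="utf-8") as fh:
            body = json.load(fh)
        with urllib.request.urlopen(build_recognize_request(body, api_key)) as resp:
            response = json.load(resp)
    else:
        # Hand-written stand-in so the sketch runs without credentials.
        response = {"results": [{"alternatives": [
            {"transcript": "how old is the Brooklyn Bridge", "confidence": 0.98}
        ]}]}
    for transcript, confidence in best_alternatives(response):
        print(f"{confidence:.2f}  {transcript}")
```

Picking the highest-confidence alternative is only one possible policy; as the speaker notes, for noisy audio you may want to inspect or combine several alternatives instead.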
