Introduction to Google Speech-to-Text API
Explore speech-to-text integration, its methods, pricing, and a demo using Google Cloud Shell for converting audio to text.
Introduction to Google Speech to text Tutorial-32 TheEducationByte
Added on 01/29/2025

Speaker 1: In this lecture, we'll take a look at the introduction to speech-to-text. So what is speech-to-text? Speech-to-text enables easy integration of Google speech recognition technologies into developer applications. In simple terms, you can send audio and receive a text transcription from the speech-to-text API service.

There are three main methods of calling speech-to-text. The first is synchronous recognition, and there are two options, REST and gRPC. REST is obviously a way of sending information, and gRPC is Google's own RPC, so you can take a look at what gRPC stands for and the technical details if you want. Synchronous recognition sends audio data to the speech-to-text API, performs recognition on that data, and returns results after all the audio has been processed. Synchronous recognition requests are limited to audio data of one minute or less in duration.

The other one is asynchronous recognition. This also supports both REST and gRPC. It sends audio data to the speech-to-text API and initiates a long-running operation. Using this operation, you can periodically poll for recognition results. You can use asynchronous requests for audio data of any duration up to 480 minutes.

The last one is streaming recognition, and here you can use the gRPC protocol only. This performs recognition on audio data provided within a gRPC bidirectional stream. Streaming requests are designed for real-time recognition purposes, such as capturing live audio from a microphone. Maybe it's a live webcast or a speech that you want to translate in real time, or give subtitles, maybe, if you're a broadcaster. Streaming recognition provides interim results while audio is being captured, allowing results to appear, for example, while the user is still speaking. So let's take a look at the free tier limits.
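The three calling methods and their duration limits can be summarized in a small helper. This is only a sketch: the limits (one minute for synchronous, 480 minutes for asynchronous) come from the lecture, while the function name `choose_recognition_method` is hypothetical, and the returned method names (`speech:recognize`, `speech:longrunningrecognize`, `StreamingRecognize`) are assumed from the v1 API rather than stated by the speaker.

```python
SYNC_LIMIT_SECONDS = 60             # synchronous: one minute or less (from the lecture)
ASYNC_LIMIT_SECONDS = 480 * 60      # asynchronous: up to 480 minutes (from the lecture)


def choose_recognition_method(duration_seconds: float, live: bool = False) -> str:
    """Pick a calling method for a piece of audio (hypothetical helper).

    The returned strings are assumed v1 API method names, not something
    the lecture spells out.
    """
    if live:
        # Real-time audio (microphone, webcast) needs streaming, which is gRPC only.
        return "StreamingRecognize"
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "speech:recognize"
    if duration_seconds <= ASYNC_LIMIT_SECONDS:
        return "speech:longrunningrecognize"
    raise ValueError("audio is longer than the 480-minute asynchronous limit")


if __name__ == "__main__":
    print(choose_recognition_method(30))           # short clip
    print(choose_recognition_method(15 * 60))      # 15-minute recording
    print(choose_recognition_method(0, live=True)) # live microphone input
```

For a recorded file, the duration alone decides between the synchronous and asynchronous calls; streaming is chosen only when the audio is still being captured.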
So speech-to-text is priced based on the amount of audio successfully processed by the service each month, measured in increments rounded up to 15 seconds. The free tier allows 60 minutes of speech-to-text every month for free. Beyond that, there is a charge, and the table shows the pricing beyond the free tier. So over 60 minutes, up to 1 million minutes, is 0.006 dollars for every 15 seconds, and so on. There are slightly different rates for the enhanced models, which are video and phone call, where the audio has to be enhanced, but the standard model is slightly cheaper.

So let's take a look at the topics that we are going to cover in the demo. We will try converting one audio speech file, a single sentence, to text, and we'll use the Cloud Shell and a publicly available file in Google Storage for this demo. With a little experimentation, you should be able to try this out with your own audio files. We won't cover that, but I will give you some hints on what you need to do in order to do this. So that was a quick overview of the speech-to-text API. Thank you, and I'll see you in the next lecture.
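The pricing rule from the lecture can be worked through as a short calculation. This is a rough sketch, not an official calculator: the 15-second increment, the 60 free minutes, and the 0.006 dollars per increment are the figures quoted by the speaker for the standard model, while the function name `estimate_monthly_cost` and the choice to round each request up separately are assumptions made for illustration.

```python
import math
from decimal import Decimal

INCREMENT_SECONDS = 15                       # billing increment (from the lecture)
FREE_INCREMENTS = 60 * 60 // 15              # 60 free minutes = 240 increments
PRICE_PER_INCREMENT = Decimal("0.006")       # dollars per 15 seconds (from the lecture)


def estimate_monthly_cost(request_durations_seconds):
    """Estimate a month's charge in dollars (hypothetical helper).

    Assumption: each request is rounded up to the next 15-second
    increment on its own before the free tier is subtracted.
    """
    increments = sum(
        math.ceil(seconds / INCREMENT_SECONDS)
        for seconds in request_durations_seconds
    )
    billable = max(0, increments - FREE_INCREMENTS)
    return billable * PRICE_PER_INCREMENT


if __name__ == "__main__":
    # One 7-second clip counts as a full 15-second increment but stays in the free tier.
    print(estimate_monthly_cost([7]))
    # Two hours of audio: the second hour is billed.
    print(estimate_monthly_cost([7200]))
```

Using `Decimal` for the price avoids floating-point drift when many increments are multiplied out.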
