Speaker 1: In this lecture, we'll take a look at the introduction to speech-to-text. So what is speech-to-text? Speech-to-text enables easy integration of Google speech recognition technologies into developer applications. In simple terms, you can send audio and receive a text transcription from the speech-to-text API service.

There are three main methods of calling speech-to-text. The first is synchronous recognition, and there are two options, REST and gRPC. REST is a way of sending information over HTTP, and gRPC is Google's own RPC framework, so you can take a look at what gRPC stands for and the technical details if you want. Synchronous recognition sends audio data to the speech-to-text API, performs recognition on that data and returns results after all the audio has been processed. Synchronous recognition requests are limited to audio data of one minute or less in duration.

The second is asynchronous recognition. This also supports both REST and gRPC. It sends audio data to the speech-to-text API and initiates a long-running operation. Using this operation, you can periodically poll for recognition results. You can use asynchronous requests for audio data of any duration up to 480 minutes.

The last one is streaming recognition, and here you can use the gRPC protocol only. This performs recognition on audio data provided within a gRPC bidirectional stream. Streaming requests are designed for real-time recognition purposes, such as capturing live audio from a microphone. Maybe it's a live webcast or a speech that you want to transcribe in real time, to give subtitles, maybe, if you're a broadcaster. Streaming recognition provides interim results while audio is being captured, allowing results to appear, for example, while the user is still speaking.

So let's take a look at the free tier limits.
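The choice among the three calling methods described above can be sketched in Python. This is a minimal illustration rather than code from the lecture: the function names and the placeholder bucket path are hypothetical, the duration limits are the ones quoted by the speaker, and the endpoint names (`speech:recognize`, `speech:longrunningrecognize`) are taken from the public v1 REST API, which the lecture does not spell out. The sketch only builds a request; it does not send it.

```python
import json

# Limits as quoted in the lecture.
SYNC_LIMIT_SECONDS = 60          # synchronous: one minute or less
ASYNC_LIMIT_SECONDS = 480 * 60   # asynchronous: up to 480 minutes


def choose_method(duration_seconds, live=False):
    """Pick a recognition method from the audio duration (hypothetical helper)."""
    if live:
        # Live audio (microphone, webcast) needs streaming, which is gRPC only.
        return "streaming"
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "synchronous"
    if duration_seconds <= ASYNC_LIMIT_SECONDS:
        return "asynchronous"
    raise ValueError("audio is longer than the 480-minute asynchronous limit")


def build_rest_request(gcs_uri, duration_seconds, language_code="en-US"):
    """Return (url, json_body) for a REST call; streaming has no REST form."""
    method = choose_method(duration_seconds)
    endpoint = {
        "synchronous": "speech:recognize",
        "asynchronous": "speech:longrunningrecognize",
    }[method]
    body = {
        "config": {"languageCode": language_code},
        "audio": {"uri": gcs_uri},
    }
    return f"https://speech.googleapis.com/v1/{endpoint}", json.dumps(body)


# Placeholder URI, not a real file: substitute your own bucket and object.
url, body = build_rest_request("gs://your-bucket/your-audio.flac", 30)
print(url)
print(body)
```

A 30-second file resolves to the synchronous endpoint. A ten-minute file would resolve to the long-running endpoint, whose response is an operation that you poll for results.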
Speech-to-text is priced based on the amount of audio successfully processed by the service each month, measured in increments rounded up to 15 seconds. The free tier allows 60 minutes of speech-to-text every month for free. Beyond that, there is a charge, and the table shows the pricing beyond the free tier. So over 60 minutes, up to 1 million minutes, is 0.006 dollars for every 15 seconds, and so on. There are slightly different rates for the enhanced models, which are video and phone call, where the audio has to be enhanced, but the standard model is slightly cheaper.

So let's take a look at the topics that we are going to cover in the demo. We will try converting one audio speech file, a single sentence, to text, and we'll use the Cloud Shell and a publicly available file in Google Storage for this demo. With a little experimentation, you should be able to try this out with your own audio files. We won't cover that, but I will give you some hints on what you need to do in order to do this. So that was a quick overview of the speech-to-text API. Thank you, and I'll see you in the next lecture.
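The pricing arithmetic described above can be worked through in a short Python sketch. It uses only the figures quoted in the lecture (15-second increments, 60 free minutes per month, 0.006 dollars per 15 seconds for the standard model up to 1 million minutes). Two points are assumptions not stated by the speaker: that rounding up applies to each request separately, and that the free allowance is deducted from the rounded monthly total. The function names are hypothetical.

```python
INCREMENT_SECONDS = 15            # billing increment quoted in the lecture
FREE_SECONDS_PER_MONTH = 60 * 60  # 60 free minutes per month
PRICE_PER_INCREMENT = 0.006       # dollars per 15 seconds, standard model


def billable_seconds(duration_seconds):
    """Round one request's duration (whole seconds) up to the next 15 s."""
    increments = -(-duration_seconds // INCREMENT_SECONDS)  # ceiling division
    return increments * INCREMENT_SECONDS


def monthly_cost(request_durations):
    """Estimate one month's charge in dollars for a list of request durations.

    Assumption: each request is rounded up on its own, then the free
    allowance is subtracted from the monthly total.
    """
    total = sum(billable_seconds(d) for d in request_durations)
    chargeable = max(0, total - FREE_SECONDS_PER_MONTH)
    return (chargeable // INCREMENT_SECONDS) * PRICE_PER_INCREMENT


# A 16-second clip is billed as 30 seconds.
print(billable_seconds(16))
# One hour of audio stays inside the free tier.
print(monthly_cost([3600]))
# Two hours of audio: the second hour is 240 increments at 0.006 dollars.
print(round(monthly_cost([3600, 3600]), 2))
```

Under these assumptions, the second hour of audio in a month costs about 1.44 dollars at the standard rate.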