Speaker 1: Can you please explain to us this diagram?

Okay, so this is about how the training works. So, like I say, most of the model is shared, but there's parts of it that are specific to LibriSpeech and GigaSpeech. What this decoder is, is it gets the language model history of the last two tokens and it encodes it into a vector. So that's the kind of recurrent part of an RNNT; at least it's recurrent on the RNNT's output. So the decoder and joiner are specific to LibriSpeech and GigaSpeech.

We have two separate data loaders. We don't combine the two data sets into one data loader, because we need to feed them to the appropriate head, and it would be quite inconvenient to split apart the data within a mini-batch to one head or the other. So what we do is we create two data loaders, and I think the way it works is the epoch iterations are over LibriSpeech, and then we just continuously get data from GigaSpeech, and when it runs out, we just restart the data loader, or we put the data loader in some kind of mode where it does that.

Okay, also, can you explain to us how to decode using the GigaSpeech decoder?

That's the kind of thing that people had better create an issue on GitHub to ask about. My guys know about this, but I'm not sure of the specifics.

Okay, can you explain how the GigaSpeech model is trained?

Okay, so our GigaSpeech model wasn't just trained on GigaSpeech; it was trained on LibriSpeech and GigaSpeech. Now, it's difficult to combine those two data sets because they're normalized quite differently. GigaSpeech has things like comma and period as separate words. So what we did is most of the model is shared, but then there's two heads of it: one is for LibriSpeech, one is for GigaSpeech. So that's how we trained, and this API, this is with the LibriSpeech head of it. So it'll output in a way that's normalized like LibriSpeech; it won't give you period and comma and stuff like that. My suspicion is that probably there's not a ton of difference between what the two heads output, because most of the model is shared, but I don't know that for sure. The underlying models should support decoding with GigaSpeech, you just have to pick the right top part of it, but I don't think this API supports the GigaSpeech head.

Thank you for watching.
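The shared-encoder, two-head transducer the speaker describes can be sketched roughly as below. This is a hypothetical, simplified PyTorch sketch, not the actual project code: the LSTM encoder stands in for whatever shared network is really used, the "stateless" decoder encodes only the last two output tokens as described in the talk, and the joiner here scores one decoder state per utterance rather than the full frame-by-token lattice a real RNNT loss consumes. Names such as TwoHeadTransducer, StatelessDecoder, and the corpus argument are invented for illustration.

# Hypothetical, simplified sketch (not the actual implementation) of a
# transducer with a shared encoder and separate per-corpus decoder/joiner heads.
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    # Encodes the last `context_size` output tokens into one vector,
    # playing the role of the "decoder" (prediction network) in the talk.
    def __init__(self, vocab_size, dec_dim, context_size=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dec_dim)
        self.proj = nn.Linear(context_size * dec_dim, dec_dim)

    def forward(self, prev_tokens):           # (batch, context_size) token ids
        return self.proj(self.embed(prev_tokens).flatten(1))  # (batch, dec_dim)


class Joiner(nn.Module):
    # Combines encoder frames with the decoder vector, projects to the vocab.
    def __init__(self, enc_dim, dec_dim, vocab_size):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, enc_dim)
        self.dec_proj = nn.Linear(dec_dim, enc_dim)
        self.out = nn.Linear(enc_dim, vocab_size)

    def forward(self, enc_out, dec_out):      # (B, T, enc_dim), (B, dec_dim)
        return self.out(torch.tanh(self.enc_proj(enc_out)
                                   + self.dec_proj(dec_out).unsqueeze(1)))


class TwoHeadTransducer(nn.Module):
    # Shared encoder; LibriSpeech and GigaSpeech each get their own
    # decoder/joiner head because the two corpora are normalized differently.
    def __init__(self, feat_dim=80, enc_dim=512, dec_dim=512,
                 libri_vocab=500, giga_vocab=5000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, enc_dim, num_layers=2, batch_first=True)
        self.heads = nn.ModuleDict({
            "librispeech": nn.ModuleDict({
                "decoder": StatelessDecoder(libri_vocab, dec_dim),
                "joiner": Joiner(enc_dim, dec_dim, libri_vocab),
            }),
            "gigaspeech": nn.ModuleDict({
                "decoder": StatelessDecoder(giga_vocab, dec_dim),
                "joiner": Joiner(enc_dim, dec_dim, giga_vocab),
            }),
        })

    def forward(self, feats, prev_tokens, corpus):
        enc_out, _ = self.encoder(feats)      # shared part of the model
        head = self.heads[corpus]             # pick the corpus-specific head
        return head["joiner"](enc_out, head["decoder"](prev_tokens))


# Decoding with the GigaSpeech head just means selecting that head:
model = TwoHeadTransducer()
logits = model(torch.randn(2, 100, 80),
               torch.zeros(2, 2, dtype=torch.long), corpus="gigaspeech")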
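The two-data-loader scheme mentioned in the transcript (epochs driven by LibriSpeech, with GigaSpeech drawn from continuously and restarted when it runs out) can be sketched as follows. Again, this is an illustrative assumption rather than the project's actual training loop; the placeholder datasets and the endless() helper are made up. Keeping the corpora in separate loaders means every mini-batch is homogeneous, so it can be routed to the matching head without splitting a batch apart.

# Hypothetical sketch of the dual-loader training loop described above.
import torch
from torch.utils.data import DataLoader, TensorDataset

libri_ds = TensorDataset(torch.randn(64, 80))   # placeholder datasets
giga_ds = TensorDataset(torch.randn(32, 80))

libri_loader = DataLoader(libri_ds, batch_size=8, shuffle=True)
giga_loader = DataLoader(giga_ds, batch_size=8, shuffle=True)


def endless(loader):
    # Yield batches forever, restarting the loader whenever it is exhausted.
    while True:
        for batch in loader:
            yield batch


giga_iter = endless(giga_loader)

for epoch in range(3):
    # One epoch == one full pass over LibriSpeech.
    for libri_batch in libri_loader:
        giga_batch = next(giga_iter)
        # Each batch goes to its own head, so batches are never mixed, e.g.:
        # loss = rnnt_loss(model(..., corpus="librispeech"), libri_targets) \
        #      + rnnt_loss(model(..., corpus="gigaspeech"), giga_targets)
        pass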