Speaker 1: Can you please explain this diagram to us?

Speaker 2: Okay, so this is about how the training works. As I said, most of the model is shared, but there are parts of it that are specific to LibriSpeech and GigaSpeech. What the decoder does is take the language-model history of the last two tokens and encode it into a vector. That's the recurrent part of an RNN-T, at least it's recurrent on the RNN-T's output. So the decoder and joiner are specific to LibriSpeech and GigaSpeech. We have two separate data loaders; we don't combine the two datasets into one data loader, because we need to feed each batch to the appropriate head, and it would be quite inconvenient to split the data within a mini-batch between one head and the other. So what we do is create two data loaders, and I think the way it works is that the epoch iterates over LibriSpeech, and we continuously get data from GigaSpeech; when it runs out, we just restart the data loader, or we put the data loader in some kind of mode where it does that.

Speaker 1: Okay. Also, can you explain to us how to decode using the GigaSpeech decoder?

Speaker 2: That's the kind of thing people should create an issue on GitHub to ask about. My guys know about this, but I'm not sure of the specifics.

Speaker 1: Okay, can you explain how the GigaSpeech model is trained?

Speaker 2: Okay, so our GigaSpeech model wasn't trained only on GigaSpeech; it was trained on LibriSpeech and GigaSpeech. Now, it's difficult to combine those two datasets because they're normalized quite differently; GigaSpeech has things like comma and period as separate words. So what we did is share most of the model, but with two heads: one for LibriSpeech, one for GigaSpeech. That's how we trained it, and this API uses the LibriSpeech head, so it outputs text normalized like LibriSpeech; it won't give you periods and commas and things like that.
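The dual-loader scheme described above (epochs driven by LibriSpeech, with a GigaSpeech batch drawn at every step and the GigaSpeech loader restarted whenever it is exhausted) can be sketched in plain Python. All names here (`CyclingLoader`, `train_one_epoch`) are illustrative stand-ins, not the actual training code:

```python
class CyclingLoader:
    """Wraps a loader factory so the batch stream restarts when exhausted.

    The epoch length is set by the *other* (primary) loader; this one
    just keeps handing out batches indefinitely.
    """

    def __init__(self, make_loader):
        self._make_loader = make_loader
        self._it = iter(make_loader())

    def next_batch(self):
        try:
            return next(self._it)
        except StopIteration:
            # Dataset exhausted: restart from the beginning.
            self._it = iter(self._make_loader())
            return next(self._it)


def train_one_epoch(libri_loader, giga_loader):
    """One epoch = one pass over LibriSpeech; each step also pulls a
    GigaSpeech batch, so each batch can be routed to its own head."""
    steps = []
    for libri_batch in libri_loader:
        giga_batch = giga_loader.next_batch()
        # In the real model, each batch would go through the shared
        # encoder and then its corpus-specific decoder/joiner head.
        steps.append((libri_batch, giga_batch))
    return steps


# Toy data: five LibriSpeech batches, but only two GigaSpeech batches,
# so the GigaSpeech loader must wrap around mid-epoch.
libri = [f"libri-{i}" for i in range(5)]
giga = CyclingLoader(lambda: [f"giga-{i}" for i in range(2)])
schedule = train_one_epoch(libri, giga)
# schedule pairs each LibriSpeech batch with giga-0, giga-1, giga-0, ...
```

Defining the epoch by the LibriSpeech pass keeps step counting simple, and avoids ever having to split a mixed mini-batch between the two heads.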
My suspicion is that there's probably not a ton of difference between what the two heads output, because most of the model is shared, but I don't know that for sure. The underlying models should support decoding with GigaSpeech; you just have to pick the right top part of the model, but I don't think this API supports the GigaSpeech head. Thank you for watching.
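The head-selection point can be illustrated with a toy sketch: the encoder is shared, and choosing a head only swaps the decoder/joiner stage, which is what determines the output normalization. Everything below is a hypothetical stand-in (trivial functions in place of neural networks), not the real API:

```python
def shared_encoder(features):
    """Stand-in for the shared acoustic encoder: identical for every head."""
    return [2 * f for f in features]

# Each head is a stand-in for a corpus-specific decoder + joiner.
HEADS = {
    # LibriSpeech-style normalization: uppercase, no punctuation tokens.
    "librispeech": lambda enc: f"HELLO WORLD ({sum(enc)})",
    # GigaSpeech-style: punctuation emitted as separate word-like tokens.
    "gigaspeech": lambda enc: f"hello world <comma> ({sum(enc)}) <period>",
}

def decode(features, head="librispeech"):
    """Run the shared encoder, then the chosen corpus-specific head."""
    enc = shared_encoder(features)
    return HEADS[head](enc)

out_libri = decode([1, 2, 3])                    # default LibriSpeech head
out_giga = decode([1, 2, 3], head="gigaspeech")  # same encoder, other head
```

Both calls share the same encoder output; only the final stage differs, which is why swapping the "top part" changes the punctuation and casing conventions but not the underlying acoustic model.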