Understanding LibriSpeech and GigaSpeech Model Training
Explains the training and decoding process for a model shared between LibriSpeech and GigaSpeech, highlighting the separate data loaders and the normalization differences between the two corpora.
Dan K2 32 Multiple Datasets in Training Next-gen Kaldi
Added on 01/29/2025

Speaker 1: Can you please explain this diagram to us?

Speaker 2: Okay, so this is about how the training works. Like I say, most of the model is shared, but there are parts of it that are specific to LibriSpeech and GigaSpeech. What the decoder does is take the language-model history of the last two tokens and encode it into a vector. That's the recurrent part of an RNN-T; at least, it's recurrent on the RNN-T's output. So the decoder and joiner are specific to LibriSpeech and GigaSpeech. We have two separate data loaders; we don't combine the two datasets into one data loader, because we need to feed them to the appropriate head, and it would be quite inconvenient to split the data within a mini-batch apart to one head or the other. So what we do is create two data loaders, and I think the way it works is that the epoch iterations are over LibriSpeech, and then we just continuously get data from GigaSpeech; when it runs out, we restart the data loader, or we put the data loader in some kind of mode where it does that.

Speaker 1: Okay. Also, can you explain to us how to decode using the GigaSpeech decoder?

Speaker 2: That's the kind of thing people had better create an issue on GitHub to ask about. My guys know about this, but I'm not sure of the specifics.

Speaker 1: Okay. Can you explain how the GigaSpeech model is trained?

Speaker 2: Our GigaSpeech model wasn't trained just on GigaSpeech; it was trained on LibriSpeech and GigaSpeech. Now, it's difficult to combine those two datasets because they're normalized quite differently: GigaSpeech has things like comma and period as separate words. So what we did is share most of the model but give it two heads, one for LibriSpeech and one for GigaSpeech. That's how we trained it, and this API uses the LibriSpeech head, so it will output in a way that's normalized like LibriSpeech; it won't give you period and comma and things like that. My suspicion is that there's probably not a ton of difference between what the two heads output, because most of the model is shared, but I don't know that for sure. The underlying models should support decoding with GigaSpeech, you just have to pick the right top part of it, but I don't think this API supports the GigaSpeech head. Thank you for watching.
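
To make the shared-encoder, two-head structure concrete, here is a minimal PyTorch sketch. This is not the icefall implementation: the class names, the ModuleDict keys, and the joiner (simplified here to a concatenation plus a linear layer; real transducer joiners typically combine projected encoder and decoder outputs through a non-linearity) are all illustrative.

```python
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    """Encodes the last two emitted tokens (the LM history) into a vector."""

    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # A width-2 convolution over the two-token context stands in for the
        # recurrence of a classic RNN-T prediction network.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=2)

    def forward(self, prev_tokens: torch.Tensor) -> torch.Tensor:
        # prev_tokens: (batch, 2) -> (batch, embed_dim)
        x = self.embedding(prev_tokens).transpose(1, 2)  # (batch, embed_dim, 2)
        return self.conv(x).squeeze(-1)


class TwoHeadTransducer(nn.Module):
    """Shared encoder; the decoder and joiner are duplicated per dataset."""

    def __init__(self, encoder: nn.Module, vocab_size: int,
                 embed_dim: int, enc_dim: int):
        super().__init__()
        self.encoder = encoder  # shared between LibriSpeech and GigaSpeech
        self.decoders = nn.ModuleDict({
            "librispeech": StatelessDecoder(vocab_size, embed_dim),
            "gigaspeech": StatelessDecoder(vocab_size, embed_dim),
        })
        self.joiners = nn.ModuleDict({
            "librispeech": nn.Linear(enc_dim + embed_dim, vocab_size),
            "gigaspeech": nn.Linear(enc_dim + embed_dim, vocab_size),
        })

    def forward(self, feats: torch.Tensor, prev_tokens: torch.Tensor,
                head: str) -> torch.Tensor:
        enc = self.encoder(feats)                # (batch, T, enc_dim), shared
        dec = self.decoders[head](prev_tokens)   # (batch, embed_dim), per head
        dec = dec.unsqueeze(1).expand(-1, enc.size(1), -1)
        return self.joiners[head](torch.cat([enc, dec], dim=-1))
```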
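The two-loader arrangement described above can be sketched as follows. Everything named here (the loaders, compute_loss, the optimizer) is a placeholder; the point is only that the epoch is defined by a full pass over LibriSpeech, while the GigaSpeech iterator is restarted whenever it runs out.

```python
def train(model, libri_loader, giga_loader, optimizer, compute_loss,
          num_epochs: int) -> None:
    giga_iter = iter(giga_loader)
    for epoch in range(num_epochs):
        # One "epoch" is a full pass over the LibriSpeech loader.
        for libri_batch in libri_loader:
            try:
                giga_batch = next(giga_iter)
            except StopIteration:
                # GigaSpeech ran out mid-epoch: restart its loader and go on.
                giga_iter = iter(giga_loader)
                giga_batch = next(giga_iter)

            # Each batch stays whole and goes to its own head, so there is
            # no need to split a mini-batch between heads.
            loss = (compute_loss(model, libri_batch, head="librispeech")
                    + compute_loss(model, giga_batch, head="gigaspeech"))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```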
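Likewise, "picking the right top part" at decode time might look like the greedy search below, written against the sketch model above rather than the actual sherpa/icefall decoding API; blank_id and the per-frame symbol cap are assumptions. With head="gigaspeech" you would expect punctuation words in the output (the GigaSpeech corpus marks punctuation with word tokens such as <COMMA> and <PERIOD>), while head="librispeech" gives LibriSpeech-style output without them.

```python
import torch


@torch.no_grad()
def greedy_decode(model: TwoHeadTransducer, feats: torch.Tensor,
                  blank_id: int, head: str = "gigaspeech",
                  max_sym_per_frame: int = 3) -> list[int]:
    enc = model.encoder(feats)      # run the shared encoder once
    hyp = [blank_id, blank_id]      # two-token context, seeded with blanks
    for t in range(enc.size(1)):
        # Emit at most a few symbols per frame, then move to the next frame.
        for _ in range(max_sym_per_frame):
            prev = torch.tensor([hyp[-2:]])
            dec = model.decoders[head](prev)
            logits = model.joiners[head](torch.cat([enc[:, t], dec], dim=-1))
            tok = int(logits.argmax(dim=-1))
            if tok == blank_id:     # blank means "advance to the next frame"
                break
            hyp.append(tok)
    return hyp[2:]                  # drop the initial blank padding
```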
