Improving a LibriSpeech Model with Better Language Models (Full Transcript)

Improve word error rate by rescoring lattices with an RNNLM in Kaldi, using either the basic Kaldi RNNLM or a PyTorch-based model.

Speaker 1: Hello, this is Daniel Povey, and today we're going to ask him: we trained a LibriSpeech model using the Kaldi scripts. What is the next step? What can we do now to improve its word error rate?

Speaker 2: Hmm. Well, so when you ask that question, I'm going to assume that you trained to the very end of the run.sh, like the chain system. So, I mean, already that's a pretty good system. But if you want to improve the word error rate further, I think the main thing you can do is to use a better language model. So, the default decoding in Kaldi is, I think, with a 4-gram language model; that script should be testing with a 4-gram. That's as good as you can get from an N-gram language model, you know, with graph-based decoding. But you can improve that by rescoring with an RNNLM. There are some scripts in there to rescore with an RNNLM. So, this is a Kaldi-based RNNLM. It's not one of those PyTorch-based transformers or something. So, I mean, it's a pretty basic RNNLM. These days people can do better. And we do have some scripts somewhere in Kaldi with which you can run a PyTorch-based RNNLM. But I think I would recommend using the Kaldi one for now, simply because there are fewer things that can go wrong.

Speaker 1: Will we do rescoring with this new RNNLM?

Speaker 2: Yeah, you'll do lattice rescoring. We don't normally do decoding in the first pass with the RNNLM. So, you decode the entire utterance and then you rescore the lattice.

Speaker 1: Okay. Thank you.

Speaker 2: Okay. Bye.

Speaker 1: Bye.
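
Editor's note: the two-pass idea described above (decode first with the N-gram model, then rescore with the RNNLM) can be illustrated with a minimal Python sketch. This is not Kaldi's implementation. It rescores a small n-best list rather than a full lattice, and it assumes a simple linear interpolation of language-model costs (negative log-probabilities). The class `Hypothesis`, the functions `first_pass_best` and `rescore`, the `rnnlm_weight` parameter, and all the numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription with its costs (lower is better)."""
    words: str
    acoustic_cost: float  # cost from the acoustic model
    ngram_cost: float     # cost from the first-pass N-gram LM
    rnnlm_cost: float     # cost from the second-pass RNNLM


def first_pass_best(hyps):
    """Pick the best hypothesis using only the N-gram LM (first pass)."""
    return min(hyps, key=lambda h: h.acoustic_cost + h.ngram_cost)


def rescore(hyps, rnnlm_weight=0.5):
    """Pick the best hypothesis after mixing N-gram and RNNLM costs.

    rnnlm_weight=0.0 reproduces the first-pass ranking;
    rnnlm_weight=1.0 uses the RNNLM cost alone.
    """
    def total_cost(h):
        lm_cost = (1.0 - rnnlm_weight) * h.ngram_cost + rnnlm_weight * h.rnnlm_cost
        return h.acoustic_cost + lm_cost

    return min(hyps, key=total_cost)


# Made-up costs: the N-gram LM cannot separate the two candidates,
# while the RNNLM strongly prefers the first one.
nbest = [
    Hypothesis("recognize speech", acoustic_cost=10.0, ngram_cost=6.0, rnnlm_cost=3.0),
    Hypothesis("wreck a nice beach", acoustic_cost=9.5, ngram_cost=6.0, rnnlm_cost=9.0),
]

print(first_pass_best(nbest).words)  # wreck a nice beach
print(rescore(nbest).words)          # recognize speech
```

The sketch only shows why a second pass can change the result: the first pass has already produced the candidates, so the stronger language model only has to re-rank them instead of taking part in the search itself.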
