Speaker 2: At the moment, I'm just doing some initial research on training a Vosk model. So my use case here is, I have a collection of audio and video recordings that I'm going to try to transcribe. And
Speaker 3: I need something that's sort of specific
Speaker 2: More specific than what the Whisper models can currently do.
Speaker 3: So there's this version and then there's an lgraph version.
Speaker 2: What is the difference between the Vosk model English US and the Vosk model English US lgraph?
Speaker 3: Okay, so the Vosk model English US is the standard model,
Speaker 2: which has a fixed vocabulary. The vocabulary is determined at the time of training and cannot be changed at runtime. That makes it efficient and suitable for situations where you know what kind of speech you'll be transcribing. Whereas the lgraph model uses a dynamic graph structure, which allows you to modify the vocabulary on the fly, even while the speech recognition process is running. So this makes it ideal for situations where you might encounter new or unexpected words, which for this use case is definitely the ideal choice.
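The runtime vocabulary restriction described above can be sketched with the Vosk Python bindings. This is a minimal sketch, not the speakers' code: it assumes the `vosk` package is installed and that `model_dir` points at an unpacked lgraph model. The helper names `build_grammar` and `transcribe_with_grammar` are hypothetical. The phrase list is passed as a JSON string to `KaldiRecognizer`, and the phrases are expected to use words the model already knows.

```python
import json
import wave


def build_grammar(phrases):
    """Return the JSON list that Vosk accepts as a runtime grammar.

    "[unk]" lets the recognizer label speech outside the list as unknown.
    """
    return json.dumps(list(phrases) + ["[unk]"])


def transcribe_with_grammar(model_dir, wav_path, phrases):
    """Transcribe a 16-bit mono PCM WAV file, restricted to `phrases`.

    Assumes `model_dir` holds a dynamic-graph (lgraph) Vosk model.
    """
    # Imported lazily so build_grammar works without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate(), build_grammar(phrases))
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
    # FinalResult() returns a JSON string with a "text" field.
    return json.loads(rec.FinalResult())["text"]
```

With a static-graph model the grammar argument is not expected to take effect, which is the practical difference between the two model variants discussed here.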
Speaker 3: I think I'm asking the wrong question here.
Speaker 2: So, what I'm hoping to do is to automate some of this process using Ruby. I'm just trying to find out if there's anything that already exists. There sort of was for PocketSphinx, but I don't think there's much Vosk-related material. So, when it talks about language model interpolation, it's referring to the step where, once the models are trained, you interpolate them to create a new model that combines the strengths of both. This can be done using a variety of methods. What I had in mind was the pre-processing steps before you would interpolate the models, which is collecting the data, the text corpus, cleaning the data, and then training the generic model, as well as training the domain-specific model.
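The steps listed above (clean the text, train a generic and a domain-specific model, then interpolate) can be illustrated with a toy sketch in Python. It uses unigram counts only and plain linear interpolation as a stand-in for what an n-gram toolkit would do on real data. The function names, the sample sentences, and the weight `lam` are all assumptions made for illustration.

```python
import re
from collections import Counter


def clean_line(line):
    """Lowercase and keep only letters, digits, apostrophes, and spaces."""
    line = re.sub(r"[^a-z0-9' ]+", " ", line.lower())
    return " ".join(line.split())


def unigram_model(lines):
    """Maximum-likelihood unigram probabilities from cleaned lines."""
    counts = Counter(word for line in lines for word in line.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def interpolate(generic, domain, lam=0.5):
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_generic(w)."""
    vocab = set(generic) | set(domain)
    return {
        w: lam * domain.get(w, 0.0) + (1 - lam) * generic.get(w, 0.0)
        for w in vocab
    }


# Hypothetical miniature corpora standing in for the real text collections.
generic_lines = [clean_line(s) for s in ["The model is trained.", "The audio is clean!"]]
domain_lines = [clean_line(s) for s in ["Vosk model, Kaldi model."]]

mixed = interpolate(unigram_model(generic_lines), unigram_model(domain_lines), lam=0.5)
```

The interpolated table still sums to one, and domain words such as "vosk" gain probability mass they would not have in the generic model alone. A real pipeline would apply the same idea to higher-order n-grams with smoothing.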
Speaker 3: So I was thinking of
Speaker 2: Ruby gems related to natural language processing, but I think I got out of it pretty good, just because it then asked me, well, which libraries do you want to use? I'm like, shoot, I don't think there really are any already for Vosk or Kaldi. So I suggested using TTS, text-to-speech, and Rake. So we'll see if that... The Hugging Face blog post discusses using n-gram language models to boost the performance of the Wav2Vec2 speech recognition model. It says this could be implemented using libraries like TTS or Rake that provide n-gram language modeling capabilities, which isn't true at all, just by the way, and this does not exist either, unless I missed something. Let me check out the link here. Also, it's clearly really good at aggregating search results. I was about to ask a deeper question about going about this programmatically, but at least with the basic service, it's not the best at that.
Speaker 3: I failed to find the interpreter.
Speaker 4: Now what am I talking about?
Speaker 2: setuptools, system-site-packages.
Speaker 3: I don't know if that's what it was.
Speaker 2: I always thought these two were the same thing, kind of like RVM and rbenv in Ruby. Not exactly.
Speaker 3: Oh, I want 3.10.
Speaker 4: Doo doo doo doo doo doo doo da da.
Speaker 2: All right, oh, okay, you have to install it first, I see. Okay, so you have to install the specified version first with pyenv. Then you can create a virtual environment with that version.
Speaker 2: So this is another reason why I'm taking the time to go through this: these tools are a bit further along in development for models like Vosk or PocketSphinx, or, I forget the other one, Kaldi, but as you can see, it does take some configuring.
Speaker 4: Okay.
Speaker 2: I think
Speaker 4: And you can find references to...
Speaker 3: Eight point seven.
Speaker 4: 18.7. Why is this necessary? Oh, okay. Oh, okay.
Speaker 3: Oh, I see.
Speaker 2: So the problem might be with the fact that I had rustup installed as opposed to just Rust.
Speaker 3: Thank you.
Speaker 2: Well, given that it's some sort of packaging system, I'm not sure if it's actually required for this particular piece. The first thing I'm actually giving it is an audio track here.
Speaker 5: West exterminate, yes, yes, gas cap link, stretch more, Mint's hack, point sun red, point hoof trap, slash tide, gust each bat, slash nerd urge, mate each nerd Yes Cap drum, space nerd urge tab, yes Hoof space, shy red tab, Troy Link Yes Sun odd, right Yes Go go go Sss, shh Boom Boom, sss, shh, pew, sss, shh, shh, sss, sss, ah, ah, ah, ah, quick, quick, quick. Each space, bat, yank, each, yes, ice, scribe, goodbye, clause, space, red urge, nerd, space dash, notify, send, space, shout, bat, yank, each, yell, same, space, scribe, sleep, space, two, same, space, pit, scribe, kill, video, point, sun, hoof, save, hype, quench, scribe, echo load space tide plus slash backtab space vert space nerders made each nerdcap yes goodbye
Speaker 6: From spontaneous generation of contextualized sense sense. Contact.
Speaker 6: From spontaneous generation of contextualized sense
Speaker 3: Contextualized sense, electro-rhythmic pulse Electro-rhythmic pulses propagate signaling the transcending self-referential state. Electro-rhythmic pulses propagate signaling the transcending self-referential state. Electro-rhythmic pulses propagate signaling the transcending self-referential state.
Speaker 6: Propagate signaling the transcending self-referential emergent patterns at recursive depths. Emergent patterns at recursive depths.
Speaker 3: Conscious flow engaged, a quick tremor of deep resonance. Thank you so much for watching, and I'll see you in the next video.
Speaker 2: All right, so now I'm going to give it a do.
Speaker 3: Let's see. All right, building a database of words.
Speaker 2: All right, building a database of words. So, yeah, scripts for parsing the transcripts can probably be adapted from the ones that are parsing the files. Or breaks, I don't know. I guess we're gonna start with the LangChain chunker; that seemed to be the chunker.
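A word database built from parsed transcripts could look roughly like the sketch below. It assumes transcripts arrive as lines with a "Speaker N:" prefix, as in this transcript, and stores word counts in SQLite. The table layout and the function names are hypothetical, not taken from the recording.

```python
import re
import sqlite3
from collections import Counter

SPEAKER_PREFIX = re.compile(r"^Speaker \d+:\s*")


def word_counts(transcript_lines):
    """Count words per transcript, ignoring the speaker label on each line."""
    counts = Counter()
    for line in transcript_lines:
        text = SPEAKER_PREFIX.sub("", line).lower()
        counts.update(re.findall(r"[a-z0-9']+", text))
    return counts


def build_word_db(transcript_lines, db_path=":memory:"):
    """Add word counts to a `words` table, creating it if needed."""
    counts = word_counts(transcript_lines)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "word TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    # Make sure every word has a row, then add the new counts to it.
    conn.executemany(
        "INSERT OR IGNORE INTO words (word, count) VALUES (?, 0)",
        [(word,) for word in counts],
    )
    conn.executemany(
        "UPDATE words SET count = count + ? WHERE word = ?",
        [(n, word) for word, n in counts.items()],
    )
    conn.commit()
    return conn


lines = [
    "Speaker 2: All right, building a database of words.",
    "Speaker 3: Building a database.",
]
conn = build_word_db(lines)
```

Calling `build_word_db` again with the same `db_path` on disk accumulates counts across transcripts, which is convenient when recordings are processed one at a time.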
Speaker 3: So yeah, sorry, it's a little loud.
Speaker 2: Yeah, this works pretty well, pretty impressive. Later on today, I hope to explore the videogrep and mp4grep tools.
Speaker 3: So yeah, so the next step will be to look into how the,
Speaker 2: the text corpus is to be structured. The Kaldi toolkit requires a specific structure for the text that it's gonna be trained on. And that'll take a few days. That's a fairly grindy set of tasks that you have to really just zero in on for a while.
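The speaker does not say which structure is meant. One plausible reading is the standard Kaldi data directory, where `text` maps utterance IDs to transcripts, `wav.scp` maps them to audio paths, and `utt2spk` maps them to speakers, all sorted by ID. The sketch below writes those three files. It assumes one recording per utterance; the helper name, the sample IDs, and the audio paths are hypothetical.

```python
import tempfile
from pathlib import Path


def write_kaldi_data_dir(utterances, out_dir):
    """Write Kaldi-style `text`, `wav.scp`, and `utt2spk` files.

    `utterances` is an iterable of (speaker_id, utt_name, wav_path, transcript).
    Utterance IDs are prefixed with the speaker ID and rows are sorted by ID.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = sorted(
        (f"{spk}-{utt}", spk, wav, text) for spk, utt, wav, text in utterances
    )
    (out / "text").write_text(
        "".join(f"{uid} {text}\n" for uid, _, _, text in rows), encoding="utf-8"
    )
    (out / "wav.scp").write_text(
        "".join(f"{uid} {wav}\n" for uid, _, wav, _ in rows), encoding="utf-8"
    )
    (out / "utt2spk").write_text(
        "".join(f"{uid} {spk}\n" for uid, spk, _, _ in rows), encoding="utf-8"
    )
    return out


# Hypothetical sample data; the paths do not need to exist for this demo.
sample = [
    ("spk2", "utt1", "audio/b.wav", "building a database of words"),
    ("spk1", "utt1", "audio/a.wav", "hello world"),
]

with tempfile.TemporaryDirectory() as tmp:
    data_dir = write_kaldi_data_dir(sample, Path(tmp) / "train")
    print((data_dir / "text").read_text(encoding="utf-8"))
```

For language model training alone, a plain file with one cleaned sentence per line is usually what the toolkits expect, so this layout matters mainly when acoustic data is involved.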
Speaker 3: Bye-bye.