Unlock Effortless Live Speech Transcriptions
Learn how to set up and use Watson Speech-to-Text for seamless live transcription in a few simple steps. Perfect even for beginners!
Live Speech to Text with Watson Speech to Text and Python FREE Speech to Text API
Added on 01/29/2025

Speaker 1: We're about to do some live speech-to-text transcription using Watson Speech to Text. Let's dive into it. What's happening, guys? My name is Nicholas Renotte, and in this video we're going to be taking a look at live transcription using Watson Speech to Text, so you'll be able to speak into your microphone and see the transcription done live. Let's take a deeper look at how this is all going to fit together. First up, we're going to jump over to IBM Cloud and set up our Watson Speech to Text service; it's really easy to do, I think it's like two clicks. Then we'll clone down a pre-built template from GitHub that lets us perform live speech-to-text, and all of the code from this video is going to be available in the description below. Once we've got that cloned, we'll set it up, install everything we need, and then perform some live transcription, so as we speak, we'll see the transcription appear on the screen in real time. Now, there are a couple of steps we need to walk through, but they're relatively simple to get going with, even if you're a beginner.
So first up, we're going to set up our Watson Speech to Text service, and these six key steps are what we're going to work through. To set up the service, we'll go to cloud.ibm.com, and I'll show you how to do that. We're then going to clone our live speech-to-text code; this is all pre-written for us, so we can just pick it up, plug in our API keys, and it's going to work. Once we've cloned that code, we'll install a couple of key dependencies, and I've also got a solution if you're running this on Windows and hit an error installing one of those dependencies, PyAudio, but we'll come to that in a second. After that, we'll update our live code with the API key we get from our Speech to Text service, and then we'll actually run our live transcription at the end, so ideally I'll be able to speak into this mic and you'll see the speech converted to text in real time on the screen. Alrighty, on that note, let's start by setting up our Watson Speech to Text service. To do this, we just need to go to cloud.ibm.com/catalog; if I copy that link, you can see I'm already there, but if I paste it in, it takes me to the exact same page. Now, if you don't have an IBM Cloud account, you can set one up, and keep in mind that everything I'm going to show you can be done for free; there's a free tier that you can leverage for speech-to-text, which is actually pretty cool. In terms of getting the Speech to Text service itself, once you're at cloud.ibm.com/catalog, you can just go to Services on the left-hand side and scroll down to AI / Machine Learning.
And if I zoom out, you can see there's a whole bunch of APIs, some of which we've covered already; I believe we've covered baseline speech-to-text, but not live speech-to-text yet. So let's take a look at that. The service we need is Speech to Text, this one down here. If we select it, there are a few things we need to choose. At the moment we're going to be leveraging the Lite plan, which gives you up to 500 minutes for free per month, so you can actually convert 500 minutes of live speech per month, which is pretty cool. We also need to choose our region, and this is quite important, because we're going to need the code for this region later on. I'm based in Sydney, so let me zoom in on that: I'm going to choose au-syd, but I could just as easily choose Seoul, Tokyo, Frankfurt, London, Dallas, or Washington. In this case, we're going to choose au-syd and keep it on that. Make a note of that region, because you'll need it when you go to set up your live speech-to-text service; if you forget to grab it from here, I'll show you where to get it in the next step. So we've selected our location and our pricing plan; the next thing we want to do is hit Create down here, which creates our service, and on the next screen we'll be able to grab our API key as well as our region. Alrighty, so we're now on that next screen. From here we can just select Service credentials on the left-hand side, and then we can grab all the stuff we need to work with our live transcription. If we open up this line here, which says auto-generated service credentials, you're going to get a couple of key lines: you'll get your API key, and you also get this URL, which is useful as well.
First up, we're going to copy our API key into the checklist I was showing you, or just copy it somewhere safe so you've got it available. So we'll copy this and paste it in here next to "API key"; this is just a placeholder, it isn't doing anything at the moment, we're just writing it down. Then we also need our region. Remember when I mentioned you want to take note of the region that you spin up? If you didn't, you can actually just copy it from this URL here: the region code sits right there in the URL, and it indicates where your service is actually located. In this case, our service is located in au-syd, so we want to copy that over. Again, if you used Tokyo or Seoul or Washington or Dallas, just copy that region code, because you'll need it in a second, and we can paste it into our notebook or store it somewhere safe for now. That's really step one done: we've set up our Watson Speech to Text service from cloud.ibm.com/catalog. The next thing we want to do is clone our live speech-to-text code. To get that, we're going to go to github.com/IBM/watson-streaming-stt; this link is going to let you access the live transcription code. Again, all of the code in this tutorial, as well as our little checklist here, is going to be available in the description below, so you can pick that up if you want. So we can grab this link, let's zoom out, and I've already got it open here, but I can just paste that in and it opens the Watson streaming speech-to-text code. What we're going to do from here is basically run through these steps.
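As a quick sketch of that region trick, here's how you could pull the region code out of a credentials URL programmatically. The URL format below is an assumption based on current Watson Speech to Text endpoints, so check it against the actual URL in your own service credentials:

```python
# Hypothetical example: extract the region code from a Watson STT
# credentials URL (URL format assumed; verify against your own credentials)
def region_from_url(url: str) -> str:
    # The region sits between "api." and ".speech-to-text" in the hostname
    host = url.split("//", 1)[-1]
    return host.split("api.", 1)[1].split(".speech-to-text", 1)[0]

print(region_from_url("https://api.au-syd.speech-to-text.watson.cloud.ibm.com"))
# → au-syd
```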
So we're going to be running this install step and setting it up. We'll clone this onto our own machine so we can actually work with the code. If we go back, what I'm actually going to do is open up a command prompt. I've already got one here, and I'm in the same folder as my checklist: you can see I'm in my 24-03-2021 Real Time STT folder, the exact same folder I've got here. To clone that link, I can use git clone and then paste the link of the code I want to clone; if you've worked with git before, this is just a regular git clone. Hit run, and that's going to clone down the code. The next thing you want to do is open this up inside of your favorite code editor. There's not much code we need to write; we really just need to install a bunch of stuff and update a couple of config parameters. I'm going to open it up inside of VS Code by typing code . and in this case it's just opened up my existing VS Code editor. You can now see I don't just have my checklist, I've also got this watson-streaming-stt folder, which contains everything I need to begin our live transcription. On that note, that's step two done: we've cloned our live speech-to-text code. The next thing we can do, as you might have noticed, is install our dependencies. Inside this folder, let's quickly take a look at what we've got: there's a license document, a readme, and a requirements.txt file, which lists the two dependencies we need to install onto our machine. Now, sometimes if you're running on a Windows machine, you might run into some issues when installing PyAudio.
So if you get an error, I'm going to show you how to solve that: you can install a utility called pipwin, which downloads a pre-compiled version of PyAudio and solves those issues, and I'll show you how to do that as well. Then we've got our setup.cfg, which we don't really need to change, and our setup.py, which again we don't need to change. Then we've got speech.cfg.example; this file we do need to change, replacing the placeholders with our API key and our region. And then, if I close all of this, we've also got our transcribe.py file. If we wanted to, we could also change the model we're using in here: down on line 175, you can see a line saying the model we're going to use is en-US_BroadbandModel, but there's a whole bunch of different models you can use with Watson Speech to Text, and I'll show you where to get a list of those. We're actually going to sub this out and use the Australian model; as you might have guessed, I've got a pretty strong Australian accent, so we want to leverage that model. But that's pretty much all the code we've cloned. The next thing we want to do is install these dependencies. I'm going to open up a new terminal inside of VS Code, just to make it a little easier to work in. Actually, that might cover things up while we're doing it, so let's close this. First up, we're going to go into that folder: right now we're sitting in the top-level folder, and we want to go into the watson-streaming-stt folder, so I'm going to cd into it. You can see I'm now in the watson-streaming-stt folder, and then we're going to run this command here.
This is going to install everything in the requirements.txt file, so it installs PyAudio and the WebSocket client. To do that, we can just run pip install -r requirements.txt, and it's going to install those dependencies onto our machine. So let's run that and see how we go. Okay, it doesn't look like I got any errors there; you can see it said "requirement already satisfied", so no big issues. But if you do run into issues, particularly when installing PyAudio on a Windows machine, I'm just going to clear this for now: what you want to do is install pipwin. To do that, run pip install pipwin; I believe it helps when installing compiled packages on Windows. Then, and again we can clear this, I've already got it installed, to install PyAudio and solve that issue, you can type pipwin install pyaudio, which downloads a pre-compiled version of PyAudio and makes sure everything runs successfully. In this case, it looks like it's already installed, with no issues there. So that really covers these two steps: ideally, you just run pip install -r requirements.txt, and if you don't get any errors, you're good to go. If you do get errors on a Windows machine, install pipwin and then run pipwin install pyaudio. That takes care of these two steps. Now, the next thing we want to do is update our setup and config with our API key and our region. Remember, we copied these down when we set up our Watson Speech to Text service right up here; well, now we're actually going to make use of them. We'll grab this API key, copy it, and go into this speech.cfg.example file.
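The install logic above can be summarized in a small sketch. The pipwin fallback only applies on Windows, and the commands mirror the ones from the video:

```python
# Sketch: which install commands to run, including the Windows-only
# pipwin workaround for PyAudio build errors (commands as shown in the video)
import platform

def install_commands(system: str) -> list[str]:
    cmds = ["pip install -r requirements.txt"]
    if system == "Windows":
        # pipwin fetches a pre-compiled PyAudio wheel instead of building from source
        cmds += ["pip install pipwin", "pipwin install pyaudio"]
    return cmds

print(install_commands(platform.system()))
```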
Now, before you paste anything in, what I want you to do is rename that file. If I right-click the file and hit rename, right now it's got .example at the end; what we actually want to do is get rid of the .example. This makes sure our Python code actually goes and leverages this speech.cfg file, because if you leave the .example on, you're going to get a bunch of errors and it's not going to work. So make sure you delete the .example; ideally, it should read speech.cfg. Then we want to paste in our API key: it says apikey equals, and then you've got a placeholder in angle brackets, so we just paste our API key in there. And remember, we changed where our service was going to be hosted; here it says us-south, but we want this to read au-syd, or whatever region you're operating from, so we can paste au-syd in there. That's our setup done, and make sure you save it so it's actually committed. So what we did there was remove the .example from our speech.cfg file, update it with our API key, update it with our region, and save it. We're good to go now; that's step five done. Now, if you wanted to, you could also leverage a slightly different model. Remember when I said that over in here it's currently using this broadband model, model equals en-US_BroadbandModel? If you have a different accent, or you're based in a location other than the US and want a model closer to your accent, you can sub this out and use your own model. So if I go back to our browser, I've got this open now: go to cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models.
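After the rename, speech.cfg ends up looking something like this. The exact key names come from the repo's own speech.cfg.example, so treat this as an assumed sketch with placeholder values rather than a verbatim copy:

```ini
; speech.cfg — renamed from speech.cfg.example (values here are placeholders)
[auth]
apikey = <paste-your-api-key-here>
region = au-syd
```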
Let's copy this into our checklist as our models link. So this link here, cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models, is going to show you all the models you've got available when leveraging the Watson Speech to Text service. If I open that up, you can see there's an Arabic one, a Brazilian one, a Mandarin one, Dutch, English, French, German, Italian, Japanese, Korean, Spanish, and so on. So there's a whole bunch of different models you can leverage, and picking the right one is going to work a lot better for your particular accent. Now, you can probably hear that I've got an Australian accent, so we're going to copy our Australian model, and we're going to use the broadband one: en-AU_BroadbandModel. Keep in mind, you don't need to do this; it's optional. If you don't want to change the model, you can keep the one that's in there, the US broadband model, this one down here. If you do want to change it, just copy the model you want from here, then jump into your transcribe file, and, I believe it was at line 175, change it there. So if we just change this model to our AU broadband model, we should be good to go. That's really everything done: we've set up our service, cloned our live code, installed our dependencies, set up our config, and we're going to change this model. Now it's time to run our live transcription. In order to do this, I'm just going to move this to the side so you can see it a little better, open up a terminal, and clear it; I'll just size it so it should be about there. So that's good.
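To illustrate where the model name ends up, here's a hypothetical helper building the kind of /v1/recognize websocket URL that the transcribe script opens under the hood. The host format is an assumption based on current Watson Speech to Text endpoints, not code copied from the repo; the real script reads these values from speech.cfg:

```python
# Hypothetical sketch: the chosen model is passed as a query parameter
# on the recognize websocket URL (endpoint format assumed, not taken
# verbatim from the repo's transcribe.py)
from urllib.parse import urlencode

def recognize_url(region: str, model: str) -> str:
    host = f"wss://api.{region}.speech-to-text.watson.cloud.ibm.com"
    return f"{host}/v1/recognize?{urlencode({'model': model})}"

print(recognize_url("au-syd", "en-AU_BroadbandModel"))
```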
So what we're now going to do is actually run this code, so I'll be able to speak into my microphone and you'll see the live transcription appear on the screen. To run this, we need to run the transcribe.py file: we just type python, then transcribe.py, then -t, and then we pass through how long we actually want to record for. In this case, we can type -t 20, which means it's going to transcribe for 20 seconds. Again, you can change this to whatever you want; if you want to transcribe for longer or shorter, just pick the timing you want. So the full command is python transcribe.py -t 20, meaning it's going to transcribe for 20 seconds. If we run this, ideally what should happen is that we transcribe for 20 seconds and then see all of the speech appear on the screen. So let's run it. It should read out "recording", and you can see that it's now transcribing in real time, and it's performing really, really quickly. I can keep talking and it's going to keep transcribing, and we'll keep seeing the results displayed to the screen until we hit our 20-second time limit. Pretty cool, right? You can see there that it's now completed, we've got our transcription, and we've got our larger transcription displayed here as well, which makes it really easy to transcribe live. Now, if we wanted to transcribe for a shorter amount of time, we can just change that command. Let's clear this and run the command again, but say we wanted to transcribe for five seconds: we could just type in five and it's going to transcribe for five. Hey, how's it going?
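The -t flag is just a timeout argument. Here's a minimal argparse sketch of how such a flag is typically wired up; the repo's actual parser in transcribe.py may differ in defaults and option names, so treat this as an assumption:

```python
# Minimal sketch of a -t/--timeout flag like the one transcribe.py accepts
# (the default and help text here are assumptions, not copied from the repo)
import argparse

def parse_timeout(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Live transcription timing")
    parser.add_argument("-t", "--timeout", type=int, default=5,
                        help="seconds of audio to capture before stopping")
    return parser.parse_args(argv).timeout

print(parse_timeout(["-t", "20"]))  # → 20, i.e. transcribe for 20 seconds
```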
We're performing a live transcription example using Watson Speech to Text. And again, you can see that it cut off midway through that last transcription, partway through a word, but you can see that it's written "Hey, how's it going, we're performing a live transcription". You can play around with this and you'll get differing results: if you're speaking more clearly, you're going to get better results, and if you've got lower-quality audio, you're going to get worse results. So let's try it again for 15 seconds. Hey, how's it going? We're performing a live transcription using Watson Speech to Text. And now we're done, and you can see that we've got our transcription there: "Hey, how's it going? We're performing a live transcription using Watson Speech to Text." Pretty cool, right? So that about wraps it up for live transcription. We've now gone through all of our six key steps: we set up Watson Speech to Text, cloned our code down, installed our dependencies, installed pipwin (particularly relevant if you're installing on a Windows machine and get errors installing PyAudio), updated our setup and config with our API key, and then ran our live transcription. And you can see that just by running that Python command, python transcribe.py -t and then however long you want to transcribe for, you're able to get that live transcription. It actually performed reasonably well on that last run: it wrote "Hey, how's it going? We're performing a live transcription using Watson Speech to Text." And that about wraps it up. Thanks so much for tuning in, guys. Hopefully you enjoyed this video; if you did, be sure to give it a thumbs up, hit subscribe, and tick that bell. Let me know what you're using live transcription for, and if there are any other use cases you'd like to see done.
Thanks again for tuning in. Peace.
