Comparing Python Speech Recognition and IBM Watson
Explore the performance differences between Python's SpeechRecognition library and IBM Watson's Speech-to-Text API for building a digital assistant.
Python Speech Recognition Testing with IBM Watson Speech Recognition API 132
Added on 01/29/2025

Speaker 1: Hello, world. Today I'll be comparing the performance of the standard SpeechRecognition Python library that I've used in all my previous videos against IBM Watson's Speech-to-Text API. I received a comment on my popular video, How to Make Your Own Digital Assistant Like Alexa or Siri, where I run through several commands, and the commenter wanted to know how the digital assistant responds so quickly. For full transparency, it really doesn't; that's a lot of video editing on my part so it stays interesting for the audience. As the program continues to run, it listens for a command roughly every five seconds, and it slows down. So when I yell out the wake word, Shane, it's slower to respond, and sometimes the command takes longer. So now I want to explore alternatives, and I wanted to check out the IBM Watson Speech-to-Text API. But first, welcome to the 132nd video on my channel, where I'm building my own digital assistant named Shane, like Jarvis from the Iron Man movies and comics. Speech recognition is one of the most vital parts of having a real digital assistant, something as close as possible to Jarvis. We're going to run through two really quick examples, trying to make an apples-to-apples performance comparison, and we're going to use the timeit library, which is part of Python's standard library. First we'll try SpeechRecognition, which lets you stream easily from your microphone and is what we've used in all my previous videos. Then we'll use a pre-recorded sample for the IBM Watson test, because it does not have an easy way to stream from your microphone; that's something we could fix if, for some reason, it performs better. So we're in the SpeechRecognition script, and we're going to run it. It's going to say, waiting for a command, and then I'm going to say my test.
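As a reference for the timing approach used in both tests, here is a minimal sketch of timeit.Timer with a lambda, timing a throwaway stand-in function once; the function and its input are placeholders, not the actual recognition code:

```python
import timeit

def fake_recognize(text):
    # Stand-in for a recognition call; it just normalizes its input.
    return text.lower()

# Wrap the call in a lambda so arguments can be passed to Timer,
# then run it exactly once, as in the video.
t = timeit.Timer(lambda: fake_recognize("Test, Test, Test"))
elapsed = t.timeit(number=1)
print(f"took {elapsed:.6f} seconds")
```

The same Timer-plus-lambda pattern is reused for both the microphone test and the IBM Watson test.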
And then we're going to time how long it takes to analyze and print what we said. So let's check that out now. Test, test, test. All right, it correctly printed out test, test, test, and it took less than a second. And now we have this MP3 here. Test, test, test. It's just me saying test, test, test, like the previous one. Now we're going to use the IBM Watson test, and we'll go through how to set that up in a second. Okay: test, test, test. That's correct, and it took about 2.7 seconds. Note that neither of them works while I'm not connected to the Internet, so that's a concern of mine as well. For the standard SpeechRecognition approach we've been using, you have to install the library first, pip install SpeechRecognition, and then import speech_recognition as sr. You also import timeit, which is just used for the timing. You set up a recognizer with recognizer = sr.Recognizer() and a microphone with microphone = sr.Microphone(), and then we use a try and except. With the microphone as the source (this is why SpeechRecognition is so commonly used: it's easy to stream from your microphone, and you saw the waiting-for-command prompt), I like to adjust for ambient noise, because that's what I use in my real digital assistant, and I set the energy threshold based on some tests that I did. Then we wait for five seconds. If we're not connected to the network, we'll get a request error and pass; if I don't say something within five seconds, it will pass; and if it doesn't understand me, it will just pass. Then, to work with timeit, you have to put the code in a function. So we pass the recognizer and the microphone into the function, get the audio, and then print the command.
So command = recognizer.recognize_google(audio); there are different recognizer options you can use with SpeechRecognition. You pass it the audio, and then we print the command, which is what you saw when it said test, test, test. Then, to use timeit, you set t = timeit.Timer(...), and when you need to pass arguments into the function being timed, go ahead and just use a lambda, which is a one-time throwaway function. So it's timeit.Timer(lambda: sr_test(recognizer, microphone)), where sr_test is the function we defined, and the recognizer and microphone are the variables we set up. Then you print t.timeit(number=1), because we only want to run it once, and that's what you saw when it finished in under a second. We've used that before, so I'm not going to go too much into it; we're going to focus on the IBM test. The first thing you do is pip install --upgrade, so you always have the recent version, and the package is ibm-watson, with a minimum version constraint (at the time of this video, roughly ibm-watson>=5.1.0). To know if that ever changes — and I'll put these links in the description — you can go to the docs themselves, which show the Python code on the right, and they'll update it as versions upgrade. Once you pip install it (I do recommend installing it first), go to cloud.ibm.com/catalog, and scroll all the way down or search the catalog for Speech to Text. That takes you to the page, and you sign up from there. For free, you get 500 minutes per month of speech-to-text transcription. And IBM Watson, I've heard, is great for recording meetings and transcribing meeting minutes; that's its strength, maybe not streaming from my microphone, but we can code that if for some reason it was quicker.
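Putting the pieces together, the microphone test described above looks roughly like this. This is a sketch, assuming the SpeechRecognition package is installed and a working microphone is attached; the energy threshold value is illustrative, not the one from the video:

```python
import timeit
import speech_recognition as sr  # pip install SpeechRecognition

recognizer = sr.Recognizer()
microphone = sr.Microphone()

def sr_test(recognizer, microphone):
    try:
        with microphone as source:
            # Calibrate for background noise, then wait up to 5 s for speech.
            recognizer.adjust_for_ambient_noise(source)
            recognizer.energy_threshold = 400  # illustrative value
            print("Waiting for command...")
            audio = recognizer.listen(source, timeout=5)
        command = recognizer.recognize_google(audio)
        print(command)
    except sr.RequestError:       # no network connection
        pass
    except sr.WaitTimeoutError:   # nothing said within five seconds
        pass
    except sr.UnknownValueError:  # speech was unintelligible
        pass

# timeit needs a zero-argument callable, so wrap the call in a lambda.
t = timeit.Timer(lambda: sr_test(recognizer, microphone))
print(t.timeit(number=1))
```

Because the recognizer streams from live hardware and calls Google's web API, this only runs on a machine with a microphone and a network connection.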
And so you'll have to sign up, just email and password, and you'll need two things, which you get when you sign up: an API key and a URL. Once you've signed up and validated your account, you have the API key. Then, from ibm_watson, which we just installed, you import SpeechToTextV1, and from ibm_cloud_sdk_core.authenticators you import IAMAuthenticator. The next part is where a lot of my viewers get confused: the line from keys import IBM_API_KEY refers to my personal file, since I have a YouTube channel and don't want everybody to have access to my API keys. I have a file called keys.py, and from it we import the API key. If you don't want to do that, just pass in the string directly; it'll be some super long API key that's unreadable. It will look similar to the one on screen, but that isn't it — that's what's called a service ID. So you pass it the IBM key. If you're a new viewer of mine, please don't literally type from keys import IBM_API_KEY. Then we import timeit, and we put the whole IBM test into a function. Inside it, authenticator = IAMAuthenticator(key), and then stt — short for speech to text, though you can name the variable whatever you want — equals SpeechToTextV1(authenticator=authenticator), the class we imported up top. (And let me fix that closing parenthesis; having it on the next line makes it look like a C++ program. Sorry your eyes had to be exposed to that.) Then stt.set_service_url(...) takes the service URL.
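The keys.py pattern mentioned above is just a one-line module holding the credential, kept out of version control; the value below is a placeholder, not a real key:

```python
# keys.py -- keep this file out of version control (add it to .gitignore).
# The value below is a placeholder, not a real key.
IBM_API_KEY = "paste-your-long-unreadable-api-key-here"

# In the main script you would then write:
#   from keys import IBM_API_KEY
```

This keeps the secret out of the script you show on screen or commit to a public repo.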
And the service URL is the second thing you'll get when you sign up for Speech to Text in the catalog: you get an API key and a service URL. The URL I'm not too concerned about users having access to. Then, with open, you pass it whatever MP3 you want. It has to be in your project folder; if it's not, use the full path, like C:\Users\..., but if it's already in the project path you just write open('untitled.mp3', 'rb') — 'rb' for read bytes — as f, which is just the file. Then you do res, short for result, equals stt.recognize(...), passing audio=f (this file), content_type='audio/mp3', and a model — we're using a US model, but there are tons of models you can use — plus continuous=True, which I'm not exactly sure what it does. Then we call .get_result(), one of the methods this library gives you access to. The result is actually a nested dictionary with a lot in it, and what we want is just the text. So text equals res['results'][0]['alternatives'][0]['transcript']: the zeroth index of results, then the zeroth index of alternatives, the very first one, and then the transcript. We can see what that looks like by printing the whole result first, so let's check: print(res). Okay, that took about three seconds, probably because it had to return everything. What we get is this results key, which is a dictionary — and let me move my face out of the way — containing final and alternatives. So we take results, then the zeroth index, which gives you access to final and alternatives; then alternatives, then its zeroth index, the very first one; and then the transcript, which is test, test, test.
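The whole IBM test described above, sketched end to end. This assumes ibm-watson is installed, keys.py defines IBM_API_KEY, and untitled.mp3 sits in the project folder; the service URL and model name are illustrative placeholders (the video only says "a US model"), so substitute the values from your own account:

```python
import timeit

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from keys import IBM_API_KEY  # or paste the key string directly

authenticator = IAMAuthenticator(IBM_API_KEY)
stt = SpeechToTextV1(authenticator=authenticator)
# Use the service URL from your own Speech to Text instance.
stt.set_service_url("https://api.us-south.speech-to-text.watson.cloud.ibm.com")

def ibm_test():
    with open("untitled.mp3", "rb") as f:
        res = stt.recognize(
            audio=f,
            content_type="audio/mp3",
            model="en-US_BroadbandModel",  # assumed model name; many exist
            # The video also passes continuous=True; newer SDK versions
            # have dropped that parameter, so it is omitted here.
        ).get_result()
    # res is a nested dictionary; drill down to the transcript text.
    text = res["results"][0]["alternatives"][0]["transcript"]
    print(text)

t = timeit.Timer(lambda: ibm_test())
print(t.timeit(number=1))
```

Running this requires valid IBM Cloud credentials and a network connection, which is why the timing includes the round trip to the service.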
So, if you don't know about nested dictionaries, that can seem confusing, but that's how we printed out the text, test, test, test, from the transcript. Then we timed it: same thing, t = timeit.Timer(lambda: ibm_test()). There are no arguments this time, since I encapsulated everything into one function, and then we printed the time, running it only once. As for streaming: there is code out there — I've seen it, and I was going to include it but chose not to — that lets you stream your microphone into a separate function; it requires some threading. But SpeechRecognition is still the easiest, or at least the easier of the two speech recognition sources. So if you know another one you want me to try, please leave a comment. I do think IBM Watson is probably a lot better for large file transcriptions. And as you saw, the free tier is 500 minutes per month, which I think is actually kind of great, and beyond that you have to pay for additional service. So I hope you enjoyed this video. If you know of a better speech recognition option, again, please leave a comment. Please subscribe to my channel if you want to continue watching me build my own digital assistant named Shane, and like this video. All right, thanks for watching. Goodbye, world.
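As a postscript on the nested-dictionary indexing walked through above, here is the same extraction run against a hand-built dictionary in the shape Watson returns; the dictionary below is a simplified mock, not real API output:

```python
# A simplified mock of the shape of a Watson recognize() result.
res = {
    "results": [
        {
            "final": True,
            "alternatives": [
                {"confidence": 0.95, "transcript": "test test test"}
            ],
        }
    ],
}

# results -> zeroth result -> zeroth alternative -> transcript
text = res["results"][0]["alternatives"][0]["transcript"]
print(text)  # test test test
```

Each bracket peels off one layer: the list of results, the first result, its list of alternatives, the best alternative, and finally the transcript string.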
