Creating Python Speech-to-Text Application

Learn how to develop Python code for converting speech to text using audio input and file output in a continuous loop.

Creating a Speech to Text Program with Python

Added on 01/29/2025

Speaker 1: Hello everyone, my name is Oscar and welcome to my channel. In this video, I am going to show you how to create Python code that will be able to hear you and then convert your speech to text. In order to do this, we will break this task up into two basic steps. The first step is to have our Python program listen for audio input. Specifically, audio input that can be perceived as speech. The second step is to have our Python program output that speech into text. And once we have those two steps done, we can then put those two steps in a while loop so that the program can convert speech to text indefinitely. To get us started, I created this file. It has the basic structure that we need to write this program. It has two functions which will correspond to the two steps that were outlined in the slide that we just saw. The first function, record text, will allow the Python program to use our computer's microphone to listen for audio input. Then, if it is able, it will take the audio input and turn it into a string. The second function, output text, will allow the Python program to take the string produced from the previous function and output it to a text file, which would complete our speech to text transition. And finally, down here we have a while loop, which simply calls these two functions on repeat, allowing the program to convert from speech to text indefinitely. And every time it does, it will print something telling us so. Now, all we have to do is write the function definitions for these two functions. I'll start with the harder one, which is record text. For this function, we are going to need to import these two libraries here. And they're needed so that the Python program can access our microphone. In order to be able to use these libraries, you'll have to install them on your computer. And you can do that by following documentation on these libraries, which I've linked down below in the description. 
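The file structure described above might look roughly like the sketch below. The transcript does not show the actual file, so the snake_case names are a guess at how "record text" and "output text" are spelled, the function bodies are stubs, and the run helper with its record, output, and rounds parameters is a hypothetical addition so that the loop can be tried with stand-in functions.

```python
def record_text():
    """Step 1 (stub): listen for speech and return it as a string."""
    raise NotImplementedError("step 1 is not written yet")


def output_text(text):
    """Step 2 (stub): write the recognized string to a text file."""
    raise NotImplementedError("step 2 is not written yet")


def run(record=record_text, output=output_text, rounds=None):
    """Call the two steps on repeat and report each pass.

    With rounds=None the loop never ends, matching the "indefinitely"
    behavior described in the video. The parameters are illustrative
    additions, not part of the original file.
    """
    completed = 0
    while rounds is None or completed < rounds:
        text = record()      # step 1: speech -> string
        output(text)         # step 2: string -> file
        print("Wrote text")  # tell the user a pass finished
        completed += 1
    return completed
```

Calling run() with no arguments would loop forever once both stubs are filled in; passing rounds=2 and two small stand-in functions is enough to check the control flow without a microphone.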
I've already installed these prior to recording this video. I personally had to run these three commands here to get it to work. I'm on macOS, but other operating systems might need to use different commands. And the documentation link below should be able to clear up which commands you would need to run. Now, once you have these libraries ready, you will then have to initialize the recognizer. And this is the Python object that is used to interact with your microphone. And after that, you'll have everything that you need to start writing the function. To start with, we will have the entire function exist in one big while loop. We're going to do this because in the event that the recognizer receives audio that it cannot convert into text, we need it to try again. This can happen if the audio is just impossible to understand. Now, inside of this while loop, we will also need to have a try/except so that we can try again if the audio cannot be converted into text for whatever reason. Now, in the except section, we'll check for these two exceptions. And this will cover the case where the audio cannot be understood. Inside of the try, the first thing that we do is use the microphone as a source of input. Then we need to prepare the recognizer to receive input. We do that with the adjust for ambient noise function, which, like it sounds, prepares the recognizer for ambient noise. After that, we can use the recognizer's listen function to finally get the audio input from the microphone. Once we have it, this input is then passed back to the recognizer, which will try to use Google to convert that audio into a string. If it is able, it will store the text into the string. And if not, it will throw an error, which would just bring us back to the top of the loop and try to get more audio again.
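A minimal sketch of the function described here, under a stated assumption: the transcript never names the libraries, so this uses the SpeechRecognition package (imported as speech_recognition, with PyAudio installed for microphone access), whose Recognizer offers the ambient-noise, listen, and Google recognition calls mentioned above. The 0.2-second calibration time and the choice of which two exceptions to catch are guesses, not taken from the video.

```python
def record_text():
    """Listen on the default microphone until speech is recognized, then return it."""
    # Assumption: the SpeechRecognition package. It is imported inside the
    # function so that this file still loads where the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    while True:
        try:
            # Use the default microphone as the audio source.
            with sr.Microphone() as source:
                # Calibrate against background noise (duration is a guess).
                recognizer.adjust_for_ambient_noise(source, duration=0.2)
                # Block until a phrase has been captured.
                audio = recognizer.listen(source)
            # Ask Google's web speech service to turn the audio into a string.
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            # The audio could not be understood: go back and listen again.
            continue
        except sr.RequestError as error:
            # The recognition service could not be reached: report and retry.
            print(f"Could not request results: {error}")
            continue
```

Retrying immediately after a RequestError keeps the sketch close to the "try again" behavior described above; a longer-running program would probably want a pause or a retry limit there.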
Assuming, however, that this does work and that we get the text and it's stored in my text, we are finally able to just return the text that we got and finish with this function. This now brings us to the easier second function, output text. For this function, we don't need to import anything as Python's basic packages will allow us to do what we need. The first thing we do is open a text file by using the open function. We'll call the file output.txt, and this function will allow us to access the output.txt file that exists in the directory where you're running this program. If no file with that name exists in this directory, then it will just make a new one for you. The second parameter here, the "a", will be used to indicate that we want to append the text to the end of the file. This way, anything that is written to the file is just added to the end of it rather than overwriting the whole file with whatever text we have. Once that is done, our access to the file will be stored in the f variable. Then, we can use the write function to append our text to the end of the file. After that, we want to write a newline character so that our text is separated by a new line. That way, all of what is outputted isn't just bunched up on one line if we say multiple things. Then, we need to close our access to the file. With that, the output text function is done, and we are ready to test this. I'll do that by using this terminal. First, I'll create the output.txt file by using the touch command. Then, I will use the tail command to output whatever is written to that file. Currently, it is empty, and the hope is that after I start the Python script, whatever I say should be outputted into the text file. So, let's run this. I hope this works. Awesome. That works. I was not expecting it to, but it looks like it did, which makes me happy. There you have it. You now have Python code that is able to convert speech to text.
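The file-writing step described above could be sketched as follows. Two things differ from the video and are assumptions: the path parameter is a hypothetical addition so the function can be pointed at another file, and a with block is swapped in for the explicit open and close calls, since it closes the file even if a write fails.

```python
def output_text(text, path="output.txt"):
    """Append text to the file at path, followed by a newline."""
    # Mode "a" appends to the end of the file and creates it if it is missing.
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)
        # Keep each recognized phrase on its own line.
        f.write("\n")
```

Calling output_text("hello") and then output_text("world") leaves the two words on separate lines at the end of output.txt in the current working directory.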
You can use something like this in a larger application where your primary source of user input is audio, something like an Alexa type app. I hope you found this helpful, and if you have any questions or if you wanted me to expand on this more in any way, please let me know down below. I'll do my best to respond to you, and also, if you got something out of this, or if you like content like this, I'd really appreciate it if you could like and subscribe. My goal is to get you content like this at least every week or two, and hopefully that happens. Again, my name is Oscar. Thanks for watching, and until next time.