Creating a Voice-Activated Chatbot with APIs

Convert Your Audio To Text

4.9/5

3726 customer reviews

Learn to integrate ChatGPT, Whisper API, and voice recognition to build a voice-activated chatbot. Follow the tutorial for coding tips and GitHub resources.

How to Set Up ChatGPT and Whisper API with Voice Recognition for Your AI Assistant

Added on 01/29/2025

Speakers

Add new speaker

Speaker 1: What's up? Today I'm going to show you how I use the ChatGPT API, Whisper API, and then the Speech Recognition in order to create a chatbot. Let me go ahead and give you a quick demonstration on that real quick. Hey Vivi.

Speaker 2: I'm listening.

Speaker 1: Can you help me decide what I want to eat for dinner tonight?

Speaker 2: Of course. What type of cuisine are you in the mood for?

Speaker 1: Um, something with meat in it.

Speaker 2: Great. Do you have any particular type of meat in mind? Such as chicken, beef, pork, or something else?

Speaker 1: Uh, let's say pork.

Speaker 2: Sounds good. How about a classic dish like pork chops with mashed potatoes and roasted vegetables? Or something more exotic like stir-fried pork with rice and vegetables?

Speaker 1: Uh, that sounds pretty good. Can you give me a recipe for the first one? And so that's a quick example from my Vivi project that I'm working on right now. The code is in the GitHub that I have in the description. Um, but I'm going to go ahead and show you how you can do that for your own, um, with Voice Assistant and whatnot. So let's go ahead and jump into code. My guess is that if you're following this tutorial, that you have Python installed. And then you also have Visual Studio or some type of IDE installed. As those are the prerequisites for this. So make sure you have those installed. Okay, so what you're going to want to do is create a new file and then name it .py. So I have one called voiceassistant.py. That's what I'm going to be using as the example file. And what I have here is the Vivi class that I've been using to build my chatbot. Um, and I'm going to be using it to help me guide, um, guide us through this process. Because I don't remember exactly everything, um, that I did. So the first thing that we're going to have to do is we're going to need to import some things into the script. In order to do that, you're going to have to do some pip installs. So you're going to have to do pip install openai. You're going to have to do pip install pyttsx3. And then you're going to do pip install speechrecognition. Okay, so all of those pips will be down below in the description. Um, and then what we're going to have to do is, um, import all of those into the script. The first thing that we're going to do though is get a prompt completion from openai. So let's go ahead and start with that first before we integrate voice into it. We're going to go ahead and import openai. And the function that I used for this one was a response completion. But in order to do that, we have to initialize some things first. So let's go ahead and initialize chatgpt. In order to initialize chatgpt, what we have to do is set up some type of key. So we'll just call it key equals, um, this is where you enter in your chatgpt key. And to do that, you have to go to openai and create a API key. Um, I'm going to show you this just as an example. Um, I will be deleting this API key after the video. So don't even try to use it because it won't work. So that is, um, what we're going to input into here. So you can see the key is there. And then, um, we're going to go ahead and set it to openai. API key equals key. So now we have the openai key set to this. And we can, um, use the prompt completion that we're going to be, uh, needing for chatgpt. Okay, so I realized the window was getting a little cramped. So I just moved that over to another screen. Okay, so once we have the API key set, uh, we're going to now need to be able to do a chatgpt completion. So, um, let's just go ahead and start with what I have here. So this is going to be a file, um, that we would have to import a personality into. So let's just go ahead and create a, um, personality variable. Personality variable. And then we're going to set it equal to some text file. So we're going to call this text file, um, personality.txt. And put those inside of the quotes. So, uh, what this is going to do is it's going to take whatever text is in the personality.txt and read it into chatgpt in order for it to respond in. In order for it to respond in whatever fashion you want it to. So, um, let's just go ahead and create that text file. Let's just do you are a helpful assistant. Assistant. Period. And then we'll go ahead and do file save. And then we'll save this as personality.txt. So now we have the prompt. We've got personality.txt. And then we're going to be opening it as, um, the mode. And so this messages list here is pretty key in order to set up the chatgpt assistant. Um, in order for it to respond back to you. And we'll be using more from that, um, as we go on. So, okay. And so the next thing that we have is something called completion. We set it equal to openai.chatcompletion.create. And then we have the parameters that we're going to pass into the completion in order to get our response. So, the model we have is gpt 3.5 turbo. And this is the latest edition, March 1st. Um, the messages is going to be the messages list here. And we're going to set it to the role. Um, we're going to set the system role to what we have inside of the content for personality. And then we have a temperature parameter here, um, for 0.8. And that's basically just the randomness of the response that it's going to give you. And so once we have this, the next thing we have to do is just print out what the response is. And so this can be done by using these lines here. So we're going to be taking in a response for the completion that it sends back. Um, the most recent response with the zero here. And then we're going to append the, um, we're going to append the list of messages. This is, this append is needed in order for it to kind of store the conversation that you've been having. And it's going to store inside of this messages list here. Alright, and so now that we have the response completion and we have the system, what we need to do is have some type of user input. So we're going to create a variable called user input. And we're going to set it equal to, um, just do input. What question do you want to ask? So that's going to take in the user input. And then what we're going to do is append that list once again. So messages.append. And so we're going to do that same thing in order to add this to the list. We're going to do role. And then we're going to set the user. And then we'll have to do content equals user input. So that's going to take in the user input. And then send it over to chat.gpt to, and send it over to the OpenAI API in order to have a completion. And then it's going to read out the response here. So this code should work. So let's go ahead and run it by clicking F5. You can also go to run up here and start debugging. Okay, so it's not finding personality.txt. Probably because I have a typo here. Okay, so there's this weird bug with VS Code that I created the text file, but it wasn't grabbing it from the folder. So you just have to close out of VS Code and then reopen it back up. So now that we have that, I renamed it to p.txt just to make it a little bit easier to make sure I'm not doing typos. And let's go ahead and open up that file again. So you are a helpful assistant. So let's go ahead and run it. F5. And you'll see down below in the console here. What do you want to ask? What were your instructions? And it's going to say, I'm an AI language writer, blah, blah, blah. So yeah, as you can see, it gave back a completion. Let's go ahead and ask something else. What is the first president of the USA? Okay, George Washington. Okay, cool. So that is how you can get a text completion from chat GPT. But as you can see, that's not really useful because you're answering only one at a time. So what you have to do is you have to nest this inside of a while loop in order for it to continue going on. So let me just show you how you can do that real quick. So the key lines that we need for the for the bot is going to be these ones here. The other ones are just initializing things. So we're going to go ahead and put these inside of a while loop so that we can keep asking questions. So let's go ahead. Set while true. It's going to just keep going forever and ever and ever until you stop. So let's go ahead and demonstrate that real quick. How are you doing today? Okay, it gives me a response. How can I assist you today? I am feeling hungry. Man, it's all about food today for me. I don't know why. So, yeah, as you can see, it's doing the chat GPT stuff and it's just giving the response back. So we're going to go ahead and end it here. So this is it. This is going to be your first chat bot with the open API key. And if you just did this, this is chat GPT at its core. So what we're going to do now is now that we've got the chat GPT completion working, we're going to integrate it with some speech recognition so that we can actually talk to it. So what we're going to need to do is import a couple more things. Import speech recognition as SR and then import pytsx3. Pytext to speech. And we're going to need to initialize a couple of more things. So let's go ahead and go down a little bit more. We need to initialize the text speech engine. So just go ahead, call it engine. We're going to do pytsx3.init. So initialize the engine for it. And then we're going to need to select a voice. So voices equals engine.getproperty. And then inside here we're going to do voices. So that's just going to get all the voices that we have installed onto our Windows system. And then we're going to do engine.setproperty. Voice. Voices. Okay, and so what we did here was set the voice that we wanted to. Zero is going to be for male and then one is going to be for female. The two voices on the system. So once you've initialized the text to speech, the next thing that we have to do is initialize the microphone. So r equals speech recognition recognizer. Oops, recognizer. And then mic equals sr.microphone. And then this device index is going to decide what microphone you're going to be using for the system. We're going to set it to zero as that is your default. A couple of additional things that I did set for my Vivi project was I did dynamic energy threshold equals false. And then I did energy threshold equals 400. So the reason I set this to false is that it kept listening and never ended. And that's because it was automatically adjusting the threshold to my ambient noise. And it might have adjusted it too high to where it never stopped listening. So that's why I set it to false so that it just sets it at the beginning and then the energy threshold to 400. Okay, so we've got the microphone set up. We've got the speech to text engine set up. We're going to do this inside of the while loop so that we can continually talk to it. But what we're going to need to do is we're going to need to listen to the voice. So to do that, we have to do with mic as let's call it source. We're going to print out something that says listening or... We're going to print out something that says listening so that we know that it is listening. Okay, and then we're going to do the recognizer adjust for ambient noise to do source and then the duration. And then duration we just set at 0.5. And I believe what the duration for this is is how long it's adjusting for ambient noise. So I just set it to the very minimum that it recommends. And then what we want to do here is we can do a try accept so that it only passes if you actually spoke into it. So if you didn't do any speaking, it's going to fail and go into the accept. And then it'll just keep looping until it actually registers that you said something. So let's go ahead and set a new variable. Audio equals our listen source. And then we're going to do an accept. We're going to continue. Okay, and I actually moved the audio outside of the try. So it's going to try for the audio and then it's going to use... Well, first it's going to listen and it's going to store that audio data into audio. And then it's going to try to use the recognize Google to see if there's any words spoken. If not, it's going to go to the accept and then it's going to loop back and skip all of this and then listen again for a valid response. So to give a demonstration, let's just go ahead and run. And I'm just going to blow into the mic when it says listening. And as you can see, it said an empty list here and now it's going to register what I'm saying. And then it's going to go ahead and pass and then it's going to go ahead and head on over to this next one. So what we're going to need to have here now is just pass in what we said over into the messages directly. So we don't have to do anything other than just delete this input line here. And what it's going to do is recognize the audio. Store it inside of user input and then send that over to chat GPT as the user input for this messages here. So and so let's just go ahead and try that. So go ahead, click run, start debugging. Hello there. I'm just testing something out. So it said, hello, let me let me know if you need any assistance. Can you give me something to eat for dinner tonight? Something that is savory says certainly. What are your dietary restrictions or preferences? And so here gave me an entire list. So let's just go ahead and stop it. So as you can see, it can now listen to what I'm saying and respond back to it. And then the last thing that we have to do is have it just read out the response. So this is what we're going to use that PYTTSX thing for just reading it out. So the last thing we've got to do is generate a voice. So actually, we want to have it after the response. What we're going to do here is do engine say, and then we're going to insert an F string to insert that response inside of here. So, man, I'm using single quotes and double quotes. It's probably bad code, but let's just go ahead and do engine run and wait. And let's just go ahead, do that again. Hello there. How are you doing today?

Speaker 2: Hello. As an AI language model, I don't have feelings, but I'm here to assist you. How can I help you today?

Speaker 1: And as you can see, it responds back now with that voice that we just gave it. So that is basically it to get a chat bot up and running with voice and speech recognition. The next thing that I'm going to show is it's not completely necessary, but how you can use whisper instead of speech recognition to get this done and then how you can save the conversation. OK, so we're going to go ahead and get rid of this user input. We won't be needing it anymore. What we're going to end up using is whisper. And to do that, we're going to have to first save the audio file as a save the audio file as a way file. And then we're going to go ahead and read that into the whisper API. So what we have to do is save that audio file as a way file. So we're going to go ahead and just save it as speech that way. And there is probably there's probably a way to do it with temporary files, but I haven't figured out how. And so that's one thing that I'm looking into. But we'll do, quote, quote, quote, WB quote. And then we'll go ahead and do as F and then we'll do F right audio. We're going to get get wave data from the from speech recognition. And then and so this is going to write it to a way file. Then what we're going to do is we're going to write it into a new variable called speech. Let's go ahead, open it to speech that wave. And then we're just going to do it as read bytes. And then the model that we're going to be using for this is going to be whispers. So I make sure I didn't use model ID for I use model down here. So we're going to go ahead and do model ID. Actually, we don't need to do that. So what we're going to do, let's just do completion. And since we already have completion down there, we'll do W completion for whisper completion. So W completion equals open a I dot. Audio that transcribe. And then we're going to set in the model. So model ID model equals. Let's do whisper. I believe the model name is Whisper one dash one. Dash one. Then. We're going to feed in that way file with file and we'll do speech as that's what we have the variable up there for. OK, and so that's all you need to do in order for it to transcribe this. And then we're going to need to take the response. So in this case, we're going to name it as user input equals W completion. Text. Act. And so this will allow us to use the whisper API to complete the the text, the speech to text part of it. So with this, we should be able to go ahead and run it. We already set the API key up here, so that's why we don't have to set it again for the whisper API. And so at five, we're going to run it. Hey there, welcome to the video.

Speaker 2: Thank you. How can I assist you with the video?

Speaker 1: OK, so as you can see, it replied back, but I actually didn't get I want to see what what it's registering. So I'm going to go ahead and print the user input here. So let's go ahead. We run that. Hi there. How are you doing today?

Speaker 2: It's going to register what I said here as an AI language model. I don't have feelings, but I'm functioning properly and ready to assist you. How can I help you today?

Speaker 1: And so as you can see, Whisper is now officially working. So what I like to do, what I did in my in my chat box is I made this into a function and then just called it. So as you can see, this is a lot of code to just put in here, whereas the other line was just one line of code. So an easy way to make this into a function is let's just go ahead and go up here and define it as a function. So we'll do deaf. Let's do whisper. And so what we end up passing is an audio variable. And then all we have to do is paste all of this in here. Let's go ahead. Make sure it's tabbed correctly. And so now instead, we just need to return user input and we should be able to just simply call Whisper and passing that audio variable. And it should do the same thing. So let's go ahead and user input equals whisper audio. OK, cool. Let's go ahead. We run it. Hello there. How are you doing today?

Speaker 2: As an AI language model, I don't have emotions, but I am functioning well. Thank you for asking. How can I assist you today?

Speaker 1: See, there you go. And then now you've got Whisper running as a function. So this is just a little bit more. This makes your main loop a little bit more readable, in my opinion. And that way, if you want to do it with the other one, you can always have an if loop in here to select Whisper. So if you just go back and do user input equals r.recognizeGoogleAudio. You could do something like this. If Whisper. If use Whisper, then user input else. You can use Google. So then all you have to do is just set a use Whisper variable up here somewhere. We'll do false. And this will skip it and then go to the recognize Google. So let's go ahead and show you that one. F5. Hello there. This is Google's. As you can see, this is Google's here, and it's not using Whisper. But if I go ahead and set this now to true. Hello there. How are you doing today?

Speaker 2: Hello. As an AI language model, it's going to go ahead and use Whisper.

Speaker 1: As you can see, it didn't have all of these different dictionaries here. So that's just an easy way how you can switch back and forth between using Whisper and then the Google one. This is if you want to perhaps save money and you just want to use Google. But I find Whisper is a little bit more accurate and still pretty cheap. Six point zero six cents. Three fifths of a cent in order to use Whisper for like a minute of audio. So that's pretty cheap in my opinion. And yeah, you just have to have some type of variable and and you can switch between them. So the last thing that we're going to do is how do you save the responses into a into a folder? Or how do you save the conversations for later? So to do that, I'm just going to go ahead and import some functions that I that I used for my chatbot just to make it just to speed it up a little bit and show you what I did. OK, so the two functions that I borrowed from my Vivi project was save conversation and then save in progress. So to use these, what we're going to have to do is import OS and then we're going to have to import JSON. So that'll get rid of the little squigglies that we saw. And so let me just go ahead and detail what these two functions do. So this first one is a save conversation function. This is going to be the initial save for the conversation so that it doesn't overwrite any other folders. So what it's basically going to do is go through the directory where you have your folder stored and check to see if the path exists. If not, it's going to go ahead and create a new a new text file that you can go ahead and start writing to. So this one is going to return the suffix number and that suffix number needs to get fed into save in progress where it takes in the suffix count from the save conversation function in order to know which file you're writing to. So that's just the way that I separated it into. You could probably have it done in one, but I found it a little bit easier with two in order to call them from two different locations in the script. So let's go ahead and use them. The first thing that we can do is we can we can call save conversation. So so in order to utilize it, we got to do suffix equals save conversation and we're going to pass it a save folder name string. We're going to take it takes in a path to save the conversation to. So preferably we would want to set that path. And so the easiest way to do that would be to just grab the current script location. So to do that, we're going to go ahead and use this line here. So I'm going to grab script location. I'm going to use this. We're going to use this line right here to grab the script location. This is something that I just found online and it just grabs the script directory. So what you can now do is specify a save folder location. So we're going to pass in something save folder name. Save folder name. So let's go ahead and specify a folder name. Let's just say this one is voice assistant. And then we're going to create one called save underscore folder name equals OS path that join where we can now put in the script directory. And then we can do we're going to save it into a folder called conversations. And then we're going to do we're going to make this an F string so that we can put that into it. So folder name. Cool. And then once we have this long thing out of the way, this is where we're going to save it to. We're going to go ahead and put save folder name into save conversation. And so that's going to I'll show you what it does after we run the script. And then what we're going to want to do is at the end of here, I prefer to have it right after the response is printed. We're going to call save in progress and then we're going to pass in that save folder name variable here. So. Oh, yeah. And then we're going to also need suffix. So we're going to do. Yeah, we're going to pass in a suffix and then save folder name. And then I'll go ahead and show you why I did it that way and what it ends up doing. So the easiest way to do that is let's start a conversation up.

Speaker 2: Hey there. How are you doing today? As an AI language model, I do not have emotions, but I am functioning perfectly fine.

Speaker 1: How can I assist you today? Would you be able to tell me a funny story? Something that is short and under 50 characters.

Speaker 2: Why don't scientists trust atoms? Because they make up everything.

Speaker 1: Can you tell me another one that is less corny? Sure. Here's another one.

Speaker 2: Why did the tomato turn red? Because it saw the salad dressing.

Speaker 1: Oh, my God. OK, so, yeah, you get the point. And then let's go. Let me go ahead and show you what it does. So. So inside of the folder, what we have here is a conversations folder and inside we have that folder name. So as you saw, we named the folder voice assistant. So inside of here, we have voice assistant and then a conversation where we can now see the conversation. So here we have that messages list that we had. We have the system and then you've got the user, the assistant. And this is basically the entire conversation, all the receipts of the conversation. And and you can store all of this locally so that you never, ever lose it. So that is how you can now incorporate Whisper and then save all of those conversations to a folder with the text and have everything inside of it. And that is going to be pretty much it. I know we ended up going through a lot of lines of code here. And of course, if you do everything in just one script, it's going to get a little bit messy because you have to do all the initial initialization. And then you have to define some functions and then call them inside of the loops. But if you do it inside of a class or you do more functions, you can get it looking pretty concise. So this is and that's why this assistance Python script, you only have 41 lines here and then literally just five lines that you have to adjust. But this is everything that I went through in order to get my voice assistant up and running. I didn't show you 11 labs, but let me know down below in the comments if you want me to show you how you can do the 11 labs part. It's part of my Vivi project if you want to go take a look at that. But that is going to be the end of today's little tutorial for getting the chat GPT set up with speech recognition and Whisper. So everything I did today is going to be on the GitHub page. So go ahead, check the description if you just want to see the code and let me know if you want me to do anything else. I will be working on my Vivi project and there are some other things that I'm going to be focusing on. So just let me know what you want to see and I will be sharing my progress for little projects that I'm working on as well. So see you later. And yeah, let me know what you think in the comments below.