Build a Voice Assistant with Raspberry Pi & Python
Learn real-time speech recognition on Raspberry Pi using Python. Turn your setup into a functional voice assistant with basic hardware requirements.
Build a Speech Recognition System on a Raspberry Pi
Added on 01/29/2025

Speaker 1: Hi everyone, I'm Patrick from AssemblyAI, and today you'll learn how to do real-time speech recognition on a Raspberry Pi and build a voice assistant. All you need is a Raspberry Pi, in my case a Pi 4, and a microphone, in my case a simple USB mic, and that's all to get started. So let's test it. Hey Sam, send a tweet. Python is awesome. Hey Sam, save a note. Do bodyweight workout later today.

All right, let's do the unboxing first. In my case, I got a fancy starter kit with the Raspberry Pi 4, a case with an integrated fan, heatsinks, and all the cables. Of course, all of this is not necessary, and the basic version gets the job done as well, but hey, the company I'm working for recently raised $28 million in Series A funding, so I can at least spend 100 bucks on a Raspberry Pi, right?

Now let's set up the Pi, which came with Raspberry Pi OS pre-installed. The smartest thing to do with it, obviously, is to install Python first, so let's update and upgrade all packages. Then we can install Python 3 with sudo apt install python3. As a beginner-friendly IDE, I recommend VS Code, which you can install with sudo apt install code. When this is done, we can open it by typing code, which opens the editor in the current directory. Then you should install the official Python extension, and one tip for a smoother experience on the Pi is to open Preferences: Configure Runtime Arguments and set disable-hardware-acceleration to true.

The next step is to install PyAudio, which requires installing PortAudio first. Don't worry, you'll find all the installation steps, commands, and code in our GitHub repo, which I linked below. I recommend creating a virtual environment and activating it, and then we can simply say pip install pyaudio. Since we are working with an API and WebSockets, I install the other dependencies, requests and websockets, right away. (The setup commands from this section are collected in the first sketch below.)

Next, let's configure the USB microphone. You can type lsusb to list all USB devices, and ideally this should list your plugged-in microphone; in my case, it's the Blue Yeti Nano. You can also type arecord -l to list all hardware capture devices, and again you should see your USB device along with its card number. We can also use the following commands to list the cards and modules, which verifies that the USB mic is detected and working as card 2. We want to make card 2 the default, so open the alsa.conf file with an editor, scroll down to the defaults section, and set card 2 as the default in the two relevant lines. Save the file and reboot the system. After the reboot, you can try to record something with the arecord command, which records for five seconds and dumps the audio to a WAV file. You can then use the aplay command to play the recorded file (see the microphone setup sketch below). Now we can finally jump to the code.

So let's open VS Code again. I prepared one short file that simply imports pyaudio and lists all devices. In the list, you should find the connected USB mic (a sketch of this script follows below).
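For reference, here is a recap of the setup commands from this section as a shell sketch. The package name portaudio19-dev is an assumption based on common Raspberry Pi OS setups; the linked GitHub repo has the exact steps.

```bash
# Update the system, then install Python 3 and VS Code
sudo apt update && sudo apt upgrade
sudo apt install python3
sudo apt install code

# PortAudio headers are needed to build PyAudio
# (portaudio19-dev is the usual Debian package name; this is an assumption)
sudo apt install portaudio19-dev

# Create and activate a virtual environment, then install the dependencies
python3 -m venv venv
source venv/bin/activate
pip install pyaudio requests websockets
```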
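And a sketch of the microphone configuration steps. The card number, the alsa.conf path, and the arecord format flags match a typical Raspberry Pi OS setup like the one in the video, but treat them as assumptions for your own system.

```bash
# Verify the USB mic is detected
lsusb                      # should list the USB microphone
arecord -l                 # shows capture hardware and its card number
cat /proc/asound/cards     # another way to list the sound cards
cat /proc/asound/modules   # and the loaded sound modules

# Make card 2 the default: edit the two "defaults" lines in alsa.conf
# (this path is the usual Raspberry Pi OS location; adjust if yours differs)
sudo nano /usr/share/alsa/alsa.conf
#   defaults.ctl.card 2
#   defaults.pcm.card 2
# then save the file and reboot

# After the reboot, record five seconds to a WAV file and play it back
arecord -d 5 -f cd test.wav
aplay test.wav
```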
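The device-listing script itself is tiny; a minimal sketch might look like this:

```python
import pyaudio

# List every audio device PyAudio can see; the USB mic should be among them.
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(i, info["name"], "- input channels:", info["maxInputChannels"])
p.terminate()
```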
Now let's set up the real-time speech recognition. For this, we use AssemblyAI's real-time API, which works with WebSockets. You can grab a free API token using the link in the description below. We have a detailed video and blog post that walk you through the script, and I will link the resources below, but basically we just need two async functions that work with WebSockets: one to send the audio data from the microphone and one to receive the transcript from AssemblyAI. Then we can do whatever we want with it; in this simple example, we just print it to the console (a sketch of the script follows below). Now you can run the file and say something, and you should see the output in the console. Congratulations, you did it.

Now let's look at two more steps we can take to improve this project. The first improvement is wake word detection, because of course we don't want to call the speech recognition API every time we say something. The Raspberry Pi should constantly listen to the microphone in the background, and only when a certain wake word is detected should it become active, and this should ideally happen offline. Wake word detection is a whole topic on its own. If you want to use a pre-built solution, I recommend checking out Raven, Porcupine, and Snowboy, which are all great projects you can test (see the Porcupine sketch below). If you're interested in this, or if you have too much time, an awesome deep learning project is to build and train a wake word detection model yourself. The best open source project I found is this one by Michael, The AI Hacker. He provides a great video tutorial, the code on GitHub, and step-by-step instructions on how to do it. Seriously, you need to check this out.

Now, what should we do with all of this? We can use it for a stupid, I mean, ingenious use case and let the Raspberry Pi respond with Arnold Schwarzenegger quotes like Michael does. Hey, Wally. Who are you? Hey, Wally. You idiot. Nice. Or we can use it as a virtual assistant to do certain tasks, like sending out tweets or taking notes. For this, we need to do some kind of intent classification, so we need to detect what the speaker wants to do. Here we could also build a complex deep learning model, but honestly, you can simply search the strings for predefined, hard-coded intent words and use if-else statements that then trigger the different kinds of actions (see the last sketch below). If you want to see the code for sending out tweets and taking notes, I have two more videos for you that you can check out.
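Here is a minimal sketch of the streaming script with the two async functions. The endpoint URL, the Authorization header, and the base64 JSON message format follow AssemblyAI's v2 real-time API as it worked at the time of the video; check the current AssemblyAI docs before relying on them, and note that newer versions of the websockets library renamed extra_headers to additional_headers.

```python
import asyncio
import base64
import json

import pyaudio
import websockets

API_TOKEN = "your-assemblyai-token"  # placeholder: use your own token
URL = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"

FRAMES_PER_BUFFER = 3200  # 0.2 s of 16 kHz, 16-bit mono audio

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=FRAMES_PER_BUFFER)


async def send_receive():
    async with websockets.connect(
            URL, extra_headers={"Authorization": API_TOKEN}) as ws:

        async def send():
            # Read raw audio from the mic and send it as base64-encoded JSON.
            while True:
                data = stream.read(FRAMES_PER_BUFFER,
                                   exception_on_overflow=False)
                payload = base64.b64encode(data).decode("utf-8")
                await ws.send(json.dumps({"audio_data": payload}))
                await asyncio.sleep(0.01)

        async def receive():
            # Print each finished sentence as it comes back.
            async for message in ws:
                result = json.loads(message)
                if result.get("message_type") == "FinalTranscript":
                    print(result["text"])

        await asyncio.gather(send(), receive())


asyncio.run(send_receive())
```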
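For the wake word step, a minimal sketch using Porcupine (one of the pre-built options mentioned above) might look like this. The pvporcupine package, the access key, and the built-in "porcupine" keyword are assumptions based on Picovoice's public API; consult their docs for your own setup.

```python
import struct

import pvporcupine
import pyaudio

# Create a detector for the built-in "porcupine" keyword (assumption:
# you have a free Picovoice access key).
porcupine = pvporcupine.create(access_key="your-picovoice-key",
                               keywords=["porcupine"])

p = pyaudio.PyAudio()
stream = p.open(rate=porcupine.sample_rate, channels=1,
                format=pyaudio.paInt16, input=True,
                frames_per_buffer=porcupine.frame_length)

print("Listening for the wake word...")
while True:
    pcm = stream.read(porcupine.frame_length, exception_on_overflow=False)
    pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
    if porcupine.process(pcm) >= 0:
        # Wake word heard: this is where you would start streaming
        # audio to the speech recognition API.
        print("Wake word detected!")
```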
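And the intent classification really can be as simple as keyword matching with if-else. Here send_tweet and save_note are hypothetical stubs standing in for the real actions covered in the two follow-up videos:

```python
def send_tweet(text: str) -> None:
    print("Tweeting:", text)  # stub: call the Twitter API here


def save_note(text: str) -> None:
    print("Saving note:", text)  # stub: append to a notes file here


def handle_transcript(text: str) -> None:
    # Naive intent detection: search the transcript for hard-coded
    # keywords and branch with if-else statements.
    text = text.lower()
    if "send a tweet" in text:
        send_tweet(text)
    elif "save a note" in text:
        save_note(text)
    else:
        print("No matching intent:", text)


handle_transcript("Hey Sam, send a tweet. Python is awesome.")
```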
All right, then that's it. If you manage to build your own virtual assistant, share your project in the comments below. I hope you have fun with your own Raspberry Pi, and I hope to see you in the next video. Bye.