Effortless Audio Transcription and GitHub Backup Guide
Learn to record, transcribe audio in terminal, and save to GitHub. Enhance journaling with Whisper AI, ffmpeg, and Python setup. Happy coding!
OpenAI Whisper and Python Easy Speech to Text
Added on 01/29/2025

Speaker 1: Whisper AI. This is a free bit of technology that allows you to transcribe audio to text, and it does a really, really good job. Today, I'm going to show you how to record audio in your terminal, then transcribe it automatically, and finally push it up to GitHub to save it in an offsite location. You ready? Let's jump in. The first step is recording. Of course, you can use any recording program on your machine, but let's be extra and record in the terminal. We'll use a library called ffmpeg, the all-in-one tool for working with audio and video. We'll do this in two stages: installation and usage. While you can install it in many ways, the easiest is using a package manager like Homebrew for macOS or Chocolatey for Windows. If you don't have Homebrew or Chocolatey, follow their install instructions and then type brew install ffmpeg or choco install ffmpeg to add the library. Whisper AI will need ffmpeg too, so we'll need to install it anyway. Next, with ffmpeg installed, you'll need to see which recording devices are available to you. ffmpeg uses local tools on each software platform, so make sure you use the right command for your machine. In my case, I'll be using macOS, but you can find Windows and Linux commands in the attached post. To keep organized, I'll cd into my journal folder on my computer. Next, I'll locate all my input devices using this command. ffmpeg should return a list of all your video and audio inputs. Note the name given to the audio input you want to use. Then use this command to start recording your audio and output it to an output.mp3 file. Remember, for Windows and Linux users, this will change slightly, and again you can see the blog post linked in the description. To stop the recording, stop the output with Ctrl-C. The output will be saved in your present working directory. To automatically stop the recording after a certain time, use the -t flag. As a sample, let's set it for 5 seconds. 
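The macOS recording steps above can be sketched roughly as follows. The avfoundation backend and the device index ":0" are assumptions that will differ per machine (Windows uses dshow, Linux uses alsa, as covered in the linked post), and the calls are guarded so the sketch is a no-op where ffmpeg isn't installed.

```shell
OUT="output.mp3"

if command -v ffmpeg >/dev/null 2>&1; then
  # List every capture device the avfoundation backend can see (macOS).
  ffmpeg -f avfoundation -list_devices true -i ""

  # Record audio only from input index 0 (":0" means no video device).
  # -y overwrites an existing file; press Ctrl-C to stop the recording.
  ffmpeg -y -f avfoundation -i ":0" "$OUT"

  # Or stop automatically after 5 seconds with the -t flag.
  ffmpeg -y -f avfoundation -i ":0" -t 5 "$OUT"
fi
```

The recording lands in your present working directory unless you give a full output path.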
To make the script more flexible, I'll update the name to reflect today's date and add a full path to the output. The date command provides the date and time. I'd prefer to sort my outputs by year, month, and day, so I'll provide this syntax. To run this mid-command, we'll need to wrap it in parentheses and add a dollar sign in front. Finally, I'll add the full file path so I can run this from any directory on my machine and know it'll always land in my journal folder. That's it, so let's capture our recording. Alright, so you've got an audio recording, now it's time to transcribe it, and let's do this again in two phases: installation and usage. First, we'll be using a Python library, so you'll need Python installed and configured. Ensure you have Python 3.8 to 3.11 on your machine. You can check your version of Python with this command. If you don't have it, or you don't have 3.8 through 3.11, head to python.org and download the latest 3.11 version. Once it downloads, install Python like any other program on your machine. If you're on Windows, make sure you check the "Add python.exe to PATH" option during the installation process. On macOS, you'll also need to run Python's certificate install command. Finder should open automatically after installation, showing the files associated with Python on your machine. Otherwise, you can locate the version of Python in your Applications folder. You'll need to open the Install Certificates.command file using your terminal to allow secure network requests. Since I'm using Warp, I can simply drag the file over to Warp and it will automatically input that exact file path. Then I'll press return to run the command. You'll also need pip, the package manager for Python, as we'll be using this to install Whisper AI. It should come installed with Python, but you can double-check with this command. If you don't have it, use this command to install and upgrade it. Second, we need to install Whisper AI. 
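The date-stamped full path described above can be sketched like this. The journal location is an assumption; swap in your own folder. The Python and pip checks are included as a reminder of the prerequisites.

```shell
# Assumed journal location; adjust to your own machine.
JOURNAL_DIR="$HOME/journal"

# $(date +%Y-%m-%d) runs the date command mid-string, so files sort
# by year, month, and day -- e.g. 2025-01-29.mp3.
AUDIO_FILE="$JOURNAL_DIR/$(date +%Y-%m-%d).mp3"
echo "$AUDIO_FILE"

# Prerequisites for the transcription step (guarded in case Python is absent):
if command -v python3 >/dev/null 2>&1; then
  python3 --version          # should report 3.8 through 3.11
  python3 -m pip --version   # pip ships with Python
  # Upgrade pip if needed:
  # python3 -m pip install --upgrade pip
fi
```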
The Whisper AI docs give you this command, which will not only install Whisper, but also any dependencies it needs to run. So, how do you use Whisper AI? It's super simple. I'm in my journal directory, so let's confirm the audio file exists. For Whisper at its most basic, provide the whisper command and an input file. Note that Whisper is detecting the language and making a few adjustments based on my machine. In the end, it automatically saves a bunch of different files in the journal directory mimicking the audio file name. Let's customize it a bit. First, I'll add my settings, so Whisper doesn't need to auto-detect anything. In my case, this includes adding English as the language and setting the fp16 (half-precision inference) flag to False, since my machine doesn't support that. Next, I can tell Whisper to output only certain types. In this case, I only want a text file, which I can set with the output format flag. Whisper offers several different levels of transcription quality. By default, it uses the small model, but you can get slightly better, although slower, results with the medium model. And since I'll only personally transcribe English, I'll limit it to the medium.en model for a quicker experience. As one final improvement, I'll provide the full path name to my audio file, including the reference to today's date since we'll be running these scripts back-to-back. So, I can once again run this from any directory on my machine. The first time you use any model, it'll need to download, which can take some time. Alright, so now that this is done, let's stitch everything together. Our script will start with the audio recording command. Directly following that, let's add the whisper command. Finally, since I won't personally want my audio file afterwards, I'll remove it with the rm command, being sure to provide the full path to the audio file. This works, but let's clean it up a bit since you'll see we'll use the audio file path three times. 
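The install and customized invocation above can be sketched as follows. The flag names come from the openai-whisper command-line tool; the journal path is an assumption carried over from the recording step, and the call is guarded so the sketch is a no-op where whisper isn't installed yet.

```shell
# One-time install of Whisper and its dependencies:
# pip install -U openai-whisper

# Assumed date-stamped path from the recording step.
AUDIO_FILE="$HOME/journal/$(date +%Y-%m-%d).mp3"

if command -v whisper >/dev/null 2>&1; then
  whisper "$AUDIO_FILE" \
    --language English \
    --fp16 False \
    --output_format txt \
    --model medium.en
fi
```

Passing the full path means the transcript files land next to the audio regardless of your current directory.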
I'll extract that to its own variable and reference it in each command. Now, I don't know about you, but I don't want to type or paste this every time I want to record a daily journal entry. In Warp, you can save custom workflows. Instead of manually typing out a title and description, I'll tell Warp AI to autofill these fields. It'll not only add a title and description, but will automatically detect variables, input my defaults, and make them tab stops so I can quickly alter them in the future. Now, while we're being extra, here are a few other improvements we can add. As an example, you could push these files to a remote repository automatically to save your journal online. You can do this in a few ways, but I'll go to GitHub and create a new private repository so only I can see these journal entries, unless you want your friends reading your innermost thoughts. We haven't yet created a local repository, so I'll copy this first suggestion, which initializes a local repository and pushes it online. Back in my terminal, I'll navigate to my journal directory and paste in the command. I don't need to add a readme, so I'll delete that line. Next, let's change the git add command to add all files in my directory, in this case, my first journal transcription entry. Then, let's run the command. Just like that, my journal is backed up on GitHub. Now, just to make sure I don't accidentally add any audio files, I'll create a .gitignore file and add both MP3s and .DS_Store since I'm on a Mac. This will prevent adding any audio files or any macOS .DS_Store files to my repo. Now, on macOS, there's one little extra you could do, and that is you could show a notification using AppleScript. Using this osascript command, you can display a notification like Transcription Complete with the title Whisper AI. All right, all that's left is a little stitching together. I'll open my Warp Drive and replace my previous workflow with this new one. 
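The .gitignore and notification extras above can be sketched like this. The repo location is an assumption, and the AppleScript notification only exists on macOS, so it's guarded on osascript being present.

```shell
# Assumed repo location; adjust to your own machine.
JOURNAL_DIR="${JOURNAL_DIR:-$HOME/journal}"
mkdir -p "$JOURNAL_DIR"

# Keep raw audio and macOS Finder metadata out of the repo.
printf '%s\n' '*.mp3' '.DS_Store' > "$JOURNAL_DIR/.gitignore"

# AppleScript notification; osascript only exists on macOS, so guard it.
if command -v osascript >/dev/null 2>&1; then
  osascript -e 'display notification "Transcription Complete" with title "Whisper AI"'
fi
```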
Now that I'm committing files to a local repo, however, I'll need to first cd into the journal directory and then record audio, transcribe it, delete the audio file, display a local notification, and then finally commit and push the changes up to GitHub. Each day when I want to journal, I'll press Command-P, search for my workflow, and press enter. I'll press enter again, talk, and watch the magic happen. And it really does feel like magic: you start talking, it automatically starts transcribing, and your text just starts flowing onto the screen. Now, once again, there's an attached blog post with all the commands you'll need, including this final full command. Just copy and paste it, update the directory information to match your computer, and then add it to your workflows. Of course, you'll need ffmpeg, Python, and Whisper AI installed, but if you've followed this far, hopefully you're set up and ready to go. Windows and Linux users, again, remember that you can grab comparable commands in the blog post linked below to get you started. Now that you can journal from the terminal, rubber duck those problems to your heart's content. It's just you, artificial intelligence, and the terminal. What a team. If you're more interested in using your terminal and learning how to build cool things like this, make sure you subscribe so you don't miss any more content. I'll catch you in the next one. Thanks for watching. Happy coding.
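The full stitched workflow described above can be sketched as a small script. Every specific here is an assumption mirroring the earlier steps: the journal path, the avfoundation device index, the model choice, and the script's filename; adjust them to your machine (and use the Windows/Linux equivalents from the linked post if you're not on macOS).

```shell
# Write the whole workflow to a script file (a sketch, not the author's
# exact Warp workflow). The quoted EOF delays all expansion until run time.
cat > "$HOME/journal-entry.sh" <<'EOF'
#!/bin/sh
JOURNAL_DIR="$HOME/journal"
AUDIO_FILE="$JOURNAL_DIR/$(date +%Y-%m-%d).mp3"
cd "$JOURNAL_DIR" || exit 1
# Record (Ctrl-C to stop), then transcribe, delete the audio, notify, and push.
ffmpeg -y -f avfoundation -i ":0" "$AUDIO_FILE"
whisper "$AUDIO_FILE" --language English --fp16 False \
  --output_format txt --model medium.en
rm "$AUDIO_FILE"
osascript -e 'display notification "Transcription Complete" with title "Whisper AI"'
git add . && git commit -m "Journal $(date +%Y-%m-%d)" && git push
EOF
chmod +x "$HOME/journal-entry.sh"
```

Saved as a Warp workflow (or just run as ./journal-entry.sh), this gives you the one-keystroke daily journal described above.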
