Effortless Transcription for Voice Memos to Workflows
Learn how to transform your voice memos into actionable transcripts for LLMs, streamlining your workflow with push-button ease using iOS, Shortcuts, and DeepGram.
Issue 001: Transcribe Voice Memos with DeepGram AI (Push Button Nation Tutorial)
Added on 01/29/2025

Speaker 1: How often have you wished it was a push-button simple process to take your thoughts and ideas from your voice memos and have them quickly and accurately transcribed, so you can use them inside of large language models like ChatGPT or Claude to turn them into real work product? If you're like me, all the time. In fact, I wanted transcripts so badly that before I learned to do this, I would use other more complex or costly apps, because a transcript is just so much more useful to me than raw audio, no matter what I'm trying to accomplish, whether that's using the transcript in an LLM, building a scannable and searchable database of my knowledge, or turning those raw thoughts and ideas into content for social media or actions for my team. Here's a quick demo of this push-button process in action. You'll see here I have a 10-minute audio file that's just me brain-dumping ideas about this Push Button Nation newsletter. As audio, all of my thoughts and ideas are trapped, since audio is difficult to scan, search, or take any action on. So what I might have done previously is save it to Files or AirDrop it to my Mac so I could upload the file to Descript or Rev or another transcription service, then wait for that transcript to process, then save the transcript to a note or some other document, and then do work with it. It's quite the ordeal. Now, with this push-button process, I can simply tap the share sheet icon, then tap the Transcribe Voice Memos shortcut. And in roughly 30 seconds, I'll get a human-readable transcript back that can be passed on to other apps easily for further processing. In this demo, we're going to pass it on to my collections database on my iPhone, where the transcript will get a title and a detailed synopsis. I'll show you how to build this second stage in our next issue of the Push Button Nation newsletter.
For now, let's go over the data, automations, and intelligence needed to make this push-button process a reality. First up is the data. This one is simple: we just need the raw thoughts and ideas captured by you, or in this case me, in the Voice Memos app. Second is the automation app. We're using Apple's Shortcuts for iOS to manage the whole process. This keeps everything local, fast, and mostly free. Third is the intelligence app. We're using DeepGram's Nova-2 API, which is the fastest and most accurate speech-to-text model I've found. It's so fast, in fact, that the first time I used it, I gasped at how quickly it transcribed an hour-long podcast interview. Plus, it was more accurate than anything I'd used previously. Okay, now that you know what we're using, let's dive deeper into each one individually, starting with our data. This is the Voice Memos app. It's free with iOS, though you may have to download it from the App Store if you've deleted it. I like to think of it as a quick and simple way to just think out loud. Anytime you have a thought, an idea, or feedback for a team member, software provider, vendor, client, or anything really, just pull up Voice Memos, hit record, and start talking. You don't need to be structured with your thoughts. You don't need to care about what your eventual outcome is. Nothing. Just brain-dump with the recorder running. With the LLMs we now have access to, you don't need to worry about structuring your data at this point. Just capture it. Just talk. Let your creativity run wild and your thoughts flow freely, like vines through a garden. Even if you're working on something simple, like composing an email, you can turn your rambling thoughts into that later in our process. When you've run out of things to say on your topic, hit the stop button. Now you have your raw data, and it's on to the next stage: processing that raw data into something we can take action with.
For that, we turn to our on-device automation platform, Shortcuts for iOS. So let's pop into the app and see what we've built here. First up, we have our trigger, which is receiving media input from the share sheet. That basically means we've created a share sheet action and told it to accept only media files. If there's no media file, it just stops and says, hey, we didn't find any audio. Pretty simple for a trigger. Next are our actions. The first action is to save the file to Dropbox. The reason we're doing this is that the intelligence we're using later, DeepGram, requires a direct-access URL to the media file in order to transcribe it, and Voice Memos doesn't provide a public URL, even though your files are most likely already stored in the cloud. So, Dropbox to the rescue here. Our next action is to get the link to the saved file. What this action does is get the public share URL for the file we just uploaded to Dropbox. This is the link you might give a person in an email so they can view the file online and possibly download it. It's still not a direct-access link like the one we need for DeepGram. So that takes us to our next set of actions, which reformat the Dropbox share URL into that direct-access URL. Here's a quick secret: I didn't know how to reformat this URL. I asked ChatGPT to help me figure out what I needed to change. So we have two Replace Text steps that alter the passed-in URL. The first changes the dropbox.com portion of the URL to the direct-download domain. The second changes the download parameter from 0 to 1, meaning true, which makes the file directly downloadable from that link. Boom, ready for DeepGram. And that's our next step. This is where all the magic happens. It looks simple on the surface, but there's a lot going on here. First, this is the Get Contents of URL action. It allows us to use webhooks from our device. And a webhook is just a way to pass data from one system to another.
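To make the rewrite concrete, here's a minimal sketch of those two Replace Text steps as plain string replacements. The direct-download host `dl.dropboxusercontent.com` and the sample share link are my assumptions for illustration; the exact strings used in the shortcut may differ:

```python
def to_direct_url(share_url: str) -> str:
    # Step 1: swap the share host for Dropbox's direct-download host
    # (assumed host; the video only says "the download link").
    direct = share_url.replace("www.dropbox.com", "dl.dropboxusercontent.com")
    # Step 2: flip the download flag from 0 (preview page) to 1 (raw file).
    return direct.replace("dl=0", "dl=1")

# Hypothetical share link, just to show the shape of the transformation.
print(to_direct_url("https://www.dropbox.com/s/abc123/memo.m4a?dl=0"))
# → https://dl.dropboxusercontent.com/s/abc123/memo.m4a?dl=1
```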
So here we have five data pieces that make it work. The URL itself, which has several query parameters that tell DeepGram what we want it to do with our media file. Our method, which is POST; that just means we're sending data to DeepGram. Our first header, which is the authorization. This is where you put your DeepGram API key. The Content-Type header, which just needs to be set to application/json. And I still don't know what that means, but, you know, ChatGPT to the rescue again. And the request body, set to JSON and passing a key-value pair where the key is url and the value is the updated URL we just made. What all of this will do is authenticate with DeepGram using your API key, give DeepGram the media file from Dropbox, and DeepGram will process it according to the query parameters in the URL. Again, this whole setup was done with ChatGPT, and I just copied and pasted the values here to get what I needed. The next part is where it gets a little tough if you're building it yourself. DeepGram responds with a specific response format in JSON. What we're doing here in Shortcuts is diving into that JSON structure to find the transcript key-value pair, so we can set the transcript as a variable that we can pass on later in our automation. So what you see here is that we're grabbing the whole response, getting the value for results, then going a level deeper into channels, then getting the first item in channels, then grabbing the value for alternatives, then getting the first item in there, then grabbing the value for transcript, and finally setting that transcript value as a variable. And again, I didn't know how to do this when I built the automation. I just took the response from DeepGram, gave it to ChatGPT, and had it tell me which steps I needed to add to Shortcuts so I could extract the data I wanted. At this point, our automation is done. The next step is just here so I could show the transcript result in the demo live.
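Outside of Shortcuts, the same five request pieces and the JSON-digging steps can be sketched in a few lines of Python. The query parameters, placeholder API key, file URL, and sample response values below are illustrative assumptions based on the description above, not values taken from the video:

```python
import json

# Hypothetical placeholder -- substitute your own DeepGram API key.
API_KEY = "YOUR_DEEPGRAM_API_KEY"

# The five data pieces from the Get Contents of URL action, as plain data.
request = {
    # Query parameters here are assumed for illustration.
    "url": "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true",
    "method": "POST",
    "headers": {
        "Authorization": f"Token {API_KEY}",
        "Content-Type": "application/json",
    },
    # Body: key "url", value = the direct-download Dropbox link.
    "body": json.dumps({"url": "https://dl.dropboxusercontent.com/s/abc123/memo.m4a?dl=1"}),
}

def extract_transcript(response_json: str) -> str:
    """Walk the response the same way the Shortcuts steps do:
    results -> channels[0] -> alternatives[0] -> transcript."""
    data = json.loads(response_json)
    return data["results"]["channels"][0]["alternatives"][0]["transcript"]

# Abbreviated sample response with an illustrative transcript value.
sample = json.dumps({
    "results": {"channels": [{"alternatives": [
        {"transcript": "ideas for the push-button nation newsletter"}
    ]}]}
})
print(extract_transcript(sample))
```

The nested indexing mirrors the chain of Get Dictionary Value and Get Item from List actions described above, just collapsed into one line.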
I remove that step when I'm running it locally. The last step is where the fun begins, because now we have a transcript we can work with. You have tons of options here, from creating a note with your transcript, to emailing it, to sending it directly to ChatGPT to talk about, or you can do what I'm doing, which is passing it to a second workflow where I process the transcript with AI, have it give me a title and a detailed synopsis, and store all of that in an actionable knowledge base of all of my thoughts. I'll be going over that workflow in our next issue. And that's it: Voice Memos to capture our data, Shortcuts to handle our automation, DeepGram providing the intelligence, and ChatGPT with the assist, providing intelligence on the automation build. Now let's go back into Voice Memos real quick and show it off one last time by processing that demo recording we made earlier in this video. So we'll open the voice memo, open the share sheet, hit our shortcut, wait just a few seconds, and boom, there's our transcript. Let's pass it on to our next workflow to get a preview of what's coming in our next issue. There you go. Now we have a scannable title, a detailed synopsis, and the body of our transcript in a database. And that's just one of the many things you can do with your transcript now that you have it. Check out our Push Button Nation X Plus subscription for advanced tutorials on other things you might do from this point. If you're watching this on YouTube, please subscribe. That helps a lot. And if you're not already a member, go to pushbuttonnation.com to get access to all of our push-button process build-outs. Our paid members get full build guides and even install templates to get up and running quickly. See you in the next one, Push Button Nation.
