[00:00:00] Speaker 1: AI voice dictation has doubled my productivity. The big strength over older speech-to-text solutions is that AI can understand the context of what you're saying instead of translating every word verbatim, and this makes it much more precise. It can even correct mistakes automatically and remove ums and ahs and stutters. And what's more, it can even understand what app you're currently using and adapt the formatting and spelling accordingly. Overall, AI voice dictation is a huge productivity gain that I don't wanna miss anymore. But there are many different alternatives, and they all have pros and cons. And if you pick the wrong one, you might waste a lot of time and even money. In this video, I will compare three of the most popular AI voice dictation apps for Windows and Mac: WhisperFlow, SuperWhisper and VoiceType. I've tried all of them for weeks, so I know their pros and cons inside out, and in this video we will try them out in different scenarios, for example in chat apps, writing emails and even vibe coding. We will compare their speed, accuracy, whether or not they work offline, and what other cool features they offer. And at the end of this video, you will know exactly which one to pick. I will also tell you my personal winner and which one I'm using every single day. My name is Florian Walter and this is the AI Tool Corner, where I review the latest AI software to find out which ones can actually improve our lives and businesses. First we compare the pricing and how easy each app is to set up. We will start with WhisperFlow, and I will put all links into the video description, along with timestamps for the different sections of this comparison. Let's start with WhisperFlow. Let's check out the pricing. They have a very generous free tier. This is the best free tier of all of these apps. You get almost all the features of the pro tier, except for command mode and some other little things.
You can use it for up to 2000 words a week, which is a lot. But even after you reach this limit, you can still keep using it, just with a slower model. But to be honest, if you use this every day, it's worth paying for the pro subscription, and you also get a 14-day trial for the pro subscription automatically. So you can try this out without any risk. They have a 50% student discount. What they don't have, however, is a lifetime deal. The other two apps I'm going to show you have lifetime deals where you only pay once and then you can use it forever. WhisperFlow only has monthly or yearly subscriptions. To try out WhisperFlow, click on download for free, create an account in the process, install it, and the UI looks something like this. It has a bunch of different features and settings. It's not the simplest UI of the three, but I'm going to show you how to configure it to get started as quickly as possible. First we have to go into the settings and set up our keyboard shortcuts, because this is what you use to dictate. Hands-free mode means that you click the combination once, then you can speak, and then you press it again to end the dictation and transcribe it into text. I almost always use push-to-talk, which I prefer, where you hold down the shortcut and then release it after you're done dictating. We will try this out in a moment. You can change the keyboard shortcut here to whatever you prefer. Make sure it doesn't clash with other keyboard shortcuts. And one more important setting is to set the correct microphone. Now in daily life I actually use this super cheap $20 headset from Amazon. The microphone quality is abysmal, but it's actually good enough for these dictation apps to work. They don't need a super high-quality mic. But for the purpose of this video I'm going to switch to this mic here, because I don't have my headset in right now. And after we've configured the microphone and the keyboard shortcut, we can try this out.
So I hold down the push-to-talk combination, then start speaking. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel. Subscribe if you haven't yet. Then I release it, and half a second later we see the whole transcript here. Everything is spelled correctly, it's fast and accurate, and this works in every app because it will just put the text where your cursor is. You can use this in ChatGPT, for taking notes, emails, everywhere. We will stress test this in different situations later. If you dictated something but the cursor wasn't at the correct place, you can always find a history of what you dictated here inside the app. You can copy it and then paste it wherever you need it. The second app we will compare is VoiceType, which again is available for Windows and Mac. They don't have a completely free tier; they have a free trial for 3 days. But they also have a lifetime license, which I really like, because even though this is a high price, you only have to pay this once and then don't worry about it anymore. And I think buying a proper voice dictation tool is worth it. But a monthly plan is also available. So again you can try this out for free for 3 days. Click on start free trial, then we have to go through a short onboarding process. Just click whatever fits your situation; what exactly you select here doesn't really matter, and eventually we get to a checkout screen. You have to enter your payment details here, but you can cancel this before the trial is over if you don't like it. Again, the link is in the video description. Let's take a closer look at VoiceType. This is the UI. One strength is that the UI is extremely simple. You only have a few settings, and if you don't like having to configure a bunch of different features, this might be good for you. In the settings, again we should configure the mic, I'm gonna set this to the same mic again, and the hotkey.
We have the same hands-free and push-to-talk modes, and we can set the keyboard shortcuts here. We also have a cool feature called custom system prompt, which you can use to do some useful stuff. We will take a look at this later. For now, let's give this a try. So again I hold down the push-to-talk keyboard shortcut. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel, subscribe if you haven't yet. There we go, again it's accurate. And again, if the cursor is in the wrong position, you have a history where you can copy your past transcriptions. The third app we will compare is SuperWhisper. Again, let's take a look at the pricing. This is the cheapest one of all three apps. They have an unlimited free tier, but this uses smaller AI models, which means they are slower and less exact. You also get 15 minutes of free pro usage in your free tier to try it out. And the pro subscription is the cheapest one of the three, and they also have a lifetime deal for 250 bucks. Again, if you pick this one, I would go for the lifetime deal, because then you don't have to pay every month and you can just forget about it, and maybe even write it off as a business expense. Again, to use it, create an account and install the app. SuperWhisper has by far the most complex settings of all three apps, but also a ton of different features and configuration options. However, on Windows I get a bunch of technical problems. For example, the settings screen here is transparent, and I didn't find a way to turn this off. I think the developer is a Mac user and he designed this app for Mac first. It works on Windows, but it seems to have some technical problems there, more on that in a moment. Again, the minimal configuration we need to do is setting our mic, which we do up here, and the keyboard shortcut to start recording. Again, we have a hands-free mode and a push-to-talk mode. I usually use push-to-talk; we can set the shortcut here.
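By the way, all three apps share these same two recording modes, and the only real difference between them is what stops the recording. If you want the logic spelled out, here is a toy sketch in Python; the class and method names are invented for illustration and are not taken from any of these apps' internals:

```python
class DictationTrigger:
    """Toy model of the two recording modes the apps share.

    Hands-free: press the shortcut once to start, press again to stop.
    Push-to-talk: record only while the shortcut is held down.
    """

    def __init__(self):
        self.recording = False

    def hands_free_press(self) -> str:
        # Each press toggles recording; stopping triggers transcription.
        self.recording = not self.recording
        return "listening" if self.recording else "transcribe"

    def push_to_talk(self, key_held: bool) -> str:
        # Releasing the held key is what triggers transcription.
        stopped = self.recording and not key_held
        self.recording = key_held
        if stopped:
            return "transcribe"
        return "listening" if key_held else "idle"
```

In hands-free mode, two presses give you "listening" and then "transcribe"; in push-to-talk, holding and then releasing the key does the same.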
But in SuperWhisper we actually have to select the mode that we want to use. And each mode has different models available. So this is much more complex than the other two apps we have seen so far, where we can't select a model at all. And we even have offline models available here that we can download and use without an internet connection, but more on that later. Let's try this out with one of the default models. Again, push-to-talk. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel, subscribe if you haven't yet. And this is the biggest problem I had with SuperWhisper. As you can see, it put the focus on this button here. This is a bug. And because of this, it doesn't put the transcription here where the cursor is. So I'm gonna try this again, but after I finish my dictation I quickly have to click in here again to even get a transcription. I already told the creator about this bug and I'm sure they will fix it in the future, but right now this is unusable for me. Anyway, let's try this again. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel, subscribe if you haven't yet. There's the transcription. It's correct and it's fast. But again, it only works when I quickly click in here to put the focus inside the note app. Again, my suspicion is that SuperWhisper was optimized for Mac. It will probably work correctly there, but it has some hiccups on Windows. And like the other apps, we have a history of our past transcriptions. We can even listen back to the original recording, we can copy the transcription and so on. So for basic dictation, all three apps are equally good. Now let's stress test them. One big strength of AI voice dictation is that it can correct mistakes for you. Those LLMs are smart, and they can figure out what belongs in the transcription and which parts are probably mistakes. Let's give this a try. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel, subscribe if you haven't yet.
There we go. We have the same transcription as before, because the AI removed all the mistakes and ums and ahs. This is super clean. And this is also much better than Windows' inbuilt voice dictation or the one I have on my Android phone. If you want to see a comparison, check out my VoiceType tutorial, because there I compare it to Windows' inbuilt voice dictation. Spoiler alert: the AI is much better. And of course this also works with different languages, which we can configure in the settings. In WhisperFlow we can actually set multiple languages, or we can set this to auto-detect. The only ones I speak are English and German, so I selected those. And I can dictate the same sentence in German. Hallo zusammen, hier ist Florian Walter vom AI Tool Corner YouTube channel, abonniert falls ihr das noch nicht gemacht habt. Boom, there's the German transcription. And the cool thing is, we can even combine German and English. For example, YouTube channel is an English term, but I like to use it over the German translation. And it's no problem to have this inside our German dictation. Let's see if it can recognize special terms and names out of the box. I'm currently working on a vibe coding tutorial with lovable, replit and v0. This is not quite correct. Lovable and replit are correct, but v0 should be spelled like this. And that's what the dictionary inside WhisperFlow is for, which you can find here. If the AI doesn't know how to spell certain words, you can simply add them here, and in the future it will know how to spell them. Let's try this again. I'm currently working on a vibe coding tutorial with lovable, replit and v0. There we go, now it's correct. Now WhisperFlow has a unique feature that the other two apps don't have: it can automatically add words to your dictionary. You can see the ones here that have the sparkle emoji next to them. Those words were automatically added by WhisperFlow. And this happens when you correct a word.
Sometimes it doesn't happen, like it didn't automatically add v0. I don't know what exactly triggers this, but let's give this a try. Hey Caitlin, how are you today? Let's say we want to spell Caitlin with a K. I change this, and maybe this to a Y. Sometimes this triggers WhisperFlow to automatically add the word to the dictionary, but it didn't this time. Again, I'm not sure what exactly triggers it. Let's try this one more time inside ChatGPT. Hey Caitlin, how are you? Ah, actually, can you see it down here? It added Caitlin to your dictionary. So it worked, but with a bit of a delay. So now when we look back into the settings, we can see Caitlin with the sparkle emoji next to it. And now it should spell it correctly. Hey Caitlin, how are you? There we go. But I don't want to keep this, so I delete it. Now this will be even more impressive when we dictate whole emails, which we will do in a minute. But first I want to test something else. WhisperFlow actually works when you whisper into the mic, and even with a noisy crowd in the background. So let's say you are in an office, there are people talking in the background, and you don't want everyone to hear you. I have prepared this background noise, and now I'm gonna say something into the mic, and I'm gonna whisper it and see how well it gets the transcription. I just want to see how well WhisperFlow works when there is a noisy crowd in the background. And voila, it nailed it. Let's see how well VoiceType handles these scenarios. So again we start by making some mistakes and ums and ahs in our dictation. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel. Subscribe if you haven't yet. Again it's precise. It removed all the mistakes. Again, it can handle multiple different languages. In the settings I set the output language to auto-detect, and again this can mix different languages in the same sentence. Hallo zusammen, hier ist Florian Walter vom AI Tool Corner YouTube channel.
Abonniert, falls ihr das noch nicht gemacht habt. Perfect. Let's see how well it handles special terms and names. I'm currently working on a vibe coding tutorial with lovable, replit and v0. Again it got v0 wrong, that's funny. And again we have a dictionary to add words so the AI can use them in the future. I'm currently working on a vibe coding tutorial with lovable, replit and v0. Now it got it right, but VoiceType doesn't have auto-add. So you have to add these words manually, which, however, is not a problem in my opinion, because you don't do it all the time. Again, let's see how well it works with a crowd in the background and whispering. I just want to see how well VoiceType works when there is a noisy crowd in the background. Yeah, again it nailed it. And the last one is SuperWhisper. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel. Subscribe if you haven't yet. Again, I have to click inside the note app to put the focus at the correct place. But the transcription again is correct. Nothing to complain here. The language settings in SuperWhisper are actually specific to each mode. So different modes can have different languages set for them. However, there is an automatic option, which automatically detects the language, and this is what I usually use. Let's try a German sentence. Hallo zusammen, hier ist Florian Walter vom AI Tool Corner YouTube Channel. Abonniert, falls ihr das noch nicht gemacht habt. Okay, this is weird. It translated it into English. Let's try this again. Hallo zusammen, hier ist Florian Walter vom AI Tool Corner YouTube Channel. Abonniert, falls ihr das noch nicht gemacht habt. It automatically translates it into English, which is confusing because I set the language to automatic. Let's see how well it handles special terms. I'm currently working on a vibe coding tutorial with lovable, replit and v0. Okay, it nailed it, including v0. But I haven't yet added it to the vocabulary.
v0 isn't mentioned here, but it figured out the correct spelling automatically, which is nice. Again, you can add additional words to the vocabulary, but it doesn't happen automatically; you have to enter them manually. Let's try whispering with the crowd in the background. I just want to see how well SuperWhisper works when there is a noisy crowd in the background. Looks correct, perfect. Now, how well SuperWhisper works depends a lot on the configuration. For each mode, you can select the voice model and the language model. You can even add custom models here via an API key, more on that later. And you can download local models to use SuperWhisper without an internet connection. We will try this out later. But in most day-to-day use, I just use the online models. Now let's increase the stakes and try dictating a whole email. Again we will start with WhisperFlow, and I will add some mistakes and ums and ahs on purpose. Hey Stephen, yes, I can record a quick loom video for you and talk about my experience with Zerif. Here are a few points I can give you from the top of my head. 1. The bot often malfunctioned and got stuck. 2. The standard plan is too expensive for me. And 3. There were too many confusing settings. I hope this helps and I wish you good luck with your product. Regards Florian Walter. There we go. So it added formatting, like this numbered list here. I would have found it better if it put this regards into a new line, like this. Sometimes it does, sometimes it doesn't. But what is most impressive is that it spelled Stephen right, with a PH instead of a V, and Zerif too. And this is the most impressive thing about WhisperFlow: it can actually read some of the information of the active window, so it can see how this person's name is spelled, or this word Zerif. And this saves me a ton of time, because I don't have to go through this and make a lot of corrections.
Of course this has privacy implications, because the app can see some of the information on your computer. And if you don't like this, you can disable it in the settings. So here under system we have this smart formatting option, which defines whether WhisperFlow should apply formatting or not. I would always keep this activated. And here under data and privacy we can disable context awareness. This is what allows WhisperFlow to read some of the information of the active window, like the email in Gmail. I keep this activated because I'm not scared of these companies spying on my information. But some people don't like this, and they might want to disable it for privacy reasons. We will talk more about privacy and security at the end of this video. In WhisperFlow you can also change the style of the formatting for different apps. So for example I set email to formal, which means it adds commas and line breaks. But again, it didn't get this perfect, because this formatting looks more casual to me. It put hey Stephen and the first sentence into the same line, and remember, it did the same for the regards. So this is definitely not perfect, but the AI will take these settings into account. Now one thing I've noticed is that these different categories really only work with apps. When I have Gmail open in the web browser, it doesn't detect this as an email app. It actually detects this as other. So whatever formatting I'm setting here under other is what will be used in the email client. If I set this to excited, it will add more exclamation marks. Hey Stephen, this sounds like a great idea. Thank you very much. I'm looking forward to talking to you next time. See, it's a bit more casual, a bit more excited, and you can configure this here in the style settings. I keep this as formal. Next up is VoiceType. Here we don't have different style settings. We only have this one setting here, format transcriptions or not. I always keep this activated.
However, VoiceType can still understand when we are dictating an email, just because what we dictate looks like an email. So let's give this a try as well. Hey Steven, yes I can record a quick, quick loom video for you and talk about my experience with Serif. Here are a few points I can give you from the top of my head. 1. The bot often malfunctioned and got stuck. 2. The standard plan is too expensive for me. And 3. There were too many confusing settings. I hope this helps and I wish you good luck with your product. Regards, Florian Walter. It took a bit longer this time. And it spelled Steven and Serif wrong, because VoiceType doesn't detect the name and the spelling from the context. Which is definitely a downside, because now we have to go in here and fix this manually, which costs us time over the long haul. However, the formatting is actually better than what WhisperFlow gave us. It put the regards into a new line. I really like how this looks. It removed all the mistakes, and it formatted the list properly. So again, this is pretty good. And in VoiceType you can give the AI more instructions on how to format this via a custom system prompt. Here you can enter whatever you want, for example: use casual formatting whenever I dictate an email. And now, whenever you dictate something, this will be taken into account after saving these changes. We will take another look at the system prompt later. And this is a feature WhisperFlow doesn't have. I also have a detailed tutorial on VoiceType and the system prompt and everything. Again, I will put a link into the top right corner. You can check it out after watching this video. And next up: SuperWhisper. Here, for optimal email transcription, we have to set up a separate mode. You can click on create mode, and here you can select mail. I've already set one up below. There it is. Again, you can select the language and the models it uses.
And for something long like an email, it actually makes sense to use a very accurate model that is maybe not as fast. For an email I don't mind if the transcription takes one second longer; it's such a big chunk of text, I just want it to be correct. So I use one of the more exact models, currently GPT 5.1. We can also configure that this mode is automatically used when we use a certain app or a certain website. However, the website setting down here didn't work for me. As you can see, I added all these different Gmail URLs, but it didn't make Gmail auto-pick the email mode. I have to select it manually. And to select a mode there is a keyboard shortcut, in my case Ctrl-Alt-M. And here we can select between these different modes. I switch to email. So now I'm in email mode, and I'm going to dictate the same reply as before, with the same mistakes. Hey Steven, yes, I can record a quick loom video for you and talk about my experience with the Reef. Here are a few points I can give you from the top of my head. One, the bot often malfunctioned and got stuck. Two, the standard plan is too expensive for me. And three, there were too many confusing settings. I hope this helps and I wish you good luck with your product. Regards, Florian Walter. It takes a moment to transcribe this. And there is the email. Again, it added formatting automatically, but again it spelled Steven and the Reef wrong, because it didn't read these names from the context. However, the rest looks correct, and the formatting is on point. Before, when I used SuperWhisper, it sometimes changed the email too much. I don't know if they fixed this, but this time it looks pretty good. Now I have to say that switching between modes manually is a bit annoying, because often I forget it, dictate something, notice I had the wrong mode, and the spelling is not as good. For example, when I dictate the same email with the default mode, it will look worse.
[00:23:08] Speaker 2: Hey Steven, yes, I can record a quick loom video for you and talk about my experience with the Reef. Here are a few points I can give you from the top of my head. One, the bot often malfunctioned and got stuck. Two, the standard plan is too expensive for me. And three, there were too many confusing settings. I hope this helps and I wish you good luck with your product. Regards, Florian Beuter.
[00:23:20] Speaker 1: Yeah, as you can see, now we don't have email formatting because we are in the wrong mode. And switching between modes is a bit tedious. Now for daily life, you want to use this super mode most of the time, because this should intelligently adapt to your current application. However, this doesn't work perfectly all the time, so it's still more exact to use a specific mode like email for email dictation. We can also set up some cool custom modes to do different things, more on that later. So the clear winner for me here is WhisperFlow, because it's the only one that spelled Stephen and Zerif correctly. Now let's try them in messaging apps. Again I'm gonna start with WhisperFlow, and let's say I want to reply to him, GromSoldier, about his video here. So I put my cursor here and say: haha, GromSoldier, that video is hilarious. And again, the mind-blowing thing about WhisperFlow is that it spelled this name correctly. Look at this. This is not a common word. It's a completely made-up username, and WhisperFlow spelled it correctly. And this again saves us a ton of time, because if I dictate the same sentence in here, haha, GromSoldier, that video is hilarious, it spells it wrong, because what the hell is a GromSoldier? But inside the messaging app, it figured out the correct spelling automatically. A big plus for WhisperFlow. If we want to tag people in here, like @everyone, we can add this to our dictionary. So here I could add @everyone to teach WhisperFlow how to use that word. And now I can say: hey @everyone, who's gonna join the meeting tomorrow? There we go. Another use case that's interesting is dictating in the middle of a sentence. For example, let's say here, before the question mark, I want to add: morning at 7 a.m. And WhisperFlow figured out that this is not a full sentence and started it with a lowercase letter. Again, this saves us a lot of time. Let's see how well VoiceType performs.
Haha, GromSoldier, that video is hilarious. And it spelled GromSoldier wrong, because VoiceType doesn't figure out the name from the context. Which is a bit annoying, because now I have to go in here and fix this manually. Of course, I can add this name to the dictionary in VoiceType, but it's better if it can figure it out automatically. Let's also try @everyone here; I actually already added it to the dictionary. Hey @everyone, who's gonna join the meeting tomorrow? Nothing to complain here. Morning at 7 a.m. Yeah, this time it started morning with an uppercase M, because it didn't figure out that we are in the middle of a sentence. Now we can somewhat fine-tune this by configuring a custom prompt. We could say: if I don't speak a complete sentence, start the transcription with a lowercase first letter. That should do it. Let's save this and try this out again. Morning at 7 a.m. And now it's correct, because this is not a full sentence. But this will now also happen if I dictate it anywhere else. Morning at 7 a.m. It will start it here with a lowercase first letter as well, because again, this is not a full sentence, but this is good enough. And lastly, SuperWhisper. First we have to switch to the correct mode. We are still in email mode, but I want to use super for regular dictation. Haha, ChromeSoldier, that video is hilarious. Again it spelled GromSoldier wrong, which is not great. Then we can add @everyone to the vocabulary. I already did this down here. Let's try this out. Hey at everyone, who's gonna join the meeting tomorrow? Yeah, it got this wrong. Let's try this again. Hey at everyone, who's gonna join the meeting tomorrow? Yeah, it always spells it as at everyone. Let's try the replace with feature. I can say: replace at everyone with @everyone. The downside of this is that now, whenever I say at everyone anywhere, it will replace it. But let's try this out. Hey at everyone, who's gonna join the meeting tomorrow?
Yeah, still doesn't get it right. Let's also try completing the sentence. And by the way, it didn't put a question mark here, which is a bit of a bummer. Morning at 7 a.m. Again, I have to put a cursor at the correct position manually. And it actually spelled out the complete sentence again, which is a bit weird. So WhisperFlow is the clear winner here. And unfortunately, SuperWhisper is the loser. We can even use AI voice dictation for vibe coding. And WhisperFlow has some specific settings for vibe coding here, which is cool. You can enable variable recognition and file tagging. And you want to keep both enabled if you use WhisperFlow for coding. Let's try this out. So here I have this file open, this login index.tsx. If you're not a programmer, you can skip this part in the video. The timestamps are below. Let's try something simple. Go into the index.tsx and increase the padding, increase the horizontal padding to 6. And you just saw the magic in action. It automatically tagged the correct file. This is important for the coding agent, because now it has a reference to this specific index.tsx file. And we don't have to change anything in here, and we can just send this. In fact, let's try this out. Let's burn some tokens just for presentation purposes. There we go. It went to the correct file and made the correct change. WhisperFlow can also recognize the correct variable or function name. Let's say I want to remove this login mutation here. Let's try this. Go into the index.tsx file and comment out the useLoginMutation. Again, it tags the correct file. And it spells useLoginMutation correctly, in camel case, like we have it here. This is extremely useful when you're working with a single file. When you're working with multiple different files, it's not always that useful, because for example I have a ton of different files called index.tsx in different folders. And there is no real way to tell WhisperFlow which folder it should tag. 
And also it doesn't always work, so in day-to-day life it has limited practicality, but it's still a nice-to-have feature. Let's try the same with VoiceType. Go into the index.tsx file and comment out the useLoginMutation. Not as good. It didn't tag the file, it only spelled it out, so the AI agent doesn't know which file it has to open, and it doesn't quite spell useLoginMutation right. So not as good as WhisperFlow. And lastly, SuperWhisper has the same problems as VoiceType. Go into the index.tsx file and comment out the useLoginMutation. Yeah, it doesn't tag the file automatically, but it spelled useLoginMutation correctly, which is nice. So for vibe coding, the clear winner is WhisperFlow, by far. Another very cool feature is text expansion. Here in WhisperFlow we have this snippets feature. You can enter certain trigger words, which then get expanded into longer text. This can be a whole email template. So for example, I could write InsertEmailTemplate, and then I can put a whole email template with a placeholder in here. Let's try this out. And then I just say InsertEmailTemplate. And there we go. Or you can use smaller snippets, like replacing this trigger word with the URL of my YouTube channel. Hey guys, check out my YouTube channel at myYouTubeLink. There we go. It replaced myYouTubeLink with the actual link. Now, VoiceType doesn't have a dedicated feature for this, but we can achieve the same with a custom system prompt. There's only one global system prompt, and it's limited to 5,000 characters, but whatever instructions you can fit in here, the AI will take into account. So for example, I can say: when I say myYouTubeLink, replace it with, and then the actual YouTube link. Save this. Let's try this out. Hey guys, check out my YouTube channel at myYouTubeLink. Voila, this works. And in SuperWhisper, we have the replace with feature in the vocabulary that we already saw earlier. So again, we can say: replace myYouTubeLink with the actual link.
Let's give this a try. Hey guys, check out my YouTube channel at myYouTubeLink. This worked. So all three apps can expand text snippets in one way or another. When you have a WhisperFlow Pro subscription, you get a cool feature called Command Mode. You can activate it here in the experimental settings, and you have to set a keyboard shortcut to enable it. I set this to Alt Shift and plus. With Command Mode, you can do one of two things. Either you can highlight some text and edit it. For example, I have this text: Hey Steven, yes, this sounds like a good idea. Thanks for reaching out. I can highlight this and say in Command Mode: format this into a professional sounding email and add regards Florian to the end. And now WhisperFlow edits this text. It's okay, but it didn't add empty lines. Let's try to edit this again: add proper formatting with empty lines and everything. There we go. This can be useful to edit some text on the fly. The other thing you can do with Command Mode is opening Perplexity, the AI online search. For this, you just have to start your dictation with Ask Perplexity. It looks like this. Ask Perplexity what the hell looks maxing is. And it automatically opened the browser window. It opened it on my main screen, but this is the window it opened. It went to Perplexity and entered the query. It spelled looks maxing wrong, unfortunately, but we can add this to the dictionary. This is a cool little feature. I don't really use it much in practice, but it's nice to have. Now, VoiceType doesn't have a way to edit existing text or do an online search, but we can still do some cool stuff with the custom system prompt. For example, I can do something like: when my dictation starts with Hey VoiceType, treat everything that follows as instructions rather than verbatim dictation. And let's put this into quotation marks to be a bit more clear, like this. Save this.
And now I can say: Hey VoiceType, create a grocery list with five random items and lead each item with a fitting emoji. And then, instead of transcribing this verbatim, it does what I said in the instructions. So you can get really creative with this custom prompt. One other thing I like to add is: avoid em dashes, because em dashes these days are a sign of AI-generated text. So I usually have this in here. For custom prompting, SuperWhisper is the most flexible one, because here we can create different modes. We have a few predefined ones that we already looked at earlier, but we can also create completely custom modes by selecting blank here. Here we can enter an instructions prompt, the LLM, and the voice model. We can even select the context this mode should use: should it read information from the application, similar to how WhisperFlow does it? Should it see the selected text, or the text in our clipboard? You can do some really cool stuff with this. I've already prepared a few examples. Again, we have to switch modes manually. For example, I have this custom mode called query highlighted text, which has this prompt: answer the user's query about the highlighted text. So for example, I type in looks maxing, select it, and dictate: what is this? And instead of transcribing my input, it puts the answer here into this text box. Now it says I haven't highlighted any text, so again, sometimes this is buggy on Windows. Let's try this in a browser window. Again, I select looks maxing. Or maybe let's try selecting this whole text and then say: summarize this for me. There we go. Well, it's not very concise. We could fine-tune this in the prompt, but it's a nice feature. Note that this doesn't perform an online search, so when I ask, what is this, it will probably not be able to figure out what looks maxing is. Yeah, again, it didn't even notice the text. So this is a bit buggy. And you can do some funny stuff with this.
I also created this alternating case custom mode, with the prompt: write everything I say in alternating case. In the settings, you can also add examples to make the AI more precise, so I added this example input and output. Now I can select this custom mode and say: hey guys, Florian Walter here from the AI Tool Corner YouTube channel. And it writes out this text in alternating case. You can get creative and create as many custom modes as you want for different use cases. Now, even though SuperWhisper was the most buggy of the three apps so far, it has a few features that the other two don't have. For one, we can transcribe files. Here you can select an MP3 or WAV file and let SuperWhisper transcribe it. Let's try it out with this example. This takes a few seconds, and then we see the output here. We still had alternating case selected, so let's select super mode and try this again with the same file. And there we go. It transcribed what was said inside the file, and this is actually correct. This also works with longer files. This one here is a whole transcript of a video of mine, which is around 10 to 15 minutes long. This of course takes a bit longer; it took around half a minute. Then we see the full output here and we can copy it wherever we need it. This is the full transcript of the audio file. This is a feature WhisperFlow and VoiceType don't have. And if you want to transcribe a live meeting, you can use one of the mobile apps. We will take a look at them in a few minutes. Another feature that only SuperWhisper has of the three is that it can actually work offline. This only works with the default mode, because the default mode only uses a voice model, whereas all the other modes use a voice model and a language model. For the voice model, we can download local models that run on our computer, but not for the language model. This is why offline only works with default.
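As a quick aside: the alternating case mode above is a nice toy because the transformation itself is trivial. SuperWhisper just delegates it to the LLM via the prompt, but as a plain function (my own sketch, not anything from the app) it looks like this:

```python
def alternating_case(text: str) -> str:
    """Rewrite text so letters alternate between upper and lower case.

    Non-letters pass through unchanged and do not advance the alternation.
    Starting with an uppercase letter is an arbitrary choice here.
    """
    out = []
    upper = True
    for ch in text:
        if ch.isalpha():
            out.append(ch.upper() if upper else ch.lower())
            upper = not upper
        else:
            out.append(ch)
    return "".join(out)

print(alternating_case("hey guys, Florian here"))
```

Adding an example input and output to the custom mode, as shown in the video, serves the same purpose as this deterministic spec: it pins down edge cases, like what to do with punctuation, that the prompt alone leaves ambiguous.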
And here, instead of one of these online models, we can download and select a local model like Large V3 Turbo. I downloaded this here via the download button. As you can see, it's between one and two gigabytes, and it runs completely on your computer. So now, when I select default mode, I can actually completely cut off my internet connection and this will still work. So I disable ethernet. Now I am offline, but I can still transcribe. Hey everyone, it's Florian Walter from the AI Tool Corner YouTube channel. And the local model transcribes the text. It took a few seconds longer, but the spelling is correct. Whereas if you try the same with WhisperFlow or VoiceType while being offline, you will just get an error message. Hey everyone, WhisperFlow doesn't work offline. This is extremely useful if you need dictation on a plane or somewhere else where you don't have an internet connection. So if you need offline support, then SuperWhisper is your only choice of the three. However, I want to note that since the default mode doesn't use a language model, the results are not as good, because the language model is what takes your raw input, formats it, and removes mistakes. For simple dictation, using only the voice model is good enough, but for more complex tasks, you want to use one of the modes with a language model behind it, which again only works when you're online. Let's turn the internet back on. Let's also take a look at privacy and security, which is especially important when you use these apps inside a company. In WhisperFlow, you have a bunch of settings related to privacy. You can find them here under data and privacy. For one, we can enable privacy mode, which means that none of your dictations will be stored on WhisperFlow's servers and used for training their models. It's a good idea to activate this. Then down here, we also have this enable HIPAA option.
I don't know how to pronounce this, but it's important when you use voice dictation in a medical context. You can enable it here. I didn't do this, but it should be noted that WhisperFlow will still send your input to an LLM. I don't know which service they use, whether it's OpenAI or one of the other public APIs, and they can't really control what happens to the data they send to a company like OpenAI or Anthropic. So this might still have data privacy implications you should be aware of. The only way to really get around this is with an offline model, which again we only have with SuperWhisper. And on the pricing page, they mention for the enterprise plan that they are SOC 2 Type 2 compliant, if this is relevant for your company. For VoiceType, I found some information in the privacy policy. They say that they only store the last 10 transcriptions, that none of your data is ever used to train their AI model, that everything is securely encrypted, and so on. I didn't find any information about HIPAA or SOC 2 compliance, so it's probably safe to assume those are not available. The same goes for SuperWhisper: in their privacy policy, they mention that they are not using your data to train their models and that they don't retain your data. And again, on the enterprise plan, you get SOC 2 compliance. As we saw earlier, SuperWhisper is the only one that can fully run offline, which is the only way to get 100% privacy. You can also use any model with SuperWhisper via an API key and connect it to whatever model you want. Maybe your company hosts its own models you want to use; you can do this with SuperWhisper. This is not possible with WhisperFlow or VoiceType as far as I know, so if this is relevant to you, you might want to give SuperWhisper a closer look, because in terms of privacy, it is the clear winner. Let's take a look at team features.
WhisperFlow enables you to share snippets and your dictionary with your team members, which can be really useful if you use this inside your company. I haven't tried this out, but I assume it works. In VoiceType, I couldn't find any team features, so I assume it only works solo. SuperWhisper doesn't seem to have any sharing features either, but at least they have centralized billing and authentication in the enterprise plan. I don't think you can share your snippets automatically, though. I will give you my final review and my personal choice in a minute, but first let's look at the mobile apps. WhisperFlow has an iOS app. I haven't tried it out, but it has pretty good ratings, 4.8 stars. Apparently it's a custom keyboard, and via the keyboard you use the AI dictation just like in the desktop app. On their website, they say that an Android app is coming soon. For VoiceType, I couldn't find a mobile app. And SuperWhisper again has an iOS app, with 4.4 stars, but they don't mention planning an Android version anytime soon. SuperWhisper seems to be optimized for the whole Apple ecosystem. So which of the three should you pick? My personal choice is WhisperFlow, because it has the best context detection and accuracy, which saves me a lot of time. I have the Pro subscription, which also gives me Command Mode, which I can use to rewrite existing text or do quick online research. And of course, if you are a programmer, the automatic file tagging is super useful. Besides that, it has the most generous free tier of the three apps. The biggest strengths of VoiceType are that, first, they offer a lifetime deal, so you don't have to pay a fee every month. Second, you can do some really cool stuff with the custom system prompt. And third, it has the simplest UI of the three, with the fewest settings. So if you are a boomer and these other apps are too complicated for you, this might be a good choice.
But VoiceType is not as versatile and flexible as the other two apps. The biggest strengths of SuperWhisper are that it has offline support, which the other two apps don't have, you can transcribe audio files, you have very flexible custom modes, and you can select which models you use and even connect your own models via an API key. You can't do that with VoiceType or WhisperFlow. But manually switching between the different modes is annoying, and my biggest gripe with SuperWhisper at the moment is that it's kind of buggy on Windows. It loses focus, and this makes it unusable for me right now. But if you are a Mac user, it could be a great alternative. Again, I will put the links to all three apps into the video description. They all have either a free tier or a free trial, so you can try out all of them and see which one you like the most. Let me know in the comments which one is your choice. And with that, I wish you a nice rest of the day, and I hope I see you in the next video. Take care.