Comparing Top AI Transcription Tools for Accuracy (Full Transcript)

Discover the best AI transcription model for your content with a comparison of tools from OpenAI, Deepgram, and AssemblyAI using Zapier workflows.

[00:00:00] Speaker 1: Hello, I'm Mike, and in this video, I'm going to compare the best AI transcription tools available right now. And I don't know about you, but sometimes it's confusing when OpenAI, Deepgram, and others like AssemblyAI all put out benchmarks that say they're 92%, 98%, or 99.9% accurate. Well, once and for all, I'm going to determine the best for my specific content and show you how you can build something to quickly benchmark the AI transcription models to find out which is best for your use case. Okay, let's dive in.

So you'll see here, I've got GPT-4o Transcribe. That's the latest from OpenAI. Nova-3 here, available from Deepgram. And AssemblyAI's Universal-2. Now I'm going to go ahead and build a Zap with Zapier that is going to execute these transcriptions, time them, and then give me results in a nice Zapier table.

Let's start off with a trigger. And I'm going to trigger with a webhook. So when I call a web address, this Zap will run. We'll select Catch Hook here, and I've got the URL to call. Next up, I'm going to go into Formatter, and I'm going to select an action event for date/time. And here in the configuration, I will choose to transform the format. And the input value will be the time at the start of this Zap. And I'm converting to a Unix timestamp here.

Now we're going to set up our first webhook, and this will be to AssemblyAI. So essentially, we're going to do a custom request, and we're going to go ahead and configure it. And we're going to be posting audio to the AssemblyAI Universal-2 transcription model. So of course, the URL will be the version 2 transcript endpoint. Data pass-through will be false. For data, we'll use this example. But I'm going to copy and paste in a new URL to an audio file that I created of this YouTube video that did really well, and it's also got lots of technical terms in it. So I'm definitely curious to see how it's going to handle this. Unflatten is yes.
And then for headers, we'll add in authorization and content type. That's a space for my API key and the content type. Now, this video is about 20 minutes long. We can give it a quick test. Okay. You can see I've got an ID back saying that this is actually being transcribed by Universal-2 right now. What's more, the API actually passes back to us the time that it took to transcribe the audio.

So now you'll see I've nicely labeled this Zap up. Start AI transcription. We record the start time. We call the AssemblyAI Universal-2 model, in this case, and we get the AssemblyAI transcript. Now we're going to go ahead and add in Formatter. And again, we're going to do date/time, this time to get the end time. And we'll label this end time.

Okay, finally, with that all done, we want to check the quality of the transcript now that we've got it. And for this action, as Zapier has thousands of tools available, we are going to go for Gemini. Yes, we're going to use Google AI Studio and Gemini's latest model to compare the transcripts and give us an accuracy score. All you have to do is go to Google AI Studio, create an API key, copy it, and paste it in here. I've connected it to my Google account, and I'm going to start a conversation here. For configuration, the API version should be on beta because that allows you to use the latest models from Google. And I'll tell you what I really like. If we scroll down just a little bit, you'll see Gemini 2.5 Pro Experimental. Not only does Gemini have one of the longest context windows, and we're going to need that for 20-minute transcripts, but it's highly accurate.

So let's go in and type in the message: "Compare this transcript," and in triple quotes, we will place the transcript we got back from AssemblyAI, which is just here, "to this original 100% accurate transcript." And for this bit, I've got a human-made transcript, which is 100% accurate, pretty much, if my human brain is correct.
And we'll paste that in there in the triple quotes. And then down below, as the final part, I've said: "Give an accuracy percentage score as your output to two decimal places max. Look at spellings, punctuation, and overall accuracy." We can actually give this a spin right now. As you can see, it says "Compare this transcript." That's the one we got back from Universal-2, then the second transcript, and then the prompt at the end. Let's test this step. And, of course, this will take just a moment, because it's a long context that Google is working with. And, boom, it's come back with an answer. And look at this. It's come up with an accuracy of 98.94% for the AssemblyAI Universal-2 model.

Finally, in this experiment, I'm going to create a new record in a Zapier table. We'll create a table especially for this experiment, AI transcript benchmarks. Over here in Zapier Tables, we'll just add in a few fields. So you can see here we've got it set up to record the model, the time it takes to do the transcript, the accuracy score generated by Google Gemini, and a cost score that I'll put in myself. Now, we just need to make sure to map all these fields. There we go. With all the field mapping set up, we can now go ahead and run this multiple times on all the different AI transcription models and conclude which is the best for my particular content.

Okay, this is brilliant. You can see here that AssemblyAI Universal-2 took just over 15 seconds at an accuracy of 98.94% and a cost of just 12 cents to transcribe a 20-minute video. So let's run this multiple times on all the different AI transcription models and see what comes out on top. Now, note, as I've been building this Zap, you can also use some native Zapier actions. Like, for instance, Zapier supports OpenAI's ChatGPT. And if you go in here, you'll see that as an action event, if you type in "transcription," you can actually pick Create Transcription.
Now, do note, at the time of recording this video, I believe that's supporting OpenAI Whisper models. So we'll test both Whisper and the new GPT-4o transcription models together. Okay, and there we go. Look, all of the models have now been compared using my Zap. Now, I do understand this is just on one video that I made. And to get a more accurate representation, I need to put multiple videos through these Zaps to really get an idea.

But just to give you an idea here, it's brilliant because we can actually sort this table now by ascending. So we can actually see the slowest model is OpenAI's Whisper at just over 48 seconds to transcribe a 20-minute video. In terms of accuracy, let's sort that ascending. And we can see that ChatGPT-4o Mini Transcribe is actually at the top of the pile and is also one of the cheapest models, which is crazy, followed by ChatGPT-4o Transcribe. So if you were in any doubt at the start of this video, ChatGPT-4o Transcribe, whether it's Mini or the full-fat one, is accurate, fast, and well-priced.

Next up, if you value speed, here at just 15 seconds, the fastest of the lot, is AssemblyAI's Universal-2, coming in at the same price as the full-fat ChatGPT-4o Transcribe. Then we've got Whisper, which is in the next slot, and you'll find that OpenAI's Whisper was the slowest speech-to-text transcription AI model, but that's obviously been surpassed now by ChatGPT-4o Transcribe. Then we'll see, ending up at the low end of accuracy, we've got AssemblyAI's Universal-2 Nano, and also, unfortunately, Deepgram Nova-3 didn't perform very well on my particular video, but I do note the AI videos I make are quite technical. I use technical terms and often brand names that maybe the AI didn't crack.
Finally, we can sort the Zapier table, ascending by cost, and we can see the most expensive models are OpenAI's Whisper and ChatGPT-4o, followed by AssemblyAI and Deepgram, and then the Mini and Nano versions from ChatGPT and AssemblyAI are the cheapest. So probably the best value for money, the most accurate, and the fastest for what you get is ChatGPT-4o Transcribe.

And there you go. That is the full conclusion on AI transcription models. If you enjoyed this, like, subscribe for more. Join my community where we'll be discussing this more. It's linked down below. And let me know how you get on with transcription. I definitely, in the future, am going to be building more advanced workflows using the ChatGPT-4o Transcribe models. Can't wait to try it out, and YouTube is showing a video on your screen right now that you should watch next. Thanks.
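
[Editor's sketch, not part of the recording] The workflow described in the video (record a start time, call a transcription model, record an end time, ask a second model for an accuracy score, then store and sort the results) could be approximated outside Zapier roughly as follows. This is a minimal sketch under stated assumptions: `BenchmarkRow`, `build_judge_prompt`, `run_benchmark`, and `rank` are hypothetical names chosen for illustration, and the `transcribe` and `judge` callables stand in for whatever API client you use; no real vendor API is called here. Only the prompt wording and the four recorded fields come from the video.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkRow:
    """One table record: the four fields mapped in the video."""
    model: str
    seconds: float   # wall-clock time for the transcription call
    accuracy: float  # percentage score returned by the judge model
    cost_usd: float  # entered by hand, as in the video


def build_judge_prompt(candidate: str, reference: str) -> str:
    """Assemble the comparison prompt, with both transcripts in triple quotes."""
    return (
        'Compare this transcript:\n"""' + candidate + '"""\n'
        'to this original 100% accurate transcript:\n"""' + reference + '"""\n'
        "Give an accuracy percentage score as your output to two decimal "
        "places max. Look at spellings, punctuation, and overall accuracy."
    )


def run_benchmark(
    model: str,
    audio_url: str,
    reference: str,
    transcribe: Callable[[str], str],  # hypothetical: audio URL -> transcript text
    judge: Callable[[str], float],     # hypothetical: prompt -> accuracy percentage
    cost_usd: float,
) -> BenchmarkRow:
    """Time one transcription call, score it, and return a table row."""
    start = time.time()                # Unix time before the call
    candidate = transcribe(audio_url)
    end = time.time()                  # Unix time after the call
    accuracy = judge(build_judge_prompt(candidate, reference))
    return BenchmarkRow(model, round(end - start, 2), accuracy, cost_usd)


def rank(rows: List[BenchmarkRow], field: str, descending: bool = False) -> List[BenchmarkRow]:
    """Sort rows by one field, like sorting a column in the results table."""
    return sorted(rows, key=lambda row: getattr(row, field), reverse=descending)
```

To use it, pass in your own `transcribe` and `judge` functions for each model, collect the returned rows in a list, and call `rank(rows, "seconds")`, `rank(rows, "accuracy", descending=True)`, or `rank(rows, "cost_usd")` to reproduce the three sorts from the video. Injecting the two callables keeps the harness independent of any one vendor, so the same loop can be rerun across several audio files.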
