Unlock Speech-to-Text: Discover AssemblyAI's Power
Learn how to convert speech into text with AssemblyAI. Explore real-time transcription and chapter generation with this easy-to-use, accurate API.
File
Best FREE Speech to Text AI in 2024
Added on 01/29/2025

Speaker 1: Hi everyone, I'm Smitha. In today's video, we're going to look at how we can turn speech into text using an AI-powered speech-to-text API called AssemblyAI. AssemblyAI is ranked as one of the best speech-to-text APIs because of its high accuracy, how easy it is to use, and how well it integrates with large language models. It supports real-time transcription, and it also works on extremely noisy audio data. The most amazing thing is that it comes with built-in LLM functionality, so you get to use it alongside some of the best large language models out there, which allows you to generate insights on your spoken data, summarize it, and ask questions about it as well. The best part is that you get five hours of transcription for absolutely free when you first sign up for an API key.
Once you've gotten your free API key, let's head on over to a notebook file to write some code. I'm going to be showing you all of this in Python, but feel free to use any other language that you like. AssemblyAI supports many other languages like TypeScript, C#, and PHP as well. To run the code that I'm going to be showing you, you need to install a couple of different Python libraries. So head on over to the terminal or command line and run the following three statements. Once that has been downloaded, let's head on over to the code and begin. In the first cell, all I'm doing is importing AssemblyAI's Python library and declaring the API key. So in place of constants.APIKey, you can simply write in your own API key. Once we've done that, let's begin with a very basic example of how we can transcribe an audio file. What we want to do is create a transcriber object: transcriber equals aai.Transcriber, and then transcript equals transcriber.transcribe. So this is the file which I'm going to be transcribing. Feel free to use any other file that you want to transcribe.
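The basic transcription flow described here can be sketched roughly as follows. This is a minimal sketch and not the notebook from the video: the function name `transcribe_file`, the file name, and the environment variable in the usage comment are hypothetical, and it assumes the SDK has been installed with `pip install assemblyai` (the three install commands shown on screen are not in the transcript).

```python
def transcribe_file(audio_path: str, api_key: str) -> str:
    """Transcribe one audio file and return the transcript text."""
    # Imported inside the function so this module loads even when the SDK
    # is not installed (assumption: installed via `pip install assemblyai`).
    import assemblyai as aai

    # The video reads the key from a separate constants module; here it is
    # passed in as an argument instead (illustrative choice).
    aai.settings.api_key = api_key

    # Create a transcriber object and transcribe the file, as in the video.
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_path)

    # Surface a failed transcription instead of returning None.
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript.text


# Example usage (hypothetical path and environment variable):
#   import os
#   print(transcribe_file("apollo_landing.mp3", os.environ["ASSEMBLYAI_API_KEY"]))
```

Passing the key as an argument keeps it out of the notebook itself; reading it from a constants module, as the speaker does, works equally well.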
All I'm going to do is copy the relative path and then paste it here. And once that's done, we're going to print out the transcript. So once this is done, just go ahead and hit run. So the audio file which I've just transcribed is from the original Apollo moon landing event. And if you've ever listened to that, you know how noisy the entire audio is. But as you can see, in just a few lines of code, we have converted that extremely noisy audio into legible text.
In the next example, we're going to be making use of something called LeMUR. LeMUR is AssemblyAI's LLM framework. Through this, we're going to be able to use large language models with AssemblyAI's speech-to-text API. So what I'm going to be doing is transcribing this one-and-a-half-hour-long talk by Gustav Söderström from Spotify, called The Conspiracy to Make AI Seem Harder Than It Is. I actually really want to listen to this talk. However, it is 90 minutes long. So I'm going to be using LeMUR to transcribe this, break it down into different chapters, and give me a summary of every single chapter so I understand what this entire talk is about. So let me show you how I can do that. As usual, we need to create a transcriber object. And what I'm going to do is give the name of the file, which I've already downloaded, as well as some other parameters. What we want to do in this config parameter is simply set auto_chapters equals True. This will tell the transcriber object that we're looking for AI-generated chapters in our audio file. We want to print out certain things from each chapter, so this is what we're going to be doing next. What this code does is, for each chapter, print out the start time, the end time, a gist of it, a headline, as well as a summary. So let's go ahead and hit run on this.
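The chapter step described above might look something like the sketch below. It assumes the SDK's `TranscriptionConfig(auto_chapters=True)` option and that each chapter exposes `start` and `end` in milliseconds along with `gist`, `headline`, and `summary`. The helper names `ms_to_clock`, `format_chapter`, and `transcribe_with_chapters` are hypothetical and not taken from the video.

```python
def ms_to_clock(ms: int) -> str:
    """Convert a millisecond offset to H:MM:SS."""
    hours, rem = divmod(ms // 1000, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def format_chapter(chapter) -> str:
    """Render the fields the video prints for each chapter."""
    return (
        f"{ms_to_clock(chapter.start)} - {ms_to_clock(chapter.end)}\n"
        f"Gist: {chapter.gist}\n"
        f"Headline: {chapter.headline}\n"
        f"Summary: {chapter.summary}"
    )


def transcribe_with_chapters(audio_path: str, api_key: str) -> list[str]:
    """Transcribe a file with auto chapters enabled; return formatted chapters."""
    import assemblyai as aai  # assumption: installed via `pip install assemblyai`

    aai.settings.api_key = api_key
    # Ask for AI-generated chapters alongside the transcript.
    config = aai.TranscriptionConfig(auto_chapters=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return [format_chapter(chapter) for chapter in transcript.chapters]


# Example usage (hypothetical file name and key):
#   for block in transcribe_with_chapters("talk.mp3", "YOUR_API_KEY"):
#       print(block, end="\n\n")
```

Keeping the formatting in small pure functions makes it easy to change what gets printed per chapter without touching the transcription call.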
This was a 90-minute-long talk, and it has managed to transcribe and also generate auto chapters in approximately 34 seconds. So that is pretty amazing. Besides summarization, there is so much more you can do with AssemblyAI's LeMUR framework. If you're interested in a much more in-depth example of LeMUR, you can check out the video above, which will show you how to build an entire application using LeMUR that analyzes call center audio files, and you will be able to get sentiment analysis as well as action items.
Lastly, let's take a look at how we can make use of the real-time transcription feature of AssemblyAI. All you have to do is head over to the homepage of AssemblyAI's documentation. Once you're there, scroll down to the guides, where you will see Using real-time streaming. Click on it, and once you're there, head on over to step three, copy the code, and paste it back in the cell, along with the code from steps four and five. This code essentially handles the transcription as well as getting audio data from your microphone. So once you've done this, let's hit run. And as you can see, it's real-time streaming right now. To find out more about how AssemblyAI compares with other speech-to-text APIs, check out the blog post, which I'm going to be linking in the description box below. And also, if you want to find out more things that you can build with the API, check out the documentation. It is extremely detailed and has a ton of different examples. Thank you guys for watching and subscribe for more AI content.
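For the real-time streaming step mentioned earlier, the code copied from the documentation is not reproduced in the transcript. The sketch below is only an approximation of that kind of setup, assuming an SDK version that still provides `aai.RealtimeTranscriber` and `aai.extras.MicrophoneStream` (newer releases may expose a different streaming client) and that microphone support was installed with `pip install "assemblyai[extras]"`. The names `render_update` and `stream_from_microphone` are hypothetical.

```python
def render_update(text: str, is_final: bool) -> str:
    """Return the console string for one transcript update."""
    if not text:
        return ""
    # Partial results end with "\r" so the next update overwrites them;
    # final results end with a newline so they stay on screen.
    return text + ("\r\n" if is_final else "\r")


def stream_from_microphone(api_key: str, sample_rate: int = 16_000) -> None:
    """Stream microphone audio and print transcripts as they arrive."""
    # Assumption: an SDK version that still ships RealtimeTranscriber.
    import assemblyai as aai

    aai.settings.api_key = api_key

    def on_data(transcript: aai.RealtimeTranscript) -> None:
        is_final = isinstance(transcript, aai.RealtimeFinalTranscript)
        print(render_update(transcript.text, is_final), end="", flush=True)

    def on_error(error: aai.RealtimeError) -> None:
        print("An error occurred:", error)

    transcriber = aai.RealtimeTranscriber(
        sample_rate=sample_rate,
        on_data=on_data,
        on_error=on_error,
    )
    transcriber.connect()
    try:
        # Blocks while audio is read from the default microphone.
        transcriber.stream(aai.extras.MicrophoneStream(sample_rate=sample_rate))
    finally:
        transcriber.close()


# Example usage (hypothetical key): stream_from_microphone("YOUR_API_KEY")
```

The official guide referenced in the video remains the authoritative version of this code; the sketch only shows the general shape of the callbacks and the microphone loop.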
