Build Real-Time Speech-to-Text Python App with LLM

Learn to create a Python app for real-time speech-to-text, leverage large language models for analysis, and automate documentation with Google Docs integration.

Live Speech-to-Text With Google Docs Using LLMs (Python Tutorial)

Added on 01/29/2025

Speaker 1: In this video, we will build a Python application that does real-time speech-to-text transcription and combines it with a large language model for analysis. As you speak, whatever you're saying is transcribed in real time. That text is passed to a large language model, which analyzes it and writes the results into a Google Doc, exactly like what's happening right here. The interesting thing is that this all happens in real time, and there are so many use cases for it. You could use it to record interview notes or meeting notes, or for other applications, for example filling in forms based on a customer call. Because we're using a large language model, the possibilities are pretty endless. So let's get started with the project. It breaks down into three steps. First is real-time speech-to-text transcription with AssemblyAI's API. Second is passing that transcript to a large language model. Third is taking the output from the large language model and automatically writing it into a Google Doc. We'll start with step one, and to do that we need to install some dependencies first. This is the link to AssemblyAI's documentation; I'll leave it in the description box below if you want to take a look. Let's open Terminal. Once you've opened Terminal and, optionally, created a virtual environment, install PortAudio with brew install portaudio. I'm going to copy this and paste it right here. Next, I'll install the AssemblyAI SDK with its extras by running pip install "assemblyai[extras]". Once that's done, we want to configure our API key and start writing some code. You can get a free API key on AssemblyAI's website, and the link for that will be in the description box below.
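
Before writing the app, it can help to confirm that the installs above actually worked. The following is a minimal sketch that is not part of the video. It assumes the assemblyai[extras] install pulls in PyAudio (importable as pyaudio), the Python binding to PortAudio that the microphone helper relies on.

```python
import importlib.util

# Top-level modules the app imports. `pyaudio` is assumed to come with the
# assemblyai[extras] install; it is the binding to the PortAudio library.
REQUIRED_MODULES = ("assemblyai", "pyaudio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the same virtual environment you installed into. If a module shows up as missing, the pip install most likely went to a different interpreter.
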
Next, we'll open Visual Studio Code and start writing some code. I've created an empty folder containing a single Python file called app.py. The first thing we want to do is import AssemblyAI as aai and configure our API key. So let's import assemblyai, and then set the key with aai.settings.api_key = followed by your API key, which you can paste right here. Next, we create a few functions that handle events from the real-time transcriber. The first three are on_open, on_error, and on_close. When the real-time transcription session starts, on_open prints our session ID. If something goes wrong during the real-time transcription, on_error prints the error, and on_close runs when the session closes. You can copy all of these from the documentation and paste them right here. The next step is another function that handles the data, on_data; again, we can just copy it and paste it here. If there is no transcript text, it returns without doing anything. When you're actually saying something and a real-time transcript comes in, it prints that transcript. Finally, we copy the code that creates our real-time transcriber. This RealtimeTranscriber object defines the sample rate. We also pass in all the functions we created earlier as parameters, so it knows what to do on receiving data, on error, on close, and on open. Once we've done all this, we connect the transcriber by calling transcriber.connect(). At this point, we have created our transcriber object and connected it. Next, we need to pass input from your computer's microphone to it.
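
Here is a minimal sketch of the event handlers and transcriber setup described above. It follows the shape of AssemblyAI's real-time quickstart for the SDK version used in the video, which exposes aai.RealtimeTranscriber. Newer SDK releases may expose streaming differently, so check the current docs.

Two details are assumptions rather than the video's code. The API key is read from a hypothetical ASSEMBLYAI_API_KEY environment variable instead of being pasted into the source. The SDK import is also deferred so the handlers can be tried without it.

```python
import os


def on_open(session_opened):
    # Called once the real-time session starts; prints the session ID.
    print("Session ID:", session_opened.session_id)


def on_error(error):
    # Called if something goes wrong during the real-time session.
    print("An error occurred:", error)


def on_close():
    # Called when the session is closed.
    print("Closing Session")


def make_transcriber(on_data, sample_rate=16_000):
    """Build a real-time transcriber wired to the handlers above.

    Assumes the assemblyai SDK version from the video and a key stored in the
    hypothetical ASSEMBLYAI_API_KEY environment variable.
    """
    import assemblyai as aai  # deferred so this module loads without the SDK

    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
    return aai.RealtimeTranscriber(
        sample_rate=sample_rate,
        on_data=on_data,
        on_error=on_error,
        on_open=on_open,
        on_close=on_close,
    )
```

Keeping the key in an environment variable avoids committing it to version control if you ever share app.py.
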
That's what we're going to do next. We set microphone_stream = aai.extras.MicrophoneStream(...), copying this line from the documentation and pasting it here. This returns the microphone stream, which we then pass to the transcriber object by calling transcriber.stream(microphone_stream). At the end, we call transcriber.close() to close the session. We still need to make a few changes to the on_data function. In on_data, you'll notice two print statements. The first one prints what are called real-time final transcripts, which are usually full sentences. We want to remove the second print statement, in the else branch. It prints partial transcripts, which we don't really want because they make the output look messy. So let's remove it. Once we've done that, save the file. We'll run it by opening Terminal and typing python app.py. Make sure you run this from the same project folder your app.py file is saved in, then hit Enter. "This is a test of the real-time speech-to-text transcription of AssemblyAI's API." As you can see, full sentences are printed rather than individual words, so the test works. Next, we're going to pass this transcript to AssemblyAI's LeMUR framework, which means sending it to a large language model for processing.
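
The change above, keeping only final transcripts, and the connect/stream/close sequence can be sketched as follows. This is an illustrative restructuring, not the video's exact code.

The is_final check is passed in as a function so the filtering logic can run without the SDK. With the SDK version from the video it would be lambda t: isinstance(t, aai.RealtimeFinalTranscript). The try/finally is an added safeguard so the session is closed even when you stop the app with Ctrl+C.

```python
def make_on_data(is_final, handle_text=print):
    """Return an on_data handler that ignores empty and partial transcripts.

    With the SDK from the video, `is_final` would be
    lambda t: isinstance(t, aai.RealtimeFinalTranscript).
    """
    def on_data(transcript):
        if not transcript.text:
            return  # nothing has been said yet
        if is_final(transcript):
            handle_text(transcript.text)  # a finished sentence
        # Partial transcripts are dropped on purpose to keep the output clean.
    return on_data


def stream_microphone(transcriber, sample_rate=16_000):
    """Connect, stream the default microphone, and always close the session."""
    import assemblyai as aai  # MicrophoneStream needs the assemblyai[extras] install

    transcriber.connect()
    try:
        microphone_stream = aai.extras.MicrophoneStream(sample_rate=sample_rate)
        transcriber.stream(microphone_stream)  # blocks while audio is streamed
    finally:
        transcriber.close()  # runs even if the app is interrupted
```
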
Let's look at the next step. At the top of our Python file, right under the API key, create a function called lemur_call. LeMUR is AssemblyAI's framework for applying large language models to speech data. We pass it a prompt that defines a task, LeMUR looks at our transcript, and based on our prompt we get an output. The function takes our transcript as a parameter, as well as previous_responses, which we'll define later on. Inside it, we create a variable lemur = aai.Lemur(), which gives us a LeMUR object, and a variable called input_text that holds our transcript. Next, we write the prompt to pass to LeMUR. The prompt is fairly long, but it defines what type of output we want from LeMUR. That way you control what ends up in your Google Doc at the end, so this is a great place to define all of that. We'll start with: "You are a helpful assistant with the goal of taking diligent notes. Your task is to create bullet points about what is being said in the live transcript that is being sent to you every 15 seconds." You can adjust this interval later on. Let's also say: "Respond only with the newest bullet points and do not include any of the older responses." Then we add "Here are the bullet points you have given so far:", followed by the previous_responses parameter that we pass to this function. The idea is that we also give LeMUR the responses it has already generated. Let's also say that it should avoid making up any information not present in the transcript.
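
The note-taking prompt dictated here, together with the rules about uncertainty, preambles, and formatting that the speaker adds right after, could be assembled by a small helper like this one. The helper name and the interval_seconds parameter are hypothetical. The parameter only changes the wording, while the real send interval is set by whatever code calls LeMUR.

```python
def build_prompt(previous_responses: str, interval_seconds: int = 15) -> str:
    """Assemble the note-taking prompt described in the video (hypothetical helper)."""
    return (
        "You are a helpful assistant with the goal of taking diligent notes. "
        "Your task is to create bullet points about what is being said in the "
        f"live transcript that is being sent to you every {interval_seconds} seconds. "
        "Respond only with the newest bullet points and do not include any of "
        "the older responses. "
        # Earlier LeMUR output is embedded so the model knows what not to repeat.
        f"Here are the bullet points you have given so far: {previous_responses} "
        "Avoid making up any information not present in the transcript. "
        "If unsure of the answer, return nothing until more clear information "
        "is provided. Avoid any preambles. Remove any text formatting."
    )
```

Building the prompt in one function keeps it easy to specialize, for example for healthcare notes or customer calls, without touching the code that sends it.
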
This is a good way to limit hallucinations, which large language models are prone to. We add: "If unsure of the answer, return nothing until more clear information is provided." Many large language models like to include preambles, such as "Here are my notes based on the previous transcripts." So we also tell it to avoid any preambles and to remove any text formatting. I mention the formatting so that what gets sent to Google Docs looks more uniform. This is a pretty good prompt to start with. Feel free to add whatever else you'd like and make it more specific. For example: "You are a helpful assistant with the goal of taking diligent notes related to healthcare," or "You are a customer service bot and should come up with notes based on this call." A prompt is essentially what you make of it, so write whatever will be most helpful for what is being spoken about in real time. Once we've done that, let's write the main part, which sends the text to the large language model. We open a try block with response = lemur.task(...). We call LeMUR's task function because we want to use the Task endpoint, and it takes several parameters. The first is the prompt; since we've already defined it above, we just write prompt=prompt. Then comes input_text=input_text, which we also defined already. Next is final_model, which is where you choose the model. LeMUR supports a number of models, including Claude 2.1 by Anthropic and Mistral 7B; we're just going to use the default model. We're also going to set a maximum output size.
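
Here is a sketch of the lemur_call logic, with the 3000-token max output size the video uses. It assumes the SDK version from the video, where aai.Lemur().task accepts prompt, input_text, and max_output_size. final_model is left unset here on the assumption that LeMUR then uses its default model.

Unlike the video, this version takes an already built prompt as an argument. The task_fn parameter is a hypothetical hook so the error handling can be exercised without network access.

```python
def lemur_call(prompt, transcript, task_fn=None, max_output_size=3000):
    """Send one batch of transcript text to LeMUR and return its text output.

    `task_fn` is a hypothetical test hook; when None, aai.Lemur().task is used
    (assumes aai.settings.api_key has already been set).
    Returns None on failure, mirroring the try/except in the video.
    """
    if task_fn is None:
        import assemblyai as aai  # deferred so this module loads without the SDK

        task_fn = aai.Lemur().task

    try:
        response = task_fn(
            prompt,
            input_text=transcript,           # raw text instead of transcript IDs
            max_output_size=max_output_size,  # cap on the answer length, in tokens
        )
        print(response)  # includes the request ID and the response text
        return response.response
    except Exception as e:
        print("Error:", e)
        return None
```

Returning None on failure lets the caller skip that batch instead of crashing the live session.
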
For max_output_size, I'm going to set 3000 tokens. After that, we add a print statement to print the response, and return response.response, which returns just the text of LeMUR's answer. Let's also add some error handling: except Exception as e, and in there we print the error. Now that the lemur_call function is ready, we still need code that collects the incoming real-time transcripts, stores them, and sends them to lemur_call. It should do this every 15 or 30 seconds, or whatever interval you choose. We'll create a class for this called TranscriptAccumulator. In its constructor, we define a few properties: transcript and previous_responses, which we mentioned earlier. We also add last_update_time, which keeps track of when we last sent a request to LeMUR and when we should send the next one. For that, we need to import the time library. The class has one more method called add_transcript, which takes self and a transcript_segment. Inside it, we set self.transcript to itself plus a space plus transcript_segment, and current_time = time.time(). If current_time minus self.last_update_time is greater than 15 seconds, we call LeMUR. We set self.lemur_output to the result of lemur_call, the function we defined earlier with the prompt. lemur_call needs two arguments: the first is the transcript, and the second is the previous responses.
So we pass in self.transcript followed by self.previous_responses. Then we update the previous_responses variable with the latest LeMUR output. We also reset self.transcript to an empty string and set self.last_update_time to current_time. With that in place, we create an instance of the class: transcript_accumulator = TranscriptAccumulator(). Next, I'll show you exactly where in the code we use it. Head over to the on_data function. Inside the branch that handles final transcripts, call transcript_accumulator.add_transcript(), the method we created inside the class, and pass it the most recent real-time transcript, transcript.text. This sends in the real-time final transcript of each sentence you say. So for every sentence you utter, we call add_transcript. It appends the sentences together and stores them in transcript until 15 or 30 seconds have passed. Once that time is up, it sends the transcript to LeMUR, LeMUR carries out the task you defined in your prompt, and the response is printed. Now all we have to do is run this in Terminal to give it a shot. As it runs, we can see that our real-time transcription is working. Within about 15 seconds, it should send the accumulated transcript to LeMUR, which carries out the task we requested in our prompt, and we should get a response back. And it's right here.
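
The TranscriptAccumulator described above can be sketched as follows. Two changes are illustrative assumptions and not from the video. First, the LeMUR call and the clock are passed in (on_flush and clock) so the timing logic can be tested offline. Second, "update previous_responses with the latest output" is read as appending each new answer.

```python
import time


class TranscriptAccumulator:
    """Buffer final transcript sentences and flush them on a fixed interval.

    `on_flush(transcript, previous_responses)` stands in for the video's
    lemur_call, and `clock` defaults to time.time; both are injectable here
    (an illustrative change) so the timing logic can be tested offline.
    """

    def __init__(self, on_flush, interval_seconds=15, clock=time.time):
        self.on_flush = on_flush
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.transcript = ""
        self.previous_responses = ""
        self.last_update_time = clock()

    def add_transcript(self, transcript_segment):
        # Append each finished sentence to the running buffer.
        self.transcript += " " + transcript_segment
        current_time = self.clock()
        if current_time - self.last_update_time > self.interval_seconds:
            lemur_output = self.on_flush(self.transcript.strip(), self.previous_responses)
            if lemur_output:
                # Keep every earlier answer so the prompt can tell LeMUR not to repeat it.
                self.previous_responses = (self.previous_responses + "\n" + lemur_output).strip()
            # Start a new batch and restart the timer.
            self.transcript = ""
            self.last_update_time = current_time
```

In the app, on_flush would wrap the tutorial's lemur_call(transcript, previous_responses), and on_data would call add_transcript(transcript.text) for each final transcript. Because the time check only runs when a new sentence arrives, a long silence delays the next flush until you speak again.
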
This part, starting with request_id and followed by response, is the response from LeMUR. Now all we have to do is connect our application to Google Docs using the Google Docs API, so that we can write everything LeMUR sends us straight into a Google document. Let's go ahead and do that. The first thing we need is some credentials from Google Cloud Console. Go to console.cloud.google.com and create an account if you don't already have one. Once you're logged in, the first thing to do is create a project; if it's your first project, you'll be prompted to create one. Click here, then click New Project. You'll see a page that looks something like this. I've named the project "real time lemur," but you can name it anything you want. You can leave this other field empty or select something, and then click Create. Once the project exists, open the menu here, go to APIs & Services, and click OAuth consent screen. There, you'll be asked to select a user type; for now, just select External and click Create. Next, you need to fill in some information. First, an app name; I'm giving it the same name as my project. Here you can fill in your email, and right here it's really important that you give your email as well. Everything else is optional, but if you want to make the app public, you can fill those fields in too. The next page is about scopes. You don't have to define anything right now, so just click Save and Continue. I'm also going to leave the following page empty and click Save and Continue. Once this is done, you can go back to the dashboard.
At this point, go to Credentials. Once you're there, we want to create an OAuth 2.0 client ID. Click Create Credentials and choose the second option, OAuth client ID. For the application type, select Desktop app and click Create. Once it has been created, you'll see a page that looks something like this. Download this JSON file and store it in the same project folder as your app.py file. Once you've downloaded the JSON file with your credentials, the next thing to do is enable the Google Docs API for your project. In the search bar right here, search for Google Docs, click Google Docs API, and enable it. Next, we need to install the Google client libraries. Instead of installing them one by one, we can combine all three into a single command: pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib. Let's paste it in Terminal and hit Enter, which should download everything we need. Once those are installed, check that the JSON file we downloaded from Google Cloud Console is in the folder. I've renamed mine to credentials.json so it's easier to reference. With that in place, we can start writing code that writes to Google Docs. First, let's import the Google libraries we need: from google.auth.transport.requests import Request, from google.oauth2.credentials import Credentials, and from google_auth_oauthlib.flow import InstalledAppFlow.
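
A common mistake at this step is downloading the wrong kind of OAuth client. A Desktop app client file keeps its settings under a top-level "installed" key, so a quick check like the following can catch that early. The helper is a hypothetical addition, not part of the video, and assumes the file was renamed to credentials.json as above.

```python
import json


def load_client_config(path="credentials.json"):
    """Read the OAuth client file and check that it is a Desktop app client.

    Desktop ("installed application") client files downloaded from Google
    Cloud Console keep their settings under a top-level "installed" key.
    """
    with open(path) as f:
        config = json.load(f)
    if "installed" not in config:
        raise ValueError(
            f"{path} does not look like a Desktop app OAuth client; "
            "create an OAuth client ID of type 'Desktop app'."
        )
    return config["installed"]
```
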
From googleapiclient.discovery we import build, and from googleapiclient.errors we import HttpError. Next, we define our client file, which is the credentials file right here, so just copy its relative path and paste it in. We also need to define our document ID, which is the ID of the Google Docs document. You can easily get it by opening a Google Doc, copying this ID from the URL, and pasting it here. Next, we set the scopes. The scopes determine which APIs we're allowed to access. Since we need the Docs API, our scope is https://www.googleapis.com/auth/documents, and that's all there is to it. Next, we create a function called update_google_docs that writes to our Google Doc: def update_google_docs(content). The code for this can be found in Google's quickstart documentation, which covers exactly what we need; I'll link it in the description box below. Make sure you're on the Python quickstart for the Google Docs API. Scroll down to "Configure the sample" and copy this part, from creds down to the statement that prints the error. We paste it into the update_google_docs function, but we need to make some changes. First, we replace the file name here with our client file variable. Then we get rid of this line of code, because it only reads the document and prints its title, which we don't need; we want to write to the document instead.
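
Copying the document ID out of the URL by hand is easy to get wrong. This hypothetical helper pulls it out of a standard Docs URL of the form https://docs.google.com/document/d/DOCUMENT_ID/edit. The character class is an assumption based on the IDs Google Docs uses (letters, digits, hyphens, and underscores). The scope constant matches the one given above.

```python
import re

# Scope that lets the app read and write the user's Google Docs.
SCOPES = ["https://www.googleapis.com/auth/documents"]

# Document URLs look like https://docs.google.com/document/d/<DOCUMENT_ID>/edit
_DOC_ID_PATTERN = re.compile(r"/document/d/([a-zA-Z0-9_-]+)")


def extract_document_id(url):
    """Return the document ID embedded in a Google Docs URL (hypothetical helper)."""
    match = _DOC_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no document ID found in {url!r}")
    return match.group(1)
```
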
Here we write requests = [...] containing a dictionary with an insertText request. Its text is content, and its location is endOfSegmentLocation, which tells Google to insert the content at the end of the document. Then we call service.documents().batchUpdate() with documentId=DOCUMENT_ID and body={"requests": requests}, passing in the request list we just created, and finish with .execute(). We can also remove this print statement; we don't need it. We're also missing a line of code right here, and one more library: we need to import os. This whole chunk of code verifies our credentials JSON file and creates a token.json file for the scopes we defined for Google Docs. This main part is where we actually insert LeMUR's content into Google Docs. We still have to make one more change, in the lemur_call function: it needs to call update_google_docs. So go to lemur_call, scroll down to the line that prints the response, and add update_google_docs(response.response) there. Before we hit run, I want to point out a few mistakes I found. It's insertText, with "text" spelled out in full, so make sure the spelling is right. And in the keyword documentId, the I is capital and the d is lowercase. Once you have all that, hit Save and head over to Terminal. I have Terminal open, as well as an empty Google Doc that's linked to our project. I'm going to run the project with python app.py.
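
Putting the Google Docs pieces together, here is a sketch of update_google_docs. The credential handling follows the Google Docs API Python quickstart, and the write uses documents().batchUpdate with an insertText request. An empty endOfSegmentLocation targets the end of the document body.

DOCUMENT_ID is a placeholder you must replace. Appending a newline to each batch is a design choice of this sketch, not the video's, so that successive notes don't run together. The Google imports are deferred so the request builder can be tried without the libraries installed.

```python
import os

SCOPES = ["https://www.googleapis.com/auth/documents"]
CLIENT_FILE = "credentials.json"  # OAuth client file from Google Cloud Console
DOCUMENT_ID = "your-document-id"  # placeholder: copy the ID from the document URL


def build_insert_requests(content):
    """Batch-update requests that append `content` to the end of the document body."""
    return [
        {
            "insertText": {
                "text": content + "\n",  # newline keeps each batch on its own line
                "endOfSegmentLocation": {},  # empty location = end of the main body
            }
        }
    ]


def update_google_docs(content, document_id=DOCUMENT_ID):
    """Append text to a Google Doc; adapted from the Docs API Python quickstart."""
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError

    creds = None
    if os.path.exists("token.json"):  # cached token from an earlier run
        creds = Credentials.from_authorized_user_file("token.json", SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:  # first run: opens the browser consent screen
            flow = InstalledAppFlow.from_client_secrets_file(CLIENT_FILE, SCOPES)
            creds = flow.run_local_server(port=0)
        with open("token.json", "w") as token:
            token.write(creds.to_json())

    try:
        service = build("docs", "v1", credentials=creds)
        return service.documents().batchUpdate(
            documentId=document_id,
            body={"requests": build_insert_requests(content)},
        ).execute()
    except HttpError as err:
        print(err)
        return None
```

If you change SCOPES later, delete token.json so the consent flow runs again with the new scope.
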
And now we've fully built our application using AssemblyAI's real-time speech-to-text API along with its LLM framework, LeMUR. It takes in the transcript from the real-time speech-to-text API and passes it to a large language model, which analyzes the transcript based on the task we've given it. The model's output is then sent to Google Docs. Let us know what you thought about this tutorial in the comments below. If you're interested in more tutorials like this, check out this video above on how to build a talking speech bot in Python in under five minutes.
