Speaker 1: In this video we'll build a medical transcription analysis app that can take in spoken medical data in real time and also understand key information within it. Let's take a look at exactly how our application will work. I'm going to click the button to start recording, and I'm going to pretend to be a doctor talking about a recent patient visit: "Today I met with patient Mr. Garcia, who is a 45-year-old man. He has been dealing with a cough for about two weeks, and he mentioned some mild chest pain. He did not have any fever and his breathing was fine. However, I've prescribed a course of antibiotics to cover any possible bacterial infection. I've also ordered a chest X-ray and some blood work. We'll be meeting again next week to follow up and see how he's doing."

So our application transcribes everything I say in real time, and it also identifies key medical information such as medical conditions, any medicines being prescribed, or any tests and procedures that are needed. To build this, all we need is AssemblyAI's real-time transcription together with its large language model framework, LeMUR, which lets us use different large language models, for example Claude 3.5 Sonnet.

There are two things you need to do to start building this application. First, download the GitHub repo for this project; I'll leave the link to the repository in the description box below. Second, sign up for a free AssemblyAI API key. The link in the description box below lets you do that, and it also gives you $50 worth of free credits to get started.

There are three important libraries we need for this project. First is AssemblyAI, so you want to run pip install "assemblyai[extras]" in your terminal, and you also want to install PortAudio (on macOS, for example, with brew install portaudio). Let's go ahead and copy this and head over to the terminal. I've gone ahead and created a virtual Python environment where I'll be installing all of these libraries, so once I'm there I just install what's needed: first assemblyai[extras], then PortAudio. Finally, we also want to install Flask.

Now that you've installed these three libraries, we can head over to Visual Studio Code, open the project folder you downloaded from GitHub, and start writing our code. In that folder you should see three main components. First is app.py, where we write the main logic of our code. Next is index.html, which holds the HTML for our UI as well as the communication between our app.py file and the front end. We also have styles.css, which just contains the CSS for our UI.

Now let's open app.py. I've broken our code down into six steps. Steps one, two, and three are already written, and we'll be writing steps four to six. Before we start, let me walk you through the first three steps. In step one we import all of the Python libraries we require for this project, namely Flask and AssemblyAI. Next we define our AssemblyAI API key; here is exactly where you should do that. Then we have a few global variables that we want to define at the very top: our transcriber object as well as our session ID.
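As a reference, here's a minimal sketch of what steps one and two might look like in app.py. The variable names reflect my reading of the walkthrough rather than the repo's exact code, and I've included the Flask and Socket.IO setup here (which assumes flask-socketio is installed alongside Flask) so the later snippets have it available:

```python
# Step 1: import the libraries the project needs.
import threading

import assemblyai as aai
from flask import Flask, render_template
from flask_socketio import SocketIO

# Step 2: set your AssemblyAI API key, create the Flask app, and define
# the global state shared across the file.
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

app = Flask(__name__)
socketio = SocketIO(app)

transcriber = None   # the global RealtimeTranscriber instance
session_id = None    # the ID of the current streaming session
```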
In step number three we have our prompt. This prompt is really important: it essentially tells the large language model, "You are a medical transcript analyzer. Your task is to detect and format words and phrases that fit into the following five categories." These five categories are what we're looking out for in the transcript, and what we've surfaced in our UI as well: protected health information, anatomy, medical conditions, medicines, and tests and procedures. For each of these categories, when the large language model identifies a match, it formats the text it returns to us. As it processes each transcript, it wraps the matching text in HTML tags, and that's how we're able to display the highlights and other formatting in our application.

Now let's move on to step number four, which is real-time transcription. First we define the real-time transcriber object. In it we set our sample rate as well as the actions we want it to take when it receives data, when an error occurs, when we first open the real-time transcriber, and when we close it. These are all methods, which are right here, and we'll fill them out now. In the on_open method we define the behavior of our real-time transcriber when we start a session; here we store the session ID. Next is the on_data method, where we define what to do with the real-time transcript coming in, so let's write that out. What we've just defined is: if we don't receive any text, or if we receive empty text, do nothing. Next, if we receive a real-time final transcript, which refers to a fully uttered sentence (whatever you said before you took a pause of at least 700 milliseconds), we send that full final transcript into the method called analyze_transcript. In the else branch we're saying: if you didn't receive a final transcript, you're getting a partial transcript, which is your speech word by word, and in that case we just print it out to our UI. In the last two methods we define what to do in the case of an error and when we close our transcriber.

Now let's go back to transcribe_real_time and complete this method: we call transcriber.connect() and then start a microphone stream, so we're connecting the transcriber to our microphone and streaming audio into it.

Next, in step number five, we'll analyze our transcript. Every time we receive a real-time final transcript we send it over to this method, which then sends our transcript to the large language model to get it analyzed. For this step we'll make use of AssemblyAI's LeMUR framework. LeMUR lets you use a number of different large language models, and in this case we'll be using Claude 3.5 Sonnet. First we call LeMUR's task method, and to do that we need to pass in three arguments. The first is our prompt, the one we wrote at the very beginning.
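For reference, here's a rough sketch of the step-three prompt. The wording below is my paraphrase of the description above, not the repo's verbatim prompt:

```python
# Step 3: the LeMUR prompt (paraphrased; the repo's exact wording differs).
prompt = """You are a medical transcript analyzer. Your task is to detect
and format words and phrases that fit into the following five categories:
protected health information, anatomy, medical conditions, medicines,
and tests and procedures. Wrap every phrase you detect in an HTML tag,
for example <span class="medicine">ibuprofen</span>, so the front end
can highlight it. Return the transcript with this formatting applied
and nothing else."""
```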
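And here's a sketch of step four as just described, using the real-time API from the AssemblyAI Python SDK. The Socket.IO event name and the exact callback bodies are assumptions based on the walkthrough:

```python
# Step 4: real-time transcription callbacks and setup.

def on_open(session_opened: aai.RealtimeSessionOpened):
    # Runs once when the streaming session starts; store the session ID.
    global session_id
    session_id = session_opened.session_id

def on_data(transcript: aai.RealtimeTranscript):
    # Ignore empty payloads.
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # A final transcript is a fully uttered sentence (ended by a
        # pause of at least 700 ms); hand it to LeMUR for analysis.
        analyze_transcript(transcript.text)
    else:
        # Partial transcripts arrive word by word; push them straight
        # to the UI. (The event name here is an assumption.)
        socketio.emit("partial_transcript", {"text": transcript.text})

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Session closed")

def transcribe_real_time():
    # Create the transcriber, connect, and stream the microphone into it.
    global transcriber
    transcriber = aai.RealtimeTranscriber(
        sample_rate=16_000,
        on_data=on_data,
        on_error=on_error,
        on_open=on_open,
        on_close=on_close,
    )
    transcriber.connect()
    microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
    transcriber.stream(microphone_stream)
```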
The second argument we pass in is the input text, which in this case is our transcript: every single sentence we utter gets passed in. And lastly, we specify that we want to use Claude 3.5 Sonnet. At the end, once we get our result back from LeMUR, we want to pass it to our front end, so again we use Socket.IO to send result.response, the formatted text, back to the UI.

Now we're in the final step, where we put all these pieces together to define the logic of our overall app. First we render our HTML. After that we create a method called handle_toggle_transcription. Essentially, every time we click the button to start transcribing in real time, it first checks whether a transcriber is already in place; if so, it closes that one and starts a new one, and it also ensures that this is thread safe.

After saving everything, I'm heading back to the terminal to run our code. Once I open it on the local address, this is what our application looks like. So let's start recording: "Today I met with a patient, Jane Smith, who is 32 years old. She has been experiencing frequent headaches and some dizziness over the past few days. I've prescribed some ibuprofen to deal with her pain, and I've told her to keep a headache diary so she keeps track of whenever she gets headaches. We'll review her symptoms in two weeks to determine the next steps."

So here's our application, which does medical transcription analysis in real time and is also able to identify key medical information as it arrives, which is extremely helpful for healthcare professionals. In the next part we'll take a look at how we can save this key information directly into Google Sheets and how we can deploy this application to the cloud. If you're interested in watching the next part of the medical transcription analysis project, check out the video above.
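To recap step five in code, here's a sketch of the analyze_transcript method described above. The LeMUR call follows the SDK's task API, while the Socket.IO event name is an assumption:

```python
# Step 5: send each final sentence to LeMUR for analysis.

def analyze_transcript(transcript_text: str):
    result = aai.Lemur().task(
        prompt,                       # the category prompt from step three
        input_text=transcript_text,   # the sentence we just received
        final_model=aai.LemurModel.claude3_5_sonnet,
    )
    # Push the HTML-formatted response to the front end over Socket.IO.
    socketio.emit("final_transcript", {"text": result.response})
```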
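And to recap step six, here's a sketch of the Flask and Socket.IO wiring that renders the UI and restarts the transcriber on each button press. The route and event names are assumptions based on the walkthrough:

```python
# Step 6: overall app logic.
lock = threading.Lock()  # guards the global transcriber for thread safety

@app.route("/")
def index():
    # Render the UI from index.html.
    return render_template("index.html")

@socketio.on("toggle_transcription")
def handle_toggle_transcription():
    # If a transcriber is already running, close it first, then start a
    # fresh streaming session on a background thread.
    global transcriber
    with lock:
        if transcriber is not None:
            transcriber.close()
            transcriber = None
        threading.Thread(target=transcribe_real_time, daemon=True).start()

if __name__ == "__main__":
    socketio.run(app, debug=True)
```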