Create Web App for Auto Meeting Minutes with Speaker Diarization
Learn to build a web app that transcribes and organizes meeting minutes using open-source models and Llama 2, avoiding the need for apps like Microsoft Teams.
The Secret to Instant Meeting Summaries: Whisper Diarization Revealed
Added on 01/29/2025

Speaker 1: In today's video, I'm excited to guide you through creating a web app that generates meeting minutes automatically, sidestepping the need for specific solutions like Microsoft Teams. This app isn't just for online meetings; it's perfect for in-person gatherings too. Along the way, you'll master using open-source models for speaker diarization and discover how to leverage the powerful Llama 2 model on Replicate as an alternative to GPT. Getting speaker separation right is tricky. I've spent countless hours testing various models on platforms like Replicate, and the performance was hit or miss. But don't worry, I've found a reliable solution that saves you from the trial and error. But first, let's set the stage. In our recent videos, we explored real-time transcription using the impressively fast Whisper model, displaying the results in a simple web app. In part two, available at AI for Devs, we enhanced the transcription and even translated it into other languages. Yet we haven't tackled a crucial application: turning transcriptions into a detailed meeting protocol, highlighting key points, individual contributions, and main takeaways. Many of you have shown interest in speaker diarization, the technique for distinguishing speakers in a recording. So in this video, we'll focus precisely on that using open-source models. Our method involves uploading a meeting recording, either in WAV or MP3 format, to an AWS S3 bucket. Replicate then transcribes the entire file, and next, Llama 2 steps in to transform this transcription into a well-organized protocol. We'll break this process down into three straightforward steps, making it easy for you to replicate the solution in your own projects. As usual, we kick off by setting up a virtual environment to isolate our project dependencies. Next, we proceed to create a new file named app.py. Following that, we install the replicate, boto3, and Pygments packages. These packages are then imported into app.py. Boto3 handles our communication with AWS, while Pygments will enable us to elegantly display the transcription results on the console. Shifting focus to Replicate, we explore models capable of speaker diarization, eventually selecting the whisper-diarization model for its efficiency. Upon visiting the model's detail page, we navigate to the Python tab to conveniently copy the integration code. We proceed to paste this code into our application. Subsequently, we introduce a sample recording of a meeting into our project. Let's delve into the content of this meeting.

Speaker 2: Hello, everyone. Thank you guys for coming to our weekly student success meeting. And let's just get started. So I have our list of chronically absent students here, and I've been noticing a troubling trend. A lot of students are skipping on Fridays. Does anyone have any idea what's going on? I've heard some of my mentees talking about how it's

Speaker 3: really hard to get out of bed on Fridays. It might be good if we did something like a pancake breakfast to encourage them to come. I think that's a great idea.

Speaker 1: Firstly, we must transfer the sound file to a location accessible to Replicate. Utilizing a bucket created in the previous part of this series serves our purpose here. We create a variable to store the file name and initialize our S3 client. Now we upload the file to our S3 bucket. Constructing the file URL is straightforward, utilizing both the bucket name and the file name. We then replace the hard-coded URL in the model's input data with our uploaded file's URL.
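A rough sketch of that S3 step, assuming a standard boto3 client and the bucket from the earlier part of the series; the bucket and file names here are placeholders.

# Sketch of the S3 upload: bucket and file names are hypothetical.
import boto3

bucket_name = "my-transcription-bucket"   # placeholder bucket from the previous part
file_name = "weekly_meeting.mp3"

s3 = boto3.client("s3")
s3.upload_file(file_name, bucket_name, file_name)

# Build the object URL from the bucket and file name and pass it to the model input.
file_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"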

Speaker 4: All right, let's test it.

Speaker 1: Encountering an error message due to a missing authentication token prompts us to retrieve it from Replicate. This token can be found under account settings.

Speaker 4: We set the token as an environment variable before attempting the upload once more. This time the upload succeeds, although the output format leaves much to be desired. To address this, we employ a helper method that presents the result as a colorized and

Speaker 1: formatted JSON string, enhancing readability. Before utilizing this method, we need to import the JSON module.

Speaker 4: Then we can replace the print statement with our new method.
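A sketch of what such a helper could look like, using Pygments' JSON lexer and terminal formatter. The function name is our own choice, and the Replicate client is assumed to pick up the REPLICATE_API_TOKEN environment variable mentioned earlier.

# Hypothetical pretty-print helper: renders the model output as colorized,
# indented JSON on the console. Requires REPLICATE_API_TOKEN to be set in the
# environment for the replicate calls elsewhere in app.py.
import json

from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import JsonLexer


def pretty_print_json(data):
    formatted = json.dumps(data, indent=2)
    print(highlight(formatted, JsonLexer(), TerminalFormatter()))

# Usage: pretty_print_json(output) instead of print(output)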

Speaker 1: Running the application again yields a significantly improved output. Scrolling further, we find a summary that includes the speaker's name, enhancing clarity. Continuing our review, we encounter segments attributed to different speakers. The next step involves creating a simple web UI to facilitate file uploads and direct browser-based result viewing. This requires the installation of Flask.

Speaker 4: From Flask, we import Flask, request, jsonify, and render_template. We initiate a new Flask application. Creating a root route, we serve an index.html file. Flask expects to find HTML files in a

Speaker 1: directory named templates, so we create this directory and add index.html within it. Let's paste the prepared HTML code. The UI components include a file input for uploads and a submission button. Below these, a text area displays the generated meeting protocol. At the page's bottom, we incorporate upload.js for back-end communication. Flask anticipates static files in a directory named static. We follow this structure, adding upload.js within a js folder. The JavaScript is straightforward. Upon clicking the submit button, it fetches the files from the input element and sends them to the process audio endpoint via a POST request. Successful requests populate the meeting protocol text area with the results. Next, we define the process audio endpoint. Let's adjust the indentation of the following code to ensure it becomes integrated within the process audio data method. Within the invoked process audio data method, we begin by reading the audio data. We then create a temporary file, write the audio data to it, and upload this file to our S3 bucket. Let's quickly import the tempfile module.
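As a rough sketch of how these pieces could fit together on the Flask side (the endpoint name, form field name, bucket name, and model slug are assumptions based on the description above, not the exact code from the video):

# Sketch of the Flask wiring: index page plus a process_audio endpoint that
# stores the upload in a temporary file, pushes it to S3, and calls the
# diarization model with the resulting URL.
import tempfile

import boto3
import replicate
from flask import Flask, jsonify, render_template, request

app = Flask(__name__)
bucket_name = "my-transcription-bucket"   # placeholder bucket name


@app.route("/")
def index():
    # Served from the templates/ directory.
    return render_template("index.html")


@app.route("/process_audio", methods=["POST"])
def process_audio_data():
    # Read the uploaded audio and write it to a temporary file.
    audio_file = request.files["audio"]   # assumed form field name
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        tmp.write(audio_file.read())
        tmp_path = tmp.name

    # Upload the temporary file to S3 and build the file URL for the model.
    s3 = boto3.client("s3")
    s3.upload_file(tmp_path, bucket_name, audio_file.filename)
    file_url = f"https://{bucket_name}.s3.amazonaws.com/{audio_file.filename}"

    output = replicate.run(
        "thomasmol/whisper-diarization",   # assumed model slug
        input={"file_url": file_url},      # assumed input field name
    )
    return jsonify(output)


if __name__ == "__main__":
    app.run(port=8080)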

Speaker 4: Adjustments are made to the S3 upload call and file URL to reflect the actual filenames. The pretty-print JSON helper becomes redundant and is removed. We fix the indentation of the model call, we incorporate code to run the server on port 8080, and as our final step, we aim to create

Speaker 1: an actual meeting protocol based on the transcript. To achieve this, we'll need to employ a second model, such as Llama 2. Let's copy the code. We need to gather the streamed outputs from the Llama 2 model into a consolidated result string,

Speaker 4: and to modify the prompt to ensure it not only summarizes the transcript, but also organizes it by key topics. Additionally, we need to modify the

Speaker 1: output so that it isn't just a summary of the transcript but reads like an actual meeting protocol. Finally, it should highlight significant contributions made during the discussion. At the end, there should be a list of follow-up actions.

Speaker 4: All right, let's return the result in JSON format and put it to the test. We choose our weekly meeting sound file and hit submit.
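As a sketch of that final step, the Llama 2 call might look something like the helper below; the model identifier, prompt wording, and the shape of the diarization output are assumptions, and the streamed output chunks are joined into a single result string.

# Sketch of the Llama 2 summarization step. Model identifier, prompt wording,
# and the segment structure are assumptions; the streamed chunks are joined
# into one result string.
import replicate


def build_protocol(segments):
    transcript_text = "\n".join(
        f"{seg['speaker']}: {seg['text']}" for seg in segments  # assumed segment shape
    )
    prompt = (
        "Turn the following meeting transcript into a meeting protocol. "
        "Organize it by key topics, highlight significant contributions made "
        "during the discussion, and end with a list of follow-up actions.\n\n"
        + transcript_text
    )
    llama_output = replicate.run(
        "meta/llama-2-70b-chat",   # assumed model identifier on Replicate
        input={"prompt": prompt},
    )
    return "".join(llama_output)   # gather the streamed chunks

# Inside the endpoint: return jsonify({"protocol": build_protocol(output["segments"])})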

Speaker 1: The initial output doesn't quite resemble a practical protocol. It seems we may need to tweak the whisper diarization prompt for enhanced accuracy. For now, let's focus solely on outputting the segments, removing any restrictions on the number of speakers, and omitting a specific prompt. This adjustment can be revisited later to refine the results. Giving it another go. Much better this time. Remember, the outcome can be tailored to meet your specific requirements. Providing concrete examples of how you envision the protocol will lead to more effective results, ultimately saving considerable time.
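For reference, the simplified model input described here might look roughly like this; the field names are assumptions based on typical diarization model inputs rather than the exact schema.

# Sketch of the adjusted input: only the audio URL, with the speaker count
# restriction and the transcription prompt left out.
input_data = {
    "file_url": file_url,
    # "num_speakers": 2,   # removed: let the model detect the number of speakers
    # "prompt": "...",     # omitted for now; can be revisited to refine results
}
output = replicate.run("thomasmol/whisper-diarization", input=input_data)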
