Effortless Speaker Identification with AssemblyAI
Learn how to transcribe audio with speaker labels using AssemblyAI's diarization model, perfect for meetings, podcasts, and multilingual support.
Speaker Diarization In Python - Transcription with Speaker Labels
Added on 01/29/2025

Speaker 1: In this video, we'll transcribe an audio or video file with multiple speakers and generate a transcript that accurately labels each speaker and what they said. So instead of getting a transcript like this, which just contains all of the text, you'll get a speaker-labeled transcript. To do this, we'll use AssemblyAI's speaker diarization model, which detects the multiple speakers in an audio file and what each of them said. This is especially useful for transcribing meetings, podcasts, or any audio file with multiple speakers. We'll be working from the code in AssemblyAI's speaker diarization docs, which you can reach through the link in the description box below. Code examples are available in many different languages, and you can also run this in Google Colab by clicking right here. Before we get started, you'll need an AssemblyAI API key; the link in the description box below lets you create a free API key, which also comes with $50 worth of free API credits. The second thing to do before heading over to Visual Studio Code and writing our example is to install AssemblyAI's Python SDK. I've already created and activated a virtual environment, so now I'll install the Python SDK with pip install assemblyai. Once in Visual Studio Code, I've imported the AssemblyAI Python SDK and defined my AssemblyAI API key; instead of the placeholder shown here, you would write your own API key. Once you've done that, let's define our audio URL, which is the URL of the audio file we want to transcribe. In this example, I'm going to transcribe a Zoom meeting.
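The setup described above can be sketched like this; the API key and audio URL are placeholders you would replace with your own values:

```python
# Prerequisite (run in an activated virtual environment): pip install assemblyai
import assemblyai as aai

# Placeholder: replace with your own AssemblyAI API key.
aai.settings.api_key = "YOUR_API_KEY"

# Placeholder URL: any publicly reachable audio or video file, e.g. a Zoom recording.
audio_url = "https://example.com/zoom-meeting.mp4"
```

For a local file, you would pass a relative path (for example "./meeting.mp4") in place of the URL.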
If your file is available locally on your device, you can still use it by passing its relative path here instead. Next, I'll create a transcription config, and in it I'll set the speaker labels parameter to true. Then, to generate the transcript, I'll call the transcribe method, passing in the audio URL as well as the config we just created. Finally, let's print out our transcript organized by speaker: the transcript contains each speaker along with the corresponding text they uttered, so this is a great way to format our output. Back in the terminal, I'll run our Python file by typing python speaker_labels.py, and you'll get an output with the transcript labeled by speaker. And that is the fastest way to do speaker diarization using AssemblyAI. If you're transcribing a really long audio file like a meeting or a podcast and you already know the number of speakers beforehand, you can specify that in the config to increase the accuracy of the speaker labels. Another really cool feature is that speaker labels are supported for almost 20 different languages in AssemblyAI, so you can generate speaker labels for audio files in any of those languages. Check out this next video right here on how you can apply large language models to audio recordings with multiple speakers.
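Put together, the steps above look roughly like this sketch. The audio URL and API key are placeholders, and `format_utterance` is a small helper added here for readability, not part of the SDK:

```python
def format_utterance(speaker, text):
    """Format one diarized segment as 'Speaker A: ...'. (Helper added for clarity.)"""
    return f"Speaker {speaker}: {text}"


def transcribe_with_speakers(audio_url, api_key):
    """Transcribe a file with speaker labels and return one formatted line per utterance."""
    # Imported here so the formatting helper above stays usable
    # even before the SDK is installed (pip install assemblyai).
    import assemblyai as aai

    aai.settings.api_key = api_key

    # speaker_labels=True turns on the speaker diarization model.
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_url, config=config)

    # transcript.utterances holds one entry per spoken segment,
    # each tagged with the speaker who said it.
    return [format_utterance(u.speaker, u.text) for u in transcript.utterances]


# Usage (requires a valid API key and network access):
# for line in transcribe_with_speakers("https://example.com/zoom-meeting.mp4", "YOUR_API_KEY"):
#     print(line)
```

When the speaker count is known in advance, passing, for example, `speakers_expected=2` to `aai.TranscriptionConfig` alongside `speaker_labels=True` can improve labeling accuracy on long recordings.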
