Real-time Audio Transcription with Java and AssemblyAI
Learn to transcribe real-time audio to text using Java and AssemblyAI. Get setup instructions and coding tips for a seamless transcription experience.
Real-time Speech To Text In Java - Transcribe From Microphone
Added on 01/29/2025

Speaker 1: In this video, we'll take a look at how we can transcribe real-time audio into text using Java and AssemblyAI. AssemblyAI is a speech-to-text API that offers real-time transcription capabilities. The code I'll be using in this video can also be found on AssemblyAI's documentation page, which I'll link in the description box below.

To start off, make sure you have Java installed on your laptop or computer, and that you have an AssemblyAI API key. To get one, all you have to do is set up an AssemblyAI account; you can click the link in the description box below to sign up for a free API key.

Next, in our Java project, we want to install the AssemblyAI Java SDK. You can use either Maven or Gradle; I'll be using Gradle. All you have to do is copy the dependency line into your build.gradle file (it's shown in the first snippet below). To find the latest version of AssemblyAI's Java SDK, click this link and you'll be taken to the Maven Central (Sonatype) website, where you can see the latest published version. As of making this video, it's version 1.1.1, so that's what we'll be using. This is my build.gradle file, where I've copied in the Gradle dependency and specified the latest version of the SDK.

Once we've done this, let's head over to our Java file, where we can start writing our code. We'll be importing a few libraries for this: most importantly AssemblyAI's real-time transcriber, as well as a couple of Java libraries for processing input from our microphone and for handling threads.

The first thing we want to do is create a thread in which we'll write the main logic for our real-time transcriber. The reason we're creating a thread is that we want the transcriber to be constantly listening and transcribing audio; that has to happen asynchronously, which is why we write it in a thread.

Inside the thread, let's write a try block in which we create our real-time transcriber object. To initialize it, we need to set a few parameters. First is the sample rate, which we'll set to 16,000. Next, we set something called onSessionBegins, which tells the transcriber what to do when it starts transcribing; in this case, we print out the session ID. Then we define onPartialTranscript. AssemblyAI returns both partial transcripts and final transcripts: partial transcripts arrive word by word as you speak, while final transcripts contain the entire sentence you've just uttered. Upon receiving a partial transcript, we decide what the program should do; in this case, if the transcript is not empty, we print it out. Next, we define what to do upon receiving a final transcript, which we similarly print out. Finally, we define what happens when we receive an error: in this case, we print out the error as well.
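For reference, a build.gradle dependency along the lines of what's described above might look like this. The coordinates assume the SDK's published artifact name on Maven Central; substitute whatever version is current when you read this.

```groovy
dependencies {
    // AssemblyAI Java SDK from Maven Central; 1.1.1 was the latest at the time of the video
    implementation 'com.assemblyai:assemblyai-java:1.1.1'
}
```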
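Putting the transcriber setup together, a minimal sketch might look like the following. It follows the builder pattern shown in AssemblyAI's documentation; YOUR_API_KEY is a placeholder for your own key, and the class name is just for illustration.

```java
import com.assemblyai.api.RealtimeTranscriber;

public final class TranscribeFromMic {
    public static void main(String... args) {
        // Run the transcriber in its own thread so it can listen continuously
        Thread thread = new Thread(() -> {
            try {
                RealtimeTranscriber realtimeTranscriber = RealtimeTranscriber.builder()
                        .apiKey("YOUR_API_KEY") // placeholder: your AssemblyAI API key
                        .sampleRate(16_000)
                        // Runs when the session opens; prints the session ID
                        .onSessionBegins(session -> System.out.println(
                                "Session opened with ID: " + session.getSessionId()))
                        // Partial transcripts arrive word by word while you speak
                        .onPartialTranscript(transcript -> {
                            if (!transcript.getText().isEmpty()) {
                                System.out.println("Partial: " + transcript.getText());
                            }
                        })
                        // Final transcripts contain the full sentence you just uttered
                        .onFinalTranscript(transcript ->
                                System.out.println("Final: " + transcript.getText()))
                        // Print any error the API reports
                        .onError(err -> System.out.println("Error: " + err.getMessage()))
                        .build();

                // Microphone capture and streaming go here; see the next sketch
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        thread.start();
    }
}
```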
Now it's time to write code to stream audio from our microphone directly into the real-time transcriber object we've just created. So let's go ahead and do that. First, let's write a log message saying that recording has started, so we know when it begins. Then let's create our audio format, setting its sample rate to 16,000 to match the sample rate of our real-time transcriber, along with a couple of other parameters. We also want to create something called a target data line, which is what actually captures live audio from the microphone. (A sketch of this capture loop appears below.)

Now that we've written the code to capture live audio from our microphone and stream it directly into our real-time transcriber object, let's write the logic for closing the transcriber once real-time transcription has ended. First we print a statement saying that recording has stopped, and then we close the transcriber.

To recap: we first created our real-time transcriber object, which communicates with AssemblyAI's API, sending our real-time audio and receiving real-time transcripts back. Second, we created the target data line, which talks to our microphone and captures its audio; this chunk of code converts that audio into byte arrays, which are passed directly into our real-time transcriber object. Lastly, this code handles stopping the transcription and closing the real-time transcriber. One more thing: after creating the real-time transcriber object, we also want to make sure we call its connect method. This is very important, because it starts the connection between the real-time transcriber and AssemblyAI's API.

So now let's go ahead and run this and test out AssemblyAI's real-time transcription API in Java. As you can see, we get a lot of partial transcripts, one for every word as we speak, and then finally a final transcript. Now, you might not want to receive these partial transcripts. If you only want the final transcript, there's just one line of code that stops the partial transcripts: back in our code, we remove the handler for partial transcripts and instead disable partial transcripts on the transcriber. This completely disables partial transcripts and only prints out final transcripts.

Another thing we can do is set the end utterance silence threshold, which defaults to 700 milliseconds. AssemblyAI's real-time API waits for 700 milliseconds of silence before deciding that you have finished a sentence. Based on how fast or slow you speak, you might want to adjust that: if you speak really fast, you can decrease it to maybe 500 milliseconds or less, and if you speak really slowly, you might want to increase it to 1,000 milliseconds. With that setting, when we run our real-time streaming, we should only get final transcripts after 1,000 milliseconds of silence.
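Here is a sketch of the microphone capture loop described above, continuing inside the try block from the previous sketch. It uses javax.sound.sampled from the JDK; the buffer handling shown is one reasonable choice rather than the only one.

```java
// Continuing inside the try block from the previous sketch.
// Requires: import javax.sound.sampled.*; import java.util.Arrays;

realtimeTranscriber.connect(); // important: opens the connection to AssemblyAI's API

System.out.println("Start recording");

// 16 kHz, 16-bit, mono, signed PCM, little-endian: matches the transcriber's sample rate
AudioFormat format = new AudioFormat(16_000.0f, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

// The target data line talks to the microphone and captures its audio
TargetDataLine microphone = (TargetDataLine) AudioSystem.getLine(info);
microphone.open(format);
microphone.start();

byte[] buffer = new byte[microphone.getBufferSize()];
while (microphone.isOpen()) {
    // Read raw audio and pass it to the transcriber as a byte array
    int bytesRead = microphone.read(buffer, 0, buffer.length);
    if (bytesRead > 0) {
        realtimeTranscriber.sendAudio(Arrays.copyOfRange(buffer, 0, bytesRead));
    }
}

System.out.println("Recording has stopped");
realtimeTranscriber.close(); // ends the session cleanly
```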
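And for the two tweaks just mentioned, here is a sketch of how the builder call might change. I'm assuming the builder exposes disablePartialTranscripts() and endUtteranceSilenceThreshold() as options, as recent SDK versions document; check the docs for your SDK version if the names differ.

```java
RealtimeTranscriber realtimeTranscriber = RealtimeTranscriber.builder()
        .apiKey("YOUR_API_KEY")
        .sampleRate(16_000)
        // Stop receiving partial transcripts entirely, so the
        // onPartialTranscript handler can be removed
        .disablePartialTranscripts()
        // Wait 1,000 ms of silence (default 700 ms) before treating
        // an utterance as finished and emitting the final transcript
        .endUtteranceSilenceThreshold(1000)
        .onFinalTranscript(transcript ->
                System.out.println("Final: " + transcript.getText()))
        .onError(err -> System.out.println("Error: " + err.getMessage()))
        .build();
```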
Check out the video above to learn how you can build an AI speech bot using Llama 3 and AssemblyAI's real-time speech-to-text API.