Get Started with Live Transcription in Your Browser
Learn how to implement live transcriptions in your browser using Deepgram's API in four simple steps. Secure your API key and start building today!
Get Live Speech Transcriptions In Your Browser With Deepgram
Added on 01/29/2025

Speaker 1: Hello there, my name is Kevin Lewis and I'm a Developer Advocate here at Deepgram, and today I'm going to show you how to get started with live transcriptions directly in your browser using Deepgram's Speech Recognition API. This project has four steps. First, we're going to request access to and get data from the user's microphone. Second, we're going to create a persistent two-way connection with Deepgram that allows us to send and receive data in real time. Third, we're going to take that data from our mic and send it to Deepgram as soon as it's available. And finally, we're going to listen out for live transcriptions being returned from Deepgram and show them to you in the browser console.

So let's get started. The first thing we're going to do is ask for access to the user's microphone. To do that, we're going to use an API built into most browsers: we ask for access to a user's media device, specifically an audio device, so a microphone. This returns a promise, which in turn resolves to what is known as a MediaStream. So let's just console log that and see what a MediaStream looks like. Here's the page open in a browser. I'm going to refresh, and the first thing we see is that the browser handles requesting access to the microphone for us. Once we allow that, we see a MediaStream logged here. Now this is great, but in order to get raw data from the microphone, we need to plug this into what is known as a MediaRecorder. So we'll create a MediaRecorder here, new MediaRecorder, and in there we plug in our stream and specify the output format we desire. That's step one.

Next, we're going to create a persistent two-way connection with Deepgram. We'll create a new WebSocket here and connect directly to Deepgram's live transcription endpoint. In here, we're also going to want to provide our authentication details.
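The first two steps described above can be sketched roughly like this. The `DEEPGRAM_API_KEY` variable and the `audio/webm` output format are assumptions for illustration; the endpoint is Deepgram's live transcription URL mentioned in the video, and the `['token', …]` subprotocol is one way browsers can pass credentials, since they can't set custom headers on a WebSocket.

```javascript
// Sketch of steps one and two: mic access, a MediaRecorder, and a socket.
async function openMicAndSocket() {
  // Step 1: ask the browser for microphone access; resolves to a MediaStream.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // Plug the stream into a MediaRecorder and specify the output format.
  const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });

  // Step 2: open a persistent two-way connection to Deepgram's live
  // transcription endpoint, passing the key via the WebSocket subprotocol.
  const socket = new WebSocket('wss://api.deepgram.com/v1/listen', [
    'token',
    DEEPGRAM_API_KEY, // assumption: defined elsewhere; don't ship real keys to clients
  ]);

  return { mediaRecorder, socket };
}
```

You would call `openMicAndSocket()` from a user gesture (for example a "Start" button click), since browsers typically gate microphone access behind user interaction.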
There are a few ways of doing this, but we're going to provide our API key directly here. Now, as soon as that connection is opened, we're going to start preparing and sending data from our mic. To do that, we hook into the socket.onopen event, like so. Then we add an event listener to the MediaRecorder: mediaRecorder.addEventListener. The event we're listening for is called dataavailable, all lowercase, all one word. That returns the data from our mic, and we go ahead and send that data.

So this is great, but how do we make data available? We actually have to start the MediaRecorder. That's just one final line here: mediaRecorder.start. In here, we specify a timeslice, which is the increment of time in which data will be packaged up and made available via the dataavailable event. This is in milliseconds, so thousandths of a second, and I'll do this every quarter of a second. That's everything we need to send data to Deepgram.

The other side of that is listening for messages being sent from Deepgram to us in the other direction. To do that, we listen to the onmessage event. There's loads of useful data that comes back in the returned payload. Here we parse it, and instead of logging it all, we just extract the transcript and console log it. At this point, you might show it to users or do something else with it, but that is actually all we need in order to do live transcription in the browser. So let me refresh, give access to the microphone, and we should see any minute now that transcripts are appearing right there in our console. How cool is that? You'll see there are multiple phrases coming through for everything I'm saying; there is an additional property in the returned payload that indicates when a given phrase is in its final form. So hopefully you found that interesting.
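Steps three and four might be wired up as below, assuming a `mediaRecorder` and `socket` created as in the previous sketch. The response shape (`channel.alternatives[0].transcript` for the text and `is_final` for the "final form" flag mentioned above) is an assumption based on Deepgram's live-streaming responses; check the API reference for the exact payload.

```javascript
// Sketch of steps three and four: send mic data out, log transcripts coming back.
function wireUpTranscription(mediaRecorder, socket) {
  // Only start sending once the connection is open.
  socket.onopen = () => {
    // Step 3: forward each chunk of mic data to Deepgram as it becomes available.
    mediaRecorder.addEventListener('dataavailable', (event) => {
      if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
        socket.send(event.data);
      }
    });
    // Package audio into 250 ms chunks — the timeslice is in milliseconds.
    mediaRecorder.start(250);
  };

  // Step 4: listen for transcription results returned by Deepgram.
  socket.onmessage = (message) => {
    const received = JSON.parse(message.data);
    // Assumed payload shape: alternatives[0].transcript holds the text,
    // is_final marks a phrase that has reached its final form.
    const transcript = received.channel.alternatives[0].transcript;
    if (transcript) {
      console.log(transcript, received.is_final ? '(final)' : '(interim)');
    }
  };
}
```

A 250 ms timeslice is a reasonable middle ground: smaller chunks lower latency but mean more messages over the socket.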
That's how you do live transcription in the browser. Before we part ways, I just wanted to mention a blog post we published not long before this video, which talks about best practices for handling your API key, so check out the description for that. And if you're going to use this in the real world, make sure you're protecting your API key from being accessible to users and from having overly broad permissions. If you have any questions at all, reach out. We love to help people, and we love to see what you're going to build with our speech recognition API. Have a wonderful day. Bye for now.
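One common pattern for the key-protection advice above is to keep the long-lived API key on your own server and hand the browser only a short-lived credential. The `/api/deepgram-token` route and the `{ token }` response shape below are hypothetical, standing in for whatever backend endpoint you write to mint a temporary credential.

```javascript
// Sketch: fetch a short-lived credential from your own backend instead of
// embedding a long-lived API key in client-side code.
async function connectWithTemporaryToken() {
  // Hypothetical backend route you control — it talks to Deepgram server-side
  // and returns a scoped, short-lived token to the browser.
  const response = await fetch('/api/deepgram-token');
  const { token } = await response.json(); // assumed response shape

  // Connect using the temporary token rather than the real API key.
  return new WebSocket('wss://api.deepgram.com/v1/listen', ['token', token]);
}
```

This way, even if a user inspects your page's network traffic, all they can extract is a credential that expires quickly and has narrow permissions.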
