Speaker 1: You
Speaker 2: There are some additional features, but that's the crux of it. And so you can use this API in a number of ways. You can use it for command and control, where, like your voice assistants, I'm not going to say their name, you command a piece of software to do something. You can use it to improve accessibility of your apps or media on the Web. You can use it for analytics, because now, for the first time, you can take large amounts of voice data, which is traditionally very hard to interact with as developers, and, in a programmatic way, get transcripts for it and then run analytics against it. Or you can use it for predictive support. So one of our common use cases is to support people who work in customer support. They have someone call in, and while the customer is speaking, we're listening to what they're saying and suggesting to the support worker what they might want to say. So I have got my eye on the chat, and I'll be looking at it periodically. I see a question here: what are APIs? APIs are application programming interfaces. They provide a way for you as developers to interact with services like DeepGram. So before we launch into today's workshop, I wanted to talk to you a little bit about where I'm expecting you're coming from in terms of skill level. You don't need to know a lot of code, but I am going to assume you know a little bit of JavaScript. If I write a function, you aren't going to be that thrown by the syntax. I'll try as we go to introduce concepts. If anything I'm writing isn't clear to you, please let me know in the chat. As I say, I've got my eye on it right here. So there are two ways to use DeepGram. Firstly, with pre-recorded audio, where you already have a final audio file, which is either on your machine or hosted somewhere on the web. Or you can do live transcription using WebSockets. So there are these two modes of using it. We have a set of SDKs. At the moment we have SDKs for Node.js, Python, and .NET. But we also have a REST API and a WebSocket interface. So whatever language you are writing in, you can use DeepGram. We also have challenges at Local Hack Day as one of the sponsors. So we have -- sorry, I'm keeping an eye on the chat as well. Oh, it's moving lovely and quickly, which is wonderful; you're clearly engaged. So we have three challenges coming up this week for Local Hack Day. One today, Tuesday, one tomorrow, and one on Thursday. Today's will be to get started with DeepGram. Tomorrow's will be about mashing up DeepGram with another API. And finally, there's an accessibility challenge on Thursday. I'm hoping, when people say I'm lagging, that you can still hear what I'm saying. If that becomes a challenge, please do let me know. So what are we doing today? Today we are going to get started with DeepGram. I'm going to show you how to do prerecorded transcriptions, and I'm going to show you how to do live transcription directly from your browser. To do that, we're going to use JavaScript and the Node.js SDK. You'll need a DeepGram account. If you head to DeepGram.com, and I'll pop that in the chat here, you can sign up for one if you haven't already got one. When you sign up, you get $150 of free credit to get you started, which should last you quite a while. So if you haven't already, head over to the site and sign up for an account, because we'll need it in a little bit.
I'll give you an opportunity to do it later as well, but you may as well do it now. What else is worth noting here? We're going to use Glitch. Someone's asking how long the $150 lasts. It depends on usage. I've worked at DeepGram for three and a half months, and I think I've used like 20 bucks. So it will more than happily get you through Local Hack Day. We're going to be using this platform called Glitch in order to build our project today. Glitch is an online code editor and a runtime environment. So we're going to be building a Node.js app. You could also build this on your own machine, but I think for today we'll use Glitch for this workshop, and you'll have lots of reference points that you can use later. So I think with that, we're ready to go. I'm going to pop a link in the chat, and what this link will do is remix this Glitch project. What does remixing mean? It means that it will take this scaffold project that I've made, with a few files already set up for you, and it will create a copy of it that you can then use for today. So I will pop that here in the chat, and you can go ahead and do that. I'm going to go ahead and type that in: dpgr.am slash DeepGram dash 101. You do that, it will expand into a remix link, and it will create a copy of the files that I've set up for you. I'm just going to shut my office door. There we are. So I'll give you just a moment to do this while I explain what this Glitch interface is. On the left-hand side, we have a file browser. I've set up some basic files for you, some stuff that I don't necessarily want to spend too much time on today, a little Express application. We've also got this reference folder here. At certain points in this workshop, I'll let you know: the code, as it should stand at that point in time, will be in the reference folder, so you can use it throughout the rest of Local Hack Day. And we've got this little boilerplate HTML page that we'll be using towards the end of this workshop. I'm seeing some questions. So Glitch is an online code editor and runtime environment. It allows us to build applications and run them straight inside of your browser. You can equally do this in your own code editor. But the nice thing about Glitch is what is known as remixing, where you can take the project that I've set up ahead of time and create a copy of it to work on, complete with all of these boilerplate files I've created for you. So hopefully you've remixed the Glitch project now, and with that, we're good to go. Let me just explain a few more of these files, notably this .env file here. For those of you who may not have used them before, environment variables are a way to take sensitive information out of your application code and put it in a separate file. And where's the link to the Glitch? I will periodically post it in the chat, but I did just pop it above. Thank you very much, chat, for being proactive and getting there before I did. So, environment variables. Basically, we're going to put our API key inside of this environment variable file, and other people can't see this file. I will show you my key for the sake of this workshop, but in practice, no one should see your .env file. We will need to populate this, so let's go ahead and show you around the DeepGram console. Can I download this project for offline use? Yeah, somewhere there's a download link. Offline, though, no.
DeepGram requires an internet connection because you're going to be making API calls and establishing WebSocket connections. But if you mean not inside of Glitch, absolutely. So what are we doing here? We're going to go to the DeepGram console. This is what it looks like when you're getting ready to sign up. Folks, I see messages in the chat, so you don't need to repeat them. I've got one here, which is around translation. DeepGram is a transcription API, though in a couple of weeks I'll be releasing a blog post about translation, using another API and making it play nice with DeepGram. So I'm going to go ahead and log into my existing account here. And this is what the DeepGram console looks like. Inside of here you can have a number of different projects. When you start an account, we create a project for you, so you won't need to create one. What we're going to do is head over to Settings, head over to API Keys, and create a brand new key. I'll call this Local Hack Day. You can set the permission level for this key. The reality is we only need the lowest permission, so we'll use Member. And then you can choose to set an expiration date. Now, because I'm going to show you what this key is, I am going to set an expiration date. I'm going to say two hours, and that means this key will no longer work after two hours. And then you'll see your API key secret here. Just to note, again, you shouldn't let other people see your API key. I'm doing it for the sake of the workshop. Heading back to Glitch, we pop that inside of our environment variables file. And that's now set up. Just taking a look at chat, we're good. Now we're ready to get started with writing some code. So just to recap what we've done so far. We've remixed our Glitch project by following the URL that I'm once again posting inside of the chat. What that does is take a project that I set up for you already, which has some dependencies installed and a little bit of boilerplate code already written that isn't really the focus for today, and give you a copy of it to work on. Then we've gone to the DeepGram console and generated a new API key. You shouldn't be sharing that API key with anyone, because API keys allow people to interact with your account. Depending on the permission level, your API key could allow people to add other people to your account, to change billing information, and so on. So you want to keep a close eye on that. And Dusty Donkey actually gave an amazing analogy in chat: you don't give your house key to a stranger, right? Same concept. Love that. I will quite happily take that to future workshops. And now we're ready to go ahead and start writing some code. So this is our server.js file. This is the hub of our application. I'll just talk you through the code we've already written here. I've included a URL directly to an audio file, just so we don't have to mess around with getting it from chat or from elsewhere. Harshit asks, what should be ticked under set permissions? As long as you don't share your key, it doesn't really matter. But generally, API keys should have the least permissions that are necessary. The lowest one, I believe, is called Member, so that will be what you want. Right. We set up a little Express.js application; that will allow us to make API calls to this later. And then we're setting it to run. Our further code will go in here.
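For reference, the scaffold being described looks roughly like the sketch below. The exact file contents, the audio file URL, and the static folder are assumptions based on what's said here and shown in the logs, not a copy of the real boilerplate. The .env file only needs one line, DG_KEY=your-key-here, and Glitch exposes it to your code as process.env.DG_KEY.

```js
// server.js — rough sketch of the scaffold described above (names are assumptions)
const express = require("express");
const app = express();

// Direct link to a hosted audio file, so we don't have to hunt one down ourselves.
// The real scaffold points at a specific file; this is just a placeholder.
const audioFileUrl = "https://example.com/some-audio.wav";

// Serve the public/ folder, which holds the index.html used later in the workshop.
app.use(express.static("public"));

// Further DeepGram code goes here.

app.listen(3000, () => console.log("Listening at 3000"));
```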
So the first thing we're going to do is require and initialize the DeepGram Node.js SDK, and then we're going to get a transcription and see what it looks like. I see a question around where to paste the secret: it goes inside of the .env file here, the one with the little key next to it. So the first thing we're going to do is require the Node.js SDK. The way we do this is by saying const DeepGram in squiggly brackets equals require and then at DeepGram slash SDK. At DeepGram slash SDK is the name of our package on npm. I've gone ahead and installed it for you in the project you remixed. What are these squiggly brackets all about? Without going too deep into things that aren't immediately related to this workshop, this is actually a property of the exported package. This would also be valid like so. But because we just want to pull out .DeepGram, we can go ahead and pop that in squiggly brackets and omit this on the end. Okay. So now we've required the Node.js SDK. Now we need to go and initialize it. And the way we do this is by creating a new variable. Call it anything you want; I'm just going to call it DeepGram, lowercase d. New, space, DeepGram with a capital D. And in here we put our API key. This is where the environment variable comes in, because if I put my API key directly in here, other people can see this file; other people can navigate to this Glitch URL. But by storing it in the .env file, we can access it without directly putting in the value: process.env.DG underscore key, in capitals. DG underscore key correlates with the name of the environment variable here. Thank you very much, Joe, for posting about the dark mode. Quite a few people were asking about that. And now we're ready to go and get our first transcription. This is actually a reasonably straightforward set of steps. We're going to just make one request, wait for the response, and then console log what is returned. So what we're going to do here is DeepGram dot transcription. Transcription. As someone who works at a speech recognition API that creates transcripts, the word transcription always gets me; I have to look at it really closely to make sure I've spelled it right. Dot preRecorded, with a capital R, like so. DeepGram dot transcription dot preRecorded. We're using the lowercase d because this is the initialized client with our API key. And the first argument is going to be an object, so squiggly brackets, and in there we need just one property: URL. In here we put a direct audio file. I've already stored one for you in this variable called audio file URL, so we can just pop that in there. And this is how you ask for a transcript. This returns a promise. In JavaScript there are a few ways to handle promises: async, await. An easier way to do it in this context is going to be dot then. So dot then, data; that's a function, so arrow, squiggly brackets. And let's just console log this and see what happens. Console dot log data. Now, if you're struggling to keep up with this, don't worry. This is one of those checkpoints: if you go inside the reference folder, 01-basic dot js contains the code you should have written by now. So if you're struggling to follow along, or to copy and paste that code, you can go ahead to 01-basic.js in the reference folder and copy that over at this point. Equally, you can refer back to it later. So what we're going to do now is open up the logs, and that's down here in Glitch.
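Pulled together, the code up to this checkpoint looks roughly like this. It's a sketch rather than the reference file itself; in the published @deepgram/sdk package the exported class is spelled Deepgram, so check the package README if yours differs, and audioFileUrl is the variable from the scaffold above.

```js
// Require the DeepGram Node.js SDK (already installed in the remixed project).
const { Deepgram } = require("@deepgram/sdk");

// Initialize it with the API key stored in .env.
const deepgram = new Deepgram(process.env.DG_KEY);

// Ask for a transcript of the hosted audio file and log whatever comes back.
deepgram.transcription
  .preRecorded({ url: audioFileUrl })
  .then((data) => {
    console.log(data);
  })
  .catch((err) => console.error(err));
```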
And we'll see what has been returned from DeepGram is this object here. This object has this metadata property, which contains this models property and a unique request ID. And it also has this results object here, with channels and an array inside of it. Now, I want to see what's inside of results, so we're going to use a little bit of JavaScript here in order to expand this object. Instead of console dot log, we're going to... so, how did you run it? A really good question. Glitch will automatically save and run your code, so I didn't have to run it. I just had to write it and then open up the logs. If I make a one-character difference here, it will rerun the code. So I didn't need to run it, and neither do you. Once you've done this, just hit Logs here at the bottom. Great question. So instead of console dot log, we're going to go console dot dir, put in the data still, and as a second argument, depth null. This is just a way, in Node.js, of expanding the whole object. And you see there are errors, because every character I write is going ahead and rerunning that code, even when I'm not quite done typing. If you can't see logs in your Glitch for some reason, a way of forcing it to rerun is to just pop a little space in; it will save, it will rerun. So you see it there, listening at port 3000, and there's our new result. So console dot dir, with the first argument being the returned value and the second one being an object with depth null; this means that it doesn't abbreviate our console log here, so we can see the whole thing. Let's take a little look. It's a reasonably long response. Okay. So we've still got that metadata object, and we have results. It has an array of channels, because you can pass in multi-channel audio, which this is not, so there's only one. And inside of that, we have a property called alternatives, and that in turn is an array. You can ask DeepGram for multiple interpretations of what was said, and it will return all of those if you want to compare them. But by default, it will return just one. And in here, we have a further object, and this is the one that contains the data we care about. There is this big block of text, which is what was said in that audio file. We also have this words array here, which shows each individual word, when it started, when it ended, and the confidence that that interpretation was correct. So that's what's returned from DeepGram by default. Where can we hear the audio file? Whack that URL into a new tab. It will download the file, and you can open it up. So that's all you need in order to do speech recognition with DeepGram for prerecorded audio. Can I play the audio file? I'm not going to do it on stream, just because I don't think I've set myself up correctly to play audio through into the streaming software we're using. I encourage you to play the audio file, however. Okay, so now we're going to talk about the features that DeepGram has. Now, there is a whole list of features that we support. If you head over to readme.md, I've provided a link here to our documentation around all the features we have. I'm not going to show you all of them, but I'm going to show you a few that you might find interesting for the sake of a hackathon, and how to apply them.
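First, for reference, the default response we just walked through can be pulled apart roughly like this (a sketch of the shape described above):

```js
deepgram.transcription.preRecorded({ url: audioFileUrl }).then((data) => {
  // depth: null stops Node from abbreviating the nested object in the logs.
  console.dir(data, { depth: null });

  // By default: one channel, one alternative, a transcript string plus a
  // words array with start, end, and confidence per word.
  const alternative = data.results.channels[0].alternatives[0];
  console.log(alternative.transcript);
  console.log(alternative.words);
});
```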
Now, this first object that we've passed in here to DeepGram.transcription.preRecorded is the file itself. But you can optionally pass in a second object specifying the features that you want to use, to make sure that the transcript that comes back is as useful as possible. Can I explain a bit more about confidence? Just super quickly, not to derail. Under the hood here, we're using deep learning. There are lots of very, very clever machine learning experts at DeepGram, and linguists, who could probably do this definition a lot more justice than me. But with any machine learning model, the returned result can only ever come with a degree of confidence that it was correct. If the accuracy really matters, you could have a cutoff point that you will accept in your application. I did terribly at that description, but I'm not going to spend much longer explaining it. Cool. Right, so we have this first parameter, which is an object with the URL, and the second parameter can contain some features. I'm going to show you a few of them now. The first one I'm going to show you is punctuate. Punctuate; yeah, I think that's spelled correctly. Punctuate true. This one does what it says on the tin. As well as returning the word, it also returns a punctuated word. You see there: capitalized, full stop. And that big transcript up at the top, let's use this (I should have used a shorter audio file), you see here has some punctuation built into it. If you're displaying transcripts for users, it's probably worth turning punctuate on. So punctuate true simply adds one property to the return value and alters the transcript. Next we have utterances. Utterances true. You can think of utterances as phrases. By default, DeepGram will give you the big block of transcript, and it will give you every single word. But there's somewhere in between there that is also useful, which is phrases. So here, as well as, let me find it here in the returned value, this is a really long response, oh, I think it's longer than I can get in here. As well as returning each individual word, it will also return an array of utterances. Each one of these utterances, in turn, will have a start and end, will have a transcript, and will have a set of words. So you've got this middle ground here between everything and individual words, with DeepGram organizing them into phrases. And, again, without derailing too much, I can't remember what it is, but there's a default amount of silence that we will take to mean a new utterance. There is also a feature where you can alter that, make it more or make it less. Yeah, I could just log the results. In fact, I might do data.results. But I still think it will be too long for the log here. Yeah, still too long. But it is at data.results.utterances; that's how you get access to just the utterances. Anyhoo, let's crack on. The next one worth showing you is diarize. And you can add as many or as few of these features as you want. So how do you get the whole audio file as a paragraph? That's actually what we got by default with no features. At the very top level of the result is a transcript property, and that contains the entire audio file's worth of transcript. Right, so the next one I want to show you is diarize, which is very cool. So diarize true. Diarization is a word that apparently a lot of people in speech recognition know, but I didn't know it. It means separating an audio file into individual speakers. So what diarization will do is add, on a word level and on an utterance level, this new speaker property, along the lines of the sketch below.
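A rough sketch of those features in use; the options object is the second argument, and you can mix and match as many of these as you like:

```js
deepgram.transcription
  .preRecorded(
    { url: audioFileUrl },
    {
      punctuate: true,  // punctuated, capitalized transcript, plus a punctuated word per word
      utterances: true, // adds data.results.utterances: phrases with start, end, transcript, words
      diarize: true,    // adds a zero-indexed speaker property to words and utterances
    }
  )
  .then((data) => {
    // For example, print each phrase with the speaker DeepGram assigned to it.
    for (const utterance of data.results.utterances) {
      console.log(`[speaker ${utterance.speaker}] ${utterance.transcript}`);
    }
  });
```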
Zero-indexed, so speaker zero means it was the first speaker. But if it starts to recognize a second voice, those words will be in here with speaker one. And in turn, you can build more complex applications off of it. Now, if you're thinking about how you present transcripts back to users, a combination of these three features, punctuate, utterances, and diarize, will give you the data you need to display something a bit more meaningful to your users. There are some other features here. There's keyword boosting, which makes DeepGram more likely to hear the words that you specify. There's search which, as well as returning the whole transcript, will show you the phrases that contain a certain search term. So if you're trying to search within a transcript, you can do that. And plenty of others. I encourage you to take a look at the features documentation, which I've linked inside of readme.md. A couple of questions from the chat. Is there a way to filter different voices and give different variables with their respective statements? I'm going to need you to rephrase that question, because I'm not quite following the ask, but I'm more than happy to answer it once I get it. Does channel mean audio that we hear left and right separately? Fantastic question. Multi-channel audio is simply audio that is encoded with multiple channels. A very common way is, yes, left and right will be on two separate channels. But that isn't always the case. For example, if you take a recording of a Twilio phone call, Twilio will encode each of the speakers on its own separate channel. Can we do live transcription instead of using an audio file with DeepGram? Heck yeah, and it's what we're about to do now. So, for lack of a clarification on that previous question, I'll more than happily swing around and answer it later. Let's move on and talk about live transcription in the browser. Oh, right, there is some movement in the chat. Looks like Satish is asking if the API can differentiate between two people having a conversation. Yes, that's exactly what diarize does. Here you see speaker zero; if it was a different speaker, it would be speaker one. SmartMind12: error, DG API key is required, what to do here? We covered this earlier on: inside of the .env file here, make sure DG_KEY has a value, and that value comes from your DeepGram console. What person one said is transcribed and stored in X, and what person two said is stored in Y? What we provide is an array of utterances, and each of those utterances has a speaker value. So you just need to write a little bit of code, once the result comes back, that separates those out into separate values, into separate arrays, presumably. Right, let's talk about browser transcription now. I've set this up in such a way that all you need to do is head to index.html. index.html, inside of public, will be where we do the work today. Let's talk a little bit about what's happening here. So the code that's running here is being rendered in the web page here. Glitch has this feature we're going to use, though, where we're going to click these three buttons and open it in a new window. I think this will make it a little easier to work with, and I'm actually going to close that preview. Okay, let's talk about what's going on here. There's a little bit of styling up top. Think nothing of it. It's not a DeepGram thing. I just fancy using these colors and fonts because they're DeepGram colors and fonts.
These four lines don't add anything to the application functionality-wise. This is the DeepGram logo, which we saw here at the top. And then we have this script tag, which is where we'll be doing the work. The only thing I've stored here is the DeepGram live transcription endpoint URL, again, to save us having to tediously type it or copy and paste it from chat in a bit. So let's talk about what we need to do here. There are four steps, I think. The first step is going to be getting access to a user's microphone, requesting access to it, and actually getting the live data directly from the mic. My mic's here, which is why I'm referring to my mic as being here. The second thing is we're going to create a two-way connection with DeepGram using WebSockets, which allows us to fire over data in real time and also receive transcripts in real time. The third thing will be to actually take that connection and start sending data from our mic, preparing it and sending it. And finally, when DeepGram returns a transcription, we want to display it here in the web page. Those are the four steps that we need to do. Now, these four steps are also detailed in a blog post, which is in the readme.md here, and a video, which I think is coming out next Monday. So, yeah, these steps are more defined in those documents. A couple of questions from chat. Did I miss a lot? Yeah, sorry. Though we're just moving into the point of doing browser live transcription, and no one's touched this yet. So if you want to go grab a DeepGram API key and just have that on hand, you could probably jump in at this point. Is multiple connections possible? Yes. Not quite sure of the question, but yes. Cool. Let's crack on. So, as I said, the first thing we want to do is get access to the user's microphone. There's actually a baked-in browser API that can achieve this. And I'm now going to type a little bit quicker. What I will say is that over here, in reference browser.html, is the completed code we're going to write today. So we'll see there isn't actually that much code required in order to do live transcription. But if you're starting to fall behind, don't worry: the code is there. So in order to get access to the user's mic, we will do the following. Navigator.mediaDevices, with a capital D for devices, .getUserMedia. That's a method, so round brackets. And in here we're going to put squiggly brackets, an object, and we're going to specify audio true. What this means is we are asking for access to a user media device, specifically an audio device, a microphone. This returns a promise, which in turn resolves to what is known as a media stream. And this means we can use .then, stream. You can call it whatever you want; I'm going to call it stream, and arrow function. And let's just console log the stream. What happens here? So navigator.mediaDevices.getUserMedia, audio true in an object. That returns a promise, so .then. It resolves to a media stream we call the variable stream, and we console log it. So I'm going to go over to this page that we opened up, the preview. If you want the preview, you can get to it down here in Preview, and I preview it in a new window. I'm going to just open the DevTools here, increase the font size, and refresh. And we see that it asks for that.
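Step one so far, written out, is roughly this small sketch (stream is just the variable name chosen here):

```js
// Ask the browser for microphone access; the promise resolves to a MediaStream.
navigator.mediaDevices
  .getUserMedia({ audio: true })
  .then((stream) => {
    console.log(stream); // for now, just prove we have access to the mic
  });
```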
I'm not sure if you can see this on the stream, actually; I don't think you can, because of the way I'm sharing it inside of our streaming software. But if you're following along, it's popping up right now and saying, you know, this URL wants to use your microphone, do you want to allow it? And I hit allow, and we see that there is a media stream being console logged here. This is great, but in order to access the raw mic data, the data from the words that are coming out of my mouth, we need to plug this in to what is known as a media recorder. So coming back here, I'm going to just remove that console log, and we're going to create a media recorder. I'm going to just call it media recorder as a variable name: new MediaRecorder, with a capital M, and we're going to plug in our stream here. And as a second argument, we're going to specify the audio type that we want: MIME type, audio slash webm, which is what this API will provide for you. So that's our media recorder. We can use this media recorder to access the raw data from our mic. And that is step one. So step one was to get access to the user's mic and get access to the data from the mic. Now in step two, we're going to... can you increase the font size a tiny bit? Sure thing. There you go. The only thing you're missing is the end of this image URL, which, yeah, doesn't really matter. Hopefully that is readable. I might actually just close that and zoom in one more step. Hopefully that's better. So once again, that's step one. We've got access to the user's mic. We've plugged it into a media recorder. We have access to the raw data, or rather, we can get access to the raw data. Now we're going to go ahead and create a persistent two-way connection with DeepGram that allows us to send data whenever we're ready and to receive transcripts back in real time. To do that, we're going to use WebSockets. You can use this same WebSocket interface in any application, whether it's in the browser or on your backend, but the browser has a WebSocket client built into it, which is really lovely. So we're just going to take advantage of that, because it means we can write less code. So we're going to create a new variable called socket and say new WebSocket, with a capital W and a capital S. And there are two arguments in here. The first is the URL. Atul, it is not asking me for voice permissions. It should be. This is always a difficult bit with this workshop, because there is a chance that you have just broadly allowed every device to have access without requesting it. You might have also set your browser up to do the opposite, which is: whatever you're doing, do not give permission. And then it won't even ask you. So you want to check things like that. In the default state, though, it will ask you every time you refresh the page. So, new WebSocket, two arguments. The first one is the URL that we're trying to connect to, and this is DeepGram's live transcription endpoint. We stored it in this variable, so we'll just pop that in there. And the second argument is an array, and this is where we put our API key. The first value is just token, in quotes. Just to be clear, don't replace that with your token; you actually just want the word token. And the second value is going to be your API key. So I'll just, I stored it in .env earlier, so copy, copy, copy, pop that in quotes there. And that's all we need to create, excuse me, a persistent two-way connection with DeepGram.
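With the console log swapped out, steps one and two together look roughly like this. The endpoint shown is DeepGram's documented live transcription URL; the boilerplate already stores it in a variable, so use whatever name it has there, and the key pasted directly into the array is a workshop-only shortcut.

```js
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  // Step one: plug the mic stream into a MediaRecorder so we can pull raw data from it.
  const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/webm" });

  // Step two: open a persistent two-way connection to DeepGram over WebSockets.
  // The first protocol value is the literal word "token"; the second is your API key.
  const socket = new WebSocket("wss://api.deepgram.com/v1/listen", [
    "token",
    "YOUR_DEEPGRAM_API_KEY", // workshop only — don't ship a real key in the browser
  ]);
});
```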
In step three now, we want to actually prepare the data from our mic and send it off. The WebSocket interface here, the instance of a WebSocket, I should say, has a number of events built into it, which are quite common for WebSockets. We're going to use two. The first one fires as soon as the connection is opened, and the other fires when there is a message back. Thank you so much, J13K, for asking about the API key in the browser. What you're doing right here, don't do this in the real world, because if people visit your page, yes, you're absolutely right: contrary to what I said earlier, where I said don't let people see your API key, you are letting people see your API key here. For the sake of this workshop, we're going to do it. We will discuss, before we wrap up, the right way to handle your API keys, but setting that up would be a considerable amount of work, more than this. We're going to do this for speed, and for the sake of your Local Hack Day projects it might be fine. In the real world, don't put your API key right here. Again, hang with me to the end; I'll talk about how you should handle these API keys. All right. Now, as soon as that connection is opened, we want to send data from our mic. This is how we do it: socket dot onopen, and we're going to assign that to a function, like so. And in here, we're going to prepare data from our mic and send it to DeepGram. The way we do that is by adding an event listener to the media recorder. So media recorder dot addEventListener. The event we're listening for is called dataavailable, all lowercase, all one word. And we'll go ahead and add a function here, event. In the most simple version of this code, we're just going to say socket dot send, event dot data, which is raw data directly from the mic. This will run whenever there is data available from the mic. The final thing we need to do in order to send data to DeepGram is actually start making our data available, and we do that with mediaRecorder dot start. We actually have to start it. And there is one argument we need to provide, which is a time slice: how often do we want to package data up and make it available? This is in milliseconds, so 1,000 equals one second. 1,000 would be fine, but let's make it a bit quicker. Let's say every quarter of a second we'll send data to DeepGram. That is now sending data to DeepGram in real time. The final part is actually listening for a response from DeepGram. Before we do that, I'm going to take a look at the chat. Could we not just use process.env.DG_KEY? Fantastic question. Kind of, but no. Process.env is available in the server-side context, the backend context. It is not available in the client-side context. We will wait to the end to talk about the best way to handle it; there's somewhere in the middle of what you're saying that is the right way to handle it. I'm getting this in the console: Amplitude, invalid device ID, option input type, expected string, but received Boolean. Amplitude is not part of this application. I think it's part of Glitch, because it is analytics software. So make sure when you hit preview, you're hitting Preview in a new window, and then you're opening it up and refreshing the page. How can I open it in a new tab? Yeah, exactly that: Preview, then Preview in a new window. Okay.
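Gathered together, step three is roughly this sketch, continuing inside the same .then block as above, with the quarter-second time slice:

```js
// Once the connection is open, start packaging up mic data and sending it over.
socket.onopen = () => {
  mediaRecorder.addEventListener("dataavailable", (event) => {
    socket.send(event.data); // raw audio chunk straight from the mic
  });

  // Make a chunk of data available every 250 milliseconds.
  mediaRecorder.start(250);
};
```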
Let's move on to that final step now. So, just after the onopen function, where we end this here, break a couple of lines, and what we're going to do is get data back from DeepGram. Socket dot onmessage, and there is going to be an argument here, because there's going to be some data; we'll just call it message. And what we're going to do is create a variable called data, and we will JSON.parse the message.data. What comes back in message.data is a string. We want to turn that into a JSON object, so we just need to parse it. And then, I'm hoping this reconnecting thing here isn't ruining anything, I'm going to just carry on there. We'll just console.log the data. Oh, there we are. Yeah, console.log data. So socket.onmessage equals, and then it's a function with one argument, message. We'll parse the response and console.log it. Let's refresh our browser, and we should see any moment now data coming into our console. How cool is that? It's the same data as before. We have this channel, it's an object, we have alternatives, we have a transcript, we have words. It's really cool. There is something that's worth noting here. There is an additional property, where is it, right here on the top level, called isFinal. isFinal false, isFinal false, false, false, true. All of these objects represent the same phrase that I am saying, that's actually coming out of my mouth. But as time goes on, DeepGram becomes more confident about what I'm saying, and when it says this is the final version of the phrase, isFinal becomes true. So if we take a look here, this first transcript says "and we." The second one says "and we should see any." The next one, "and we should see any moment now." This is also false. We'll skip to the true one. The final version of this, not that one, is "and we should see any moment now data coming into our," and then a new phrase begins, because I paused. So what do we do here? Oh, and another thing worth noting is that sometimes you can get, if I go silent for a moment, if I refresh and go silent, I think you'll see what will happen. So I was silent there, and now what comes back is an array with just an empty string. So it will always return something, even if nothing has been said in that period. So how are we going to handle this in a way that's useful? There are a few things we want to check. The first is that there was actually a transcript that came back at all, because it could have just been silence. And we want to check if it's final. There are nicer ways you can handle this, where you show the non-final versions and replace them until it's final, and then you lock it in. But for speed, we're only going to show the text that is final. So we're going to go ahead here and firstly pull out isFinal, like so, and we'll pull that out of data. Actually, just for clarity, I'm going to write this a little more simply; this is equivalent. The other thing we want to do is grab the transcript itself. So that would be data.channel.alternatives, that's an array, we want the first item, .transcript. And then we just write a little console log. So: if transcript, because an empty string in JavaScript is falsy, so it will fail this if statement if it is empty. So if transcript and isFinal, then we'll console log the transcript. Again, Glitch automatically saves for us. Let's go back to our page and refresh it. And now when I talk, you should only see the phrases, when they are final, so we're not getting repeats of phrases. I wonder what happened there. Oh, there we are; it just took a moment. Lovely.
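That final handler, gathered together, is roughly:

```js
// Step four: handle transcripts coming back from DeepGram.
socket.onmessage = (message) => {
  const data = JSON.parse(message.data); // message.data is a string, so parse it

  const transcript = data.channel.alternatives[0].transcript;
  const isFinal = data.is_final;

  // An empty transcript is falsy, so silence fails this check; we also skip
  // interim results and only keep the final version of each phrase.
  if (transcript && isFinal) {
    console.log(transcript);
  }
};
```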
So I suppose the final thing to do here is just dump it on the page. I've created this empty paragraph tag here; we'll just pop it straight in there. So what we'll do is remove that console log. Atul, hey, Cleaver, I'm just getting errors in my console. Tell us what the errors are, give us a little bit more information, and we'll see if we can help you out. So what are we going to do here? document.querySelector, just some vanilla JavaScript here: P, .textContent, plus equals, a space before it, I guess, because there aren't spaces in between them by default, and then we'll add the transcript. That will take the paragraph tag and add the new phrase onto the end of it, with a space just before it. J13K, I am not getting anything in my console. Don't worry. Again, over here in the reference, inside of 3browser.html, is the final version of the code we've just written. Try this out, although there is a little typo there. Look at that: the P had rounded quotes, so you'll just want to change those for straight quotes. You'll want to check this out. If this isn't working, there's probably something wrong with your API key. Make sure you are replacing this key here with your key, wrapped in quotes. But, yeah, I'm here to troubleshoot, so pop stuff in the chat. Anyway, let's refresh this. And now we should see, when we talk, that words appear on the screen in basically real time. Remember, this could be quicker if we were showing the isFinal false values, but we aren't; we're just showing the final values. That's pretty cool. There are a few extra things you may want to do here. In the interest of time, I'd rather talk about handling your API keys correctly, so I'm going to move over to browser.html, this reference piece I've made, and I just want to talk you through a couple of extra lines of code that are useful here. Firstly, the media recorder is not supported in every browser. Notably, oh, yeah, I should have said this: if you're using Safari, it's probably not going to perform that well. Open it up in a Chromium or Blink-based browser or Firefox, and it will perform better. There are workarounds; it just requires some more code being written. If you're using Safari, skip over it. I use Safari day-to-day; I'm currently using Firefox for this workshop. But what this line will do here is say, hey, if the browser doesn't support it (if you're using Brave, Brave doesn't have an issue), just pop up "browser not supported" to the user and don't run the rest of the code. That's what this line does. This line is also new, this if statement here; that's missing from the code we just wrote. What this says is: look, if the data that comes from the mic is empty, or if for whatever reason the socket, the WebSocket connection, has shut, don't try and send the data. Only send it if those criteria are met. And other than that, it's identical to the code that we wrote. There are small syntactical things here, but the features are the same. Don't forget to update your key. Now, that's how you do live speech transcription with DeepGram, and on-demand pre-recorded speech transcription. You can also upload your own local files as well; I'm going to just refer you to the documentation for that, and I'm around on Discord if you have questions.
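Put together, the browser code is roughly the following. This is a sketch of what the reference file does rather than a copy of it; the MediaRecorder support guard in particular is just one reasonable way to write the check described above, and you still need to swap in your own key.

```js
// Rough sketch of the finished browser script (goes inside the script tag in index.html).
if (!MediaRecorder.isTypeSupported("audio/webm")) {
  alert("Browser not supported"); // e.g. Safari without workarounds
} else {
  navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
    const socket = new WebSocket("wss://api.deepgram.com/v1/listen", [
      "token",
      "YOUR_DEEPGRAM_API_KEY", // workshop only — don't ship a real key in the browser
    ]);

    socket.onopen = () => {
      mediaRecorder.addEventListener("dataavailable", (event) => {
        // Only send if there is actually data and the connection is still open.
        if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
          socket.send(event.data);
        }
      });
      mediaRecorder.start(250);
    };

    socket.onmessage = (message) => {
      const data = JSON.parse(message.data);
      const transcript = data.channel.alternatives[0].transcript;
      // Append each final, non-empty phrase to the empty paragraph tag on the page.
      if (transcript && data.is_final) {
        document.querySelector("p").textContent += " " + transcript;
      }
    };
  });
}
```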
Finally, let's talk briefly about putting your API key right here. If you do this and open up your HTML page, that key is just visible for everyone, so you want to watch that. If you are done with your key, destroy it; make sure it's no longer valid. Otherwise, people can start chipping away at your DeepGram credit, or, if you left your permission levels higher, people can use whatever access that key has. Another way of doing this, which is actually the simplest way and the one I would recommend, and I've got a blog post coming out about this in a couple of weeks, is using the DeepGram API on your server side to generate new keys that expire after a very short period. Let's say, for example, that from the point you generate the key to the point where you connect for the first time, it's only going to be like five seconds. Generate a key that only works for five seconds, return it to your front end, and then use that key. Sure, it means people can see your key, but they can see it for all of five seconds before it becomes invalid. That's quite a common approach, and it would be the one I recommend. There are other ways of doing it. You can open a WebSocket connection between your browser and your server, and then your server can do the interaction with DeepGram and return the results back to the browser; you just use your server in the middle to push messages back and forth. There are quite a few approaches. As I say, there's a blog post coming out about it. For the sake of Local Hack Day, what I'm going to suggest you do is just pop your key here. Just make sure it doesn't live for very long, so only make it work for 24 hours, or only make it work for three days, or whatever. There are lots of approaches there. What else do I want to tell you? We have a challenge for today, tomorrow, and Thursday. The challenge for today is to complete the missions on your dashboard. When you sign up for an account, you have these four missions right here. These missions do not take very long at all, but they show you a little bit more about how to use DeepGram, quite similar to what we've done today, actually. If you complete all four of these missions, that is today's challenge completed. SMV1999, you are not getting errors, but you're also not getting anything on screen when you speak. I'm more than happy to take this into Discord and try to troubleshoot it with you. It's probably something small. It might be this line; maybe you're just not adding it into the page. If there are no errors and you've written most of this code, there's a pretty good chance that it's just this last step of actually printing it to the page. Again, this code is in your reference folder inside of 3browser.html. The only thing to note is these quotes around this P on line 42; they just need replacing with normal quotes. That's my bad. Yeah, so that's it. We've got like another five-ish... look at that, that's pretty cool. We've got another five-ish minutes, so if you have any questions whatsoever, I'm here to answer them. I'm here to help you think about using and building cool things with voice. It's not asking for permission to open the mic? So, it depends on the browser; it depends on your specific permissions. Actually, it turns out I configured Firefox differently from what I said, back when it asked me for permissions. See how it says allow temporarily? After a period, it will ask me again. You just want to check this. Maybe you blocked it. I instinctively hit deny on things that pop up in the browser, so you might just want to clear that and refresh the page.
I'm not sure if you can see that in the screen share. Let's see. Can you see that? Oh, no, you can't. Sorry. There's a pop-up that I can see that you can't, and my mouse disappears there. Cool. All right, folks. Well, I hope you found that interesting. As you can see, it's remarkably few lines of code to get a transcription for prerecorded audio. We have a set of features that you can use; these are just three of them, and there's a whole set of other features you might find interesting. And doing browser live transcription is, again, remarkably few lines of code. If you have any questions, I'm around on the Discord. You can tweet at us, at DeepGramDevs, or you can tweet at me; my Twitter handle is there. Yeah, I hope you found this interesting. Thank you, folks. And, again, I'm in the DeepGram channel in the Discord throughout this event.
Speaker 3: All right. Well, thanks so much, Kevin. Really appreciate your time. And, once again, y'all, if you want to get started with your DeepGram challenge for today, we have given you a clue as to what it is. It's not going to be announced officially until today at noon. But, you know, all you need to do is sign up for an account, and you'll see the four challenges there. Just submit a screenshot, basically, of everything that you've done, and your GitHub repo into DevPost. And, you know, you'll get credit for it for today. And, outside of that, really, thank you again, Kevin. Super awesome content.
Speaker 1: Thank you for having me. Really fun.
Speaker 3: Cool, cool, cool. All right. Well, see you later and appreciate your time. Have a good one, y'all. And, remember, we do have another workshop that is scheduled for later today in about another hour or so. You can get started with MATLAB and doing all types of cool stuff there. So, be sure to check in for that as well. Outside of that, I'll see y'all again soon. All right. Have a good one.