Building Real-Time Translation with Supabase
Explore using Supabase and Speechmatics for multilingual real-time translation in live events. Share insights and seek improvement tips.
Realtime transcription and translation with Speechmatics
Added on 01/29/2025

Speaker 1: Hello my fellow super builders, sorry it has been a while. The Supabase GA Week has thrown me off my super builders bandwagon a little bit, but we are back, and we're building a new Supabase real-time application. The idea is this: sometimes we're running live events where the speakers present in different native languages. As an example, we have English presenters and Mandarin Chinese presenters, and we want to be able to transcribe and translate as we go along. Eventually we want to use Supabase Realtime so people can get the translation in their preferred language on their own device: they just use their phone, scan a QR code, and get a transcript in their preferred language.

I was looking into different services for this, and I came across something called Speechmatics. It looks super interesting: real-time streaming translations. Normally there's a lot you have to do yourself, and they have a pretty cool demo that does exactly what we want: take some source audio, live-transcribe it, and then live-translate it.

One thing that's pretty cool: I was actually building this with v0. Let's take a look at this component, for example, this transcript panel. I was using v0.dev from Vercel, and it's really a neat experience. What I basically put in as a prompt was: make a full-screen widget with two boxes of the same size. And what's really cool is that you can then use npx v0 to add the component, basically pull the component into your project. So that's what we're doing here: we just have this transcript panel.
We pass in the transcription language, the transcription, the translation, and the translation language, and the component puts them into the two boxes.

Now, in terms of the Speechmatics API, something that's pretty cool is that you can use it in combination with the web SDKs, like navigator.mediaDevices. Where was the stream coming from? Oh, yeah: the stream comes from mediaDevices.getUserMedia — we want the audio device — and then we create a MediaRecorder from that stream. Speechmatics has a JavaScript SDK that gives you a real-time session, and it's actually quite easy to create one. Since we're doing this client-side — this is a client component — we need to generate a short-lived API key. We're fetching it from a route: a secure server API route where we can use our Speechmatics secret key. There we generate an API key of type rt, for the real-time API, and set the time-to-live to an hour, so 3,600 seconds. Once we have our API key, we can create the real-time session: if we don't have a session yet, we create a new one and store it. Then, on start transcription, we add some event listeners: one for recognition started, and one for the transcript, so any time we get a transcript event, we append it to our transcript state, and we set the translation as well. That is just an easy test for now.
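The two pieces described here — a secure server route that mints a short-lived `rt` key, and the `getUserMedia` → `MediaRecorder` capture — could be sketched roughly like this. The route shape, the key-minting endpoint, and the `key_value` response field are my assumptions based on the Speechmatics temporary-key API; verify against the current docs.

```typescript
// Server side (e.g. a Next.js route handler): mint a short-lived
// real-time key so the secret key never reaches the browser.

// Pure helper: JSON body for the key request (ttl in seconds).
export function keyRequestBody(ttlSeconds: number): string {
  return JSON.stringify({ ttl: ttlSeconds });
}

export async function POST(): Promise<Response> {
  const res = await fetch("https://mp.speechmatics.com/v1/api_keys?type=rt", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SPEECHMATICS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: keyRequestBody(3600), // one hour, as in the video
  });
  if (!res.ok) throw new Error(`key minting failed: ${res.status}`);
  const { key_value } = await res.json();
  return Response.json({ key_value });
}

// Client side: grab the microphone and hand audio chunks to a callback
// (e.g. the SDK's sendAudio method on the real-time session).
export async function startMicCapture(
  sendAudio: (chunk: Blob) => void,
): Promise<MediaRecorder> {
  // Triggers the browser's microphone permission prompt.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) sendAudio(event.data);
  };
  recorder.start(1000); // deliver a chunk roughly every second
  return recorder;
}
```

Keeping the secret key behind a server route and handing the browser only a one-hour `rt` key is the part that makes the client-side session safe to ship.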
We haven't implemented the Supabase Realtime part just yet; we're just starting our session here. This is pretty easy: you say the transcription config is English — the Queen's English, so to speak — and Mandarin Chinese is the target language for our translation. That's what we're starting with, so we can test this out real quick in our app. I just click start, and then, if I open up my console, the browser asks to allow access to the microphone. As this runs, you can see it in the network requests — I think I was too late to catch it, but it's a WebSocket connection. Speechmatics is, I think, a UK company, so their servers might be located in the UK, and since I'm here in Singapore, there might be a bit of lag. It's still decently fast, though.

One thing I found — and obviously I don't read Chinese so well — is that the English transcription is pretty spot on, but the translation into Mandarin Chinese specifically is the problem. You can see that here: sometimes it doesn't actually catch it. I had some friends check this, and it seems to be pretty far off from what you would expect as a native Mandarin speaker. OK, if we stop this here, maybe I can turn this around and speak Mandarin instead. (In Mandarin, roughly: "Hello, I'm Thorsten. ... I'm a German living in Singapore. I'm a software engineer.") German, a German software engineer, yeah. OK, obviously my Mandarin isn't great. (In Mandarin: "My Chinese is still not good.") Yeah — no, maybe we stop that before I embarrass myself further.
Yeah, my Chinese is still bad. OK, it got that one. So maybe we can try going the other way — say, from German. Looking at the supported languages: German, de, yeah. So let's go from German into English. Let's try that. (In German: "Hello, good day, my name is Thorsten Scheff. Today we are here in Singapore and we are testing the Speechmatics API for real-time transcription and translation. Well, maybe this can still be improved if we add some special vocabulary somehow.") Yeah, OK, maybe I stop this now.

I think it's not really there yet in terms of the experience I'm looking for, so if anyone at Speechmatics has some ideas for how I can improve that, please let me know. Otherwise, there are some interesting alternatives. There's an offline Whisper model — a local model I found — faster-whisper from SYSTRAN, which is interesting. And I want to see if I can maybe do this with WebGPU; there are some interesting things there, so I'll be playing around with that. We'll see if I can actually do the transcription on the device itself using WebGPU. Maybe it's a bit faster — I don't know if the quality is any better, but we'll find out.

Anyway, just a quick update. Let me know if you have any pointers for live transcription and translation — or maybe, in the end, should I just be using the Google Translate APIs? I don't know. Let me know if you have any pointers; always appreciated. Sweet. Anyway, I'll keep chugging away on this. Let me know what you're building, use the super builders and building in public hashtags, and I will see you next time. Cheers. Bye bye.
