Speaker 1: Guys, this is one of those moments in the AI tech space where you see something made possible by AI that is just magic. Today we're going to be taking a look at some research and a demo that you can try for free by Meta AI. Yes, that's right, Facebook. That is seamless, real-time communication between languages. So what's going to happen here is it's going to take my English-speaking voice and convert it into another language in essentially real-time. And this is a huge moment where seemingly magical AI technology starts to remove and break down language barriers. The whole time you're watching today's video, I want you to think about a little pair of headphones, kind of like this, that you would wear in another country, you'd speak your native language, and a speaker somewhere would output your voice translated in that language. And the device could take in someone else's speech in their native language and translate it to yours and play it in your ears. Because that is absolutely what we are seeing unfold in real-time here. So here we are, guys, Seamless Communication AI Research by Meta. You actually can download these models, I want to make that very clear right off the bat. Check it out, right on GitHub. So while you absolutely can download, install, and use these models for the seamless communication, it is not currently available for commercial use, so you can't actually build any products with it. However, it is fully open for research purposes, and you can redistribute it for research. So that's sort of where we are on all of this. I'm sure that they probably will open-source a lot of this stuff at some point in the future. If you want to learn more, they do have license information at the bottom here. But yeah, this is a lot better than fully closed-source stuff.
It's still fairly open, and it's a good sign, and with Facebook's actually good track record of releasing open-source software, I think we can hopefully expect to see this fully open in the near future. They have a little introduction, so let's take a look at that.
Speaker 2: At Meta, we're working with the AI community to help remove language barriers and encourage open, authentic communication. We're excited to introduce a suite of new models. SeamlessM4T v2, an improved version of our foundational model. Seamless Expressive, which preserves expression across languages. Seamless Streaming, which translates speech and text in just under 2 seconds of latency. And finally, Seamless, our unified model that combines capabilities of all three. Our improved model, SeamlessM4T v2, serves as the foundation for our new Seamless Expressive and Seamless Streaming models. Next in our family of models, we have Seamless Expressive, which preserves the intricacies of speech, such as pauses, speech rate, and emphasis on certain words, as well as vocal style and emotional tone. We believe it's imperative our translations not only accurately convey the words we speak, but also capture the subtleties of human expression. Please keep the volume down. We just put the baby to sleep. Por favor, mantén el volumen bajo. Acabamos de dormir al bebé. We're also thrilled to share Seamless Streaming. With just under 2 seconds of latency, it's the first massively multilingual model able to translate speech and text in near real-time. Consider being in a social situation where the language spoken is unfamiliar to you. Then imagine being able to not only follow the conversation with minimal delay, but also seamlessly translate what you want to say in that language. We can now build towards that very idea. We believe this is another step forward in the journey towards a more connected world, and we eagerly await the innovative ways in which the AI community will build upon this work.
Speaker 1: Well, okay then. Not only is it going to capture the expressive parts of my voice, but apparently it's going to capture the overall tone, of course the cloning of my very voice, and all with a near real-time latency, under 2 seconds of delay. That's definitely enough to be usable in the real world. Like I said, they have a free demo. Let's check it out. Seamless Expressive is an AI model that aims to maintain expressive speech style elements in the translation. So we've also got the pitch of your voice as well as the volume, the tone, whether it be excited, sad, or you're whispering. Obviously speech style, so how fast I'm speaking, and if I'm pausing. They've got a few more examples down here.
Speaker 3: So glad you are here. I am so happy to see you.
Speaker 1: Dang, that is good, man. Wow. That does sound like her voice. The cloning probably isn't as perfect as we'd hope. Two seconds of latency, guys. Wow, that is so good. It's even more mind-blowing to hear it back in English. Okay, I am so excited to try that. All of you multilingual folks are going to have to help me out in the comments by telling me which demos work better and which ones don't sound as good. Really need your help on that.
Speaker 4: Wow, man, it's so good.
Speaker 1: Now whispering. That's so good, dude. This is so usable. I am so excited for the language barriers to just be demolished. Everyone can communicate, and it just makes sense. All right, so I will obviously speak in English. We will translate first into Spanish. Right now they only have these languages in this demo. However, I believe in the actual code, there are a lot more languages to choose from. So we'll translate to Spanish. Again, all you Spanish-speaking folks out here are going to have to help me out in the comments. Wow, we're loading up the camera today. All right, for some reason they insist on having the camera in this as well. So I've got my webcam. The two Matts can look at each other. All right, this is a bit strange, but let's give it a shot. Say, try excited. It wants me to say this specific text, but I'm not going to do that. Subscribe to the MattVidPro AI YouTube channel. Okay, this was the expressive Spanish translation.
Speaker 5: Suscribe to the MadvidPro AI YouTube channel.
Speaker 1: Dude, that is mind-blowing. Did you hear the way it said MattVidPro? Oh, I love it. Subscribe to the MattVidPro AI YouTube channel.
Speaker 5: Suscribe to the MadvidPro AI YouTube channel. Yeah, okay.
Speaker 1: It doesn't exactly sound like my voice, but it's so darn good. And it's definitely very expressive. You can hear it. Even though I don't speak Spanish, I obviously know what it's supposed to be saying, and I can hear the expression in there. Suscribe to the MadvidPro AI YouTube channel. This is the non-expressive translation, by the way, for the context.
Speaker 6: Suscribe to the MadvidPro AI YouTube channel.
Speaker 1: I mean, that is just straight robot. We don't want that, right? I can't get over that. It is so dang cool. Oh, share your Spanish translation with friends and family. That's cool. That's why they have the video here. Subscribe to the MattVidPro AI YouTube channel.
Speaker 5: Suscribe to the MadvidPro AI YouTube channel. All right.
Speaker 1: I see now. I see. That's why the webcam was there. I suggest you guys share some of these clips around when you try it for yourself. I'd love to see how different people react. You can get back to me on my Discord server and post different people's opinions on this. I want to know how good the translations are. So, for now, we're going to stick to English and Spanish. We will move on to the German and French, though. Let's try fast talking instead. Can you please buy me some ice cream? I'm very hungry for ice cream. Please purchase me some ice cream right now or I'm going to be really sad and I'm going to cry all over the floor. I'm really, really wanting ice cream. Please get it. Okay, here is the original clip. Can you please buy me some ice cream? I'm very hungry for ice cream. Please purchase me some ice cream right now or I'm going to be really sad and I'm going to cry all over the floor. I'm really, really wanting ice cream. Please get it. I think this is going to be a little tough.
Speaker 7: Puedes por favor comprarme un poco de helado? Tengo mucha hambre de helado. Por favor, cómprame un poco de helado ahora mismo o voy a estar muy triste. Voy a llorar por todo el suelo. Realmente, realmente quiero helado, por favor.
Speaker 1: How good was that? Let me know, please. Did it screw any of the words up? Does it sound like he's speaking very fast? It certainly sounds like it to me. Do we even want to listen to the non-expressive?
Speaker 6: Puedes por favor comprarme un poco de helado? Tengo mucha hambre de helado. Por favor, cómprame un poco de helado.
Speaker 1: You can even see the non-expressive is 11 seconds, where the original clip was 9, and the expressive translation is 10. Can you please buy me some ice cream? I'm very hungry for ice cream. Please purchase me some ice cream right now or I'm going to be really sad and I'm going to cry all over the floor. I'm really, really wanting ice cream. Please get it. Okay. You did a pretty good job translating. Puedes por favor comprarme un poco de helado? Tengo mucha hambre de helado. Oh, I love stuff like this. Okay, I really got to try this whispering. I'm going to try using some words that I know in Spanish, kind of give me a better idea. Dog, dog, dog, dog, cat, cat, house, cat, house, house, house, house, house, house, house, house, house. Okay, this was pretty quiet. Dog, dog, dog, dog. I don't like whispering, man. Ugh. Perro, perro, perro, perro, gato, gato, casa, casa. Dude. Oh my gosh. The whispering actually works so well. As much as I hate listening to whispering, I have to try that again. Now that I've locked you in my basement, I can force you to endlessly watch MattVidPro AI content until you can't watch it anymore. Look, I wasn't trying to be creepy, guys.
Speaker 8: Ahora que te he encerrado en mi sótano, puedo obligarte a ver sin parar el contenido de MadFitPro AI hasta que ya no puedas verlo.
Speaker 1: Why is it so much more creepy in Spanish? I can't believe how good the whispering works. I didn't think it would blow my mind so much, but it totally does.
Speaker 9: If you want to hear a secret, the MattVidPro AI channel will bless you with five years of good luck if you just hit the like button on this video.
Speaker 1: Okay, I was a little bit closer to the microphone for this one.
Speaker 8: Quieres saber un secreto? El e-channel de MadFitPro AI te bendecirá con cinco años de buena suerte si simplemente presionas el botón gustar en este video.
Speaker 1: Let me know if the translation is good. A little bit more raspiness to that one, but like, to be fair, my original whisper voice was pretty raspy as well.
Speaker 6: Quieres saber un secreto?
Speaker 1: The non-expressive translation is just useless. Get that out of here. All right, I want to try sad, and I'm also going to try some emotions that aren't listed in here. I want to see how well it does with those. Oh my God, you unsubscribed from MattVidPro AI? What's wrong with you? I'm crying right now.
Speaker 10: This is the saddest thing ever. Dios mio, te has desconectado de la ya de MadFitPro. Que te pasa? Estoy llorando ahora mismo. Es lo más triste de todos los tiempos. Wow.
Speaker 1: Oh my gosh. He definitely sounds very sad. He's a little bit more robotic, I think, than the other ones. Te has desconectado de la ya. He does sound like he's on the verge of tears.
Speaker 10: De la ya de MadFitPro. Que te pasa? Estoy llorando ahora mismo. Es lo más triste de todos los tiempos.
Speaker 1: Dios mio. All right, let's try an emotion that's not even listed in their demo. Let's try anger. Are you serious right now? You didn't get me a single lemon for Christmas? You know that's my favorite fruit. What is wrong with you? I'm really trying to push this expressive model to its limits. This one's going to be tough for the model.
Speaker 5: Lo dices en serio ahora. No me has traído ni un limón por Navidad. Sabes que esa es mi fruta favorita.
Speaker 1: Que te pasa? That works better than the sad, I think. You guys that speak Spanish will have to let me know, but to me, at least, that sounds better than the sad one. All right, now, finally, I want to try singing just to see what that's like. Cats and dogs. It's raining cats and dogs.
Speaker 11: Please buy me some sort of a rain hat, because the dogs are falling on my face. Gatos y perros, está lloviendo. Gatos y perros, por favor, cómprame algún tipo de sombrero de lluvia, porque los perros me caen a la cara.
Speaker 1: Yeah, it's kind of like soft spoken like singing is, but not really singing. Better than I thought, though, to be honest. Definitely serviceable. All right, let's move on to French. Hey, I'm speaking French now. Matt Vidpro can actually speak French fully. I've always known how to speak it. I know the lips don't look like they're moving with French, but yeah, I kind of like these little videos that it makes. Hey, I'm speaking French now. Matt Vidpro can actually speak French fully. I've always known how to speak it. I know the lips don't look like they're moving with French, but yeah. Hey, je parle français maintenant. Matt Vidpro peut vraiment parler français entièrement. J'ai toujours su le parler. Je sais que les lèvres n'ont pas l'air de bouger avec le français, mais oui. That sounded like my voice. That voice cloning was nuts. You guys have to remember, it's cloning my voice, doing the expression, and translating the text accurately into another language, all in about two seconds. Just straight magic. Let's do something a little more emotional. I can't believe it. AI has taken over the world.
Speaker 10: Everyone's gone. My friends, my family, AI has eaten them all. I can't believe it. AI has taken over the world. Everyone's gone. My friends, my family, AI has eaten them all. Je n'arrive pas à le croire. L'intelligence artificielle a pris le monde. Tout le monde est parti. Mes amis, ma famille, l'intelligence artificielle les a tous mangés.
Speaker 1: Correct me if I'm wrong, but the French is actually better than the Spanish. That sounds a lot like my voice. It's not perfect, but man, it's close. It's impressively close. Let's try whispering again.
Speaker 9: Hey everybody, I have to whisper to you, but I'm coming back at you with another AI video. Sorry, the AI clones are upstairs trying to rummage through my stuff, so I gotta be quiet.
Speaker 1: We didn't make it to the end.
Speaker 9: Hey everybody, I have to whisper to you, but I'm coming back at you with another AI video. Sorry, the AI clones are upstairs trying to rummage through my stuff, so I gotta be quiet.
Speaker 12: Hey tout le monde, j'ai besoin de vous parler à voix basse, mais je reviendrai à toi avec une autre vidéo d'intelligence artificielle. Désolé, les clones d'intelligence artificielle sont là-haut pour essayer de fouiller dans mes affaires.
Speaker 1: So when it gets a little bit too long, it just cuts my voice off, or rather cuts the video and just continues the voice. I'm still impressed though. Dang it, it's really still very good with French. It might actually be better at French than Spanish. You guys, again, will have to let me know. Let's try German. Hey everybody, welcome back to another MattVidPro AI video. I'm actually German, I've been German this whole time, and I've been speaking German this whole time. Hey everybody, welcome back to another MattVidPro AI video. I'm actually German, I've been German this whole time, and I've been speaking German this whole time. Hey alle, willkommen wieder in einem weiteren Madvid Pro Kai video. Ich bin eigentlich Deutscher. Ich bin die ganze Zeit Deutscher und spreche die ganze Zeit Deutsch. That sounds like me. That definitely sounds like me. I mean, the quality obviously isn't great in terms of audio, but it absolutely sounds like me speaking German. Oh man, this is so dang exciting. Let's try anger in German. What? Are you kidding me? You don't like my festive decorations for Christmas? Well, you're just a disgrace to the MattVidPro channel. We believe in festivities. By the way, no hate to you if you actually don't like my Christmas decorations. I just like to get festive for the holidays. What? Are you kidding me? You don't like my festive decorations for Christmas? Well, you're just a disgrace to the MattVidPro channel. We believe in festivities.
Speaker 8: Was? Machst du Witze? Du magst meine Weihnachtsdekorationen nicht. Nun, du bist nur eine Schande für den Madvid Pro Kanal. Wir glauben an Feste.
Speaker 1: Okay, I think that German anger one was pushing the model to its limits. It was good and absolutely usable, don't get me wrong, but a little bit more on the robotic side, like you could tell there's been some processing there. Oh please, I didn't mean it. Don't unsubscribe. You can hate my decorations. Just don't unsubscribe to the MattVidPro channel. Oh please, I didn't mean it. Don't unsubscribe. You can hate my decorations. Just don't unsubscribe to the MattVidPro channel.
Speaker 10: No please, I didn't mean it. Don't unsubscribe. You can hate my decorations. Just don't unsubscribe to the MattVidPro channel.
Speaker 1: Had an actual fail in the translation. It just did English to English. So this is clearly a little bit of a problem here that they have to fix. Seems to work most of the time. This doesn't look like it would be occurring that often. But yeah, it kind of just put like a slight German accent on my voice. All right, now I want to try Spanish to English. I don't speak Spanish, but I'm going to try to read some Spanish words and see how they translate in English.
Speaker 4: Pero, pero, pero, gato, gato, gato, casa, casa, casa.
Speaker 1: Yeah, that's right. Those are the three Spanish words I know. Don't shame me for it. But, but, but, cat, cat, cat, house, house, house.
Speaker 4: Okay, well, I thought I said dog, but apparently I said but.
Speaker 1: But, but, but, cat, cat, cat, house, house, house. It definitely sounds like my voice. Let's do some phonetic ChatGPT translation and see how close I can get. Hola a todos. Soy Adik Toala Tenolohia Aguinal. I screwed that up.
Speaker 8: Hi a todos.
Speaker 1: I'm Adik Toala Tenolohia Aguinal. I don't think I'm speaking good Spanish there. Sorry guys. Let's just try to actually read the Spanish. Hola a todos. Soy Adik Toala Tenolohia Aguinal. I was supposed to say Aguinal. Hi everyone. I'm Adik Toala Tenolohia Aguinal. Here we go. Okay, I just have to read the Spanish. That worked a lot better. Dios mío, este tipo no puede hablar español para salvar su vida. My God, this guy can't speak Spanish to save his life. Yes, dude. Yes. Oh my gosh. I don't know why that gets me so excited. That sounds like me. That sounds like me. This gives us a better idea of the translation because we're translating to English.
Speaker 3: My God, this guy can't speak Spanish to save his life.
Speaker 1: What a good translation too. All right. I want to try German. Apologies in advance. Ja bist du ja enlich, ich bin so froh dich zu sehen. By the way, what I was reading there was what was on the Meta demo.
Speaker 3: There you are, Enlich. I'm so glad to see you.
Speaker 1: All right. We got this. Man, versuchst dieser Kirl jetzt durch zu sprechen, er hat nicht all das.
Speaker 4: Okay, I couldn't even get to the end.
Speaker 1: This guy's jumping through hoops. He doesn't have all. I was actually getting pretty good. Apparently, my German speech is good enough for this thing to pick it up. I think that my sentence was a little too long. That definitely sounded like me though. I like trying English sayings in other languages. Dieser Kirl ist der Hit.
Speaker 3: This guy's hot.
Speaker 1: Whoa. Okay. So, the saying that was the bee's knees, it just says this guy's hot. Yeah, screw it. Let's try some French too. Mon dieu, c'est technologie va changer notre monde. Mon dieu, it's technology that's changing our world. I'm really impressed by its ability to pick up my horrible different languages. I don't speak these languages at all. It's probably the thickest American accent you've ever heard. La chaîne des Met Vid Pro, me endormir.
Speaker 3: The Met Vid Pro channel, my fate, my sleeper.
Speaker 1: I don't know. Getting close to this one. So that's the demo that they have. Of course, they do have these other models too, but those aren't demoed on the site. If you want to use the other models, you'll have to download them on GitHub. I mean, talk about a mind blow, guys. This one truly blew me away. It's one of those rare times that I have, well, lately not so rare, but it's one of those AI products that just truly blasts me away. That was so fun. I really recommend you try this yourself. Again, give me feedback. If you speak any of those languages, share your best results in the Discord server. Thanks so much for watching and I'll see you in the next one. Goodbye.