Audio AI: Real-Time Multilingual Transcription (Full Transcript)

An updated model transcribes speech that switches between languages (English, Spanish, and more) in streaming mode with no perceptible delay, which makes it a good fit for dictation and voice apps.

[00:00:00] Speaker 1: I am traveling to Arizona and I want to comprar unos tickets de avión. And when I get there, I want to alquilar un auto. So let me show you this because I'm really, really excited about this. So we're making a ton of progress on AI. OK, so that's obvious. But one area that I think people will benefit the most is audio applications. OK, so audio is everywhere from transcription, your phone, you can speak to your phone and dictate to your phone, et cetera, et cetera. AssemblyAI just updated their universal model and they can now support, listen to this, you can code-switch from six different languages and the model can transcribe that conversation that you're speaking in six different languages in real time in a single forward pass. There is no latency, there are no delays, there is no thinking necessary, the model supports all of these six different languages. I think there is Spanish, English, Italian, French and German and Portuguese in real time.
This is amazing. OK, especially if you go to South Florida. OK, so if you come down here, you will realize very, very quickly that people have a certain way of speaking. Like I meet new people, friends in my household, my family, we sometimes speak mixing two different languages. Sometimes are words that we know how to say in one language and not the other, or sometimes are just the full chunks of a sentence, OK, where you start in one language and then you switch to the other language and then come back. And that's the flow of the conversation. It's a very unique way of speaking, but that's how we speak. Something that happens to me all the time that I have different friends, friends that speak English, friends that speak Spanish, friends that speak both of them. And it's faster for me to just say the words the way they show up in my brain. So when I'm trying to express an idea, if it comes in my brain in English, I don't have to translate, I just say it. And if it's in Spanish, I just say it.
If I'm dictating to my phone because I want to send a text message, something that happens is that I have to go to the keyboard. I have to switch the language so the model that's running here on my phone understands what I'm trying to say. So I have to tell the phone I'm going to be speaking English and then I have to stick to that language to speak English. And probably this is not a problem for 90% of you. It is for me. But if I try to say something in Spanish, it's going to be gibberish here because the phone is trying to translate what I'm saying in Spanish to English. It's not going to transcribe. That transcription is not going to work. This model here does not have that problem. I want to show you an example of it so you can see how it can transcribe in real time. And that is super, super cool. No delays. I'm going to run this up. This is the playground. I'm going to leave a link to this playground so you can try it for yourself. Obviously, you can use this model through an API. That's the beauty of it. Like if you're building an audio application, please try this out. It's super, super good. So I'm going to go to streaming and here in streaming, I'm going to select multi. And this is what's going to allow the code-switching from one language to another. And I want to start speaking to my computer. I want you to see the speed at which the model is going to transcribe what I'm going to say. Now, I'm going to speak in Spanish and in English. And let's see how the model does. Okay. All right. So let's go. Let's do that. I am traveling to Arizona and I want to comprar unos tickets de avión. And when I get there, I want to alquilar un auto. That was literally real time. "I am traveling to Arizona and I want to comprar unos tickets de avión": buy plane tickets, that's in Spanish. "And when I get there": I went back to English. "I want to alquilar": rent a car. No delays. Real-time code-switching between two languages. I want this model on my phone.
What do I have to do to get this model to send messages with my phone? This is super cool. I hope you find this helpful. Take a look at the Playground and give this a try. Go to their API. Give this a try. It's super, super good. I hope this is helpful. I'll see you in the next one. Bye-bye.
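
To make the demo sentence concrete, here is a toy Python sketch that marks where the speaker switches language, mirroring the way the speaker annotates the sentence out loud. It is only an illustration of what code-switching looks like in text. It does not call AssemblyAI's API and says nothing about how their model works. The word list `SPANISH_WORDS` and the helpers `tag_language` and `language_spans` are hypothetical names made up for this example.

```python
# Toy illustration only: a real multilingual speech model does not use word lists.
# SPANISH_WORDS is a hypothetical, hand-written lexicon that covers just the demo
# sentence. "tickets" is included because the speaker uses it inside the Spanish phrase.
SPANISH_WORDS = {"comprar", "unos", "tickets", "de", "avión", "alquilar", "un", "auto"}


def tag_language(word: str) -> str:
    """Return 'es' if the word is in the toy Spanish lexicon, otherwise 'en'."""
    cleaned = word.strip(".,!?").lower()
    return "es" if cleaned in SPANISH_WORDS else "en"


def language_spans(sentence: str) -> list[tuple[str, str]]:
    """Group consecutive words that share a language tag into (lang, text) spans."""
    spans: list[tuple[str, list[str]]] = []
    for word in sentence.split():
        lang = tag_language(word)
        if spans and spans[-1][0] == lang:
            # Same language as the previous word: extend the current span.
            spans[-1][1].append(word)
        else:
            # Language changed (or first word): start a new span.
            spans.append((lang, [word]))
    return [(lang, " ".join(words)) for lang, words in spans]


demo = (
    "I am traveling to Arizona and I want to comprar unos tickets de avión. "
    "And when I get there, I want to alquilar un auto."
)

for lang, text in language_spans(demo):
    print(f"[{lang}] {text}")
```

Each printed line is one stretch of speech in a single language, so the number of lines minus one is the number of switches in the sentence.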

AI Insights
Summary
The speaker is enthusiastic about advances in AI for audio and highlights an update to AssemblyAI's universal model that transcribes conversations with code-switching between several languages (for example, English and Spanish) in real time, with no perceptible latency. The speaker explains that in places like South Florida it is common to mix languages within a single sentence, and that current phone dictation usually requires switching the language manually, which leads to incorrect transcriptions when the spoken language does not match the selected one. The speaker demonstrates in the Playground how the model instantly transcribes a sentence that mixes English and Spanish, and invites viewers to try it and to use the API to build audio applications.
Title
Real-time transcription with language switching using AI
Keywords
AI
audio
transcription
code-switching
multilingual
AssemblyAI
universal model
streaming
API
Playground
dictation
real time
low latency
English
Spanish
voice applications
Key Takeaways
  • Multilingual transcription models that support code-switching remove common friction points of traditional dictation on mobile phones.
  • The AssemblyAI update transcribes several languages in a single pass and in real time, with no perceptible delay.
  • Switching languages within a single sentence is common in bilingual communities and needs native support in voice tools.
  • The Playground and the API make it easy to try the multilingual streaming capability and integrate it into audio products.
  • The user experience improves when there is no need to switch the language manually on the keyboard or in dictation.
Sentiments
Positive: Enthusiastic, optimistic tone: the speaker stresses how "amazing" the model is, celebrates the absence of latency, says they want it on their phone, and invites others to try it.