[00:00:00] Speaker 1: Welcome back to the ElevenLabs Agents Academy. Today we're looking at Expressive Mode, a feature designed to blur the line between AI and human interaction. It lets your agents move beyond simple predefined narration into true context-aware customer interactions. Expressive Mode is powered by two major upgrades to our conversational stack: first, V3 Conversational, our most emotionally intelligent, context-aware text-to-speech model to date; and second, an upgraded turn-taking system that uses real-time signals to decide exactly when to speak, pause, or wait, which makes conversations feel fluid and human.

Let's jump into the app and I'll show you how to turn it on. I'm going to create a blank agent and name it V3 Agent Academy. To enable Expressive Mode, I'll click into the voice menu and change the model to V3 Conversational. When I do that, Expressive Mode is turned on by default.

What makes V3 special is its context awareness: it doesn't just read the text, it understands the intent. In previous models (I'm clicking over to the V2.5 agent here), you had sliders to control conversational style. In V3, and you'll see I'm in V3 here, we've shifted that control to the model itself. The model now dynamically adapts its speed and stability based on the emotion needed in the moment. You can also use the suggested audio tags here, or if you don't see one you like, add your own: just add the audio tag and describe when the agent should use it.

Now, while you can define these audio tags in the system prompt, we recommend using the UI instead. If you define them in the system prompt and later switch back to a previous model, the agent effectively breaks: it will simply read the tags out loud as literal text.
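Since older models read audio tags out loud as literal text, one practical safeguard is to strip bracketed tags from any prompt or response text before it reaches a pre-V3 model. This is a minimal sketch of that idea; the tag list and helper name are illustrative, not part of the ElevenLabs SDK:

```python
import re

# Illustrative subset of audio tags that the V3 conversational model
# interprets as delivery cues rather than literal text.
AUDIO_TAGS = {"excited", "whispers", "sighs", "laughs"}

def strip_audio_tags(text: str) -> str:
    """Remove [tag] markers so a pre-V3 model doesn't read them aloud."""
    def replace(match: re.Match) -> str:
        # Only strip known audio tags; leave other bracketed text intact.
        return "" if match.group(1).lower() in AUDIO_TAGS else match.group(0)
    cleaned = re.sub(r"\[(\w+)\]", replace, text)
    # Collapse the double spaces left behind by removed tags.
    return re.sub(r"\s{2,}", " ", cleaned).strip()

reply = "[excited] We found a seat on the 2 PM flight! [sighs] It was close."
print(strip_audio_tags(reply))
# -> We found a seat on the 2 PM flight! It was close.
```

If you do keep tags in the system prompt for fine-grained control, a filter like this can run as a fallback whenever the configured model is not V3.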
So while you can define tags in the system prompt if you want fine-grained control over your agent, just make sure you remove them from the prompt if you ever switch back to a previous model.

Now, because the V3 models do a much better job of capturing emotion, they've also enabled a major upgrade to our turn-taking system. You'll see here in the advanced settings that speculative turn is on by default. In the past, agents waited for absolute silence before responding, which adds latency and feels robotic. With speculative turn, we use our Scribe V2 real-time model to analyze your prosody and semantics and predict when you're done talking based on the context of the conversation. The model then prefetches the response, making the conversation feel snappy even with the heavier, more expressive model.

Now let's jump into a live demo. You'll see I have my V2.5 agent on the left and my V3 Conversational agent with Expressive Mode turned on on the right. These agents have exactly the same configuration and the same system prompt; the only difference is the voice.
[00:02:54] Speaker 2: Oh no, I am so sorry to hear your flight was canceled. Let me see what I can do. I can help you rebook for tomorrow morning.
[00:03:02] Speaker 1: Tomorrow morning? That's too late. I'm gonna miss my sister's wedding.
[00:03:06] Speaker 2: I completely understand how stressful that is. Let me check partner airlines immediately to see if we can get you there today. We'll do our best to make sure you don't miss that wedding.
[00:03:16] Speaker 1: So that's okay, but let's try the V3.
[00:03:22] Speaker 3: Oh no, I am so sorry to hear your flight was canceled. Let me see what I can do. I can help you rebook for tomorrow morning.
[00:03:30] Speaker 1: Tomorrow morning? That's too late. I'm gonna miss my sister's wedding.
[00:03:33] Speaker 3: I completely understand how stressful that is. Let me check partner airlines immediately to see if we can get you there today. We'll do our best to make sure you don't miss that wedding.
[00:03:43] Speaker 1: So you can hear the difference. The agent isn't just processing a request; it's matching the user's emotion and responding with genuine empathy. You can imagine how this elevates the entire customer experience. It also works across more than 70 languages, providing nuanced delivery in everything from Japanese to Argentinian Spanish. Whether you're building for customer support, gaming, or healthcare, Expressive Mode ensures your customers feel helped, not just processed. Give it a try today, experiment with the audio tags, and see how it transforms your user experience.