How ElevenLabs Expressive Mode Makes Agents Sound Human (Full Transcript)

Expressive Mode pairs V3 Conversational TTS with speculative turn-taking to deliver context-aware empathy, faster replies, audio tags, and 70+ languages.

[00:00:00] Speaker 1: Welcome back to the ElevenLabs Agents Academy. Today, we're looking at Expressive Mode, a feature designed to blur the line between AI and human interaction. This feature allows your agents to move beyond simple predefined narration and into true context-aware customer interactions. Expressive Mode is powered by two massive upgrades to our conversational stack. First, V3 Conversational, our most emotionally intelligent, context-aware text-to-speech model to date, and second, an upgraded turn-taking system that uses real-time signals to decide exactly when to speak, pause, or wait, which makes conversations feel fluid and human. Let's jump into the app now and I'll show you how to turn it on. I'm going to create a blank agent and name it V3 Agent Academy. To turn on Expressive Mode, I'll click into the voice menu here and change the model to V3 Conversational. When I do that, Expressive Mode is turned on by default. What makes V3 special is its context awareness: it doesn't just read the text, it understands the intent. In previous models, so I'm going to click over to the V2.5 agent over here, you had sliders to control conversational styles. In V3, and you'll see I'm in V3 here, we've shifted that control to the model itself. The model now dynamically adapts its speed and stability based on the emotion needed at that moment. You can also use the suggested audio tags here, or if you don't see one you like, you can add your own: just add the audio tag and explain when the agent should use it. Now, while you can define these audio tags in the system prompt, we recommend that you use the UI, simply because if you define them in the system prompt and then change the model back to a previous version, it can break the agent, and the agent will just read the tag out loud as literal text.
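[Editor's note: the pitfall above, where a pre-V3 model reads bracketed audio tags out loud as literal text, can be avoided with a small cleanup pass over the prompt before downgrading. This is an illustrative sketch, not an ElevenLabs utility; the tag names shown are hypothetical examples of the bracketed style described in the video.]

```python
import re

# Bracketed audio tags of the kind V3 understands (e.g. [whispers],
# [excited]); older models would speak these as literal text.
AUDIO_TAG_PATTERN = re.compile(r"\[[a-z][a-z ]*\]")

def strip_audio_tags(prompt: str) -> str:
    """Remove bracketed audio tags from a system prompt before
    switching an agent back to a pre-V3 model."""
    cleaned = AUDIO_TAG_PATTERN.sub("", prompt)
    # Collapse double spaces left behind where a tag was removed.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Running the stripped prompt through an older model then yields plain narration instead of spoken-out tags.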
So while you can define them in the system prompt if you want complete fine-grained control over your agent, if you do switch back to a previous model, just make sure that you remove them from your system prompt. Now, because the V3 models do a much better job at capturing emotion, they've enabled a major upgrade to our turn-taking system. You'll see here in the advanced settings that speculative turn is on by default. In the past, agents waited for absolute silence to respond, which adds latency and feels robotic. Now, with speculative turn, we use our Scribe V2 real-time model to analyze your prosody and semantics in order to speculate when you're done talking based on the context of the conversation. The model then prefetches the response, making the conversation feel snappy even with the heavier, more expressive model. Now let's jump into a live demo. You'll see here that I have my V2.5 agent on the left and my V3 Conversational agent with Expressive Mode turned on on the right. These agents have the exact same configuration and the exact same system prompt. The only difference we'll hear is in the voices.
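[Editor's note: the speculative turn-taking idea above, predicting end-of-turn from semantics rather than waiting out a long fixed silence, can be sketched as a simple decision rule. The signal names and thresholds here are illustrative assumptions, not ElevenLabs or Scribe V2 internals.]

```python
from dataclasses import dataclass

@dataclass
class TurnSignal:
    """Simplified per-frame signals; a real system derives these from
    live audio (prosody) and the transcript so far (semantics)."""
    silence_ms: int            # trailing silence observed so far
    semantic_complete: float   # 0..1 estimate that the utterance is finished

def should_speculate(signal: TurnSignal) -> bool:
    """Decide whether to speculate end-of-turn and start prefetching a
    response. A semantically complete utterance triggers a reply after a
    very short pause; a mid-sentence pause keeps the agent waiting."""
    if signal.semantic_complete >= 0.9:
        return signal.silence_ms >= 150   # confident: respond almost immediately
    if signal.semantic_complete >= 0.5:
        return signal.silence_ms >= 400   # uncertain: wait a little longer
    return signal.silence_ms >= 900       # mid-thought: fall back to a long silence gate
```

The payoff is that a fixed-silence system would wait the full 900 ms in every case, while this rule lets a clearly finished sentence get a reply after 150 ms, which is where the "snappy even with the heavier model" feel comes from.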

[00:02:54] Speaker 2: Oh no, I am so sorry to hear your flight was canceled. Let me see what I can do. I can help you rebook for tomorrow morning.

[00:03:02] Speaker 1: Tomorrow morning? That's too late. I'm gonna miss my sister's wedding.

[00:03:06] Speaker 2: I completely understand how stressful that is. Let me check partner airlines immediately to see if we can get you there today. We'll do our best to make sure you don't miss that wedding.

[00:03:16] Speaker 1: So that's okay. But let's try the V3.

[00:03:22] Speaker 3: Oh no, I am so sorry to hear your flight was canceled. Let me see what I can do. I can help you rebook for tomorrow morning.

[00:03:30] Speaker 1: Tomorrow morning? That's too late. I'm gonna miss my sister's wedding.

[00:03:33] Speaker 3: I completely understand how stressful that is. Let me check partner airlines immediately to see if we can get you there today. We'll do our best to make sure you don't miss that wedding.

[00:03:43] Speaker 1: So you can hear the difference. The agent isn't just processing a request, it's matching the user's emotion and responding with genuine empathy. You can imagine how this elevates the entire customer experience. This also works across over 70 languages, providing nuanced delivery in everything from Japanese to Argentinian Spanish. Whether you're building for customer support, gaming, or healthcare, Expressive Mode ensures your customers feel helped, not just processed. Give it a try today, experiment with the audio tags, and see how it transforms your user experience.

AI Insights
Summary
The transcript introduces ElevenLabs Agents Academy’s Expressive Mode, enabled by selecting the V3 Conversational text-to-speech model. Expressive Mode aims to make AI agents feel more human through context-aware, emotionally intelligent speech and improved turn-taking. V3 removes manual style sliders (used in V2.5) and instead dynamically adjusts speed and stability based on the emotional context, with optional audio tags that are best configured in the UI to avoid issues when switching models. A major companion upgrade is speculative turn-taking, which uses the Scribe V2 real-time model to analyze prosody and semantics to predict when a user is finished speaking, prefetch responses, and reduce latency. A live demo comparing V2.5 and V3 shows V3 delivering more empathetic, natural responses. Expressive Mode supports 70+ languages and is positioned for use in customer support, gaming, and healthcare.
Title
ElevenLabs Expressive Mode: V3 Conversational + Smarter Turn-Taking
Keywords
ElevenLabs, Agents Academy, Expressive Mode, V3 Conversational, text-to-speech, TTS, context-aware speech, emotional intelligence, audio tags, speculative turn, turn-taking, Scribe V2, prosody, semantics, latency reduction, customer support, multilingual, 70+ languages
Key Takeaways
  • Expressive Mode is enabled by switching the agent voice model to V3 Conversational and is on by default.
  • V3 is context-aware: it infers intent and emotion and dynamically adjusts speaking characteristics without manual style sliders.
  • Audio tags can guide delivery; configuring them in the UI is recommended to prevent issues when reverting to older models.
  • Speculative turn-taking reduces perceived latency by predicting end-of-user speech using real-time prosody and semantic signals via Scribe V2.
  • A/B demos show V3 sounding more empathetic and human, improving customer experience in support, gaming, and healthcare.
  • The feature supports nuanced delivery in 70+ languages, including Japanese and Argentinian Spanish.
Sentiments
Positive: The tone is upbeat and promotional, emphasizing improvements in empathy, naturalness, responsiveness, and broad multilingual support, with an encouraging call to try the feature.