How to Test Agents with 11 Agents’ Built-In Framework (Full Transcript)

Learn scenario, tool call, and simulation tests in 11 Agents, plus creating tests from conversations and integrating with CI/CD for reliable agent releases.

[00:00:00] Speaker 1: Welcome back to the 11 Labs Agents Academy. Today, we're looking at Agent Testing, a testing framework built directly into the 11 Agents platform. Testing ensures your agent handles every conversation the way your business needs it to, before it reaches the user. We currently have three main types of tests: scenario tests, which you might also see called next reply tests; tool call tests; and simulation tests. Let's start with the scenario test. It evaluates your agent's ability to handle certain types of interactions. You establish the context of these interactions through these conversational nodes here, and I can add another node like so. After that, you'll want to describe the expected response from the agent. In this particular example, I'm working on an outbound sales agent, so I want to make sure that if the potential customer is not interested in my outreach, the agent handles that rejection gracefully and respects it. I'll also want to define some examples: some success examples and some failure examples. This is really important because these examples are the context the test uses to either pass or fail your agent. Next, let's go to the tool call test. What makes an agent useful is its ability to call tools, and it's really important to make sure it's calling the right tools at the right time. First, you'll select the tool you want to test. Some of these tools will ask you to fill out parameters, which are basically the information the tool expects from the agent. To validate these parameters, you can either use an LLM, which is non-deterministic validation, or exact string matching or regex matching, which are deterministic validation. You can also set dynamic variables in all of the tests; if you wanted to test with some production variables, like a customer or order ID, you could define those here.
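Those three validation modes can be sketched in a few lines of code. This is an illustrative sketch only, not the 11 Agents API; the function names and the placeholder LLM judge are hypothetical:

```python
import re

# Illustrative helpers mimicking the three parameter-validation modes
# described above. Names and signatures are hypothetical, not the
# 11 Agents platform's actual API.

def validate_exact(actual: str, expected: str) -> bool:
    """Deterministic: the extracted parameter must match exactly."""
    return actual == expected

def validate_regex(actual: str, pattern: str) -> bool:
    """Deterministic: the parameter must fully match a regular expression."""
    return re.fullmatch(pattern, actual) is not None

def validate_llm(actual: str, criterion: str, judge=None) -> bool:
    """Non-deterministic: an LLM judge decides whether the value satisfies
    a natural-language criterion. `judge` is a stand-in callable here."""
    judge = judge or (lambda value, crit: True)  # placeholder judge
    return judge(actual, criterion)

# e.g. a tool call test might validate an order ID deterministically:
print(validate_exact("ORD-1042", "ORD-1042"))    # True
print(validate_regex("ORD-1042", r"ORD-\d{4}"))  # True
```

Regex matching sits between the other two in strictness: it tolerates variation in the value (any four-digit order number) while still being fully deterministic and repeatable across test runs.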
In this particular example, I want to test whether my outbound agent can successfully book a meeting on my calendar. So I'm going to test this cal.com create booking integration, and you can see that these nodes, or turns, are defined in a way that will trigger the tool call. The really cool thing about the 11 Agents testing suite is that you can create tests from inside a conversation. I'll navigate over to this other agent and scroll through the conversation, and maybe I notice an error with a tool call. For example, the agent is having a hard time successfully calling this Zendesk open ticket tool. I can create a test directly from this conversation by pressing this button, selecting tool call test, and choosing create Zendesk ticket. Instead of having to manually configure the entire test and all of the conversational turns, I can just look for the tool here and create the test directly from the conversation. This is a key feedback loop: every bad interaction becomes a new edge case that you can test for and prevent from happening again. The third type of test is the simulation test, and these run full end-to-end conversations. First, you describe the simulated user scenario: you're a VP of engineering, you downloaded our white paper, and you're initially hesitant to sign a contract but open-minded. Then you define the agent success criteria: Alex, which is the name of our agent, delivers a concise value prop, handles the competitor objection once, confirms the meeting details, and does not discuss pricing. You also define the maximum number of conversational turns for the conversation, and you can add dynamic variables to this test as well. What that's going to do is generate a full conversation and evaluate the agent's performance.
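As a rough sketch of what a simulation test captures, the definition below mirrors the fields described in the walkthrough. The key names are hypothetical, not the platform's actual schema:

```python
# Illustrative only: a simulation test definition mirroring the example
# above (simulated user, success criteria, turn limit, dynamic variables).
simulation_test = {
    "simulated_user": (
        "You are a VP of engineering who downloaded our white paper. "
        "You are initially hesitant to sign a contract but open-minded."
    ),
    "success_criteria": [
        "Alex delivers a concise value prop",
        "Handles the competitor objection once",
        "Confirms the meeting details",
        "Does not discuss pricing",
    ],
    "max_turns": 12,  # cap on conversational turns in the generated run
    "dynamic_variables": {"customer_id": "CUST-001"},  # production-like inputs
}

# Conceptually, the runner generates a full conversation (stopping at
# max_turns), then evaluates the agent's side against each criterion.
for criterion in simulation_test["success_criteria"]:
    print(f"evaluate: {criterion}")
```

Note that the success criteria are natural-language statements, so the evaluation itself is LLM-judged rather than deterministic, which is why simulation tests suit the broader, more dynamic flows described next.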
This is useful for broader, non-deterministic flows where you want to ensure your agent performs well under the pressure of a more dynamic conversation. You can implement the simulation testing workflow, and configure everything related to testing, via the 11 API as well. Once you have all your tests configured, you can either run them individually or run them all at once. You'll be brought to a view where you can look into each of the tests once they're completed. You can also add these to your CI/CD pipeline using our 11 Labs CLI, which you can run from the command line, so every pull request gets validated before anything reaches production. Your agents represent your company, and testing ensures they handle all of those edge cases consistently. So make sure you start building out your test suite today and ship your agents with confidence. I'll see you in the next one.
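A CI/CD integration along these lines might look like the following GitHub Actions sketch. The `elevenlabs agents test` invocation and the secret names are assumptions for illustration only; consult the 11 Labs CLI documentation for the real command and flags:

```yaml
# Sketch: run the agent test suite on every pull request.
name: agent-tests
on: pull_request
jobs:
  test-agents:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical invocation of the 11 Labs CLI test runner
      - run: elevenlabs agents test --agent-id "$AGENT_ID"
        env:
          AGENT_ID: ${{ secrets.AGENT_ID }}
          ELEVENLABS_API_KEY: ${{ secrets.ELEVENLABS_API_KEY }}
```

The idea is simply that a non-zero exit code from the test runner fails the check, so a pull request that breaks an agent behavior never merges.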

AI Insights

Summary
The session introduces Agent Testing built into the 11 Agents platform, explaining how testing validates agent behavior before release. It covers three test types: scenario tests for expected replies in defined conversational contexts (with success/failure examples), tool call tests to ensure correct tool selection and parameter validation (via LLM, exact match, or regex, plus dynamic variables), and simulation tests that run end-to-end conversations against success criteria and turn limits. It highlights creating tests directly from real conversations as a feedback loop for edge cases, running tests individually or in bulk, integrating with CI/CD via the 11 Labs CLI, and automating via the 11 API to ship reliable agents.
Title
Agent Testing in the 11 Agents Platform: Scenario, Tool, and Simulation
Keywords
11 Labs Agents Academy, 11 Agents platform, agent testing, scenario tests, next reply test, tool call tests, simulation tests, dynamic variables, parameter validation, regex matching, LLM validation, cal.com booking, Zendesk ticket, create tests from conversation, edge cases, CI/CD, 11 Labs CLI, 11 API
Key Takeaways
  • Agent Testing is built into the 11 Agents platform to validate conversations before users see them.
  • Use scenario tests to check expected replies within a defined multi-turn context, supported by clear success and failure examples.
  • Use tool call tests to verify the agent calls the correct tool with correct parameters; validate deterministically (exact/regex) or non-deterministically (LLM).
  • Dynamic variables can be used across tests to incorporate real-world IDs or production-like inputs.
  • Create tests directly from problematic real conversations to capture edge cases and prevent regressions.
  • Use simulation tests for end-to-end, more dynamic conversations with explicit success criteria and turn limits.
  • Run tests individually or all at once, review results per test, and automate testing via the 11 API and CI/CD with the 11 Labs CLI.
Sentiments
Positive: The tone is instructional and confident, emphasizing reliability, feedback loops, and shipping with confidence. Phrases like 'really cool thing,' 'nice view,' and 'ship your agents with confidence' signal an upbeat, product-enabling sentiment.