How to Test Agents in ElevenLabs Agents Before Production (Full Transcript)

Learn scenario, tool call, and simulation tests in ElevenLabs Agents, plus workflow-level validation and CI/CD automation via the API and CLI.

[00:00:00] Speaker 1: Welcome back to the ElevenLabs Agents Academy. Today we're looking at Agent Testing, a testing framework built directly into the ElevenLabs Agents platform. Testing ensures your agent handles every conversation the way your business needs it to before it reaches the user. We currently have three main types of tests: scenario tests, which you might also see called next-reply tests, tool call tests, and simulation tests. It's also worth noting that you can create folders to organize all of your tests.

Let's start with the scenario test. It evaluates your agent's ability to handle certain types of interactions. You establish the context of these interactions through these conversational nodes here, and I can add another node like so. After you do that, you'll want to describe the expected response from the agent. In this particular example, I'm working on an outbound sales agent, so I want to make sure that if the potential customer is not interested in my outbound call, the agent handles that rejection gracefully and respects it. I'm also going to define some examples: some success examples and some failure examples. This is really important, because these examples are the context the test uses to either pass or fail your agent's response.

Next, let's go to the tool call test. What makes an agent useful is its ability to call tools, so it's really important that we make sure it's calling the right tools at the right time. First, you'll want to select the tool you want to test. Some of these tools will ask you to fill out the parameters, which is basically the information the tool expects from the agent. To validate these parameters, you can either use an LLM, which is non-deterministic, or use exact string matching or regex matching, which are deterministic. You can also set dynamic variables in all of the tests.
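The deterministic validation modes described above (exact string and regex matching) are easy to reason about in code. Here's a minimal sketch of that idea; the `tool_call`/`parameters` field names are illustrative stand-ins, not the platform's actual schema, and LLM-based validation would replace the rule check with a judge-model call:

```python
import re

def validate_tool_call(tool_call: dict, expectations: dict) -> list[str]:
    """Check a captured tool call against per-parameter rules.

    Each rule is ("exact", value) or ("regex", pattern), mirroring the
    deterministic validation modes in the tool call test. Returns a list
    of failure messages; an empty list means the call passed.
    """
    failures = []
    for param, (mode, expected) in expectations.items():
        actual = tool_call.get("parameters", {}).get(param)
        if actual is None:
            failures.append(f"missing parameter: {param}")
        elif mode == "exact" and actual != expected:
            failures.append(f"{param}: expected {expected!r}, got {actual!r}")
        elif mode == "regex" and not re.fullmatch(expected, str(actual)):
            failures.append(f"{param}: {actual!r} does not match /{expected}/")
    return failures

# Example: validate a hypothetical create-booking call.
call = {"tool": "create_booking",
        "parameters": {"email": "vp@example.com", "duration": "30"}}
rules = {"email": ("regex", r"[^@\s]+@[^@\s]+\.[a-z]+"),
         "duration": ("exact", "30")}
print(validate_tool_call(call, rules))  # → []
```

The useful design point is that deterministic rules give you stable pass/fail signals for things like IDs and email formats, while the LLM option is reserved for parameters whose correctness is a judgment call.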
Maybe you want to test with some production variables, like a customer or order ID; you could define those here. In this particular example, I want to test whether my outbound agent can successfully book a meeting on my calendar, so I'm going to test this Cal.com create-booking integration, and you can see that these nodes, or turns, are defined in a way that will trigger this tool call.

A really cool thing about the ElevenLabs Agents testing suite is creating tests from inside a conversation. I'll navigate over to this other agent and scroll through the conversation. Maybe I notice an error with a tool call; for example, the agent is having a hard time successfully calling this Zendesk open-ticket tool. I can create a test directly from this conversation by pressing this button here and selecting the tool call test, create Zendesk ticket. Instead of having to configure the entire test and all of the conversational turns manually, I can just pick the tool that was called here and create the test directly from the conversation. This is a key feedback loop: every bad interaction becomes a new edge case that you can test for and prevent from happening again.

The third type of test is the simulation test. These run full end-to-end conversations. First you describe the simulated user scenario: you're a VP of engineering, you downloaded our white paper, you're initially hesitant to sign a contract, but you're open-minded. Then you define the agent's success criteria: Alex, which is the name of our agent, delivers a concise value prop, handles the competitor objection once, confirms the meeting details, and does not discuss pricing. You define the maximum number of conversational turns for the conversation, and likewise you can add dynamic variables to any test as well.
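Mechanically, a simulation test is a loop between two LLM-backed roles with a turn cap. A toy sketch of that loop, with plain functions standing in for the model-driven agent and simulated user (all names here are illustrative, not the platform's API):

```python
def run_simulation(agent_reply, simulated_user, opening: str,
                   max_turns: int) -> list[tuple[str, str]]:
    """Drive an end-to-end conversation between a simulated user and an
    agent, stopping at the configured turn limit or when the simulated
    user ends the call (returns None). Both callables are stand-ins for
    LLM-backed roles."""
    transcript = []
    user_msg = opening
    for _ in range(max_turns):
        reply = agent_reply(user_msg)
        transcript.append((user_msg, reply))
        user_msg = simulated_user(reply)
        if user_msg is None:  # simulated user hangs up
            break
    return transcript

# Toy roles: a hesitant VP of engineering vs. an agent pushing for a meeting.
script = iter(["I'm not sure we need this.", "Okay, Tuesday works.", None])
transcript = run_simulation(
    agent_reply=lambda m: "Understood. Could we book 15 minutes to walk through it?",
    simulated_user=lambda r: next(script),
    opening="I downloaded your white paper.",
    max_turns=10,
)
print(len(transcript))  # → 3
```

Evaluating the success criteria ("confirms meeting details, never discusses pricing") would then be a judging pass over the finished `transcript`, which is where the non-deterministic LLM evaluation comes in.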
What that's going to do is generate a full conversation and evaluate the agent's performance. This is useful for broader, non-deterministic flows where you want to ensure your agent performs well under the pressure of a more dynamic conversation. You can implement the simulation testing workflow, and configure all things testing, via the ElevenLabs API as well. Additionally, tests can be configured at the workflow node level, allowing you to validate the behavior or transitions of a specific node rather than only testing the agent from its default starting path.

Once you have all your tests configured, you can run them individually or all at once, and you'll be brought to a view where you can look into each test once it's completed. You can also add these to your CI/CD pipeline using the ElevenLabs CLI from the command line, so every pull request gets validated before anything reaches production.

Your agents represent your company, and testing ensures that they handle all of those edge cases consistently. So make sure you start building out your test suite today and ship your agents with confidence. I'll see you in the next one.
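The CI/CD gate boils down to: fetch the batch of test results, fail the pipeline if any test failed. A minimal sketch of that gate logic; in a real pipeline the `results` list would come from the ElevenLabs CLI or API, and the `name`/`passed` fields here are illustrative, not the platform's response schema:

```python
import sys

def ci_gate(results: list[dict]) -> int:
    """Turn a batch of test results into a process exit code so the
    pipeline blocks the pull request when any test fails."""
    failed = [r["name"] for r in results if not r["passed"]]
    for name in failed:
        print(f"FAIL: {name}", file=sys.stderr)
    print(f"{len(results) - len(failed)}/{len(results)} tests passed")
    return 1 if failed else 0

results = [
    {"name": "handles rejection gracefully", "passed": True},
    {"name": "create_booking called with valid email", "passed": True},
]
exit_code = ci_gate(results)  # call sys.exit(exit_code) in a real CI step
print(exit_code)  # → 0
```

A nonzero exit code is the universal "fail this step" signal in CI systems, so this is all the glue needed to make the test suite a merge gate.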

AI Insights

Summary
The lesson introduces Agent Testing, built into the ElevenLabs Agents platform, to ensure agents behave correctly before reaching users. It covers three test types: scenario (next-reply) tests that check expected responses within defined conversation nodes using success/failure examples; tool call tests that verify agents invoke the correct tools with validated parameters (via LLM, exact match, or regex) and support dynamic variables; and simulation tests that run full end-to-end, non-deterministic conversations against success criteria and turn limits. It highlights creating tests directly from real conversations to turn failures into regression tests, organizing tests into folders, configuring tests at the workflow node level, running tests individually or in batch, and integrating with CI/CD via the ElevenLabs CLI and API for PR validation before production.
Title
Agent Testing in ElevenLabs Agents: Scenario, Tool Call, and Simulation Tests
Keywords
ElevenLabs Agents
Agent Testing
scenario tests
next-reply test
tool call tests
simulation tests
dynamic variables
parameter validation
regex matching
LLM validation
workflow node testing
CI/CD
ElevenLabs CLI
ElevenLabs API
regression testing
Zendesk ticket tool
calendar booking
Key Takeaways
  • Use scenario tests to validate specific conversational responses with clear success and failure examples.
  • Use tool call tests to ensure the agent calls the right tool with correct parameters; validate via LLM, exact match, or regex.
  • Create tests directly from problematic real conversations to quickly capture edge cases as regression tests.
  • Run simulation tests for broader, end-to-end, non-deterministic conversations using success criteria and turn limits.
  • Organize tests into folders and optionally validate behavior at specific workflow nodes, not just the default entry path.
  • Automate quality gates by running the full test suite in CI/CD using the ElevenLabs CLI, and configure testing via the ElevenLabs API.
Sentiments
Positive: The tone is instructional and optimistic, emphasizing confidence, prevention of edge cases, and smooth shipping via testing and CI/CD integration.