[00:00:00] Speaker 1: Welcome back to the ElevenLabs Agents Academy. Today, we're looking at Agent Testing, a testing framework built directly into the ElevenAgents platform. Testing ensures your agent handles every conversation the way your business needs it to, before it reaches the user. We currently have three main types of tests: scenario tests, which you might see as next-reply tests; tool call tests; and simulation tests.
Let's start with the scenario test. It evaluates your agent's ability to handle certain types of interactions. You establish the context of these interactions through these conversational nodes here, and I can add another node like so. After you do that, you're gonna wanna describe the expected response from the agent. In this particular example, I'm working on an outbound sales agent, so I wanna make sure that if the potential customer is not interested in my outreach, the agent handles that rejection gracefully and just respects it. I'm also gonna wanna define some examples: some success examples and some failure examples. This is really important, because this is the context the test needs to either pass or fail the agent's response.
Next, let's go to the tool call test. What makes an agent useful is its ability to call tools, and it's really important that we make sure it's calling the right tools at the right time. First, you're gonna wanna select the tool that you wanna test. For some of these tools, it's gonna ask you to fill out the parameters, which are basically the information the tool expects from the agent. To validate these parameters, you can either use an LLM, which is non-deterministic, or use exact string matching or regex matching, which are deterministic. You can also set dynamic variables in all of the tests. If you wanted to test with some production variables, like a customer or order ID, you could define those here.
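
To make the deterministic validation modes concrete, here is a minimal Python sketch of exact and regex parameter checks with dynamic variables filled in first. It is an illustration only, not the platform's implementation: `ParamCheck`, `fill_variables`, `validate_tool_call`, and the `{{variable}}` placeholder syntax are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical names throughout; this is not the platform's API.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # assumed {{variable}} syntax


@dataclass
class ParamCheck:
    name: str       # tool parameter to validate
    mode: str       # "exact" or "regex"
    expected: str   # literal value or regex pattern; may contain {{variables}}


def fill_variables(template: str, variables: dict[str, str], escape: bool = False) -> str:
    """Substitute {{name}} placeholders; unknown names are left untouched."""
    def substitute(match: re.Match) -> str:
        value = variables.get(match.group(1))
        if value is None:
            return match.group(0)
        return re.escape(value) if escape else value
    return PLACEHOLDER.sub(substitute, template)


def validate_tool_call(params: dict[str, str], checks: list[ParamCheck],
                       variables: dict[str, str]) -> list[str]:
    """Return a list of failure messages; an empty list means the call passed."""
    failures = []
    for check in checks:
        if check.name not in params:
            failures.append(f"{check.name}: missing")
            continue
        actual = str(params[check.name])
        if check.mode == "exact":
            ok = actual == fill_variables(check.expected, variables)
        elif check.mode == "regex":
            # Escape variable values so they match literally inside the pattern.
            pattern = fill_variables(check.expected, variables, escape=True)
            ok = re.fullmatch(pattern, actual) is not None
        else:
            raise ValueError(f"unknown mode: {check.mode}")
        if not ok:
            failures.append(f"{check.name}: got {actual!r}")
    return failures
```

A check such as `ParamCheck("order_id", "exact", "{{order_id}}")` passes only when the agent sends exactly the order ID defined in the test's variables, while a regex check suits values like timestamps where only the shape is known in advance.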
In this particular example, I want to test whether my outbound agent can successfully book a meeting on my calendar. So I'm gonna test this Cal.com create booking integration here, and you can see that these nodes, or turns, are defined in a way that will trigger this tool call.
The really cool thing about the ElevenAgents testing suite is creating tests from inside of a conversation. I'm gonna navigate over here to this other agent, and scrolling through the conversation, maybe I notice an error with a tool call, for example. It's having a hard time successfully calling this Zendesk open ticket tool. I can create a test directly from this conversation by pressing this button here. I'll select tool call test, create Zendesk ticket, and instead of having to manually configure the entire test and all of the conversational turns, I can just look for this tool down here and create the test directly from the conversation. This is a key feedback loop: every bad interaction becomes a new edge case that you can test for and prevent from happening again.
The third type of test is the simulation test, and these run full end-to-end conversations. First you describe the simulated user scenario: you're a VP of engineering, you downloaded our white paper, you're initially hesitant to sign a contract, but you're open-minded. That's the user scenario. Then you define the agent success criteria: Alex, which is the name of our agent, delivers a concise value prop, handles the competitor objection once, and then confirms the meeting details without discussing pricing. You define the maximum number of conversational turns within this conversation. Likewise, you can add dynamic variables to any test. When you run it, the test generates a full conversation and evaluates the agent's performance against those success criteria.
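
The pieces of a simulation test described above (user scenario, success criteria, turn limit, dynamic variables) can be sketched as a small data structure plus a driver loop. This is a rough Python illustration under stated assumptions, not how the platform works internally: `SimulationTest`, `run_simulation`, and the three callables are hypothetical, and a "turn" is assumed to mean one user message followed by one agent reply. On the real platform the simulated user and the evaluation are handled for you; here they are pluggable functions so the loop can be run locally.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Turn = tuple[str, str]  # (role, text), where role is "user" or "agent"


@dataclass
class SimulationTest:
    # Hypothetical container mirroring the fields described in the transcript.
    user_scenario: str
    success_criteria: str
    max_turns: int
    dynamic_variables: dict[str, str] = field(default_factory=dict)


def run_simulation(
    test: SimulationTest,
    agent_reply: Callable[[list[Turn]], str],
    user_reply: Callable[[str, list[Turn]], Optional[str]],
    judge: Callable[[str, list[Turn]], bool],
) -> tuple[list[Turn], bool]:
    """Generate a conversation capped at max_turns, then evaluate it."""
    transcript: list[Turn] = []
    for _ in range(test.max_turns):
        user_text = user_reply(test.user_scenario, transcript)
        if user_text is None:  # the simulated user ended the conversation
            break
        transcript.append(("user", user_text))
        transcript.append(("agent", agent_reply(transcript)))
    # The judge stands in for the evaluator that scores the success criteria.
    return transcript, judge(test.success_criteria, transcript)
```

Keeping the judge separate from the conversation loop reflects the split in the transcript: the scenario drives what the simulated user says, while the success criteria are only applied once the full conversation exists.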
Simulation tests are useful for broader, non-deterministic flows where you wanna ensure that your agent performs well under the pressure of a more dynamic conversation. You can also implement the simulation testing workflow, and configure everything related to testing, via the ElevenLabs API.
Cool. Once you have all your tests configured, you can either run them individually or run them all at once. You'll be brought to this nice view where you can look into each of the tests once they're completed. You can also add these to your CI/CD pipeline using our ElevenLabs CLI, which you can run from the command line here, so every pull request gets validated before anything reaches production.
Your agents represent your company. Testing ensures that they handle all of those edge cases consistently. So make sure you start building out your test suite today, and ship your agents with confidence. I'll see you in the next one.