Speaker 1: My name is Jack Humphrey. I've had the privilege of working for Indeed for the last nine years here in Austin, and I'm extremely excited to talk to you about something I'm passionate about, which is using metrics-driven insights to help improve process and people. I hope many of you have heard of Indeed. We have a very straightforward mission: we help people get jobs. We're the biggest job site in the world, with over 200 million unique visitors every month, and we have a global engineering organization.

But more relevant to today's topic is how we work in Indeed engineering. I like to say we move fast and we try things. We very much believe that the wrong approach is to bet on a small number of great ideas; instead, we explore lots and lots of ideas to find the ones that work, eliminate the ones that don't, and invest more in the ones that show promise. To do this at Indeed, we've adopted a few practices that I think are pretty key. First, we need to hire great people. You've heard a lot about hiring today and how important it is. We need to hire people who want to be in an environment where their ideas are valued, where it's important for them to bring forward ideas to make things better for our users and for how we do our work. But you've then got to follow through by giving them the ownership and autonomy to actually own what they're doing and deliver on exploring those ideas. And then, very important, you've got to make sure that they have the tools to do this in an effective way, and specifically tools to be very data-driven and to be able to reason about: OK, I've got this idea, how am I going to prove success, or how am I going to invalidate this idea?

To that end, over the years we've developed a couple of tools, and these are the two main tools that we use for this; we've open-sourced them as well. Proctor is an A/B testing framework, and Imhotep is a data analytics platform. You can learn more about these on our open-source website, but I'm going to say a few more words about Imhotep because I mention it a couple of other times in the presentation. As I said, Imhotep is a data analytics platform. It is scalable, efficient, and fast, and what it really allows us to do is rapidly explore and analyze time series data sets, and to do that for arbitrarily large data sets. It comes with a query language that is SQL-like, it has a web user interface and a distributed backend, and we've invested a lot in it over the years. We really use it as a way to run our business.

I also want to call out that I loved the idea Brian put forward: get a dashboard, get one metric on a dashboard. Dashboards are great. I love dashboards, and Imhotep powers a bunch of dashboards at Indeed. What's wonderful, and why I say rapid exploration, is this: you've got a dashboard in front of you. Do you trust the number? I'll come back to that in a second, but if you can dive down and see how the data produced that number, and slice and dice it in a number of ways, like you can with Imhotep, it's extremely powerful.

So, metrics-driven development: we have these ways of using data so that we can quickly and iteratively experiment with ideas, so that we can quickly and iteratively improve. But what I want to talk about is not how we've used this for product development, but how we've turned it inward.
And so in recent years, it became really important to me to think about: how can we use metrics to improve how we work? How can we use metrics to help people improve? That's what I want to talk about, and the process is basically the same thing we do for products, but I'll walk through it.

First of all, and again echoing something Brian said, measure everything. You don't know what questions you're going to want to ask, so you've got to measure everything. You've got to get all the data in so that you can explore it. Second, once you've got all that data, ask lots of questions. Get the data into a form, whether it's in Imhotep or whatever you use, where you can ask lots of questions and really extract learnings from it, because that's why you measured everything: so that you could learn. Then take your learnings and turn them into ideas for improvement, ideas to get better. And then it's a virtuous cycle: you've got to measure again to confirm that your idea was actually a good idea and made an improvement. And if it wasn't, move on and try something else.

I also think you can do this to help people improve. It's exactly the same process, except instead of implementing process improvements, you're coaching individuals or coaching teams. I imagine that when I say something like that, a lot of you feel like this chihuahua, who I really like. He's really skeptical. You might say: is measuring process a good idea? Is measuring people a good idea? I think it is. That's why I'm giving this talk. But there's a big asterisk, which is that you've got to proceed with caution. If you felt skeptical when I said you can use this for people, I'm right there with you, and I'll talk more about that. So proceed with caution. I'll talk about two things you should be cautious about when using data this way.

The first is something called Goodhart's Law. I hadn't heard of this until relatively recently, but it really resonated with me. It says that when a measure becomes a target, it ceases to be a good measure. I think this is probably intuitive to a lot of people; you hear about things like gaming the stats. And I think the reason a measure ceases to be a good measure when you make it into a target is that you start measuring people's ability to hit the target instead of whatever the measure was originally. That's not to say that targets are bad, but you can get all sorts of behaviors you didn't intend when you're really giving an incentive to hit the target rather than to improve whatever it was you wanted to improve. So pick targets very carefully. Be conscious of how Goodhart's Law might come into play. Avoid targets when you can.

The second thing is something I call the Number Six principle. I've named it after a character from a classic TV show, The Prisoner. The character is named Number Six and has this famous line: "I am not a number." The idea here is that nobody likes to be reduced to a number, to a set of statistics. So if you want to use numbers to think about how people are doing, you've got to be really careful to emphasize that numbers are not going to define how you think about them, that the numbers are not "this is who you are to me." You've got to be sure that everybody understands that you view them holistically and that you acknowledge that some things can't be measured.
And there are qualitative impacts that people have on your organization that can be awesome and incredibly positive, and also extremely negative, and may not show up in the numbers at all. Numbers can lie, right? But I still think they're useful. Measures aren't inherently bad; it's all in how you use them. The way I like to talk about it with my teams is: the metrics should serve the team, not the other way around. The team should not serve the metrics. If you think about it, the team serving the metrics is kind of what a target is. It's kind of what a KPI can be if you're not careful. The point is, everyone should embrace the attitude that we can understand how we're doing, and whether we're headed in the right direction, using these numbers, but not let them drive us.

OK, so how have I done this? In recent years, again, I've tried to turn this approach inward. The first step is measure everything. At Indeed, measure everything generally means get it into Imhotep somehow: get a data set into Imhotep that you can ask questions of. We do that with everything that happens in our products, and in recent years we've been doing it for everything that impacts process. Every system of record that we can turn into a data set in Imhotep, we do. Git commits, updates to JIRA issues, production deploys, edits on our wiki (I even have a data set on that), and lots and lots more. Because again, you don't know what questions you're going to want to ask, so get everything into a form that you can ask questions of. I'm going to talk about an example in a second that zooms in on what I mean when I say we can ask questions about JIRA issue updates, because we track all of our work in JIRA.

So once you're measuring everything, again, ask lots of questions of the data and try to extract learnings. Here's an example; I'll walk you through it. This is a hypothesis in the form of a complaint, and it was a real one that we dealt with at Indeed: translation verification takes too long. Any time there's a change to a string, it has to be translated, because Indeed is in lots and lots of countries and lots of languages. I don't have time to get into the details of why translation verification was taking too long, but I will note that we track translation work in JIRA, conveniently. And as of last year, we have a data set in Imhotep that lets us look back and see, as a time series, every action that was taken on every JIRA issue in our system, along with a lot of other useful information about those actions.

So from this hypothesis, our question becomes: OK, it's taking too long; how long is it taking? Let's get to some numbers. Let's not just rely on the anecdote. To Vivek's point earlier, we've got an anecdote; let's see if we can back it up with data. The specific question we can ask of our data set is: how long are translations, as represented in JIRA, sitting in this pending verification state? I've got some IQL up here. It's not important that you understand the IQL syntax, but just to walk you through how I asked the question in the form of a query: I'm looking at this JIRA actions data set, over a time range of around three months, filtering down to issues of this type, translation, and looking at things that moved from pending verification to another state.
And I'm going to filter down to a single project. That's not a real project name; Lorem is not a real project name. But this is real data for a real project at Indeed, just protecting the guilty. Then the next part says we want to group the data up into buckets, in this case time buckets of one day, and we're going to extract this metric called time in state, which we measure on every one of these time series events. So we'll basically be summing up how long each issue was in the pending verification state. That comes back in seconds, and Imhotep is going to politely do the math to turn it into days.

And here's a graph that it generates. This is a graph that came back instantaneously. I said I wanted it to be cumulative, so it's cumulative, and I see that over the course of these three months, the total time in state added up to 232 days. That sounds like a lot of days for this project; it's not that big of a project. So this is some evidence for the hypothesis. But it's really, really important not to stop just because you got an answer you like. You've got to be skeptical (and again, I loved Vivek's presentation earlier). You've got to ask more questions. You've got to triangulate the truth as best you can.

There are lots more questions that I asked, or could ask, about this particular problem. For example, I could switch my query around, change just one part of it, and say: give me the number of distinct issue keys. I want to know, was this a few issues that added up to 232 days, or lots of issues? And what does it look like over time? This next graph is not cumulative, but it shows that it spiked to over 30 issues on some days and was relatively quiet, or nothing happening, on other days. And this starts to build a story for me: things are happening in batches, right? Things are transitioning out of the state in batches. I can switch it to cumulative and see that it adds up to 278 issues over that amount of time. So I'm building my case. I'm building evidence for my hypothesis. And that's great.
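To make those two questions concrete, here is a rough sketch in pandas rather than IQL, asked of a hypothetical flat export of JIRA action events. The file name and column names (time, issuetype, issuekey, project, prevstatus, timeinstate) are assumptions for illustration; this is not the actual Indeed data set or query syntax, just the shape of the computation described above.

```python
# Hypothetical sketch: the two questions above, asked of an assumed flat
# export of JIRA action events. Column names and semantics are illustrative,
# not the real JIRA actions data set or IQL.
import pandas as pd

actions = pd.read_csv("jira_actions.csv", parse_dates=["time"])

# Translation issues leaving the Pending Verification state, for one
# project, over a roughly three-month window.
pending = actions[
    actions["time"].between("2016-01-01", "2016-04-01")
    & (actions["issuetype"] == "Translation")
    & (actions["prevstatus"] == "Pending Verification")
    & (actions["project"] == "LOREM")
]

by_day = pending.groupby(pending["time"].dt.date)

# Question 1: total time spent in Pending Verification per day, converted
# from seconds to days. The cumulative version of this series is the graph
# that added up to 232 days in the talk.
time_in_state_days = by_day["timeinstate"].sum() / 86400
cumulative_days = time_in_state_days.cumsum()

# Question 2: how many distinct issues left the state each day (the graph
# with spikes of 30+), and how many distinct issues over the whole window.
issues_per_day = by_day["issuekey"].nunique()
total_issues = pending["issuekey"].nunique()
```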
So fast forward to: now let's do something, right? If you've got good measurements and you've got good questions, you're learning. And when you're learning, you can start to say, what should we prioritize doing? Translation verification was a problem for Indeed, so let's try to improve. In this case I won't leave you hanging; I will follow through. There was a better way. I'm not going to explain what the better way was, except to say that it was about getting translations separated from code all the way through to the deploy: getting translations into production without being part of a code deploy. That seems like a pretty good solution to the problem of translations taking a long time to get verified.

So we rolled that out incrementally, project by project. And I can say: I'm thinking about doing this on this legacy project, Lorem. It's going to take some effort. Is it really worth it? Let's find a similar project that's already using the new process, in this case I'll call it Ipsum, and see if we can compare them side by side. So now we're in the state of: we're rolling out a solution; let's measure again to see if it's actually making things better.

I did a slight tweak to my IQL query so that I could look at the 90th percentile. Let's think about the worst cases for time in pending verification. Here I'm no longer filtering down to one project; instead I'm filtering in my group by. I'm saying I just want to see data for two projects, Lorem and Ipsum, and I'm asking Imhotep to give me back the 90th percentile of time in state. It comes back with some tabular data, which I've annotated here, and the great news is the new process looks faster. The similar project is seeing a 90th percentile of 1.8 days, much faster than the 12 days for the legacy project. So now the next step is, we've got good evidence to actually go invest in doing the work. And then we'll come back and measure again to make sure that we improved.
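Again as a rough illustration only, that side-by-side comparison might look something like the following in pandas, using the same hypothetical export and column names as the earlier sketch. The real query was IQL against Imhotep, and the project keys here are stand-ins.

```python
# Hypothetical sketch: 90th percentile of time in Pending Verification,
# grouped by project, to compare the legacy process (LOREM) with a project
# already on the new process (IPSUM). Same assumed export and columns as before.
import pandas as pd

actions = pd.read_csv("jira_actions.csv", parse_dates=["time"])

pending = actions[
    actions["time"].between("2016-01-01", "2016-04-01")
    & (actions["issuetype"] == "Translation")
    & (actions["prevstatus"] == "Pending Verification")
    & (actions["project"].isin(["LOREM", "IPSUM"]))
]

# 90th percentile of time in state per project, converted from seconds to days.
# In the talk, this came back as roughly 12 days for the legacy project and
# 1.8 days for the project on the new process.
p90_days = pending.groupby("project")["timeinstate"].quantile(0.9) / 86400
print(p90_days)
```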
So that's just one example that I pulled out to talk about improving process using this metrics-driven approach. Now I'd like to talk a little bit about the people side and helping people improve. Several years ago I got interested in how I could help people think about how their work is trending over time, help their managers understand what they were working on, and create more transparency in a convenient way. And I built this tool that we use at Indeed called Hindsight. The tagline is quarterly end stats. Basically the idea is that it's a very simple UI that shows you, over time, what you've been doing. It's mostly JIRA-based, but not entirely; it uses some other systems as well. The rows are different kinds of actions, like resolving issues, creating issues, commenting on issues, having your issues reopened because of some problem, and the columns are quarters. So you can look at raw numbers over time for these actions you're taking. And all the numbers are clickable, so I can say: OK, 65 issues in Q2, what was the breakdown of that? I can see that it was across eight projects, see the count by project, and then immediately dive over into JIRA to see what the actual changes were, what the actual work was. So it's just a standard, consistent way to look at what you're doing over time.

Now, what about the Number Six principle? What about Goodhart's Law? How do they factor in here? I will say: I created this, I rolled this out, and I was extremely nervous about all the ways it could go wrong. I was thinking about Goodhart's Law, even though I didn't know the name of it yet, and I was thinking about how people would hate to think that we're thinking of them as numbers. So we did a couple of things that I think were really important in socializing this and making it work the way we wanted it to work.

One is we said: look, Hindsight is a starting point for discussion. These numbers don't define you. These numbers might imply something that's absolutely not true, and so there's no meaning that comes directly from a number. But it might surface a trend over time that you and your manager should talk about. It might be positive, it might be negative, it might be neutral. There might be things we observe about a particular quarter that we want to dig into and ask, what was going on here? So it's a starting point for discussion between an individual and their coach, and in that way I think it's really, really valuable. And if people believe that that's what it is, then it doesn't have that negative impact of "I'm being reduced to numbers."

Secondly, we constantly guard against treating anything involving Hindsight as a target. Immediately after rolling it out, I started getting asked questions like: what's the median across our developers? What's a reasonable expectation for a junior engineer, for a senior engineer? And I've really just held the line and said, nope, we're not going to use it that way. That's not what it's for. That's also why it's a card interface per person and not a dashboard where you can look at people next to each other. Of course, you can open two browser windows and look at people side by side, but we're really saying that's not what it's for. It's not a leaderboard. In that way we've actually been successful, and I've had some external validation from managers who came in after we implemented this. When they first saw it, they panicked and thought, I shouldn't have come to work here. And then within a few months they see: OK, no, this is healthy. It's working the way Jack hoped it would work.

I'll give you an example conversation. I might look at your stats and say: you resolved 100 issues this quarter. Wow, you did a lot of work. And then I look down at the reopened line: 30 of them were reopened. So now I'm starting to tell myself a story. You're super productive, you're getting a lot done, but you're shipping a lot of buggy code and the QA engineers are reopening your issues. But again, remember: be skeptical. Don't jump to conclusions. It's really easy at this scale to dig into the data. You don't have to do any fancy queries; you just have to go read 30 JIRA issues and see, were these all bugs? In fact, the answer might be that only 10 of them were bugs, and none of them seemed all that horrible. Maybe the rest were because of misunderstandings or unclear requirements, which might raise some other questions you want to explore with the team, about what's happening in the team, or with product management, or quality assurance, or something else. The point is, conversations happen because you've got a starting point, which is the numbers, the measures. And those conversations, between a manager and an employee or amongst a team, can generate some really great ideas for improvement.

So I'll leave you with that. Measure everything. Ask questions. Learn. Improve. I've seen it work. You can make it work. You don't have to be afraid, but you have to be careful. And I hope you'll think about ways it could work for your teams. Thanks. Thank you.