Speaker 1: Hello, people from the future. Welcome to Normalize Nerd. Today, I will explain the concept behind regression using decision trees. I will discuss both the intuition and the math behind this. It will contain a lot of visualizations, so watch this video till the end. If you want to see more videos like this, please subscribe to this channel and hit the bell icon. We have a Discord server too, so please feel free to join. The link will be in the description. So let's get started. First of all, please make sure that you have watched my previous video on decision tree classifiers, because there I explained the basics of decision trees and you will need that knowledge for this video as well. The link is in the description. So today's topic is regression. Let's have a simple regression dataset. We have two features, X0 and X1. Y is the target variable. For our convenience, we will represent this data in two dimensions. The horizontal axis denotes X0 and the vertical axis denotes X1. To represent the Y value, I have used colors: yellow means a lower value of Y and red means a higher value. If you are familiar with linear regression, you can see that it's easily solvable by finding a best-fit plane, the 2D analogue of a line. But how can we solve such a regression problem using a decision tree? Well, the general concept is the same as the decision tree classifier: we recursively split the data using a binary tree until we are left with pure leaf nodes. There are just two differences: how we define impurity and how we make a prediction. For now, keep these two questions in the back of your mind; I will explain them later. First, I want to show you the decision tree for this problem and how it splits the dataset.
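[A minimal sketch of a dataset with this shape, for reference in the rest of the walkthrough. Python with NumPy and every name below are assumptions for illustration; the video itself shows no code.]

import numpy as np

# Hypothetical toy data shaped like the video's example:
# two feature columns (X0, X1) and one continuous target Y.
rng = np.random.default_rng(0)
X = rng.uniform(low=-5.0, high=20.0, size=(100, 2))   # columns play the roles of X0 and X1
y = 10.0 * X[:, 0] + rng.normal(scale=5.0, size=100)  # Y loosely increases with X0, like the colored plot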
Speaker 2: So here's our tree. Focus on the root node.
Speaker 1: I'm taking all the data points here. As you already know, now we are gonna split it based on the condition of the root node. The condition is whether the X0 feature is less than or equal to 1. In the plot, the splitting condition looks like this vertical line. Every point that lies either to the left of the line or on it satisfies this condition. We will place those points in the left child, and the points that don't meet the condition will go to the right. After just the first split, you can see the yellowish points are on the left and the reddish points are on the right. This shows that the impurity of the nodes is decreasing. We are gonna follow the splitting rule for all the remaining nodes. Let's turn on the autopilot. Okay, here's the complete regression tree with the proper splits. Now the question is how to predict the Y value of a new data point. Let's have a new point at (16, -2). First, we check if X0 is less than or equal to 1. It doesn't satisfy this condition, so it goes to the right. Here the condition fails again and we move to the right. Finally, this condition is true, hence we move to the left. So we arrive at a leaf node that contains only 3 data points. To predict the value of our new data point, we just need to find the average Y value of all 3 data points present in this leaf node. Yes, it's that simple. The prediction turns out to be around 181. By following the same method, we can predict the Y value of any point on this 2D plane. Let me show you how that looks. The interesting thing is, even though this is a regression problem, we still end up dividing the feature space into several regions, which is very different from other regression techniques. The color of a region tells you the predicted value of every single point in that region. Okay, now comes the most important part: how do we split the dataset?
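[To make the prediction rule concrete, here is a minimal sketch of that traversal. Python and the node layout are assumptions for illustration, not the video's actual code: each internal node compares one feature against a threshold and sends the point left when the condition holds, and a leaf returns the mean target of the training points that landed in it.]

class Node:
    # A tree node; only leaves carry the target values that reached them.
    def __init__(self, feature=None, threshold=None, left=None, right=None, y_values=None):
        self.feature = feature      # index of the feature to test (e.g. 0 for X0)
        self.threshold = threshold  # split value (e.g. 1)
        self.left = left            # child for samples with x[feature] <= threshold
        self.right = right          # child for samples that fail the condition
        self.y_values = y_values    # list of training targets, set only on leaves

def predict_one(node, x):
    # Walk down the tree and return the mean target value of the leaf we land in.
    if node.y_values is not None:          # leaf: average the stored Y values
        return sum(node.y_values) / len(node.y_values)
    if x[node.feature] <= node.threshold:  # condition holds, go left
        return predict_one(node.left, x)
    return predict_one(node.right, x)      # condition fails, go right

[For the point (16, -2) in the example, this traversal ends in the three-point leaf and returns the average of its three Y values, which the video reports as roughly 181.]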
Speaker 2: Let's clear the clutter and focus on the root node.
Speaker 1: Here we will have the whole dataset, and the task will be to find the best splitting condition. Just like in my previous video, we will examine two candidate conditions. The first condition is X0 less than or equal to 1.
Speaker 2: If we follow this condition, then the splits will look like this. Our second condition is X1 less than or equal to 2.
Speaker 1: In this case, the division is like this. Don't forget that the points that satisfy the condition go to the left and the rest to the right. Now the question is, which is the better split? To find out, we need to calculate which split decreases the impurity of the child nodes the most. For that, we need to compute the variance reduction. Yes, in the context of regression, we use variance as a measure of impurity, just like we used entropy or the Gini index in the classification problem. Focus on the complete dataset. We are gonna compute the variance of the whole dataset using this formula. Remember, a higher value of variance means a higher impurity. So the variance at the root turns out to be this.
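[The formula shown on screen is the usual variance, Var(Y) = (1/n) * sum of (y_i - mean(y))^2 over the points in the node. A minimal sketch of it as a node-impurity function; Python is assumed and the helper name is hypothetical.]

def variance(y):
    # Impurity of a node in regression: mean squared deviation from the node's mean Y.
    if len(y) == 0:
        return 0.0
    mean = sum(y) / len(y)
    return sum((yi - mean) ** 2 for yi in y) / len(y)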
Speaker 2: Similarly, let's compute the individual variance for all the divided datasets. Okay, now we need to compute the variance reduction.
Speaker 1: For that, we just subtract the weighted combined variance of the child nodes from the variance of the parent node. The weights are simply the relative sizes of the children with respect to the parent.
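[Spelled out, the quantity is Var(parent) - (n_left / n) * Var(left) - (n_right / n) * Var(right). A small sketch, reusing the hypothetical variance helper from the earlier sketch:]

def variance_reduction(parent_y, left_y, right_y):
    # How much a split lowers impurity: parent variance minus the size-weighted child variances.
    n = len(parent_y)
    w_left = len(left_y) / n    # relative size of the left child
    w_right = len(right_y) / n  # relative size of the right child
    return variance(parent_y) - (w_left * variance(left_y) + w_right * variance(right_y))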
Speaker 2: Let's compute this for both the splits. Just look at this.
Speaker 1: The variance reduction for the first split is much greater than for the second one. And it makes sense, because in the first split there is a significant difference between the child nodes: the left one has more yellow points and the right one has more red points. But in the second split, the red and yellow points are kind of mixed in both nodes. This tells us that the first split can decrease the impurity much more than the second one. So finally, we come to the conclusion that we should choose the first one. Here we only compared two splits. In reality, the model evaluates the variance reduction for every possible split and selects the best one. This process of selection happens recursively until we have reached our desired depth. I hope now you have a very good understanding of decision tree regression. In the next video, I will show you how to code such a tree from scratch. If you like this video, please share it and don't forget to subscribe. Stay safe and thanks for watching.
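[As a sketch of what evaluating every possible split can look like in practice (a hypothetical implementation, not the video's code): for each feature, try each observed value as a threshold, score it with the variance reduction above, and keep the best; the tree then recurses on the two halves until the desired depth is reached.]

def best_split(X, y):
    # Try every (feature, threshold) pair and return the one with the largest variance reduction.
    best_feature, best_threshold, best_gain = None, None, 0.0
    n_features = len(X[0])
    for feature in range(n_features):
        for threshold in sorted({row[feature] for row in X}):
            left_y = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
            right_y = [yi for row, yi in zip(X, y) if row[feature] > threshold]
            if not left_y or not right_y:   # skip splits that leave a child empty
                continue
            gain = variance_reduction(y, left_y, right_y)
            if gain > best_gain:
                best_feature, best_threshold, best_gain = feature, threshold, gain
    return best_feature, best_threshold, best_gain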