Visualizing Decision Trees in Scikit-Learn 0.21: New Functions and Tips
Explore two new Scikit-Learn 0.21 functions for visualizing decision trees: plot_tree, which draws the tree with matplotlib, and export_text, which needs no external libraries.
Visualize a decision tree two different ways
Added on 09/28/2024

Speaker 1: Tip number 24, and this is one of my favorites so far. One of the reasons people use decision trees is their high interpretability, and to interpret a tree you have to visualize it, which is the point of this tip. Scikit-Learn 0.21 added two new functions for visualizing decision trees: plot_tree, which uses matplotlib instead of Graphviz, and export_text, which doesn't require any external libraries.

So let's scroll down to the tree. In previous versions of Scikit-Learn, if you wanted to visualize a tree, you had to use Graphviz, and Graphviz was a pain to install. Even when you got it working, it was a pain to use: you would run some Python code that wrote out a file, leave Python, go to the command line, convert that file to a different file type, and only then could you look at it. plot_tree, which is what I'm using here, is better in two ways: it only depends on matplotlib, and the tree appears directly in your notebook.

If you've never visualized a decision tree, let me briefly explain what you're looking at. Each box is a node. There are three internal nodes and four leaf nodes. An internal node is any node where a split happened, so the top line of the box shows the rule that was used to split that node: if the rule is true, you go left, and if it's false, you go right. Below the rule you see the Gini impurity before the split, the number of samples before the split, how those samples break down by class, and the majority class in that node. In this dataset, a sex of zero means male, so males go left and females go right. The tree chose that split to maximize the decrease in impurity; in other words, the goal of each split is to make the resulting nodes purer. In the boxes below it, you'll see the new Gini impurity, the new number of samples in that node, the class breakdown within that node, and the majority class in that node.
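A minimal sketch of the plot_tree workflow described above, assuming scikit-learn 0.21 or later. The video's dataset isn't available here, so the data below is a hypothetical Titanic-like stand-in generated with NumPy (the column names "Sex", "Pclass" and "Fare" and the class names "died"/"survived" are illustrative assumptions). The sketch also recomputes the root node's Gini impurity by hand and checks that the chosen split lowers the sample-weighted impurity of the two child nodes.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; in a notebook the figure would render inline
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical Titanic-like data (not the dataset from the video):
# Sex (0 = male, 1 = female), Pclass (1, 2, or 3), and Fare.
rng = np.random.RandomState(0)
n = 500
sex = rng.randint(0, 2, n)
pclass = rng.randint(1, 4, n)
fare = rng.uniform(5, 100, n)
X = np.column_stack([sex, pclass, fare])
# Synthetic labels: survival is more likely for females and for higher classes.
p_survive = 0.2 + 0.5 * sex - 0.1 * (pclass - 1)
y = (rng.uniform(size=n) < p_survive).astype(int)

# A depth-2 tree has at most 3 internal nodes and 4 leaves.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Draw the tree with matplotlib only; filled=True turns on the node coloring.
fig, ax = plt.subplots(figsize=(12, 6))
annotations = plot_tree(
    clf,
    feature_names=["Sex", "Pclass", "Fare"],
    class_names=["died", "survived"],
    filled=True,
    ax=ax,
)

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# The root's "gini" value in the plot is the impurity of all training labels.
root_gini = gini(y)
print("root gini:", round(root_gini, 3), "vs tree_:", round(clf.tree_.impurity[0], 3))

# The split is chosen to lower the sample-weighted impurity of the two children.
t = clf.tree_
left, right = t.children_left[0], t.children_right[0]
weighted_children = (
    t.n_node_samples[left] * t.impurity[left]
    + t.n_node_samples[right] * t.impurity[right]
) / t.n_node_samples[0]
print("weighted child gini:", round(weighted_children, 3))
```

Note that what a split is guaranteed to reduce is the weighted average over both children; an individual child node can occasionally end up less pure than its parent.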
You can also see that the Gini impurity has decreased in both of the boxes below. These nodes then split again, which is why each of them has its own splitting rule to decide whether you go left or right. The leaf nodes show the same information, except that the tree has stopped growing there: there are no more splits, which is why no rule is listed in those nodes. The color of each node shows its majority class, and the shading reflects how pure the node is: darker means purer, which is ultimately what the tree is trying to achieve. This bottom-right node is white because its samples are split evenly between the classes. If you were to plot a regression tree, it would look very similar, except that you would see mean squared error instead of Gini impurity as the splitting criterion. One more note: plot_tree can only be used with a single tree, not with an ensemble of trees such as a random forest.

Finally, let's look at export_text at the bottom, which is also new in 0.21. This is the same tree as the one above, just visualized in a different way. It doesn't require matplotlib, and although it doesn't include nearly as much information, it is much more compact.
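A minimal sketch of export_text, again assuming scikit-learn 0.21 or later and the same hypothetical Titanic-like data generated with NumPy (the feature names are illustrative, not from the video). It also fits a small regression tree, with Fare as a hypothetical target, to show that its root impurity is the mean squared error around the mean, which is the quantity the speaker says replaces Gini impurity for regression.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

# Same hypothetical Titanic-like data as a stand-in for the video's dataset.
rng = np.random.RandomState(0)
n = 500
sex = rng.randint(0, 2, n)
pclass = rng.randint(1, 4, n)
fare = rng.uniform(5, 100, n)
X = np.column_stack([sex, pclass, fare])
p_survive = 0.2 + 0.5 * sex - 0.1 * (pclass - 1)
y = (rng.uniform(size=n) < p_survive).astype(int)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text returns a plain string: one "<=" / ">" line per branch and a
# "class:" line per leaf. No matplotlib or Graphviz is involved.
report = export_text(clf, feature_names=["Sex", "Pclass", "Fare"])
print(report)

# A regression tree splits on squared error instead of Gini impurity; here the
# (hypothetical) target is Fare, predicted from Sex and Pclass.
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X[:, :2], fare)
print(export_text(reg, feature_names=["Sex", "Pclass"]))

# The root's impurity is the mean squared error around the mean, i.e. the variance.
print("root impurity:", reg.tree_.impurity[0], "variance:", np.var(fare))
```

The criterion is left at its default rather than named explicitly, because its string value differs across scikit-learn versions ("mse" in 0.21, "squared_error" in later releases).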
