
Decision Tree Made Simple

  • Writer: Himanshu Sachdeva
  • Oct 2, 2020
  • 4 min read

Updated: Oct 4, 2020



Decision Trees are a non-parametric* supervised learning method used for classification and regression. (*Non-parametric algorithms do not make strong assumptions about the form of the mapping function; they are free to learn any functional form from the training data.) The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.


Decision trees are easy to interpret: you can almost always identify the various factors that led to a decision. In fact, trees are often underestimated for how well they relate the predictor variables to the predictions. With decision trees, you can easily explain all the factors leading to a particular decision or prediction, which makes them easy for business people to understand.


With high interpretability and an intuitive algorithm, decision trees mimic the human decision-making process and excel at dealing with categorical data. Unlike algorithms such as logistic regression or SVMs, decision trees do not assume a linear relationship between the independent variables and the target variable; rather, they can model highly nonlinear data.

As the name suggests, a decision tree uses a tree-like model to make predictions. It resembles an upside-down tree, and it works much like the way you make decisions in real life: you ask a series of questions in a nested if-then-else structure to arrive at a decision.



At each node, you ask a question to further split the data held by that node. If the test passes, you go left; otherwise, you go right. A decision tree thus splits the data into multiple sets, and each of these sets is further split into subsets until it arrives at a decision (a class label), which is represented as a leaf of the tree.
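To see how the learned rules read like nested if-then-else statements, here is a minimal scikit-learn sketch (my illustration, using the Iris dataset; the article itself does not rely on any particular library):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree so the printed rules stay readable.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders the fitted tree as nested if-then-else style rules.
print(export_text(clf, feature_names=iris.feature_names))
```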


But how do you decide which of the available attributes should be used for the first split?


If all the data points have identical labels, there is no need to apply any rule or split the data, and the decision tree reduces to a single node. This suggests that the more homogeneous the labels in a dataset are, the simpler the decision tree will be.


In real-world data sets, you will almost never get a completely homogeneous data set (or even completely homogeneous nodes after splitting). So do the best you can: split the nodes such that the resulting nodes are as homogeneous as possible. A data set is completely homogeneous if it contains only a single class label.


Features are chosen such that they maximize the homogeneity of the nodes after splitting. The tree picks first the features that maximally increase homogeneity, in effect the ones providing the most information about the data set. You will therefore find the most informative features towards the top of a tree.
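As a toy illustration of how homogeneity drives the choice of split (a hand-rolled sketch using entropy as the homogeneity measure, not any library's internal code), the snippet below computes the information gain of two candidate splits; the split that produces purer children gains more:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: 0 when fully homogeneous."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Drop in entropy achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A mediocre split: each child is still fairly mixed.
print(information_gain(parent, np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])))  # ~0.19

# A perfect split: both children are pure, so the gain is maximal (1.0).
print(information_gain(parent, np.array([0, 0, 0, 0]), np.array([1, 1, 1, 1])))  # 1.0
```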


Tree models where the target variable can take a discrete set of values are called classification trees; decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Classification And Regression Tree (CART) is an umbrella term for both.


Regression with Decision Trees


There are cases where you cannot directly apply linear regression to solve a regression problem. Linear regression fits only one model to the entire data set, whereas you may want to divide the data set into multiple subsets and apply linear regression to each subset separately.


In regression problems, a decision tree splits the data into multiple subsets. The difference between decision tree classification and decision tree regression is that in regression each leaf represents a linear regression model, as opposed to a class label. For classification, the measure of homogeneity is the Gini index or information gain (entropy); for regression, it is variance. If you have a data set where you want to perform linear regression on multiple subsets, decision tree regression is a good idea.
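Here is a quick sketch of decision tree regression with scikit-learn (an illustration only; note that scikit-learn's DecisionTreeRegressor predicts the mean value of each leaf rather than fitting a separate linear model per leaf, but it still chooses splits by variance and captures piecewise structure that a single linear model cannot):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Toy nonlinear data: a single straight line cannot follow a sine curve.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

linear = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

print("linear R^2:", round(linear.score(X, y), 3))   # poor fit
print("tree   R^2:", round(tree.score(X, y), 3))     # much better fit
```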


Advantages of the Decision Tree

  • Predictions made by a decision tree are easily interpretable.

  • A decision tree does not assume anything specific about the nature of the attributes in a data set. It can seamlessly handle all kinds of data — numeric, categorical, strings, Boolean, and so on.

  • It does not require normalisation, since it only compares values within an attribute.

  • Decision trees often give us an idea of the relative importance of the explanatory attributes that are used for prediction.

Disadvantages of the Decision Tree

  • A decision tree may consist of many layers, which can make it complex.

  • Decision trees are prone to overfitting. If allowed to grow with no check on its complexity, a tree will keep splitting until it has correctly classified (or rather, memorised) every data point in the training set.

  • Decision trees tend to be very unstable, which is a consequence of overfitting: a few changes in the data can change the tree considerably.

  • With more class labels, the computational complexity of building the decision tree increases.

  • Decision trees can become biased if some classes dominate the data.

  • They need careful parameter tuning.

How to avoid overfitting the Decision Tree model



Overfitting is one of the major problems for any machine learning model. An overfitted model will generalise poorly to new samples.


There are two broad strategies to control overfitting in decision trees: truncation and pruning.

  1. Truncation - also known as pre-pruning. Here we consciously stop the tree while it is still growing so that it does not end up with too many leaves containing very few data points. One approach to deciding when to stop building the tree is to define a homogeneity threshold: if a split does not improve homogeneity beyond that threshold, stop splitting further. Besides this, you can also limit the depth of the tree or set a minimum size for a partition after a split to contain the growth of the tree.

  2. Pruning - let the tree grow to any complexity, then cut its branches in a bottom-up fashion, starting from the leaves. This decreases tree complexity and helps reduce overfitting. You check the performance (measured as accuracy, typically on a validation set) of the pruned tree, and if it is higher than the accuracy of the original tree, you keep that branch chopped. A short scikit-learn sketch of both strategies follows this list.
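Here is one way to apply both strategies in scikit-learn (an illustrative sketch, not the only approach; the hyperparameter values are arbitrary and should be tuned, for example with cross-validation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Truncation (pre-pruning): stop growth early with depth and leaf-size limits.
truncated = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                                   random_state=0).fit(X_train, y_train)

# Pruning (post-pruning): grow the tree fully, then cut it back using
# cost-complexity pruning; larger ccp_alpha values prune more aggressively.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate alpha; tune via CV
pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

print("truncated test accuracy:", truncated.score(X_test, y_test))
print("pruned    test accuracy:", pruned.score(X_test, y_test))
```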

This article provides a basic explanation of Decision Trees in Machine Learning. Feel free to share your comment(s) or contact me.
