Pruning in Decision Trees

A common problem when learning a decision tree is finding the optimal size of the resulting tree, since tree size directly affects the accuracy of the model. A tree with too many branches and layers ends up overfitting the training data. Pruning is the process that helps prevent this overfitting.

Pruning a decision tree means removing subtrees that are redundant, i.e. that do not provide useful splits, and replacing them with leaf nodes.

Pruning is divided into two types:

1. Pre-pruning

Pre-pruning is also known as applying early stopping rules. In this method, the growth of the decision tree is stopped at an early stage: subtree construction is halted at a particular node after evaluating the Gini impurity or the information gain of the proposed split. For example, a pre-pruning condition could be informationGain(Attr) < minGain or treeDepth == maxDepth. If the condition is satisfied at a node, we prune there and replace the would-be decision node with a leaf node; otherwise we continue building the tree with the usual decision tree algorithm.
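The stopping check itself is simple. Below is a minimal sketch in Python (assuming NumPy); the function names and the minGain and maxDepth thresholds are illustrative choices, not part of any particular library:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a non-empty label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def should_stop(parent_labels, left_labels, right_labels,
                depth, max_depth=5, min_gain=0.01):
    """Pre-pruning check: stop splitting (i.e. make this node a leaf) if the
    information gain of the proposed split is below min_gain or the maximum
    depth has been reached. Assumes all label arrays are non-empty."""
    n = len(parent_labels)
    weighted_child_entropy = (len(left_labels) / n * entropy(left_labels)
                              + len(right_labels) / n * entropy(right_labels))
    gain = entropy(parent_labels) - weighted_child_entropy
    return gain < min_gain or depth >= max_depth
```

A tree-building routine would call a check like this at every node before splitting, which is exactly how the early-stopping rule cuts off subtree construction.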

A related approach is to stop the tree-building process before it starts producing leaves that contain very few samples. Both approaches are known as early stopping, or pre-pruning, of decision trees.

Pre-pruning has the advantage of being faster and cheaper, since the full tree is never grown. It also avoids generating overly complex subtrees that overfit the training data.
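In practice, libraries expose pre-pruning as hyperparameters rather than asking you to write the stopping rule yourself. As a rough illustration, scikit-learn's DecisionTreeClassifier supports limits such as max_depth, min_samples_leaf, and min_impurity_decrease; the dataset and the specific values below are arbitrary examples, not recommendations:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruned tree: growth stops early via depth / sample / impurity limits.
pruned = DecisionTreeClassifier(
    max_depth=4,                 # cap the depth of the tree
    min_samples_leaf=20,         # avoid leaves with very few samples
    min_impurity_decrease=0.01,  # require a minimum impurity reduction to split
    random_state=0,
).fit(X_train, y_train)

print("test accuracy:", pruned.score(X_test, y_test))
```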

2. Post-pruning

This method prunes after the tree has been fully built. We grow the tree with the usual decision tree algorithm and then prune subtrees in a bottom-up fashion: starting from the bottom decision nodes, we decide, based on Gini impurity or information gain, whether to keep each node or replace it with a leaf node.

For example, suppose we prune out the subtree that has the least information gain. While deciding on the replacement leaf, we want to know what leaf our decision tree algorithm would have created if it had never created this decision node; typically that is a leaf predicting the majority class (or mean target value) of the samples that reach the node.
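One simple way to make this concrete is a reduced-error style of post-pruning, sketched below: a decision node is collapsed into a majority-class leaf whenever the leaf would make no more errors on the stored samples than the subtree it replaces. The Node structure and the error-count criterion are illustrative stand-ins; an impurity-based criterion such as Gini or information gain could be used instead.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    feature: Optional[int] = None      # index of the split feature (None means leaf)
    threshold: Optional[float] = None  # split threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    labels: Tuple[int, ...] = ()       # non-empty tuple of labels reaching this node

def is_leaf(node):
    return node.feature is None

def majority_class(labels):
    return Counter(labels).most_common(1)[0][0]

def errors_as_leaf(node):
    """Misclassifications if this node were collapsed to a majority-class leaf."""
    majority = majority_class(node.labels)
    return sum(1 for y in node.labels if y != majority)

def errors_as_subtree(node):
    """Misclassifications made by the subtree rooted at this node."""
    if is_leaf(node):
        return errors_as_leaf(node)
    return errors_as_subtree(node.left) + errors_as_subtree(node.right)

def post_prune(node):
    """Bottom-up pruning: collapse a decision node into a leaf whenever the
    leaf would not make more errors than the subtree it replaces."""
    if is_leaf(node):
        return node
    node.left = post_prune(node.left)
    node.right = post_prune(node.right)
    if errors_as_leaf(node) <= errors_as_subtree(node):
        return Node(labels=node.labels)  # replace the whole subtree with a leaf
    return node
```

The recursion visits children before their parent, which is what "bottom-up" means here: lower decision nodes are pruned first, and that can in turn make their parents prunable.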

Cost-complexity pruning

Cost-complexity pruning falls under the post-pruning category. It works by calculating a Tree Score based on the Residual Sum of Squares (RSS) of the subtree plus a Tree Complexity Penalty.

Here, the Tree Complexity Penalty is a function of the number of leaves in the subtree. The Tree Score is defined as RSS + aT, where a is alpha, a hyperparameter we find using cross-validation, and T is the number of leaves in the subtree.
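As a hypothetical illustration with made-up numbers: suppose a subtree has RSS = 10 with 4 leaves, and a smaller pruned alternative has RSS = 12 with 2 leaves. With alpha = 2 their Tree Scores are 10 + 2 × 4 = 18 and 12 + 2 × 2 = 16, so the smaller tree wins; with alpha = 0.5 the scores are 12 and 13, and the larger tree is kept. Larger values of alpha therefore push the selection toward smaller subtrees.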

Calculate the Tree Score for all the subtrees in the decision tree, compare them, and pick the subtree with the lowest Tree Score.
From the equation we can also see that the value of alpha determines which subtree is chosen: a larger alpha penalizes leaves more heavily and therefore favors smaller trees. Repeating this process for different values of alpha gives us a sequence of trees, from the full tree down to a single leaf.
The value of alpha that on average (across cross-validation folds) gives the lowest Tree Score is chosen as the final value of alpha. Finally, our pruned decision tree is the tree corresponding to that final value of alpha.
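In scikit-learn, cost-complexity pruning is exposed through the ccp_alpha parameter and the cost_complexity_pruning_path helper. The sketch below assumes a classification dataset (load_breast_cancer) and picks the alpha with the best mean cross-validated accuracy, which stands in for the "lowest average Tree Score" step described above; note that for classification trees the score is based on impurity rather than RSS.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# The pruning path gives the sequence of effective alphas, one per tree in
# the sequence from the full tree down to a single leaf.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Evaluate each alpha with cross-validation and keep the best one.
scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                    X, y, cv=5).mean()
    for a in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]

# Final pruned tree corresponding to the chosen alpha.
final_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print("best alpha:", best_alpha, "leaves:", final_tree.get_n_leaves())
```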