Decision Tree Splitting Criteria

Decision Trees are among the most popular and widely used machine learning algorithms, and understanding how they work is essential for anyone starting out in data science.

A Decision Tree makes its decisions by splitting nodes into sub-nodes. This process is repeated during training until only homogeneous nodes are left, so it is important to understand how node splitting works.

Node splitting is the process of dividing a node into sub-nodes so that the resulting nodes are relatively pure.

This can be done with four methods, grouped into two categories based on the type of target variable.

Continuous Target Variable

Decision Tree Splitting Method #1: Reduction in Variance

This method is used for splitting nodes when the target variable is continuous. It uses variance as the measure for deciding the condition on which a node is split into sub-nodes, known as child nodes.
Variance measures how homogeneous a node is.

Variance = \frac{\sum \left ( X-\mu \right )^{2} }{N}

These steps are followed for splitting a decision tree using this method:

  • First, calculate the variance of each child node.
  • Then calculate the variance of each split as the weighted average of the child node variances.
  • Compare all the variances and select the split with the lowest variance.
  • Repeat the above three steps until you achieve homogeneous nodes.

Note: If a node is entirely homogeneous, its variance is zero.
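
As a rough illustration of these steps, here is a minimal NumPy sketch that evaluates candidate thresholds for a single numeric feature and picks the one with the lowest weighted child variance. The feature values, target values, and thresholds are hypothetical; they are only there to make the example runnable.

import numpy as np

def weighted_child_variance(y_left, y_right):
    # Weighted average variance of the two child nodes produced by a split.
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * np.var(y_left) + (len(y_right) / n) * np.var(y_right)

# Toy continuous target and a single numeric feature (hypothetical data).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 7.8, 8.1, 8.4])

best = None
for threshold in X[:-1]:                      # candidate split points
    left, right = y[X <= threshold], y[X > threshold]
    score = weighted_child_variance(left, right)
    if best is None or score < best[1]:       # keep the lowest weighted variance
        best = (threshold, score)

print("parent variance:", np.var(y))
print("best threshold:", best[0], "weighted child variance:", best[1])

In a full tree implementation this search would be repeated for every feature and every node, but the selection rule is exactly the one in the steps above: pick the split with the lowest weighted child variance.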

Categorical Target Variable

Decision Tree Splitting Method #2: Information Gain

This method is used to split nodes when the target variable is categorical. It is based on the concept of entropy. Entropy measures the impurity of a node and is inversely proportional to its purity: the lower the entropy, the higher the purity of the node, and vice versa.

IG=1-entropy

Note: Because entropy is subtracted from 1, IG is highest for a pure node, with a maximum value of 1. The entropy of a homogeneous node is zero.

Formula for Entropy

entropy = - \sum_{i=1}^{n}p_{i}log_{2}p_{i}

Steps to split a decision tree using Information Gain:

  • First, calculate the entropy of each child node for each split.
  • Then calculate the entropy of each split as the weighted average of the child node entropies.
  • Compare all the splits and select the one with the lowest entropy, i.e. the highest information gain.
  • Repeat the above three steps until you achieve homogeneous nodes.
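
The following sketch shows these quantities for one candidate split, using the common reduction-in-entropy form of information gain (parent entropy minus the weighted child entropy); choosing the split with the lowest weighted entropy is the same as choosing the one with the highest gain. The label arrays are hypothetical.

import numpy as np

def entropy(labels):
    # Entropy of a categorical label array: -sum(p_i * log2(p_i)).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Reduction in entropy achieved by splitting `parent` into `left` and `right`.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical categorical labels for a parent node and one candidate split.
parent = np.array(list("AAAABBBB"))
left   = np.array(list("AAAB"))
right  = np.array(list("ABBB"))

print("parent entropy:", entropy(parent))   # 1.0 for a 50/50 node
print("information gain:", information_gain(parent, left, right))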

Decision Tree Splitting Method #3: Gini Impurity

This method is used for splitting nodes when the target variable is categorical. It is the simplest and most popular method for splitting a decision tree.

Gini is the probability of correctly labelling a randomly chosen element if it is labelled randomly according to the distribution of labels in the node.

Formula for Gini:

gini= \sum_{i=1}^{n} p_{i}^{2}

The Gini Impurity value is:

gini\ impurity = 1-gini       i.e.,

gini\ impurity = 1-\sum_{i=1}^{n} p_{i}^{2}

It is inversely proportional to the homogeneity of the node: the lower the Gini Impurity, the more homogeneous the node, and vice versa.

Steps to split a decision tree using Gini Impurity:

  • First, calculate the Gini Impurity of each child node for each split.
  • Then calculate the Gini Impurity of each split as the weighted average of the child node Gini Impurities.
  • Compare all the values and select the split with the lowest Gini Impurity.
  • Repeat the above three steps until you achieve homogeneous nodes.
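
Here is a minimal sketch of these steps for two hypothetical candidate splits; it simply computes the weighted Gini Impurity of the child nodes for each split and prefers the lower value.

import numpy as np

def gini_impurity(labels):
    # Gini Impurity: 1 - sum(p_i^2) over the class proportions in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(left, right):
    # Weighted average Gini Impurity of the two child nodes.
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Hypothetical labels for two candidate splits of the same parent node.
split_a = (np.array(list("AAAB")), np.array(list("ABBB")))
split_b = (np.array(list("AAAA")), np.array(list("BBBB")))

for name, (left, right) in [("split_a", split_a), ("split_b", split_b)]:
    print(name, "weighted Gini Impurity:", weighted_gini(left, right))
# split_b is preferred: its children are pure, so its weighted Gini Impurity is 0.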

Decision Tree Splitting Method #4: Chi-Square

This method is used for splitting nodes when the target variable is categorical. It is based on the statistical significance of the differences between the parent node and its child nodes.

Chi-Square=\sqrt{\frac{\left ( actual-expected \right )^{2}}{expected}}

In the formula, Expected is the expected count of a class in the child node, based on the class distribution in the parent node, and Actual is the actual count of that class in the child node.

The formula gives the Chi-Square value for a single class. To get the Chi-Square value of a node, take the sum of the Chi-Square values over all classes in that node.

This value measures the difference between the parent and child nodes: the higher the value, the greater the difference between parent and child nodes, and the higher the homogeneity of the child nodes.

Steps to split a decision tree using Chi-Square:

  • First, calculate the Chi-Square value of each child node for each split by summing the Chi-Square values of each class in that node.
  • Then calculate the Chi-Square value of each split as the sum of the Chi-Square values of its child nodes.
  • Compare all the Chi-Square values and select the split with the highest value.
  • Repeat the above three steps until you achieve homogeneous nodes.
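
Below is a minimal sketch of these steps for one candidate split, following the per-class formula given above (expected counts taken from the parent's class distribution). The parent proportions and child class counts are hypothetical.

import numpy as np

def chi_square_node(actual_counts, parent_proportions):
    # Chi-Square of a child node: sum over classes of
    # sqrt((actual - expected)^2 / expected), with expected counts derived
    # from the parent node's class distribution.
    actual = np.asarray(actual_counts, dtype=float)
    expected = actual.sum() * np.asarray(parent_proportions, dtype=float)
    return np.sum(np.sqrt((actual - expected) ** 2 / expected))

# Hypothetical parent node with a 50/50 class distribution.
parent_proportions = np.array([0.5, 0.5])

# Class counts in the two child nodes produced by one candidate split.
left_counts  = [8, 2]   # mostly class 0
right_counts = [1, 9]   # mostly class 1

split_chi_square = (chi_square_node(left_counts, parent_proportions)
                    + chi_square_node(right_counts, parent_proportions))
print("Chi-Square for this split:", split_chi_square)
# Compare this value across candidate splits and pick the highest.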