The concept of entropy plays a central role in calculating information gain.
Information gain is rooted in information theory. It measures how much information an attribute provides about the class, and it is used to decide the order of attributes in the nodes of a decision tree and to judge how good a split is. Formally, it is the reduction in entropy achieved by splitting a dataset on a given value of a random variable.
Higher information gain corresponds to lower-entropy (purer) groups of samples. The goal is to decrease entropy progressively from the root node down to the leaf nodes. Information gain is computed as the difference between the entropy before and after a split, and thus quantifies the class impurity removed by the split.
Information Gain Formula
Information Gain = Entropy (before splitting) - Weighted average Entropy (after splitting)
Entropy is defined as

Entropy(S) = - Σ p_i log2(p_i)

where p_i is the proportion of samples in subset S that belong to class i. One practical drawback is that the log function makes entropy relatively expensive to compute, which is why simpler impurity measures such as the Gini index are sometimes preferred.
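The entropy formula above can be sketched as a small Python function (the function name and the 9-versus-5 example labels are illustrative, not from the original text):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A subset with 9 "yes" and 5 "no" samples has entropy of about 0.94 bits;
# a pure subset has entropy 0.
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))
```

Each term in the sum is the class proportion p_i weighted by its surprise -log2(p_i); a pure subset contributes a single term with p_i = 1 and thus zero entropy.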
Some basic features of Information Gain:
- It is non-negative.
- It is the main criterion for deciding whether a feature should be used to split a node.
- It quantifies the reduction in uncertainty after splitting the dataset on a particular feature: a higher information gain means that feature is more useful for classification.
- The feature with the highest information gain is chosen as the splitting feature at a node.
- It can work with both continuous and discrete variables.
- Information gain is the splitting criterion used by the ID3 algorithm and, in the normalized form of gain ratio, by its successor C4.5 for selecting the optimal split at a decision tree node.
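The properties above follow directly from the definition: gain is the parent's entropy minus the size-weighted entropy of the children. A minimal sketch (the function names and toy values are illustrative assumptions):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    entropy_after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - entropy_after

# A feature that separates the classes perfectly recovers all the entropy:
# the gain equals the parent entropy (1 bit for a balanced binary split).
print(information_gain(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))
```

Because the weighted child entropy can never exceed the parent entropy, the result is always non-negative, matching the first property listed above.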
Steps to follow for building a decision tree using information gain:
- Calculate the Information gain and entropy for each attribute.
- Compare the attributes and select the one with the highest information gain as the root node.
- Then build a child node for each value of the attribute.
- Repeat the process recursively on each child node until the tree is complete (for example, until every node is pure or no attributes remain).
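The steps above can be sketched as a minimal ID3-style recursion over categorical attributes (the `build_tree` helper, the dict-based tree representation, and the "wind"/"humidity" toy dataset are all hypothetical, for illustration only):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy from splitting the rows on attribute `attr`."""
    n = len(labels)
    after = 0.0
    for value in {row[attr] for row in rows}:
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        after += len(subset) / n * entropy(subset)
    return entropy(labels) - after

def build_tree(rows, labels, attrs):
    # Stop when the node is pure or no attributes remain: majority-class leaf.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "children": {}}
    # Steps 3-4: build a child subtree for each value of that attribute.
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attrs if a != best],
        )
    return node

# Hypothetical toy data: "wind" separates the classes, "humidity" does not.
rows = [
    {"wind": "weak", "humidity": "high"},
    {"wind": "weak", "humidity": "low"},
    {"wind": "strong", "humidity": "high"},
    {"wind": "strong", "humidity": "low"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["wind", "humidity"])
print(tree)  # the root splits on "wind"; each child is a pure leaf
```

Each recursive call removes the chosen attribute from consideration, so the recursion terminates once the data is pure or the attribute list is exhausted, mirroring the stopping condition in the final step above.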