Decision Tree: Entropy

Entropy is one of the most important concepts in machine learning. It measures the uncertainty of a random variable, and a central goal of machine learning models and data scientists is to reduce that uncertainty. Higher entropy means higher information content: entropy measures the “amount of information” present in a variable.

In 1948, Claude E. Shannon, a mathematician and electrical engineer, published the paper 'A Mathematical Theory of Communication', in which he addressed the problem of measuring information and uncertainty. He is known as the 'father of information theory' for founding the field.

"Information theory is the mathematical approach for studying the code of information along with the quantification, storage, and communication of information.”

According to information theory, the entropy of a random variable is the average level of “information” or “uncertainty” inherent in the variable’s possible outcomes. In other words, the more certain or deterministic an event is, the less information it carries.

Entropy is used to build an appropriate decision tree by selecting the best split. It can also be viewed as a measure of the impurity of a split: it is 0 for a pure node, at most 1 for a two-class problem, and at most log2(n) for n classes. It can be calculated with the following formula.

Entropy=-\sum_{i=1}^{n} p_{i} \log_{2} p_{i}
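
To make the formula concrete, here is a minimal Python sketch (the function name entropy and the use of collections.Counter are my own illustrative choices, not part of the article) that computes the Shannon entropy of a list of class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty node carries no impurity
    counts = Counter(labels)
    # Entropy = -sum(p_i * log2(p_i)) over the classes present in the data
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```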

The algorithm then calculates the entropy of each feature before and after every candidate split, selects the feature whose split reduces the entropy the most, and splits on it.
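
As a rough sketch of how that selection might look in code (reusing the entropy helper defined above; the name information_gain and the example split are illustrative, not taken from a specific library), the split with the highest information gain, i.e. the entropy of the parent minus the weighted entropy of the children, would be chosen:

```python
def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        (len(group) / n) * entropy(group) for group in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Hypothetical split of 8 labels into two candidate child nodes
parent = ["y", "y", "r", "r", "r", "o", "o", "o"]
left, right = ["y", "y", "r"], ["r", "r", "o", "o", "o"]
print(information_gain(parent, [left, right]))
```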

Steps to Calculate Entropy

Consider a dataset containing n classes. The entropy can be calculated using the formula:

Entropy=-\sum_{i=1}^{n} p_{i} \log_{2} p_{i}

where p_i is the probability of randomly selecting an example of class i. Suppose the dataset contains balloons of three colours: yellow, red, and orange. If we have two yellow, three red, and three orange balloons, the equation becomes

Entropy=-\left(p_{y}\log_{2}p_{y}+p_{r}\log_{2}p_{r}+p_{o}\log_{2}p_{o}\right)

where p_y, p_r, and p_o are the probabilities of choosing a yellow, red, or orange balloon.

Thus, we have

p_{y}=\frac{2}{8}, \quad p_{r}=\frac{3}{8}, \quad p_{o}=\frac{3}{8}

Now, the equation is 

Entropy=-\left(\frac{2}{8}\log_{2}\frac{2}{8}+\frac{3}{8}\log_{2}\frac{3}{8}+\frac{3}{8}\log_{2}\frac{3}{8}\right)

and the entropy is approximately 1.56 bits.
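
Plugging the balloon counts into the entropy sketch from earlier reproduces this value:

```python
balloons = ["yellow"] * 2 + ["red"] * 3 + ["orange"] * 3
print(entropy(balloons))  # ~1.561 bits, matching the hand calculation above
```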

Note: When the dataset is empty or all observations belong to the same class, the entropy is 0. Such a dataset has no impurity, so no further splitting is needed.
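
Using the same illustrative helper, a dataset where every observation belongs to a single class indeed gives an entropy of 0:

```python
print(entropy(["red"] * 8))  # 0.0 -- a pure node, so no further splitting is needed
```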