Decision Tree: Variance Reduction

Variance reduction is the splitting criterion used when the target variable is continuous.

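$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$$

where n is the number of samples in the node, y_i are their target values, and \bar{y} is the mean target value of the node.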

It uses the standard statistical formula for variance. At each node, this measure is used to select the best split, that is, to decide which feature the node is split on to create its child nodes.
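A minimal sketch of the standard variance formula applied to the target values that reach one node. It assumes the population form (dividing by n rather than n - 1), which is the usual choice when the value is only used to compare splits; the function name `variance` is illustrative.

```python
def variance(values):
    """Population variance: the mean of squared deviations from the mean.

    Assumption: divide by n (not n - 1), as is common for split selection.
    """
    n = len(values)
    if n == 0:
        return 0.0  # an empty node contributes no variance
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Hypothetical target values reaching a single node
node_targets = [10.0, 12.0, 14.0]
print(variance(node_targets))
```

Here the mean is 12, the squared deviations are 4, 0 and 4, so the variance is 8/3.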

The lower the variance of a node, the higher its purity; the higher the variance, the more impure the node.
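To illustrate the link between variance and purity, the sketch below compares two hypothetical nodes using the standard library's `statistics.pvariance` (population variance). The target values are made up for the example.

```python
from statistics import pvariance

# Hypothetical target values in two nodes
pure_node = [20.0, 20.0, 21.0, 20.0]   # values almost identical
impure_node = [5.0, 40.0, 12.0, 33.0]  # values spread widely

print(pvariance(pure_node))    # small variance: nearly homogeneous node
print(pvariance(impure_node))  # large variance: impure node
```

The first node's variance is 0.1875 and the second's is 208.25, so the first node is far purer.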

  • It measures the homogeneity of a node.
  • It is generally used in regression problems. 

Steps to choose a split using variance:

  1. Calculate the variance of each child node using the standard variance formula.
  2. Calculate the variance of each candidate split as the weighted average of the variances of its child nodes.
  3. Select the split with the lowest weighted variance. Repeat these steps on each child node until homogeneous nodes are obtained.
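The steps above can be sketched for a single numeric feature as follows. This is a minimal illustration, not a full tree builder. It assumes child variances are weighted by the number of samples in each child, that candidate splits are of the form `x <= threshold`, and that population variance is used; the function names and data are hypothetical.

```python
from typing import List, Optional, Tuple


def variance(values: List[float]) -> float:
    """Step 1: population variance of the target values in a node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def weighted_split_variance(left: List[float], right: List[float]) -> float:
    """Step 2: weighted average of child variances (assumed weights: child sizes)."""
    n = len(left) + len(right)
    return (len(left) / n) * variance(left) + (len(right) / n) * variance(right)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Step 3: try each threshold on x and keep the lowest weighted variance.

    Returns (threshold, weighted_variance), or None if no split is possible.
    """
    best = None
    # Skip the largest value so the right child is never empty.
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        score = weighted_split_variance(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: one feature x and a continuous target y
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, split_var = best_split(x, y)
reduction = variance(y) - split_var  # variance reduction achieved by the split
print(threshold, split_var, reduction)
```

On this data the best threshold is 3.0, which separates the low targets from the high ones. Because the parent node's variance is fixed, choosing the lowest weighted child variance is the same as choosing the largest variance reduction (parent variance minus weighted child variance).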