The major difference between R-squared and adjusted R-squared is that R-squared does not penalise the model for having more variables. If you keep adding variables to the model, the R-squared value will never decrease: it either increases or, if the new variable explains none of the variation in the dependent variable that the existing variables leave unexplained, stays the same. R-squared therefore effectively assumes that every variable added to the model increases its predictive power, even when the variable only fits noise.
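A quick way to see this is to fit an ordinary least-squares model with and without an extra column of pure noise. The sketch below assumes NumPy and synthetic data; `r_squared` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on X (an intercept column is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    rss = residuals @ residuals              # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()        # total sum of squares
    return 1.0 - rss / tss

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)

# Base model: y regressed on x only.
r2_base = r_squared(x.reshape(-1, 1), y)

# Add a column of pure noise that has no relationship with y.
noise = rng.normal(size=n)
r2_noise = r_squared(np.column_stack([x, noise]), y)

print(f"R^2 with x only:       {r2_base:.4f}")
print(f"R^2 with x plus noise: {r2_noise:.4f}")  # never lower than r2_base
```

Because least squares can always set the noise coefficient to zero and recover the original fit, the larger model's residual sum of squares can only be equal or smaller, which is why R-squared cannot go down.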
Adjusted R-squared, on the other hand, penalises a model based on the number of variables it contains. Its formula is:
$$\text{Adj. } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - k - 1}$$
where N is the number of data points and k is the number of features (independent variables).
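The formula translates directly into a small function. This is a minimal sketch; `adjusted_r_squared` is a hypothetical name, and the guard simply rejects inputs where N − k − 1 is not positive, since the formula breaks down there.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared from R-squared, N data points and k features."""
    if n - k - 1 <= 0:
        raise ValueError("need N > k + 1 data points")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Same R-squared and N, but more features -> lower adjusted R-squared.
print(adjusted_r_squared(0.80, n=50, k=5))   # ≈ 0.7773
print(adjusted_r_squared(0.80, n=50, k=20))  # ≈ 0.6621
```

Holding R-squared at 0.80 with N = 50, moving from 5 to 20 features lowers adjusted R-squared from about 0.777 to about 0.662: the extra features must raise R-squared enough to pay for the shrinking denominator N − k − 1.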
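So, if you add a variable and the adjusted R-squared drops, that is a strong sign the variable does not improve the model enough to justify its inclusion and should be left out. (Adjusted R-squared falls exactly when the new variable's t-statistic is less than 1 in absolute value, so such a variable is not statistically significant at any conventional level.) In multiple linear regression, you should therefore look at the adjusted R-squared value, not just R-squared, to keep redundant variables out of your regression model.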
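The selection rule above can be sketched as a before/after comparison. This assumes NumPy and synthetic data; `adjusted_r2_of_fit` and `keep_candidate` are hypothetical helpers, and checking one candidate at a time like this is a simple greedy heuristic rather than a complete variable-selection procedure.

```python
import numpy as np

def adjusted_r2_of_fit(X, y):
    """Fit OLS of y on X (intercept added) and return adjusted R-squared."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    r2 = 1.0 - (residuals @ residuals) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def keep_candidate(X, y, candidate):
    """Keep `candidate` only if adding it raises adjusted R-squared."""
    before = adjusted_r2_of_fit(X, y)
    after = adjusted_r2_of_fit(np.column_stack([X, candidate]), y)
    return after > before, before, after

rng = np.random.default_rng(1)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)
X = x1.reshape(-1, 1)

# x2 genuinely drives y, so it should be kept.
print(keep_candidate(X, y, x2))
# 2 * x1 duplicates information already in the model, so it should be dropped.
print(keep_candidate(X, y, 2.0 * x1))
```

The redundant column leaves the residuals unchanged, so R-squared stays the same while k grows, and the adjusted value falls; that is the penalty the formula is designed to apply.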