The assumptions of linear regression are:
- Assumption about the form of the model: it is assumed that there is a linear relationship between the dependent and independent variables. This is known as the ‘linearity assumption’.
- Assumptions about the residuals:
    - Normality assumption: it is assumed that the error terms, ε(i), are normally distributed.
    - Zero mean assumption: it is assumed that the residuals have a mean value of zero, i.e., the error terms are normally distributed around zero.
    - Constant variance assumption: it is assumed that the residual terms all have the same (but unknown) variance, σ². This assumption is also known as the assumption of homogeneity of variance, or homoscedasticity.
    - Independent error assumption: it is assumed that the residual terms are independent of each other, i.e., their pair-wise covariance is zero.
- Assumptions about the estimators:
    - The independent variables are measured without error.
    - The independent variables are linearly independent of each other, i.e., there is no multicollinearity in the data.
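As a quick illustration of these assumptions in practice, the sketch below fits an ordinary least squares model on synthetic data (all variable names and parameter values here are assumed purely for demonstration) and confirms that, when the model includes an intercept term, the fitted residuals have mean zero by construction:

```python
import numpy as np

# Synthetic data for illustration: true model y = 2 + 0.5*x + eps,
# with normally distributed errors (the normality assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
eps = rng.normal(0, 1, size=200)
y = 2.0 + 0.5 * x + eps

# Design matrix with an intercept column, then an OLS fit.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# With an intercept, OLS residuals sum to zero up to floating-point error.
mean_residual = residuals.mean()
```

Note that a zero residual mean in-sample is guaranteed by the intercept; it does not by itself validate the other assumptions.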
If the residuals are not normally distributed, their randomness is lost, which implies that the model is not able to explain the relationship in the data. The mean of the residuals should also be zero, as the following derivation shows.
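One simple, illustrative (not definitive) way to gauge the normality assumption is to look at the sample skewness and excess kurtosis of the residuals, both of which are close to zero for normally distributed data. In the sketch below the “residuals” are simulated normal draws, purely to show the check itself:

```python
import numpy as np

# Stand-in residuals: simulated normal draws, assumed for illustration only.
rng = np.random.default_rng(1)
resid = rng.normal(0, 1, size=5000)

# Standardize, then compute the third and fourth sample moments.
z = (resid - resid.mean()) / resid.std()
skewness = np.mean(z**3)          # near 0 for a normal sample
excess_kurtosis = np.mean(z**4) - 3  # near 0 for a normal sample
```

In practice, a Q-Q plot or a formal test (e.g., Shapiro-Wilk) is more informative than raw moments, but the moment check is easy to compute with NumPy alone.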
Y(i) = β0 + β1x(i) + ε(i)

This is the assumed linear model, where ε(i) is the residual term. Taking the expectation of both sides (β0, β1, and x(i) are constants):

E(Y(i)) = E(β0 + β1x(i) + ε(i))

= β0 + β1x(i) + E(ε(i))
If the expectation (mean) of the residuals, E(ε(i)), is zero, the expectation of the target variable equals the expectation of the model, which is exactly what the model aims for. The residuals (also known as the error terms) should also be independent, meaning there is no correlation between the residuals and the predicted values, or among the residuals themselves. Any such correlation implies that there is some relationship the regression model has failed to identify.
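The independence point can be sketched numerically (again with synthetic, assumed data): after an OLS fit, the sample correlation between residuals and fitted values is zero by construction, so any remaining structure must be sought elsewhere, for example in the correlation between successive residuals when the observations are time-ordered:

```python
import numpy as np

# Synthetic data for illustration: y = 1 + 3*x + noise.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 1.0 + 3.0 * x + rng.normal(size=300)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# OLS makes residuals orthogonal to fitted values, so this is ~0 always.
corr_resid_fitted = np.corrcoef(resid, fitted)[0, 1]

# Lag-1 autocorrelation of residuals: should be small if errors are
# genuinely independent (large values suggest unmodeled structure).
lag1_autocorr = np.corrcoef(resid[:-1], resid[1:])[0, 1]
```

Because the residual-fitted correlation is zero mechanically, diagnostics such as the lag-1 autocorrelation (or a Durbin-Watson test) carry the real information about independence.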
If the independent variables are not linearly independent of each other, the least squares solution (equivalently, the solution of the normal equations) is no longer unique.
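This loss of uniqueness is easy to demonstrate with a small sketch (synthetic, assumed data): when one predictor is an exact linear multiple of another, the design matrix loses full column rank, XᵀX becomes singular, and different coefficient vectors produce identical fitted values:

```python
import numpy as np

# Synthetic predictors, assumed for illustration: x2 is exactly 2*x1,
# i.e., perfect multicollinearity.
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 2.0 * x1
X = np.column_stack([np.ones(100), x1, x2])

# The design matrix has 3 columns but only rank 2, so X'X is singular.
rank = np.linalg.matrix_rank(X)            # 2, not 3

# Two different coefficient vectors that yield the same predictions:
b_a = np.array([0.0, 1.0, 0.0])            # fit = 1*x1
b_b = np.array([0.0, 0.0, 0.5])            # fit = 0.5*x2 = x1 as well
same_fit = np.allclose(X @ b_a, X @ b_b)
```

Since infinitely many coefficient vectors minimize the same squared error, the normal equations cannot single out one solution, which is why multicollinearity must be ruled out (or handled, e.g., by dropping or combining predictors).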