Never run a regression without first taking a good look at your data, because simple linear regression has several shortcomings:
- It is sensitive to outliers.
- It models linear relationships only.
- Inference from the fitted model (confidence intervals, hypothesis tests) is valid only if several assumptions about the data hold.
These shortcomings are best illustrated by Anscombe's Quartet, shown below:
As we can see, the four fitted regression lines are practically identical, yet the underlying data sets look nothing alike. The line does a decent job on the first data set. The second follows a clear curve, which shows that linear regression can only model linear relationships and will miss any other pattern. The third and fourth showcase the model's sensitivity to outliers. In the third, the points lie almost perfectly on a straight line except for one outlier that pulls the fit away from them; without it, the line would pass through the remaining points almost exactly. In the fourth, every point but one shares the same x value, so that single point alone determines the slope. So, we should never run a regression without having a good look at our data.
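To check the claim numerically, here is a minimal sketch in plain Python that fits an ordinary least-squares line to each of the four data sets and then refits the third without its outlier. The values are the published Anscombe (1973) data; the names `quartet`, `fit_line`, `fits`, and `trimmed_fit` are made up for this illustration and do not come from any library.

```python
from math import sqrt

# Anscombe's quartet. Data sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (slope, intercept, r) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Fit each data set separately and print the resulting line.
fits = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (slope, intercept, r) in fits.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}")

# Refit data set III after dropping its single outlier (the point at x = 13).
x3, y3 = quartet["III"]
kept = [(xi, yi) for xi, yi in zip(x3, y3) if xi != 13]
trimmed_fit = fit_line([xi for xi, _ in kept], [yi for _, yi in kept])
print(f"III without outlier: y = {trimmed_fit[1]:.2f} + {trimmed_fit[0]:.2f}x, "
      f"r = {trimmed_fit[2]:.3f}")
```

All four fits should come out close to the same line and the same correlation, while the refit of data set III gives a noticeably different slope and a correlation near 1. The summary numbers alone cannot tell the four data sets apart, which is why plotting comes first.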