Here we will walk through a step-by-step calculation of simple linear regression using an example.
Suppose we are given the following data.
x | y |
---|---|
1 | 3 |
2 | 1 |
3 | 5 |
4 | 2 |
5 | 6 |
6 | 9 |
7 | 10 |
8 | 4 |
9 | 8 |
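Plot all these data points on the graph.
Our goal is to draw the regression line that fits these points with the least error. No single straight line passes through all nine points, so "least error" means the line that minimizes the sum of the squared vertical distances (residuals) between the points and the line.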
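To make "least error" concrete, here is a minimal Python sketch, assuming the usual least-squares criterion: the error of a candidate line is the sum of squared vertical distances from the points to that line. The helper name `sse` and the two candidate lines are illustrative choices, not part of the original example.

```python
# Data points from the table above
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 1, 5, 2, 6, 9, 10, 4, 8]


def sse(m, c, xs, ys):
    """Sum of squared vertical distances (residuals) from the points to y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Two arbitrary candidate lines, chosen only for illustration
flat = sse(0, sum(ys) / len(ys), xs, ys)  # horizontal line at the mean of y
diagonal = sse(1, 0, xs, ys)              # the line y = x
print(flat, diagonal)
```

The line y = x has a smaller error (49) than the horizontal line at the mean of y (about 80), so it fits better. Least-squares regression picks the m and c that make this sum as small as possible.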
Step-1: Calculate the mean of x and of y.
Note: The regression line always passes through the point of means (x̄, ȳ).
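One way to see why the note holds, assuming the least-squares criterion (choose m and c to minimize the sum of squared residuals S): setting the derivative of S with respect to c to zero forces the line through the means.

$$S(m, c) = \sum_{i=1}^{n} (y_i - m x_i - c)^2$$

$$\frac{\partial S}{\partial c} = -2 \sum_{i=1}^{n} (y_i - m x_i - c) = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = m \sum_{i=1}^{n} x_i + n c \;\Rightarrow\; \bar{y} = m \bar{x} + c$$

So the point (x̄, ȳ) satisfies y = mx + c for the fitted slope and intercept.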
Mean of x: x̄ = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) / 9 = 45 / 9 = 5.0
Mean of y: ȳ = (3 + 1 + 5 + 2 + 6 + 9 + 10 + 4 + 8) / 9 = 48 / 9 ≈ 5.33
Now mark the point where these two values intersect, (x̄, ȳ) = (5, 5.33). The regression line must pass through this point.
Here, the blue point is the mean value of x and y.
Step-2: Calculate the value of m
The equation of a line is y = mx + c, where m is the slope and c is the y-intercept. For the least-squares regression line, the slope is
$$m = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
where (x - x̄) is the signed distance of each point from x̄ = 5, and (y - ȳ) is the signed distance of each point from ȳ ≈ 5.33. Values below are rounded to two decimals.
x | y | x-x̄ | y-ȳ | (x-x̄)(y-ȳ) | (x-x̄)² |
---|---|---|---|---|---|
1 | 3 | -4 | -2.33 | 9.33 | 16 |
2 | 1 | -3 | -4.33 | 13.00 | 9 |
3 | 5 | -2 | -0.33 | 0.67 | 4 |
4 | 2 | -1 | -3.33 | 3.33 | 1 |
5 | 6 | 0 | 0.67 | 0.00 | 0 |
6 | 9 | 1 | 3.67 | 3.67 | 1 |
7 | 10 | 2 | 4.67 | 9.33 | 4 |
8 | 4 | 3 | -1.33 | -4.00 | 9 |
9 | 8 | 4 | 2.67 | 10.67 | 16 |
Σ | | | | 46.00 | 60 |
Summing the last two columns gives Σ(x - x̄)(y - ȳ) = 46 and Σ(x - x̄)² = 60.
Therefore,
$$m = \frac{46}{60} \approx 0.767$$
Step-3: Calculate the value of c
Equation of the line: y = mx + c.
From the above calculations, we got
x̄ = 5, ȳ = 48/9 ≈ 5.33 and m = 46/60 ≈ 0.767
Because the line passes through (x̄, ȳ), substitute these values:
$$c = \bar{y} - m\bar{x} = \frac{48}{9} - \frac{46}{60} \times 5 = 1.5$$
Step-4: Find the equation of the regression line
From the values calculated above, the regression line intersects the y-axis at c = 1.5 and passes through the point (5, 5.33).
Therefore, the equation of the regression line is
y = 0.767x + 1.5
Step-5: Check how well the line fits the data. To see how much of the variation in the dependent variable y is explained by the independent variable x, first use the regression line to predict y for each x, with
m ≈ 0.767,
c = 1.5,
yp = 0.767x + 1.5, where yp is the predicted value of y.
Then calculate R², the coefficient of determination, which measures goodness of fit:
$$R^2 = \frac{\sum (y_p - \bar{y})^2}{\sum (y - \bar{y})^2}$$
Values below are rounded to two decimals; the sums are computed before rounding.
x | y | y-ȳ | yp | yp-ȳ | (y-ȳ)² | (yp-ȳ)² |
---|---|---|---|---|---|---|
1 | 3 | -2.33 | 2.27 | -3.07 | 5.44 | 9.40 |
2 | 1 | -4.33 | 3.03 | -2.30 | 18.78 | 5.29 |
3 | 5 | -0.33 | 3.80 | -1.53 | 0.11 | 2.35 |
4 | 2 | -3.33 | 4.57 | -0.77 | 11.11 | 0.59 |
5 | 6 | 0.67 | 5.33 | 0.00 | 0.44 | 0.00 |
6 | 9 | 3.67 | 6.10 | 0.77 | 13.44 | 0.59 |
7 | 10 | 4.67 | 6.87 | 1.53 | 21.78 | 2.35 |
8 | 4 | -1.33 | 7.63 | 2.30 | 1.78 | 5.29 |
9 | 8 | 2.67 | 8.40 | 3.07 | 7.11 | 9.40 |
Σ | | | | | 80.00 | 35.27 |
R² = 35.27 / 80 ≈ 0.44
Here R² ≈ 0.44, so the regression line explains about 44% of the variation in y; the remaining 56% is left in the residuals. For a least-squares line, R² lies between 0 and 1: a value near 0 means x explains almost none of the variation in y, and a value near 1 means the points lie close to the line. Since R² = 1 - Σ(y - yp)² / Σ(y - ȳ)², a higher R² on the same data means a smaller sum of squared errors.
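The hand calculation in Steps 1 to 5 can be cross-checked with a short, dependency-free Python sketch that applies the same deviation-from-the-mean formula for m, obtains c from the fact that the line passes through (x̄, ȳ), and computes R² as Σ(yp - ȳ)² / Σ(y - ȳ)². The function names `fit_line` and `r_squared` are hypothetical helpers written for this illustration.

```python
# Data from the table above
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3, 1, 5, 2, 6, 9, 10, 4, 8]


def fit_line(xs, ys):
    """Least-squares slope and intercept via deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar  # the line passes through (x_bar, y_bar)
    return m, c


def r_squared(xs, ys, m, c):
    """Explained variation over total variation (valid for a least-squares line with intercept)."""
    y_bar = sum(ys) / len(ys)
    preds = [m * x + c for x in xs]
    ss_reg = sum((yp - y_bar) ** 2 for yp in preds)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return ss_reg / ss_tot


m, c = fit_line(xs, ys)
r2 = r_squared(xs, ys, m, c)
print(f"m = {m:.3f}, c = {c:.3f}, R^2 = {r2:.3f}")
# prints: m = 0.767, c = 1.500, R^2 = 0.441
```

Comparing the printed values with the hand-worked ones is a quick way to catch rounding or transcription slips in the tables.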
In the graph, the green line is the regression line.