Cost Function in Linear Regression

After training our model, we need to see how well it is performing. Accuracy metrics tell us how well the model is performing, but they do not give us any insight into how to improve it. Hence, we need a function that measures the model's error, so we can tell when the model is at its most accurate and hit the sweet spot between an undertrained model and an overtrained model.

A Cost Function is used to measure just how wrong the model is in finding the relation between the input and the output. In other words, it tells you how badly your model is predicting.

Linear Regression Cost Function Formula

Suppose that there is a Linear Regression model that fits the data with a straight line. The line is described by the straight-line equation:

y = mx + c

In this equation, two entities have changeable values (parameters): c, the point at which the line intercepts the y-axis, and m, the slope, which determines how steep the line is. At first, if these parameters are not properly optimized, you get a line that does not fit the data well. As you optimize their values, you eventually get the best fit: a straight line running through most of the data points while ignoring the noise and outliers.
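As a minimal sketch in Python (the function name predict and the variables m and c are illustrative, not from any particular library), the straight-line model can be written as a simple prediction function:

import numpy as np

def predict(x, m, c):
    # Predict y for inputs x using the straight line y = m*x + c
    return m * x + c

# Example: a line with slope 2 and y-intercept 1
x = np.array([0.0, 1.0, 2.0, 3.0])
print(predict(x, m=2.0, c=1.0))   # [1. 3. 5. 7.]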

[Figure: a properly fitted Linear Regression line running through most of the data points]

For the Linear Regression model, the cost function is the Mean Squared Error (MSE): the errors are obtained by subtracting the predicted values from the actual values, squaring them, and averaging them over all data points. Training the model means finding the parameter values that minimize this cost.

Cost Function (J) = \frac{1}{n}\sum_{i=1}^{n}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^{2}
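A short Python sketch of this cost, assuming NumPy arrays of predicted and actual values (the function name cost is illustrative):

import numpy as np

def cost(y_pred, y_true):
    # Mean of the squared differences between predicted and actual values
    n = len(y_true)
    return np.sum((y_pred - y_true) ** 2) / n

y_true = np.array([1.0, 3.0, 5.0, 7.0])
y_pred = np.array([1.5, 2.5, 5.5, 6.5])
print(cost(y_pred, y_true))   # 0.25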

By the definition of gradient descent, you have to find the direction in which the error decreases. This is done by differentiating the cost function with respect to each parameter and subtracting a fraction of that gradient from the parameter's current value, which moves the parameters down the slope.

Gradient Descent: \theta_{j} = \theta_{j} - \alpha\frac{\partial J}{\partial \theta_{j}}

In the above equation, alpha (α) is the learning rate. It decides how fast you move down the slope: if alpha is large, you take big steps, and if it is small, you take small steps. If alpha is too large, you can overshoot the minimum-error point entirely and the results will not be accurate; if it is too small, optimization will take too long and waste computational power. Hence you need to choose an optimal value of alpha.
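For the straight-line model y = mx + c with the MSE cost above, one gradient descent update can be sketched as follows (gradient_step and alpha are illustrative names; the derivatives come from differentiating the cost with respect to m and c):

import numpy as np

def gradient_step(x, y, m, c, alpha):
    # One gradient descent update for the parameters of y = m*x + c
    n = len(y)
    error = (m * x + c) - y                  # prediction error for each point
    grad_m = (2.0 / n) * np.sum(error * x)   # dJ/dm
    grad_c = (2.0 / n) * np.sum(error)       # dJ/dc
    # Move each parameter a small step in the negative gradient direction
    return m - alpha * grad_m, c - alpha * grad_c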

Goal of Cost Function

The goal of the Cost Function in Machine Learning is to be minimized: starting from a random point, gradient descent searches for the global minimum, where the slope of the cost curve is almost zero. The gradient at a point is the vector of partial derivatives, whose direction is the direction of the greatest rate of increase of the function. So, starting at a point on the cost surface, to move towards the minimum we should move in the negative direction of the gradient at that point.

To train the model, we predict values for the given independent features, while the dataset contains the corresponding real values. In Regression, the model whose predicted values are closest to the real values is the optimal one. The cost function measures how close the predictions are to the real values, and the Gradient Descent method is used to minimize the cost function.
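Putting the pieces together, a minimal training loop (the synthetic data, learning rate, and number of steps below are chosen purely for illustration) repeatedly applies the update and watches the cost shrink towards the minimum:

import numpy as np

# Synthetic data that roughly follows y = 2x + 1 plus some noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.shape)

m, c, alpha = 0.0, 0.0, 0.01
for step in range(2000):
    error = (m * x + c) - y
    grad_m = (2.0 / len(y)) * np.sum(error * x)
    grad_c = (2.0 / len(y)) * np.sum(error)
    m -= alpha * grad_m
    c -= alpha * grad_c
    if step % 500 == 0:
        print(f"step {step}: cost = {np.mean(error ** 2):.4f}")

print(f"learned m = {m:.3f}, c = {c:.3f}")   # should approach m ~ 2, c ~ 1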