Linear Regression, Part 1 - Cost Functions

Tue December 14, 2021
machine-learning linear-regression cost-functions

The objective of regression is to fit a line through a given data set. Hence, if we are given a set of points;

$$ (x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots ,(x_n, y_n)$$

we want to fit the line $\hat{y} = a_0 + a_1 x$ through the given set of data. The equation $ a_0 + a_1 x$ is known as the hypothesis function.

linear regression

It would be trivial if a line could exactly represent the data provided. Usually, the points will be scattered. Some distance will be present between the line that fits best and the actual data. This distance is the error, also called the residual. It represents the difference between the observed values and the predicted values, that is, for any point $i$,

$$(Residual)_i$$ $$ = \hat{y}_i-y_i$$
$$= y_i -(a_0 + a_1 x_i)$$
$$= y_i - a_0 - a_1 x_i$$

The objective is to make the aggregation of residuals as small as possible. This is where the concept of the cost function comes in. The cost function collects all the residuals as a single number and, therefore, measures how well the model is performing. Several cost functions are used; we will examine the three most popular in this post.

Mean Error

This method averages all residuals and is, therefore, the most basic of the cost functions;

$$ME = \frac{\sum_{i=1}^{n} \hat{y}_i-y_i}{n}$$

It is somewhat limited as negative value residuals eliminate positive value residuals. Therefore it is possible to obtain a Mean Error of zero for a terrible approximation.

This method is hence seldom used in practice.

Mean Absolute Error

A logical next step to counter for the deficiencies of the Mean Error calculation is to make all residuals positive by considering the absolute value of the residuals;

$$ME = \frac{\sum_{i=1}^{n} | \hat{y}_i-y_i | }{n}$$

Mean Absolute Error is used where the data being considered contains outliers.

Mean Square Error

Another alternative to counteract the limitations of Mean error is to square the values of the residuals;

$$MSE = \frac{\sum_{i=1}^{n} ( \hat{y}_i-y_i )^{2} }{n}$$

Mean Square Error diminishes the effect of negligible residuals. Still, it also amplifies the other residuals, which can be small in some instances. This method is therefore not recommended when the data contains outliers.

Root Mean Square Error

The Root Mean Square Error is typically used for regression problems when the data does not contain much noise. Similarly to MSE, RMSE will give a higher weight to large residuals.

$$RMSE = \sqrt{ \frac{\sum_{i=1}^{n} ( \hat{y}_i-y_i )^{2} }{n} }$$

Notes about Azure ML, Part 10 - An end-to-end AzureML example; Model Optimization

Creation and execution of an AzureML Model Optimization Experiment
machine-learning azure ml hyperparameter tuning model optimization

Notes about Azure ML, Part 9 - An end-to-end AzureML example Pipeline creation and execution

Creation and execution of a multi-step AzureML pipeline the selects the best model for a given dataset.
machine-learning azure ml pipeline

Notes about Azure ML, Part 8 - An end-to-end AzureML example; Workspace creation and data upload

In this first post of a series that will cover an end-to-end machine learning project in Azure Ml we will look at how to create a workspace and upload data to it.
machine-learning azure ml dataset datastore
comments powered by Disqus

machine-learning 25 python 21 fuzzy 14 hugo_cms 11 azure-ml 10 linear-regression 10 gradient-descent 9 type2-fuzzy 8 type2-fuzzy-library 8 type1-fuzzy 5 cnc 4 dataset 4 datastore 4 it2fs 4 excel 3 paper-workout 3 r 3 c 2 c-sharp 2 experiment 2 iot 2 programming 2 robotics 2 weiszfeld_algorithm 2 arduino 1 automl 1 classifier 1 computation 1 cost-functions 1 development 1 embedded 1 fuzzy-logic 1 game 1 hyperparameter-tuning 1 javascript 1 learning 1 mathjax 1 maths 1 model-optimization 1 mxchip 1 pandas 1 pipeline 1 random_walk 1 roc 1 tools 1 vscode 1 wsl 1