Linear Regression, Part 2 - Deriving the Univariate Case

Mon December 20, 2021
machine-learning linear-regression gradient-descent python

This post is a continuation of a previous post, where the cost functions used in linear regression were introduced. We will start by revisiting the mean square error (MSE) cost function;

$$MSE = \frac{\sum_{i=1}^{n} ( \hat{y}_i-y_i )^{2} }{n}$$

which, as explained in the previous post, is

$$MSE = \frac{\sum_{i=1}^{n} (y_i-a_0-a_1 x_i)^{2} }{n}$$
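As a quick sanity check, the MSE for a candidate line can be computed directly. A minimal sketch in Python; the sample data here is made up purely for illustration:

```python
import numpy as np

def mse(x, y, a0, a1):
    """Mean square error of the line y_hat = a0 + a1 * x."""
    y_hat = a0 + a1 * x
    return np.mean((y_hat - y) ** 2)

# Made-up data lying exactly on y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

print(mse(x, y, 0.0, 2.0))  # perfect fit -> 0.0
print(mse(x, y, 0.0, 1.0))  # residuals -1,-2,-3,-4 -> mean of 1,4,9,16 = 7.5
```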

The objective is to adjust $a_0$ and $a_1$ such that the MSE is minimized. This is achieved by differentiating the MSE with respect to $a_0$ and $a_1$, and finding the minimum by equating the derivatives to zero.

$$\frac{\partial MSE}{\partial a_0} = 0$$

and

$$\frac{\partial MSE}{\partial a_1} = 0$$

Now,

$$\frac{\partial MSE}{\partial a_0} = \frac{\sum_{i=1}^{n} 2( y_i-a_0-a_1 x_i )(-1) }{n}$$

$$ = \frac{2}{n} \sum_{i=1}^{n} (-y_i+a_0+a_1 x_i) $$

At minimum, $\frac{\partial MSE}{\partial a_0} = 0$, i.e.

$$\frac{2}{n} \sum_{i=1}^{n} (-y_i+a_0+a_1 x_i) = 0 $$

$$\sum_{i=1}^{n} (-y_i+a_0+a_1 x_i) = 0 $$

$$-\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0 $$

$$\sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = \sum_{i=1}^{n} y_i$$

or

$$ n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

Similarly,

$$\frac{\partial MSE}{\partial a_1} = \frac{\sum_{i=1}^{n} 2( y_i-a_0-a_1 x_i )(-x_i) }{n}$$

$$ = \frac{2}{n} \sum_{i=1}^{n} ( y_i-a_0-a_1 x_i )(-x_i)$$

$$ = \frac{2}{n} \sum_{i=1}^{n} (-x_i y_i + a_0 x_i + a_1 x_i^2) $$

At minimum, $\frac{\partial MSE}{\partial a_1} = 0$, i.e.

$$\frac{2}{n} \sum_{i=1}^{n} (-x_i y_i + a_0 x_i + a_1 x_i^2) = 0 $$

$$\sum_{i=1}^{n} (-x_i y_i + a_0 x_i + a_1 x_i^2) = 0 $$

$$ - \sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0 $$

$$\sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = \sum_{i=1}^{n} x_i y_i $$

This can be written in matrix form as

$ \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix} $ $ \begin{pmatrix} a_0 \\
a_1 \end{pmatrix} = $ $ \begin{pmatrix} \sum_{i=1}^{n} y_i \\
\sum_{i=1}^{n} x_i y_i \end{pmatrix} $
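As a numerical cross-check, this 2×2 system can be assembled and solved directly. A minimal sketch with NumPy; the sample data is made up for illustration:

```python
import numpy as np

# Made-up data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

n = len(x)
# Normal-equation matrix and right-hand side, exactly as in the matrix form above.
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

a0, a1 = np.linalg.solve(A, b)
print(a0, a1)  # intercept and slope of the least-squares line
```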

This can be solved using Cramer's rule.

$$ a_0 = \frac { \begin{vmatrix} \sum_{i=1}^{n} y_i & \sum_{i=1}^{n} x_i\\
\sum_{i=1}^{n} x_i y_i & \sum_{i=1}^{n} x_i^2 \end{vmatrix} }{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} $$

$$=\frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} $$

Similarly,

$$ a_1 = \frac { \begin{vmatrix} n & \sum_{i=1}^{n} y_i\\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i y_i \end{vmatrix} }{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} $$

$$=\frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} $$

$$ =\frac{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$

We also note that, rearranging the first normal equation,

$$ n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$ n a_0 = \sum_{i=1}^{n} y_i - a_1 \sum_{i=1}^{n} x_i$$

$$ a_0 = \frac{\sum_{i=1}^{n} y_i}{n} - a_1 \frac{\sum_{i=1}^{n} x_i}{n}$$

$$ = \bar{y} - a_1 \bar{x}$$
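The closed-form expressions for $a_1$ and $a_0$ translate directly into code. A minimal sketch, cross-checked against `np.polyfit`; the sample data is made up for illustration:

```python
import numpy as np

# Made-up data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# a1 = (sum x_i y_i - n * x_bar * y_bar) / (sum x_i^2 - n * x_bar^2)
a1 = ((x * y).sum() - n * x_bar * y_bar) / ((x ** 2).sum() - n * x_bar ** 2)
# a0 = y_bar - a1 * x_bar
a0 = y_bar - a1 * x_bar

print(a0, a1)

# np.polyfit returns coefficients highest degree first: [slope, intercept].
a1_ref, a0_ref = np.polyfit(x, y, 1)
```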
