Linear Regression, Part 4 - The Multi-variable Scenario

Wed December 22, 2021
machine-learning linear-regression gradient-descent python

Introduction

In previous posts we discussed the univariate linear regression model and how we can implement it in Python.

We have seen how we can fit a line, $\hat{y} = a_0 + a_1 x$, to a dataset of given points, and how linear regression estimates the values of $a_0$ and $a_1$ by minimizing a cost function. We have also seen that the residual is the difference between the observed and the predicted value, that is, for any point $i$,

$$e_i = y_i - \hat{y}_i$$

We have looked at the Mean Square Error, the sum of the squared residuals divided by the number of points; our objective is to make this aggregate of the squared residuals as small as possible:

$$\underset{a_0, a_1}{\operatorname{argmin}} \; \frac{\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2}{n}$$

We have seen that when we differentiate the cost function with respect to $a_0$ and $a_1$ and equate to zero, we obtain

$$a_0 = \bar{y} - a_1 \bar{x}$$

and

$$a_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$
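
As a quick recap, here is a minimal sketch of these two expressions in Python with NumPy, on a made-up dataset generated purely for illustration (the variable names are ours, not from the earlier posts):

```python
import numpy as np

# illustrative data: y is roughly 2 + 3x plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 3 * x + rng.normal(scale=0.5, size=x.size)

n = x.size
x_bar, y_bar = x.mean(), y.mean()

# a1 = (sum(x*y) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2)
a1 = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x ** 2) - n * x_bar ** 2)
# a0 = y_bar - a1*x_bar
a0 = y_bar - a1 * x_bar

print(a0, a1)  # expected to be close to 2 and 3
```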

Multi-variable Case

Most real-world problems have multiple features, and therefore our approximation is a hyperplane, which is a linear combination of the features, expressed as

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m$$

where $m$ is the number of features and, as before, $n$ is the number of data points.

Hence, if we define

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \dots & x_{1m} \\ 1 & x_{21} & x_{22} & \dots & x_{2m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{nm} \end{pmatrix}$$

$$\beta = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{pmatrix}$$

then,

$$\hat{Y} = X\beta$$

and the residuals are

$$E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} y_1 - \hat{y}_1 \\ y_2 - \hat{y}_2 \\ \vdots \\ y_n - \hat{y}_n \end{pmatrix} = Y - \hat{Y}$$

We will here introduce the residual sum-of-squares (RSS) cost function, which is very similar to the mean square error but is defined as

$$RSS = \sum_{i=1}^{n} e_i^2$$

We have noticed in the previous cases that the division by the number of points is eliminated when the cost function is differentiated and equated to zero, so omitting it here does not change the parameters that minimize the cost.

We also notice that

$$RSS = E^T E = (Y - \hat{Y})^T (Y - \hat{Y}) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$$
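
As a quick numerical sanity check that the expanded form agrees with $E^T E$, here is a small sketch with NumPy; the matrices are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 3                                               # 20 points, 3 features
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])  # design matrix with bias column
Y = rng.normal(size=(n, 1))
beta = rng.normal(size=(m + 1, 1))

# direct form: E'E
E = Y - X @ beta
rss_direct = (E.T @ E).item()

# expanded form: Y'Y - Y'X b - b'X'Y + b'X'X b
rss_expanded = (Y.T @ Y - Y.T @ X @ beta - beta.T @ X.T @ Y
                + beta.T @ X.T @ X @ beta).item()

print(np.isclose(rss_direct, rss_expanded))  # True
```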

Matrix Differentiation

Before we continue, we will first remind ourselves of the following:

If we are given a vector $x$ and a matrix $A$ that does not depend on $x$, where $x$ is an $m \times 1$ column vector and $A$ has compatible dimensions ($m \times m$ in the quadratic form below), then:

for $y = A$, $\quad \frac{dy}{dx} = 0$,

for $y = Ax$, $\quad \frac{dy}{dx} = A$,

for $y = x^T A$, $\quad \frac{dy}{dx} = A^T$,

for $y = x^T A x$, $\quad \frac{dy}{dx} = 2x^T A$ (when $A$ is symmetric).
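
These identities are easy to verify numerically. Below is a small sketch (our own, with made-up values) that checks the quadratic-form rule against a central finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4
A = rng.normal(size=(m, m))
A = (A + A.T) / 2              # make A symmetric so that dy/dx = 2 x'A holds
x = rng.normal(size=m)

f = lambda v: v @ A @ v        # y = x'Ax, a scalar

# numerical gradient via central differences
eps = 1e-6
num_grad = np.array([
    (f(x + eps * np.eye(m)[j]) - f(x - eps * np.eye(m)[j])) / (2 * eps)
    for j in range(m)
])

analytic_grad = 2 * x @ A      # the stated identity

print(np.allclose(num_grad, analytic_grad, atol=1e-5))  # True
```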

Hence, differentiating the cost function with respect to β,

$$\frac{\partial RSS}{\partial \beta} = 0 - Y^T X - (X^T Y)^T + 2\beta^T X^T X = -Y^T X - Y^T X + 2\beta^T X^T X = -2Y^T X + 2\beta^T X^T X$$

For the minimum $RSS$, $\frac{\partial RSS}{\partial \beta} = 0$, hence

$$2\beta^T X^T X = 2Y^T X$$

$$\beta^T X^T X = Y^T X$$

$$\beta^T = Y^T X (X^T X)^{-1}$$

and therefore

$$\beta = (X^T X)^{-1} X^T Y$$
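
A minimal sketch of this result in Python with NumPy, on a made-up dataset generated purely for illustration. Note that in practice the system is solved with `np.linalg.solve` (or `np.linalg.lstsq`) rather than by forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# made-up data generated from y = 1 + 2*x1 - 3*x2 plus a little noise
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(scale=0.1, size=n)

# design matrix: a column of ones followed by the feature columns
X = np.column_stack([np.ones(n), x1, x2])

# beta = (X'X)^-1 X'Y, computed by solving the normal equations
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # approximately [1, 2, -3]
```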

Two-variable case equations

For the scenario where we have only two features, so that $\hat{y} = a_0 + a_1 x_1 + a_2 x_2$, we can obtain the following equations for the parameters $a_0$, $a_1$ and $a_2$:

$$a_1 = \frac{\sum_{i=1}^{n} X_{2i}^2 \sum_{i=1}^{n} X_{1i} y_i - \sum_{i=1}^{n} X_{1i} X_{2i} \sum_{i=1}^{n} X_{2i} y_i}{\sum_{i=1}^{n} X_{1i}^2 \sum_{i=1}^{n} X_{2i}^2 - \left( \sum_{i=1}^{n} X_{1i} X_{2i} \right)^2}$$

$$a_2 = \frac{\sum_{i=1}^{n} X_{1i}^2 \sum_{i=1}^{n} X_{2i} y_i - \sum_{i=1}^{n} X_{1i} X_{2i} \sum_{i=1}^{n} X_{1i} y_i}{\sum_{i=1}^{n} X_{1i}^2 \sum_{i=1}^{n} X_{2i}^2 - \left( \sum_{i=1}^{n} X_{1i} X_{2i} \right)^2}$$

and

$$a_0 = \bar{y} - a_1 \bar{x}_1 - a_2 \bar{x}_2$$

where

$$\sum_{i=1}^{n} X_{1i}^2 = \sum_{i=1}^{n} x_{1i}^2 - \frac{\left( \sum_{i=1}^{n} x_{1i} \right)^2}{n}$$

$$\sum_{i=1}^{n} X_{2i}^2 = \sum_{i=1}^{n} x_{2i}^2 - \frac{\left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}$$

$$\sum_{i=1}^{n} X_{1i} y_i = \sum_{i=1}^{n} x_{1i} y_i - \frac{\sum_{i=1}^{n} x_{1i} \sum_{i=1}^{n} y_i}{n}$$

$$\sum_{i=1}^{n} X_{2i} y_i = \sum_{i=1}^{n} x_{2i} y_i - \frac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}$$

$$\sum_{i=1}^{n} X_{1i} X_{2i} = \sum_{i=1}^{n} x_{1i} x_{2i} - \frac{\sum_{i=1}^{n} x_{1i} \sum_{i=1}^{n} x_{2i}}{n}$$

that is, the capital-$X$ sums are the sums of squares and cross-products of the deviations of each variable from its mean.

It is evident that deriving and evaluating these closed-form expressions becomes increasingly cumbersome as we add more features, which is why the matrix formulation derived above is generally preferred.
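
As a sanity check, here is a small self-contained sketch (made-up data, illustrative variable names) that computes $a_1$, $a_2$ and $a_0$ from the sums above and compares the result with the matrix solution:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(scale=0.1, size=n)

# centred sums of squares and cross-products, as defined above
S11 = np.sum(x1 ** 2) - np.sum(x1) ** 2 / n
S22 = np.sum(x2 ** 2) - np.sum(x2) ** 2 / n
S1y = np.sum(x1 * y) - np.sum(x1) * np.sum(y) / n
S2y = np.sum(x2 * y) - np.sum(x2) * np.sum(y) / n
S12 = np.sum(x1 * x2) - np.sum(x1) * np.sum(x2) / n

den = S11 * S22 - S12 ** 2
a1 = (S22 * S1y - S12 * S2y) / den
a2 = (S11 * S2y - S12 * S1y) / den
a0 = y.mean() - a1 * x1.mean() - a2 * x2.mean()

# matrix solution for comparison
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose([a0, a1, a2], beta))  # True
```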



