This post is a continuation of a previous post, where the cost functions used in linear regression were introduced. We will start by revisiting the mean square error (MSE) cost function:
$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}$$
which, as explained in the previous post, is
$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2}{n}$$
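As a quick aside, here is a minimal Python sketch of this cost function. The data below is a made-up toy example, assumed purely for illustration:

```python
import numpy as np

# Toy data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

def mse(a0, a1, x, y):
    """Mean square error of the candidate line y_hat = a0 + a1 * x."""
    residuals = y - a0 - a1 * x
    return np.sum(residuals ** 2) / len(x)

# Evaluate the cost for one candidate pair of coefficients.
print(mse(0.5, 1.8, x, y))
```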
The objective is to adjust $a_0$ and $a_1$ such that the MSE is minimized. This is achieved by differentiating the MSE with respect to $a_0$ and $a_1$, and finding the minimum by setting each derivative equal to zero:
$$\frac{\partial\,\mathrm{MSE}}{\partial a_0} = 0$$
and
$$\frac{\partial\,\mathrm{MSE}}{\partial a_1} = 0$$
Now,
$$\frac{\partial\,\mathrm{MSE}}{\partial a_0} = \frac{\sum_{i=1}^{n} 2(y_i - a_0 - a_1 x_i)(-1)}{n}$$
$$= \frac{2}{n}\sum_{i=1}^{n} \left(-y_i + a_0 + a_1 x_i\right)$$
At the minimum, $\frac{\partial\,\mathrm{MSE}}{\partial a_0} = 0$, i.e.
$$\frac{2}{n}\sum_{i=1}^{n} \left(-y_i + a_0 + a_1 x_i\right) = 0$$
$$\sum_{i=1}^{n} \left(-y_i + a_0 + a_1 x_i\right) = 0$$
$$-\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0$$
$$\sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = \sum_{i=1}^{n} y_i$$
or
$$n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
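Note that this equation says the residuals sum to zero at the optimum. As a quick numerical sanity check, here is a sketch that assumes the same kind of toy data and uses np.polyfit as a reference fit:

```python
import numpy as np

# Toy data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Reference least-squares fit; np.polyfit returns [slope, intercept] for degree 1.
a1, a0 = np.polyfit(x, y, 1)

# Sum of residuals: numerically zero, i.e. n*a0 + a1*sum(x) = sum(y).
print(np.sum(y - a0 - a1 * x))
```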
Similarly,
$$\frac{\partial\,\mathrm{MSE}}{\partial a_1} = \frac{\sum_{i=1}^{n} 2(y_i - a_0 - a_1 x_i)(-x_i)}{n}$$
$$= \frac{2}{n}\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-x_i)$$
$$= \frac{2}{n}\sum_{i=1}^{n} \left(-x_i y_i + a_0 x_i + a_1 x_i^2\right)$$
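Both partial derivatives can also be checked symbolically. The sketch below uses SymPy on a tiny symbolic sample of three points; the sample size and symbol names are arbitrary choices for illustration:

```python
import sympy as sp

n = 3                        # small symbolic sample, enough to see the pattern
a0, a1 = sp.symbols('a0 a1')
xs = sp.symbols('x1:4')      # x1, x2, x3
ys = sp.symbols('y1:4')      # y1, y2, y3

# MSE = (1/n) * sum((y_i - a0 - a1*x_i)**2)
mse = sp.Rational(1, n) * sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(xs, ys))

# Printed in expanded form; each equals (2/n)*sum(-y_i + a0 + a1*x_i) and
# (2/n)*sum(-x_i*y_i + a0*x_i + a1*x_i**2) respectively, as derived above.
print(sp.expand(sp.diff(mse, a0)))
print(sp.expand(sp.diff(mse, a1)))
```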
At the minimum, $\frac{\partial\,\mathrm{MSE}}{\partial a_1} = 0$, i.e.
$$\frac{2}{n}\sum_{i=1}^{n} \left(-x_i y_i + a_0 x_i + a_1 x_i^2\right) = 0$$
$$\sum_{i=1}^{n} \left(-x_i y_i + a_0 x_i + a_1 x_i^2\right) = 0$$
$$-\sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0$$
$$\sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = \sum_{i=1}^{n} x_i y_i$$
or
$$a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
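In the same spirit as the earlier check, this equation says the residuals are orthogonal to the $x_i$ at the optimum. A short numerical sketch with assumed toy data:

```python
import numpy as np

# Toy data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Reference least-squares fit; np.polyfit returns [slope, intercept] for degree 1.
a1, a0 = np.polyfit(x, y, 1)

# Weighted residual sum: numerically zero,
# i.e. a0*sum(x) + a1*sum(x**2) = sum(x*y).
print(np.sum(x * (y - a0 - a1 * x)))
```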
This can be written in matrix form as
$$\begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
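In code, this 2x2 system can be assembled and solved directly. The following is a minimal sketch with assumed toy data, using np.linalg.solve rather than Cramer's rule:

```python
import numpy as np

# Toy data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
n = len(x)

# Coefficient matrix and right-hand side of the system above.
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

a0, a1 = np.linalg.solve(A, b)   # intercept and slope
print(a0, a1)
```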
This can be solved using Cramer’s rule:
$$a_0 = \frac{\begin{vmatrix} \sum_{i=1}^{n} y_i & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i y_i & \sum_{i=1}^{n} x_i^2 \end{vmatrix}}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$= \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
Similarly,
$$a_1 = \frac{\begin{vmatrix} n & \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i y_i \end{vmatrix}}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$= \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
Dividing the numerator and denominator by $n$ and using $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, this simplifies to
$$a_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$
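The mean-based form of $a_1$ translates almost directly into code; a minimal sketch, again assuming toy data:

```python
import numpy as np

# Toy data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# a1 = (sum(x_i*y_i) - n*x_bar*y_bar) / (sum(x_i**2) - n*x_bar**2)
a1 = ((x * y).sum() - n * x_bar * y_bar) / ((x ** 2).sum() - n * x_bar ** 2)
print(a1)
```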
We also note that, rearranging the earlier equation
$$n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$n a_0 = \sum_{i=1}^{n} y_i - a_1 \sum_{i=1}^{n} x_i$$
$$a_0 = \frac{\sum_{i=1}^{n} y_i}{n} - a_1 \frac{\sum_{i=1}^{n} x_i}{n}$$
$$= \bar{y} - a_1 \bar{x}$$
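Putting the two closed-form expressions together gives a complete fit. The sketch below assumes the same toy data and cross-checks the result against np.polyfit:

```python
import numpy as np

# Toy data, assumed purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Slope from the mean-based form, intercept from a0 = y_bar - a1*x_bar.
a1 = ((x * y).sum() - n * x_bar * y_bar) / ((x ** 2).sum() - n * x_bar ** 2)
a0 = y_bar - a1 * x_bar

print(a0, a1)
print(np.polyfit(x, y, 1))   # [slope, intercept]; should agree with the above
```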