## Linear Regression, Part 3 - Univariate Solution Implementation in Python

##### Tue December 21, 2021

This post continues from the derivation of the univariate linear regression model as explained in the previous post. Here we will use the equations derived and the in practice to implement the model.

### Univarite Function

We start this discussion by considering the function used in this post. The function that we will use is

$$y = 2x + 15 + \xi$$

Where $\xi$ is a random variable that will introduce noise to the data. The data is generated using the following code.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# x between 0 and 100 in steps of 1
x = np.arange(0, 101, 1)

# generate a noisy line
l = (2*x) + 15

random_multiplier = 5
e = np.random.randn(len(x))*random_multiplier
y = l + e

# plot the data
plt.plot(x, y)
plt.plot(x, l, '--')
plt.xlim([0, 101])
plt.ylim([0, 200])
plt.show()

# save the data to a csv file



Note that the data is saved in a CSV file to use subsequently.

### Finding the Linear Regression Coefficients

In the previous post we discussed how if we are given a set of points;

$$(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots ,(x_n, y_n)$$

It is possible to fit the line

$$\hat{y} = a_0 + a_1 x$$

through the data such that:

$$a_0 = \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i x_i}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}$$

and

$$a_1 = \frac{n \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}$$

We notice that the equations are made of several summations, and it might be helpful to list the most important ones out.

$$\sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} y_i$$
$$\sum_{i=1}^{n} y_i x_i$$
$$\sum_{i=1}^{n} x_i^2$$

We notice that terms such as $n \sum_{i=1}^{n} x_i^2$ or $\sum_{i=1}^{n} x_i^2$ can be generated from the functions listed above.

We can therefore implement the equations above in Python as follows. The relevant information is stored in a pandas dataframe to make it easier to access.

import pandas as pd

# import data from csv
data.columns=['x', 'y']

# add new columns required to solve the problem
data['x_sq'] = data['x']**2
data['xy'] = data['x']*data['y']

# calculate the sums of the data
sum_x = data['x'].sum()
sum_y = data['y'].sum()
sum_x_sq = data['x_sq'].sum()
sum_xy = data['xy'].sum()

n = len(data)
print(f'sum_x: {sum_x}, sum_y: {sum_y}, sum_x_sq: {sum_x_sq}, sum_xy: {sum_xy}, n: {n}')

# calculate the slope and intercept
a_0 = (sum_x_sq*sum_y - sum_x*sum_xy)/(n*sum_x_sq - sum_x**2)

a_1 = (n*sum_xy - sum_x*sum_y)/(n*sum_x_sq - sum_x**2)

print(f'a_0: {a_0}, a_1: {a_1}')


The coefficients obtained depend on the noise available in the data. With the code above, we can see that the coefficients are around $a_0 = 15$ and $a_1 = 2$.

#### Paper Implementation - Uncertain rule-based fuzzy logic systems Introduction and new directions-Jerry M. Mendel; Prentice-Hall, PTR, Upper Saddle River, NJ, 2001,    555pp., ISBN 0-13-040969-3. Example 9-4, page 261

##### October 8, 2022
type2-fuzzy type2-fuzzy-library fuzzy python IT2FS paper-workout

#### Notes about Azure ML, Part 10 - An end-to-end AzureML example; Model Optimization

##### Creation and execution of an AzureML Model Optimization Experiment
machine-learning azure ml hyperparameter tuning model optimization

#### Notes about Azure ML, Part 9 - An end-to-end AzureML example Pipeline creation and execution

##### Creation and execution of a multi-step AzureML pipeline the selects the best model for a given dataset.
machine-learning azure ml pipeline