Linear Regression, Part 3 - Univariate Solution Implementation in Python

Tue December 21, 2021
machine-learning linear-regression gradient-descent python

This post continues from the derivation of the univariate linear regression model presented in the previous post. Here we will use the equations derived there to implement the model in practice.

Univariate Function

We start this discussion by defining the function that will be used to generate the data in this post:

$$y = 2x + 15 + \xi$$

where $\xi$ is a random variable that introduces noise into the data. The data is generated using the following code.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# x between 0 and 100 in steps of 1
x = np.arange(0, 101, 1)

# generate a noisy line
l = (2*x) + 15

# add Gaussian noise scaled by random_multiplier
random_multiplier = 5
e = np.random.randn(len(x))*random_multiplier
y = l + e

# plot the data
plt.plot(x, y)
plt.plot(x, l, '--')
plt.xlim([0, 101])
plt.ylim([0, 230])
plt.show()

# save the data to a csv file; the index column doubles as x
pd.DataFrame(y).to_csv("data.csv", header=False, index=True)

Note that the data is saved to a CSV file so that it can be used again later.
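Because the file is written with header=False and index=True, it contains two unnamed columns: the integer index, which here doubles as x since x runs from 0 to 100 in steps of 1, and the noisy y values. As a quick check (a minimal sketch, not part of the main workflow), the file can be read back and inspected:

import pandas as pd

# read the file back; header=None because no header row was written
check = pd.read_csv("data.csv", header=None, names=['x', 'y'])
print(check.head())
print(len(check))  # expect 101 rows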

Finding the Linear Regression Coefficients

In the previous post we discussed how, given a set of points

$$ (x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots ,(x_n, y_n)$$

it is possible to fit the line

$$ \hat{y} = a_0 + a_1 x$$

through the data such that:

$$a_0 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} $$

and

$$a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} $$

We notice that the equations are built from a handful of summations, so it is helpful to list the most important ones:

$$\sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} y_i$$
$$\sum_{i=1}^{n} x_i y_i$$
$$\sum_{i=1}^{n} x_i^2$$

Terms such as $n \sum_{i=1}^{n} x_i^2$ or $\left(\sum_{i=1}^{n} x_i\right)^2$ can then be formed from the summations listed above together with the number of points $n$.
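As a quick illustration of how these summations translate into code, the following minimal sketch computes them directly with NumPy, assuming the arrays x and y generated earlier are still in memory; the full implementation below instead reads the data back from the CSV file.

import numpy as np

# the four summations required by the closed-form solution
sum_x = np.sum(x)
sum_y = np.sum(y)
sum_xy = np.sum(x * y)
sum_x_sq = np.sum(x**2)
n = len(x)

print(sum_x, sum_y, sum_xy, sum_x_sq, n)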

We can therefore implement the equations above in Python as follows. The data is read back from the CSV file and the relevant quantities are stored in a pandas DataFrame to make them easier to access.

import pandas as pd

# import data from csv; header=None because the file was saved without a header row
data = pd.read_csv("data.csv", header=None)
data.columns = ['x', 'y']

# add new columns required to solve the problem
data['x_sq'] = data['x']**2
data['xy'] = data['x']*data['y']


# calculate the sums of the data
sum_x = data['x'].sum()
sum_y = data['y'].sum()
sum_x_sq = data['x_sq'].sum()
sum_xy = data['xy'].sum()

n = len(data)
print(f'sum_x: {sum_x}, sum_y: {sum_y}, sum_x_sq: {sum_x_sq}, sum_xy: {sum_xy}, n: {n}')

# calculate the slope and intercept
a_0 = (sum_x_sq*sum_y - sum_x*sum_xy)/(n*sum_x_sq - sum_x**2)

a_1 = (n*sum_xy - sum_x*sum_y)/(n*sum_x_sq - sum_x**2)

print(f'a_0: {a_0}, a_1: {a_1}')

The exact coefficients obtained depend on the noise added to the data. Running the code above yields values close to the true coefficients, $a_0 = 15$ and $a_1 = 2$.
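As a sanity check (not part of the derivation above), the same coefficients can be recovered with NumPy's built-in least-squares fit, and the fitted line can be plotted against the data. This is a minimal sketch that assumes the DataFrame data and the coefficients a_0 and a_1 from the code above are available.

import numpy as np
import matplotlib.pyplot as plt

x_vals = data['x'].to_numpy()
y_vals = data['y'].to_numpy()

# np.polyfit returns the coefficients highest degree first: [a_1, a_0]
a_1_check, a_0_check = np.polyfit(x_vals, y_vals, 1)
print(f'polyfit a_0: {a_0_check}, a_1: {a_1_check}')

# plot the fitted line over the data
plt.scatter(x_vals, y_vals, s=10)
plt.plot(x_vals, a_0 + a_1*x_vals, 'r--')
plt.show()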



