Logistic Regression

Mon April 10, 2023
machine-learning

Logistic regression is a machine learning algorithm that is commonly used for binary classification tasks.

Given a feature vector $X \in \mathbb{R}^{n_x}$, the goal of logistic regression is to predict the probability $\hat{y}$ that a binary output variable $y$ takes the value 1 given $X$, that is, $\hat{y} = P(y=1 \mid X)$, where $0 \le \hat{y} \le 1$. For example, in the case of image classification, logistic regression can be used to predict the probability that an image contains a cat.

Logistic regression

The logistic regression model consists of three main components:

- A linear function of the inputs, $z = \omega^T X + b$, defined by a weight vector $\omega \in \mathbb{R}^{n_x}$ and a bias term $b$.
- The sigmoid function, $\sigma(z) = \frac{1}{1+e^{-z}}$, which maps $z$ to a probability $\hat{y} = \sigma(z)$, with $0 < \hat{y} < 1$.
- A loss function, the binary cross-entropy $L(\hat{y}, y) = -(y \log(\hat{y}) + (1-y) \log(1-\hat{y}))$, which measures how well the predicted probability $\hat{y}$ matches the true label $y$.

The weight vector $\omega$ and the bias term $b$ are learned from a labelled training set by minimizing the loss function using techniques such as gradient descent or its variants. Once trained, the logistic regression model can be used to predict the probability of the binary output variable for new input examples.

Sigmoid function

The feedforward process for logistic regression can be described as follows:

$$z = \omega^T X + b$$

$$\hat{y} = \sigma(z) = \frac{1}{1+e^{-z}}$$

$$L(\hat{y}, y) = -(y \log(\hat{y}) + (1-y) \log(1-\hat{y}))$$
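As a concrete illustration, these three steps can be sketched in NumPy (the weight, bias, and input values below are arbitrary, chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x):
    """Feedforward pass: linear combination followed by the sigmoid."""
    z = np.dot(w, x) + b
    return sigmoid(z)

def loss(y_hat, y):
    """Binary cross-entropy loss for a single example."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Illustrative values: a 3-feature input with arbitrary parameters.
w = np.array([0.5, -0.25, 0.1])
b = 0.0
x = np.array([1.0, 2.0, 3.0])
y_hat = forward(w, b, x)  # predicted probability that y = 1
```

With all-zero weights and bias, $z = 0$ and the model outputs $\hat{y} = 0.5$, i.e., complete uncertainty.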

To compute the derivatives, we use the chain rule:

$$\frac{\partial L}{\partial \omega_i} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z} \frac{\partial z}{\partial \omega_i}$$

$$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z} \frac{\partial z}{\partial b}$$

We can then use these derivatives to update the parameters as follows:

$$\omega_i \leftarrow \omega_i - \alpha \frac{\partial L}{\partial \omega_i}$$

and

$$b \leftarrow b - \alpha \frac{\partial L}{\partial b}$$

where $\alpha$ is the learning rate, which controls the step size of the updates. By iteratively performing these updates on a training set, we can find parameters $\omega$ and $b$ that minimize the loss function on the training set.
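Putting the pieces together, a batch gradient-descent training loop can be sketched as follows, using the closed-form gradients $(\hat{y}-y)x_i$ and $(\hat{y}-y)$ derived below. The tiny dataset, learning rate, and epoch count are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, epochs=1000):
    """Batch gradient descent with dL/dw = (y_hat - y) x and
    dL/db = (y_hat - y), averaged over the training set."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        y_hat = sigmoid(X @ w + b)        # forward pass, all examples
        error = y_hat - y                 # (y_hat - y) per example
        grad_w = X.T @ error / n_samples  # average gradient w.r.t. w
        grad_b = error.mean()             # average gradient w.r.t. b
        w -= alpha * grad_w               # w <- w - alpha dL/dw
        b -= alpha * grad_b               # b <- b - alpha dL/db
    return w, b

# Toy 1-feature dataset: the label is 1 when the feature is positive.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(float)
```

On this separable toy data the learned weight is positive, so thresholding $\hat{y}$ at 0.5 recovers all four labels.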

The Derivatives

Let’s begin by computing the derivative of the loss function with respect to the predicted output $\hat{y}$:

$$\frac{\partial L}{\partial \hat{y}} = \frac{\partial}{\partial \hat{y}} \left( -\left( y \log(\hat{y}) + (1-y) \log(1-\hat{y}) \right) \right)$$

Using the chain rule, we get:

$$\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}$$

Here, we have used the fact that $\frac{d \log_a(x)}{dx} = \frac{1}{x \ln(a)}$.

The derivative of the predicted output $\hat{y}$ with respect to $z$:

$$\frac{\partial \hat{y}}{\partial z} = \frac{\partial}{\partial z} \sigma(z) = \frac{\partial}{\partial z} \frac{1}{1+e^{-z}}$$

Using the quotient rule, we get:

$$\frac{\partial \hat{y}}{\partial z} = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}} = \hat{y}(1-\hat{y})$$

Here, we have used the fact that $\sigma(z) = \frac{1}{1+e^{-z}}$, and therefore $1 - \sigma(z) = \frac{e^{-z}}{1+e^{-z}}$.
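This identity is easy to sanity-check numerically. The sketch below compares the analytic derivative $\sigma(z)(1-\sigma(z))$ against a central finite difference at an arbitrary test point:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Analytic derivative from the identity sigma'(z) = sigma(z)(1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numerical_derivative(f, z, h=1e-6):
    """Central finite difference, used as an independent check."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

z = 0.7  # arbitrary test point
analytic = sigmoid_derivative(z)
numeric = numerical_derivative(sigmoid, z)
```

The two values agree to many decimal places; at $z = 0$ the derivative takes its maximum value of $0.25$.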

The derivative of $z$ with respect to $\omega_i$:

$$\frac{\partial z}{\partial \omega_i} = \frac{\partial}{\partial \omega_i} \left( \omega^T x + b \right) = \frac{\partial}{\partial \omega_i} \left( (\omega_1 x_1 + \dots + \omega_i x_i + \dots + \omega_n x_n) + b \right) = x_i$$

Similarly,

$$\frac{\partial z}{\partial b} = \frac{\partial}{\partial b} \left( \omega^T x + b \right) = \frac{\partial}{\partial b} \left( (\omega_1 x_1 + \dots + \omega_i x_i + \dots + \omega_n x_n) + b \right) = 1$$

Therefore,

$$\frac{\partial L}{\partial \omega_i} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z} \frac{\partial z}{\partial \omega_i} = \left( -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} \right) \left( \hat{y}(1-\hat{y}) \right) (x_i) = (\hat{y} - y)\, x_i$$

and

$$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z} \frac{\partial z}{\partial b} = \left( -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} \right) \left( \hat{y}(1-\hat{y}) \right) (1) = \hat{y} - y$$
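A quick gradient check confirms the result. The sketch below compares the analytic gradient $(\hat{y}-y)\,x_i$ with a numerical estimate of $\partial L / \partial \omega_i$ obtained by central differences (the parameter and input values are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of the full forward pass."""
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Arbitrary single example and parameters.
w = np.array([0.3, -0.2])
b = 0.1
x = np.array([1.5, -0.5])
y = 1.0

# Analytic gradients from the derivation above.
y_hat = sigmoid(np.dot(w, x) + b)
grad_w = (y_hat - y) * x   # (y_hat - y) x_i
grad_b = y_hat - y         # (y_hat - y)

# Numerical gradient w.r.t. w[0] via central differences.
h = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0] += h
w_minus[0] -= h
num_grad_w0 = (loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * h)
```

Since $y = 1$ and $\hat{y} < 1$ here, the bias gradient $\hat{y} - y$ is negative, pushing $b$ upward as expected.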

Extreme Cases

When the predicted value is 1, i.e., $\hat{y} = 1$, the derivative of the loss with respect to the predicted output, $\frac{\partial L}{\partial \hat{y}}$, will be undefined because of the term $\frac{1-y}{1-\hat{y}}$ in the equation.

Similarly, in the case where the predicted value is exactly 0, i.e., $\hat{y} = 0$, the derivative of the loss with respect to the predicted output, $\frac{\partial L}{\partial \hat{y}}$, will also be undefined because of the term $\frac{y}{\hat{y}}$ in the equation.

In these cases, the backpropagation step cannot proceed as usual, since the derivative of the loss function with respect to the predicted output is a required component. One approach to address this issue is to add a small value $\epsilon$ inside the logarithms (or, equivalently, to clip $\hat{y}$ to the range $[\epsilon, 1-\epsilon]$) when calculating the loss function, so that the logarithm terms are always well-defined.
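A minimal sketch of this safeguard, assuming a small clipping constant $\epsilon$ (the value $10^{-12}$ is an arbitrary choice):

```python
import numpy as np

def safe_log_loss(y_hat, y, eps=1e-12):
    """Cross-entropy with y_hat clipped to [eps, 1 - eps] so the
    logarithms stay finite even when y_hat is exactly 0 or 1."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Would be -log(0) (infinite) without clipping; finite with it.
worst_case = safe_log_loss(1.0, 0.0)
```

Away from the extremes the clipping has no effect; for example $\hat{y} = 0.5$ with $y = 1$ still gives the usual loss $\log 2$.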

Another approach is to use a modified loss function that does not have this issue, such as the hinge loss used in support vector machines (SVMs). However, it’s worth noting that logistic regression is a popular and effective method for binary classification, and the issue of undefined derivatives is relatively rare in practice.



