Roc Display in Excel

Sat October 3, 2020
machine-learning classifier roc excel

A binary classifier is a function that can be applied to features X to produce a Y value of true (1) or false (0). It is a supervised learning technique; therefore, a test set is extracted from the available data so that the model is validated before deployed in production.

$$ f(x_1, x_2, x_3, \dots, x_n) = Y \in {0, 1} $$

The function will return a value between 0 and 1, and a threshold value is therefore operated to classify the result as a true or false. The model will subsequently classify predictions as true or false according to the threshold value.

Given the above, we can obtain four kinds of results:

An excel sheet has been created to demonstrate these basic concepts. The example in the sheet defines five true and five false samples that are displayed in a graph. A threshold value can be set to determine the TP, TN, FP and FN samples.

Classifier

The number of samples that are TP, TN, FP or FN can be organized in what is known as a confusion matrix. This tool makes it easy to perform calculations that determine the validity of the model at hand.

Accuracy

The accuracy of the model can be defined as the number of samples that were correctly classified divided by the total number of samples, hence,

$$Accuracy=\frac{TP+TN}{TP+FP+TN+FN}$$

Whilst testing using accuracy can be used in some situations, it has a severe flaw that makes it unsuitable in others. In particular, when the number of TN is much greater than TP or vice versa, this measure will always produce a very high result. A typical such scenario when the model at hand is detecting anomalies, for example, tumours from a scanned image.

Precision

A more robust metric is the precision or the ratio of the TP from all the samples classified as true. We, therefore, consider only the positive side of our diagram to obtain this metric:

$$Precision=\frac{TP}{TP+FP}$$

Precision

Receiver Operator Characteristic (ROC)

Two other metrics that can be used are the True Positive Rate (TPR) and the False Positive Rate (FPR). The TPR is the ratio of the correctly classified true samples (TP) and the total number of true samples, (TP and FN), hence,

$$TPR =\frac{TP}{TP+FN}$$

TPR

Conversely, the false positive rate (FPR) is the ratio of the samples incorrectly classified as true (FP) and the total number of false samples (TF and FP), hence.

$$FPR = \frac{FP}{FP+TN}$$

Intuitively we can determine that different values of TPR and FPR are obtained as the threshold value is varied. The (FPR, TPR) pairs obtained for each threshold value can be plotted on what is known as the Receiver Operator Characteristic (ROC) chart.

ROC

The area under the ROC chart is an indicator of the prediction accuracy of the model. The objective of a model is for the ROC curve to reach a TPR value of 1 for the smallest possible value of FPR. Therefore the largest possible area of the ROC curve is the area of a square having sides of 1 and is therefore 1.




Notes about Azure ML, Part 10 - An end-to-end AzureML example; Model Optimization

Creation and execution of an AzureML Model Optimization Experiment
machine-learning azure ml hyperparameter tuning model optimization

Notes about Azure ML, Part 9 - An end-to-end AzureML example Pipeline creation and execution

Creation and execution of a multi-step AzureML pipeline the selects the best model for a given dataset.
machine-learning azure ml pipeline

Notes about Azure ML, Part 8 - An end-to-end AzureML example; Workspace creation and data upload

In this first post of a series that will cover an end-to-end machine learning project in Azure Ml we will look at how to create a workspace and upload data to it.
machine-learning azure ml dataset datastore
comments powered by Disqus


machine-learning 25 python 21 fuzzy 14 hugo_cms 11 azure-ml 10 linear-regression 10 gradient-descent 9 type2-fuzzy 8 type2-fuzzy-library 8 type1-fuzzy 5 cnc 4 dataset 4 datastore 4 it2fs 4 excel 3 paper-workout 3 r 3 c 2 c-sharp 2 experiment 2 iot 2 programming 2 robotics 2 weiszfeld_algorithm 2 arduino 1 automl 1 classifier 1 computation 1 cost-functions 1 development 1 embedded 1 fuzzy-logic 1 game 1 hyperparameter-tuning 1 javascript 1 learning 1 mathjax 1 maths 1 model-optimization 1 mxchip 1 pandas 1 pipeline 1 random_walk 1 roc 1 tools 1 vscode 1 wsl 1