
Lesson 7

DATA SCIENCE AND AUTOMATION COURSE
MASTER DEGREE SMART TECHNOLOGY ENGINEERING

Performance metrics

Teacher: Mirko Mazzoleni
Place: University of Bergamo
Outline
1. Metrics

2. Precision and recall

3. Receiver Operating Characteristic (ROC) curves

Metrics
It is extremely important to use quantitative metrics for evaluating a machine learning model.

• Until now, we relied on the cost function value for regression and classification

• Other metrics can be used to better evaluate and understand the model

• For classification: Accuracy, Precision, Recall, F1-score, ROC curves, …
• For regression: Normalized RMSE, Normalized Mean Absolute Error (NMAE), … (a minimal sketch of these is given below)

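The regression metrics are only listed on the slide; the following is a minimal sketch of one common convention, assuming normalization by the range of the observed values (normalizing by the mean of the observations is also used). The data and function names are illustrative only.

import numpy as np

def normalized_rmse(y_true, y_pred):
    """RMSE normalized by the range of the observed values (one common convention)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

def normalized_mae(y_true, y_pred):
    """Mean Absolute Error normalized by the range of the observed values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    return mae / (y_true.max() - y_true.min())

# Made-up regression targets and predictions.
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.5, 7.0, 9.5]
print(normalized_rmse(y_true, y_pred), normalized_mae(y_true, y_pred))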
Classification case: metrics for skewed classes
Binary (dichotomous) disease classification example

Train a logistic regression model ℎ(𝒙), with 𝑦 = 1 if disease, 𝑦 = 0 otherwise.

Suppose you find that you got 1% error on the test set (99% correct diagnoses).

However, only 0.50% of patients actually have the disease: the 𝑦 = 1 class has very few examples with respect to the 𝑦 = 0 class.

If we use a predictor that always predicts the 𝟎 class, we get 99.5% accuracy!

For skewed classes, the accuracy metric can be deceptive (a minimal numerical sketch is given below).

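A minimal numerical sketch of the situation described above, assuming a synthetic test set of 10,000 patients with the 0.50% disease prevalence from the slide (the dataset itself is made up):

import numpy as np

# Hypothetical test set: 10,000 patients, 0.50% of whom have the disease.
rng = np.random.default_rng(0)
y_true = np.zeros(10_000, dtype=int)
y_true[rng.choice(10_000, size=50, replace=False)] = 1   # 50 positives = 0.50%

# Trivial "classifier" that always predicts the majority class 0.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
recall = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)

print(f"accuracy = {accuracy:.3f}")   # 0.995 -> looks excellent
print(f"recall   = {recall:.3f}")     # 0.000 -> detects no sick patient at all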
Outline
1. Metrics

2. Precision and recall

3. Receiver Operating Characteristic (ROC) curves

Precision and recall
Suppose that 𝑦 = 1 indicates a rare class that we want to detect.

Precision (how precise we are in the detection)
Of all patients for which we predicted 𝑦 = 1, what fraction actually has the disease?

Precision = True Positives / # Predicted Positives = TP / (TP + FP)

Recall (how good we are at detecting)
Of all patients that actually have the disease, what fraction did we correctly detect as having the disease?

Recall = True Positives / # Actual Positives = TP / (TP + FN)

Confusion matrix (rows: predicted class, columns: actual class)

                     Actual 1 (p)           Actual 0 (n)
Predicted 1 (Y)      True positive (TP)     False positive (FP)
Predicted 0 (N)      False negative (FN)    True negative (TN)
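A minimal sketch of how precision and recall can be computed from predicted and actual labels, following the definitions above; the function name and the ten-patient example are illustrative, not from the lecture:

import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (label 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Small made-up example: 10 patients.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(precision_recall(y_true, y_pred))   # (0.75, 0.75)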
Trading off precision and recall
Logistic regression outputs 0 ≤ ℎ(𝒙) ≤ 1:

• Predict 1 if ℎ(𝒙) ≥ 0.5
• Predict 0 if ℎ(𝒙) < 0.5

The threshold can be different from 0.5, and different thresholds correspond to different confusion matrices!

Suppose we want to predict 𝑦 = 1 (disease) only if we are very confident:
• Increase the threshold → higher precision, lower recall

Suppose we want to avoid missing too many cases of disease (avoid false negatives):
• Decrease the threshold → higher recall, lower precision

(A sketch of this trade-off as the threshold varies is given below.)

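A sketch of the precision/recall trade-off as the threshold varies, using made-up scores and labels (not the lecture's data); the exact values depend on the data, but raising the threshold tends to increase precision and decrease recall:

import numpy as np

# Hypothetical predicted probabilities h(x) and true labels.
scores = np.array([0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10])
y_true = np.array([1,    1,    1,    0,    1,    0,    0,    1,    0,    0])

for threshold in [0.3, 0.5, 0.7, 0.9]:
    y_pred = (scores >= threshold).astype(int)          # predict 1 above the threshold
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")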
F1-score
It is usually better to compare models by means of one number only. The F1 score can be used to combine precision and recall:

Average = (P + R) / 2          F1 score = 2 · P · R / (P + R)

• P = 0 or R = 0 ⇒ F1 score = 0
• P = 1 and R = 1 ⇒ F1 score = 1

              Precision (P)   Recall (R)   Average   F1 score
Algorithm 1   0.5             0.4          0.45      0.444
Algorithm 2   0.7             0.1          0.4       0.175
Algorithm 3   0.02            1.0          0.51      0.0392

Algorithm 3 always predicts 1: the average incorrectly says that Algorithm 3 is the best, while the F1 score correctly says that the best is Algorithm 1.
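A small script that reproduces the table above from the formulas, using the precision and recall values of the three algorithms:

# Reproduce the comparison table: average vs F1 score for the three algorithms.
algorithms = {
    "Algorithm 1": (0.5, 0.4),
    "Algorithm 2": (0.7, 0.1),
    "Algorithm 3": (0.02, 1.0),
}

for name, (p, r) in algorithms.items():
    average = (p + r) / 2
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    print(f"{name}: average={average:.3f}  F1={f1:.4f}")

# Algorithm 3 wins on the average (0.51) but has the worst F1 (0.0392),
# matching the slide: the F1 score penalises the degenerate "always predict 1" model.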
Summaries of the confusion matrix
Different metrics can be computed from the confusion matrix, depending on the class of interest (https://en.wikipedia.org/wiki/Precision_and_recall); a few of them are sketched below.

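A minimal sketch of a few summaries commonly derived from the confusion matrix (the Wikipedia page linked above lists many more); the helper name and the tiny example are illustrative:

import numpy as np

def confusion_summaries(y_true, y_pred):
    """A few common summaries derived from the confusion matrix (positive class = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return {
        "accuracy":                      (tp + tn) / (tp + tn + fp + fn),
        "precision":                     tp / (tp + fp),
        "recall (true positive rate)":   tp / (tp + fn),
        "false positive rate":           fp / (fp + tn),
        "specificity (true neg. rate)":  tn / (tn + fp),
    }

print(confusion_summaries([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))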
Outline
1. Metrics

2. Precision and recall

3. Receiver Operating Characteristic (ROC) curves

Ranking instead of classifying
Classifiers such as logistic regression can output a probability of belonging to a class (or something similar).

• We can use this to rank the different instances and take action on the cases at the top of the list

• We may have a budget, so we have to target the most promising individuals

• Ranking enables the use of different techniques for visualizing model performance (a minimal ranking sketch is given below)

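A minimal sketch of ranking instances by predicted score and acting only on the top of the list under a budget; the scores and the budget value are made up:

import numpy as np

# Hypothetical scenario: scores from a classifier, budget to contact only 3 individuals.
scores = np.array([0.12, 0.88, 0.45, 0.97, 0.33, 0.76])   # predicted probability of y = 1
budget = 3

# Rank instances from the highest to the lowest score and act on the top of the list.
ranking = np.argsort(scores)[::-1]          # indices sorted by decreasing score
targeted = ranking[:budget]                 # the `budget` most promising individuals

print("ranking:", ranking)                  # [3 1 5 2 4 0]
print("targeted individuals:", targeted)    # [3 1 5]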
Ranking instead of classifying
Instances ranked by decreasing score of belonging to class 1 (adapted from [1]); the full test set contains 100 actual positives (p) and 100 actual negatives (n):

True class   Score
1            0.99
1            0.98
0            0.96
0            0.90
1            0.88
1            0.87
0            0.85
1            0.80
0            0.70
…

Cutting the ranked list at different points (i.e., changing the classification threshold) produces different confusion matrices (rows: predicted Y/N, columns: actual p/n):

Threshold above all scores:      Y:  0,  0     N: 100, 100
Top 1 instance predicted Y:      Y:  1,  0     N:  99, 100
Top 2 instances predicted Y:     Y:  2,  0     N:  98, 100
Top 3 instances predicted Y:     Y:  2,  1     N:  98,  99
Top 10 instances predicted Y:    Y:  6,  4     N:  94,  96

Different confusion matrices are obtained by changing the threshold.
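A sketch that walks down a ranked list and prints the confusion matrix obtained at each cut point, using only the nine instances visible on the slide (the original example in [1] has 100 positives and 100 negatives, so the counts differ):

import numpy as np

# The ranked instances listed on the slide (true class, score), highest score first.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0])
scores = np.array([0.99, 0.98, 0.96, 0.90, 0.88, 0.87, 0.85, 0.80, 0.70])

# Predicting Y for the top-k instances is equivalent to lowering the threshold,
# and yields a different confusion matrix at each cut point.
for k in range(len(scores) + 1):
    y_pred = np.zeros_like(y_true)
    y_pred[:k] = 1                                  # top-k instances predicted positive
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    print(f"top-{k}: TP={tp} FP={fp} FN={fn} TN={tn}")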
ROC curves
ROC curves are a very general way to represent and compare the performance of different models (on a binary classification task).

[Figure: ROC curve in the (false positive rate, true positive rate) plane, showing the "perfection" point in the top-left corner and the random-guessing diagonal.]

Observations
• (0, 0): predict always negative
• (1, 1): predict always positive
• Diagonal line: random classifier
• Below the diagonal line: worse than a random classifier
• Different classifiers can be compared
• Area Under the Curve (AUC): probability that a randomly chosen positive instance will be ranked ahead of a randomly chosen negative instance

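A minimal sketch computing ROC points and the AUC; the AUC is computed directly as the fraction of (positive, negative) pairs in which the positive instance is ranked ahead, matching the interpretation above. The data reuses the ranked instances listed on the previous slide:

import numpy as np

def roc_points_and_auc(y_true, scores):
    """ROC points (FPR, TPR) at every threshold, plus AUC computed as the
    probability that a random positive is scored above a random negative."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    thresholds = np.sort(np.unique(scores))[::-1]
    points = []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tpr = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
        fpr = np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0)
        points.append((fpr, tpr))

    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Pairwise comparison of every (positive, negative) pair; ties count as 1/2.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))
    return points, auc

y_true = [1, 1, 0, 0, 1, 1, 0, 1, 0]
scores = [0.99, 0.98, 0.96, 0.90, 0.88, 0.87, 0.85, 0.80, 0.70]
points, auc = roc_points_and_auc(y_true, scores)
print(points)
print("AUC =", auc)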