Performance Parameters
Performance metrics
TEACHER: Mirko Mazzoleni
PLACE: University of Bergamo
Outline
1. Metrics
Metrics
It is extremely important to use quantitative metrics for evaluating a machine learning model.
• Until now, we relied on the cost function value for regression and classification
• Other metrics can be used to better evaluate and understand the model
• For classification: Accuracy/Precision/Recall/F1-score, ROC curves, …
• For regression: Normalized RMSE, Normalized Mean Absolute Error (NMAE), …
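As a concrete (hedged) illustration of the regression metrics above, here is a minimal NumPy sketch of normalized RMSE and NMAE with made-up arrays; normalizing by the range of the targets is one common convention, other variants divide by the mean.

```python
import numpy as np

# Hypothetical targets and predictions, just for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.3, 2.9, 6.4, 4.4])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean squared error
mae  = np.mean(np.abs(y_true - y_pred))           # mean absolute error

# One common normalization: divide by the range of the observed targets
y_range = y_true.max() - y_true.min()
nrmse = rmse / y_range
nmae  = mae / y_range

print(f"NRMSE = {nrmse:.3f}, NMAE = {nmae:.3f}")
```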
Classification case: metrics for skewed classes
Binary (dichotomic) disease classification example
• Suppose you find that you get 1% error on the test set (99% correct diagnoses)
• However, if only 0.5% of the patients actually have the disease, a predictor that always predicts the 0 class reaches 99.5% accuracy!
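A minimal sketch of this pitfall, assuming a synthetic test set with roughly 0.5% positives: the trivial "always predict 0" rule reaches about 99.5% accuracy while never detecting a single case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Synthetic skewed labels: roughly 0.5% of patients have the disease (class 1)
y_true = (rng.random(n) < 0.005).astype(int)

# Trivial predictor: always predict the majority class 0
y_pred = np.zeros(n, dtype=int)

accuracy = np.mean(y_pred == y_true)
print(f"Accuracy of 'always 0': {accuracy:.3%}")  # ~99.5%, yet no disease case is ever detected
```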
Precision and recall
Suppose that 𝑦 = 1 in the presence of a rare class that we want to detect.

Confusion matrix (rows: predicted class, columns: actual class):

                        Actual 1 (p)           Actual 0 (n)
Predicted 1 (Y)         True positive (TP)     False positive (FP)
Predicted 0 (N)         False negative (FN)    True negative (TN)

Precision (how trustworthy a positive prediction is)
Of all patients we predicted as having the disease, what fraction actually has the disease?

Precision = True Positives / # Predicted Positive = TP / (TP + FP)

Recall (how good we are at detecting the class)
Of all patients that actually have the disease, what fraction did we correctly detect as having the disease?

Recall = True Positives / # Actual Positive = TP / (TP + FN)
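A minimal sketch of the two definitions, computed directly from true labels and predictions (plain NumPy, hypothetical arrays):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual classes (1 = rare class)
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # predicted classes

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives

precision = tp / (tp + fp)   # of the predicted positives, how many are truly positive
recall    = tp / (tp + fn)   # of the actual positives, how many did we detect

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```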
Trading off precision and recall
Logistic regression: 0 ≤ ℎ(𝒙) ≤ 1
• Predict 1 if ℎ(𝒙) ≥ 0.5
• Predict 0 if ℎ(𝒙) < 0.5
The threshold does not have to be 0.5, and different thresholds correspond to different confusion matrices!
Suppose we want to avoid missing too many cases of the disease (avoid false negatives):
• Decrease the threshold → higher recall, lower precision
Conversely, suppose we want to predict the disease only when we are very confident (avoid false positives):
• Increase the threshold → higher precision, lower recall
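A minimal sketch of the trade-off, assuming we already have predicted probabilities ℎ(𝒙) for a small hypothetical test set: lowering the threshold flags more cases, which raises recall and typically lowers precision.

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0])        # actual classes
h      = np.array([0.95, 0.80, 0.55, 0.50, 0.45,          # predicted probabilities h(x)
                   0.40, 0.35, 0.20, 0.10, 0.05])

def precision_recall(threshold):
    y_pred = (h >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

for t in (0.7, 0.5, 0.3):
    p, r = precision_recall(t)
    print(f"threshold {t:.1f}: precision = {p:.2f}, recall = {r:.2f}")
```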
F1-score
It is usually better to compare models by means of one number only. The F1 score can be used to combine precision and recall:

Average = (P + R) / 2
F1 score = 2 · P · R / (P + R)

• P = 0 or R = 0 ⇒ F1 score = 0
• P = 1 and R = 1 ⇒ F1 score = 1
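A minimal sketch comparing the plain average with the F1 score (the harmonic mean of P and R): unlike the average, F1 collapses towards zero when either precision or recall is near zero.

```python
def f1_score(p, r):
    # Harmonic mean of precision and recall; 0 if either one is 0
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

for p, r in [(0.5, 0.4), (1.0, 0.02), (0.0, 1.0)]:
    print(f"P = {p:.2f}, R = {r:.2f}: average = {(p + r) / 2:.2f}, F1 = {f1_score(p, r):.2f}")
```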
Summaries of the confusion matrix
Different metrics can be computed from the confusion matrix, depending on the class of
interest (https://en.wikipedia.org/wiki/Precision_and_recall)
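As a sketch of how several of those summaries follow from the same four counts (names as on the Wikipedia page linked above; the counts below are hypothetical):

```python
def confusion_summaries(tp, fp, fn, tn):
    # A few common summaries derived from the four confusion-matrix counts
    return {
        "precision (PPV)":       tp / (tp + fp),
        "recall / sensitivity":  tp / (tp + fn),
        "specificity (TNR)":     tn / (tn + fp),
        "false positive rate":   fp / (fp + tn),
        "accuracy":              (tp + tn) / (tp + fp + fn + tn),
    }

print(confusion_summaries(tp=30, fp=10, fn=20, tn=940))
```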
Ranking instead of classifying
Classifiers such as logistic regression can output a probability of belonging to a class (or something similar).
• We can use this to rank the different instances and take actions on the cases at the top of the list
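A minimal sketch of the idea, with hypothetical instance identifiers and scores: sort by the model's score and act only on the top of the list.

```python
# Hypothetical (instance_id, predicted score) pairs from a classifier
scored = [("case-07", 0.99), ("case-03", 0.42), ("case-11", 0.98),
          ("case-05", 0.96), ("case-02", 0.90), ("case-09", 0.15)]

# Rank instances by score, highest first
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

# Act only on the k cases at the top of the list (e.g. call them back for screening)
k = 3
top_k = ranked[:k]
print(top_k)   # [('case-07', 0.99), ('case-11', 0.98), ('case-05', 0.96)]
```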
[Figure: instances ranked by decreasing classifier score (0.99, 0.98, 0.96, 0.90, …) together with their true classes, for a test set with 100 positive and 100 negative instances; cutting the ranked list at a different point yields a different confusion matrix, e.g. TP = 1, FP = 0, FN = 99, TN = 100 after the top-ranked instance, TP = 2, FP = 0, FN = 98, TN = 100 after the top two.]
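A minimal sketch of what the figure above illustrates, with a small hypothetical ranked list: walking down the list and cutting after each instance produces a different confusion matrix at every cut-off.

```python
# Hypothetical (true class, score) pairs, already ranked by decreasing score
ranked = [(1, 0.99), (1, 0.98), (0, 0.96), (0, 0.90), (1, 0.70), (0, 0.40)]

n_pos = sum(label for label, _ in ranked)   # total actual positives
n_neg = len(ranked) - n_pos                 # total actual negatives

tp = fp = 0
for i, (label, score) in enumerate(ranked, start=1):
    tp += label                             # cumulative true positives above the cut-off
    fp += 1 - label                         # cumulative false positives above the cut-off
    fn, tn = n_pos - tp, n_neg - fp         # everything below the cut-off is predicted negative
    print(f"cut after top {i}: TP={tp} FP={fp} FN={fn} TN={tn}")
```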
ROC curves
ROC curves are a very general way to represent and compare the performance of different models (on a binary classification task).

[Figure: ROC curve, true positive rate vs. false positive rate; the top-left corner ("perfection") is the ideal classifier, the diagonal corresponds to random guessing.]

Observations
• (0, 0): always predict negative
• (1, 1): always predict positive
• Diagonal line: random classifier
• Below the diagonal line: worse than a random classifier
• Different classifiers can be compared
• Area Under the Curve (AUC): probability that a randomly chosen positive instance will be ranked ahead of a randomly chosen negative instance
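A minimal sketch with plain NumPy and hypothetical scores: ROC points are obtained by sweeping a threshold over the scores, and the AUC is estimated directly from its probabilistic definition (libraries such as scikit-learn offer ready-made functions for the same purpose).

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])       # actual classes
scores = np.array([0.95, 0.85, 0.80, 0.60, 0.55,
                   0.45, 0.40, 0.30, 0.20, 0.10])        # classifier scores

# ROC points: sweep a threshold over every distinct score value
roc_points = []
for t in np.unique(scores)[::-1]:
    y_pred = scores >= t
    tpr = np.sum(y_pred & (y_true == 1)) / np.sum(y_true == 1)   # true positive rate
    fpr = np.sum(y_pred & (y_true == 0)) / np.sum(y_true == 0)   # false positive rate
    roc_points.append((fpr, tpr))
print(roc_points)   # moves from near (0, 0) towards (1, 1)

# AUC via its probabilistic definition:
# P(score of a randomly chosen positive > score of a randomly chosen negative)
pos, neg = scores[y_true == 1], scores[y_true == 0]
auc = np.mean(pos[:, None] > neg[None, :])
print(f"AUC = {auc:.2f}")
```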