
Binary classification performance measures cheat sheet

Damien François, v1.0 - 2009 ([email protected])


Confusion matrix for two possible outcomes p (positive) and n (negative):

                        Actual
                        p                 n                 Total
  Predicted   p'        true positive     false positive    P'
              n'        false negative    true negative     N'
  Total                 P                 N                 total

True positive rate: proportion of actual positives which are predicted positive
TP / (TP + FN)

True negative rate: proportion of actual negatives which are predicted negative
TN / (TN + FP)

Youden's index: summarises sensitivity and specificity in a single value
sensitivity - (1 - specificity), i.e. sensitivity + specificity - 1
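
As an illustration (not part of the original sheet), a minimal Python sketch that tallies the four counts from two 0/1 label sequences and derives the rates above; the function name and toy data are ours:

```python
def confusion_counts(actual, predicted):
    """Tally TP, FP, FN, TN from two sequences of 0/1 labels (1 = positive)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
tpr = tp / (tp + fn)       # true positive rate (sensitivity)
tnr = tn / (tn + fp)       # true negative rate (specificity)
youden = tpr + tnr - 1     # Youden's index: sensitivity - (1 - specificity)
print(tpr, tnr, youden)
```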
Classification accuracy
(TP + TN) / (TP + TN + FP + FN)

Error rate
(FP + FN) / (TP + TN + FP + FN)

Paired criteria

Precision (or positive predictive value): proportion of predicted positives which are actual positives
TP / (TP + FP)

Recall: proportion of actual positives which are predicted positive
TP / (TP + FN)

Sensitivity: proportion of actual positives which are predicted positive
TP / (TP + FN)

Specificity: proportion of actual negatives which are predicted negative
TN / (TN + FP)

Positive likelihood: likelihood that a predicted positive is an actual positive
sensitivity / (1 - specificity)

Negative likelihood: likelihood that a predicted negative is an actual negative
specificity / (1 - sensitivity)

Combined criteria

F-measure: harmonic mean between precision and recall
2 · (precision · recall) / (precision + recall)

F_β measure: weighted harmonic mean between precision and recall
(1 + β²) · TP / ((1 + β²) · TP + β² · FN + FP)

Matthews correlation: correlation between the actual and predicted labels
(TP · TN - FP · FN) / ((TP + FP) (TP + FN) (TN + FP) (TN + FN))^(1/2)
comprised between -1 and 1

BCR: Balanced Classification Rate
½ · (TP / (TP + FN) + TN / (TN + FP))

BER: Balanced Error Rate, or HTER: Half Total Error Rate
1 - BCR

Discriminant power: normalised likelihood index
sqrt(3)/π · (log(sensitivity / (1 - specificity)) + log(specificity / (1 - sensitivity)))
<1 = poor, >3 = good, fair otherwise

Graphical tools

ROC curve (receiver operating characteristic curve): 2-D curve in the true positive rate / false positive rate space, parametrized by one parameter of the classification algorithm, e.g. some threshold.

AUC: the area under the ROC curve, comprised between 0 and 1 (both are sketched in code after the references).

(Cumulative) lift chart: plot of the true positive rate as a function of the proportion of the population being predicted positive, controlled by some classifier parameter (e.g. a threshold).

Relationships

sensitivity = recall = true positive rate
specificity = true negative rate
accuracy = 1 - error rate
F-measure = F1 measure (the F_β measure with β = 1)
BCR = ½ · (sensitivity + specificity)
Youden's index = 2 · BCR - 1

The harmonic mean between specificity and sensitivity is also often used and sometimes referred to as F-measure.

References

Sokolova, M. and Lapalme, G. 2009. A systematic analysis of performance measures for classification tasks. Information Processing & Management 45(4), 427-437.

Demšar, J. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1-30.
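
The scalar measures above translate directly into code; a minimal sketch (ours, not from the sheet) deriving them from the four confusion-matrix counts:

```python
import math

def classification_scores(tp, fp, fn, tn, beta=1.0):
    """Scalar measures above, derived from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                # = sensitivity = true positive rate
    f_beta = ((1 + beta ** 2) * tp /
              ((1 + beta ** 2) * tp + beta ** 2 * fn + fp))
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"precision": precision, "recall": recall,
            "F%g" % beta: f_beta, "MCC": mcc}

# beta = 1 gives the usual F1 measure (harmonic mean of precision and recall).
print(classification_scores(tp=40, fp=10, fn=5, tn=45))
```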

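For the graphical tools, the ROC curve can be traced by sweeping a threshold over classifier scores, and the AUC then follows from the trapezoidal rule; a self-contained sketch with made-up scores (all names ours):

```python
def roc_points(scores, labels):
    """One (FPR, TPR) point per threshold; assumes both classes are present."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def auc(points):
    """Area under the ROC curve by the trapezoidal rule; lies between 0 and 1."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # classifier scores (made up)
labels = [1,   1,   0,   1,   0,   0]     # actual classes
print(auc(roc_points(scores, labels)))    # 8/9 ≈ 0.89 on this toy data
```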
Regression performance measures cheat sheet

Damien François, v0.9 - 2009 ([email protected])

Let (x_i, y_i), i = 1..n, be a set of input/output pairs and f a function (the model) such that ŷ_i = f(x_i) for each i; the residuals are e_i = y_i - ŷ_i.

Absolute error

MAD Mean Absolute Deviation
(1/n) · Σ_i |y_i - ŷ_i|

MAPE Mean Absolute Percentage Error
(100/n) · Σ_i |(y_i - ŷ_i) / y_i|

Squared error

SSE Sum of Squared Errors, or RSS Residual Sum of Squares
Σ_i (y_i - ŷ_i)²

MSE Mean Squared Error
(1/n) · Σ_i (y_i - ŷ_i)²

RMSE Root Mean Squared Error
sqrt(MSE)

NMSE Normalised Mean Squared Error
MSE / var(y), where var is the empirical variance in the sample.

Robust error measures

Median Squared Error
median of the squared residuals (y_i - ŷ_i)²

α-trimmed MSE
mean of the set of squared residuals where α percent of the largest values are discarded.

M-estimators
Σ_i ρ(y_i - ŷ_i), where ρ is a non-negative function with a minimum in 0, like the parabola, the Huber function, or the bisquare function.

Predicted error

PRESS Predicted REsidual Sum of Squares
Σ_i (y_i - ŷ_(i))², where ŷ_(i) is the prediction of y_i by a model built without the i-th pair. For a linear model it can be computed without refitting as Σ_i (e_i / (1 - h_ii))², where the h_ii are the diagonal entries of the hat matrix H = X (XᵀX)⁻¹ Xᵀ, X is a matrix built by stacking the x_i in rows, and Y is the vector of y_i (so that Ŷ = H · Y).

GCV Generalised Cross Validation
n · SSE / (n - tr(H))², with H the hat matrix defined above.
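
A compact sketch (ours) computing the absolute, squared, and robust measures above on plain Python lists; it assumes no y_i is zero (MAPE is undefined there):

```python
import math

def regression_errors(y, yhat, alpha=10):
    """Absolute, squared, and trimmed error measures on plain lists."""
    n = len(y)
    res2 = sorted((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    mad = sum(abs(yi - fi) for yi, fi in zip(y, yhat)) / n
    mape = 100.0 / n * sum(abs((yi - fi) / yi) for yi, fi in zip(y, yhat))
    sse = sum(res2)
    mse = sse / n
    ybar = sum(y) / n
    var = sum((yi - ybar) ** 2 for yi in y) / n
    # alpha-trimmed MSE: drop the largest alpha percent of squared residuals
    # (with only a handful of points, nothing may actually be dropped).
    keep = res2[: n - int(alpha / 100 * n)]
    return {"MAD": mad, "MAPE": mape, "SSE": sse, "MSE": mse,
            "RMSE": math.sqrt(mse), "NMSE": mse / var,
            "trimmed MSE": sum(keep) / len(keep)}

print(regression_errors([3.1, 2.0, 5.4, 4.2], [3.0, 2.5, 5.0, 4.0]))
```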

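For the predicted-error measures, a sketch for an ordinary least-squares model, assuming NumPy is available; the synthetic data and variable names are ours:

```python
import numpy as np

# Ordinary least-squares fit on synthetic data (the data are made up).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])  # x_i in rows, plus intercept
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=30)

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix: yhat = H @ y
e = y - H @ y                           # ordinary residuals
h = np.diag(H)
press = np.sum((e / (1 - h)) ** 2)      # leave-one-out residuals without refitting
n = len(y)
gcv = n * np.sum(e ** 2) / (n - np.trace(H)) ** 2
print(press, gcv)
```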
R-squared
R² = 1 - MSE / var(y) = 1 - NMSE, where var is the empirical variance in the sample.

Information criteria

AIC Akaike Information Criterion
n · log(SSE/n) + 2k (up to an additive constant), where k is the number of parameters in the model.

BIC Bayesian Information Criterion
n · log(SSE/n) + k · log(n) (up to an additive constant), where k is the number of parameters in the model.
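
In code, with the least-squares forms above (a sketch; the example numbers are made up):

```python
import math

def aic_bic(sse, n, k):
    """Least-squares forms of AIC and BIC (up to additive constants)."""
    aic = n * math.log(sse / n) + 2 * k
    bic = n * math.log(sse / n) + k * math.log(n)
    return aic, bic

# Lower is better; BIC penalises parameters more heavily as soon as log(n) > 2.
print(aic_bic(sse=12.5, n=100, k=3))
print(aic_bic(sse=11.9, n=100, k=6))
```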

Graphical tool

Plot of predicted value against actual value. A perfect model places all dots on the diagonal.

Resampling methods

LOO Leave-one-out: build the model on n - 1 data elements and test it on the remaining one. Iterate n times to collect all test errors and compute the mean error.

X-Val Cross validation: randomly split the data in two parts, use the first one to build the model and the second one to test it. Iterate to get a distribution of the test error of the model.

K-Fold: cut the data into K parts. Build the model on the first K - 1 parts and test it on the K-th one. Iterate from 1 to K, rotating which part is held out, to get a distribution of the test error of the model.

Bootstrap: draw a random subsample of the data with replacement and build the model on it. Compute the difference between its error on the whole dataset and its training error, and iterate to get a distribution of such values. The mean of the distribution is the optimism. The bootstrap error estimate is the training error on the whole dataset plus the optimism.
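
A sketch of K-fold and the bootstrap optimism estimate; fit and mse are hypothetical callbacks (any model constructor and any error function), not something from the original sheet:

```python
import random

def kfold_errors(xs, ys, fit, mse, K=5, seed=0):
    """Distribution of the test error over K folds; fit(xs, ys) returns a predictor."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::K] for i in range(K)]
    errs = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs.append(mse([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return errs

def bootstrap_error(xs, ys, fit, mse, B=200, seed=0):
    """Training error on the whole dataset plus the mean optimism over B resamples."""
    rnd = random.Random(seed)
    n = len(xs)
    optimism = []
    for _ in range(B):
        idx = [rnd.randrange(n) for _ in range(n)]   # draw with replacement
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        err_all = mse(ys, [model(x) for x in xs])    # error on the whole dataset
        err_train = mse([ys[i] for i in idx], [model(xs[i]) for i in idx])
        optimism.append(err_all - err_train)
    model = fit(xs, ys)
    return mse(ys, [model(x) for x in xs]) + sum(optimism) / B

# Toy usage: a "model" that always predicts the training mean.
mse = lambda y, yhat: sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)
fit_mean = lambda xs, ys: (lambda x, m=sum(ys) / len(ys): m)
xs = list(range(20))
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]
print(kfold_errors(xs, ys, fit_mean, mse, K=4))
print(bootstrap_error(xs, ys, fit_mean, mse, B=100))
```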
