
FIX log_loss at boundaries and integer y_pred #24365

Merged
6 commits merged into scikit-learn:main from logloss_xlogy on Sep 27, 2022

Conversation

lorentzenchr
Member

@lorentzenchr lorentzenchr commented Sep 5, 2022

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

This PR fixes log_loss for cases at the boundaries, such as:

import numpy as np
from sklearn.metrics import log_loss

assert log_loss([0, 1], [0, 1], eps=0) == 0
assert log_loss([0, 1], [0, 0], eps=0) == np.inf
assert log_loss([0, 1], [1, 1], eps=0) == np.inf

Note that this also fixes the bug that integer y_pred, as in the test cases above, was not accepted.
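A minimal pure-Python sketch of why the xlogy convention 0 * log(0) = 0 yields these boundary values (this mirrors, but is not, the scikit-learn implementation, which uses scipy.special.xlogy):

```python
import math

def xlogy(x, y):
    # x * log(y) with the convention xlogy(0, y) = 0 even when y == 0,
    # mirroring scipy.special.xlogy (which this PR switches log_loss to)
    if x == 0:
        return 0.0
    if y == 0:
        return -math.inf  # x > 0 here, so x * log(0) diverges to -inf
    return x * math.log(y)

def binary_log_loss(y_true, y_pred):
    # Mean negative Bernoulli log-likelihood, with no clipping (eps=0)
    terms = [xlogy(y, p) + xlogy(1 - y, 1 - p) for y, p in zip(y_true, y_pred)]
    return -sum(terms) / len(terms)

assert binary_log_loss([0, 1], [0, 1]) == 0.0      # perfect predictions
assert binary_log_loss([0, 1], [0, 0]) == math.inf  # confidently wrong
assert binary_log_loss([0, 1], [1, 1]) == math.inf
```

With this convention the perfect-prediction case contributes exactly 0 per sample, and a wrong prediction at probability 0 or 1 diverges to infinity instead of producing nan.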

Old behaviour:

>>> from sklearn.metrics import log_loss
>>> log_loss([0, 1], [0, 1], eps=0)
UFuncTypeError: Cannot cast ufunc 'true_divide' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
>>> log_loss([0, 1.], [0, 1.], eps=0)
nan
>>> log_loss([0, 1.], [0, 0.], eps=0)
nan
>>> log_loss([0, 1.], [1, 1.], eps=0)
nan

@lorentzenchr lorentzenchr added Bug Quick Review For PRs that are quick to review labels Sep 5, 2022
TomDLT (Member) approved these changes Sep 21, 2022 and left a comment:


LGTM

Resolved review threads (now outdated):
sklearn/metrics/_classification.py
sklearn/metrics/tests/test_classification.py
TomDLT approved these changes Sep 26, 2022

@TomDLT (Member) commented Sep 26, 2022:

We also need a changelog entry.

@jjerphan jjerphan (Member) left a comment:


LGTM.

@@ -28,6 +28,7 @@
 from scipy.sparse import coo_matrix
 from scipy.sparse import csr_matrix
+from scipy.special import xlogy

TIL the existence of scipy.special.xlogy!
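For anyone else discovering it: xlogy(x, y) computes x * log(y) but returns 0 whenever x == 0, which is exactly the convention log_loss needs at the boundaries. A few illustrative values (assuming scipy is installed):

```python
import math
from scipy.special import xlogy

assert xlogy(0, 0) == 0.0        # 0 * log(0) -> 0, not nan
assert xlogy(1, 0) == -math.inf  # 1 * log(0) -> -inf
assert math.isclose(xlogy(2, 3), 2 * math.log(3))  # ordinary case: plain x * log(y)
```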

loss_true = -np.mean(bernoulli.logpmf(np.array(y_true) == "yes", y_pred[:, 1]))
assert_almost_equal(loss, loss_true)
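The snippet above is from the updated test. The identity it relies on, namely that the log loss equals the mean negative Bernoulli log-likelihood, can be checked standalone with hypothetical y/p values (assuming numpy and scipy):

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import bernoulli

y = np.array([0, 1, 1, 0])          # hypothetical binary labels
p = np.array([0.2, 0.7, 0.9, 0.4])  # hypothetical predicted P(y=1)

# log loss written via xlogy, as in the fixed implementation
loss_xlogy = -np.mean(xlogy(y, p) + xlogy(1 - y, 1 - p))
# the same quantity via the Bernoulli log-pmf, as in the updated test
loss_pmf = -np.mean(bernoulli.logpmf(y, p))

assert np.isclose(loss_xlogy, loss_pmf)
```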

This is much clearer, thank you.

@jjerphan jjerphan merged commit 681ab94 into scikit-learn:main Sep 27, 2022
31 checks passed
@lorentzenchr lorentzenchr deleted the logloss_xlogy branch Sep 27, 2022
Labels: Bug, module:metrics, Quick Review (For PRs that are quick to review)

3 participants