
FIX log_loss at boundaries and integer y_pred #24365

Merged
6 commits merged into scikit-learn:main from logloss_xlogy on Sep 27, 2022

Conversation

lorentzenchr
Member

@lorentzenchr lorentzenchr commented Sep 5, 2022

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

This PR fixes log_loss for cases at the boundaries, such as:

import numpy as np
from sklearn.metrics import log_loss

assert log_loss([0, 1], [0, 1], eps=0) == 0
assert log_loss([0, 1], [0, 0], eps=0) == np.inf
assert log_loss([0, 1], [1, 1], eps=0) == np.inf

Note that this also fixes the bug that integer y_pred, as in the test cases above, was not accepted.
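A minimal pure-Python sketch of why the xlogy convention 0 * log(0) = 0 yields these boundary values (this mirrors, but is not, the scikit-learn implementation, which uses scipy.special.xlogy):

```python
import math

def xlogy(x, y):
    # x * log(y) with the convention xlogy(0, y) = 0 even when y == 0,
    # mirroring scipy.special.xlogy (which this PR switches log_loss to)
    if x == 0:
        return 0.0
    if y == 0:
        return -math.inf  # x > 0 here, so x * log(0) diverges to -inf
    return x * math.log(y)

def binary_log_loss(y_true, y_pred):
    # Mean negative Bernoulli log-likelihood, with no clipping (eps=0)
    terms = [xlogy(y, p) + xlogy(1 - y, 1 - p) for y, p in zip(y_true, y_pred)]
    return -sum(terms) / len(terms)

assert binary_log_loss([0, 1], [0, 1]) == 0.0      # perfect predictions
assert binary_log_loss([0, 1], [0, 0]) == math.inf  # confidently wrong
assert binary_log_loss([0, 1], [1, 1]) == math.inf
```

With this convention the perfect-prediction case contributes exactly 0 per sample, and a wrong prediction at probability 0 or 1 diverges to infinity instead of producing nan.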

Old behaviour:

>>> from sklearn.metrics import log_loss
>>> log_loss([0, 1], [0, 1], eps=0)
UFuncTypeError: Cannot cast ufunc 'true_divide' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
>>> log_loss([0, 1.], [0, 1.], eps=0)
nan
>>> log_loss([0, 1.], [0, 0.], eps=0)
nan
>>> log_loss([0, 1.], [1, 1.], eps=0)
nan

@lorentzenchr lorentzenchr added Bug Quick Review For PRs that are quick to review labels Sep 5, 2022
TomDLT (Member) approved these changes Sep 21, 2022 and left a comment:


LGTM

Resolved review threads (now outdated):
sklearn/metrics/_classification.py
sklearn/metrics/tests/test_classification.py
TomDLT approved these changes Sep 26, 2022

@TomDLT (Member) commented Sep 26, 2022:

We also need a changelog entry.

@jjerphan jjerphan (Member) left a comment:


LGTM.

@@ -28,6 +28,7 @@
 from scipy.sparse import coo_matrix
 from scipy.sparse import csr_matrix
+from scipy.special import xlogy

TIL the existence of scipy.special.xlogy!
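For anyone else discovering it: xlogy(x, y) computes x * log(y) but returns 0 whenever x == 0, which is exactly the convention log_loss needs at the boundaries. A few illustrative values (assuming scipy is installed):

```python
import math
from scipy.special import xlogy

assert xlogy(0, 0) == 0.0        # 0 * log(0) -> 0, not nan
assert xlogy(1, 0) == -math.inf  # 1 * log(0) -> -inf
assert math.isclose(xlogy(2, 3), 2 * math.log(3))  # ordinary case: plain x * log(y)
```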

loss_true = -np.mean(bernoulli.logpmf(np.array(y_true) == "yes", y_pred[:, 1]))
assert_almost_equal(loss, loss_true)
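The snippet above is from the updated test. The identity it relies on, namely that the log loss equals the mean negative Bernoulli log-likelihood, can be checked standalone with hypothetical y/p values (assuming numpy and scipy):

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import bernoulli

y = np.array([0, 1, 1, 0])          # hypothetical binary labels
p = np.array([0.2, 0.7, 0.9, 0.4])  # hypothetical predicted P(y=1)

# log loss written via xlogy, as in the fixed implementation
loss_xlogy = -np.mean(xlogy(y, p) + xlogy(1 - y, 1 - p))
# the same quantity via the Bernoulli log-pmf, as in the updated test
loss_pmf = -np.mean(bernoulli.logpmf(y, p))

assert np.isclose(loss_xlogy, loss_pmf)
```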

This is much clearer, thank you.

@jjerphan jjerphan merged commit 681ab94 into scikit-learn:main Sep 27, 2022
31 checks passed
@lorentzenchr lorentzenchr deleted the logloss_xlogy branch Sep 27, 2022
Labels: Bug, module:metrics, Quick Review (For PRs that are quick to review)

3 participants