FIX log_loss at boundaries and integer y_pred #24365
Conversation
LGTM
We also need a changelog entry.
LGTM.
@@ -28,6 +28,7 @@
 from scipy.sparse import coo_matrix
+from scipy.sparse import csr_matrix
 from scipy.special import xlogy
TIL the existence of scipy.special.xlogy!
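For context, xlogy is useful here precisely because of its behaviour at the boundaries: it evaluates x * log(y) but returns 0 when x is 0, avoiding the 0 * (-inf) = nan that plain NumPy produces. A small self-contained sketch (the values are illustrative, not from the PR):

```python
import numpy as np
from scipy.special import xlogy

# xlogy(x, y) = x * log(y), with the convention xlogy(0, 0) == 0,
# whereas plain 0.0 * np.log(0.0) evaluates to 0 * -inf == nan.
safe = xlogy(0.0, 0.0)
with np.errstate(divide="ignore", invalid="ignore"):
    naive = 0.0 * np.log(0.0)

# This keeps each log-loss term well defined at the boundaries,
# e.g. a perfectly confident, correct prediction:
y, p = 1.0, 1.0
term = -(xlogy(y, p) + xlogy(1 - y, 1 - p))
```

Here safe and term are exactly 0.0 while naive is nan, which is why the boundary cases stop producing nan once the loss is written in terms of xlogy.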
loss_true = -np.mean(bernoulli.logpmf(np.array(y_true) == "yes", y_pred[:, 1]))
assert_almost_equal(loss, loss_true)
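The reference value in the test is the mean negative Bernoulli log-likelihood, which for binary labels coincides with the log loss. A self-contained sketch of that identity, using made-up labels and probabilities rather than the PR's actual fixtures:

```python
import numpy as np
from scipy.stats import bernoulli

y_true = np.array(["no", "yes", "yes"])  # illustrative labels
y_pred = np.array([[0.8, 0.2],           # column 1 = P(y == "yes")
                   [0.3, 0.7],
                   [0.1, 0.9]])

# Mean negative Bernoulli log-likelihood, as in the test above.
loss_true = -np.mean(bernoulli.logpmf(y_true == "yes", y_pred[:, 1]))

# Equivalent hand-written log loss: -mean(y*log(p) + (1-y)*log(1-p)).
y = (y_true == "yes").astype(float)
manual = -np.mean(y * np.log(y_pred[:, 1]) + (1 - y) * np.log(y_pred[:, 0]))
```

Both expressions agree, so stating the expected loss via bernoulli.logpmf avoids re-deriving the formula by hand in the test.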
This is much clearer, thank you.
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
This PR fixes log_loss for cases at the boundaries (predicted probabilities of exactly 0 or 1). Note that this also fixes the bug of not allowing integer y_pred, as in the test cases above.

Old behaviour:
UFuncTypeError: Cannot cast ufunc 'true_divide' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
nan
nan
nan
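The UFuncTypeError in the old behaviour comes from NumPy refusing to write the float output of true_divide back into an integer array in place. A minimal sketch of that failure mode, independent of scikit-learn (the array here is illustrative):

```python
import numpy as np

proba = np.array([[1, 0], [0, 1]])  # integer "probabilities"
raised = False
try:
    # In-place division asks true_divide to write float64 into an int array.
    proba /= proba.sum(axis=1, keepdims=True)
except TypeError:  # numpy's UFuncTypeError subclasses TypeError
    raised = True

# Casting to float first sidesteps the error, in line with how the fix
# makes integer y_pred acceptable.
proba = proba.astype(np.float64)
proba /= proba.sum(axis=1, keepdims=True)
```

After the cast the normalization succeeds and the rows still sum to 1, so accepting integer y_pred only requires converting to a float dtype before any in-place arithmetic.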