
FIX adapt epsilon value depending on the dtype of the input #24354

Merged
merged 36 commits into scikit-learn:main from Safikh:logloss_float32_fix on Nov 10, 2022

Conversation

@Safikh (Contributor) commented Sep 4, 2022

Reference Issues/PRs

Fixes #24315

What does this implement/fix? Explain your changes.

Change the default epsilon value in log_loss from 1e-15 to "auto", which is the machine epsilon of y_pred's dtype when y_pred is a NumPy float array; otherwise it defaults to 1e-15 as before.
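
A minimal sketch of the new behavior (assuming scikit-learn 1.2, where `eps="auto"` is available; the values here are illustrative, not from the PR):

import numpy as np
from sklearn.metrics import log_loss

y_true = [0, 1, 1]
# A float32 prediction of exactly 1.0: with the old fixed eps=1e-15,
# 1 - 1e-15 rounds back to exactly 1.0 in float32, which could produce
# nan (see #24315).
y_pred = np.array([0.1, 0.9, 1.0], dtype=np.float32)

# With eps="auto" the clipping threshold becomes np.finfo(np.float32).eps
# (~1.19e-7), which float32 can represent, so the loss stays finite.
print(log_loss(y_true, y_pred, eps="auto"))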

Any other comments?

@glemaitre changed the title from "Change default epsilon in logloss metric from 1e-15 to 1e-7" to "FIX adapt epsilon value depending on the dtype of the input" on Sep 5, 2022
@Safikh (author) commented Sep 5, 2022

@glemaitre Made the changes as per your comment.

@glemaitre (Contributor) left a comment

Note to other maintainers: we should not forget to add @gsiisg as co-author when merging the PR.

doc/whats_new/v1.2.rst (outdated, resolved):
- |Fix| :func:`metrics.log_loss` takes "auto" as default eps value and it will be equal to
eps value of the `y_pred` if `y_pred` is numpy float else it will be 1e-15. This change was
made to be able to handle float16 and float32 numpy arrays
Contributor:
Suggested change:

Old:
- |Fix| :func:`metrics.log_loss` takes "auto" as default eps value and it will be equal to
eps value of the `y_pred` if `y_pred` is numpy float else it will be 1e-15. This change was
made to be able to handle float16 and float32 numpy arrays

New:
- |Fix| automatically set `eps` in :func:`metrics.log_loss` depending on the input
arrays. `eps` was previously too small by default when passing lower precision
floating point arrays.

Contributor:
We should also add an entry in the "Change models" section stating that we switch from 1e-15 to np.finfo(np.float64).eps by default.
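
For reference, the machine epsilons in question (standard NumPy values; a quick sketch to show why the default should follow the dtype):

import numpy as np

# eps="auto" tracks the dtype of y_pred instead of the fixed 1e-15,
# which is smaller than the machine epsilon of float16 and float32.
for dtype in (np.float16, np.float32, np.float64):
    print(dtype.__name__, np.finfo(dtype).eps)
# float16 0.000977
# float32 1.1920929e-07
# float64 2.220446049250313e-16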

@Safikh (author):

Where is the "Change models" section in the codebase?

Resolved (outdated) review threads on:
sklearn/metrics/_classification.py (4)
sklearn/metrics/tests/test_classification.py (3)
@Micky774 (Contributor) left a comment

Thanks for the PR @Safikh! A few small notes.

# From the new test; `dtype` is a test parameter (e.g. np.float16).
y_true = [[0, 1]]
y_score = np.array([1], dtype=dtype)
loss = log_loss(y_true, y_score, eps="auto")
assert_allclose(loss, 0.000977, atol=1e-3)
@Micky774 commented Sep 6, 2022

It would be clearer if this value were computed rather than hard-coded.
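
A minimal sketch of that suggestion (assuming scikit-learn 1.2, where `eps="auto"` exists; `np.float16` stands in for the parametrized dtype):

import numpy as np
from numpy.testing import assert_allclose
from sklearn.metrics import log_loss

dtype = np.float16                 # the test parametrizes this value
eps = np.finfo(dtype).eps          # threshold applied by eps="auto"
loss = log_loss([[0, 1]], np.array([1], dtype=dtype), eps="auto")
# A prediction of 1 is clipped to 1 - eps, so the expected loss is
# -log(1 - eps) rather than a hard-coded 0.000977.
assert_allclose(loss, -np.log(1 - eps), atol=1e-3)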

sklearn/metrics/_classification.py (resolved)
- |Fix| automatically set `eps` in :func:`metrics.log_loss` depending on the input
arrays. `eps` was previously too small by default when passing lower precision
floating point arrays.
:pr:`24354` by :user:`Safiuddin Khaja <Safikh>` and
:user:`gsiisg <gsiisg>`.
Contributor:

The changelog entry should reflect that you are adding a new keyword option to eps and changing the default value of eps.

@gsiisg commented Sep 6, 2022

Big thanks to @Safikh and @glemaitre and all contributors!
This is my first time contributing to scikit-learn, so I wasn't familiar with the process and created some confusion with a second pull request; I will stick with this one from now on. One note for the "eps" comment section: every y_pred input passes through sklearn.utils.check_array, which casts plain Python int/float values to int64/float64, so even an input without an explicit dtype is treated with 64-bit precision unless something like np.float16/np.float32 is specified. At first I was worried that y_pred.dtype would error out for an input like [1], but I realized later it is cast to array([1]), which is 64-bit by default.
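
A quick demonstration of that casting behavior (a sketch; the int64 result assumes a platform where NumPy's default integer is 64-bit):

import numpy as np
from sklearn.utils import check_array

print(check_array([1], ensure_2d=False).dtype)    # int64
print(check_array([0.5], ensure_2d=False).dtype)  # float64

# An explicit low-precision dtype is preserved when it appears in the
# allowed dtype list:
x16 = np.asarray([0.5], dtype=np.float16)
print(check_array(x16, ensure_2d=False,
                  dtype=[np.float64, np.float32, np.float16]).dtype)  # float16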

@Safikh (author) commented Sep 7, 2022

I'm facing a weird bug: if I set eps to the dtype's eps, the test sklearn/metrics/tests/test_common.py::test_not_symmetric_metric fails. If I set it to anything else (not an exact multiple of eps), it passes.

@gsiisg left a comment

I think we should avoid hard-coding dtype=np.float64 in check_array(); it would blow up the memory used if the original input was np.float16/32, wouldn't it? And if y_pred is cast to 64-bit anyway, then we wouldn't have had the problem with eps=1e-15 in the first place.

@Micky774 (Contributor) commented Sep 8, 2022

> I think we should avoid hard-coding dtype=np.float64 in check_array(); it would blow up the memory used if the original input was np.float16/32, wouldn't it? And if y_pred is cast to 64-bit anyway, then we wouldn't have had the problem with eps=1e-15 in the first place.

You're right. I think it should instead be dtype=[np.float64, np.float32, np.float16], in which case check_array converts to np.float64 only when the input is not already one of those floating types. In the int{32, 64} case it will convert to np.float64, which should be fine.
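
To make the memory concern concrete (plain NumPy; the array length is arbitrary):

import numpy as np

# Upcasting float16 to float64 quadruples the memory footprint.
y16 = np.zeros(1_000_000, dtype=np.float16)
y64 = y16.astype(np.float64)
print(y16.nbytes, y64.nbytes)  # 2000000 vs 8000000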

doc/whats_new/v1.2.rst (outdated, resolved)
sklearn/metrics/_classification.py (resolved)
y_pred = check_array(
y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps * 1.0001 if eps == "auto" else eps
Contributor:

What is the reason for multiplying by 1.0001? This looks really arbitrary.

@Safikh (author) commented Sep 12, 2022

The test sklearn/metrics/tests/test_common.py::test_not_symmetric_metric fails for eps. I have not been able to identify why, but a very minute change to eps seems to fix it.

Contributor:

Yep, the test is not adapted for log_loss. Indeed, the loss is symmetric when the inputs contain only 0/1 values, but not otherwise. Basically, since the metric takes y_true and predicted probabilities, a symmetry test makes little sense.

We should correct this.
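
A small check of that observation (a sketch, assuming a scikit-learn version where log_loss clips predictions to [eps, 1 - eps]):

import numpy as np
from sklearn.metrics import log_loss

# With only 0/1 values the loss is symmetric: every mismatch costs
# -log(eps) and every match costs -log(1 - eps), in both directions.
a = np.array([0, 1, 1, 0])
b = np.array([1, 1, 0, 0])
print(log_loss(a, b) == log_loss(b, a))  # True

# With genuine probabilities the two arguments play different roles
# (labels vs. predicted probabilities), so a symmetry test makes little sense.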

@Safikh (author):

So, should I remove log_loss from the symmetric metrics? Then there would be no need to modify the eps?

Contributor:

Yes, we need to remove the loss from the symmetric metrics and remove this 1.0001.

@glemaitre glemaitre self-requested a review Sep 12, 2022
- |Fix| add a `"auto"` option to `eps` in :func:`metrics.log_loss`.
This option will automatically set the `eps` value depending on the data
type `y_pred`.
@Micky774 (Contributor) commented Sep 12, 2022

The wording here still needs to be more explicit.

Also, at this point I would consider this an enhancement which also happens to fix a bug. Wondering what the maintainers think.

Suggested change:

Old:
- |Fix| add a `"auto"` option to `eps` in :func:`metrics.log_loss`.
This option will automatically set the `eps` value depending on the data
type `y_pred`.

New:
- |Enhancement| Adds an `"auto"` option to `eps` in :func:`metrics.log_loss`.
This option will automatically set the `eps` value depending on the data
type of `y_pred`. In addition, the default value of `eps` is changed from
`1e-15` to the new `"auto"` option.

Contributor:

Fair enough.

@glemaitre glemaitre self-requested a review Sep 13, 2022
@Safikh (author) commented Sep 29, 2022

Hi @glemaitre, I think this PR is complete. Is there anything that needs to be changed?

@glemaitre (Contributor) left a comment

We need to remove this 1.0001.

sklearn/metrics/_classification.py (outdated, resolved)
y_pred = check_array(
y_pred, ensure_2d=False, dtype=[np.float64, np.float32, np.float16]
)
eps = np.finfo(y_pred.dtype).eps * 1.0001 if eps == "auto" else eps
Contributor:

Yes, we need to remove the loss from the symmetric metrics and remove this 1.0001.

@glemaitre (Contributor) left a comment

LGTM. Thanks @Safikh.

@Micky774 @ogrisel do you want to have a look?

@Micky774 (Contributor) left a comment

Overall looks good. I had a couple of small wording nits and a concern regarding testing the np.float16 dtype. If those are addressed, this should be ready to merge.

doc/whats_new/v1.2.rst (outdated, resolved)
sklearn/metrics/_classification.py (outdated, resolved)
sklearn/metrics/tests/test_classification.py (resolved)
sklearn/conftest.py (outdated, resolved)
@Micky774 (Contributor) left a comment

LGTM. @glemaitre feel free to merge if the changes are still acceptable to you

@glemaitre glemaitre merged commit f8986ee into scikit-learn:main Nov 10, 2022
25 checks passed
@glemaitre (Contributor):

Thanks @Safikh. Merging.

@gsiisg commented Nov 10, 2022

Thanks everyone!

@Safikh Safikh deleted the logloss_float32_fix branch Nov 11, 2022
Development

Successfully merging this pull request may close these issues.

log_loss giving nan when input is np.float32 and eps is default (#24315)