The Wayback Machine - https://web.archive.org/web/20250521192844/https://github.com/scikit-learn/scikit-learn/pull/10612
Added support for not normalising OneVsRestClassifier predict_proba result #10612

Open
Wants to merge 11 commits into main

Conversation

@EmreAtes commented Feb 9, 2018

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

I added support for not normalizing the predict_proba result, and returning it as it is, even if the data is not multilabel. This is useful for implementing a rejection threshold.

Any other comments?

I can also add thresholding in OneVsRestClassifier or somewhere else to reject classification results, but I didn't see anything like that in scikit-learn, so I don't know whether it's a wanted feature.
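
For context, a minimal sketch of the rejection-threshold use case (editorial illustration, not part of this PR's diff; the base estimator, dataset, and threshold value are arbitrary assumptions). It recovers the un-normalized per-class scores directly from the fitted binary estimators, which is what the proposed option would expose through predict_proba:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=200, n_classes=3, n_informative=6,
                           random_state=0)
ovr = OneVsRestClassifier(RandomForestClassifier(random_state=0)).fit(X, y)

# Column 1 of each binary estimator's predict_proba is its positive-class
# probability, before OneVsRestClassifier normalizes the rows to sum to 1
# (the normalization this PR proposes to make optional).
raw = np.array([est.predict_proba(X)[:, 1] for est in ovr.estimators_]).T

threshold = 0.6  # arbitrary rejection threshold
best = raw.argmax(axis=1)
pred = np.where(raw.max(axis=1) >= threshold, ovr.classes_[best], -1)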

Member

@jnothman left a comment

I don't think we want predict_proba to ever return anything but normalized probabilities. Is there a reason decision_function doesn't suit your needs?

@EmreAtes
Author

I was using it on random forests and decision trees as well, which do not have a decision_function.

@jnothman
Member

jnothman commented Feb 11, 2018 via email

@EmreAtes
Author

As far as I know, the decision function is -1 for a confident -1 class and +1 for a confident +1 class, but it's not necessarily in the [-1, 1] range, and it represents the distance from the hyperplane. For random forest probabilities, would the decision function be equal to p(x) * 2 - 1, or would it make more sense to calculate a distance from the decision boundaries of the random forest (i.e. for each tree node, how far we were from the condition in the node)?

If it's documented that it's not necessarily equivalent to the SVM's decision_function, then it might be OK to define decision_function like you suggest.

@jnothman
Member

jnothman commented Feb 12, 2018 via email

@EmreAtes
Author

I changed some things so that decision_function returns predict_proba * 2 - 1 when the base estimator does not define decision_function. I added a test for this case.

  • I noticed that decision_function returns an array of shape (n_samples,) instead of (n_samples, 1) if there is a single class. I don't know if this is intended functionality, but it's not in the doctest, so I removed it; I can put it back.
  • I changed if_delegate_has_method to include a fallback method, so it can check for either predict_proba or decision_function, but I'm not sure if this is useful for other cases as well.
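
As a rough illustration of the fallback described above (a sketch assuming a toy binary problem, not the PR's actual diff), a tree-based estimator with no decision_function can have its positive-class probability mapped from [0, 1] into [-1, 1]:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy binary problem; decision trees expose predict_proba but no decision_function.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier().fit(X, y)

proba = clf.predict_proba(X)[:, 1]  # positive-class probability in [0, 1]
pseudo_df = proba * 2 - 1           # mapped into [-1, 1], the proposed fallback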

@sklearn-lgtm

This pull request introduces 37 alerts and fixes 18 - view on lgtm.com

new alerts:

  • 37 for Explicit export is not defined

fixed alerts:

  • 6 for Mismatch between signature and use of an overridden method
  • 3 for Missing call to __init__ during object initialization
  • 3 for Potentially uninitialized local variable
  • 2 for Comparison using is when operands support __eq__
  • 2 for Wrong number of arguments for format
  • 1 for Conflicting attributes in base classes
  • 1 for Mismatch between signature and use of an overriding method

Comment posted by lgtm.com

@jnothman
Member

jnothman commented Feb 13, 2018 via email

@EmreAtes
Author

@jnothman any comments on the current state?

@jnothman
Member

jnothman commented Feb 19, 2018 via email

Member

@jnothman left a comment

I suppose you had better add a test for that binary case and note it in the docstring

except AttributeError:
    T = np.array([e.predict_proba(X)[:, 1] * 2 - 1
                  for e in self.estimators_]).T
if len(self.estimators_) == 1:
Member

Let's simplify by just placing this logic below, to select the column instead
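
A sketch of the restructuring being suggested (hypothetical code, not the final diff): build the full score array for all estimators first, and select the single column only at the end when the problem is binary:

import numpy as np

def decision_function_sketch(estimators, X):
    # Hypothetical restructuring: compute scores for every binary estimator,
    # then reduce to a 1-D array afterwards when there is a single estimator.
    def score(est):
        if hasattr(est, 'decision_function'):
            return est.decision_function(X)
        return est.predict_proba(X)[:, 1] * 2 - 1

    T = np.array([score(est) for est in estimators]).T
    if len(estimators) == 1:
        return T[:, 0]  # shape (n_samples,) for the binary case
    return T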

@EmreAtes
Author

@jnothman , does it look better now?

Member

@jnothman left a comment

I'm not altogether happy about making if_delegate_has_method less understandable

@@ -87,10 +87,11 @@ class _IffHasAttrDescriptor(object):
     See https://docs.python.org/3/howto/descriptor.html for an explanation of
     descriptors.
     """
-    def __init__(self, fn, delegate_names, attribute_name):
+    def __init__(self, fn, delegate_names, attribute_name, backup_attribute):
Member

Here attribute_name should become a list, like delegate_names.

However I wonder if we should just use a property rather than making magic.

@@ -118,7 +124,7 @@ def __get__(self, obj, type=None):
         return out


-def if_delegate_has_method(delegate):
+def if_delegate_has_method(delegate, backup_method=None):
Member

Please document the parameter below

@EmreAtes
Author

@jnothman , I'm open to making a new decorator, but couldn't think of one that is useful in multiple places because I'm not very familiar with the entire codebase. Do you have a specific design in mind?

The remainder of the code is fixed.

@jnothman
Member

jnothman commented Feb 27, 2018 via email

@EmreAtes
Author

@jnothman, I added a second decorator to utils/metaestimators.py. I can rename it if needed, since it sounds very close to the original, but I think the name is explanatory enough.

Member

@jnothman left a comment

I preferred the previous version to this extra decorator. The name is no clearer by making it plural; plural makes it seem like all must be present.

I don't want to complicate the existing decorator if we can help it either, but the solution is to use a property, not a new descriptor. Sigh, perhaps I'll live with the existing decorator being more complex.

@EmreAtes
Author

EmreAtes commented Mar 6, 2018

@jnothman
For the first point, I agree that the plural makes it seem like all names are needed; in the new descriptor, all names actually are needed. The invocation of the new descriptor is like this:

@if_delegate_has_methods(['_first_estimator', 'estimator'],
                         ['decision_function', 'predict_proba'])
def decision_function(self, X):

As far as I could understand, property is for defining class variables that have getters and setters. Which of the variables would be a property?

@jnothman
Member

jnothman commented Mar 6, 2018

Properties are descriptors which will result in hasattr(..., ...) returning False if the property getter raises an AttributeError. The descriptor is more-or-less equivalent to:

def decision_function(self):
    if not hasattr(self.estimator, 'decision_function') and not hasattr...:
        raise AttributeError
    def _func(X, y):
        # calculate the decision function
        return df
    return _func
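
For readers unfamiliar with the pattern, a self-contained sketch of the property-based idea (an editorial example with a toy wrapper class, not scikit-learn code): because accessing a property that raises AttributeError makes hasattr() return False, the duck-typing check happens at attribute access time:

class Wrapper(object):
    # Toy wrapper illustrating the property-based approach.
    def __init__(self, estimator):
        self.estimator = estimator

    @property
    def decision_function(self):
        # If the wrapped estimator supports neither method, accessing this
        # property raises AttributeError, so hasattr(wrapper, 'decision_function')
        # returns False.
        if not (hasattr(self.estimator, 'decision_function')
                or hasattr(self.estimator, 'predict_proba')):
            raise AttributeError('decision_function is unavailable')

        def _decision_function(X):
            if hasattr(self.estimator, 'decision_function'):
                return self.estimator.decision_function(X)
            return self.estimator.predict_proba(X)[:, 1] * 2 - 1

        return _decision_function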

@EmreAtes
Author

EmreAtes commented Mar 6, 2018

@jnothman, I'm not sure if I understood correctly, so I made a sample: https://pastebin.com/6Y9XUwg0. Is this a better solution?

@jnothman
Member

jnothman commented Mar 6, 2018 via email

@EmreAtes
Author

@jnothman I put it back together as a single decorator

Member

@jnothman left a comment

I guess this LGTM

@jnothman
Member

Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@EmreAtes
Author

@jnothman I added it to what's new. Are pull requests also referenced as :issue:`number`?

@jnothman
Member

jnothman commented Mar 14, 2018 via email

@EmreAtes
Author

@jnothman , it's all set then.

@jnothman
Member

jnothman commented Mar 14, 2018 via email

@amueller
Member

amueller commented Sep 27, 2018

I'm not entirely sure what the motivation for this is.
If the goal is to avoid having to duck-type predict_proba and decision_function, why don't we define decision_function in BaseEstimator?

@EmreAtes
Author

Which base_estimator?

If you mean the BaseEstimator in base.py, I'm not sure the fallback of using 2 * predict_proba - 1 when decision_function is not available is useful in all cases.

I think understanding the difference between the probabilities [0, 0.1, 0.05, 0] and [0, 1, 0.5, 0] for each of the classes in a OneVsRest case is useful, but I'm not sure if this is the best way of going about implementing it. I was initially only thinking of a normalize=True/False kwarg for predict_proba.
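
To make the example above concrete (an editorial illustration, not from the thread), row-normalizing those two score vectors yields the same output, so the difference in confidence is lost:

import numpy as np

a = np.array([0.0, 0.1, 0.05, 0.0])
b = np.array([0.0, 1.0, 0.5, 0.0])

# OneVsRestClassifier normalizes each row to sum to 1; after normalization the
# two very different confidence levels become indistinguishable.
print(a / a.sum())  # -> approximately [0, 0.667, 0.333, 0]
print(b / b.sum())  # -> the same normalized vector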

@amueller
Member

Why do you think it would be useful here and not in other cases? predict_proba has a more limited definition than decision_function (though some people might expect some sigmoid semantics).

Adding kwargs to functions is kind of frowned upon as it's hard to access those in compound estimators like pipelines. And anything that changes the predict_proba contract to not be normalized also seems a bit weird to me.

@amueller
Member

Your goal is basically to implement a rejection threshold, right? So the only reason for this PR is to not do ducktyping in your implementation of the rejection threshold?

I think we'll add thresholding methods soon (hopefully) and I'm pretty sure they will contain the ducktyping. I'm not sure why putting this logic in this particular place would be the right solution.

@EmreAtes
Author

I'm not sure what other cases it would be useful in, but if you think it would, I can add that to BaseEstimator.

Yes, I was trying to implement a rejection threshold for OneVsRestClassifier, so I had to write another class that inherited from OneVsRestClassifier and re-implemented predict_proba, which required knowledge of its internal workings.

I can add an option to normalize or not to the constructor of OneVsRestClassifier, which would also be accessible from a pipeline. Or, there could be another function that gives some per-class confidence.

@amueller
Member

My question was more why you didn't use decision_function. But I guess you take the probabilities at face value and so the actual values matter? Usually I'd ensure calibration or some particular trade-off on the validation set, so it doesn't matter whether you're using predict_proba or decision_function.

@EmreAtes
Author

I was comparing multiple algorithms, and decision_function wasn't implemented for all of them. Now I understand what you meant by duck typing for the rejection threshold implementation. I guess I could have done it outside of OneVsRestClassifier.

For the values, I was calibrating within the training set.

Anyway, this was used for a publication that is already published, so feel free to close this PR without merging; it no longer seems very necessary for other people or for me, especially if there's going to be a rejection threshold implementation soon.

@songololo

songololo commented Jun 21, 2019

I would likewise find an optional normalize=False flag useful.

In my case I would like to plot probability thresholds vs. recall / precision scores for the respective classes; however, since these are in a normalised form, I need to train independent classifiers (i.e. not use OneVsRestClassifier) to get the necessary probabilities.

It would also be interesting from the perspective of considering alternate probability thresholds.

Thanks.
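
A sketch of that workflow (editorial; the dataset and base estimator are placeholder assumptions): per-class precision/recall curves can be computed from the un-normalized score of each underlying binary estimator of a fitted OneVsRestClassifier:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           random_state=0)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
Y = label_binarize(y, classes=ovr.classes_)

for k, est in enumerate(ovr.estimators_):
    # Raw positive-class probability of the k-th binary estimator, before
    # OneVsRestClassifier's row normalization.
    scores = est.predict_proba(X)[:, 1]
    precision, recall, thresholds = precision_recall_curve(Y[:, k], scores)
    # precision and recall can now be plotted against thresholds per class.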

@amueller added the Needs Decision label on Aug 6, 2019
Base automatically changed from master to main January 22, 2021 10:50