The Wayback Machine - https://web.archive.org/web/20250521192844/https://github.com/scikit-learn/scikit-learn/pull/10612
Added support for not normalising OneVsRestClassifier predict_proba result #10612

Open
Wants to merge 11 commits into main

Conversation

@EmreAtes commented Feb 9, 2018

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

I added support for not normalizing the predict_proba result, and returning it as it is, even if the data is not multilabel. This is useful for implementing a rejection threshold.

Any other comments?

I can also add thresholding in OneVsRestClassifier or somewhere else to reject classification results, but I didn't see anything like that in scikit-learn, so I don't know whether it's a wanted feature.
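
For context, a minimal sketch of the rejection-threshold use case (editorial illustration, not part of this PR's diff; the base estimator, dataset, and threshold value are arbitrary assumptions). It recovers the un-normalized per-class scores directly from the fitted binary estimators, which is what the proposed option would expose through predict_proba:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=200, n_classes=3, n_informative=6,
                           random_state=0)
ovr = OneVsRestClassifier(RandomForestClassifier(random_state=0)).fit(X, y)

# Column 1 of each binary estimator's predict_proba is its positive-class
# probability, before OneVsRestClassifier normalizes the rows to sum to 1
# (the normalization this PR proposes to make optional).
raw = np.array([est.predict_proba(X)[:, 1] for est in ovr.estimators_]).T

threshold = 0.6  # arbitrary rejection threshold
best = raw.argmax(axis=1)
pred = np.where(raw.max(axis=1) >= threshold, ovr.classes_[best], -1)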

Member

@jnothman left a comment

I don't think we want predict_proba to ever return anything but normalized probabilities. Is there a reason decision_function doesn't suit your needs?

@EmreAtes
Author

I was using it on random forests and decision trees as well, which do not have a decision_function.

@jnothman
Member

jnothman commented Feb 11, 2018 via email

@EmreAtes
Author

As far as I know, the decision function is -1 for a confident -1 class and +1 for a confident +1 class, but it's not necessarily in the [-1, 1] range, and it represents the distance from the hyperplane. For random forest probabilities, would the decision function be equal to p(x) * 2 - 1, or would it make more sense to calculate a distance from the decision boundaries of the random forest (i.e. for each tree node, how far we were from the condition in the node)?

If it's documented that it's not necessarily equivalent to the SVM's decision_function, then it might be OK to define decision_function like you suggest.

@jnothman
Member

jnothman commented Feb 12, 2018 via email

@EmreAtes
Author

I changed some things so that decision_function returns predict_proba * 2 - 1 when the base estimator does not define decision_function. I added a test for this case.

  • I noticed that decision_function returns an array of shape (n_samples,) instead of (n_samples, 1) if there is a single class. I don't know if this is intended functionality, but it's not in the doctest, so I removed it; I can put it back.
  • I changed if_delegate_has_method to include a fallback method, so it can check for either predict_proba or decision_function, but I'm not sure if this is useful for other cases as well.
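
As a rough illustration of the fallback described above (a sketch assuming a toy binary problem, not the PR's actual diff), a tree-based estimator with no decision_function can have its positive-class probability mapped from [0, 1] into [-1, 1]:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy binary problem; decision trees expose predict_proba but no decision_function.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier().fit(X, y)

proba = clf.predict_proba(X)[:, 1]  # positive-class probability in [0, 1]
pseudo_df = proba * 2 - 1           # mapped into [-1, 1], the proposed fallback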

@sklearn-lgtm

This pull request introduces 37 alerts and fixes 18 - view on lgtm.com

new alerts:

  • 37 for Explicit export is not defined

fixed alerts:

  • 6 for Mismatch between signature and use of an overridden method
  • 3 for Missing call to __init__ during object initialization
  • 3 for Potentially uninitialized local variable
  • 2 for Comparison using is when operands support __eq__
  • 2 for Wrong number of arguments for format
  • 1 for Conflicting attributes in base classes
  • 1 for Mismatch between signature and use of an overriding method

Comment posted by lgtm.com

@jnothman
Member

jnothman commented Feb 13, 2018 via email

@EmreAtes
Author

@jnothman any comments on the current state?

@jnothman
Member

jnothman commented Feb 19, 2018 via email

Member

@jnothman left a comment

I suppose you had better add a test for that binary case and note it in the docstring

except AttributeError:
    T = np.array([e.predict_proba(X)[:, 1] * 2 - 1
                  for e in self.estimators_]).T
if len(self.estimators_) == 1:
Member

Let's simplify by just placing this logic below, to select the column instead
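
A sketch of the restructuring being suggested (hypothetical code, not the final diff): build the full score array for all estimators first, and select the single column only at the end when the problem is binary:

import numpy as np

def decision_function_sketch(estimators, X):
    # Hypothetical restructuring: compute scores for every binary estimator,
    # then reduce to a 1-D array afterwards when there is a single estimator.
    def score(est):
        if hasattr(est, 'decision_function'):
            return est.decision_function(X)
        return est.predict_proba(X)[:, 1] * 2 - 1

    T = np.array([score(est) for est in estimators]).T
    if len(estimators) == 1:
        return T[:, 0]  # shape (n_samples,) for the binary case
    return T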

@EmreAtes
Author

@jnothman , does it look better now?

Member

@jnothman left a comment

I'm not altogether happy about making if_delegate_has_method less understandable

@@ -87,10 +87,11 @@ class _IffHasAttrDescriptor(object):
     See https://docs.python.org/3/howto/descriptor.html for an explanation of
     descriptors.
     """
-    def __init__(self, fn, delegate_names, attribute_name):
+    def __init__(self, fn, delegate_names, attribute_name, backup_attribute):
Member

Here attribute_name should become a list, like delegate_names.

However I wonder if we should just use a property rather than making magic.

@@ -118,7 +124,7 @@ def __get__(self, obj, type=None):
         return out


-def if_delegate_has_method(delegate):
+def if_delegate_has_method(delegate, backup_method=None):
Member

Please document the parameter below

@EmreAtes
Author

@jnothman , I'm open to making a new decorator, but couldn't think of one that is useful in multiple places because I'm not very familiar with the entire codebase. Do you have a specific design in mind?

The remainder of the code is fixed.

@jnothman
Member

jnothman commented Feb 27, 2018 via email

@EmreAtes
Author

@jnothman, I added a second decorator to utils/metaestimators.py. I can rename it if needed, since it sounds very close to the original, but I think the name is explanatory enough.

Member

@jnothman left a comment

I preferred the previous version to this extra decorator. The name is no clearer by making it plural; plural makes it seem like all must be present.

I don't want to complicate the existing decorator if we can help it either, but the solution is to use a property, not a new descriptor. Sigh, perhaps I'll live with the existing decorator being more complex.

@EmreAtes
Author

EmreAtes commented Mar 6, 2018

@jnothman
For the first point, I agree that the plural makes it seem like all names are needed; in the new descriptor, all names actually are needed. The invocation of the new descriptor is like this:

@if_delegate_has_methods(['_first_estimator', 'estimator'],
                         ['decision_function', 'predict_proba'])
def decision_function(self, X):

As far as I could understand, property is for defining class variables that have getters and setters. Which of the variables would be a property?

@jnothman
Member

jnothman commented Mar 6, 2018

Properties are descriptors which will result in hasattr(..., ...) returning False if the property getter raises an AttributeError. The descriptor is more-or-less equivalent to:

def decision_function(self):
    if not hasattr(self.estimator, 'decision_function') and not hasattr...:
        raise AttributeError
    def _func(X, y):
        # calculate the decision function
        return df
    return _func
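
For readers unfamiliar with the pattern, a self-contained sketch of the property-based idea (an editorial example with a toy wrapper class, not scikit-learn code): because accessing a property that raises AttributeError makes hasattr() return False, the duck-typing check happens at attribute access time:

class Wrapper(object):
    # Toy wrapper illustrating the property-based approach.
    def __init__(self, estimator):
        self.estimator = estimator

    @property
    def decision_function(self):
        # If the wrapped estimator supports neither method, accessing this
        # property raises AttributeError, so hasattr(wrapper, 'decision_function')
        # returns False.
        if not (hasattr(self.estimator, 'decision_function')
                or hasattr(self.estimator, 'predict_proba')):
            raise AttributeError('decision_function is unavailable')

        def _decision_function(X):
            if hasattr(self.estimator, 'decision_function'):
                return self.estimator.decision_function(X)
            return self.estimator.predict_proba(X)[:, 1] * 2 - 1

        return _decision_function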

@EmreAtes
Author

EmreAtes commented Mar 6, 2018

@jnothman, I'm not sure if I understood correctly, so I made a sample: https://pastebin.com/6Y9XUwg0. Is this a better solution?

@jnothman
Member

jnothman commented Mar 6, 2018 via email

@EmreAtes
Author

@jnothman I put it back together as a single decorator

Member

@jnothman left a comment

I guess this LGTM

@jnothman
Member

Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@EmreAtes
Author

@jnothman I added it to what's new. Are pull requests also referenced as :issue:`number`?

@jnothman
Member

jnothman commented Mar 14, 2018 via email

@EmreAtes
Author

@jnothman , it's all set then.

@jnothman
Member

jnothman commented Mar 14, 2018 via email

@amueller
Member

amueller commented Sep 27, 2018

I'm not entirely sure what the motivation for this is.
If the goal is to avoid having to duck-type predict_proba and decision_function, why don't we define decision_function in BaseEstimator?

@EmreAtes
Author

Which base_estimator?

If you mean the BaseEstimator in base.py, I'm not sure the fallback of using 2 * predict_proba - 1 when decision_function is not available is useful in all cases.

I think understanding the difference between the probabilities [0, 0.1, 0.05, 0] and [0, 1, 0.5, 0] for each of the classes in a OneVsRest case is useful, but I'm not sure if this is the best way of going about implementing it. I was initially only thinking of a normalize=True/False kwarg for predict_proba.
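
To make the example above concrete (an editorial illustration, not from the thread), row-normalizing those two score vectors yields the same output, so the difference in confidence is lost:

import numpy as np

a = np.array([0.0, 0.1, 0.05, 0.0])
b = np.array([0.0, 1.0, 0.5, 0.0])

# OneVsRestClassifier normalizes each row to sum to 1; after normalization the
# two very different confidence levels become indistinguishable.
print(a / a.sum())  # -> approximately [0, 0.667, 0.333, 0]
print(b / b.sum())  # -> the same normalized vector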

@amueller
Member

Why do you think it would be useful here and not in other cases? predict_proba has a more limited definition than decision_function (though some people might expect some sigmoid semantics).

Adding kwargs to functions is kind of frowned upon as it's hard to access those in compound estimators like pipelines. And anything that changes the predict_proba contract to not be normalized also seems a bit weird to me.

@amueller
Member

Your goal is basically to implement a rejection threshold, right? So the only reason for this PR is to not do ducktyping in your implementation of the rejection threshold?

I think we'll add thresholding methods soon (hopefully) and I'm pretty sure they will contain the ducktyping. I'm not sure why putting this logic in this particular place would be the right solution.

@EmreAtes
Author

I'm not sure what other cases it would be useful in, but if you think it would, I can add that to BaseEstimator.

Yes, I was trying to implement a rejection threshold for OneVsRestClassifier, so I had to write another class that inherited from OneVsRestClassifier and re-implemented predict_proba, which required knowledge of its internal workings.

I can add an option to normalize or not to the constructor of OneVsRestClassifier, which would also be accessible from a pipeline. Or, there could be another function that gives some per-class confidence.

@amueller
Member

My question was more why you didn't use decision_function. But I guess you take the probabilities at face value and so the actual values matter? Usually I'd ensure calibration or some particular trade-off on the validation set, so it doesn't matter whether you're using predict_proba or decision_function.

@EmreAtes
Author

I was comparing multiple algorithms, and decision_function wasn't implemented for all of them. Now I understand what you meant by duck typing for the rejection threshold implementation. I guess I could have done it outside of OneVsRestClassifier.

For the values, I was calibrating within the training set.

Anyway, this was used for a publication that is already published, so feel free to close this PR without merging; it no longer seems very necessary for other people or for me, especially if there's going to be a rejection threshold implementation soon.

@songololo

songololo commented Jun 21, 2019

I would likewise find an optional normalize=False flag useful.

In my case I would like to plot probability thresholds vs. recall / precision scores for the respective classes; however, since these are in a normalised form, I need to train independent classifiers (i.e. not use OneVsRestClassifier) to get the necessary probabilities.

It would also be interesting from the perspective of considering alternate probability thresholds.

Thanks.
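
A sketch of that workflow (editorial; the dataset and base estimator are placeholder assumptions): per-class precision/recall curves can be computed from the un-normalized score of each underlying binary estimator of a fitted OneVsRestClassifier:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           random_state=0)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
Y = label_binarize(y, classes=ovr.classes_)

for k, est in enumerate(ovr.estimators_):
    # Raw positive-class probability of the k-th binary estimator, before
    # OneVsRestClassifier's row normalization.
    scores = est.predict_proba(X)[:, 1]
    precision, recall, thresholds = precision_recall_curve(Y[:, k], scores)
    # precision and recall can now be plotted against thresholds per class.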

@amueller added the Needs Decision label on Aug 6, 2019
Base automatically changed from master to main January 22, 2021 10:50