New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SLEP006 - Metadata Routing task list #22893
Comments
Do you see this as a collaborative effort? To what extent can we still use #20350? To what extent can we share test components? Which of these metaestimators have existing fit param routing that needs to be deprecated? |
Your list appears to be missing some |
Also missing are functions in |
Most of what's in #20350 can be used. We can open this for collaboration once we figure out a nice way to deprecate existing routing strategies in our meta-estimators. I'll work on it this week. |
Also updated the list, it should be quite complete now I think. |
Re deprecation, one might have thought that using the old logic where there's no request from a meta-estimator's descendant consumers makes sense... but a splitter's request for |
Hey @adrinjalali, how is the work on meta-estimators going? I'm wondering if we should do something crazy like pair programming on it if it's proving hard to get started? |
I spent a few days trying to write common tests for meta-estimators but that didn't go anywhere. After being stuck for a while, @thomasjpfan and I spent some time last week together and we decided to start with simple individual tests, starting for one meta-estimator, and then refactor the tests later when we find recurring patterns. Right now I'm working on multioutput meta-estimators and should have a PR coming today. |
Yes, I think starting with individual tests makes sense, even if I've been curious about reusable components... I think the base components are already well enough tested that tests beyond that need not be super extensive. |
This PR adds metadata routing to BaggingClassifier and BaggingRegressor (see scikit-learn#22893). With this change, in addition to sample_weight, which was already supported, it's now also possible to pass arbitrary fit_params to the sub estimator. Implementation Most of the changes should be pretty straightforward with the existing infrastructure for testing metadata routing. There was one aspect which was not quite trivial though: The current implementation of bagging works by inspecting the sub estimator's fit method. If the sub estimator supports sample_weight, then subsampling is performed by making use of sample weight. This will also happen if the user does not explicitly pass sample weight. At first, I wanted to change the implementation such that if sample weights are requested, subsampling should use the sample weight approach, otherwise it shouldn't. However, that breaks a couple of tests, so I rolled back the change and stuck very closely to the existing implementation. I can't judge if this prevents the user from doing certain things or if subsampling using vs not using sample_weight are equivalent. Coincidental changes The method _validate_estimator on the BaseEnsemble class used to validate, and then set as attribute, the sub estimator. This was inconvenient because for get_metadata_routing, we want to fetch the sub estimator, which is not easily possible with this method. Therefore, a change was introduced that the method now returns the sub estimator and the caller is now responsible for setting it as an attribute. This has the added advantages that the caller can now decide the attribute name and that this method now more closely mirrors _BaseHeterogeneousEnsemble._validate_estimators. Affected by this change are random forests, extra trees, and ada boosting. The function process_routing used to mutate the incoming param dict (adding new items), now it creates a shallow copy first. Extended docstring for check_input of BaseBagging._fit. Testing I noticed that the bagging tests didn't have a test case for sparse input + using sample weights, so I extended an existing test to cover it. The test test_bagging_sample_weight_unsupported_but_passed now raises a TypeError, not ValueError, when sample_weight are passed but not supported.
Hi @adrinjalali Can I work on this issue? Would it be reasonable to start working on LogisticRegressionCV? |
@OmarManzoor you can give it a try, but beware the work in this issue is very involved, I would probably recommend something less involved at this point for you, but giving it a try doesn't hurt :) |
I tried checking out LogisticRegressionCV. On comparing it with the other meta-estimators whose PRs have been created and going through the overall mechanism of routing, this one seems a bit different as it does not seem to contain any child estimators. Instead it inherits from LogisticRegression and calls the functions _log_reg_scoring_path and _logistic_regression_path. Moreover it seems that the basic scoring might be also covered by an earlier PR of yours. |
Yes, I remember looking at that and it not being straightforward. In this case, instead of it being only a router, it's a consumer for whatever |
This issue is to track the work we need to do before we can merge
sample-props
branch intomain
:sample-props
. This PR only touchesBaseEstimator
and hence consumers. It does NOT touch meta-estimators, scorers or cv splitters.sample-props
: FEAT add metadata routing to splitters #22765sample-props
(note that this involves an ongoing discussion on whether we'd like to mutate a scorer or not): FEAT SLEP6: scorers #22757get_metadata_routing
in easy cases, and a whole lot more in cases where we'd like to keep backward compatibility in parsing input args suchs asestimator__param
inPipeline
sample-props
, do a few tests with third party estimators to see if they'd work out of the box. Note that consumer estimators should work out of the box as long as they inherit fromBaseEstimator
and theirfit
accepts metadata as explicit arguments rather than**kwargs
._metadata_requests.py
and work with scikit-learn meta-estimators w/o depending on the library.sample-props
intomain
: FEAT SLEP006: metadata routing #24027Enhancements:
Open issues:
set_{method}_request
methods #23933Our plan is to hopefully have this feature in 1.1, which we should be releasing in late April/early May.
Here's a list of meta-estimators which need to be updated:
cc @jnothman @thomasjpfan @lorentzenchr
The text was updated successfully, but these errors were encountered: