SLEP006 - Metadata Routing task list #22893

adrinjalali · 2022-03-18T16:27:22Z

This issue is to track the work we need to do before we can merge the sample-props branch into main:

Enhancements:

Open issues:

SLEP006 - metadata handling: fit, transform, fit_transform #22987: ENH make fit_transform and fit_predict composite methods #24585
SLEP006 - equal requests? {fit, partial_fit}, {predict, predict_proba, predict_log_proba, decision_function} #22988: ENH allow all main top level methods to have a corresponding set metadata request #23342
RFC SLEP006: allow users to enable a "strict" mode in metadata routing #23920
RFC SLEP006: verbose vs non-verbose declaration in meta-estimator #23928
SLEP006: mypy can't find set_{method}_request methods #23933

Our plan is to hopefully have this feature in 1.1, which we should be releasing in late April/early May.

Here's a list of meta-estimators which need to be updated:

cc @jnothman @thomasjpfan @lorentzenchr


jnothman · 2022-03-19T11:08:16Z

Do you see this as a collaborative effort? To what extent can we still use #20350? To what extent can we share test components? Which of these metaestimators have existing fit param routing that needs to be deprecated?

jnothman · 2022-03-19T11:09:45Z

Your list appears to be missing some *CV estimators (e.g. ElasticNetCV) that will have to route to splitters if not scorers.

jnothman · 2022-03-19T11:15:57Z

Also missing are functions in sklearn.model_selection._validation

adrinjalali · 2022-03-22T15:06:04Z

> Do you see this as a collaborative effort? To what extent can we still use #20350? To what extent can we share test components? Which of these metaestimators have existing fit param routing that needs to be deprecated?

Most of what's in #20350 can be used. We can open this for collaboration once we figure out a nice way to deprecate existing routing strategies in our meta-estimators. I'll work on it this week.

adrinjalali · 2022-03-22T15:06:29Z

Also updated the list, it should be quite complete now I think.

jnothman · 2022-03-23T13:05:25Z

Re deprecation, one might have thought that using the old logic where there's no request from a meta-estimator's descendant consumers makes sense... but a splitter's request for groups by default might mess that up...
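The concern above can be illustrated with a toy sketch. It assumes a hypothetical rule, "fall back to the old routing logic when no descendant consumer requests anything", and represents each child's requests as a plain set of metadata names. The function `use_legacy_routing` and the dictionary layout are invented for illustration and are not part of scikit-learn.

```python
def use_legacy_routing(children_requests):
    """Hypothetical rule: use the old routing logic only if no child requests metadata.

    children_requests maps a child's name to the set of metadata names it requests.
    """
    return all(len(requested) == 0 for requested in children_requests.values())


# No child requests anything, so the hypothetical rule picks the old logic.
plain = {"estimator": set(), "scorer": set()}

# A group-aware splitter that requests "groups" by default defeats the rule,
# even though the user never configured any request explicitly.
with_group_splitter = {"estimator": set(), "splitter": {"groups"}}

print(use_legacy_routing(plain))
print(use_legacy_routing(with_group_splitter))
```

Under this assumed rule, a default request from one child is indistinguishable from an explicit user request, which is the case that "might mess that up".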

jnothman · 2022-03-28T12:45:33Z

Hey @adrinjalali, how is the work on meta-estimators going? I'm wondering if we should do something crazy like pair programming on it if it's proving hard to get started?

adrinjalali · 2022-03-28T12:49:10Z

I spent a few days trying to write common tests for meta-estimators but that didn't go anywhere. After being stuck for a while, @thomasjpfan and I spent some time last week together and we decided to start with simple individual tests, starting with one meta-estimator, and then refactor the tests later when we find recurring patterns.

Right now I'm working on multioutput meta-estimators and should have a PR coming today.

jnothman · 2022-03-28T12:54:56Z

Yes, I think starting with individual tests makes sense, even if I've been curious about reusable components... I think the base components are already well enough tested that tests beyond that need not be super extensive.

This PR adds metadata routing to BaggingClassifier and BaggingRegressor (see scikit-learn#22893). With this change, in addition to sample_weight, which was already supported, it's now also possible to pass arbitrary fit_params to the sub estimator.

Implementation

Most of the changes should be pretty straightforward with the existing infrastructure for testing metadata routing. There was one aspect which was not quite trivial though: The current implementation of bagging works by inspecting the sub estimator's fit method. If the sub estimator supports sample_weight, then subsampling is performed by making use of sample weight. This will also happen if the user does not explicitly pass sample weight.

At first, I wanted to change the implementation such that if sample weights are requested, subsampling should use the sample weight approach, otherwise it shouldn't. However, that breaks a couple of tests, so I rolled back the change and stuck very closely to the existing implementation. I can't judge if this prevents the user from doing certain things or if subsampling using vs not using sample_weight are equivalent.

Coincidental changes

The method _validate_estimator on the BaseEnsemble class used to validate, and then set as attribute, the sub estimator. This was inconvenient because for get_metadata_routing, we want to fetch the sub estimator, which is not easily possible with this method. Therefore, a change was introduced that the method now returns the sub estimator and the caller is now responsible for setting it as an attribute. This has the added advantages that the caller can now decide the attribute name and that this method now more closely mirrors _BaseHeterogeneousEnsemble._validate_estimators. Affected by this change are random forests, extra trees, and ada boosting.

The function process_routing used to mutate the incoming param dict (adding new items), now it creates a shallow copy first.

Extended docstring for check_input of BaseBagging._fit.

Testing

I noticed that the bagging tests didn't have a test case for sparse input + using sample weights, so I extended an existing test to cover it.

The test test_bagging_sample_weight_unsupported_but_passed now raises a TypeError, not ValueError, when sample_weight are passed but not supported.
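As a rough illustration of the "inspect the sub estimator's fit method" behaviour described above, here is a stdlib-only sketch. It is not the scikit-learn implementation: `accepts_sample_weight`, `bootstrap_counts`, `fit_one_member` and the two toy estimators are hypothetical names, and raising TypeError for unsupported weights is this sketch's own choice.

```python
import inspect
import random


def accepts_sample_weight(estimator):
    """Return True if estimator.fit declares a sample_weight parameter."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters


def bootstrap_counts(n_samples, rng):
    """Draw n_samples indices with replacement; return how often each was drawn."""
    counts = [0] * n_samples
    for _ in range(n_samples):
        counts[rng.randrange(n_samples)] += 1
    return counts


def fit_one_member(estimator, X, y, rng, sample_weight=None):
    """Fit one ensemble member on a bootstrap sample of (X, y)."""
    counts = bootstrap_counts(len(X), rng)
    if accepts_sample_weight(estimator):
        # Weight-based subsampling: keep every row and scale its weight by the
        # number of times it was drawn. This path is taken even when the user
        # passed no sample_weight.
        base = sample_weight if sample_weight is not None else [1.0] * len(X)
        weights = [w * c for w, c in zip(base, counts)]
        estimator.fit(X, y, sample_weight=weights)
    else:
        if sample_weight is not None:
            raise TypeError("estimator.fit does not accept sample_weight")
        # Index-based subsampling: repeat each row as often as it was drawn.
        idx = [i for i, c in enumerate(counts) for _ in range(c)]
        estimator.fit([X[i] for i in idx], [y[i] for i in idx])
    return estimator


class WeightedToy:
    """Toy estimator whose fit accepts sample_weight."""

    def fit(self, X, y, sample_weight=None):
        self.n_rows_ = len(X)
        self.total_weight_ = sum(sample_weight)
        return self


class UnweightedToy:
    """Toy estimator whose fit does not accept sample_weight."""

    def fit(self, X, y):
        self.n_rows_ = len(X)
        return self
```

In both branches the member effectively sees as many (weighted or repeated) rows as the input has, which is why the two subsampling strategies look interchangeable at first glance. Whether they are equivalent for a given estimator is the open question raised in the description.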

OmarManzoor · 2022-09-12T10:32:47Z

Hi @adrinjalali Can I work on this issue? Would it be reasonable to start working on LogisticRegressionCV?

adrinjalali · 2022-09-12T14:17:20Z

@OmarManzoor you can give it a try, but beware the work in this issue is very involved, I would probably recommend something less involved at this point for you, but giving it a try doesn't hurt :)

OmarManzoor · 2022-09-13T12:23:33Z

> @OmarManzoor you can give it a try, but beware the work in this issue is very involved, I would probably recommend something less involved at this point for you, but giving it a try doesn't hurt :)

I tried checking out LogisticRegressionCV. On comparing it with the other meta-estimators whose PRs have been created and going through the overall mechanism of routing, this one seems a bit different as it does not seem to contain any child estimators. Instead it inherits from LogisticRegression and calls the functions _log_reg_scoring_path and _logistic_regression_path. Moreover, it seems that the basic scoring might also be covered by an earlier PR of yours.

adrinjalali · 2022-09-13T16:41:10Z

Yes, I remember looking at that and it not being straightforward. In this case, instead of it being only a router, it's a consumer for whatever LogisticRegression accepts, and a router for the CV and scorer.
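The "consumer and router at once" idea can be sketched with a toy model. The classes below are hypothetical stand-ins written with plain Python sets. They do not use scikit-learn's metadata routing API, and the choice of which metadata each object consumes or requests is an assumption made for the example.

```python
class ToyGroupSplitter:
    """Toy CV splitter that requests 'groups'."""

    requests = {"groups"}


class ToyScorer:
    """Toy scorer that requests 'sample_weight'."""

    requests = {"sample_weight"}


class ToyLogisticCV:
    """Toy stand-in for an estimator that consumes metadata and also routes it."""

    # Metadata the object uses itself (consumer role).
    consumes = {"sample_weight"}

    def __init__(self, splitter, scorer):
        self.splitter = splitter
        self.scorer = scorer

    def route(self, **metadata):
        """Return which metadata goes to self, the splitter and the scorer."""
        known = self.consumes | self.splitter.requests | self.scorer.requests
        unknown = set(metadata) - known
        if unknown:
            raise TypeError(f"unrequested metadata: {sorted(unknown)}")

        def pick(names):
            # Keep only the metadata entries whose names were requested.
            return {k: v for k, v in metadata.items() if k in names}

        return {
            "self": pick(self.consumes),  # consumer role
            "splitter": pick(self.splitter.requests),  # router role
            "scorer": pick(self.scorer.requests),  # router role
        }


model = ToyLogisticCV(ToyGroupSplitter(), ToyScorer())
plan = model.route(sample_weight=[1.0, 2.0], groups=[0, 1])
```

Here `sample_weight` ends up both with the object itself and with the scorer, while `groups` only reaches the splitter, which is the mixed role described above.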

adrinjalali added the Meta-issue label (General issue associated to an identified list of tasks) Mar 18, 2022

adrinjalali mentioned this issue Mar 29, 2022

FEAT multioutput routes metadata #22986

Merged

adrinjalali added this to the 1.2 milestone Mar 29, 2022

adrinjalali mentioned this issue Jul 16, 2022

Refactor metadata routing classes used in tests #23918

Open

This was referenced Jul 26, 2022

[MRG] Add fit_params to RFECV.fit #24004

Closed

ENH make sure warn_on errors on invalid child #24023

Merged

BenjaminBossan mentioned this issue Aug 5, 2022

SLEP006: CalibratedClassifierCV #24126

Merged

BenjaminBossan mentioned this issue Aug 24, 2022

SLEP006: Metadata routing for bagging #24250

Open

adrinjalali added the Hard label (Hard level of difficulty) Sep 12, 2022

OmarManzoor mentioned this issue Sep 15, 2022

SLEP006: ClassifierChain and RegressorChain routing #24443

Merged

OmarManzoor mentioned this issue Sep 22, 2022

SLEP006 - Add Metadata Routing to LogisticRegressionCV #24498

Open

glemaitre modified the milestones: 1.2, 1.3 Nov 16, 2022
