The Wayback Machine - https://web.archive.org/web/20220524211305/https://github.com/scikit-learn/scikit-learn/pull/20860
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add support for 'passthrough' in FeatureUnion #20860

Merged
merged 22 commits into from Sep 14, 2021

Conversation

shubhraneel
Copy link
Contributor

@shubhraneel shubhraneel commented Aug 27, 2021

Reference Issues/PRs

Fixes #20699

What does this implement/fix? Explain your changes.

  • Adds a feature to 'passthrough' to pass the features without a transformation and added test and documentation for it.
  • Changes made in FeatureUnion._validate_transformers() to allow 'passthrough'
  • Changes made in FeatureUnion._iter() to use FunctionTransformer(none) in place of passthrough to represent the identity transformer.
  • Added FeatureUnion._passthrough_function() which returns a FunctionTransformer(none) with get_feature_names set. This separate function is needed as the FunctionTransformer class does not have a get_feature_names function which is needed by FeatureUnion.
  • Wrote a test function test_set_feature_union_passthrough() to test the functionality.

Any other comments?

It may be better to check if more than one transformers are not set to passthrough as having a repetition of the same features is redundant.

@shubhraneel shubhraneel changed the title Passthrough feature union Support 'passthrough' in FeatureUnion Aug 27, 2021
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
@shubhraneel
Copy link
Contributor Author

@shubhraneel shubhraneel commented Aug 28, 2021

@amueller I have added these new changes and I removed the get_feature_names from passthrough as it seems unnecessary. If you suggest get_feature_names could be useful for passthrough, then I will implement it.

@shubhraneel
Copy link
Contributor Author

@shubhraneel shubhraneel commented Aug 28, 2021

@amueller I am not able to understand how shall I fix this Check Changelog check. Could you please help me regarding this?

@glemaitre
Copy link
Contributor

@glemaitre glemaitre commented Aug 30, 2021

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@shubhraneel
Copy link
Contributor Author

@shubhraneel shubhraneel commented Aug 30, 2021

@glemaitre thanks. I did that.

sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
@glemaitre glemaitre self-requested a review Sep 2, 2021
Copy link
Contributor

@glemaitre glemaitre left a comment

In the test, we should add a case where we passthrough features but with a weight to check that the features are still transformed by the weight.

sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
assert_array_equal(X, ft.fit_transform(X)[:, -columns:])
assert not record

pass
Copy link
Contributor

@glemaitre glemaitre Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this line

@glemaitre glemaitre changed the title Support 'passthrough' in FeatureUnion ENH add support for 'passthrough' in FeatureUnion Sep 2, 2021
Copy link
Member

@ogrisel ogrisel left a comment

First pass. Indeed we probably need additional tests to cover the interaction of "passthrough" with get_feature_names.

Note that this will be impacted by #18444 once merged.

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
Copy link
Contributor

@glemaitre glemaitre left a comment

@shubhraneel Would you be able to finish this PR?

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
@shubhraneel
Copy link
Contributor Author

@shubhraneel shubhraneel commented Sep 8, 2021

@glemaitre Yes, I will be able to finish this PR

@shubhraneel
Copy link
Contributor Author

@shubhraneel shubhraneel commented Sep 8, 2021

@glemaitre I have addressed all the review points. And, I have added the passthrough with transform_weights in test_pipeline too.

Copy link
Contributor

@glemaitre glemaitre left a comment

LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @shubhraneel !

Tests look good. Some comments on the implementation.

sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
Copy link
Member

@thomasjpfan thomasjpfan left a comment

LGTM

@thomasjpfan thomasjpfan merged commit f309ffb into scikit-learn:main Sep 14, 2021
32 checks passed
@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
samronsin pushed a commit to samronsin/scikit-learn that referenced this issue Nov 30, 2021
Co-authored-by: shubhraneel <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants