The Wayback Machine - https://web.archive.org/web/20250629163228/https://github.com/scikit-learn/scikit-learn/pull/31673

Add requires_fit=False tag to FeatureHasher and HashingVectorizer #31673

Open
wants to merge 4 commits into
base: main

nishanthp

…x #30689)

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds the `requires_fit=False` tag to `FeatureHasher` and `HashingVectorizer` by overriding `_more_tags()` in each class. This aligns both estimators with the tag-based API used by the common estimator checks: both are stateless and can transform data without a prior call to `fit`.
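As a rough illustration of the override described above, here is a minimal sketch that assumes the dict-based `_more_tags()` convention. `StatelessHasher`, `DEFAULT_TAGS`, and `get_tags` are hypothetical names used only for this example and are not scikit-learn API. Recent scikit-learn releases expose tags through `__sklearn_tags__()` instead, so the actual change may need a different form.

```python
import zlib

# Hypothetical default: estimators are assumed to need fit() unless they say otherwise.
DEFAULT_TAGS = {"requires_fit": True}


class StatelessHasher:
    """Hypothetical stand-in for a stateless transformer (not scikit-learn code)."""

    def __init__(self, n_features=8):
        self.n_features = n_features

    def _more_tags(self):
        # Declare that transform() works without a prior fit() call.
        return {"requires_fit": False}

    def fit(self, X=None, y=None):
        # Nothing is learned from the data; fit exists only for API compatibility.
        return self

    def transform(self, tokens):
        # Count tokens per bucket using a deterministic hash (CRC32).
        counts = [0] * self.n_features
        for token in tokens:
            index = zlib.crc32(token.encode("utf-8")) % self.n_features
            counts[index] += 1
        return counts


def get_tags(estimator):
    """Merge the hypothetical defaults with whatever the estimator overrides."""
    tags = dict(DEFAULT_TAGS)
    more_tags = getattr(estimator, "_more_tags", None)
    if more_tags is not None:
        tags.update(more_tags())
    return tags
```

A caller that inspects tags could then skip its "must be fitted first" check when `get_tags(StatelessHasher())["requires_fit"]` is `False`, while objects that define no override keep the default of `True`.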

Any other comments?

  • Added `sklearn/tests/test_hash_requires_fit.py` to verify that both estimators correctly expose the `requires_fit` tag.
  • Verified successful compilation and tag evaluation in a clean environment.

github-actions bot commented Jun 28, 2025

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run `ruff check --fix --output-format=full` locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is `ruff=0.11.7`.


sklearn/tests/test_hash_requires_fit.py:1:1: I001 [*] Import block is un-sorted or un-formatted
  |
1 | / import pytest
2 | | from sklearn.feature_extraction import FeatureHasher
3 | | from sklearn.feature_extraction.text import HashingVectorizer
  | |_____________________________________________________________^ I001
4 |
5 |   def test_feature_hasher_requires_fit_tag():
  |
  = help: Organize imports

sklearn/tests/test_hash_requires_fit.py:1:8: F401 [*] `pytest` imported but unused
  |
1 | import pytest
  |        ^^^^^^ F401
2 | from sklearn.feature_extraction import FeatureHasher
3 | from sklearn.feature_extraction.text import HashingVectorizer
  |
  = help: Remove unused import: `pytest`

Found 2 errors.
[*] 2 fixable with the `--fix` option.

ruff format

ruff detected issues. Please run `ruff format` locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is `ruff=0.11.7`.


--- sklearn/tests/test_hash_requires_fit.py
+++ sklearn/tests/test_hash_requires_fit.py
@@ -2,12 +2,14 @@
 from sklearn.feature_extraction import FeatureHasher
 from sklearn.feature_extraction.text import HashingVectorizer
 
+
 def test_feature_hasher_requires_fit_tag():
     """Test that FeatureHasher has requires_fit=False tag."""
     hasher = FeatureHasher()
-    assert hasher._more_tags()['requires_fit'] is False
+    assert hasher._more_tags()["requires_fit"] is False
+
 
 def test_hashing_vectorizer_requires_fit_tag():
     """Test that HashingVectorizer has requires_fit=False tag."""
     vectorizer = HashingVectorizer()
-    assert vectorizer._more_tags()['requires_fit'] is False
+    assert vectorizer._more_tags()["requires_fit"] is False

1 file would be reformatted, 924 files already formatted

Generated for commit: f146711. Link to the linter CI: here

@nishanthp nishanthp force-pushed the add-requires-fit-tag branch from 293524c to f146711 Compare June 28, 2025 21:02