The Wayback Machine - https://web.archive.org/web/20230613161245/https://github.com/scikit-learn/scikit-learn/issues/26013
Add per feature "maximum category" counts to OrdinalEncoder #26013

Open
betatim opened this issue Mar 29, 2023 · 4 comments · May be fixed by #26284
Comments

@betatim
Member

betatim commented Mar 29, 2023

Describe the workflow you want to enable

This is a follow up task for #25677

It would be nice to allow users to specify a per-feature `max_categories` value instead of the single global limit implemented in #25677.

More details in the linked comment.

Describe your proposed solution

Allow users to pass either a list of shape (n_features,) or a dict mapping column names to max_categories values, so that the maximum number of categories can be set for each feature.
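
To make the proposal concrete, here is a minimal sketch of how the three accepted forms (a single int, a list, or a dict) could be expanded into one limit per feature. The helper name `resolve_max_categories`, its arguments, and the choice to leave features missing from the dict unlimited are all assumptions for illustration; none of this is existing scikit-learn API.

```python
from numbers import Integral


def resolve_max_categories(max_categories, n_features, feature_names=None):
    """Hypothetical helper: expand a max_categories spec to one limit per feature.

    Returns a list of length n_features whose entries are either an int
    limit or None (meaning "no limit" for that feature).
    """
    if max_categories is None or isinstance(max_categories, Integral):
        # Global behaviour: the same limit (or no limit) for every feature.
        return [max_categories] * n_features

    if isinstance(max_categories, dict):
        if feature_names is None:
            raise ValueError("A dict spec requires feature names.")
        unknown = set(max_categories) - set(feature_names)
        if unknown:
            raise ValueError(f"Unknown feature names: {sorted(unknown)}")
        # Assumption: features missing from the dict are left unlimited.
        return [max_categories.get(name) for name in feature_names]

    # Otherwise treat the spec as a sequence with one entry per feature.
    limits = list(max_categories)
    if len(limits) != n_features:
        raise ValueError(f"Expected {n_features} limits, got {len(limits)}.")
    return limits


print(resolve_max_categories(3, 2))          # → [3, 3]
print(resolve_max_categories([3, None], 2))  # → [3, None]
print(resolve_max_categories({"city": 5}, 2, feature_names=["city", "color"]))  # → [5, None]
```

Normalising every form to a plain per-feature list up front would let the rest of the encoder handle a single representation, regardless of how the user supplied the limits.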

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@Aryan-Mishra24

Dear @betatim,

I'm reaching out to express my interest in contributing to the scikit-learn project, specifically the issue #26013 that you opened regarding adding per feature maximum category counts to OrdinalEncoder.

I have experience with Python programming and machine learning, and I believe that I can make a meaningful contribution to this project. I'm excited about the idea of allowing users to specify the number of maximum categories per feature, and I'm eager to work on this feature.

I would appreciate it if you could guide me on how to get started with this project. Should I read any specific documentation or study any relevant code before diving in?

Thank you for your time and consideration.

Best regards,
Aryan Mishra

@betatim changed the title from "Add per feature maximum category counts to OrdinalEncoder" to 'Add per feature "maximum category" counts to OrdinalEncoder' on Apr 3, 2023
@Andrew-Wang-IB45
Contributor

Hi @betatim, if this feature can be considered for inclusion, I would like to work on this issue.

@betatim removed the "Needs Triage" label on Apr 17, 2023
@betatim
Member Author

betatim commented Apr 17, 2023

If you want to work on this please do. Try and open a PR as soon as possible so that others can see that you are working on this and people can guide the work. You can mark the PR as "draft" if it isn't ready for reviewing yet.

@Andrew-Wang-IB45
Contributor

Hi @betatim, I have opened a working PR. The failures stem from the missing changelog entry and from the fact that many files not changed by this PR are not properly linted.

3 participants