The Wayback Machine - https://web.archive.org/web/20220516112408/https://github.com/scikit-learn/scikit-learn/pull/23038
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EHN Change default value of n_init in cluster.KMeans and cluster.k_means #23038

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented Apr 3, 2022

Reference Issues/PRs

Fixes #9729
Resolves #11530 (stalled)

What does this implement/fix? Explain your changes.

Begins deprecation cycle for changing default value of n_init in cluster.KMeans and cluster.k_means to 5.

Any other comments?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

I think we still need to address #9729 (comment) and motivate the change from 10 to 5.

sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Apr 4, 2022

I think we still need to address #9729 (comment) and motivate the change from 10 to 5.

Let's continue the discussion in #9729 before commiting too deeply into this

@Micky774
Copy link
Contributor Author

@Micky774 Micky774 commented Apr 7, 2022

If we go by the example, then we can have a n_init="auto", where: n_init=1 for kmeans++ and n_init=10 for random.

Per discussion in #9729 (comment) I'll go ahead and proceed with introducing auto-resolution for n_init.

@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Apr 7, 2022

I guess it's worth to apply the same pattern to MiniBatchKMeans

@Micky774
Copy link
Contributor Author

@Micky774 Micky774 commented Apr 7, 2022

I guess it's worth to apply the same pattern to MiniBatchKMeans

Should we go ahead and complete this PR first and then open a separate one for MiniBatchKMeans just to keep the scope of the PR small?

sklearn/cluster/_kmeans.py Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
@Micky774
Copy link
Contributor Author

@Micky774 Micky774 commented Apr 8, 2022

I guess it's worth to apply the same pattern to MiniBatchKMeans

Should we go ahead and complete this PR first and then open a separate one for MiniBatchKMeans just to keep the scope of the PR small?

I actually went ahead and extended it to MiniBatchKMeans in this PR. The two estimators share a common _check_params method through _BaseKMeans which already validates n_init-->_n_init. In this case I think it makes more sense to take care of them together since otherwise _n_init would need to be resolved outside of _check_params which is an entirely separate change that would need to be undone if we wanted to add this feature to MiniBatchKMeans later anyways.

sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
@jeremiedbb jeremiedbb added this to the 1.2 milestone Apr 29, 2022
@thomasjpfan thomasjpfan changed the title [MRG] EHN: Change default value of n_init in cluster.KMeans and cluster.k_means EHN Change default value of n_init in cluster.KMeans and cluster.k_means May 1, 2022
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Many tests are failing because we turn future warnings into errors in the CI. For each of these you can either filter the warning or explicitly set n_init, whichever is more appropriate.

doc/whats_new/v1.2.rst Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
Copy link
Member

@thomasjpfan thomasjpfan left a comment

The ci failures look like they are related to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants