The Wayback Machine - https://web.archive.org/web/20230613161200/https://github.com/scikit-learn/scikit-learn/issues/26061

Store spectral embeddings in SpectralClustering #26061

Open
matteo-bastico opened this issue Apr 3, 2023 · 3 comments

Comments

@matteo-bastico

Describe the workflow you want to enable

Save the spectral embeddings used for clustering in the SpectralClustering class and make them accessible through an attribute, e.g. maps_, to make post-processing of the clusters easier.

Describe your proposed solution

Optionally return the maps from the spectral_clustering function through a new parameter, return_maps:

def spectral_clustering(
    affinity,
    *,
    n_clusters=8,
    n_components=None,
    eigen_solver=None,
    random_state=None,
    n_init=10,
    eigen_tol="auto",
    assign_labels="kmeans",
    verbose=False,
    return_maps=False,  # new: also return the spectral embedding
):
    ...  # existing body, which computes `maps` and `labels`

    if return_maps:
        return maps, labels
    return labels

Store a maps_ attribute in the fit method of the SpectralClustering class:

self.maps_, self.labels_ = spectral_clustering(
    self.affinity_matrix_,
    n_clusters=self.n_clusters,
    n_components=self.n_components,
    eigen_solver=self.eigen_solver,
    random_state=random_state,
    n_init=self.n_init,
    eigen_tol=self.eigen_tol,
    assign_labels=self.assign_labels,
    verbose=self.verbose,
    return_maps=True,
)
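The attribute could also be prototyped in a small subclass before any change to scikit-learn itself. This is a hypothetical sketch (`SpectralClusteringWithMaps` is not a scikit-learn class) under narrow assumptions: it only supports `affinity="rbf"` or `"precomputed"` and `assign_labels="kmeans"`, and it assumes `rbf_kernel(X, gamma=self.gamma)` matches the RBF affinity the estimator builds itself. It skips the estimator's own parameter validation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering, k_means
from sklearn.manifold import spectral_embedding
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.utils import check_random_state


class SpectralClusteringWithMaps(SpectralClustering):
    """Hypothetical subclass that keeps the spectral embedding as `maps_`."""

    def fit(self, X, y=None):
        if self.assign_labels != "kmeans":
            raise NotImplementedError("sketch only covers assign_labels='kmeans'")
        X = np.asarray(X, dtype=float)

        # Build the affinity matrix (only two affinity options are sketched).
        if self.affinity == "precomputed":
            self.affinity_matrix_ = X
        elif self.affinity == "rbf":
            self.affinity_matrix_ = rbf_kernel(X, gamma=self.gamma)
        else:
            raise NotImplementedError("sketch only covers 'rbf' and 'precomputed'")

        random_state = check_random_state(self.random_state)
        n_components = (
            self.n_clusters if self.n_components is None else self.n_components
        )
        # Compute the embedding once and keep it for post-processing.
        self.maps_ = spectral_embedding(
            self.affinity_matrix_,
            n_components=n_components,
            eigen_solver=self.eigen_solver,
            random_state=random_state,
            eigen_tol=self.eigen_tol,
            drop_first=False,
        )
        # Label assignment with k-means in the embedding space.
        _, self.labels_, _ = k_means(
            self.maps_, self.n_clusters, random_state=random_state, n_init=self.n_init
        )
        return self
```

Because `labels_` is still set, the inherited `fit_predict` keeps working, while `maps_` is available for post-processing without a second eigendecomposition.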

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@matteo-bastico matteo-bastico added Needs Triage Issue requires triage New Feature labels Apr 3, 2023
@ogrisel
Member

ogrisel commented Apr 6, 2023

> Save the spectral embeddings used for clustering in the SpectralClustering class and make them accessible through an attribute, e.g. maps_, to make easier post-processing on the clusters.

@matteo-bastico it would help us decide if you could explain how you would use this.

@thomasjpfan
Member

As a workaround, you may recompute the mapping after fitting SpectralClustering:

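from sklearn.cluster import SpectralClustering
from sklearn.manifold import spectral_embedding
import numpy as np

X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
clustering = SpectralClustering(n_clusters=2,
                                assign_labels='discretize',
                                random_state=0).fit(X)

# Recompute the embedding with the settings spectral_clustering uses internally:
# n_components defaults to n_clusters and the first eigenvector is kept.
# Use the same integer random_state as the fitted estimator.
maps = spectral_embedding(
    clustering.affinity_matrix_,
    n_components=(clustering.n_clusters if clustering.n_components is None
                  else clustering.n_components),
    eigen_solver=clustering.eigen_solver,
    random_state=clustering.random_state,
    eigen_tol=clustering.eigen_tol,
    drop_first=False,
)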

@thomasjpfan thomasjpfan added Needs Info and removed Needs Triage Issue requires triage labels Apr 6, 2023
@matteo-bastico
Author

matteo-bastico commented Apr 11, 2023

> > Save the spectral embeddings used for clustering in the SpectralClustering class and make them accessible through an attribute, e.g. maps_, to make easier post-processing on the clusters.
>
> @matteo-bastico it would help us decide if you could explain how you would use this.

In my case, I want to compute the medoids of the clusters using distances in the spectral embedding space instead of the original Euclidean space. There are also other applications where this feature would be useful.
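For the medoid use case, a minimal sketch could look like the following. It assumes the embedding is available as an `(n_samples, n_components)` array (here recomputed with the workaround from this thread) and defines each cluster's medoid as the member with the smallest summed Euclidean distance to the other members in the embedding space. The helper name `spectral_medoids` is hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.manifold import spectral_embedding
from sklearn.metrics import pairwise_distances


def spectral_medoids(maps, labels):
    """Hypothetical helper: index of each cluster's medoid in embedding space."""
    medoids = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Pairwise Euclidean distances between cluster members in the embedding.
        dist = pairwise_distances(maps[members])
        # The medoid minimises the summed distance to the other members.
        medoids[label] = members[np.argmin(dist.sum(axis=1))]
    return medoids


X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]])
clustering = SpectralClustering(n_clusters=2, random_state=0).fit(X)
maps = spectral_embedding(
    clustering.affinity_matrix_,
    n_components=clustering.n_clusters,
    eigen_solver=clustering.eigen_solver,
    random_state=clustering.random_state,
    eigen_tol=clustering.eigen_tol,
    drop_first=False,
)
medoids = spectral_medoids(maps, clustering.labels_)
print(medoids)
```

The returned indices refer to rows of `X`, so the medoids can be looked up in the original space as well.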

> As a workaround, you may recompute the mapping after fitting SpectralClustering: [...]

Thank you, the workaround works, but the spectral embeddings are computed twice, which is time-consuming for large matrices.

Projects
None yet
Development

No branches or pull requests

4 participants