The Wayback Machine - https://web.archive.org/web/20220530203538/https://github.com/scikit-learn/scikit-learn/issues/23450
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add neighbors algorithm based on NSW graphs #23450

Open
LeoSvalov opened this issue May 24, 2022 · 0 comments
Open

Add neighbors algorithm based on NSW graphs #23450

LeoSvalov opened this issue May 24, 2022 · 0 comments
Labels
Needs Triage New Feature

Comments

@LeoSvalov
Copy link

@LeoSvalov LeoSvalov commented May 24, 2022

Describe the workflow you want to enable

Dear Scikit-Learn maintainers, good afternoon!

I would like to add the algorithm to do the approximate nearest neighbors search to the sklearn.neighbors module.
The algorithm may enhance performance in the case of high-dimensional data space, starting from data dimension D > 50.
It will be the alternative to the existing KDTree and BallTree methods.

The desired utilization can be as follows:

from sklearn.neighbors import NSWGraph
import numpy as np
rng = np.random.RandomState(0)
X = rng.random_sample((50, 128))  
nswgraph = NSWGraph()
nswgraph.build(X)
dist, ind = tree.query(X[:1], k=3)  

Describe your proposed solution

The method is based on Navigable small world graphs (NSW graphs) that tends to demonstrate better performance in the high-dimensional data space [1] (D > 50). This data structure is a graph which satisfies the property that the shortest path between two random vertices increases proportionally to the logarithm of the graph size [2].

Describe alternatives you've considered, if relevant

The Scikit-Learn alternatives of the proposed method are the KDTree and BallTree algorithms that give a performance advantage in low-dimensional data but become inefficient as the dimensionality grows (suffer from the curse of dimensionality) [3].

Additional context

Despite the created issue with feature request, you can find the actual implementation proposed as the pull request to the
scikit-learn-extra project.


References

[1] Malkov, Y., Ponomarenko, A., Logvinov, A., & Krylov, V. (2014).
Approximate nearest neighbor algorithm based on navigable small world graphs.
Information Systems, 45, 61-68.
[2] Sandberg, Oskar, and Ian Clarke. "The evolution of navigable small-world networks." arXiv preprint cs/0607025 (2006).
[3] Scikit-Learn User's guide: Nearest Neighbor Algorithms
https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms

@LeoSvalov LeoSvalov added Needs Triage New Feature labels May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Triage New Feature
Projects
None yet
Development

No branches or pull requests

1 participant