The Wayback Machine - https://web.archive.org/web/20221025103257/https://github.com/scikit-learn/scikit-learn/issues/22587

PERF PairwiseDistancesReductions roadmap #22587

Open
jjerphan opened this issue Feb 23, 2022 · 0 comments

jjerphan commented Feb 23, 2022

PairwiseDistancesReductions have been introduced as a hierarchy of Cython classes to implement back-ends of some scikit-learn algorithms.
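
To make the idea concrete, here is a minimal NumPy sketch of one such reduction: computing pairwise distances chunk by chunk and immediately reducing each chunk to its k nearest neighbours (an "argkmin"). This is only an illustration under stated assumptions, not the Cython implementation: the function name `argkmin_chunked`, the `chunk_size` parameter and the restriction to the Euclidean metric are all hypothetical.

```python
import numpy as np


def argkmin_chunked(X, Y, k, chunk_size=256):
    """For each row of X, return the indices and Euclidean distances
    of its k nearest rows in Y, processing X in chunks.

    Hypothetical helper for illustration only; not scikit-learn API.
    """
    n_samples_X = X.shape[0]
    indices = np.empty((n_samples_X, k), dtype=np.intp)
    distances = np.empty((n_samples_X, k), dtype=np.float64)

    for start in range(0, n_samples_X, chunk_size):
        stop = min(start + chunk_size, n_samples_X)

        # Squared Euclidean distances between the chunk and all of Y,
        # shape (stop - start, n_samples_Y). Only this chunk is held
        # in memory, never the full distance matrix.
        diff = X[start:stop, None, :] - Y[None, :, :]
        sq_dist = np.einsum("ijk,ijk->ij", diff, diff)

        # Reduction step: keep the k smallest distances per row.
        # A stable sort breaks ties by the lowest index in Y.
        order = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
        indices[start:stop] = order
        distances[start:stop] = np.sqrt(
            np.take_along_axis(sq_dist, order, axis=1)
        )

    return indices, distances


rng = np.random.default_rng(0)
X = rng.random((1000, 5))
Y = rng.random((300, 5))
neighbor_indices, neighbor_distances = argkmin_chunked(X, Y, k=3)
```

The point of fusing the distance computation with the reduction is that the full `(n_samples_X, n_samples_Y)` distance matrix is never materialised, which is what makes this pattern attractive as a back-end.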

Pieces of work include:

Subsequent work includes:

  • Create a private asv benchmark suite for the private submodule
  • Use a stable sort to support more distance metrics (see this TODO)
  • Support "precomputed" distances
  • Force the use of PairwiseDistancesReductions for F-contiguous arrays
    • use a scikit-learn configuration entry to accept and convert F-contiguous arrays to C-contiguous arrays
  • Document and communicate about the changes of behaviour regarding n_jobs
  • Release the GIL before calling parallel_on_{X|Y} instead of releasing it in those methods
  • Revisit KMeans implementations using PairwiseDistancesReductions
  • Integrate mimalloc to have proper memory allocation for multi-threaded implementations:
    • Changing the default implementation of malloc(3) might come with unexpected changes and might break the whole ecosystem. Maintaining it might also be costly.
  • Introduce a specialised back-end for RadiusNeighbors*.predict* which would remove the costly sequential portion after the current call to radius_neighbors
  • Improve tests:
    • make assert_argkmin_results_quasi_equality report the original distances (before rounding) for the neighbor indices in the two rounded distance groups in its AssertionError message, to help understand the nature of the failures found
    • parametrise old tests for public API backed by PairwiseDistancesReductions
    • systematically test public API on combinations of sparse and dense datasets
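
As an illustration of the F-contiguous item above, the conversion could be sketched as follows. The helper name `as_c_contiguous` and the `accept_f_contiguous` flag are hypothetical stand-ins for whatever configuration entry would be chosen; only the NumPy calls are real.

```python
import numpy as np


def as_c_contiguous(X, accept_f_contiguous=True):
    """Return X in C-contiguous (row-major) layout.

    `accept_f_contiguous` is a hypothetical flag standing in for a
    scikit-learn configuration entry; it is not an existing option.
    """
    if X.flags["C_CONTIGUOUS"]:
        # Already row-major: no copy needed.
        return X
    if not accept_f_contiguous:
        raise ValueError("Expected a C-contiguous array.")
    # Copies the data into row-major order.
    return np.ascontiguousarray(X)


X_f = np.asfortranarray(np.arange(6, dtype=np.float64).reshape(3, 2))
X_c = as_c_contiguous(X_f)
```

The conversion costs one copy of the input, which is the trade-off such a configuration entry would let users opt into.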

Note that this need not be personal work: I would be really glad to have others help on this subject by proposing changes and implementations! 🙂

@github-actions github-actions bot added the Needs Triage Issue requires triage label Feb 23, 2022
@ogrisel ogrisel changed the title Computational back-end roadmap pairwise distances computational back-end roadmap Feb 24, 2022
@jjerphan jjerphan changed the title pairwise distances computational back-end roadmap PairwiseDistancesReductions back-end roadmap Feb 24, 2022
@jjerphan jjerphan changed the title PairwiseDistancesReductions back-end roadmap PairwiseDistancesReductions roadmap Feb 28, 2022
@jeremiedbb jeremiedbb removed the Needs Triage Issue requires triage label Mar 10, 2022
@jjerphan jjerphan changed the title PairwiseDistancesReductions roadmap PERF PairwiseDistancesReductions roadmap Aug 25, 2022