The Wayback Machine - https://web.archive.org/web/20220520091526/https://github.com/scikit-learn/scikit-learn/pull/22590
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT 32bit datasets support for PairwiseDistancesReduction #22590

Draft
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Feb 23, 2022

Reference Issues/PRs

Follows #22134. Experimental POC to assess if Tempita is sufficient.

What does this implement/fix? Explain your changes.

Full design proposal

Context

PairwiseDistancesReduction needs to support float32 and float64 DatasetPairs.

To do so, DatasetPairs needs to be adapted for float32 (X, Y) and concrete PairwiseDistancesReductions needs to do maintain the routing to those.

The current Cython extension types (i.e cdef class) hierarchy currently support 64 bits implementation. It simply breaks down as follows:

                        (abstract)
                 PairwiseDistancesReduction
                            ^
                            |
                            |
                  (concrete 64bit implem.)
                       (Python API)
                  PairwiseDistancesArgKmin
                            ^
                            |
                            |
            (specialized concrete 64bit implem.)
            FastEuclideanPairwiseDistancesArgKmin

Where FastEuclideanPairwiseDistancesArgKmin is called in most cases.

Problem

We need some flexibility to be able to support 32bit datasets while not duplicating the implementations. In this regard, templating (i.e. to have classes be dtype-defined) and type covariance (i.e. if A extends B than Class<A> extends Class<B>) would have come in handy to extent the current hierarchy for 64bit to support 32bit.

Yet, Cython does not support templating in its language constructions nor does it support type covariance.

Also, Cython offers support for fused types however they can't be used on Cython extension types' attributes, making using this useful feature impossible to use in our context without some hacks.

Proposed solution

Still, we can use Tempita to come up with a working solution preserving performance at the cost of maintenance.

To perform this:

  • 32bit is now supported for DistanceMetrics
  • the 64bit implementation of DistanceMetrics are still exposed via the current public API but the 32bit version must remain private.
  • the layout of classes for PairwiseDistancesReductions has been changed using à la facade design pattern. so as to keep the same Python interfaces (namely PairwiseDistancesReduction.is_usable_for, PairwiseDistancesReduction.valid_metrics, PairwiseDistancesArgKmin.compute) but have concrete 32bit and 64bit implementation be defined via Tempita as follows:
                       (abstract)
                PairwiseDistancesReduction
                            ^
                            |
                            +------------------------------------------+--------------------------------------------------+
                            |                                          |                                                  |
                            |                                      (abstract)                                         (abstract)
                            |                             PairwiseDistancesReduction32                       PairwiseDistancesReduction64
                            |                                          ^                                                  ^
                            |                                          |                                                  |
                            |                                          |                                                  |
                            |                                          |                                                  |
                       (Python API)                         (concrete 32bit implem.)                           (concrete 64bit implem.)
                  PairwiseDistancesArgKmin                 PairwiseDistancesArgKmin32                          PairwiseDistancesArgKmin64
                                                                       |                                                  |
                                                                       |                                                  |
                                                                       |                                                  |
                                                                       |                                                  |
                                                       (specialized concrete 32bit implem.)               (specialized concrete 64bit implem.)
                                                     FastEuclideanPairwiseDistancesArgKmin32            FastEuclideanPairwiseDistancesArgKmin64

Future extension solution

In the future, we could just use the same pattern. For instance we could have:


                           ...                                        ...                                                ...   
                            |                                          |                                                  |
                            |                                          |                                                  |
                            |                                          |                                                  |
                       (Python API)                         (concrete 32bit implem.)                           (concrete 64bit implem.)
          PairwiseDistancesRadiusNeighborhood          PairwiseDistancesRadiusNeighborhood32             PairwiseDistancesRadiusNeighborhood64
                                                                       |                                                  |
                                                                       |                                                  |
                                                                       |                                                  |
                                                                       |                                                  |
                                                       (specialized concrete 32bit implem.)               (specialized concrete 64bit implem.)
                                                 FastEuclideanPairwiseDistancesRadiusNeighborhood32  FastEuclideanPairwiseDistancesRadiusNeighborhood64

TODO:

  • fix the failing test
  • add more tests for 32bit datasets on user-facing interfaces
  • split this PR in smaller ones to ease reviews

Hardware scalability

Adapting this script to use float32 datasets, we access that this implementation scales linearly, similarly to its 64bit counterpart:

speed_up_100000_100000_log

Raw results
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1   100000  100000          50     57.981657               0
1           2   100000  100000          50     29.401138               0
2           4   100000  100000          50     14.627211               0
3           8   100000  100000          50      7.748570               0
4          16   100000  100000          50      4.204991               0
5          32   100000  100000          50      2.385364               0
6          64   100000  100000          50      1.576305               0
7         128   100000  100000          50      2.115476               0
8           1   100000  100000         100     83.216700               0
9           2   100000  100000         100     42.717769               0
10          4   100000  100000         100     21.534403               0
11          8   100000  100000         100     10.926104               0
12         16   100000  100000         100      5.956875               0
13         32   100000  100000         100      3.348170               0
14         64   100000  100000         100      2.083073               0
15        128   100000  100000         100      3.822223               0
16          1   100000  100000         500    290.757614               0
17          2   100000  100000         500    142.708740               0
18          4   100000  100000         500     72.544749               0
19          8   100000  100000         500     35.726813               0
20         16   100000  100000         500     19.464046               0
21         32   100000  100000         500     10.771516               0
22         64   100000  100000         500      7.123072               0
23        128   100000  100000         500     11.439384               0

speed_up_1000000_10000_log

Raw results
    n_threads  n_train  n_test  n_features  mean_runtime  stderr_runtime
0           1  1000000   10000          50     57.369851               0
1           2  1000000   10000          50     29.368813               0
2           4  1000000   10000          50     14.890100               0
3           8  1000000   10000          50      7.564469               0
4          16  1000000   10000          50      3.912440               0
5          32  1000000   10000          50      2.094077               0
6          64  1000000   10000          50      1.356988               0
7         128  1000000   10000          50      1.528763               0
8           1  1000000   10000         100     81.371726               0
9           2  1000000   10000         100     42.803727               0
10          4  1000000   10000         100     21.626557               0
11          8  1000000   10000         100     11.082455               0
12         16  1000000   10000         100      5.795145               0
13         32  1000000   10000         100      3.061136               0
14         64  1000000   10000         100      2.006234               0
15        128  1000000   10000         100      2.012048               0
16          1  1000000   10000         500    286.566753               0
17          2  1000000   10000         500    149.337710               0
18          4  1000000   10000         500     75.545747               0
19          8  1000000   10000         500     38.256877               0
20         16  1000000   10000         500     19.557651               0
21         32  1000000   10000         500     11.193385               0
22         64  1000000   10000         500      9.533238               0
23        128  1000000   10000         500      8.433263               0

Speed-ups between 1.0 (e7fb5b8) and this PR @ 65ebc92 (via ca9197a502bf1289db722a6261ff5fe7edf8e981)

Up to ×50 speed-ups in normal configurations.
Some regression when using small datasets and a high number of threads.

1 core
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
+      1.07±0.01m          1.18±0m     1.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(10000, 100000, 100)
-         993±1ms          889±1ms     0.90  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
-        93.2±1ms       82.9±0.5ms     0.89  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-         1.97±0m          1.75±0m     0.89  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-        93.2±1ms       82.3±0.2ms     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-      93.1±0.4ms       81.4±0.2ms     0.87  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-      93.3±0.6ms       81.6±0.4ms     0.87  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-         1.01±0s          831±2ms     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         1.01±0s          827±3ms     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-      5.97±0.01s       4.88±0.01s     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
-      10.3±0.02s          8.06±0s     0.78  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-         1.04±0s          806±2ms     0.78  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-      10.3±0.03s          8.00±0s     0.77  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-         1.05±0s          806±3ms     0.77  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-      11.6±0.3ms       8.63±0.1ms     0.74  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-      11.7±0.3ms      8.65±0.04ms     0.74  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-       193±0.6ms       99.4±0.6ms     0.52  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-      20.7±0.3ms      10.4±0.08ms     0.50  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-         2.02±0s          998±2ms     0.49  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-         202±1ms       84.7±0.4ms     0.42  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         21.0±0s          8.28±0s     0.39  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 100000, 100)
-         2.11±0s          828±3ms     0.39  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
2 cores
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
-         970±2ms         857±50ms     0.88  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
-         1.94±0m          1.66±0m     0.86  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-      5.74±0.01s       4.43±0.01s     0.77  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
-      72.4±0.7ms       42.6±0.2ms     0.59  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-      72.4±0.9ms       42.5±0.2ms     0.59  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-        73.3±2ms       42.9±0.1ms     0.59  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-        73.7±2ms       43.1±0.1ms     0.58  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-         783±1ms        418±0.7ms     0.53  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         782±2ms          416±1ms     0.53  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-         801±1ms          411±1ms     0.51  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-         804±1ms          411±1ms     0.51  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-      7.93±0.04s          4.04±0s     0.51  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-      7.93±0.03s          4.04±0s     0.51  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-      9.65±0.2ms      4.71±0.03ms     0.49  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-      9.76±0.2ms      4.68±0.03ms     0.48  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-      19.1±0.2ms      6.37±0.07ms     0.33  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-         175±1ms       51.7±0.3ms     0.30  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-         1.80±0s          503±1ms     0.28  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-         182±1ms       44.8±0.1ms     0.25  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         1.87±0s          423±1ms     0.23  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-      18.5±0.01s          4.15±0s     0.22  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
4 cores
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
-         1.91±0m          1.61±0m     0.84  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-      61.2±0.8ms       23.7±0.2ms     0.39  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-      61.3±0.6ms       23.7±0.3ms     0.39  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-      63.2±0.6ms       23.9±0.2ms     0.38  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-      63.0±0.6ms       23.8±0.2ms     0.38  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-      9.09±0.2ms      2.92±0.05ms     0.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-         679±1ms          218±1ms     0.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         675±2ms          216±1ms     0.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-      9.44±0.2ms      2.95±0.06ms     0.31  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-         700±2ms          212±1ms     0.30  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-         698±1ms          211±1ms     0.30  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-      6.89±0.02s          2.06±0s     0.30  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-      6.88±0.03s          2.05±0s     0.30  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-      18.3±0.1ms      4.37±0.04ms     0.24  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-       163±0.9ms       27.9±0.1ms     0.17  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-         1.69±0s          262±1ms     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-       171±0.9ms       26.3±0.2ms     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         1.77±0s          217±1ms     0.12  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
8 cores
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
+         499±1ms          730±8ms     1.46  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 1000, 100)
-        111±10ms         94.3±7ms     0.85  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 1000, 100)
-         1.91±0m          1.60±0m     0.84  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-      10.7±0.4ms      3.55±0.06ms     0.33  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-      10.7±0.4ms      3.40±0.03ms     0.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-      20.2±0.4ms      4.84±0.04ms     0.24  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-      68.0±0.6ms       14.4±0.3ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-      68.6±0.6ms       14.3±0.3ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-      67.9±0.9ms       13.6±0.2ms     0.20  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-      67.4±0.7ms       13.5±0.2ms     0.20  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-         722±1ms        117±0.8ms     0.16  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         721±1ms        116±0.8ms     0.16  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-         729±3ms        111±0.8ms     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-         727±2ms        111±0.8ms     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-      7.06±0.02s          1.07±0s     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-      7.06±0.03s          1.06±0s     0.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-       170±0.9ms       15.8±0.1ms     0.09  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-       179±0.7ms       16.2±0.2ms     0.09  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         1.73±0s          141±1ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)
-         1.79±0s        114±0.7ms     0.06  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
16 cores
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
+        13.1±1ms        28.0±10ms     2.13  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
+         495±1ms         747±10ms     1.51  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 1000, 100)
+        22.5±1ms        32.3±10ms     1.43  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
+         1.67±0s        2.00±0.1s     1.20  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 100000, 100)
+         1.64±0s       1.94±0.03s     1.19  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 1000, 100)
+         1.64±0s        1.91±0.1s     1.16  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 10000, 100)
+         954±1ms       1.09±0.01s     1.15  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
-       1.69±0.1s       1.53±0.02s     0.90  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_tsne(1000, 1000, 100)
-        67.7±2ms        58.3±20ms     0.86  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
-         1.89±0m          1.58±0m     0.83  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-        67.1±2ms         44.0±1ms     0.66  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
-        13.1±1ms      5.26±0.07ms     0.40  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-         171±3ms         56.0±6ms     0.33  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-        69.2±2ms       9.91±0.1ms     0.14  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-        69.4±2ms       9.60±0.1ms     0.14  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-         769±2ms       80.7±0.8ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-         767±3ms       80.0±0.7ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-         690±3ms       67.9±0.6ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         687±3ms       67.4±0.6ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-      7.55±0.03s          580±2ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-      7.58±0.02s          581±2ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-         179±2ms       12.5±0.2ms     0.07  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         1.83±0s       98.6±0.9ms     0.05  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-         1.69±0s       79.7±0.5ms     0.05  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
32 cores
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
+         499±2ms          765±9ms     1.53  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 1000, 100)
+      1.77±0.01s        2.13±0.1s     1.20  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 10000, 100)
+      1.78±0.01s       2.09±0.06s     1.18  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 1000, 100)
+         968±2ms       1.14±0.02s     1.18  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
+         1.79±0s       2.08±0.04s     1.16  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 100000, 100)
-       1.69±0.1s       1.42±0.01s     0.84  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_tsne(1000, 1000, 100)
-         1.89±0m          1.57±0m     0.83  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-        16.5±2ms       9.70±0.1ms     0.59  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-        16.4±2ms      8.91±0.09ms     0.54  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-         176±5ms       84.9±0.8ms     0.48  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-        25.5±2ms       10.9±0.2ms     0.43  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-        74.0±3ms       10.6±0.1ms     0.14  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-        74.5±3ms       10.4±0.1ms     0.14  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-         775±2ms       62.6±0.2ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-         775±2ms       62.4±0.2ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-         185±4ms       12.4±0.1ms     0.07  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         670±3ms       44.0±0.3ms     0.07  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         669±3ms       43.9±0.3ms     0.07  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-      7.61±0.03s          334±3ms     0.04  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
-      7.64±0.02s         334±20ms     0.04  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 100000, 100)
-         1.85±0s       80.3±0.2ms     0.04  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-         1.68±0s       51.1±0.3ms     0.03  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
64 cores
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
+        90.5±3ms          216±8ms     2.38  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
+        90.5±4ms         184±20ms     2.03  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
+         513±2ms         808±10ms     1.58  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 1000, 100)
+      1.01±0.01s       1.25±0.03s     1.24  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
+      1.94±0.01s       2.38±0.08s     1.22  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 10000, 100)
+      1.96±0.01s       2.31±0.07s     1.18  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 1000, 100)
+      1.99±0.01s       2.28±0.07s     1.14  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 100000, 100)
-         689±1ms          621±4ms     0.90  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_affinity_propagation(1000, 100000, 100)
-         205±3ms          176±4ms     0.86  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
-        27.1±5ms         22.5±9ms     0.83  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
-         1.89±0m       1.56±0.01m     0.82  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(10000, 100000, 100)
-       1.89±0.1s       1.53±0.04s     0.81  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_tsne(1000, 1000, 100)
-        62.8±9ms         50.7±2ms     0.81  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_label_spreading(1000, 1000, 100)
-        60.4±9ms         48.1±2ms     0.80  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_label_propagation(1000, 1000, 100)
-        27.1±5ms       19.7±0.4ms     0.73  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
-        37.4±5ms       22.7±0.2ms     0.61  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
-        89.4±5ms        19.0±20ms     0.21  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
-        89.2±5ms       14.4±0.2ms     0.16  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
-         921±7ms          145±2ms     0.16  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
-         921±8ms         95.0±2ms     0.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
-         212±3ms       16.6±0.1ms     0.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 1000, 100)
-         2.00±0s          110±2ms     0.05  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-        733±10ms       32.8±0.2ms     0.04  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 100000, 100)
-         728±9ms       31.6±0.3ms     0.04  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 100000, 100)
-      1.73±0.01s       36.3±0.2ms     0.02  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
128 cores
       before           after         ratio
     [998e8f20]       [65ebc927]
     <main>           <distance-metrics-32bit>
+         121±3ms        1.50±0.1s    12.40  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 10000, 100)
+         127±7ms       1.55±0.06s    12.25  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 10000, 100)
+        34.8±2ms         258±20ms     7.40  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(1000, 1000, 100)
+        32.5±2ms         211±10ms     6.52  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(1000, 1000, 100)
+         235±3ms       1.46±0.03s     6.25  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 10000, 100)
+        44.9±2ms         257±10ms     5.73  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(1000, 1000, 100)
+      5.78±0.02s       17.8±0.04s     3.08  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 100000, 100)
+      1.10±0.05s       2.66±0.03s     2.41  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 10000, 100)
+      2.37±0.02s          5.48±1s     2.31  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 100000, 100)
+        589±50ms       1.14±0.05s     1.94  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_isomap(1000, 1000, 100)
+         113±2ms          219±8ms     1.94  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 1000, 100)
+         113±2ms          191±7ms     1.69  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 1000, 100)
+      2.33±0.06s        3.55±0.3s     1.52  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 10000, 100)
+      2.36±0.03s       3.11±0.04s     1.32  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_mean_shift(1000, 1000, 100)
+      46.4±0.07s       1.01±0.01m     1.31  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_affinity_propagation(10000, 100000, 100)
+      1.07±0.01s       1.36±0.02s     1.26  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 10000, 100)
+      1.10±0.03s       1.39±0.04s     1.26  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin_min(10000, 10000, 100)
+      10.4±0.08s       13.0±0.03s     1.24  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_pairwise_distances_argmin(10000, 100000, 100)
+      1.64±0.03s       1.93±0.05s     1.18  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 10000, 100)
+        446±20ms         492±40ms     1.10  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_birch(1000, 1000, 100)
-      2.20±0.02s       1.39±0.04s     0.63  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 10000, 100)
-       21.3±0.2s        13.0±0.2s     0.61  pairwise_argkmin_estimator.PairwiseDistancesArgKminBenchmark.time_nearest_neighbors(10000, 100000, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Any other comments?

Is this proposal too complicated? As @lesteve says "Tant pis pour Tempita"?

@jjerphan jjerphan force-pushed the distance-metrics-32bit branch from 2b68863 to 8ae1cb6 Compare Feb 23, 2022
jjerphan added 2 commits Feb 23, 2022
Also populate the .gitignore with new files
This allows keeping the same interface in Python namely:

 - PairwiseDistancesReduction.is_usable_for
 - PairwiseDistancesReduction.valid_metrics
 - PairwiseDistancesArgKmin.compute

while being able to route to the 32bit and 64bit implementations
defined via Tempita.

The design pattern used here on PairwiseDistancesReduction
and PairwiseDistancesArgKmin is the Facade design pattern.

See: https://refactoring.guru/design-patterns/facade
@jjerphan jjerphan changed the title MAINT 32bit support for DistanceMetric MAINT 32bit support for DistanceMetric and PairwiseDistancesReduction Feb 24, 2022

cdef inline np.ndarray _buffer_to_ndarray(const DTYPE_t* x, np.npy_intp n):
# Wrap a memory buffer with an ndarray. Warning: this is not robust.
# In particular, if x is deallocated before the returned array goes
# out of scope, this could cause memory errors. Since there is not
# a possibility of this for our use-case, this should be safe.

# Note: this Segfaults unless np.import_array() is called above
return PyArray_SimpleNewFromData(1, &n, DTYPECODE, <void*>x)


from libc.math cimport fabs, sqrt, exp, pow, cos, sin, asin
cdef DTYPE_t INF = np.inf

Copy link
Member Author

@jjerphan jjerphan Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this has been moved under the Tempita loop to be templated.

######################################################################
# metric mappings
# These map from metric id strings to class names
METRIC_MAPPING = {'euclidean': EuclideanDistance,
'l2': EuclideanDistance,
'minkowski': MinkowskiDistance,
'p': MinkowskiDistance,
'manhattan': ManhattanDistance,
'cityblock': ManhattanDistance,
'l1': ManhattanDistance,
'chebyshev': ChebyshevDistance,
'infinity': ChebyshevDistance,
'seuclidean': SEuclideanDistance,
'mahalanobis': MahalanobisDistance,
'wminkowski': WMinkowskiDistance,
'hamming': HammingDistance,
'canberra': CanberraDistance,
'braycurtis': BrayCurtisDistance,
'matching': MatchingDistance,
'jaccard': JaccardDistance,
'dice': DiceDistance,
'kulsinski': KulsinskiDistance,
'rogerstanimoto': RogersTanimotoDistance,
'russellrao': RussellRaoDistance,
'sokalmichener': SokalMichenerDistance,
'sokalsneath': SokalSneathDistance,
'haversine': HaversineDistance,
'pyfunc': PyFuncDistance}

Copy link
Member Author

@jjerphan jjerphan Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this has been moved under the Tempita loop to be templated.

.. math::
D(x, y) = \frac{N_{TF} + N_{FT}}{N_{TT} + N_{TF} + N_{FT}}
Copy link
Member Author

@jjerphan jjerphan Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed such doc because this was making Tempita injection fail.

This can be reintroduced.

Copy link
Member

@ogrisel ogrisel Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it would be good to reintroduce it with a notation that does not use backslashes (assuming they are the ones causing the problem).

@jjerphan jjerphan marked this pull request as ready for review Feb 25, 2022
jjerphan added 3 commits Feb 26, 2022
Previously, upcast was done in the critical region.
This causes an unneeded upcast for one of the buffers.

This only upcasts buffers when necessary and where
necessary without duplication contrarily to previously.

Two methods are introduced to perform this upcast
for each strategy.

Yet, this adds some complexity to the templating.
@jjerphan jjerphan changed the title MAINT 32bit support for DistanceMetric and PairwiseDistancesReduction MAINT 32bit support for PairwiseDistancesReduction Feb 28, 2022
cpdef DTYPE_t[::1] _sqeuclidean_row_norms(
const DTYPE_t[:, ::1] X,
ITYPE_t num_threads,
):
"""Compute the squared euclidean norm of the rows of X in parallel.
This is faster than using np.einsum("ij, ij->i") even when using a single thread.
"""
cdef:
# Casting for X to remove the const qualifier is needed because APIs
# exposed via scipy.linalg.cython_blas aren't reflecting the arguments'
# const qualifier.
# See: https://github.com/scipy/scipy/issues/14262
DTYPE_t * X_ptr = <DTYPE_t *> &X[0, 0]
ITYPE_t idx = 0
ITYPE_t n = X.shape[0]
ITYPE_t d = X.shape[1]
DTYPE_t[::1] squared_row_norms = np.empty(n, dtype=DTYPE)

for idx in prange(n, schedule='static', nogil=True, num_threads=num_threads):
squared_row_norms[idx] = _dot(d, X_ptr + idx * d, 1, X_ptr + idx * d, 1)

return squared_row_norms
Copy link
Member Author

@jjerphan jjerphan Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewer: this has been moved bellow, closer to 32bit and 64bit definitions.

from cython.parallel cimport parallel, prange

from ._dist_metrics cimport DatasetsPair, DenseDenseDatasetsPair
Copy link
Member Author

@jjerphan jjerphan Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewer: this has been moved bellow, closer to 32bit and 64bit definitions.

cdef:
readonly DatasetsPair datasets_pair

# The number of threads that can be used is stored in effective_n_threads.
#
# The number of threads to use in the parallelisation strategy
# (i.e. parallel_on_X or parallel_on_Y) can be smaller than effective_n_threads:
# for small datasets, less threads might be needed to loop over pair of chunks.
#
# Hence the number of threads that _will_ be used for looping over chunks
# is stored in chunks_n_threads, allowing solely using what we need.
#
# Thus, an invariant is:
#
# chunks_n_threads <= effective_n_threads
#
ITYPE_t effective_n_threads
ITYPE_t chunks_n_threads

ITYPE_t n_samples_chunk, chunk_size

ITYPE_t n_samples_X, X_n_samples_chunk, X_n_chunks, X_n_samples_last_chunk
ITYPE_t n_samples_Y, Y_n_samples_chunk, Y_n_chunks, Y_n_samples_last_chunk

bint execute_in_parallel_on_Y
Copy link
Member Author

@jjerphan jjerphan Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewer: this has been moved bellow, within 32bit and 64bit definitions (done via Tempita).

Hence, PairwiseDistancesReduction now really just is an interface.

@ogrisel
Copy link
Member

@ogrisel ogrisel commented Mar 2, 2022

Thanks for this analysis. It's not clear to me if the use of np.float64 to accumulate the results is intentional or not in scipy. But indeed it's worth exploring if we can also use np.float64 accumulators for our Manhattan implementation.

I also agree for the long term strategy but I am not sure we have access to a C/C++/Cython level API for these distance implementations and if we do, are they stable across scipy versions and what is the minimal scipy version?

@jjerphan
Copy link
Member Author

@jjerphan jjerphan commented Mar 3, 2022

Thanks for this analysis. It's not clear to me if the use of np.float64 to accumulate the results is intentional or not in scipy.

I think that it historically has been the case to return 64bit outputs.

But indeed it's worth exploring if we can also use np.float64 accumulators for our Manhattan implementation.

This is implemented in 7645ba3. Still, the test is failing for t-SNE but not the snippet given above in #22590 (comment).

I also agree for the long term strategy but I am not sure we have access to a C/C++/Cython level API for these distance implementations and if we do, are they stable across scipy versions and what is the minimal scipy version?

It's not the case currently, but we could imagine in the long term to come up with interfaces which make it possible?

@jjerphan
Copy link
Member Author

@jjerphan jjerphan commented Mar 7, 2022

I am thinking on putting this PR on hold because I think we need to agree on a plan to properly extends tests to 32bit datasets and implement it first.

Moreover, I've opened #22719 which precedes this PR and eases its logic, especially regarding the new implementation introduced via #22320 and the subsequent ones.

What do you think?

This has been extracted from 7645ba.
@jjerphan jjerphan added the float32 label Mar 18, 2022
jjerphan added 5 commits Mar 30, 2022
Conflicts:
    .gitignore
    sklearn/metrics/_dist_metrics.pxd.tp
    sklearn/metrics/_dist_metrics.pyx.tp
    sklearn/metrics/_pairwise_distances_reduction.pyx.tp
    sklearn/metrics/setup.py
    sklearn/metrics/tests/test_pairwise_distances_reduction.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cython float32 module:metrics No Changelog Needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants