MAINT 32bit datasets support for PairwiseDistancesReduction
#22590
Conversation
Force-pushed from 6d9f8a9 to 2b68863
Force-pushed from 2b68863 to 8ae1cb6
Also populate the .gitignore with new files
This allows keeping the same interface in Python, namely:

- PairwiseDistancesReduction.is_usable_for
- PairwiseDistancesReduction.valid_metrics
- PairwiseDistancesArgKmin.compute

while being able to route to the 32bit and 64bit implementations defined via Tempita.

The design pattern used here on PairwiseDistancesReduction and PairwiseDistancesArgKmin is the Facade design pattern. See: https://refactoring.guru/design-patterns/facade
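The routing this commit describes can be sketched at the Python level as follows. This is a minimal, hypothetical illustration of the Facade pattern only: the class bodies are NumPy stand-ins, not the actual Tempita-generated Cython implementations.

```python
import numpy as np

class _ArgKmin64:
    # Stand-in for the Tempita-generated 64bit implementation.
    @classmethod
    def compute(cls, X, Y, k):
        dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.argsort(dists, axis=1)[:, :k]

class _ArgKmin32(_ArgKmin64):
    # Stand-in for the Tempita-generated 32bit implementation.
    pass

class PairwiseDistancesArgKmin:
    """Facade: one public entry point dispatching on the dtype of X."""
    @classmethod
    def compute(cls, X, Y, k):
        if X.dtype == np.float64:
            return _ArgKmin64.compute(X, Y, k)
        if X.dtype == np.float32:
            return _ArgKmin32.compute(X, Y, k)
        raise ValueError(f"No implementation for dtype {X.dtype}")
```

Callers keep using a single `compute` entry point regardless of dtype; only the facade knows about the dtype-specific classes.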
DistanceMetric and PairwiseDistancesReduction
```cython
cdef inline np.ndarray _buffer_to_ndarray(const DTYPE_t* x, np.npy_intp n):
    # Wrap a memory buffer with an ndarray. Warning: this is not robust.
    # In particular, if x is deallocated before the returned array goes
    # out of scope, this could cause memory errors. Since there is not
    # a possibility of this for our use-case, this should be safe.

    # Note: this Segfaults unless np.import_array() is called above
    return PyArray_SimpleNewFromData(1, &n, DTYPECODE, <void*>x)
```
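A rough NumPy-level analogue of what this helper does (wrap existing memory without copying) can illustrate the same caveat: the result is only a view, so the underlying buffer must outlive it. This is an illustration, not the PR's code.

```python
import numpy as np

buf = bytearray(8 * 4)                      # owner of the memory: 4 float64 slots
arr = np.frombuffer(buf, dtype=np.float64)  # zero-copy view onto buf (writable
                                            # because bytearray is writable)
arr[:] = [1.0, 2.0, 3.0, 4.0]               # writes go through to buf
```

If `buf` were freed while `arr` is still alive, reads through `arr` would be invalid, which is exactly the hazard the Cython comment warns about.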
```cython
from libc.math cimport fabs, sqrt, exp, pow, cos, sin, asin

cdef DTYPE_t INF = np.inf
```
Note to reviewers: this has been moved under the Tempita loop to be templated.
```cython
######################################################################
# metric mappings
# These map from metric id strings to class names
METRIC_MAPPING = {'euclidean': EuclideanDistance,
                  'l2': EuclideanDistance,
                  'minkowski': MinkowskiDistance,
                  'p': MinkowskiDistance,
                  'manhattan': ManhattanDistance,
                  'cityblock': ManhattanDistance,
                  'l1': ManhattanDistance,
                  'chebyshev': ChebyshevDistance,
                  'infinity': ChebyshevDistance,
                  'seuclidean': SEuclideanDistance,
                  'mahalanobis': MahalanobisDistance,
                  'wminkowski': WMinkowskiDistance,
                  'hamming': HammingDistance,
                  'canberra': CanberraDistance,
                  'braycurtis': BrayCurtisDistance,
                  'matching': MatchingDistance,
                  'jaccard': JaccardDistance,
                  'dice': DiceDistance,
                  'kulsinski': KulsinskiDistance,
                  'rogerstanimoto': RogersTanimotoDistance,
                  'russellrao': RussellRaoDistance,
                  'sokalmichener': SokalMichenerDistance,
                  'sokalsneath': SokalSneathDistance,
                  'haversine': HaversineDistance,
                  'pyfunc': PyFuncDistance}
```
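A small usage sketch of such an alias table: several metric id strings resolve to the same class. The class and function names below are hypothetical stand-ins for the Cython extension types, kept only to show the lookup pattern.

```python
# Hypothetical stand-ins for the Cython distance classes.
class EuclideanDistance: ...
class ManhattanDistance: ...

METRIC_MAPPING = {
    'euclidean': EuclideanDistance,
    'l2': EuclideanDistance,
    'manhattan': ManhattanDistance,
    'cityblock': ManhattanDistance,
    'l1': ManhattanDistance,
}

def get_metric(metric):
    # Resolve an alias string to a distance instance, rejecting unknown ids.
    try:
        return METRIC_MAPPING[metric]()
    except KeyError:
        raise ValueError(f"Unrecognized metric: {metric!r}") from None
```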
Note to reviewers: this has been moved under the Tempita loop to be templated.
```rst
.. math::
   D(x, y) = \frac{N_{TF} + N_{FT}}{N_{TT} + N_{TF} + N_{FT}}
```
I removed this doc because it was making Tempita injection fail.
It can be reintroduced.
Indeed it would be good to reintroduce it with a notation that does not use backslashes (assuming they are the ones causing the problem).
Previously, the upcast was done in the critical region, causing an unneeded upcast of one of the buffers. Now buffers are upcast only when and where necessary, without the duplication of the previous approach. Two methods are introduced to perform this upcast for each strategy. Yet, this adds some complexity to the templating.
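The "upcast only where necessary" idea can be sketched as follows. This is a hypothetical NumPy illustration (function name and chunking are invented for the sketch): float32 data is upcast to float64 chunk by chunk inside the loop, instead of upcasting whole buffers up front.

```python
import numpy as np

def argkmin_chunked(X32, Y32, k, chunk_size=2):
    """Illustrative sketch only: upcast float32 chunks to float64 inside
    the chunk loop rather than upcasting X and Y wholesale."""
    n = X32.shape[0]
    out = np.empty((n, k), dtype=np.intp)
    for start in range(0, n, chunk_size):
        # Upcast only the data needed for this iteration.
        X_chunk = X32[start:start + chunk_size].astype(np.float64)
        Y_chunk = Y32.astype(np.float64)  # the real code also chunks over Y
        d = ((X_chunk[:, None, :] - Y_chunk[None, :, :]) ** 2).sum(axis=-1)
        out[start:start + chunk_size] = np.argsort(d, axis=1)[:, :k]
    return out
```

The trade-off mentioned above is visible even here: each strategy (chunking over X or over Y) needs its own upcast placement, which complicates the template.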
DistanceMetric and PairwiseDistancesReduction
```cython
cpdef DTYPE_t[::1] _sqeuclidean_row_norms(
    const DTYPE_t[:, ::1] X,
    ITYPE_t num_threads,
):
    """Compute the squared euclidean norm of the rows of X in parallel.

    This is faster than using np.einsum("ij, ij->i") even when using a single thread.
    """
    cdef:
        # Casting for X to remove the const qualifier is needed because APIs
        # exposed via scipy.linalg.cython_blas aren't reflecting the arguments'
        # const qualifier.
        # See: https://github.com/scipy/scipy/issues/14262
        DTYPE_t * X_ptr = <DTYPE_t *> &X[0, 0]
        ITYPE_t idx = 0
        ITYPE_t n = X.shape[0]
        ITYPE_t d = X.shape[1]
        DTYPE_t[::1] squared_row_norms = np.empty(n, dtype=DTYPE)

    for idx in prange(n, schedule='static', nogil=True, num_threads=num_threads):
        squared_row_norms[idx] = _dot(d, X_ptr + idx * d, 1, X_ptr + idx * d, 1)

    return squared_row_norms
```
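For reference, what this helper computes is equivalent to the einsum expression its docstring compares against; a plain NumPy version (not the PR's code, and without the parallel BLAS dot) is:

```python
import numpy as np

def sqeuclidean_row_norms(X):
    # NumPy-level equivalent of the Cython helper: the squared euclidean
    # norm of each row is the dot product of the row with itself.
    return np.einsum("ij,ij->i", X, X)
```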
Note to reviewers: this has been moved below, closer to the 32bit and 64bit definitions.
```cython
from cython.parallel cimport parallel, prange

from ._dist_metrics cimport DatasetsPair, DenseDenseDatasetsPair
```
Note to reviewers: this has been moved below, closer to the 32bit and 64bit definitions.
```cython
cdef:
    readonly DatasetsPair datasets_pair

    # The number of threads that can be used is stored in effective_n_threads.
    #
    # The number of threads to use in the parallelisation strategy
    # (i.e. parallel_on_X or parallel_on_Y) can be smaller than effective_n_threads:
    # for small datasets, fewer threads might be needed to loop over pairs of chunks.
    #
    # Hence the number of threads that _will_ be used for looping over chunks
    # is stored in chunks_n_threads, allowing solely using what we need.
    #
    # Thus, an invariant is:
    #
    #     chunks_n_threads <= effective_n_threads
    #
    ITYPE_t effective_n_threads
    ITYPE_t chunks_n_threads

    ITYPE_t n_samples_chunk, chunk_size

    ITYPE_t n_samples_X, X_n_samples_chunk, X_n_chunks, X_n_samples_last_chunk
    ITYPE_t n_samples_Y, Y_n_samples_chunk, Y_n_chunks, Y_n_samples_last_chunk

    bint execute_in_parallel_on_Y
```
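The invariant documented in these comments reduces to a simple rule; a one-line sketch (the helper name is hypothetical, for illustration only):

```python
def chunks_n_threads(n_chunks, effective_n_threads):
    # The number of threads used to loop over chunks never exceeds the
    # number of chunks to process, nor the number of available threads,
    # so chunks_n_threads <= effective_n_threads always holds.
    return min(n_chunks, effective_n_threads)
```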
Note to reviewers: this has been moved below, within the 32bit and 64bit definitions (done via Tempita).
Hence, PairwiseDistancesReduction now really just is an interface.
Thanks for this analysis. It's not clear to me if the use of …

I also agree with the long-term strategy, but I am not sure we have access to a C/C++/Cython-level API for these distance implementations; and if we do, are they stable across scipy versions, and what is the minimal scipy version?
I think that it historically has been the case to return 64bit outputs.
This is implemented in 7645ba3. Still, the test is failing for t-SNE but not for the snippet given above in #22590 (comment).
It's not the case currently, but in the long term we could imagine coming up with interfaces which make it possible?
I am thinking of putting this PR on hold, because I think we first need to agree on a plan to properly extend tests to 32bit datasets and implement it. Moreover, I've opened #22719, which precedes this PR and eases its logic, especially regarding the new implementation introduced via #22320 and the subsequent ones. What do you think?
This has been extracted from 7645ba.
Conflicts:
- .gitignore
- sklearn/metrics/_dist_metrics.pxd.tp
- sklearn/metrics/_dist_metrics.pyx.tp
- sklearn/metrics/_pairwise_distances_reduction.pyx.tp
- sklearn/metrics/setup.py
- sklearn/metrics/tests/test_pairwise_distances_reduction.py
Reference Issues/PRs
Follows #22134. Experimental POC to assess if Tempita is sufficient.
What does this implement/fix? Explain your changes.
Full design proposal
Context
PairwiseDistancesReduction needs to support float32 and float64 DatasetsPairs.

To do so, DatasetsPairs need to be adapted for float32 (X, Y), and concrete PairwiseDistancesReductions need to maintain the routing to those.

The current Cython extension type (i.e. cdef class) hierarchy only supports 64bit implementations. It breaks down as follows:
Where FastEuclideanPairwiseDistancesArgKmin is called in most cases.

Problem
We need some flexibility to support 32bit datasets while not duplicating the implementations. In this regard, templating (i.e. having classes be dtype-defined) and type covariance (i.e. if A extends B then Class&lt;A&gt; extends Class&lt;B&gt;) would have come in handy to extend the current 64bit hierarchy to support 32bit.

Yet, Cython supports neither templating in its language constructs nor type covariance.

Also, Cython offers support for fused types; however, they can't be used on Cython extension types' attributes, which makes this useful feature unusable in our context without some hacks.
Proposed solution
Still, we can use Tempita to come up with a working solution preserving performance at the cost of maintenance.
To perform this:

- 32bit DistanceMetrics have been introduced via Tempita. 64bit DistanceMetrics are still exposed via the current public API, but the 32bit versions must remain private.
- PairwiseDistancesReductions has been changed using the Facade design pattern, so as to keep the same Python interfaces (namely PairwiseDistancesReduction.is_usable_for, PairwiseDistancesReduction.valid_metrics, PairwiseDistancesArgKmin.compute) but have concrete 32bit and 64bit implementations be defined via Tempita as follows:

Future extension solution
In the future, we could just use the same pattern. For instance we could have:
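The code-generation idea behind this pattern can be sketched with a minimal Python analogue. This is illustrative only: real Tempita syntax and the generated class names differ, and in the real setup Tempita writes `.pyx` files at build time rather than `exec`ing source at runtime.

```python
# One template is expanded once per dtype, yielding suffixed classes,
# mimicking how a single .pyx.tp file yields 32bit and 64bit variants.
TEMPLATE = '''
class PairwiseDistancesArgKmin{suffix}:
    dtype = "{dtype}"
'''

source = "".join(
    TEMPLATE.format(suffix=suffix, dtype=dtype)
    for suffix, dtype in (("64", "float64"), ("32", "float32"))
)
namespace = {}
exec(source, namespace)  # stand-in for the build-time Tempita expansion
```

Adding a new dtype (or a new reduction) then only means adding one entry to the expansion list, which is the "same pattern" extension suggested above.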
TODO:
Hardware scalability
Adapting this script to use float32 datasets, we observe that this implementation scales linearly, similarly to its 64bit counterpart:
Raw results
Speed-ups between 1.0 (e7fb5b8) and this PR @ 65ebc92 (via ca9197a502bf1289db722a6261ff5fe7edf8e981)
Up to ×50 speed-ups in normal configurations.
Some regression when using small datasets and a high number of threads.
1 core
2 cores
4 cores
8 cores
16 cores
32 cores
64 cores
128 cores
Any other comments?
Is this proposal too complicated? As @lesteve says "Tant pis pour Tempita"?