ENH Cythonize `_assert_all_finite` using stop-on-first strategy #23197

Micky774 · 2022-04-22T23:44:14Z

Reference Issues/PRs
Fixes #11681

What does this implement/fix? Explain your changes.
Implements the code developed by jakirkham and extends it to meet requirements for current _assert_all_finite function.

Any other comments?
Currently struggling to adapt the function to work with np.float16 arrays.

To Do

Compare performance as "second pass" algorithm- Compare op speed by replacing w/ equality check
Test speed if configured for 1D/2D contiguous arrays
Benchmark w/ non-finite arrays of varying density
not np.isfinite--> np.isinf() and np.isnan()

thomasjpfan

Thank you for the PR!

sklearn/utils/isfinite.pyx

thomasjpfan · 2022-04-23T01:42:07Z

sklearn/utils/isfinite.pyx

+        raise TypeError("Unsupported array type: %s" % repr(numpy.PyArray_TypeObjectFromType(a_type)))
+
+
+cdef inline bint c_isfinite(const_fprecision* a_ptr, size_t step, size_t size, bint_enum disallow_nan) nogil:


There is a lot of indirection for getting disallow_nan passed through. Although, I do not see the motivation for the bint_enum.

I suspect the original implementation is trying to template out the bint_type so the overhead of checking for bint is compiled away in the loop. I am interested to see what the overhead looks like without this optimization.

I'll run some benchmarks once the rest of the pattern is stable -- I'm also interested to see the realized performance difference.

sklearn/utils/isfinite.pyx

sklearn/utils/validation.py

sklearn/utils/isfinite.pyx

thomasjpfan · 2022-04-23T14:20:02Z

sklearn/utils/isfinite.pyx

+    cdef bint_enum err
+
+
+    a_flat = numpy.PyArray_Reshape(a, -1)


If the array is contiguous, I do not think we need to reshape at all.

Actually in general do we need the array to be C/F contiguous at all? We don't really care about accessing the data "in order" per se, so it should be fine without a contiguous array right? (I may be misunderstanding the contiguity of ndarrays)

My overall concern is how reshape can copy and copying X is a huge memory cost to pay. Using C/F contiguous is one way to avoid the possible copy from reshape. If the array is contiguous, the data can be traversed without reshaping.

Also, when the array is contiguous, it can take advantage of data locality and benefit from having the data on the same CPU cache line.

Gotcha -- I was initially talking about removing reshape and simultaneously not stipulating contiguity, but the cache argument makes sense.

Micky774 · 2022-04-23T18:40:06Z

Current benchmarks indicate that the optimization is dependent on dtype and that the current implementation is preferable once the number of elements ~>5000. I don't think we need to really offer a choice between the cython/python implementations since the preferable algorithm is fairly clear cut for by far most cases.

Code for benchmark:

from sklearn.utils.validation import _assert_all_finite
import numpy as np

dtype = np.float32
X = np.random.rand(1000,100).astype(dtype)

%timeit _assert_all_finite(X)

ogrisel · 2022-04-29T14:02:03Z

Could you try to use prange / OpenMP parallelism to try to see if parallelism can further increase the processing speed on a multicore machine?

thomasjpfan · 2022-04-29T14:33:20Z

Could you try to use prange / OpenMP parallelism to try to see if parallelism can further increase the processing speed on a multicore machine?

Even if it is faster with prange, I prefer not to have a drop in single thread performance compared to main. OMP_NUM_THREADS is commonly set to 1 in libraries such as dask or ray to prevent oversubscription.

We can work around this by using np.isfinite + np.sum for single threaded and use this Cython version if there are enough threads. Although, I prefer not to go down this route.

ogrisel · 2022-04-29T15:38:53Z

Even if it is faster with prange, I prefer not to have a drop in single thread performance compared to main. OMP_NUM_THREADS is commonly set to 1 in libraries such as dask or ray to prevent oversubscription.

Are you sure we will get a drop in single-threaded performance with prangewhen OMP_NUM_THREADS=1? I think it's worth measuring it.

…nto cython_assert_isfinite

thomasjpfan · 2022-04-29T17:42:47Z

With a fairly simple implementation in a Jupyter notebook: (Using %load_ext Cython):

Code for only using Cython + prange

%%cython --compile-args=-fopenmp

from libc.math cimport isfinite as c_isfinite
cimport cython
from cython.parallel cimport prange
from cython cimport floating

cdef int cy_isfinite(floating[:, ::1] a, int n_threads):
    cdef:
        int size = a.size
        int i
        floating* a_ptr = &a[0, 0]
        int output = 1
        
    for i in prange(size, num_threads=n_threads, nogil=True):
        if c_isfinite(a_ptr[i]) == 0:
            output = 0
            break
    return output

def my_isfinite(floating[:, ::1] a, int n_threads=1):
    return bool(cy_isfinite(a, n_threads=n_threads))

and these Python version:

from sklearn.utils.extmath import _safe_accumulator_op
import numpy as np

def sk_isfinite(X):
    return np.isfinite(_safe_accumulator_op(np.sum, X))

def np_isfinite_all(X):
    return np.isfinite(X).all()

Running on a fairly sized X:

rng = np.random.RandomState(42)
X = rng.rand(100_000, 100).astype(dtype)

%timeit sk_isfinite(X)
# 1.87 ms ± 8.21 µs per loop

%timeit np_isfinite_all(X)
# 4.71 ms ± 60.5 µs per loop

%timeit my_isfinite(X, n_threads=1)
# 15.8 ms ± 62.8 µs per loop

%timeit my_isfinite(X, n_threads=8)
# 2.47 ms ± 311 µs per loop

# Only using range
%timeit my_isfinite_single_with_range(X)
# 6.25 ms ± 17.6 µs per loop

From the results above scikit-learn's current np.sum + np.isfinite performs better than all Cython implementations. I think it's strange how using range is better compared to using prange with n_threads=1.

Code for only using Cython + range

%%cython

from libc.math cimport isfinite as c_isfinite
cimport cython
from cython cimport floating

cdef int cy_isfinite_with_range(floating[:, ::1] a):
    cdef:
        int size = a.size
        int i
        floating* a_ptr = &a[0, 0]
        int output = 1
        
    with nogil:
        for i in range(size):
            if c_isfinite(a_ptr[i]) == 0:
                output = 0
                break
    return output

def my_isfinite_single_with_range(floating[:, ::1] a):
    return bool(cy_isfinite_with_range(a))

ogrisel · 2022-05-02T07:54:17Z

Thanks @thomasjpfan. It's possible that the np.sum function was recently optimized with SIMD instructions that make more efficient than alternatives.

ogrisel · 2022-05-02T08:00:10Z

Indeed, using py-spy top --native to monitor a process that runs your sk_isfinite on a float32 data array in a while True loop:

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                      
 60.00%  60.00%   38.92s    38.92s   _aligned_contig_cast_float_to_double (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
 40.00%  40.00%   25.05s    25.05s   DOUBLE_pairwise_sum (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  60.00%   0.310s    39.23s   npyiter_copy_to_buffers (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%   0.00%   0.210s    0.210s   npyiter_goto_iterindex (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%   0.00%   0.150s    0.150s   npyiter_copy_from_buffers (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  60.00%   0.090s    39.67s   npyiter_buffered_reduce_iternext_iters2 (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  40.00%   0.040s    25.09s   DOUBLE_add_AVX512F (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00% 100.00%   0.030s    64.84s   reduce_loop (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
  0.00%  40.00%   0.030s    25.12s   generic_wrapped_legacy_loop (numpy/core/_multiarray_umath.cpython-310-x86_64-

the DOUBLE_add_AVX512F is SIMD optimized. However I do not understand why numpy converts the float32 data to float64... This seems like a huge waste.

ogrisel · 2022-05-02T16:38:43Z

However I do not understand why numpy converts the float32 data to float64... This seems like a huge waste.

Actually this is done on purpose in _safe_accumulator_op. Maybe we could not do that when the goal is to detect nan or inf values. We could call np.sum() directly and only return an exact check if the sum finds a nan or inf.

ogrisel · 2022-05-03T07:22:24Z

Something like:

def sk_isfinite(X):
    # Use vectorized sum as a fast detector for inf or nan values
    # but with potential false positives. 
    if not np.isfinite(np.sum(X)):
        # Check again with the slower but exact method to filter out any
        # false positive.
        return np.isfinite(X).all()

This way the 99% of the cases where no ValueError exception are raised should be fast.

Micky774 · 2022-05-26T18:40:22Z

Current implementation benchmarks (generated by this benchmark in a Jupyter notebook):

In the above image, p_inf and p_nan are the densities of np.inf and np.nan values in an array of one-million elements, such that p_inf+p_nan percent of the array is non-finite.

sklearn/utils/validation.py

Micky774 · 2022-05-26T20:18:25Z

Per @jakirkham's suggestion, I made a minor refactor to the Cython implementation to have it serve as a second-pass algorithm. This formulation is now approximately equivalent to main when the array is finite, but significantly faster (2-3x) when there are non-finite elements in large arrays (tested on arrays w/ 10mil elements).

And is slightly faster on smaller arrays (100,000 elements)

sklearn/utils/isfinite.pyx

thomasjpfan · 2022-05-30T20:20:48Z

sklearn/utils/isfinite.pyx

@@ -0,0 +1,76 @@
+# cython: profile=True, linetrace=True


If the benchmarks were run with these on, the results maybe slower than normal.

sklearn/utils/isfinite.pyx

thomasjpfan · 2022-05-30T20:20:48Z

sklearn/utils/validation.py

+    # size > 5000 is a heuristic for when the python implementation is faster
+    first_pass_isfinite = False
+    if is_float:
+        use_cython = not (X.dtype.kind == "c" or X.dtype == np.float16)


Suggested change

use_cython = not (X.dtype.kind == "c" or X.dtype == np.float16)

use_cython = X.dtype in {np.float16, np.float32}

sklearn/utils/isfinite.pyx

Co-authored-by: Thomas J. Fan <[email protected]>

Micky774 added 2 commits Apr 22, 2022

Initial implementation

0dd79ce

Added support for np.float16

ddf2382

Micky774 changed the title ~~ENH Cythonize _assert_all_finite using reduction scheme~~ ENH Cythonize _assert_all_finite using reduction scheme Apr 22, 2022

github-actions bot added cython module:utils labels Apr 22, 2022

Micky774 changed the title ~~ENH Cythonize _assert_all_finite using reduction scheme~~ WIP ENH Cythonize _assert_all_finite using reduction scheme Apr 22, 2022

thomasjpfan reviewed Apr 23, 2022

View changes

sklearn/utils/isfinite.pyx Outdated Show resolved Hide resolved

Micky774 added 4 commits Apr 23, 2022

Corrected type-checking and removed ComplexFloat support

0db6976

Refactored function name for clarity

b1312fb

Finished refactor (forgot to add)

a3a2752

Removed cython support for FP16 and attempted prange implementation

5434849

Micky774 reviewed Apr 23, 2022

View changes

sklearn/utils/isfinite.pyx Outdated Show resolved Hide resolved

thomasjpfan reviewed Apr 23, 2022

View changes

Micky774 added 4 commits Apr 23, 2022

Simplified cython implementation

5f900dd

Merge branch 'main' into cython_assert_isfinite

0d2a36b

Added heuristic and documented explanation

369df2c

Fixed typo and changed wording

2d3ff8b

Micky774 changed the title ~~WIP ENH Cythonize _assert_all_finite using reduction scheme~~ WIP ENH Cythonize _assert_all_finite using stop-on-first strategy Apr 24, 2022

Reintroduced overflow false-positive check

f2ab3de

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

0c3bee1

…nto cython_assert_isfinite

Merge branch 'main' into cython_assert_isfinite

079ff55

Micky774 mentioned this pull request May 12, 2022

MAINT Minor refactor of _assert_all_finite logic #23347

Merged

Merge branch 'main' into cython_assert_isfinite

3167308

Micky774 mentioned this pull request May 23, 2022

MNT Removed _safe_accumulator_op for first-pass algorithm in _assert_all_finite #23446

Merged

Micky774 added 2 commits May 26, 2022

Merge branch 'main' into cython_assert_isfinite

8d6237b

Fixed cython implementation

bd3dfd2

jakirkham reviewed May 26, 2022

View changes

sklearn/utils/validation.py Outdated Show resolved Hide resolved

Refactored into second-pass algorithm

b7803b5

Micky774 changed the title ~~WIP ENH Cythonize _assert_all_finite using stop-on-first strategy~~ ENH Cythonize _assert_all_finite using stop-on-first strategy May 27, 2022

Micky774 added 2 commits May 27, 2022

Fixed logic on isinf check

ead02b6

Merge branch 'main' into cython_assert_isfinite

6390eab

thomasjpfan reviewed May 27, 2022

View changes

sklearn/utils/isfinite.pyx Outdated Show resolved Hide resolved

Micky774 and others added 2 commits May 27, 2022

Adopted suggested refactor

8914eed

Merge branch 'main' into cython_assert_isfinite

3697672

thomasjpfan reviewed May 30, 2022

View changes

Micky774 and others added 2 commits May 30, 2022

Merge branch 'main' into cython_assert_isfinite

7fc0452

Apply suggestions from code review

22773e5

Co-authored-by: Thomas J. Fan <[email protected]>

Apr	MAY	Jan
	30
2021	2022	2025

scikit-learn / scikit-learn Public

ENH Cythonize `_assert_all_finite` using stop-on-first strategy #23197

ENH Cythonize `_assert_all_finite` using stop-on-first strategy #23197

Micky774 commented Apr 22, 2022 •

edited

thomasjpfan left a comment

thomasjpfan Apr 23, 2022

Micky774 Apr 23, 2022

thomasjpfan Apr 23, 2022

Micky774 Apr 23, 2022 •

edited

thomasjpfan Apr 23, 2022

Micky774 Apr 23, 2022

Micky774 commented Apr 23, 2022 •

edited

ogrisel commented Apr 29, 2022

thomasjpfan commented Apr 29, 2022

ogrisel commented Apr 29, 2022

thomasjpfan commented Apr 29, 2022 •

edited

ogrisel commented May 2, 2022

ogrisel commented May 2, 2022

ogrisel commented May 2, 2022

ogrisel commented May 3, 2022 •

edited

Micky774 commented May 26, 2022 •

edited

Micky774 commented May 26, 2022

thomasjpfan May 30, 2022

thomasjpfan May 30, 2022

		raise TypeError("Unsupported array type: %s" % repr(numpy.PyArray_TypeObjectFromType(a_type)))


		cdef inline bint c_isfinite(const_fprecision* a_ptr, size_t step, size_t size, bint_enum disallow_nan) nogil:

	use_cython = not (X.dtype.kind == "c" or X.dtype == np.float16)
	use_cython = X.dtype in {np.float16, np.float32}

scikit-learn / scikit-learn Public

ENH Cythonize _assert_all_finite using stop-on-first strategy #23197

Are you sure you want to change the base?

ENH Cythonize _assert_all_finite using stop-on-first strategy #23197

Conversation

Micky774 commented Apr 22, 2022 • edited

To Do

thomasjpfan left a comment

thomasjpfan Apr 23, 2022

Choose a reason for hiding this comment

Micky774 Apr 23, 2022

Choose a reason for hiding this comment

thomasjpfan Apr 23, 2022

Choose a reason for hiding this comment

Micky774 Apr 23, 2022 • edited

Choose a reason for hiding this comment

thomasjpfan Apr 23, 2022

Choose a reason for hiding this comment

Micky774 Apr 23, 2022

Choose a reason for hiding this comment

Micky774 commented Apr 23, 2022 • edited

ogrisel commented Apr 29, 2022

thomasjpfan commented Apr 29, 2022

ogrisel commented Apr 29, 2022

thomasjpfan commented Apr 29, 2022 • edited

ogrisel commented May 2, 2022

ogrisel commented May 2, 2022

ogrisel commented May 2, 2022

ogrisel commented May 3, 2022 • edited

Micky774 commented May 26, 2022 • edited

Micky774 commented May 26, 2022

thomasjpfan May 30, 2022

Choose a reason for hiding this comment

thomasjpfan May 30, 2022

Choose a reason for hiding this comment

ENH Cythonize `_assert_all_finite` using stop-on-first strategy #23197

ENH Cythonize `_assert_all_finite` using stop-on-first strategy #23197

Micky774 commented Apr 22, 2022 •

edited

Micky774 Apr 23, 2022 •

edited

Micky774 commented Apr 23, 2022 •

edited

thomasjpfan commented Apr 29, 2022 •

edited

ogrisel commented May 3, 2022 •

edited

Micky774 commented May 26, 2022 •

edited