ENH Cythonize _assert_all_finite using stop-on-first strategy #23197
base: main
Conversation
Changed the title to: _assert_all_finite using reduction scheme
sklearn/utils/isfinite.pyx (outdated)

```cython
raise TypeError("Unsupported array type: %s" % repr(numpy.PyArray_TypeObjectFromType(a_type)))
```

```cython
cdef inline bint c_isfinite(const_fprecision* a_ptr, size_t step, size_t size, bint_enum disallow_nan) nogil:
```
There is a lot of indirection for getting `disallow_nan` passed through, and I do not see the motivation for the `bint_enum`.
I suspect the original implementation is trying to template out the `bint_type` so the overhead of checking the `bint` flag is compiled away in the loop. I am interested to see what the overhead looks like without this optimization.
I'll run some benchmarks once the rest of the pattern is stable -- I'm also interested to see the realized performance difference.
sklearn/utils/isfinite.pyx (outdated)

```cython
cdef bint_enum err
```

```cython
a_flat = numpy.PyArray_Reshape(a, -1)
```
If the array is contiguous, I do not think we need to reshape at all.
Actually, in general, do we need the array to be C/F contiguous at all? We don't really care about accessing the data "in order" per se, so it should be fine without a contiguous array, right? (I may be misunderstanding the contiguity of ndarrays.)
My overall concern is that reshape can copy, and copying `X` is a huge memory cost to pay. Requiring C/F contiguity is one way to avoid the possible copy from reshape: if the array is contiguous, the data can be traversed without reshaping at all.
Also, when the array is contiguous the traversal benefits from data locality, since neighboring elements tend to sit on the same CPU cache line.
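The copy concern is easy to demonstrate in NumPy: flattening a C-contiguous array is a view, while flattening a transposed (non-contiguous) array forces a copy. A minimal sketch:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)      # C-contiguous
flat = a.reshape(-1)                  # contiguous: no copy, just a view
assert np.shares_memory(flat, a)

b = a.T                               # transposed: not C-contiguous
flat_b = b.reshape(-1)                # here reshape must copy the data
assert not np.shares_memory(flat_b, b)
```

For a large `X`, that second case doubles peak memory, which is exactly the cost the contiguity requirement avoids.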
Gotcha -- I was initially talking about removing reshape and simultaneously not stipulating contiguity, but the cache argument makes sense.
Current benchmarks indicate that the optimization is dependent on

Code for benchmark:

```python
from sklearn.utils.validation import _assert_all_finite
import numpy as np

dtype = np.float32
X = np.random.rand(1000, 100).astype(dtype)
%timeit _assert_all_finite(X)
```
Changed the title: _assert_all_finite using reduction scheme → _assert_all_finite using stop-on-first strategy
Could you try to use
Even if it is faster with
We can work around this by using
Are you sure we will get a drop in single-threaded performance with
With a fairly simple implementation in a Jupyter notebook:

Code for only using Cython + prange:

```cython
%%cython --compile-args=-fopenmp
from libc.math cimport isfinite as c_isfinite
cimport cython
from cython.parallel cimport prange
from cython cimport floating

cdef int cy_isfinite(floating[:, ::1] a, int n_threads):
    cdef:
        int size = a.size
        int i
        floating* a_ptr = &a[0, 0]
        int output = 1
    for i in prange(size, num_threads=n_threads, nogil=True):
        if c_isfinite(a_ptr[i]) == 0:
            output = 0
            break
    return output

def my_isfinite(floating[:, ::1] a, int n_threads=1):
    return bool(cy_isfinite(a, n_threads=n_threads))
```

and these Python versions:

```python
from sklearn.utils.extmath import _safe_accumulator_op
import numpy as np

def sk_isfinite(X):
    return np.isfinite(_safe_accumulator_op(np.sum, X))

def np_isfinite_all(X):
    return np.isfinite(X).all()
```

Running on a fairly sized X:

```python
rng = np.random.RandomState(42)
X = rng.rand(100_000, 100).astype(dtype)

%timeit sk_isfinite(X)
# 1.87 ms ± 8.21 µs per loop
%timeit np_isfinite_all(X)
# 4.71 ms ± 60.5 µs per loop
%timeit my_isfinite(X, n_threads=1)
# 15.8 ms ± 62.8 µs per loop
%timeit my_isfinite(X, n_threads=8)
# 2.47 ms ± 311 µs per loop

# Only using range
%timeit my_isfinite_single_with_range(X)
# 6.25 ms ± 17.6 µs per loop
```

From the results above, scikit-learn's current

Code for only using Cython + range:

```cython
%%cython
from libc.math cimport isfinite as c_isfinite
cimport cython
from cython cimport floating

cdef int cy_isfinite_with_range(floating[:, ::1] a):
    cdef:
        int size = a.size
        int i
        floating* a_ptr = &a[0, 0]
        int output = 1
    with nogil:
        for i in range(size):
            if c_isfinite(a_ptr[i]) == 0:
                output = 0
                break
    return output

def my_isfinite_single_with_range(floating[:, ::1] a):
    return bool(cy_isfinite_with_range(a))
```
Thanks @thomasjpfan. It's possible that the
Indeed, using the
Actually this is done on purpose in
Something like:

```python
def sk_isfinite(X):
    # Use vectorized sum as a fast detector for inf or nan values
    # but with potential false positives.
    if not np.isfinite(np.sum(X)):
        # Check again with the slower but exact method to filter out any
        # false positive.
        return np.isfinite(X).all()
    return True
```

This way, in the 99% of the cases where no
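The two-pass structure works because a single reduction propagates any non-finite value, while overflow is the false-positive case that the exact second pass filters out. A small demonstration:

```python
import numpy as np

# sum propagates non-finite values, so one cheap reduction detects them
assert not np.isfinite(np.sum(np.array([1.0, np.nan, 3.0])))
assert not np.isfinite(np.sum(np.array([1.0, np.inf, 3.0])))
assert np.isfinite(np.sum(np.ones(10)))

# ...but the sum can overflow to inf even when every element is finite,
# producing a false positive that the exact per-element check filters out
Z = np.full(3, np.finfo(np.float32).max, dtype=np.float32)
assert not np.isfinite(np.sum(Z))   # accumulation overflows to inf
assert np.isfinite(Z).all()         # yet every element is finite
```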
Current implementation benchmarks, generated by this benchmark in a Jupyter notebook (benchmark figure omitted). In the above image,
Per @jakirkham's suggestion, I made a minor refactor to the Cython implementation to have it serve as a second-pass algorithm. This formulation is now approximately equivalent to main when the array is finite, but significantly faster (2-3x) when there are non-finite elements in large arrays (tested on arrays with 10 million elements).
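The stop-on-first second pass can also be approximated in pure NumPy by scanning the flattened data in chunks and bailing out at the first chunk containing a non-finite value. This is only a sketch of the idea; `find_first_nonfinite` is a hypothetical name, not part of the PR:

```python
import numpy as np

def find_first_nonfinite(X, chunk_size=4096):
    # scan chunk by chunk so a bad value near the front exits early
    flat = X.ravel()
    for start in range(0, flat.size, chunk_size):
        finite = np.isfinite(flat[start:start + chunk_size])
        if not finite.all():
            return start + int(np.argmin(finite))  # index of first False
    return -1  # everything finite

X = np.ones(10_000)
assert find_first_nonfinite(X) == -1
X[5000] = np.inf
assert find_first_nonfinite(X) == 5000
```

Like the Cython version, the win appears when non-finite values occur early in a large array; a fully finite array still pays for the whole scan.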
@@ -0,0 +1,76 @@

```cython
# cython: profile=True, linetrace=True
```
If the benchmarks were run with these on, the results may be slower than normal.
```python
# size > 5000 is a heuristic for when the python implementation is faster
first_pass_isfinite = False
if is_float:
    use_cython = not (X.dtype.kind == "c" or X.dtype == np.float16)
```
Suggested change:

```diff
-    use_cython = not (X.dtype.kind == "c" or X.dtype == np.float16)
+    use_cython = X.dtype in {np.float16, np.float32}
```
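For reference, the original gating condition can be checked in isolation; `use_cython_path` is a hypothetical helper mirroring the line under review, not code from the PR:

```python
import numpy as np

def use_cython_path(dtype):
    # sketch of the gating condition above: skip the Cython path for
    # complex dtypes (kind "c") and for float16, which the C-level
    # isfinite loop does not handle directly
    dtype = np.dtype(dtype)
    return not (dtype.kind == "c" or dtype == np.float16)

assert use_cython_path(np.float32)
assert use_cython_path(np.float64)
assert not use_cython_path(np.complex64)
assert not use_cython_path(np.float16)
```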
Co-authored-by: Thomas J. Fan <[email protected]>
Reference Issues/PRs
Fixes #11681
What does this implement/fix? Explain your changes.
Implements the code developed by jakirkham and extends it to meet the requirements of the current `_assert_all_finite` function.
Any other comments?
Currently struggling to adapt the function to work with np.float16 arrays.
To Do
Replace `not np.isfinite` with separate `np.isinf()` and `np.isnan()` checks.
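Splitting the check matters because `np.isfinite` flags inf and nan together, while `_assert_all_finite` needs to treat them differently when nan is allowed (e.g. `force_all_finite="allow-nan"`). A minimal sketch of the distinction:

```python
import numpy as np

X = np.array([1.0, np.inf, np.nan])

# np.isfinite lumps inf and nan together
assert not np.isfinite(X).all()

# separate checks let callers distinguish the two cases
assert np.isinf(X).any()   # inf present
assert np.isnan(X).any()   # nan present

# e.g. allow nan but still reject inf
Y = np.array([1.0, np.nan])
assert not np.isinf(Y).any()   # Y would pass an "allow-nan" check
```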