New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT Replaced np.ndarray
with memview where applicable in linear_model/_cd_fast.pyx
#23147
base: main
Are you sure you want to change the base?
Conversation
np.ndarray[floating, ndim=1, mode='c'] X_data, | ||
np.ndarray[int, ndim=1, mode='c'] X_indices, | ||
np.ndarray[int, ndim=1, mode='c'] X_indptr, | ||
floating[::1] X_data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it works with readonly X. Can you test this against a read-only csr matrix (Xcsr.data.setflags(write=False)
) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can confirm it works with read-only CSR matrices:
from scipy import sparse
from sklearn import linear_model
import numpy as np
clf = linear_model.ElasticNet(alpha=0.1)
X = sparse.random(100,10)
X.data.setflags(write=False)
y = np.random.random(100)
clf.fit(X, y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default ElasticNet does a copy of the input. Also, my first comment was misleading, it will also convert if it's not in csc format. If I change your snippet to
from scipy import sparse
from sklearn import linear_model
import numpy as np
clf = linear_model.ElasticNet(alpha=0.1, copy_X=False)
X = sparse.random(100,10, format="csc")
X.data.setflags(write=False)
y = np.random.random(100)
clf.fit(X, y)
Then I get ValueError: buffer source array is read-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. In that case, since sparse_enet_coordinate_descent
doesn't actually modify X_data
, I went ahead and implemented the fix suggested in #10624, which is the ReadonlyArrayWrapper
added in #20903. I think this should work, but I'm still quite new to the Cython side of things, so please let me know if there are problems with this approach -- thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I guess we can do the same for enet_coordinate_descent
because currently we have the same error for the dense case:
from scipy import sparse
from sklearn import linear_model
import numpy as np
clf = linear_model.ElasticNet(alpha=0.1, copy_X=False)
X = np.asfortranarray(np.random.uniform(size=(100, 10)))
X.setflags(write=False)
y = np.random.random(100)
clf.fit(X, y)
and probably also enet_coordinate_descent_multi_task
. It would be good to also add a test for these read-only cases because it doesn't seem tested currently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can indices
or indptr
be read only?
I think testing is a must here. If we pass actual read only data with ReadOnlyArrayWrapper
, python crashes if the Cython function writes to the memoryview:
%%cython
from cython cimport floating
def f(floating[:] a):
a[0] = 1
from sklearn.utils._readonly_array_wrapper import ReadonlyArrayWrapper
from sklearn.utils._testing import create_memmap_backed_data
import numpy as np
X = np.asarray([1, 2.3])
X_mapped = create_memmap_backed_data(X)
# crashes
f(ReadonlyArrayWrapper(X_mapped))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for this behavior and confirmed the algorithm doesn't modify indices
or indptr
so extended the wrapper to them as well.
np.ndarray
with memview where applicable in linear_model/_cd_fast.pyx
np.ndarray
with memview where applicable in linear_model/_cd_fast.pyx
Reference Issues/PRs
Addresses #10624
What does this implement/fix? Explain your changes.
Replaces
np.ndarray
typing with corresponding memviews insparse_enet_coordinate_descent
andenet_coordinate_descent_multi_task
.Any other comments?
Would it be worth disabling
boundscheck
for any of these functions? Afaik there's not much of a risk of an out-of-bounds access in these functions.