MNT Replaced `np.ndarray` with memview where applicable in `linear_model/_cd_fast.pyx` #23147

Micky774 · 2022-04-17T00:01:42Z

Reference Issues/PRs

Addresses #10624

What does this implement/fix? Explain your changes.

Replaces np.ndarray typing with corresponding memviews in sparse_enet_coordinate_descent and enet_coordinate_descent_multi_task.
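As background, a Cython argument declared as `floating[::1]` accepts only a one-dimensional, C-contiguous buffer of float32 or float64 items. Because it is not declared `const`, Cython also asks for a writable buffer. The pure-Python sketch below approximates those checks with the built-in `memoryview`. The helper `check_like_floating_1d_contig` is hypothetical, and the real checks and error messages are Cython's own.

```python
import numpy as np


def check_like_floating_1d_contig(obj):
    """Hypothetical helper approximating the checks for a ``floating[::1]`` argument.

    Raises ValueError when the buffer would be rejected; returns the view otherwise.
    """
    view = memoryview(obj)
    if view.ndim != 1:
        raise ValueError(
            f"Buffer has wrong number of dimensions (expected 1, got {view.ndim})"
        )
    # Buffer-protocol format codes: 'f' is float32, 'd' is float64.
    if view.format not in ("f", "d"):
        raise ValueError(f"Buffer dtype mismatch, got format {view.format!r}")
    # The [::1] part of the declaration means "C-contiguous".
    if not view.c_contiguous:
        raise ValueError("ndarray is not C-contiguous")
    # A non-const memoryview requests a writable buffer.
    if view.readonly:
        raise ValueError("buffer source array is read-only")
    return view


a = np.arange(5, dtype=np.float64)
check_like_floating_1d_contig(a)  # accepted: 1-D, float64, C-contiguous, writable
```

A strided slice such as `a[::2]`, an integer array, a 2-D array, or a copy of `a` marked read-only with `setflags(write=False)` would each be rejected.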

Any other comments?

Would it be worth disabling boundscheck for any of these functions? As far as I know, there's little risk of an out-of-bounds access in them.

sklearn/linear_model/_cd_fast.pyx

Co-authored-by: Thomas J. Fan <[email protected]>

jeremiedbb · 2022-04-29T11:28:03Z

sklearn/linear_model/_cd_fast.pyx

-    np.ndarray[floating, ndim=1, mode='c'] X_data,
-    np.ndarray[int, ndim=1, mode='c'] X_indices,
-    np.ndarray[int, ndim=1, mode='c'] X_indptr,
+    floating[::1] X_data,


Not sure this works with a read-only X. Can you test it against a read-only CSR matrix (`Xcsr.data.setflags(write=False)`)?

Can confirm it works with read-only CSR matrices:

    from scipy import sparse
    from sklearn import linear_model
    import numpy as np

    clf = linear_model.ElasticNet(alpha=0.1)
    X = sparse.random(100, 10)
    X.data.setflags(write=False)
    y = np.random.random(100)
    clf.fit(X, y)

By default, ElasticNet copies the input. Also, my first comment was misleading: the input is also converted (and therefore copied) if it's not in CSC format. If I change your snippet to

    from scipy import sparse
    from sklearn import linear_model
    import numpy as np

    clf = linear_model.ElasticNet(alpha=0.1, copy_X=False)
    X = sparse.random(100, 10, format="csc")
    X.data.setflags(write=False)
    y = np.random.random(100)
    clf.fit(X, y)

Then I get `ValueError: buffer source array is read-only`.
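Whether the error shows up depends on which buffer actually reaches the Cython function. A copy (made by default, or when the input has to be converted to another format) owns fresh, writable memory, while a pass-through keeps the caller's read-only buffer. The sketch below illustrates this with plain NumPy. `prepare_data` is a hypothetical stand-in for input validation, not scikit-learn's actual code.

```python
import numpy as np


def prepare_data(data, copy):
    """Hypothetical stand-in for the input validation done before the solver runs.

    With copy=True a new float64 array is always allocated; otherwise the input
    is returned unchanged when it already has the expected dtype.
    """
    if copy:
        return np.array(data, dtype=np.float64, copy=True)
    return np.asarray(data, dtype=np.float64)


# Stand-in for X.data of a sparse matrix, made read-only by the caller.
data = np.random.default_rng(0).random(20)
data.setflags(write=False)

copied = prepare_data(data, copy=True)
passed_through = prepare_data(data, copy=False)

# The copy is writable, so a non-const memoryview can be acquired on it.
print(memoryview(copied).readonly)  # False
# The pass-through is the caller's read-only array; a floating[::1] argument
# (without const) rejects it with "buffer source array is read-only".
print(memoryview(passed_through).readonly, passed_through is data)  # True True
```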

Ah, gotcha. In that case, since sparse_enet_coordinate_descent doesn't actually modify X_data, I went ahead and implemented the fix suggested in #10624, i.e. wrapping the input with the ReadonlyArrayWrapper added in #20903. I think this should work, but I'm still quite new to the Cython side of things, so please let me know if there are problems with this approach. Thanks :)

Looks good. I guess we can do the same for enet_coordinate_descent because currently we have the same error for the dense case:

    from scipy import sparse
    from sklearn import linear_model
    import numpy as np

    clf = linear_model.ElasticNet(alpha=0.1, copy_X=False)
    X = np.asfortranarray(np.random.uniform(size=(100, 10)))
    X.setflags(write=False)
    y = np.random.random(100)
    clf.fit(X, y)

and probably also for enet_coordinate_descent_multi_task. It would be good to also add tests for these read-only cases, because they don't seem to be tested currently.
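One cheap way to write such tests is to run code on read-only copies of its inputs, so that any in-place write through NumPy raises instead of silently mutating data. The helper below is a hypothetical sketch of that idea. `call_with_readonly_inputs`, `column_norms` and `center_in_place` are illustrative names, not scikit-learn functions.

```python
import numpy as np


def call_with_readonly_inputs(func, *arrays):
    """Hypothetical test helper: call ``func`` on read-only copies of ``arrays``.

    Any in-place write that NumPy checks (e.g. ``X -= ...``) raises ValueError,
    so a successful call suggests ``func`` leaves its inputs untouched.
    Only copies are passed, so the caller's arrays are never modified.
    """
    frozen = []
    for arr in arrays:
        arr_copy = np.array(arr, copy=True)
        arr_copy.setflags(write=False)
        frozen.append(arr_copy)
    return func(*frozen)


def column_norms(X):
    # Only reads X, so it works on read-only input.
    return np.sqrt((X ** 2).sum(axis=0))


def center_in_place(X):
    # Writes into X, so it fails on read-only input.
    X -= X.mean(axis=0)
    return X


X = np.arange(6.0).reshape(3, 2)
norms = call_with_readonly_inputs(column_norms, X)

try:
    call_with_readonly_inputs(center_in_place, X)
    write_detected = False
except ValueError:
    write_detected = True
```

This only catches writes that go through NumPy's writeable flag. For a Cython function taking a non-`const` memoryview, a read-only input is already rejected when the buffer is acquired, before any write happens.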

Can indices or indptr be read-only?

I think testing is a must here. If we pass actually read-only data through ReadonlyArrayWrapper, Python crashes if the Cython function writes to the memoryview:

    %%cython
    from cython cimport floating

    def f(floating[:] a):
        a[0] = 1

    from sklearn.utils._readonly_array_wrapper import ReadonlyArrayWrapper
    from sklearn.utils._testing import create_memmap_backed_data
    import numpy as np

    X = np.asarray([1, 2.3])
    X_mapped = create_memmap_backed_data(X)
    f(ReadonlyArrayWrapper(X_mapped))  # crashes
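The crash depends on the data being read-only at the OS level, not just flagged read-only by NumPy. Assuming `create_memmap_backed_data` produces something similar to a read-only memory map, the sketch below maps a file with `np.memmap(..., mode="r")`, whose pages the OS maps without write permission. `readonly_memmap_demo` is a hypothetical helper.

```python
import os
import tempfile

import numpy as np


def readonly_memmap_demo(values):
    """Hypothetical helper: map ``values`` read-only and try to write to it.

    Returns (writeable_flag, write_refused, first_value_after_attempt).
    """
    values = np.asarray(values, dtype=np.float64)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "X.dat")
        values.tofile(path)
        # mode="r" maps the file read-only.
        mapped = np.memmap(path, dtype=np.float64, mode="r", shape=values.shape)
        writeable = mapped.flags.writeable
        try:
            mapped[0] = 1.0  # NumPy checks its flag and refuses the write.
            refused = False
        except ValueError:
            refused = True
        first = float(mapped[0])
        # Drop the mapping before the temporary directory is removed.
        del mapped
    return writeable, refused, first


writeable, refused, first = readonly_memmap_demo([1, 2.3])
```

NumPy refuses the write here because it checks its own flag. Code that bypasses that flag, such as a memoryview obtained through a wrapper that reports the buffer as writable, writes straight into a read-only page. That triggers a segmentation fault instead of a Python exception, which is the crash described above.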

Added a test for this behavior, and confirmed that the algorithm doesn't modify indices or indptr, so I extended the wrapper to them as well.
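For a CSC matrix, `indices` and `indptr` only serve to locate the nonzeros of each column, so a solver that loops over columns just reads them. Below is a minimal pure-Python sketch of that per-column access pattern. It is an illustration, not the actual loop in `_cd_fast.pyx`, and `csc_column_dot` is a hypothetical name. It runs fine when all three arrays are read-only.

```python
import numpy as np


def csc_column_dot(data, indices, indptr, j, v):
    """Dot product of column j of a CSC matrix with a dense vector v.

    Only reads data, indices and indptr: the nonzeros of column j live in
    data[indptr[j]:indptr[j + 1]], with row numbers in the same slice of indices.
    """
    total = 0.0
    for k in range(indptr[j], indptr[j + 1]):
        total += data[k] * v[indices[k]]
    return total


# CSC representation of [[1, 0], [0, 2], [3, 4]].
dense = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
data = np.array([1.0, 3.0, 2.0, 4.0])
indices = np.array([0, 2, 1, 2], dtype=np.int32)
indptr = np.array([0, 2, 4], dtype=np.int32)
for arr in (data, indices, indptr):
    arr.setflags(write=False)  # all three components read-only

v = np.array([1.0, 2.0, 3.0])
col0 = csc_column_dot(data, indices, indptr, 0, v)  # 1*1 + 3*3
col1 = csc_column_dot(data, indices, indptr, 1, v)  # 2*2 + 4*3
```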

Replaced np.ndarray with memview where applicable

4aef3dc

github-actions bot added module:linear_model cython labels Apr 17, 2022

thomasjpfan reviewed Apr 17, 2022


sklearn/linear_model/_cd_fast.pyx (outdated)

sklearn/linear_model/_cd_fast.pyx (outdated)

Micky774 and others added 3 commits Apr 17, 2022

Update sklearn/linear_model/_cd_fast.pyx

c576f64

Co-authored-by: Thomas J. Fan <[email protected]>

Merge branch 'main' into memview_refactor

da07576

Reverted potentially-const fused memview

b91f210

jeremiedbb reviewed Apr 29, 2022


Micky774 and others added 11 commits Apr 29, 2022

Merge branch 'main' into memview_refactor

377f8b7

Merge branch 'main' into memview_refactor

640b418

Utilize read-only wrapper

ac06930

Merge branch 'main' into memview_refactor

3222692

Added comment suggesting removal after Cython 3 is adopted

dc9345c

Merge branch 'main' into memview_refactor

101a303

Add support for readonly components of CSR format

aca4396

Added test for read-only buffers

0796642

Merge branch 'main' into memview_refactor

f7c5249

Merge branch 'main' into memview_refactor

6c34ecc

Merge branch 'main' into memview_refactor

73ef4bb

Micky774 changed the title ~~MAINT Replaced np.ndarray with memview where applicable in linear_model/_cd_fast.pyx~~ MNT Replaced np.ndarray with memview where applicable in linear_model/_cd_fast.pyx May 25, 2022

Merge branch 'main' into memview_refactor

fba5cc6


scikit-learn / scikit-learn Public


Micky774 commented Apr 17, 2022 (edited)

jeremiedbb Apr 29, 2022

Micky774 Apr 29, 2022

jeremiedbb Apr 30, 2022 (edited)

Micky774 May 1, 2022

jeremiedbb May 2, 2022

thomasjpfan May 2, 2022

Micky774 May 12, 2022
