The Wayback Machine - https://web.archive.org/web/20210906165243/https://github.com/scikit-learn/scikit-learn/pull/20481

MAINT Improve source generation using Tempita #20481

Merged

Conversation

@mbatoul (Contributor) commented Jul 7, 2021

Reference Issues/PRs

Relates to #11000.
Supersedes #13358.
Closes #13358

What does this implement/fix? Explain your changes.

This work builds on top of @massich's work (see #13358):

WeightVector is used in #13346 and has attributes whose types cannot be expressed with Cython fused types:

cdef np.ndarray w
cdef np.ndarray aw
cdef double *w_data_ptr
cdef double *aw_data_ptr
cdef double wscale
cdef double average_a
cdef double average_b
cdef int n_features
cdef double sq_norm

This PR uses Tempita to generate both float32 and float64 variants of WeightVector.
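As a sketch of the mechanism in Tempita syntax (illustrative attribute names, not the exact template merged in this PR), a single `.pxd.tp`/`.pyx.tp` file declares the dtype list once and expands into one concrete class per entry:

```
{{py:

# (name_suffix, c_type) pairs; one concrete class is generated per entry.
dtypes = [('64', 'double'),
          ('32', 'float')]

}}

{{for name_suffix, c_type in dtypes}}

cdef class WeightVector{{name_suffix}}(object):
    cdef {{c_type}}[::1] w
    cdef {{c_type}} wscale
    cdef int n_features

    cdef void scale(self, {{c_type}} c) nogil

{{endfor}}
```

Cython then compiles the generated WeightVector64/WeightVector32 as two ordinary cdef classes, sidestepping the restriction that cdef class attributes cannot use fused types.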

@mbatoul mbatoul changed the title [WIP] MAINT Use Tempita for WeightVector [WIP] MAINT Use Tempita for WeightVector Jul 7, 2021
@mbatoul (Contributor, Author) commented Jul 7, 2021

Answering @NicolasHug's comments on #13358:

Sorry not familiar with tempita (and the tempita doc links are broken): why do you need get_dispatch? You can't simply iterate over the dtypes list?

I removed the get_dispatch function from the template files in this commit. Simply iterating over the dtypes list seems to work fine, and I could not find any explanation of why get_dispatch would be needed. The Tempita documentation is still inaccessible, though.

@jjerphan jjerphan self-requested a review Jul 13, 2021
@jjerphan (Member) left a comment

Thank you, @mbatoul, for pursuing this work.

The templating of WeightVector LGTM.

Here are some comments regarding changes and the scope of this PR.

Review threads (resolved): sklearn/utils/tests/test_cython_templating.py, sklearn/utils/setup.py, .gitignore, sklearn/linear_model/_sag_fast.pyx.tp, sklearn/utils/_weight_vector.pxd.tp
from ..utils._weight_vector cimport WeightVector64 as WeightVector
from ..utils._seq_dataset cimport SequentialDataset64 as SequentialDataset
Comment on lines +24 to 25

@jjerphan (Member) commented Jul 15, 2021

To follow up on @jeremiedbb's comment (50de266#r261996527): to me this looks fine, as the import is done similarly for SequentialDataset; am I missing something?

@mbatoul (Author, Contributor) commented Aug 2, 2021 (comment minimized)

@ogrisel (Member) commented Aug 5, 2021

I cannot see the original comment but this looks fine to me as well.

@mbatoul mbatoul force-pushed the mbatoul:migrate_weightvector_to_use_tempita branch from 89cca51 to 9452515 Aug 2, 2021
@mbatoul mbatoul changed the title [WIP] MAINT Improve source generation using Tempita MAINT Improve source generation using Tempita Aug 3, 2021
@mbatoul mbatoul requested review from jjerphan and jeremiedbb Aug 3, 2021
@jjerphan (Member) left a comment

Here are some final suggestions and a comment for adapting a numerical hard-coded value.

Review thread (resolved): sklearn/utils/tests/test_weight_vector.py
for j in range(xnnz):
idx = x_ind_ptr[j]
innerprod += w_data_ptr[idx] * x_data_ptr[j]
innerprod *= self.wscale
return innerprod

-    cdef void scale(self, double c) nogil:
+    cdef void scale(self, {{c_type}} c) nogil:

@jjerphan (Member) commented Aug 3, 2021

Should the epsilon value for the comparison on self.wscale (at line 197) be adapted based on c_type?

@ogrisel (Member) commented Aug 5, 2021

Indeed, it's probably a good idea to use 1e-7 or even 1e-6 when self.wscale has type float32; otherwise I am not sure this condition would ever be triggered on float32 data.

Alternatively, we could just use 1e-6 for both dtypes, but that might have non-trivial impacts on the performance of SGDClassifier and co on np.float64, which we should avoid introducing as part of this PR.

Unfortunately it's not easy to write tests for this because this is a Cython class with cdef methods.

Maybe we can set epsilon to 1e-6 for float32, keep 1e-9 unchanged for float64, and write tests for the float32 case at the public Python API level only (in test_sgd.py), in a subsequent PR where we enable float32 support for linear SGD models.
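For a sense of scale, here is a NumPy illustration (outside this PR's Cython code) of why a single 1e-9 threshold is questionable for float32; the per-dtype thresholds below echo the suggestion above and are illustrative, not necessarily the merged values:

```python
import numpy as np

# Machine epsilon for the two dtypes differs by roughly nine orders of
# magnitude, so one hard-coded 1e-9 threshold cannot suit both.
eps32 = np.finfo(np.float32).eps  # about 1.19e-07
eps64 = np.finfo(np.float64).eps  # about 2.22e-16

# A 1e-9 perturbation is entirely lost at float32 precision:
lost = np.float32(1.0) + np.float32(1e-9) == np.float32(1.0)

# Illustrative per-dtype epsilon, echoing the suggestion above.
wscale_eps = {np.float32: 1e-6, np.float64: 1e-9}
```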

Review thread (resolved): sklearn/utils/tests/test_cython_templating.py
@glemaitre glemaitre self-requested a review Aug 3, 2021
@mbatoul mbatoul marked this pull request as ready for review Aug 3, 2021
@ogrisel approved these changes Aug 5, 2021

@ogrisel (Member) left a comment

LGTM once the following comments are addressed:

Review thread (resolved): sklearn/utils/_weight_vector.pxd.tp

@jjerphan (Member) left a comment

LGTM. Thanks @mbatoul for this contribution!

@jjerphan (Member) commented Aug 6, 2021

setup.cfg needs to be edited similarly to .gitignore to list files to ignore:

scikit-learn/setup.cfg, lines 65 to 70 at 81dde3a:

    [check-manifest]
    # ignore files missing in VCS
    ignore =
        sklearn/linear_model/_sag_fast.pyx
        sklearn/utils/_seq_dataset.pxd
        sklearn/utils/_seq_dataset.pyx
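Presumably the edit adds the files generated from the new templates to this same list; the _weight_vector entries below are inferred from the template names discussed in this PR, not copied from the merged diff:

```ini
[check-manifest]
# ignore files missing in VCS (generated from .tp templates at build time)
ignore =
    sklearn/linear_model/_sag_fast.pyx
    sklearn/utils/_seq_dataset.pxd
    sklearn/utils/_seq_dataset.pyx
    sklearn/utils/_weight_vector.pxd
    sklearn/utils/_weight_vector.pyx
```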

@mbatoul mbatoul requested a review from jjerphan Aug 6, 2021
@jjerphan (Member) left a comment

Still LGTM, @mbatoul: no need to ask for a new review. 🙂

Review thread (resolved): setup.cfg
@ogrisel approved these changes Aug 6, 2021

@ogrisel (Member) left a comment

Still LGTM, just another suggestion:

Review thread (resolved): sklearn/utils/_weight_vector.pyx.tp
@glemaitre (Contributor) left a comment

LGTM. A few small documentation nits; otherwise everything looks good.

Review threads (resolved): sklearn/linear_model/_sag_fast.pyx.tp, sklearn/utils/_weight_vector.pyx.tp, sklearn/utils/_weight_vector.pxd.tp
mbatoul and others added 2 commits Aug 9, 2021
@glemaitre glemaitre merged commit 4b8cd88 into scikit-learn:main Aug 9, 2021
28 checks passed
@glemaitre (Contributor) commented Aug 9, 2021

Thanks @mbatoul

Projects: none yet. Linked issues: none yet.

6 participants