The Wayback Machine - https://web.archive.org/web/20211026011657/https://github.com/scikit-learn/scikit-learn/issues/20488

Do cross-validation estimators like LassoCV re-estimate the model on the full training set after finding the best hyperparameters? #20488

Open
pythonometrist opened this issue Jul 9, 2021 · 8 comments · May be fixed by #20632

Comments

@pythonometrist commented Jul 9, 2021

Describe the issue linked to the documentation

The documentation of cross-validation estimators does not explain whether the final model parameters are estimated on the entire training set using the optimal hyperparameters obtained through cross-validation.

Typically we use cross-validation to identify optimal hyperparameters (e.g., C in an SVM or alpha in the Lasso).

What needs to be clarified is: once the optimal hyperparameter is found via cross-validation, does an estimator such as LassoCV automatically re-estimate the model with that optimal alpha?

Suggest a potential alternative/fix

Simply add a note explaining whether this final re-estimation step on the entire training data occurs or not. Any guidance on how to retrieve the optimal hyperparameter that was found would also help.
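As an aside (not part of the original issue text): assuming scikit-learn's documented `LassoCV` API, the selected hyperparameter is exposed as the `alpha_` attribute after fitting, and the coefficients of the final model (refit on all of the training data) as `coef_`. A minimal sketch:

```python
# Sketch: fit LassoCV and inspect the hyperparameter it selects and the
# coefficients of the model it refits on the full training set.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic regression data for illustration only.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

reg = LassoCV(cv=5, random_state=0).fit(X, y)

print(reg.alpha_)  # regularization strength chosen by cross-validation
print(reg.coef_)   # coefficients from the final fit on all of X, y
```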
@pythonometrist pythonometrist changed the title Do cross-validation estimators re-estimate the model on the full training set after finding the best hyperparameters? Do cross-validation estimators like LassoCV re-estimate the model on the full training set after finding the best hyperparameters? Jul 9, 2021
@TomDLT (Member) commented Jul 9, 2021

Answer: Yes, these estimators refit on the full training set. I agree this should be documented somewhere, for instance in the glossary, and in the docstring of each class.

Details:
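(Not from the original comment, which is truncated here.) The refit behaviour can also be checked empirically: assuming scikit-learn's `LassoCV` and `Lasso` APIs, the coefficients of a fitted `LassoCV` should match a plain `Lasso` refit on the full training set with the selected alpha, up to solver tolerance. A sketch:

```python
# Sketch (assumption, not from the thread): verify empirically that LassoCV's
# final coefficients match a plain Lasso refit on the full training set with
# the cross-validated alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

cv_model = LassoCV(cv=5, random_state=0).fit(X, y)
refit = Lasso(alpha=cv_model.alpha_).fit(X, y)

# Coefficients agree (up to solver tolerance) because LassoCV refits on all
# of X, y after selecting alpha.
assert np.allclose(cv_model.coef_, refit.coef_, rtol=1e-3, atol=1e-3)
```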

@himanshu007-creator commented Jul 11, 2021

Hi, I would like to work on this issue. Please guide me further on what needs to be done. Thanks😄

@TomDLT (Member) commented Jul 12, 2021

We should add, at the top of the docstring of all the listed classes, a note that the estimator refits the model on the full training set after finding the best hyperparameters (reminding the reader which hyperparameters each class optimizes). We should also add this as general info in the glossary (see link above).

@deeksha200 commented Jul 18, 2021

Yes. After finding the optimal hyperparameters via cross-validation, the final model is refit on the whole training dataset using those hyperparameters, and as you rightly mentioned, this is nowhere documented.

@brgopalakrishnan commented Jul 26, 2021

If this is not taken, I am happy to fix it. Thank you.

@TomDLT (Member) commented Jul 26, 2021

You're very welcome!

@pythonometrist (Author) commented Aug 12, 2021

Many thanks!

@mohitrbhardwaj commented Oct 18, 2021

take
