The Wayback Machine - https://web.archive.org/web/20211026011657/https://github.com/scikit-learn/scikit-learn/issues/20488

Do cross-validation estimators like LassoCV re-estimate the model on the full training set after finding the best hyperparameters? #20488

Open
pythonometrist opened this issue Jul 9, 2021 · 8 comments · May be fixed by #20632

Comments

@pythonometrist commented Jul 9, 2021

Describe the issue linked to the documentation

The documentation of cross-validation estimators does not explain whether the final model parameters are estimated on the entire training set using the optimal hyperparameters obtained through cross-validation.

Typically we use cross-validation to identify optimal hyperparameters (e.g., C in an SVM or alpha in the Lasso).

What needs to be clarified is: once the optimal hyperparameter is found via cross-validation, does an estimator such as LassoCV automatically re-estimate the model with that optimal alpha?

Suggest a potential alternative/fix

Simply add a note explaining whether this final re-estimation step on the entire training data occurs or not. Any guidance on how to retrieve the optimal hyperparameter that was found would also help.
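As an aside (not part of the original issue text): assuming scikit-learn's documented `LassoCV` API, the selected hyperparameter is exposed as the `alpha_` attribute after fitting, and the coefficients of the final model (refit on all of the training data) as `coef_`. A minimal sketch:

```python
# Sketch: fit LassoCV and inspect the hyperparameter it selects and the
# coefficients of the model it refits on the full training set.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic regression data for illustration only.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

reg = LassoCV(cv=5, random_state=0).fit(X, y)

print(reg.alpha_)  # regularization strength chosen by cross-validation
print(reg.coef_)   # coefficients from the final fit on all of X, y
```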
@pythonometrist pythonometrist changed the title Do cross-validation estimators re-estimate the model on the full training set after finding the best hyperparameters? Do cross-validation estimators like LassoCV re-estimate the model on the full training set after finding the best hyperparameters? Jul 9, 2021
@TomDLT (Member) commented Jul 9, 2021

Answer: Yes, these estimators refit on the full training set. I agree this should be documented somewhere, for instance in the glossary, and in the docstring of each class.

Details:
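(Not from the original comment, which is truncated here.) The refit behaviour can also be checked empirically: assuming scikit-learn's `LassoCV` and `Lasso` APIs, the coefficients of a fitted `LassoCV` should match a plain `Lasso` refit on the full training set with the selected alpha, up to solver tolerance. A sketch:

```python
# Sketch (assumption, not from the thread): verify empirically that LassoCV's
# final coefficients match a plain Lasso refit on the full training set with
# the cross-validated alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

cv_model = LassoCV(cv=5, random_state=0).fit(X, y)
refit = Lasso(alpha=cv_model.alpha_).fit(X, y)

# Coefficients agree (up to solver tolerance) because LassoCV refits on all
# of X, y after selecting alpha.
assert np.allclose(cv_model.coef_, refit.coef_, rtol=1e-3, atol=1e-3)
```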

@himanshu007-creator commented Jul 11, 2021

Hi, I would like to work on this issue. Please guide me further on what needs to be done. Thanks😄

@TomDLT (Member) commented Jul 12, 2021

We should add, at the top of the docstring of all the listed classes, a note that the estimator refits the model on the full training set after finding the best hyperparameters (reminding the reader which hyperparameters each class optimizes). We should also add this as general info in the glossary (see link above).

@deeksha200 commented Jul 18, 2021

Yes. After finding the optimal hyperparameters via cross-validation, the final model is refit on the whole training dataset using those hyperparameters, and as you rightly mentioned, this is nowhere documented.

@brgopalakrishnan commented Jul 26, 2021

If this is not taken, I am happy to fix it. Thank you.

@TomDLT (Member) commented Jul 26, 2021

You're very welcome!

@pythonometrist (Author) commented Aug 12, 2021

Many thanks!

@mohitrbhardwaj commented Oct 18, 2021

take
