FIX adaboost return nan in feature importance #20415
Conversation
Co-authored-by: Olivier Grisel <[email protected]>
Let me clarify what I suggested previously:
sklearn/ensemble/_weight_boosting.py (outdated)

# avoid extremely small sample weights; for details see issue #20320
sample_weight = np.clip(sample_weight, a_min=epsilon, a_max=None)
# do not clip sample weights that are exactly 0
sample_weight[zero_loc] = 0.0
My suggestion above would instead have been:
# Make near-zero weights exactly 0 to avoid numerical issues when computing
# feature importances.
sample_weight[sample_weight < epsilon] = 0
However, this is just a suggestion and open to discussion, because I am not 100% sure this is the correct fix.
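To make the difference concrete, here is a minimal sketch (with made-up weight values) contrasting the two approaches: clipping keeps underflowed weights alive at epsilon, while thresholding flushes them to exactly 0 so the sample effectively drops out.

import numpy as np

# Hypothetical weights: one healthy, one underflowed, one exact zero.
sample_weight = np.array([0.5, 1e-300, 0.0, 0.3])
epsilon = np.finfo(sample_weight.dtype).eps

# Approach in the diff above: clip, then restore the exact zeros.
zero_loc = sample_weight == 0.0
clipped = np.clip(sample_weight, a_min=epsilon, a_max=None)
clipped[zero_loc] = 0.0
print(clipped)      # [5.0e-01 2.2e-16 0.0e+00 3.0e-01] (approximately)

# Suggestion in this comment: flush everything below epsilon to exactly 0.
thresholded = sample_weight.copy()
thresholded[thresholded < epsilon] = 0
print(thresholded)  # [0.5 0.  0.  0.3]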
Thanks for the report. This is kind of surprising... but so be it.
I indeed find the comparison with 0.0 problematic. It is ugly and might bite us at some point :)
@MaxwellLZH could you provide the code snippet used to build the ROC curve? Is it a training or a testing score? And if it is the testing score, do you observe the same for the training score?
Since #20443 was mainly linked to the regressor, I am thinking that this may be related, but for a classification problem this time. It could be nice to check the strategy of resetting the weights, because setting them to zero while adding weak learners might lead to a local optimum with a non-diverse ensemble.
Hi @glemaitre, the curve can be reproduced with the following code, where data_train.csv is the same dataset as in the original issue.
import pandas as pd
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
train_data = pd.read_csv('~/Downloads/data_train.csv')
model_variables = ['RH', 't2m', 'tp_r5', 'swvl1', 'SM_r20', 'tp', 'cvh', 'vdi', 'SM_r10', 'SM_IDW']
X = train_data[model_variables]  # Features
y = train_data.ignition_no

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1024)

train_auc, test_auc = [], []
for n_est in [1, 3, 5, 10, 20, 50, 100]:
    est = AdaBoostClassifier(
        base_estimator=DecisionTreeClassifier(max_depth=10, random_state=0),
        n_estimators=n_est)
    est.fit(X_train, y_train)
    train_auc.append(roc_auc_score(y_train, est.predict_proba(X_train)[:, 1]))
    test_auc.append(roc_auc_score(y_test, est.predict_proba(X_test)[:, 1]))
train_auc
# [0.9408335858626135, 0.9843405707338944, 0.9937636343659565, 0.9972901393075558, 0.9483462433389864, 0.9472818988784156, 0.9391712786487824]
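For completeness, a hypothetical check (reusing the last est fitted in the loop above) that surfaces the bug from #20320 directly:

# On the affected dataset, some importances come back as NaN before the fix.
print(np.isnan(est.feature_importances_).any())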
The change I made is adding sample_weight[sample_weight < epsilon] = 0 after line 162, where epsilon = np.finfo(sample_weight.dtype).eps:
scikit-learn/sklearn/ensemble/_weight_boosting.py
Lines 156 to 163 in b94bc5e
random_state = check_random_state(self.random_state)

for iboost in range(self.n_estimators):
    # Boosting step
    sample_weight, estimator_weight, estimator_error = self._boost(
        iboost, X, y, sample_weight, random_state
    )
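Read as a standalone toy (all values hypothetical, and _boost below is just a stand-in for the real boosting update), the proposed placement looks like this: flush right after each boost step, before the weights are renormalized.

import numpy as np

sample_weight = np.full(4, 0.25)
epsilon = np.finfo(sample_weight.dtype).eps

def _boost(sample_weight):
    # Stand-in for the real boosting update: one sample's weight collapses.
    out = sample_weight.copy()
    out[0] *= 1e-300
    return out

for iboost in range(3):
    sample_weight = _boost(sample_weight)
    # Proposed fix: flush near-zero weights to exactly 0 after each boost step.
    sample_weight[sample_weight < epsilon] = 0
    # AdaBoost then renormalizes; the flushed sample simply drops out.
    sample_weight = sample_weight / sample_weight.sum()

print(sample_weight)  # [0.         0.33333333 0.33333333 0.33333333]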
Co-authored-by: Olivier Grisel <[email protected]>
LGTM. Please add an entry to the changelog in doc/whats_new/v1.0.rst.
For reference, I tried to see if this PR could fix the bad training convergence behavior observed in #20443, but unfortunately it does not help.
Force-pushed from 336e993 to cc5b899.
Hi @MaxwellLZH, thank you for your work so far. Do you mind fixing the conflicts, if you are still interested in working on this pull request?
LGTM
It seems that the failure shows that we changed the behaviour of the estimator. @MaxwellLZH, would you have time to check why we no longer trigger the warning? I assume that the implemented trick avoids the degenerate cases. We might still want to trigger the warning in those cases, even if they no longer appear in the output.
Sure! I will check the failing tests this week.
Hi @MaxwellLZH, two approvals but some failing tests.
Hi @MaxwellLZH, the changelog contains some leftovers from the conflict resolution. Do you mind fixing them and committing? The builds are no longer available, so it is difficult to review the failing tests. Thanks!
Co-authored-by: Chiara Marmo <[email protected]>
Hi @MaxwellLZH, I hope you don't mind me stepping in... just thinking that this PR deserves to be merged... :)
This will make the test pass.
Hi @cmarmo, thank you so much for helping with the PR! Really appreciate it! :)
Co-authored-by: Olivier Grisel <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]> Co-authored-by: Chiara Marmo <[email protected]>
Reference Issues/PRs
This is a fix for #20320. A corresponding test case is also added.
What does this implement/fix? Explain your changes.
As discussed in the original thread, the NaN values in the feature importances are caused by extremely small sample weights arising during the boosting process. A quick fix is to clip the sample weights to be at least epsilon.
sample_weight = np.clip(sample_weight, a_min=epsilon, a_max=1.0)
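For intuition, a minimal sketch of the failure mode, assuming the NaN arises when every weighted impurity decrease underflows to zero and the importances are then normalized (0 / 0):

import numpy as np

importances = np.zeros(3)  # everything underflowed to 0
with np.errstate(invalid="ignore"):
    print(importances / importances.sum())  # [nan nan nan]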