The Wayback Machine - https://web.archive.org/web/20250520142406/https://github.com/scikit-learn/scikit-learn/pull/19429

EXA improve example of forest feature importances digits #19429


Merged
merged 22 commits into from
May 28, 2021

Conversation

azihna
Contributor

@azihna azihna commented Feb 10, 2021

What does this implement/fix? Explain your changes.

References #14528
Changed the model to RandomForestClassifier from ExtraTreesClassifier.
Changed the formatting to the formatting of the current example with explanations.
Added the permutation importance example, for details please see the issue.
Changed the dataset from faces to digits because the faces dataset has n_features >> n_samples: permutation importance takes over 15 minutes to compute even when the n_rounds parameter is reduced or a much smaller model is used, and the results come back as all zeros. It is much quicker with the digits dataset, which can still demonstrate the usage of the methods in a similar fashion.
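As a rough sketch of the reworked example described above (not the merged code itself; the model size and parameters here are illustrative), the digits dataset can be fit with a `RandomForestClassifier` and the impurity-based (MDI) importances read off directly:

```python
# Sketch: MDI feature importances of a random forest trained on the
# digits dataset (8x8 pixel images, so 64 features).
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)

# n_estimators=50 is an illustrative choice to keep the sketch fast.
forest = RandomForestClassifier(n_estimators=50, random_state=0, n_jobs=-1)
forest.fit(X, y)

# One MDI importance per pixel; the values are non-negative and sum to 1.
importances = forest.feature_importances_
print(importances.shape)  # (64,)
```

Because each feature is a pixel intensity, the importances can be reshaped to 8x8 and shown as an image, which is what makes digits a convenient dataset for this example.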

Any other comments?

@glemaitre
Member

It might not be worth changing this example. We will not have the low-cardinality issue. However, we do have an issue with the fact that the MDI is computed on training statistics. Maybe we could add a warning but leave the example as is for the rest.

@glemaitre
Member

15 minutes is too long for an example indeed.

@azihna
Contributor Author

azihna commented Feb 11, 2021

The 15 minutes is for a RandomForest with 200 trees, permutation n_rounds=2, and n_jobs=-1 on an 8-core CPU machine, and all of the permutation results come back as zero. With a model that small, the MDI example also didn't look as good.

Do you want me to revert the example back to faces? I think the example is more explanatory and complete like this but I can just provide a link to one of the other examples.

@reshamas
Member

#DataUmbrella
cc: @Mariam-ke

@glemaitre
Member

I summarize some discussions that we had during the meeting in DataUmbrella.

We should keep the example as is and keep the MDI feature importance.
However, we should mention that the limitations of the MDI are not an issue in this specific example: (i) all features are homogeneous and will not suffer from the cardinality bias, and (ii) we are interested in representing the knowledge the forest acquired on the training set.

@ogrisel
Member

ogrisel commented Feb 20, 2021

Also mention that if those two conditions are not met, it is recommended to use permutation_importance instead, and link to the relevant section in the user guide (using a Sphinx reference).
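A minimal sketch of the recommended alternative (parameters such as `n_repeats=5` and the split are illustrative, not taken from the merged example): permutation importance is model-agnostic and, unlike MDI, can be evaluated on a held-out test set:

```python
# Sketch: permutation importance on a held-out test set, the recommended
# alternative when MDI's training-set bias or cardinality bias matter.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)

# n_repeats controls how many times each feature is shuffled; more
# repeats give more stable estimates at a higher computational cost.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=5, random_state=0, n_jobs=-1
)
print(result.importances_mean.shape)  # (64,)
```

The runtime scales with `n_repeats` times the number of features times the cost of one scoring pass, which is why the faces dataset (n_features >> n_samples) was so slow here.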

@azihna
Contributor Author

azihna commented Feb 23, 2021

Can you please advise what to fix for the changelog check? Is it better to just create a new PR?
Edit: It seems like it is fine since the last commit. I just needed to commit once more. Thanks!

@Mariam-ke
Contributor

@azihna How is this PR going? Please let us know if we can answer any questions.

cc: @reshamas

@azihna
Contributor Author

azihna commented Mar 11, 2021

@Mariam-ke I implemented all of the requests and am waiting for review. Thanks for following up :)

@reshamas
Member

@ogrisel @glemaitre
This PR is from the #DataUmbrella sprint. It's one of 4 open PRs from the sprint.

Member

@lorentzenchr lorentzenchr left a comment


@azihna This is a clear improvement of this example, thank you. I suggested only a few small changes.

@azihna
Contributor Author

azihna commented May 21, 2021

@lorentzenchr Thanks for the review. I accepted all of your suggestions.

@azihna azihna requested a review from lorentzenchr May 21, 2021 20:43
Member

@lorentzenchr lorentzenchr left a comment


LGTM

@lorentzenchr
Member

@glemaitre @ogrisel In case you want to give it another review pass, I'll wait a little before merging. If not, I'd say let's be efficient and merge as it's a clear improvement.

Member

@thomasjpfan thomasjpfan left a comment


Thank you for the PR @azihna !

Comment on lines 22 to 27
# We use the faces data from datasets submodules when using impurity-based
# feature importance. One drawback of this method is that it cannot be applied
# on a separate test set, but for this example, we are interested
# in representing the information learned from the full dataset.
# Also, we'll set the number of cores to use for the tasks.
Member


I think talking about a test set here while the example does not split on a test set could be confusing to a reader. Here is my suggestion:

First, we load the olivetti faces dataset and limit the dataset to contain only the first
five classes. Then we train a random forest on the dataset and evaluate the
impurity-based feature importance.

Contributor Author


@thomasjpfan Thanks a lot for the review!
I think your explanation is much clearer than the previous one, but I still think it is important to mention the drawback of using this method. Let me know what you think of the new version.

@lorentzenchr
Member

@thomasjpfan If you've time: Is the current state good for you?

Member

@thomasjpfan thomasjpfan left a comment


Thank you for working on this PR @azihna !

LGTM

@thomasjpfan thomasjpfan merged commit deda6e2 into scikit-learn:main May 28, 2021
sakibguy added a commit to sakibguy/scikit-learn that referenced this pull request May 30, 2021
EXA improve example of forest feature importances digits (scikit-learn#19429)