FIX use unique values of y_true and y_pred in plot_confusion_matrix instead of estimator.classes_ #18405
Conversation
Although the docstring and the API guide say "If `None` is given, those that appear at least once in `y_true` or `y_pred` are used in sorted order", the `estimator.classes_` attribute was used instead.
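For reference, a minimal sketch of the behaviour the fix restores, assuming scikit-learn's public unique_labels helper; the function name below is illustrative and is not the actual code in sklearn/metrics/_plot/confusion_matrix.py:

# Illustrative sketch only -- not the actual implementation in
# sklearn/metrics/_plot/confusion_matrix.py.
from sklearn.utils.multiclass import unique_labels


def resolve_display_labels(y_true, y_pred, labels=None):
    """Return the labels to show on the confusion-matrix axes."""
    if labels is None:
        # Union of the labels appearing in y_true or y_pred, in sorted
        # order, as the docstring promises -- not estimator.classes_.
        return unique_labels(y_true, y_pred)
    return labels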
Thank you for the PR! Please add a non-regression test that would fail at master but pass in this PR.
A test for plot_confusion_matrix() behaviour when 'labels=None' and the dataset contains true labels previously unseen by the classifier (and therefore not present in its 'classes_' attribute). According to the function description, it must build the union of the predicted and the true labels.
An update to the 'test_error_on_a_dataset_with_unseen_labels()' function to fix 'E501 line too long' errors.
Thank you for the review, @thomasjpfan! This is my first pull request, so I will do my best to implement and prepare everything correctly. I have added the test test_error_on_a_dataset_with_unseen_labels(), which checks the tick labels of the confusion matrix plot.
raise TypeError(
    f"Labels in y_true and y_pred should be of the same type. "
    f"Got y_true={np.unique(y_true)} and "
    f"y_pred={np.unique(y_pred)}. Make sure that the "
    f"predictions provided by the classifier coincides with "
    f"the true labels."
thomasjpfan
Sep 24, 2020
Member
Do we have a test to make sure this error is raised?
kyouma
Sep 24, 2020
Author
Contributor
I have removed the try/except wrapping: confusion_matrix(), which is used above to get the matrix itself, contains the same unique_labels() call that the try/except block wrapped, and unique_labels() raises an exception with a descriptive message when the true and predicted labels have different types. So if execution reaches this line, it will not cause any problems.
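For context, a small illustration of the check referred to here, assuming scikit-learn's public unique_labels helper; the exact wording of the error message is paraphrased and may differ between versions:

# unique_labels() already rejects mixed label types with an informative
# error, so the extra try/except around it was redundant.
from sklearn.utils.multiclass import unique_labels

try:
    unique_labels([0, 1, 2], ["a", "b"])
except ValueError as exc:
    print(exc)  # roughly: "Mix of label input types (string and number)"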
labels=None, display_labels=None)

disp_labels = set([tick.get_text() for tick in disp.ax_.get_xticklabels()])
expected_labels = unique_labels(y, fitted_clf.predict(X))
thomasjpfan
Sep 24, 2020
Member
In this case, we can list the labels:
display_labels = [tick.get_text() for tick in disp.ax_.get_xticklabels()]
expected_labels = [f'{i}' for i in range(6)]
assert_array_equal(expected_labels, display_labels)
kyouma
Sep 24, 2020
Author
Contributor
Thank you, I have replaced these lines and the assertion check with your code.
This try/except is not necessary, as the same unique_labels() function is called inside confusion_matrix() above and raises an exception with a descriptive message if the types of the true and predicted labels differ.
Please add an entry to the change log at
@@ -314,3 +315,16 @@ def test_default_labels(pyplot, display_labels, expected_labels):
    assert_array_equal(x_ticks, expected_labels)
    assert_array_equal(y_ticks, expected_labels)

def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data): |
thomasjpfan
Sep 24, 2020
Member
We may need to wrap this to be <= 79:
Suggested change
def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data):
def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data,
                                               n_classes):
kyouma
Sep 24, 2020
Author
Contributor
Thank you.
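Putting the review feedback together, the non-regression test could look roughly like the sketch below; the pyplot, fitted_clf, data and n_classes fixtures are taken from the existing test module, but their exact contents (and the way unseen labels are created here) are assumptions:

from numpy.testing import assert_array_equal
from sklearn.metrics import plot_confusion_matrix


def test_error_on_a_dataset_with_unseen_labels(pyplot, fitted_clf, data,
                                               n_classes):
    """Tick labels should come from the union of y_true and y_pred."""
    X, y = data
    # Shift the true labels so y contains a class the fitted classifier has
    # never seen (and which is therefore absent from fitted_clf.classes_).
    y = y + 1

    disp = plot_confusion_matrix(fitted_clf, X, y,
                                 labels=None, display_labels=None)

    display_labels = [tick.get_text() for tick in disp.ax_.get_xticklabels()]
    # Union of the predicted labels (0 .. n_classes - 1) and the shifted
    # true labels (1 .. n_classes), in sorted order.
    expected_labels = [f"{i}" for i in range(n_classes + 1)]
    assert_array_equal(expected_labels, display_labels)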
…seen_labels() - Replaced the assertion check - Removed the unused import
Mentioned the PR #18405.
The `labels` and `display_labels` parameters have been set to their default values. Co-authored-by: Thomas J. Fan <[email protected]>
Thank you very much! I have implemented your suggestions and corrections. I have also added the |Fix| entry to
Minor comments, otherwise LGTM
Co-authored-by: Thomas J. Fan <[email protected]>
…nfusion_matrix.py Co-authored-by: Thomas J. Fan <[email protected]>
I have applied the suggested changes. Thank you for your guidance!
LGTM. I will merge when the CIs turn green.
Merged commit 90b9b5d into scikit-learn:master
…nstead of estimator.classes_ (scikit-learn#18405) Co-authored-by: Thomas J. Fan <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]>
Although the docstring and the API guide of sklearn.metrics.plot_confusion_matrix() say the following about the labels argument: "If `None` is given, those that appear at least once in `y_true` or `y_pred` are used in sorted order", the estimator.classes_ attribute was used.
Reference Issues/PRs
What does this implement/fix? Explain your changes.
This change fixes errors when y_true and y_pred do not contain some of the values from estimator.classes_.
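As a hedged illustration of the scenario described above (the toy data and the choice of LogisticRegression are arbitrary, not taken from the PR):

# Hedged illustration of the scenario this PR addresses; requires matplotlib
# for the plotting helper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import plot_confusion_matrix

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 1, 1, 2, 2])  # the classifier learns classes 0, 1 and 2
clf = LogisticRegression().fit(X_train, y_train)

# The evaluation set only contains classes 0 and 1, so clf.classes_ holds
# values that never appear in y_true or y_pred here.
X_test = np.array([[0.2], [0.8], [2.1], [2.9]])
y_test = np.array([0, 0, 1, 1])

# With labels=None the axes should show the union of labels seen in y_true
# and y_pred, not clf.classes_; before this fix, the size mismatch between
# the computed matrix and the displayed labels could raise an error or
# mislabel the plot.
disp = plot_confusion_matrix(clf, X_test, y_test)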
Any other comments?