-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
FIX: fixed DecisionBoundaryDisplay to display distinct colours for all classess correctly #32867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…l classes correctly.
ThexXTURBOXx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the linked issue - no problems remain. Thank you very much!
|
Thanks for the PR, @leweex95! @AnneBeyer, since you are concerned with Displays, would you be interested in triaging the corresponding issue and reviewing this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assessing the impact of this PR visually using the following code:
Code
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import DecisionBoundaryDisplay
data = np.array(
[
[-1, -1],
[-2, -1],
[1, 1],
[2, 1],
[2, 2],
[3, 2],
[3, 3],
[4, 3],
[4, 4],
[5, 4],
[5, 5],
]
)
# target = np.asarray([str(i) for i in range(11)])
target = np.arange(11)
clf = LogisticRegression().fit(data, target)
cmap = "gist_rainbow"
_, axes = plt.subplots(nrows=3, ncols=3, figsize=(12, 12), constrained_layout=True)
for plot_method_idx, plot_method in enumerate(["contourf", "contour", "pcolormesh"]):
for response_method_idx, response_method in enumerate(
["predict_proba", "decision_function", "predict"]
):
ax = axes[plot_method_idx, response_method_idx]
display = DecisionBoundaryDisplay.from_estimator(
clf,
data,
multiclass_colors=cmap,
response_method=response_method,
plot_method=plot_method,
ax=ax,
alpha=0.5,
)
ax.scatter(
data[:, 0],
data[:, 1],
c=target.astype(int),
edgecolors="black",
cmap=cmap,
)
if plot_method != "pcolormesh":
if isinstance(display.surface_, list):
levels = len(display.surface_[0].levels)
else:
levels = len(display.surface_.levels)
ax.set_title(f"plot_method={plot_method}\nresponse_method={response_method}\nlevels={levels}")
else:
ax.set_title(f"plot_method={plot_method}\nresponse_method={response_method}")Results
main
this branch
Analysis
So I confirm that this fixes two bugs:
- one for
contourplot method in conjunction with thepredictresponse method; - one for
contourfplot method in conjunction with thepredict_probaanddecision_functionresponse methods.
I think the changelog should be adjusted to better reflect this.
See also some further suggestions to improve the tests.
| [5, 5], | ||
| ] | ||
| ) | ||
| y = np.arange(11) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please parametrize this test to make sure that it still works as expected when the dtype of y is object with str instances as class labels?
I think the code in this PR should work since from_estimator always uses a LabelEncoder but better cover this case in this test.
| clf = LogisticRegression().fit(pts, y) | ||
|
|
||
| disp = DecisionBoundaryDisplay.from_estimator( | ||
| clf, pts, response_method="predict", plot_method=plot_method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extend this test for predict_proba and decision_function as well, at least for the "contour".
|
Maybe @lucyleeow would also be interested in taking a look at this PR given her past involvement with this class. |
AnneBeyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @leweex95!
In general, it looks good to me, I just added a few comments below.
| ) | ||
| else: | ||
| levels = unique_levels | ||
| kwargs.setdefault("levels", levels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is too defensive (I'm not too familiar with the use of levels), but since the user can also pass kwargs (like levels) directly to from_estimator, maybe this should only be computed and added if levels is not given?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. And the test should be updated accordingly.
| # Plot only argmax map for contour | ||
| class_map = self.response.argmax(axis=2) | ||
| # Ensure levels match the number of classes for distinct colors | ||
| kwargs.setdefault("levels", np.unique(class_map)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, only set if not given?
| ) | ||
|
|
||
| # Check that the surface has levels for all classes | ||
| if plot_method in ("contourf", "contour"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plot_method can only be either "contourf" or "contour" (as defined by parametrize), so I don't think this check is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also be set to "pcolormesh" (see the code snippet I posted above) but maybe this is not consistently documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, yes, but this test is (at least currently) only testing "contourf" or "contour". So if the test is extended, it should stay, otherwise, it is not necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah indeed, sorry for the noisy comment. Since this test is about testing the actual number of levels which is undefined for "pcolormesh", let's remove this condition.
|
Thanks for your comments, @AnneBeyer and @ogrisel , pushed the fixes! |
|
Thanks for your PR @leweex95, I think it would be best if #32872 was addressed first (see #32872 (comment) for details) - that fix would probably remove the binary (and predict)/multiclass logic we have currently for color handling, and also it would probably pass @AnneBeyer is going to work on #32872. You're welcome to pick this up after #32872 is fixed (there may be some significant merge conflicts though, so you may find it easier to start a new PR). Edit: Note for the new PR I think we should add a TODO to delete this extra logic once our min matplotlib version is >=3.11 since it has been fixed in matplotlib/matplotlib#27576 |
This PR aims to fix a bug where
DecisionBoundaryDisplay.from_estimatoronly displayed up to 7 distinct colors for classifiers with more than 7 classes when usingplot_method="contourf"orplot_method="contour"Fixes #32866