Skip to content

Conversation

@leweex95
Copy link

@leweex95 leweex95 commented Dec 8, 2025

This PR aims to fix a bug where DecisionBoundaryDisplay.from_estimator only displayed up to 7 distinct colors for classifiers with more than 7 classes when using plot_method="contourf" or plot_method="contour"

Fixes #32866

@github-actions github-actions bot removed the CI:Linter failure The linter CI is failing on this PR label Dec 8, 2025
Copy link

@ThexXTURBOXx ThexXTURBOXx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the linked issue - no problems remain. Thank you very much!

@StefanieSenger
Copy link
Member

Thanks for the PR, @leweex95!

@AnneBeyer, since you are concerned with Displays, would you be interested in triaging the corresponding issue and reviewing this PR?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assessing the impact of this PR visually using the following code:

Code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import DecisionBoundaryDisplay


data = np.array(
    [
        [-1, -1],
        [-2, -1],
        [1, 1],
        [2, 1],
        [2, 2],
        [3, 2],
        [3, 3],
        [4, 3],
        [4, 4],
        [5, 4],
        [5, 5],
    ]
)
# target = np.asarray([str(i) for i in range(11)])
target = np.arange(11)
clf = LogisticRegression().fit(data, target)

cmap = "gist_rainbow"
_, axes = plt.subplots(nrows=3, ncols=3, figsize=(12, 12), constrained_layout=True)
for plot_method_idx, plot_method in enumerate(["contourf", "contour", "pcolormesh"]):
    for response_method_idx, response_method in enumerate(
        ["predict_proba", "decision_function", "predict"]
    ):
        ax = axes[plot_method_idx, response_method_idx]
        display = DecisionBoundaryDisplay.from_estimator(
            clf,
            data,
            multiclass_colors=cmap,
            response_method=response_method,
            plot_method=plot_method,
            ax=ax,
            alpha=0.5,
        )
        ax.scatter(
            data[:, 0],
            data[:, 1],
            c=target.astype(int),
            edgecolors="black",
            cmap=cmap,
        )
        if plot_method != "pcolormesh":
            if isinstance(display.surface_, list):
                levels = len(display.surface_[0].levels)
            else:
                levels = len(display.surface_.levels)
            ax.set_title(f"plot_method={plot_method}\nresponse_method={response_method}\nlevels={levels}")
        else:
            ax.set_title(f"plot_method={plot_method}\nresponse_method={response_method}")

Results

main

image

this branch

image

Analysis

So I confirm that this fixes two bugs:

  • one for contour plot method in conjunction with the predict response method;
  • one for contourf plot method in conjunction with the predict_proba and decision_function response methods.

I think the changelog should be adjusted to better reflect this.

See also some further suggestions to improve the tests.

[5, 5],
]
)
y = np.arange(11)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please parametrize this test to make sure that it still works as expected when the dtype of y is object with str instances as class labels?

I think the code in this PR should work since from_estimator always uses a LabelEncoder but better cover this case in this test.

clf = LogisticRegression().fit(pts, y)

disp = DecisionBoundaryDisplay.from_estimator(
clf, pts, response_method="predict", plot_method=plot_method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend this test for predict_proba and decision_function as well, at least for the "contour".

@ogrisel
Copy link
Member

ogrisel commented Dec 9, 2025

Maybe @lucyleeow would also be interested in taking a look at this PR given her past involvement with this class.

Copy link
Contributor

@AnneBeyer AnneBeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @leweex95!

In general, it looks good to me, I just added a few comments below.

)
else:
levels = unique_levels
kwargs.setdefault("levels", levels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is too defensive (I'm not too familiar with the use of levels), but since the user can also pass kwargs (like levels) directly to from_estimator, maybe this should only be computed and added if levels is not given?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. And the test should be updated accordingly.

# Plot only argmax map for contour
class_map = self.response.argmax(axis=2)
# Ensure levels match the number of classes for distinct colors
kwargs.setdefault("levels", np.unique(class_map))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, only set if not given?

)

# Check that the surface has levels for all classes
if plot_method in ("contourf", "contour"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plot_method can only be either "contourf" or "contour" (as defined by parametrize), so I don't think this check is needed.

Copy link
Member

@ogrisel ogrisel Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also be set to "pcolormesh" (see the code snippet I posted above) but maybe this is not consistently documented?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, yes, but this test is (at least currently) only testing "contourf" or "contour". So if the test is extended, it should stay, otherwise, it is not necessary here.

Copy link
Member

@ogrisel ogrisel Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah indeed, sorry for the noisy comment. Since this test is about testing the actual number of levels which is undefined for "pcolormesh", let's remove this condition.

@leweex95
Copy link
Author

Thanks for your comments, @AnneBeyer and @ogrisel , pushed the fixes!

@lucyleeow
Copy link
Member

lucyleeow commented Dec 15, 2025

Thanks for your PR @leweex95, I think it would be best if #32872 was addressed first (see #32872 (comment) for details) - that fix would probably remove the binary (and predict)/multiclass logic we have currently for color handling, and also it would probably pass n_classes information, so we can avoid any expensive np.unique calls.

@AnneBeyer is going to work on #32872. You're welcome to pick this up after #32872 is fixed (there may be some significant merge conflicts though, so you may find it easier to start a new PR).

Edit: Note for the new PR I think we should add a TODO to delete this extra logic once our min matplotlib version is >=3.11 since it has been fixed in matplotlib/matplotlib#27576

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DecisionBoundaryDisplay.from_estimator only displays up to 7 distinct colours

6 participants