TST Add confusion_matrix_at_thresholds to common tests
#32883
base: main
Conversation
```python
# `median_absolute_error`, and in `diff` when calculating collinear points
# and points in between to drop in `roc_curve`, means they are not always
# equivalent when scaling by a float.
```
In `roc_curve` we do (scikit-learn/sklearn/metrics/_ranking.py, lines 1271 to 1280 at 7f0900c):

```python
if drop_intermediate and fps.shape[0] > 2:
    optimal_idxs = xp.where(
        xp.concat(
            [
                xp.asarray([True], device=device),
                xp.logical_or(xp.diff(fps, 2), xp.diff(tps, 2)),
                xp.asarray([True], device=device),
            ]
        )
    )[0]
```
When scaling by a float, `diff` can therefore result in extra indices. Note that in `confusion_matrix_at_thresholds` we already use maximum floating point precision when calculating `tps` and `fps` with `cumulative_sum`.
Also note that `drop_intermediate=True` is the default.
Details
In the test `test_binary_sample_weight_invariance`, with no scaling the indices are:

```
optimal_idxs=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 15, 16, 17,
       18, 20, 21, 22, 23, 24, 25, 26, 29, 30, 32, 33, 34, 35, 36, 37, 38,
       39, 41, 42, 43, 44, 45, 46, 48, 49])
```

with scaling by 0.3, the indices are:

```
optimal_idxs=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 20, 21, 22, 23, 24, 25, 26, 29, 30, 32, 33, 34, 35, 36, 37,
       38, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49])
```
There are two extra indices here: 14 and 40.
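A minimal standalone sketch of why this happens (toy weights, not the test's actual data): with equal weights the cumulative sums are exactly collinear, so the second differences used by `drop_intermediate` are exactly zero, while scaling by 0.3 introduces rounding that makes some of them tiny but nonzero:

```python
import numpy as np

# Six samples with equal weight: the cumulative counts lie on an exact
# straight line, so every second difference is exactly zero and the
# intermediate points are dropped.
w = np.ones(6)
print(np.diff(np.cumsum(w), 2))        # [0. 0. 0. 0.]

# Scaling the same weights by 0.3 accumulates rounding error in the
# cumulative sum, so some second differences become tiny nonzero values
# and the corresponding points are kept as "extra" indices.
print(np.diff(np.cumsum(0.3 * w), 2))  # e.g. [-5.55e-17  1.11e-16  0.  0.]
```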
ogrisel left a comment:
Thanks for the PR.
sklearn/metrics/tests/test_common.py (outdated)
```python
# are scaled by weights, so will vary e.g., scaling by 3 will result in 3 * `tps`
if not (
    name.startswith("unnormalized") or name == "confusion_matrix_at_thresholds"
):
```
I think we should define a test module constant `WEIGHT_SCALE_DEPENDENT_METRICS` with all the metric names that are expected to be dependent on the weight scale.
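A hypothetical sketch of what that could look like (the metric names listed are illustrative, not a definitive inventory):

```python
# Hypothetical module-level constant for test_common.py; the exact set of
# names would need to be compiled from the metrics that fail the
# sample_weight scaling check.
WEIGHT_SCALE_DEPENDENT_METRICS = frozenset(
    [
        "unnormalized_confusion_matrix",
        "unnormalized_multilabel_confusion_matrix",
        "confusion_matrix_at_thresholds",
    ]
)


def should_check_weight_scale_invariance(name):
    # The scaling-invariance check becomes a simple membership test
    # instead of string-prefix matching on the metric name.
    return name not in WEIGHT_SCALE_DEPENDENT_METRICS
```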
Just found (scikit-learn/sklearn/metrics/tests/test_common.py, lines 159 to 163 at 7f0900c):
```python
# `confusion_matrix` returns absolute values and hence behaves unnormalized
# . Naming it with an unnormalized_ prefix is necessary for this module to
# skip sample_weight scaling checks which will fail for unnormalized
# metrics.
"unnormalized_confusion_matrix": confusion_matrix,
```
It seems the current procedure is to add an "unnormalized" prefix to the name of any non-partial metric to indicate that it depends on the weight scale.
I think it would still be nice to have `WEIGHT_SCALE_DEPENDENT_METRICS`, so you can see all the scale-variant metrics in one place, but I am not 100% sure we shouldn't just stick to the old procedure...
Edit: added it, but I can easily undo that commit; let me know what you think.
@lucyleeow there are a few failing tests to fix before final review.
I forgot to actually use ...
Reference Issues/PRs
What does this implement/fix? Explain your changes.
`confusion_matrix_at_thresholds` was amended to a public function in #30134. I think it would be worth adding it to the common metric tests in `test_common.py`. This would also fix one of the test failures in #32755.
Also fixes
`test_binary_sample_weight_invariance` - the curve metrics, as well as `CONTINUOUS_CLASSIFICATION_METRICS`, should also take `y_score` (i.e., unthresholded scores), NOT `y_pred`.
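For illustration, a minimal sketch with made-up data, using `roc_curve` as a representative curve metric: these metrics consume continuous scores rather than thresholded class predictions.

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # unthresholded scores, not 0/1 labels

# Curve metrics sweep thresholds over y_score internally; passing
# already-thresholded y_pred would collapse the curve to just a few points.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```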
AI usage disclosure
I used AI assistance for:
Any other comments?
cc @ogrisel because this is related to #32755, and @adrinjalali and @jeremiedbb who reviewed #30134.