
Conversation

@lucyleeow (Member) commented Dec 11, 2025

Reference Issues/PRs

What does this implement/fix? Explain your changes.

confusion_matrix_at_thresholds was made a public function in #30134. I think it would be worth adding it to the common metric tests in test_common.py.
This would also fix one of the test failures in #32755.
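For context, the common tests run on every metric registered in the module-level dictionaries of test_common.py. A minimal sketch of the kind of registration change proposed here, assuming the `CURVE_METRICS` registry name and a scikit-learn version where confusion_matrix_at_thresholds is already public (#30134):

```python
from sklearn.metrics import (
    confusion_matrix_at_thresholds,
    det_curve,
    precision_recall_curve,
    roc_curve,
)

# Registering the new public metric alongside the other curve metrics means
# every common check (invariances, sample_weight handling, etc.) runs on it.
# The exact dict the PR touches is an assumption here.
CURVE_METRICS = {
    "roc_curve": roc_curve,
    "precision_recall_curve": precision_recall_curve,
    "det_curve": det_curve,
    "confusion_matrix_at_thresholds": confusion_matrix_at_thresholds,
}
```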

Also fixes test_binary_sample_weight_invariance: the curve metrics, as well as CONTINUOUS_CLASSIFICATION_METRICS, should take y_score (i.e., unthresholded scores), NOT y_pred.
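As a sketch of why (toy data, not from the PR): curve metrics derive their thresholds from the scores themselves, so feeding them hard predictions degenerates the curve.

```python
from sklearn.metrics import roc_curve

y_true = [0, 1, 1, 0, 1]
y_score = [0.1, 0.8, 0.4, 0.3, 0.9]  # continuous scores, not 0/1 labels

# Each distinct score becomes a candidate threshold, so the curve has
# several points. Passing thresholded y_pred instead would collapse the
# curve to at most three points and weaken the invariance test.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```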

AI usage disclosure

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Any other comments?

cc @ogrisel because this is related to #32755, and @adrinjalali and @jeremiedbb, who reviewed #30134.

Comment on lines +1627 to +1629
```python
# `median_absolute_error`, and in `diff` when calculating collinear points
# and points in between to drop in `roc_curve`, means they are not always
# equivalent when scaling by a float.
```
@lucyleeow (Member, Author) commented Dec 11, 2025

In roc_curve we do:

```python
if drop_intermediate and fps.shape[0] > 2:
    optimal_idxs = xp.where(
        xp.concat(
            [
                xp.asarray([True], device=device),
                xp.logical_or(xp.diff(fps, 2), xp.diff(tps, 2)),
                xp.asarray([True], device=device),
            ]
        )
    )[0]
```

When scaling by a float, `diff` can result in extra indices. Note that in confusion_matrix_at_thresholds we already use the maximum floating point precision when calculating `tps` and `fps` with cumulative_sum.

Also note that `drop_intermediate=True` is the default.
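To see the effect in isolation, here is a minimal numpy sketch (toy weights, assumed purely for illustration):

```python
import numpy as np

# Four equal weights: the cumulated counts are exactly collinear, the
# second difference is exactly zero, and the interior points get dropped.
w = np.ones(4)
print(np.diff(np.cumsum(w), 2))        # [0. 0.]

# Scaling the weights by 0.3 introduces rounding noise in the cumulative
# sum: the second difference becomes tiny but non-zero (~1e-16), so the
# same points are now kept instead of dropped.
print(np.diff(np.cumsum(w * 0.3), 2))  # e.g. [-5.6e-17  1.1e-16]
```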

Details

In the test test_binary_sample_weight_invariance, with no scaling the indices are:

```
optimal_idxs=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17,
18, 20, 21, 22, 23, 24, 25, 26, 29, 30, 32, 33, 34, 35, 36, 37, 38,
39, 41, 42, 43, 44, 45, 46, 48, 49])
```

with scaling 0.3, the indices are:

```
optimal_idxs=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 20, 21, 22, 23, 24, 25, 26, 29, 30, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49])
```

There are two extra indices here: 14 and 40.

@ogrisel (Member) left a comment

Thanks for the PR.

```python
# are scaled by weights, so will vary e.g., scaling by 3 will result in 3 * `tps`
if not (
    name.startswith("unnormalized") or name == "confusion_matrix_at_thresholds"
):
```
@ogrisel (Member) commented

I think we should define a test module constant WEIGHT_SCALE_DEPENDENT_METRICS with all the metric names that are expected to be dependent on the weight scale.
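A minimal sketch of that idea, with metric names taken from this thread (the final contents would be decided in the PR):

```python
# Metrics whose outputs scale with the sample weights: multiplying all
# weights by a constant changes their values, so the weight-scaling
# invariance check must skip them.
WEIGHT_SCALE_DEPENDENT_METRICS = {
    "unnormalized_confusion_matrix",
    "confusion_matrix_at_thresholds",
}

# The guard in the test then reads (sketch):
for name in ["roc_curve", "unnormalized_confusion_matrix"]:
    if name not in WEIGHT_SCALE_DEPENDENT_METRICS:
        ...  # run the check: metric(y, s, w) must equal metric(y, s, 3 * w)
```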

@lucyleeow (Member, Author) commented Dec 11, 2025

Just found:

```python
# `confusion_matrix` returns absolute values and hence behaves unnormalized.
# Naming it with an unnormalized_ prefix is necessary for this module to
# skip sample_weight scaling checks which will fail for unnormalized
# metrics.
"unnormalized_confusion_matrix": confusion_matrix,
```

It seems the current procedure is to add an "unnormalized" prefix to the name of any non-partial metric to indicate that it depends on the sample weight scale.

I think it would still be nice to have WEIGHT_SCALE_DEPENDENT_METRICS, so you can see all the scale-dependent metrics in one place, but I am not 100% sure we shouldn't just stick to the old procedure...

Edit: Added, but I can easily undo that commit; let me know what you think.

@lucyleeow added the "Waiting for Second Reviewer" label Dec 11, 2025
@ogrisel (Member) commented Dec 11, 2025

@lucyleeow there are a few failing tests to fix before final review.

@lucyleeow (Member, Author) commented
I forgot to actually use WEIGHT_SCALE_DEPENDENT_METRICS. Fixed now, thanks!
