TST Add confusion_matrix_at_thresholds to common tests
#32883
base: main
Conversation
```python
# `median_absolute_error`, and in `diff` when calculating collinear points
# and points in between to drop in `roc_curve`, means they are not always
# equivalent when scaling by a float.
```
In `roc_curve` we do (scikit-learn/sklearn/metrics/_ranking.py, lines 1271 to 1280 at 7f0900c):

```python
if drop_intermediate and fps.shape[0] > 2:
    optimal_idxs = xp.where(
        xp.concat(
            [
                xp.asarray([True], device=device),
                xp.logical_or(xp.diff(fps, 2), xp.diff(tps, 2)),
                xp.asarray([True], device=device),
            ]
        )
    )[0]
```
When scaling by a float, `diff` can therefore result in extra indices. Note that in `confusion_matrix_at_thresholds` we already use maximum floating point precision when calculating `tps` and `fps` with `cumulative_sum`.
Also note that `drop_intermediate=True` is the default.
Details
In the test `test_binary_sample_weight_invariance`, with no scaling the indices are:

```
optimal_idxs=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 15, 16, 17,
       18, 20, 21, 22, 23, 24, 25, 26, 29, 30, 32, 33, 34, 35, 36, 37, 38,
       39, 41, 42, 43, 44, 45, 46, 48, 49])
```

with scaling by 0.3, the indices are:

```
optimal_idxs=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 20, 21, 22, 23, 24, 25, 26, 29, 30, 32, 33, 34, 35, 36, 37,
       38, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49])
```
There are two extra indices here: 14 and 40.
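A minimal standalone sketch of why this happens (toy weights, not the test's actual data): with equal weights the cumulative sums are exactly collinear, so the second differences used by `drop_intermediate` are exactly zero, while scaling by 0.3 introduces rounding that makes some of them tiny but nonzero:

```python
import numpy as np

# Six samples with equal weight: the cumulative counts lie on an exact
# straight line, so every second difference is exactly zero and the
# intermediate points are dropped.
w = np.ones(6)
print(np.diff(np.cumsum(w), 2))        # [0. 0. 0. 0.]

# Scaling the same weights by 0.3 accumulates rounding error in the
# cumulative sum, so some second differences become tiny nonzero values
# and the corresponding points are kept as "extra" indices.
print(np.diff(np.cumsum(0.3 * w), 2))  # e.g. [-5.55e-17  1.11e-16  0.  0.]
```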
ogrisel left a comment:
Thanks for the PR.
sklearn/metrics/tests/test_common.py (outdated)
```python
# are scaled by weights, so will vary e.g., scaling by 3 will result in 3 * `tps`
if not (
    name.startswith("unnormalized") or name == "confusion_matrix_at_thresholds"
):
```
I think we should define a test module constant `WEIGHT_SCALE_DEPENDENT_METRICS` with all the metric names that are expected to be dependent on the weight scale.
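A hypothetical sketch of what that could look like (the metric names listed are illustrative, not a definitive inventory):

```python
# Hypothetical module-level constant for test_common.py; the exact set of
# names would need to be compiled from the metrics that fail the
# sample_weight scaling check.
WEIGHT_SCALE_DEPENDENT_METRICS = frozenset(
    [
        "unnormalized_confusion_matrix",
        "unnormalized_multilabel_confusion_matrix",
        "confusion_matrix_at_thresholds",
    ]
)


def should_check_weight_scale_invariance(name):
    # The scaling-invariance check becomes a simple membership test
    # instead of string-prefix matching on the metric name.
    return name not in WEIGHT_SCALE_DEPENDENT_METRICS
```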
Just found (scikit-learn/sklearn/metrics/tests/test_common.py, lines 159 to 163 at 7f0900c):
```python
# `confusion_matrix` returns absolute values and hence behaves unnormalized
# . Naming it with an unnormalized_ prefix is necessary for this module to
# skip sample_weight scaling checks which will fail for unnormalized
# metrics.
"unnormalized_confusion_matrix": confusion_matrix,
```
It seems the current procedure is to add an "unnormalized" prefix to the name of any non-partial metric to indicate that it depends on the weight scale.
I think it would still be nice to have `WEIGHT_SCALE_DEPENDENT_METRICS`, so you can see all the scale-variant metrics in one place, but I am not 100% sure we shouldn't just stick to the old procedure...
Edit: added it, but I can easily undo that commit; let me know what you think.
@lucyleeow there are a few failing tests to fix before final review.
I forgot to actually use ...
Reference Issues/PRs
What does this implement/fix? Explain your changes.
`confusion_matrix_at_thresholds` was amended to a public function in #30134. I think it would be worth adding it to the common metric tests in `test_common.py`. This would also fix one of the test failures in #32755.
Also fixes
`test_binary_sample_weight_invariance` - the curve metrics, as well as `CONTINUOUS_CLASSIFICATION_METRICS`, should also take `y_score` (i.e., unthresholded scores), NOT `y_pred`.
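For illustration, a minimal sketch with made-up data, using `roc_curve` as a representative curve metric: these metrics consume continuous scores rather than thresholded class predictions.

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # unthresholded scores, not 0/1 labels

# Curve metrics sweep thresholds over y_score internally; passing
# already-thresholded y_pred would collapse the curve to just a few points.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```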
AI usage disclosure
I used AI assistance for:
Any other comments?
cc @ogrisel because this is related to #32755, and @adrinjalali and @jeremiedbb who reviewed #30134.