-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
Fix: Ensure nan_euclidean_distances is symmetric for self-distances #32851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This PR has some broken tests. Furthermore, I suspect that a similar bug happens when |
|
Also, please always include a non-regression test in bugfix PRs. |
…l after corruption.
|
@ogrisel Thanks for the feedback! |
|
Hi @isha822! 👋 I also worked on this issue independently today (started before seeing your PR update) and wanted to share my findings in case they’re helpful. What I observed while testing: Using a random matrix with around 10% NaNs and calling To address this, my approach enforces exact symmetry after the sqrt step: if X is Y: Preserve NaN for fully missing rows, zero out othersdiag_indices = np.arange(len(distances)) This keeps NaN on the diagonal only for rows that are completely missing, and sets other self‑distances to 0. Tests I used:
All existing Question for maintainers (@ogrisel): Is For reference, I’ve opened PR #32874 with this alternative implementation and test so you can compare both approaches side by side. |
|
Hi @abhiyushS, thanks for the detailed investigation! |
The nan_euclidean_distances function produced slightly asymmetric matrices (on the order of 1e-15) when computing distances between X and itself, causing issues with downstream tools like scipy.spatial.distance.squareform.
The Fix:
When Y is None or Y is X, the code now explicitly symmetrizes the result matrix (D + D.T) / 2 and sets the diagonal to 0, ensuring perfect symmetry.
Closes #32848