Closed
Description
PR #22315 added AVX512 sorting for 16-bit and 64-bit floats. Along the way we noted:
A sort built mainly on 16-bit shuffle/permute instructions does not involve the FP unit except for the comparisons, which should not affect performance much and can be emulated numerically without any cast between single and half precision.
If that holds, then implementing a float16 sort without casting float16 to float32 would also speed up non-AVX512 architectures.
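To make the "emulated without a cast" idea concrete, here is a minimal sketch of comparing half floats purely on their 16-bit patterns. It is not NumPy's code: `half_is_nan`, `half_sort_key`, `half_less` and `sort_half_bits` are hypothetical names, and it assumes the values are stored as IEEE 754 binary16 bits in a `uint16_t`. It also assumes NaNs should sort to the end and, unlike an IEEE comparison, it orders -0 before +0.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: true when the bits encode a NaN
// (exponent all ones, mantissa non-zero), regardless of sign.
inline bool half_is_nan(std::uint16_t h) {
    return (h & 0x7FFFu) > 0x7C00u;
}

// Hypothetical helper: map half-float bits to an unsigned key whose
// integer order matches the numeric order of the encoded values.
// Negative values have all bits flipped, non-negative values get the
// sign bit set, and every NaN collapses to the largest key.
inline std::uint16_t half_sort_key(std::uint16_t h) {
    if (half_is_nan(h)) {
        return 0xFFFFu;
    }
    return (h & 0x8000u) ? static_cast<std::uint16_t>(~h)
                         : static_cast<std::uint16_t>(h | 0x8000u);
}

// Integer-only "less than" for half floats; no conversion to float32.
inline bool half_less(std::uint16_t a, std::uint16_t b) {
    return half_sort_key(a) < half_sort_key(b);
}

// Sort raw half-float bit patterns in place using the comparison above.
inline void sort_half_bits(std::vector<std::uint16_t>& v) {
    std::sort(v.begin(), v.end(), half_less);
}
```

Because the key transform is plain integer arithmetic, the same idea should carry over to any integer sorting path, vectorized or not; whether it is actually faster than the cast-based route would need benchmarking.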
I think the code in question starts here but I am not sure where the actual implementation of the half-float math is.
numpy/numpy/core/src/npysort/quicksort.cpp, line 114 in 0bd56e7:
quicksort_(type *start, npy_intp num)