-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
BUG: Fixed issue where rolling.kurt() calculations would be effected by values outside of scope #61481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@mroeschke my PR hasn't been reviewed for a while now, just checking if it will be reviewed or if I should just close it. (sorry if it's a bother, I know you guys probably all have a lot on your plates and I didn't know who to ping) |
|
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Alvaro-Kothe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why are you not using the traditional Kahan summation algorithm?
Can you also run the benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write some tests that assert the values explicitly? Make sure that they fail on main.
| if add_mode: | ||
| val_raised = pow(val, power_of_element) | ||
| else: | ||
| val_raised = -pow(val, power_of_element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that computing the val product is more efficient than using pow
| x_length_flag = x_length > 15 and isfinite(x_length) | ||
| val_length_flag = val_length > 15 and isfinite(val_length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This verification doesn't make much sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should compare the magnitude between x_value and val directly.
| elif x_length_flag: | ||
| comp_value[0][0] += val_raised |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for this?
| if x_length_flag and val_length_flag: | ||
| # Both > 1e15 or < 1-e15 | ||
| x_value[0][0] += val_raised |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you not using the kahan compensation?
| comp_value[0][0] += x_value[0][0] | ||
| x_value[0][0] = val_raised |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for this?
|
I see you've been working on your own PR, have you taken on things from this fix? Have been occupied with school work, so haven't had time to look til now. If not, I can still work on it, just lmk |
My approach differs a lot from yours, so no. |
|
More in the solution-sense. I saw a commit for outliers in window values on your PR so I wasn't sure if you've already started tackling the same issue |
|
Got it. I am checking for catastrophic cancellation when updating the 3rd central moment, as it's the most sensible of all. When this happens, I recompute the window. |
|
So should I still fix up this PR then? |
|
Honestly, I don't know. But I think that we should arrive to a general solution for numerical stability (algorithm-wise) to compute the rolling variance, skewness and kurtosis. I don't know if my solution is good enough, or if your approach is better in terms of stability and performance. |
|
Is this issue with data precision limitations? It's been a minute. I did open an enhancement request for implementing double-double arithmetics so we can work with extremely large and small float64s without multiple people implementing different methods of dealing with numerical stability due to data type. What do you think? Issue: #62870 |
Yes, most of the problems are related to arithmetic problems in floating point numbers. Using a more precise data type or stabler algorithms can mitigate some of these problems.
Seems good. But for now, it doesn't seem clear to me how it should be implemented and integrated to the existing functionality. |
|
I have two ideas:
Assuming that's what the question was about |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.Might have found an unrelated issue when calculating kurtosis for numbers >1e6, but I'll have to look into it more and open an issue if that is the case.