Skip to content

Conversation

@eicchen
Copy link
Contributor

@eicchen eicchen commented May 22, 2025

Might have found an unrelated issue when calculating kurtosis for numbers >1e6, but I'll have to look into it more and open an issue if that is the case.

@eicchen eicchen changed the title Fixed issue where rolling.kurt() calculations would be effected by values outside of scope BUG: Fixed issue where rolling.kurt() calculations would be effected by values outside of scope Jun 2, 2025
@eicchen
Copy link
Contributor Author

eicchen commented Jun 10, 2025

@mroeschke my PR hasn't been reviewed for a while now, just checking if it will be reviewed or if I should just close it.

(sorry if it's a bother, I know you guys probably all have a lot on your plates and I didn't know who to ping)

@simonjayhawkins simonjayhawkins added Bug Window rolling, ewma, expanding labels Jun 25, 2025
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 26, 2025
Copy link
Member

@Alvaro-Kothe Alvaro-Kothe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why are you not using the traditional Kahan summation algorithm?

Can you also run the benchmarks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write some tests that assert the values explicitly? Make sure that they fail on main.

Comment on lines +743 to +746
if add_mode:
val_raised = pow(val, power_of_element)
else:
val_raised = -pow(val, power_of_element)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that computing the val product is more efficient than using pow

Comment on lines +751 to +752
x_length_flag = x_length > 15 and isfinite(x_length)
val_length_flag = val_length > 15 and isfinite(val_length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This verification doesn't make much sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should compare the magnitude between x_value and val directly.

Comment on lines +760 to +761
elif x_length_flag:
comp_value[0][0] += val_raised
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for this?

Comment on lines +756 to +758
if x_length_flag and val_length_flag:
# Both > 1e15 or < 1-e15
x_value[0][0] += val_raised
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not using the kahan compensation?

Comment on lines +764 to +765
comp_value[0][0] += x_value[0][0]
x_value[0][0] = val_raised
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for this?

@eicchen
Copy link
Contributor Author

eicchen commented Oct 28, 2025

I see you've been working on your own PR, have you taken on things from this fix? Have been occupied with school work, so haven't had time to look til now. If not, I can still work on it, just lmk

@Alvaro-Kothe
Copy link
Member

have you taken on things from this fix?

My approach differs a lot from yours, so no.

@eicchen
Copy link
Contributor Author

eicchen commented Oct 28, 2025

More in the solution-sense. I saw a commit for outliers in window values on your PR so I wasn't sure if you've already started tackling the same issue

@Alvaro-Kothe
Copy link
Member

Got it. I am checking for catastrophic cancellation when updating the 3rd central moment, as it's the most sensible of all. When this happens, I recompute the window.

@eicchen
Copy link
Contributor Author

eicchen commented Oct 28, 2025

So should I still fix up this PR then?

@Alvaro-Kothe
Copy link
Member

Honestly, I don't know. But I think that we should arrive to a general solution for numerical stability (algorithm-wise) to compute the rolling variance, skewness and kurtosis.

I don't know if my solution is good enough, or if your approach is better in terms of stability and performance.

@eicchen
Copy link
Contributor Author

eicchen commented Oct 28, 2025

Is this issue with data precision limitations? It's been a minute. I did open an enhancement request for implementing double-double arithmetics so we can work with extremely large and small float64s without multiple people implementing different methods of dealing with numerical stability due to data type. What do you think? Issue: #62870

@Alvaro-Kothe
Copy link
Member

Is this issue with data precision limitations?

Yes, most of the problems are related to arithmetic problems in floating point numbers. Using a more precise data type or stabler algorithms can mitigate some of these problems.

I did open an enhancement request for implementing double-double arithmetics

Seems good. But for now, it doesn't seem clear to me how it should be implemented and integrated to the existing functionality.

@eicchen
Copy link
Contributor Author

eicchen commented Oct 28, 2025

I have two ideas:

  • Overload the function at runtime depending on if inputs have 14 digits of sigfig
  • Create a separate double-double Cython implementation so we can implement them as needed

Assuming that's what the question was about

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Window rolling, ewma, expanding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: df.rolling.{std, skew, kurt} gives unexpected value

3 participants