BUG: Fixed issue where rolling.kurt() calculations would be effected by values outside of scope #61481

Open

eicchen wants to merge 6 commits into pandas-dev:main from eicchen:PR_staging

Contributor

eicchen commented May 22, 2025

closes BUG: df.rolling.{std, skew, kurt} gives unexpected value #61416
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Might have found an unrelated issue when calculating kurtosis for numbers >1e6, but I'll have to look into it more and open an issue if that is the case.

eicchen added 3 commits

May 22, 2025 17:19


          Merged changes from branch BUG_df.rolling_#61416 to be included in PR

e2e7260


          pre-commit checks I forgot to run :/

76cb380


          Errant type error

7763a83

eicchen changed the title ~~Fixed issue where rolling.kurt() calculations would be effected by values outside of scope~~ BUG: Fixed issue where rolling.kurt() calculations would be effected by values outside of scope

Contributor Author

eicchen commented Jun 10, 2025

@mroeschke my PR hasn't been reviewed for a while now, so I'm just checking whether it will be reviewed or whether I should close it.

(sorry if it's a bother, I know you guys probably all have a lot on your plates and I didn't know who to ping)

eicchen added 2 commits

June 10, 2025 11:49


          Merge branch 'main' into PR_staging

01ab52e


          Merge branch 'main' into PR_staging

321c093

simonjayhawkins added Bug Window labels

Contributor

github-actions bot commented Jul 26, 2025

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label


          Merge branch 'main' of https://github.com/pandas-dev/pandas into PR_s…

d76ad5c

…taging

eicchen mentioned this pull request

BUG: fix polluted window in skewness computation #62863

Merged

7 tasks

Alvaro-Kothe removed the Stale label

Alvaro-Kothe reviewed

Member

Alvaro-Kothe left a comment

Can you explain why you are not using the traditional Kahan summation algorithm?

Can you also run the benchmarks?
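
For readers unfamiliar with the algorithm the reviewer mentions, here is a minimal sketch of traditional Kahan (compensated) summation in plain Python. It is only an illustration: the names `kahan_sum` and `naive_sum` are hypothetical, and this is not the code in `aggregations.pyx`. An explicit loop is used for the naive sum because the built-in `sum` may itself apply compensation on recent Python versions.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan summation: carry the rounding error of each addition forward."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp            # apply the correction from the previous step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover (the negative of) what was lost
        total = t
    return total


values = [0.1] * 100_000
exact = math.fsum(values)  # correctly rounded reference sum
naive_error = abs(naive_sum(values) - exact)
kahan_error = abs(kahan_sum(values) - exact)
print(naive_error, kahan_error)
```

The compensated error should stay near the rounding level of the result, while the naive error grows with the number of additions.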

pandas/tests/window/test_rolling_skew_kurt.py

Member

Alvaro-Kothe Oct 26, 2025

Can you write some tests that assert the values explicitly? Make sure that they fail on main.

pandas/_libs/window/aggregations.pyx

Comment on lines +743 to +746

    
                  if add_mode:
                      val_raised = pow(val, power_of_element)
                  else:
                      val_raised = -pow(val, power_of_element)

Member

Alvaro-Kothe Oct 26, 2025

I think that computing the product of val directly is more efficient than calling pow.
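
As a rough illustration of this suggestion, the sketch below raises a value to a small positive integer power by repeated multiplication instead of calling a general-purpose `pow`. The function name `raise_by_product` is hypothetical, and whether this is actually faster in the compiled Cython code is the reviewer's expectation, not something verified here.

```python
import math


def raise_by_product(val: float, power: int) -> float:
    """Compute val**power for a small positive integer power by multiplying."""
    if power < 1:
        raise ValueError("power must be a positive integer")
    result = val
    for _ in range(power - 1):
        result *= val
    return result


# Powers 1 through 4 are the ones needed for moments up to kurtosis.
for p in range(1, 5):
    assert math.isclose(raise_by_product(1.5, p), 1.5 ** p)
```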

pandas/_libs/window/aggregations.pyx

Comment on lines +751 to +752

    
                  x_length_flag = x_length > 15 and isfinite(x_length)
                  val_length_flag = val_length > 15 and isfinite(val_length)

Member

Alvaro-Kothe Oct 26, 2025

This verification doesn't make much sense to me.

Member

Alvaro-Kothe Oct 26, 2025

You should compare the magnitude between x_value and val directly.
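
One established technique that compares the magnitudes of the running sum and the incoming value directly is Neumaier's variant of Kahan summation. Whether this is what the reviewer had in mind is an assumption; the sketch below only shows what such a comparison can look like, with the hypothetical name `neumaier_sum`.

```python
def neumaier_sum(values):
    """Neumaier summation: branch on which operand has the larger magnitude."""
    total = 0.0
    comp = 0.0  # accumulated rounding error, added back at the end
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            # Low-order bits of v were lost in the addition.
            comp += (total - t) + v
        else:
            # Low-order bits of total were lost in the addition.
            comp += (v - t) + total
        total = t
    return total + comp


# A huge value enters and later cancels; the small values must survive.
print(neumaier_sum([1.0, 1e100, 1.0, -1e100]))  # 2.0
```

A plain running sum returns 0.0 for the same input, because both 1.0 terms are absorbed by 1e100 before it cancels.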

pandas/_libs/window/aggregations.pyx

Comment on lines +760 to +761

    
                  elif x_length_flag:
                      comp_value[0][0] += val_raised

Member

Alvaro-Kothe Oct 26, 2025

What is the rationale for this?

pandas/_libs/window/aggregations.pyx

Comment on lines +756 to +758

    
                  if x_length_flag and val_length_flag:
                      # Both > 1e15 or < 1e-15
                      x_value[0][0] += val_raised

Member

Alvaro-Kothe Oct 26, 2025

Why are you not using the kahan compensation?

pandas/_libs/window/aggregations.pyx

Comment on lines +764 to +765

    
                      comp_value[0][0] += x_value[0][0]
                      x_value[0][0] = val_raised

Member

Alvaro-Kothe Oct 26, 2025

What is the rationale for this?

Contributor Author

eicchen commented Oct 28, 2025

I see you've been working on your own PR. Have you taken on things from this fix? I've been occupied with schoolwork, so I haven't had time to look until now. If not, I can still work on it; just let me know.

Member

Alvaro-Kothe commented Oct 28, 2025

have you taken on things from this fix?

My approach differs a lot from yours, so no.

Contributor Author

eicchen commented Oct 28, 2025

I meant more in terms of the solution. I saw a commit handling outliers in window values on your PR, so I wasn't sure whether you had already started tackling the same issue.

Member

Alvaro-Kothe commented Oct 28, 2025

Got it. I am checking for catastrophic cancellation when updating the 3rd central moment, as it's the most sensitive of all. When this happens, I recompute the window.
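
The idea of detecting cancellation and recomputing the window can be sketched on a plain rolling sum. This is a simplified illustration, not the implementation in either PR: it tracks a sum rather than the third central moment, and both the function name and the `rel_threshold` parameter are hypothetical.

```python
import math


def rolling_sum_with_recompute(values, window, rel_threshold=1e8):
    """Rolling sum via add/remove updates, recomputed when cancellation is likely.

    `rel_threshold` is a hypothetical tuning knob, not a pandas parameter.
    """
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            old = values[i - window]
            total -= old
            # If the removed value dwarfs what remains, the low-order bits of
            # the remaining values were already lost: rebuild from the window.
            if abs(old) > rel_threshold * abs(total):
                total = math.fsum(values[i - window + 1 : i + 1])
        out.append(total if i >= window - 1 else math.nan)
    return out


result = rolling_sum_with_recompute([1e20, 1.0, 2.0, 3.0, 4.0], window=2)
print(result)
```

Without the recompute step, the window covering `1.0` and `2.0` would report `0.0`, and every later window would stay polluted by the value that has already left.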

Contributor Author

eicchen commented Oct 28, 2025

So should I still fix up this PR then?

Member

Alvaro-Kothe commented Oct 28, 2025

Honestly, I don't know. But I think that we should arrive at a general solution for numerical stability (algorithm-wise) to compute the rolling variance, skewness and kurtosis.

I don't know if my solution is good enough, or if your approach is better in terms of stability and performance.

Contributor Author

eicchen commented Oct 28, 2025

Is this issue with data precision limitations? It's been a minute. I did open an enhancement request for implementing double-double arithmetic, so that we can work with extremely large and small float64 values without multiple people implementing different ways of handling numerical stability limits of the data type. What do you think? Issue: #62870
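
For context on the proposal, double-double arithmetic represents a value as an unevaluated pair of float64 numbers, `(hi, lo)`, built on error-free transformations such as Knuth's TwoSum. The sketch below is a minimal illustration of adding a float to such a pair; the names `two_sum` and `dd_add` are hypothetical and nothing here reflects how pandas would implement it.

```python
def two_sum(a: float, b: float):
    """Error-free addition: return (s, e) with s = fl(a + b) and a + b = s + e."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y: float):
    """Add float y to the double-double x = (hi, lo); return a renormalized pair."""
    s, e = two_sum(x[0], y)
    e += x[1]             # fold in the existing low-order part
    return two_sum(s, e)  # renormalize so that hi carries the leading bits


acc = (0.0, 0.0)
for v in [1e16, 1.0, -1e16]:
    acc = dd_add(acc, v)
print(acc)
```

A plain float64 accumulator loses the `1.0` here, since `1e16 + 1.0` rounds back to `1e16`; the pair keeps it in the low-order component until the large value cancels.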

Member

Alvaro-Kothe commented Oct 28, 2025

Is this issue with data precision limitations?

Yes, most of the problems come from floating-point arithmetic. Using a more precise data type or more stable algorithms can mitigate some of them.

I did open an enhancement request for implementing double-double arithmetic

Seems good. But for now, it isn't clear to me how it should be implemented and integrated into the existing functionality.

Contributor Author

eicchen commented Oct 28, 2025

I have two ideas:

1. Overload the function at runtime depending on whether the inputs have 14 significant digits
2. Create a separate double-double Cython implementation so we can use it where needed

Assuming that's what the question was about

eicchen mentioned this pull request

ENH: Double-double arithmetic for precision operations with float64s #62870

Open

3 tasks
