Skip to content

Conversation

@sroet
Copy link
Member

@sroet sroet commented Apr 6, 2021

This week my CI failed with:

=================================== FAILURES ===================================
______________________ TestGaussianBiasSelector.test_pick ______________________

self = <openpathsampling.tests.test_shooting.TestGaussianBiasSelector object at 0x7fac57d4fc10>

    def test_pick(self):
        picks = [self.sel.pick(self.mytraj) for _ in range(100)]
        pick_counter = collections.Counter(picks)
        assert set(pick_counter.keys()) == set(range(len(self.mytraj)))
        # final test: 99.5 should happen more than 32.4
>       assert pick_counter[2] > pick_counter[0]
E       assert 13 > 18

openpathsampling/tests/test_shooting.py:70: AssertionError

This PR attaches a random number generator to the selectors that can be seeded during tests to prevent these flaky tests.

The current implementation requires numpy>=1.17 (that is when numpy swapped from RandomState to Generator) @dwhswenson please let me know if that is too strict of a requirement and if I should write some code to support the Legacy RandomState

@dwhswenson
Copy link
Member

The current implementation requires numpy>=1.17 (that is when numpy swapped from RandomState to Generator) @dwhswenson please let me know if that is too strict of a requirement and if I should write some code to support the Legacy RandomState

This is fine in principle: my view is that we should follow NEP 29 with the exception of our until-2.0 support for Python 2.7 (and that the NEP 29 rule for NumPy should also apply to other dependencies.) NumPy 1.16 support could therefore be dropped as of 13 Jan 2021; see drop table in that link. However, it looks like NumPy 1.17 was not released for Python 2.7, so we need an approach compatible with Python 2.7.

That said, fixing the test is probably better than pinning the seed. That test takes a trajectory with 5 possible shooting points, and considers two of them: frame A has about 1/3 the selection probability of frame B. The test assumes that after 100 picks, frame B will have been picked more frequently than frame A. However, apparently the odds of that going the other way is enough that we've noticed multiple failures. Maybe the better approach is to increase the number of picks, or enhance the gap in probability. This test is supposed to be an integration test; if the random number generator changed in some way that it was no longer using the probabilities we provide, we would want the test to be flaky -- that would be a sign of a problem.

BTW, I'm +1 on moving random number generators to instance variables, although I think I'd tend to prefer defaulting to a global RNG (to simplify seeding for a whole simulation in the future), rather than a separate RNG instance for each OPS object instance. Maybe create the RNG instance as paths.default_rng, and then it can be set as the default in the instances.

@sroet sroet mentioned this pull request Apr 6, 2021
@sroet
Copy link
Member Author

sroet commented Apr 6, 2021

That said, fixing the test is probably better than pinning the seed. That test takes a trajectory with 5 possible shooting points, and considers two of them: frame A has about 1/3 the selection probability of frame B. The test assumes that after 100 picks, frame B will have been picked more frequently than frame A. However, apparently the odds of that going the other way is enough that we've noticed multiple failures. Maybe the better approach is to increase the number of picks, or enhance the gap in probability. This test is supposed to be an integration test; if the random number generator changed in some way that it was no longer using the probabilities we provide, we would want the test to be flaky -- that would be a sign of a problem.

That seems fair, I increased the number of picks to 1000 in #999 .

BTW, I'm +1 on moving random number generators to instance variables, although I think I'd tend to prefer defaulting to a global RNG (to simplify seeding for a whole simulation in the future), rather than a separate RNG instance for each OPS object instance. Maybe create the RNG instance as paths.default_rng, and then it can be set as the default in the instances.

I will rewrite this PR with a single rng instance and add some compatability code for python 2.7 as well

@dwhswenson dwhswenson changed the title allow for rng seeding in selectors Add global random number generator; support in shooting pt selectors Apr 7, 2021
@dwhswenson
Copy link
Member

I updated the title to better reflect what I understand this will contain; better for release notes. Please correct if the current title doesn't represent the intent of the PR.

@sroet sroet changed the title Add global random number generator; support in shooting pt selectors Add global random number generator; support in shooting pt selectors and pathmovers Apr 7, 2021
def __init__(self):
StorableNamedObject.__init__(self)

self._rng = default_rng()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be underscored for the safe/load cycle to work

@@ -0,0 +1,24 @@
import numpy as np
import random
Copy link
Member Author

@sroet sroet Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the random import here to (hopefully) make the import less confusing for python2.7

@codecov
Copy link

codecov bot commented Apr 7, 2021

Codecov Report

Merging #998 (b16925e) into master (5f86032) will increase coverage by 0.58%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #998      +/-   ##
==========================================
+ Coverage   80.62%   81.20%   +0.58%     
==========================================
  Files         138      139       +1     
  Lines       14695    15130     +435     
==========================================
+ Hits        11848    12287     +439     
+ Misses       2847     2843       -4     
Impacted Files Coverage Δ
openpathsampling/__init__.py 100.00% <100.00%> (ø)
openpathsampling/pathmover.py 84.89% <100.00%> (+1.03%) ⬆️
openpathsampling/pathmovers/spring_shooting.py 100.00% <100.00%> (ø)
openpathsampling/random.py 100.00% <100.00%> (ø)
openpathsampling/sample.py 75.35% <100.00%> (+0.64%) ⬆️
openpathsampling/shooting.py 100.00% <100.00%> (ø)
openpathsampling/deprecations.py 100.00% <0.00%> (ø)
openpathsampling/analysis/tis/core.py 100.00% <0.00%> (ø)
openpathsampling/analysis/tis/flux.py 100.00% <0.00%> (ø)
openpathsampling/snapshot_modifier.py 100.00% <0.00%> (ø)
... and 74 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f86032...b16925e. Read the comment docs.

@@ -0,0 +1,25 @@
from __future__ import absolute_import
Copy link
Member Author

@sroet sroet Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed as python 2.7 does not understand that I don't want to import this file whenever I say import random in this file...

without this line:

Python 2.7.15 | packaged by conda-forge | (default, Mar  5 2020, 14:56:06) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from openpathsampling.random import random
>>> random.random
<module 'openpathsampling.random' from 'openpathsampling/random.py'>
>>> random.random.random
<module 'openpathsampling.random' from 'openpathsampling/random.py'>
>>> random.random.random.random
<module 'openpathsampling.random' from 'openpathsampling/random.py'>

with this line:

Python 2.7.15 | packaged by conda-forge | (default, Mar  5 2020, 14:56:06) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from openpathsampling.random import random
>>> random.random
<built-in method random of Random object at 0x555a53ee5080>

@sroet
Copy link
Member Author

sroet commented Apr 7, 2021

@dwhswenson Finally figured out what was wrong with python 2.7 (I don't even understand how that import could ever be resolved in the way it did...).

While hunting this down I also added the rng to PathMovers.
This should now be ready for a review!

Copy link
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes very minor. One point I'm a little confused about in the implications of making default_rng a function, instead of just importing an instance. That's worth some additional discussion.

@@ -1,4 +1,4 @@
import random
from openpathsampling.random import random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this is the necessary way of getting to stdlib random? (Long term -- being before 2.0 -- everything in OPS should be using paths.default_rng)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, just to make name-space clashes less likely

rng = np.random.default_rng()


def default_rng():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be a function. Is there a specific reason it must be? Here are my reasons:

  1. A function implies to users that this is creating something.
  2. The behavior is specifically different from what np.random.default_rng() does.

If you do your TestRandom.test_identities test with np.random.default_rng(), it will fail the is test.

Instead, either name rng as default_rng or DEFAULT_RNG (to really emphasize that it is constant). You can still import it into the main openpathsampling namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm. I think this has to be a function so that the underlying rng object can change, but newly created instances get the new rng. Nonetheless, please rename rng to something like DEFAULT_RNG.

If I understand correctly, in that case, setting the seed before a simulation starts would be something like paths.random.DEFAULT_RNG = np.random.default_rng(seed) before creating any objects. Is that right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonetheless, please rename rng to something like DEFAULT_RNG

Will do

If I understand correctly, in that case, setting the seed before a simulation starts would be something like paths.random.DEFAULT_RNG = np.random.default_rng(seed) before creating any objects. Is that right?

Yes, exactly (except that it has to go through paths.random.RandomState(seed) for np<1.17) .

sroet and others added 2 commits April 8, 2021 10:38
Co-authored-by: David W.H. Swenson <dwhs@hyperblazer.net>
@sroet sroet requested a review from dwhswenson April 8, 2021 08:43
@dwhswenson
Copy link
Member

LGTM!

@dwhswenson dwhswenson merged commit 01041df into openpathsampling:master Apr 8, 2021
@sroet sroet deleted the add_generator_to_selection branch April 8, 2021 14:22
@dwhswenson dwhswenson mentioned this pull request Apr 20, 2021
@dwhswenson dwhswenson mentioned this pull request Jul 5, 2021
@sroet sroet mentioned this pull request Aug 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants