Prevent excessive trajectory slicing #978

sroet · 2021-02-19T21:38:03Z

Some more UUID optimization coming from #892.

So while running the test uuid system from #892, I noticed a lot of Trajectory objects getting generated. Looking at the code, every time you slice a Trajectory, you generate a new Trajectory.

To test how many and were I added the following code to trajectory.py (and where)

+++ b/openpathsampling/engines/trajectory.py
@@ -23,6 +23,7 @@ class Trajectory(list, StorableObject):
     """
 
     engine = None
+    kill_count = 1000000
 
     def __init__(self, trajectory=None):
         """
@@ -45,6 +46,10 @@ class Trajectory(list, StorableObject):
             else:
                 self.extend(trajectory)
 
+        Trajectory.kill_count -= 1
+        if Trajectory.kill_count < 0:
+            raise RuntimeError("KILLED")
+

and ran the following script:

sset = testscheme.initial_conditions_from_trajectories([eng_trj, eng_trj2,
                                                        eng_trj3],
                                                        strategies=[
                                                            'split',
                                                            'extend-complex',
                                                            'extend-minimal'])
# Error out early
sset.sanity_check()
testscheme.assert_initial_conditions(sset)

#Equilibrate
equilibration = paths.PathSampling(
    storage=None,
    sample_set=sset,
    move_scheme=testscheme
)
initial = 100000
paths.Trajectory.kill_count=initial
equilibration.run(10)
print("trajs_generated = "+str(initial-paths.Trajectory.kill_count))

(important part here is that I only look at the number of trajectories created after initialization)

Here is a table of number of trajectories created from running this 6 times on master and 6 times on this branch

	main	this PR
	77857	20728
	64553	22852
	73042	14508
	76004	18205
	75349	27016
	82440	33964
mean	74874	22879

This PR saves 70% of all Trajectory objects generated (during sampling). 23_000 still an insane amount in my opinion for just 10 mc steps, but this is at least a decent step into (hopefully) not running into #892 as often.

In the long term; sliced trajectories should probably return read_only views (without an uuid) and not new Trajectories

codecov · 2021-02-19T21:41:44Z

Codecov Report

Merging #978 (7bf903c) into master (3f2c0c4) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #978      +/-   ##
==========================================
+ Coverage   80.25%   80.27%   +0.01%     
==========================================
  Files         136      136              
  Lines       14452    14466      +14     
==========================================
+ Hits        11599    11613      +14     
  Misses       2853     2853

Impacted Files	Coverage Δ
openpathsampling/ensemble.py	`84.74% <100.00%> (+0.19%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f2c0c4...7bf903c. Read the comment docs.

openpathsampling/ensemble.py

sroet · 2021-02-19T22:04:20Z

@dwhswenson this is ready for a review. As you probably know this code better than I do, are there any other places where a similar strategy might be beneficial? (looking at it quickly, I can probably mimic this pretty easy for _find_subtraj_first, but that function has not popped up yet when I try to kill on Trajectory generation)

dwhswenson

Please do also modify _find_subtraj_first. That should help get rid of some more UUID creation.

What you have here works, but I think there's a much easier way (as opposed to the exttraj iterator, etc.) Wouldn't this do the job:

# subtraj = traj[slice(start, stop)]
# replace with
subtraj = list(traj)[slice(start, stop)]

I think all we need is an iterable of snapshots, so this would create that with very little change to the code.

openpathsampling/ensemble.py

sroet · 2021-02-21T17:50:05Z

What you have here works, but I think there's a much easier way (as opposed to the exttraj iterator, etc.) Wouldn't this do the job:

# subtraj = traj[slice(start, stop)]
# replace with
subtraj = list(traj)[slice(start, stop)]

I think all we need is an iterable of snapshots, so this would create that with very little change to the code.

I did do that in 1c15643 , this might possibly break custom ensembles, but the workaround is generally easy and the limited change/robustness in this code is worth it in my opinion.

As ensembles are not fully covered under coverage, please do have a really good look at this PR. Here is a new set of numbers from this PR:

	master	`1c15643`	`b89bce9`	`1cff417`
	71186	5850	62	45
	75610	3257	114	60
	76504	7526	86	88
	67516	6842	99	113
	71031	4789	47	72
	67648	9378	33	99
mean	71186	6274	74	80

Currently this PR prevents 99.9% of all created trajectories in 10 mc steps And these number seem reasonable for 44 ensembles and 10 mc steps

sroet · 2021-02-21T17:50:24Z

@dwhswenson please have another look at this

openpathsampling/ensemble.py

dwhswenson

A general change that you will want to do everywhere, instead of list(trajectory). Discussed in more detail in a comment, but now that I think about it, it applies to the whole changeset:

You should always try to use one of the proxy methods from Trajectory.

get_as_proxy(idx): get a single snapshot, allowing proxy snapshots
as_proxies(): return a list of all snapshots, which may include proxy snapshots
iter_proxies(): iterator over snaphots, allowing proxies

When you call list(trajectory), it will call Trajectory.__iter__(), which will fully reload the snapshot into memory. This will be slow and isn't needed for a lot of analysis. (I had forgotten about this when I suggested converting to list before; only remembered when I saw changes dropping the as_proxies.)

this might possibly break custom ensembles

I'm not sure where this risk would be. I suppose there might be something weird if you subclass SequentialEnsemble in some strange way. In general, we discourage subclassing SequentialEnsemble (unless you're changing the frame assignment algorithm). Instead, custom ensembles should be subclasses of WrappedEnsemble, wrapping the appropriate SequentialEnsemble. This is what we do for TISEnsemble, MinusInterfaceEnsemble, and VisitAllStatesEnsemble, as a few examples.

The only thing I was worried about with this change was handling proxy snapshots, which you are dealing with.

As ensembles are not fully covered under coverage, please do have a really good look at this PR.

Although ensemble.py doesn't have full coverage, the sections modified here (EnsembleCache and SequentialEnsemble) are pretty close, mainly missing coverage of logging stuff in debug guards and of a few error cases. Additionally, they're tested very thoroughly, involving what is probably thousands of individual asserts. SequentialEnsemble was some very complex code to write, and I didn't trust it until I had written every test I could think of at the time.

openpathsampling/ensemble.py

sroet · 2021-02-23T11:26:02Z

@dwhswenson implemented the proxy methods as you advised, please have another look.

I updated the table in #978 (comment) with the new data. (still sub 10 Trajectories per md step)

dwhswenson

Looks pretty good. A small optional change in a comment, as well as the following thought:

It might be worth just having a function called _get_list_traj which does this getattr/list pair of operations. I suspect CodeClimate will complain about repeated code here, and since the reason for the need is obscure enough, it might be worth it to have a single location with a clearer docstring explaining what's going on. That's optional though.

In case you want to make those changes, I will leave this open for at least 24 hours, merging no earlier than Wed 24 Feb 16:00 GMT (17:00 local). If no comment/update by then, I will merge sometime after that.

openpathsampling/ensemble.py

sroet · 2021-02-23T15:57:17Z

It might be worth just having a function called _get_list_traj which does this getattr/list pair of operations. I suspect CodeClimate will complain about repeated code here, and since the reason for the need is obscure enough, it might be worth it to have a single location with a clearer docstring explaining what's going on. That's optional though.

Yeah, looking back at it that is the cleaner and more maintainable solution, will refactor that now.

sroet · 2021-02-23T16:29:10Z

@dwhswenson refactored the code, it still results in numbers in the same order as 1cff417 . Please have a look if you think the docstring is descriptive enough.

By the way, thank you for the (multiple) in-depth reviews.

dwhswenson · 2021-02-24T14:23:08Z

Please have a look if you think the docstring is descriptive enough.

LGTM. If anything, it's probably more detailed than needed!

By the way, thank you for the (multiple) in-depth reviews.

No problem, and thank you for making the improvements!

Merging this one now.

dwhswenson · 2021-02-24T14:23:15Z

Please have a look if you think the docstring is descriptive enough.

LGTM. If anything, it's probably more detailed than needed!

By the way, thank you for the (multiple) in-depth reviews.

No problem, and thank you for making the improvements!

Merging this one now.

sroet added 2 commits February 19, 2021 19:46

pre-slice and use an iterator in SequentialEnsemble._find_subtraj_final

4a82335

only make an empty trajectory once

ce2568e

sroet commented Feb 19, 2021

View reviewed changes

openpathsampling/ensemble.py Show resolved Hide resolved

dwhswenson requested changes Feb 21, 2021

View reviewed changes

openpathsampling/ensemble.py Outdated Show resolved Hide resolved

openpathsampling/ensemble.py Show resolved Hide resolved

sroet added 2 commits February 21, 2021 18:02

cast trajectory to list instead

1c15643

also prevent slicing in generic_append/prepend

b89bce9

sroet requested a review from dwhswenson February 21, 2021 17:50

sroet commented Feb 21, 2021

View reviewed changes

openpathsampling/ensemble.py Show resolved Hide resolved

dwhswenson approved these changes Feb 22, 2021

View reviewed changes

dwhswenson requested changes Feb 22, 2021

View reviewed changes

openpathsampling/ensemble.py Show resolved Hide resolved

openpathsampling/ensemble.py Outdated Show resolved Hide resolved

openpathsampling/ensemble.py Outdated Show resolved Hide resolved

openpathsampling/ensemble.py Outdated Show resolved Hide resolved

dwhswenson mentioned this pull request Feb 22, 2021

Strange random errors #892

Closed

prevent proxy_loading for trajectories

1cff417

sroet requested a review from dwhswenson February 23, 2021 11:26

dwhswenson approved these changes Feb 23, 2021

View reviewed changes

openpathsampling/ensemble.py Outdated Show resolved Hide resolved

refactor based on comment from @dwhswenson

7bf903c

dwhswenson merged commit 7d562d5 into openpathsampling:master Feb 24, 2021

sroet deleted the prevent_excessive_trajectory_slicing branch February 24, 2021 16:11

dwhswenson added the misc PR label Jul 5, 2021

dwhswenson mentioned this pull request Jul 5, 2021

OpenPathSampling 1.5 #1031

Merged

Prevent excessive trajectory slicing #978

Prevent excessive trajectory slicing #978

Uh oh!

Conversation

sroet commented Feb 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

sroet commented Feb 19, 2021

Uh oh!

dwhswenson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sroet commented Feb 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sroet commented Feb 21, 2021

Uh oh!

Uh oh!

dwhswenson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sroet commented Feb 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dwhswenson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sroet commented Feb 23, 2021

Uh oh!

sroet commented Feb 23, 2021

Uh oh!

dwhswenson commented Feb 24, 2021

Uh oh!

dwhswenson commented Feb 24, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sroet commented Feb 19, 2021 •

edited

Loading

codecov bot commented Feb 19, 2021 •

edited

Loading

sroet commented Feb 21, 2021 •

edited

Loading

sroet commented Feb 23, 2021 •

edited

Loading