Skip to content

Conversation

@dwhswenson
Copy link
Member

@dwhswenson dwhswenson commented Aug 27, 2021

Background

"Snapshot features" are a really powerful tool in OPS that make it easy to create a snapshot class based on the fields that should be associated with that snapshot. Those fields vary from engine to engine.

Unfortunately, the function that handles this, the attach_features decorator, is written in a way that makes it extremely hard to maintain or test. It is a single function taking over 500 lines in the file, which has a CodeClimate cognitive complexity of 203. (For context, CodeClimate recommends keeping that number under 5, and I find that anything over 10 or 15 can be a headache to deal with.)

Additionally, the documentation for snapshot features is not very good. The best docs are in the docstring for the _decorator method nested inside attach_features, which can't be viewed unless you actually read the file openpathsampling/engines/features/base.py. This is not easy to find, and even that docstring doesn't quite fully correspond to what the code does.

For these reasons, I want to reimplement the attach_features decorator, and simplify some the process of creating snapshot features.

A big part of the functionality of attach_features is that it actually writes the code for several methods on the snapshot, including: copy, copy_to, create_reversed, create_empty, __init__, and init_empty. The reason to write code here, instead of just using the information in the __features__ object, is to get the correct signature and docstring for introspection. For example, we can do:

In [1]: import openpathsampling as paths

In [2]: paths.engines.toy.Snapshot?
Init signature: paths.engines.toy.Snapshot(velocities=None, coordinates=None, engine=None)
Docstring:
Simulation snapshot. Only references to coordinates and velocities

Attributes
----------
velocities : numpy.ndarray, shape=(atoms, 3), dtype=numpy.float32
    atomic velocities

coordinates : numpy.ndarray, shape=(atoms, 3), dtype=numpy.float32
    atomic coordinates

engine : :class:`openpathsampling.DynamicsEngine`
    reference to the engine used to generate the snapshot

where the signature and docstrings are determined by the features that are attached to the snapshot.

Changes in this PR

The goal here is to simplify both the process of creating new snapshot features and the code for attaching snapshot features, as well as to clean up some of the code that was autogenerated.

New FeatureCollection class

The current implementation uses a FeatureTuple namedtuple which is populated by the attach_features decorator. This has been replaced with a FeatureCollection class. In general the approach is to create a FeatureCollection for each feature module using its from_module method, and then to combine all the FeatureCollections into one FeatureCollection by summing them.

Simpler attach_features decorator

This replaces the old attach_features decorator with a new one, which has considerably simpler code structure. First, all the features are gathered into a single FeatureCollection, then the input class is decorated by adding the necessary functions, etc.

Separate code writing

Rather than including all the code for code writing/compiling in the attach_features method, this is put in a separate module, which should improve testability.

API break: Removing copy, copy_to

Snapshots are immutable data-containing objects. I do not think there is a need for the copy or copy_to methods. There is, however, a copy_with_replacement method -- if you give it no arguments, it will create an exact copy of your snapshot with a different UUID. It also provides the ability to create modified version of snapshots.

API break: Removing init_empty, fixing create_empty

I don't see a purpose for init_empty that isn't already handled by create_empty (again, with the understanding that these are immutable objects). create_empty might be used as a template or a placeholder, so I'm keeping it around, but the code it currently generates is wrong: it should be a classmethod, but it isn't marked as one, and it uses self in the signature but cls in the code.

Moving more functionality into superclass methods (new Snapshot base class)

Rather than writing out self-contained methods for each class, the new code keeps the core logic in a new Snapshot superclass, and simply passes the parameters on when needed. The key observation here is that we need dynamically-generated code for the purpose of signatures and docstrings, but the logic can be in general functions. This should be better for testing and for debugging.

New Parameter class to define field for a snapshot

In the old approach, you needed to go through the following steps to define a snapshot field (e.g., coordinates or velocities)

  1. Add the name to the variables list in the feature module
  2. Decide how to change that feature on time reversal: if it was a bool that flipped, add it to the list named flip in the module. If it was a number that changed sign, add it to the list named minus in the feature.
  3. If it is a numpy array, add it to the list named numpy in the feature.
  4. If it will be loaded by a lazy proxy, add it to the list named lazy in the feature.
  5. Add a docstring to the module file, covering all the variables included.

In the new approach, we have a Parameter class that will contain the relevant information. With the old way, you may have needed to consider several top-level names for a given functionality (does this parameter go in the flip list for time reversal? in the minus list? neither? certainly not both, that would be nonsense -- but you could write code that tried that!) The new approach makes a class that describes each parameter, and the specific types of functionality (what to do on time reversal; how to copy; whether to lazy load) are contained in that object.

I think this is an easier way to think about these things, and it also has the advantage that instead of predefining certain behaviors (e.g., for copying), developers are welcome to create their own.

Time-reversal is treated a little differently than copying. Copying is an issue in computer programming, and is broadly relevant. Time-reversal is specific to MD. It's entirely possible that other such specific problems will arise in some subfield, so I tried to design a generic solution for them. Therefore, 'time_reverse' is an entry in a dict called operations. The operations can be extended with other domain-specific needs as they are found.

Decorator to declare instance methods to attach

If you want to attach functions that would become instance methods of your snapshot class, the old way was to define the function in the feature module and include its name in the list named functions. The new way is to decorate the function with the @function_feature decorator.

(Temporary?) Code to generate new Parameters from old style feature modules

I think the new Parameter approach is more intuitive, but for now I've implemented methods in FeatureCollection that actually convert the old version to the new version. TODO: details to come.

Unchanged: @property

Just as before, and @property in a feature module becomes a property of the snapshot class.

@dwhswenson dwhswenson added the 2.0 issues and PRs for the 2.0 release label Aug 27, 2021
@dwhswenson
Copy link
Member Author

Drafting this PR pretty early (and while it's still in the experimental package) so people interested in developing engines (@bdice, @davidkastner) can see some of what's coming soon. If I keep this in experimental, I might retarget the PR for the 1.x cycle. That would be useful if people wanted to immediately use some of this functionality.

@dwhswenson
Copy link
Member Author

@sroet (and others subscribed to this thread): I'd appreciate some feedback on whether to keep support for the old-style snapshot feature modules or not.

As currently implemented, this code works with the old snapshot feature modules (although there are some API-breaking differences in the resulting snapshots). However, if we will be removing support for the old style of features, I won't bother writing tests for that part.

You can get a sense for the differences in these from the rough example modules I've drafted to be part of the tests. It's pretty straightforward to convert old-style to new-style (that's essentially how support for old-style works internally), and I expect to do that in our built-in features. The only question is whether it's worth leaving support for the old style in 2.0.

@bdice
Copy link
Contributor

bdice commented Aug 28, 2021

I took a brief look at the code. I am not very familiar with the old style, so it is a little hard to provide meaningful feedback. In my understanding, most of the code for adding features to snapshots is internal to the OPS package and its implementations of engines/snapshots. In my understanding, most users don't interact with this API directly. If that's correct, I would definitely advocate for removing legacy/old methods to simplify the code base for new developers. I would prefer for there to be a single supported way to add features, because otherwise it requires a lot of effort to understand what is "old" (and should be avoided) and what is "new."

There should be one-- and preferably only one --obvious way to do it.
-- The Zen of Python

@sroet
Copy link
Member

sroet commented Aug 29, 2021

@sroet (and others subscribed to this thread): I'd appreciate some feedback on whether to keep support for the old-style snapshot feature modules or not.

First, don't put a lot of weight on my opinion, as I have not interacted with this part of the code (I have never written an engine)

I am in favor of dropping as much legacy code as we can get away with for 2.0 as long as there is no user need, mainly to easy maintenance burden. It might be useful to have an unofficial conversion_script just in case users run into issues using OPS 2.0 they can easily convert, which would be dropped for any release after 2.0.

One important party that you might want to ask about this any commercial/industry users of OPS (I don't know of any, but I remember there was some industry collaboration/adoption supposed to happen from the original OPS funding?)

I also fully agree with @bdice's comments about just having one way of doing things making it easier for new developers to figure out the "right" way of doing things

@dwhswenson
Copy link
Member Author

@sroet @bdice Thanks to both of you for the feedback!

Some further context should be given here: Yes, this is code that is mainly used by developers of engines. I am aware of one engine that has been developed for OPS which has not been officially announced (still hoping to convince the devs to contribute that back before 2.0.) That was in an academic context. There may be other engines out there that I don't know about.

As far as one obvious way, the old way would definitely be deprecated, which should encourage the current "obvious" way -- the only question is if it is possible to use the old way.

I don't think there would be a conversion script as such, but I do think there could easily be a guide to developers. With the caveat that old and new snapshots aren't identical due to API breaks, most of the update guide boils down to:

  • For any item in variables, create a Parameter for it
  • If your feature was in numpy, use copy=numpy.copy in Parameter
  • If your feature was in minus, use operations={'time_reverse': operator.neg} in Parameter
  • If your feature was in flip, use operations={'time_reverse': operator.invert} in Parameter
  • If your feature was in lazy, use lazy=True in Parameter
  • For any function in functions, decorate it with @feature_function
  • Docstrings go in each individual Parameter as an instance variable, instead of using the module docstring for all parameters

(This covers the changes here; doesn't include differences between SimStore and netcdfplus.)

Assuming you agree that this is straightforward enough (caveat: I may find other issues as I do the migration) then I'll just use the code that bridges between the two types of snapshots until I fully update all snapshot features that are in core. If we ever need code to support the old versions (which would mainly be to provide workarounds for users who can't get the changes made for some outside engine), we can always dig it out of this PR.

@sroet
Copy link
Member

sroet commented Aug 31, 2021

Assuming you agree that this is straightforward enough (caveat: I may find other issues as I do the migration) then I'll just use the code that bridges between the two types of snapshots until I fully update all snapshot features that are in core.

Seems straightforward enough to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2.0 issues and PRs for the 2.0 release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants