schrodinger.seam.examples.active_learning_weigher module

A workflow for generating a “model” that predicts the molecular weight of a molecule from the number and types of atoms it contains.

This workflow implements an “active learning” framework for selecting which molecules to score every round based on the predictions of the model.

COMPLEXITY: high
CONCEPTS: side inputs

Basic usage:

$SCHRODINGER/run seam_example.py active_learning_weigher

To visualize the workflow after running it, run:

$SCHRODINGER/run seamcli.py watcher seam/

class schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel(m: float, b: float)

Bases: object

Linear regression model (y = mx + b) where x is the number of heavy atoms.

m: float
b: float
classmethod initialize()
predict(mol: rdkit.Chem.rdchem.Mol) → float
train(scored_mols: Iterable[Tuple[rdkit.Chem.rdchem.Mol, float]]) → schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel

Train the model by updating its parameters (m and b) based on the actual molecular weights of the input molecules.

__init__(m: float, b: float) → None
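
The model described above (y = mx + b, with x the number of heavy atoms) can be illustrated with a minimal sketch. It is not the module's implementation: the class name LinearModelSketch is hypothetical, it takes a heavy-atom count directly instead of an RDKit Mol, and it assumes an ordinary least-squares fit, which may differ from how the real train() updates m and b.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class LinearModelSketch:
    """Hypothetical stand-in: y = m * x + b, where x is a heavy-atom count."""

    m: float
    b: float

    def predict(self, num_heavy_atoms: int) -> float:
        """Predict a molecular weight from a heavy-atom count."""
        return self.m * num_heavy_atoms + self.b

    def train(self, scored: Iterable[Tuple[int, float]]) -> "LinearModelSketch":
        """Return a new model fit by ordinary least squares.

        Assumption: the real train() may update m and b differently.
        """
        points = list(scored)
        n = len(points)
        if n < 2:
            return self  # not enough data to fit a line
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in points)
        if var_x == 0:
            return self  # all x values identical, so the slope is undefined
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        m = cov_xy / var_x
        return LinearModelSketch(m=m, b=mean_y - m * mean_x)


# Heavy-atom counts paired with molecular weights (g/mol) for
# methane, ethane, and propane.
data = [(1, 16.043), (2, 30.070), (3, 44.097)]
model = LinearModelSketch(m=0.0, b=0.0).train(data)
```

Returning a new instance from train() mirrors the documented signature, in which train() returns a LinearRegressionModel.
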
class schrodinger.seam.examples.active_learning_weigher.Iteration(model, scoring_transform: type)

Bases: apache_beam.transforms.ptransform.PTransform

__init__(model, scoring_transform: type)
expand(input_pcollections)
class schrodinger.seam.examples.active_learning_weigher.PickTopScoring(model: schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel)

Bases: apache_beam.transforms.ptransform.PTransform

__init__(model: schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel)
expand(unscored_mols)
class schrodinger.seam.examples.active_learning_weigher.PickRandomly(n: int)

Bases: apache_beam.transforms.ptransform.PTransform

__init__(n: int)
expand(unscored_mols)
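
The two selection transforms above are not described further on this page. As a rough illustration of what their names suggest, the following plain-Python sketch contrasts the two strategies outside of Apache Beam. The function names, the n argument to pick_top_scoring, the seed argument, and the toy predictor are all assumptions made for this example.

```python
import heapq
import random
from typing import Callable, List, Optional, Sequence


def pick_top_scoring(
    items: Sequence[int], predict: Callable[[int], float], n: int
) -> List[int]:
    """Return the n items with the highest predicted score, best first."""
    return heapq.nlargest(n, items, key=predict)


def pick_randomly(
    items: Sequence[int], n: int, seed: Optional[int] = None
) -> List[int]:
    """Return up to n distinct items chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(items), min(n, len(items)))


def predicted_weight(num_heavy_atoms: int) -> float:
    """Hypothetical linear predictor standing in for a trained model."""
    return 14.0 * num_heavy_atoms + 2.0


heavy_atom_counts = [3, 12, 7, 25, 1]
top = pick_top_scoring(heavy_atom_counts, predicted_weight, n=2)
rand = pick_randomly(heavy_atom_counts, n=2, seed=0)
```
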
class schrodinger.seam.examples.active_learning_weigher.ActiveLearning(input_path: pathlib.Path, num_cycles: int, scoring_transform: type)

Bases: apache_beam.transforms.ptransform.PTransform

A pipeline that uses active learning to select the top-scoring molecules from a set of unscored molecules. The selected molecules are scored and then used to train a model that predicts the scores of molecules that have not yet been scored. Scoring is parametrized by a scoring_transform that takes a molecule and returns a ScoredMol.

The top-scoring molecules are ultimately written to a file.

__init__(input_path: pathlib.Path, num_cycles: int, scoring_transform: type)
expand(pbegin)
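
The select, score, and retrain cycle described above can be sketched in plain Python, without Apache Beam or side inputs. This is an illustration under stated assumptions, not the pipeline's code: molecules are represented by distinct heavy-atom counts, the names fit_line, active_learning, alkane_weight, and batch_size are hypothetical, and picking the first batch at random is an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple


def fit_line(
    points: List[Tuple[int, float]], fallback: Tuple[float, float]
) -> Tuple[float, float]:
    """Least-squares fit of y = m * x + b; returns fallback if undetermined."""
    n = len(points)
    if n < 2:
        return fallback
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return fallback
    m = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return m, mean_y - m * mean_x


def active_learning(
    pool: List[int],
    score: Callable[[int], float],
    num_cycles: int,
    batch_size: int,
    seed: int = 0,
) -> Tuple[Tuple[float, float], Dict[int, float]]:
    """Run num_cycles rounds of select, score, and retrain.

    pool holds distinct heavy-atom counts standing in for molecules.
    """
    rng = random.Random(seed)
    unscored = list(pool)
    scored: Dict[int, float] = {}
    m, b = 0.0, 0.0
    for cycle in range(num_cycles):
        if not unscored:
            break
        if cycle == 0:
            # No trained model yet, so start from a random batch (assumption).
            batch = rng.sample(unscored, min(batch_size, len(unscored)))
        else:
            # Pick the molecules the current model predicts to score highest.
            batch = sorted(unscored, key=lambda x: m * x + b, reverse=True)
            batch = batch[:batch_size]
        for x in batch:
            scored[x] = score(x)  # the expensive "real" scoring step
            unscored.remove(x)
        # Retrain on everything scored so far.
        m, b = fit_line(list(scored.items()), fallback=(m, b))
    return (m, b), scored


def alkane_weight(num_carbons: int) -> float:
    """Toy scorer: molecular weight (g/mol) of a linear alkane C_n H_(2n+2)."""
    return 14.027 * num_carbons + 2.016


(m, b), scored = active_learning(
    pool=list(range(1, 21)), score=alkane_weight, num_cycles=3, batch_size=3
)
```

Because the toy scorer is exactly linear, the model is fit exactly after the first round, and every later round selects the largest remaining molecules.
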
class schrodinger.seam.examples.active_learning_weigher.CalculateMolWt(label: Optional[str] = None)

Bases: apache_beam.transforms.ptransform.PTransform

expand(pcoll)
schrodinger.seam.examples.active_learning_weigher.main(args=None)