schrodinger.seam.examples.active_learning_weigher module
A workflow for generating a “model” that predicts the molecular weight of a molecule based on the number of atoms and types of atoms in the molecule.
This workflow implements an “active learning” framework for selecting which molecules to score every round based on the predictions of the model.
COMPLEXITY: high
CONCEPTS: side inputs
Basic usage:
$SCHRODINGER/run seam_example.py active_learning_weigher
To visualize the workflow after running it, run:
$SCHRODINGER/run seamcli.py watcher seam/
- class schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel(m: float, b: float)
Bases: object
Linear regression model (y = mx + b) where x is the number of heavy atoms.
- m: float
- b: float
- classmethod initialize()
- predict(mol: rdkit.Chem.rdchem.Mol) → float
- train(scored_mols: Iterable[Tuple[rdkit.Chem.rdchem.Mol, float]]) → schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel
Train the model by updating its parameters (m and b) based on the actual molecular weights of the input molecules.
- __init__(m: float, b: float) → None
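The linear model described above (y = mx + b, with x the number of heavy atoms) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the module's implementation: the class name `SketchLinearModel`, the zero-valued starting parameters, the least-squares fit, and the use of integer heavy-atom counts in place of RDKit molecules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SketchLinearModel:
    """Hypothetical stand-in for a y = m*x + b model (not the module's class)."""
    m: float
    b: float

    @classmethod
    def initialize(cls) -> "SketchLinearModel":
        # Assumption: an untrained model starts with zero slope and intercept.
        return cls(m=0.0, b=0.0)

    def predict(self, num_heavy_atoms: int) -> float:
        # x is passed directly as a count here; the real method takes a Mol.
        return self.m * num_heavy_atoms + self.b

    def train(self, scored: Iterable[Tuple[int, float]]) -> "SketchLinearModel":
        # Assumption: ordinary least squares on (heavy-atom count, weight) pairs.
        points = list(scored)
        n = len(points)
        if n == 0:
            return self
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in points)
        if var_x == 0:
            # All x values identical: keep the old slope, refit the intercept.
            return SketchLinearModel(m=self.m, b=mean_y - self.m * mean_x)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        m = cov_xy / var_x
        return SketchLinearModel(m=m, b=mean_y - m * mean_x)
```

In this sketch `train` returns a new model instead of mutating the existing one, which mirrors the documented return type of `train` but is otherwise a design assumption.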
- class schrodinger.seam.examples.active_learning_weigher.Iteration(model, scoring_transform: type)
Bases: apache_beam.transforms.ptransform.PTransform
- __init__(model, scoring_transform: type)
- expand(input_pcollections)
- class schrodinger.seam.examples.active_learning_weigher.PickTopScoring(model: schrodinger.seam.examples.active_learning_weigher.LinearRegressionModel)
Bases: apache_beam.transforms.ptransform.PTransform
- expand(unscored_mols)
- class schrodinger.seam.examples.active_learning_weigher.PickRandomly(n: int)
Bases: apache_beam.transforms.ptransform.PTransform
- __init__(n: int)
- expand(unscored_mols)
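The two selection strategies named above (picking by predicted score versus picking at random) can be contrasted with a small plain-Python sketch. The function names, the `n` argument on the top-scoring variant, and the `seed` parameter are assumptions made for illustration. The actual classes are Apache Beam transforms operating on collections of molecules, and their internals are not documented here.

```python
import heapq
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pick_top_scoring(items: Sequence[T], predict: Callable[[T], float], n: int) -> List[T]:
    # Keep the n items with the highest predicted score (hypothetical helper).
    return heapq.nlargest(n, items, key=predict)


def pick_randomly(items: Sequence[T], n: int, seed: int = 0) -> List[T]:
    # Uniform sample without replacement; the seed is only for reproducibility.
    rng = random.Random(seed)
    pool = list(items)
    return rng.sample(pool, min(n, len(pool)))
```

Random picking is a common baseline for judging whether model-guided selection is actually helping.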
- class schrodinger.seam.examples.active_learning_weigher.ActiveLearning(input_path: pathlib.Path, num_cycles: int, scoring_transform: type)
Bases: apache_beam.transforms.ptransform.PTransform
A pipeline that performs active learning to select the top-scoring molecules from a set of unscored molecules. The top-scoring molecules are then used to train a model to predict the scores of future molecules. The scoring is parametrized by supplying a scoring_transform that takes a molecule and returns a ScoredMol. The pipeline ultimately writes the top-scoring molecules to a file.
- __init__(input_path: pathlib.Path, num_cycles: int, scoring_transform: type)
- expand(pbegin)
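The select, score, and retrain cycle described for this pipeline can be sketched without Beam as a simple loop. This is an illustrative sketch only: the function names, the `batch_size` parameter, and the use of plain numbers in place of molecules are assumptions, and the real pipeline expresses these steps as PTransforms.

```python
from typing import Callable, Dict, List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    # Least-squares fit of y = m*x + b; falls back to a flat line if all x match.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y
    m = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return m, mean_y - m * mean_x


def active_learning(
    candidates: List[float],
    score: Callable[[float], float],
    num_cycles: int,
    batch_size: int,
) -> Tuple[Tuple[float, float], Dict[float, float]]:
    m, b = 0.0, 0.0  # assumption: start from an untrained model
    scored: Dict[float, float] = {}
    unscored = list(candidates)
    for _ in range(num_cycles):
        if not unscored:
            break
        # Rank the remaining candidates by the current model's prediction.
        unscored.sort(key=lambda x: m * x + b, reverse=True)
        batch, unscored = unscored[:batch_size], unscored[batch_size:]
        # Run the (expensive) scoring step only on the selected batch.
        for x in batch:
            scored[x] = score(x)
        # Retrain on everything scored so far.
        m, b = fit_line(list(scored.items()))
    return (m, b), scored
```

For example, with a hypothetical scoring function `lambda x: 12 * x + 2` and `candidates = [1, 2, 3, 4, 5, 6]`, two cycles with `batch_size=2` score four of the six candidates and leave the rest unscored, which is the point of selecting before scoring.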
- class schrodinger.seam.examples.active_learning_weigher.CalculateMolWt(label: Optional[str] = None)
Bases: apache_beam.transforms.ptransform.PTransform
- expand(pcoll)
- schrodinger.seam.examples.active_learning_weigher.main(args=None)