schrodinger.application.transforms.ml_train module
Transform for training an ML model on molecular structures.
Trains a ligand_ml Smasher model to predict numeric structure properties from molecular structure. Designed for use in active learning workflows.
- class schrodinger.application.transforms.ml_train.TrainModelConfig(*, properties: list[str], train_time: float = 1.0, frac_train: float = 0.9, seed: int | None = None)
Bases: BaseModel
Configuration for TrainModel.
- properties: list[str]
- train_time: float
- frac_train: float
- seed: int | None
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- classmethod validate_properties(v)
Validate that properties are non-empty and have numeric prefixes.
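The configuration rules above (frozen fields, documented defaults, non-empty properties with numeric prefixes) can be illustrated with a standard-library sketch. This is not the library class: `TrainModelConfigSketch` and `NUMERIC_PREFIXES` are hypothetical names, a frozen dataclass stands in for the pydantic model, and reading "numeric prefixes" as `r_` and `i_` is an assumption.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple

# Assumption: "numeric" means the real (r_) and integer (i_) type prefixes.
NUMERIC_PREFIXES = ("r_", "i_")


@dataclass(frozen=True)
class TrainModelConfigSketch:
    """Hypothetical stand-in for TrainModelConfig; not the library class."""

    properties: Tuple[str, ...]
    train_time: float = 1.0  # training time limit in hours
    frac_train: float = 0.9  # fraction of data used for training
    seed: Optional[int] = None  # seed for the train/holdout split

    def __post_init__(self):
        # Reject an empty list of target properties.
        if not self.properties:
            raise ValueError("properties must not be empty")
        # Reject names that do not start with a numeric type prefix.
        bad = [p for p in self.properties if not p.startswith(NUMERIC_PREFIXES)]
        if bad:
            raise ValueError(f"properties without a numeric prefix: {bad}")


config = TrainModelConfigSketch(properties=("r_i_glide_gscore",))
```

Because the dataclass is frozen, assigning to `config.seed` after construction raises `FrozenInstanceError`, which mirrors the intent of `{'frozen': True}` in `model_config`.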
- class schrodinger.application.transforms.ml_train.MLModel(blob: ZipBlob, properties: tuple[str, ...])
Bases: object
A trained ML model for predicting molecular properties.
Wraps a trained Smasher model stored as a ZipBlob.
- Parameters:
blob – Compressed model directory.
properties – The property names this model was trained to predict. All properties are prediction targets (response variables).
- properties: tuple[str, ...]
- class schrodinger.application.transforms.ml_train.TrainModel(**kwargs)
Bases: PTransformWithConfig
Train an ML model to predict numeric properties from molecular structure.
Takes a PCollection of Structure objects and trains a Smasher model to predict the specified numeric properties. Outputs a PCollection containing a single MLModel.
All properties are used as prediction targets (response variables) for multi-target regression.
Example usage:
>>> with beam.Pipeline() as p:
...     model = (p
...              | beam.Create(structures)
...              | TrainModel(properties=['r_i_glide_gscore']))
Multi-target example:
>>> with beam.Pipeline() as p:
...     model = (p
...              | beam.Create(structures)
...              | TrainModel(properties=['r_score_a', 'r_score_b']))
- Parameters:
properties – Numeric property names on the input structures. All are used as prediction targets.
train_time – Training time limit in hours. Defaults to 1.0.
frac_train – Fraction of data used for training vs holdout. Defaults to 0.9.
seed – Random seed for train/holdout split. Defaults to None.
- config_class
alias of TrainModelConfig
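To make the roles of frac_train and seed concrete, here is a minimal sketch of a seeded train/holdout split. It illustrates the general technique only; how TrainModel actually performs the split is not documented here, and `split_train_holdout` is a hypothetical helper.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def split_train_holdout(
    items: Iterable[T], frac_train: float = 0.9, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Shuffle items with a seeded RNG and split them into train/holdout."""
    if not 0.0 < frac_train <= 1.0:
        raise ValueError("frac_train must be in (0, 1]")
    rng = random.Random(seed)  # seed=None gives a non-reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * frac_train))
    return shuffled[:n_train], shuffled[n_train:]


# Integers stand in for Structure objects.
train, holdout = split_train_holdout(range(100), frac_train=0.9, seed=42)
```

With a fixed seed the same items land in the holdout set on every run, which is what makes results comparable across repeated training rounds.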
- class schrodinger.application.transforms.ml_train.PredictProperties(*, model)
Bases: PTransform
Predict properties on structures using a trained ML model.
Takes a PCollection of Structure objects and applies a trained MLModel to predict properties. Predicted values are stored under new property names with predicted_ inserted after the Schrodinger type prefix. For example, a model trained on r_test_score will produce predictions under r_predicted_test_score.
Uncertainty (standard deviation) values are also stored for each prediction, using predicted_uncertainty_ after the type prefix. For example, r_test_score produces an uncertainty property r_predicted_uncertainty_test_score.
Structures are processed in batches of up to 20,000 to limit memory usage.
The model can be provided as either:
- A direct MLModel instance.
- A PCollection containing a single MLModel (consumed as AsSingleton).
Example with a direct model:
>>> with beam.Pipeline() as p:
...     predictions = (p
...                    | beam.Create(structures)
...                    | PredictProperties(model=trained_model))
Example with a model PCollection:
>>> with beam.Pipeline() as p:
...     model_pcoll = p | beam.Create([trained_model])
...     predictions = (p
...                    | beam.Create(structures)
...                    | PredictProperties(model=model_pcoll))
- Parameters:
model – An MLModel instance or a PCollection containing one.
- __init__(*, model)
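The batching behavior described for PredictProperties can be pictured with a small generator that yields lists of at most 20,000 items. This is an illustrative sketch, not the transform's internals; `iter_batches` and `BATCH_SIZE` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Upper bound on batch size, taken from the description above.
BATCH_SIZE = 20_000


def iter_batches(items: Iterable[T], batch_size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# Integers stand in for Structure objects.
sizes = [len(batch) for batch in iter_batches(range(45_000))]
```

Only one batch is held in memory at a time, so peak memory depends on the batch size and not on the total number of structures.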
- schrodinger.application.transforms.ml_train.predicted_property_name(prop: str) -> str
Build the output property name for a prediction.
Inserts predicted_ after the Schrodinger type prefix (r_ or i_).
Examples:
>>> predicted_property_name('r_test_score')
'r_predicted_test_score'
>>> predicted_property_name('r_i_glide_gscore')
'r_predicted_i_glide_gscore'
- Parameters:
prop – Original property name.
- Returns:
Predicted property name.
- schrodinger.application.transforms.ml_train.uncertainty_property_name(prop: str) -> str
Build the output property name for a prediction uncertainty.
Inserts predicted_uncertainty_ after the Schrodinger type prefix (r_ or i_).
Examples:
>>> uncertainty_property_name('r_test_score')
'r_predicted_uncertainty_test_score'
>>> uncertainty_property_name('r_i_glide_gscore')
'r_predicted_uncertainty_i_glide_gscore'
- Parameters:
prop – Original property name.
- Returns:
Uncertainty property name.
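The naming rule shared by both helpers can be reproduced with a few lines of string handling. The sketch below matches the documented examples but is not the library source; the function names are hypothetical, and it assumes the type prefix is everything up to the first underscore and is kept unchanged.

```python
def _insert_after_type_prefix(prop: str, infix: str) -> str:
    """Insert infix right after the leading type prefix (e.g. 'r_')."""
    prefix, sep, rest = prop.partition("_")
    if not sep or not rest:
        raise ValueError(f"not a prefixed property name: {prop!r}")
    return f"{prefix}_{infix}{rest}"


def predicted_name_sketch(prop: str) -> str:
    # Mirrors the documented behavior of predicted_property_name.
    return _insert_after_type_prefix(prop, "predicted_")


def uncertainty_name_sketch(prop: str) -> str:
    # Mirrors the documented behavior of uncertainty_property_name.
    return _insert_after_type_prefix(prop, "predicted_uncertainty_")


predicted = predicted_name_sketch("r_i_glide_gscore")
uncertainty = uncertainty_name_sketch("r_i_glide_gscore")
```

Only the first underscore is treated as the prefix boundary, so the family segment (`i_` in `r_i_glide_gscore`) stays part of the remainder, as in the documented examples.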