schrodinger.application.transforms.ml_train module

Transform for training an ML model on molecular structures.

Trains a ligand_ml Smasher model to predict numeric structure properties from molecular structure. Designed for use in active learning workflows.

class schrodinger.application.transforms.ml_train.TrainModelConfig(*, properties: list[str], train_time: float = 1.0, frac_train: float = 0.9, seed: int | None = None)

Bases: BaseModel

Configuration for TrainModel.

properties: list[str]

train_time: float

frac_train: float

seed: int | None

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; it must be a dictionary conforming to pydantic.config.ConfigDict. With frozen set to True, TrainModelConfig instances are immutable after construction.

classmethod validate_properties(v)

Validate that properties is non-empty and that every property name has a numeric type prefix (r_ or i_).
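The rules enforced by validate_properties can be sketched with a frozen standard-library dataclass instead of pydantic. This is a minimal sketch assuming the accepted "numeric prefixes" are exactly the Schrodinger r_ (real) and i_ (integer) type prefixes; TrainConfigSketch is a hypothetical class, not part of this module.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the accepted numeric prefixes are r_ (real) and i_ (integer).
NUMERIC_PREFIXES = ("r_", "i_")


@dataclass(frozen=True)
class TrainConfigSketch:
    """Hypothetical stand-in for TrainModelConfig; not the real class."""

    properties: tuple[str, ...]
    train_time: float = 1.0  # hours
    frac_train: float = 0.9
    seed: int | None = None

    def __post_init__(self) -> None:
        # Same rules as validate_properties: at least one property, and
        # every name must carry a numeric type prefix.
        if not self.properties:
            raise ValueError("properties must not be empty")
        bad = [p for p in self.properties if not p.startswith(NUMERIC_PREFIXES)]
        if bad:
            raise ValueError(f"properties without a numeric prefix: {bad}")


# Assigning to a field after construction raises
# dataclasses.FrozenInstanceError, mirroring the frozen pydantic model.
cfg = TrainConfigSketch(properties=("r_i_glide_gscore",))
```

A tuple is used for properties so the whole sketch object is immutable; the real TrainModelConfig declares list[str] and relies on pydantic's frozen setting instead.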

class schrodinger.application.transforms.ml_train.MLModel(blob: ZipBlob, properties: tuple[str, ...])

Bases: object

A trained ML model for predicting molecular properties.

Wraps a trained Smasher model stored as a ZipBlob.

Parameters:

blob – Compressed model directory.
properties – The property names this model was trained to predict. All properties are prediction targets (response variables).

blob: ZipBlob

properties: tuple[str, ...]

__init__(blob: ZipBlob, properties: tuple[str, ...]) → None
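ZipBlob's own interface is not documented on this page. The sketch below only illustrates the general idea of storing a model directory as a single compressed byte string, using the standard-library zipfile module; pack_directory and unpack_directory are hypothetical helpers and say nothing about how ZipBlob is actually implemented.

```python
from __future__ import annotations

import io
import tempfile
import zipfile
from pathlib import Path


def pack_directory(model_dir: Path) -> bytes:
    """Compress every file under model_dir into in-memory zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(model_dir.rglob("*")):
            if path.is_file():
                # Archive names are relative, so the bytes are location-independent.
                zf.write(path, arcname=path.relative_to(model_dir).as_posix())
    return buffer.getvalue()


def unpack_directory(blob: bytes, dest: Path | None = None) -> Path:
    """Extract bytes from pack_directory; defaults to a new temp directory."""
    if dest is None:
        # The caller is responsible for deleting this directory.
        dest = Path(tempfile.mkdtemp())
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        zf.extractall(dest)
    return dest
```

Keeping the model as plain bytes makes it easy to pass between pipeline stages as a single value, which is consistent with TrainModel emitting a PCollection that holds one MLModel.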

class schrodinger.application.transforms.ml_train.TrainModel(**kwargs)

Bases: PTransformWithConfig

Train an ML model to predict numeric properties from molecular structure.

Takes a PCollection of Structure objects and trains a Smasher model to predict the specified numeric properties. Outputs a PCollection containing a single MLModel.

All properties are used as prediction targets (response variables) for multi-target regression.

Example usage, where structures is a list of Structure objects that carry the target property:

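>>> import apache_beam as beam
>>> from schrodinger.application.transforms.ml_train import TrainModel
>>> with beam.Pipeline() as p:
...     model = (p
...              | beam.Create(structures)
...              | TrainModel(properties=['r_i_glide_gscore']))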

Multi-target example:

>>> with beam.Pipeline() as p:
...     model = (p
...              | beam.Create(structures)
...              | TrainModel(properties=['r_score_a', 'r_score_b']))
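In the multi-target example, each training structure contributes one value per entry in properties, so the responses form a matrix with one column per property. The sketch below illustrates only that shape: it models structures as plain property dictionaries rather than Structure objects, target_matrix is a hypothetical helper, and skipping records with missing targets is an assumption, not documented TrainModel behavior.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def target_matrix(records: Sequence[Mapping[str, float]],
                  properties: Sequence[str]) -> list[list[float]]:
    """Return one row per usable record and one column per target property."""
    rows = []
    for rec in records:
        # Assumption: records missing any target are skipped. How TrainModel
        # handles missing values is not documented on this page.
        if all(p in rec for p in properties):
            rows.append([float(rec[p]) for p in properties])
    return rows
```

For example, target_matrix(records, ['r_score_a', 'r_score_b']) yields rows of the form [score_a, score_b], matching the column order of properties.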

Parameters:

properties – Numeric property names on the input structures. All are used as prediction targets.
train_time – Training time limit in hours. Defaults to 1.0.
frac_train – Fraction of the data used for training; the remainder is held out. Defaults to 0.9.
seed – Random seed for train/holdout split. Defaults to None.
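This page does not say how the train/holdout split is performed. As an illustration only, a seeded random split controlled by frac_train and seed could look like the following; train_holdout_split is a hypothetical helper, and TrainModel may split its data differently.

```python
from __future__ import annotations

import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def train_holdout_split(items: Sequence[T], frac_train: float = 0.9,
                        seed: int | None = None) -> tuple[list[T], list[T]]:
    """Shuffle items and split them into (train, holdout) lists."""
    if not 0.0 < frac_train <= 1.0:
        raise ValueError("frac_train must be in (0, 1]")
    shuffled = list(items)
    # The same seed always gives the same order, hence the same split;
    # seed=None seeds from system randomness, so splits generally differ.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * frac_train)
    return shuffled[:n_train], shuffled[n_train:]
```

Using a private random.Random instance keeps the split reproducible without touching the global random state.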

config_class: alias of TrainModelConfig

class schrodinger.application.transforms.ml_train.PredictProperties(*, model)

Bases: PTransform

Predict properties on structures using a trained ML model.

Takes a PCollection of Structure objects and applies a trained MLModel to predict properties. Predicted values are stored under new property names with predicted_ inserted after the Schrodinger type prefix. For example, a model trained on r_test_score will produce predictions under r_predicted_test_score.

Uncertainty (standard deviation) values are also stored for each prediction, using predicted_uncertainty_ after the type prefix. For example, r_test_score produces an uncertainty property r_predicted_uncertainty_test_score.

Structures are processed in batches of up to 20,000 to limit memory usage.
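Capping memory this way amounts to consuming the input in fixed-size chunks rather than all at once. The generator below is a minimal illustration that uses the documented limit of 20,000 as its default; iter_batches is hypothetical, and the sketch does not show how PredictProperties actually groups elements inside a Beam pipeline.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int = 20_000) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items from items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    # islice pulls at most batch_size items, so the generator itself holds
    # only one batch at a time; an empty list means the input is exhausted.
    while batch := list(islice(it, batch_size)):
        yield batch
```

With 45,000 input structures, this yields batches of 20,000, 20,000 and 5,000.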

The model can be provided either as a direct MLModel instance or as a PCollection containing a single MLModel, which is consumed as a singleton side input (AsSingleton).

Example with a direct model:

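>>> import apache_beam as beam
>>> from schrodinger.application.transforms.ml_train import PredictProperties
>>> with beam.Pipeline() as p:
...     predictions = (p
...                    | beam.Create(structures)
...                    | PredictProperties(model=trained_model))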

Example with a model PCollection:

>>> with beam.Pipeline() as p:
...     model_pcoll = p | 'CreateModel' >> beam.Create([trained_model])
...     predictions = (p
...                    | 'CreateStructures' >> beam.Create(structures)
...                    | PredictProperties(model=model_pcoll))

Parameters:

model – An MLModel instance or a PCollection containing one.

__init__(*, model)

schrodinger.application.transforms.ml_train.predicted_property_name(prop: str) → str

Build the output property name for a prediction.

Inserts predicted_ after the Schrodinger type prefix (r_ or i_).

Examples:

>>> predicted_property_name('r_test_score')
'r_predicted_test_score'
>>> predicted_property_name('r_i_glide_gscore')
'r_predicted_i_glide_gscore'

Parameters:

prop – Original property name.

Returns:

Predicted property name.

schrodinger.application.transforms.ml_train.uncertainty_property_name(prop: str) → str

Build the output property name for a prediction uncertainty.

Inserts predicted_uncertainty_ after the Schrodinger type prefix (r_ or i_).

Examples:

>>> uncertainty_property_name('r_test_score')
'r_predicted_uncertainty_test_score'
>>> uncertainty_property_name('r_i_glide_gscore')
'r_predicted_uncertainty_i_glide_gscore'

Parameters:

prop – Original property name.

Returns:

Uncertainty property name.
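The doctests for both naming functions are consistent with a simple string splice after the two-character type prefix. A minimal sketch of that logic, reproducing those outputs, follows; the function names are hypothetical, and raising ValueError for names without an r_ or i_ prefix is this sketch's own choice, since the behavior for other prefixes is not documented.

```python
NUMERIC_PREFIXES = ("r_", "i_")


def _insert_after_prefix(prop: str, infix: str) -> str:
    """Insert infix between the two-character type prefix and the rest."""
    if not prop.startswith(NUMERIC_PREFIXES):
        # Assumption: only numeric (r_ / i_) properties are predicted.
        raise ValueError(f"expected an r_ or i_ property, got {prop!r}")
    return prop[:2] + infix + prop[2:]


def predicted_name_sketch(prop: str) -> str:
    """Hypothetical equivalent of predicted_property_name."""
    return _insert_after_prefix(prop, "predicted_")


def uncertainty_name_sketch(prop: str) -> str:
    """Hypothetical equivalent of uncertainty_property_name."""
    return _insert_after_prefix(prop, "predicted_uncertainty_")
```

Because only the first two characters are treated as the prefix, a property that already embeds another prefix, such as r_i_glide_gscore, keeps its inner i_ untouched.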