schrodinger.application.transforms.ml_train module

Transform for training an ML model on molecular structures.

Trains a ligand_ml Smasher model to predict numeric structure properties from molecular structure. Designed for use in active learning workflows.

class schrodinger.application.transforms.ml_train.TrainModelConfig(*, properties: list[str], train_time: float = 1.0, frac_train: float = 0.9, seed: int | None = None)

Bases: BaseModel

Configuration for TrainModel.

properties: list[str]
train_time: float
frac_train: float
seed: int | None
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

classmethod validate_properties(v)

Validate that properties is non-empty and that every property name has a numeric type prefix (such as r_ or i_).
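As a rough illustration of the check described above, the following sketch rejects an empty list and any name without a numeric type prefix. The function name check_properties is hypothetical, and reading "numeric prefix" as r_ or i_ is an assumption based on the type prefixes named elsewhere in this module; the actual validator may differ.

```python
def check_properties(properties: list[str]) -> list[str]:
    """Hypothetical stand-in for the checks validate_properties describes."""
    if not properties:
        raise ValueError("properties must not be empty")
    for name in properties:
        # Assumption: a "numeric prefix" is the real (r_) or integer (i_)
        # type prefix at the start of the property name.
        if not name.startswith(("r_", "i_")):
            raise ValueError(
                f"property {name!r} does not have a numeric prefix (r_ or i_)"
            )
    return properties
```

Under these assumptions, a string property such as s_m_title would be rejected, while r_i_glide_gscore would pass.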

class schrodinger.application.transforms.ml_train.MLModel(blob: ZipBlob, properties: tuple[str, ...])

Bases: object

A trained ML model for predicting molecular properties.

Wraps a trained Smasher model stored as a ZipBlob.

Parameters:
  • blob – Compressed model directory.

  • properties – The property names this model was trained to predict. All properties are prediction targets (response variables).

blob: ZipBlob
properties: tuple[str, ...]
__init__(blob: ZipBlob, properties: tuple[str, ...]) → None

class schrodinger.application.transforms.ml_train.TrainModel(**kwargs)

Bases: PTransformWithConfig

Train an ML model to predict numeric properties from molecular structure.

Takes a PCollection of Structure objects and trains a Smasher model to predict the specified numeric properties. Outputs a PCollection containing a single MLModel.

All properties are used as prediction targets (response variables) for multi-target regression.

Example usage:

>>> with beam.Pipeline() as p:
...     model = (p
...              | beam.Create(structures)
...              | TrainModel(properties=['r_i_glide_gscore']))

Multi-target example:

>>> with beam.Pipeline() as p:
...     model = (p
...              | beam.Create(structures)
...              | TrainModel(properties=['r_score_a', 'r_score_b']))

Parameters:
  • properties – Numeric property names on the input structures. All are used as prediction targets.

  • train_time – Training time limit in hours. Defaults to 1.0.

  • frac_train – Fraction of data used for training vs holdout. Defaults to 0.9.

  • seed – Random seed for train/holdout split. Defaults to None.
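To make the roles of frac_train and seed concrete, here is a minimal sketch of a seeded train/holdout split. It is not the split that Smasher performs internally, which this page does not describe; split_train_holdout is a hypothetical helper that only illustrates what the two parameters control.

```python
import random


def split_train_holdout(items, frac_train=0.9, seed=None):
    """Shuffle items and split them into (train, holdout) lists.

    Illustrative only: the real transform delegates splitting to the
    underlying model trainer.
    """
    rng = random.Random(seed)  # seed=None gives a different shuffle each run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    return shuffled[:n_train], shuffled[n_train:]


train, holdout = split_train_holdout(range(100), frac_train=0.9, seed=42)
```

With a fixed seed the same split is reproduced on every call; with seed=None the split varies between runs.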

config_class

alias of TrainModelConfig

class schrodinger.application.transforms.ml_train.PredictProperties(*, model)

Bases: PTransform

Predict properties on structures using a trained ML model.

Takes a PCollection of Structure objects and applies a trained MLModel to predict properties. Predicted values are stored under new property names with predicted_ inserted after the Schrodinger type prefix. For example, a model trained on r_test_score will produce predictions under r_predicted_test_score.

Uncertainty (standard deviation) values are also stored for each prediction, using predicted_uncertainty_ after the type prefix. For example, r_test_score produces an uncertainty property r_predicted_uncertainty_test_score.

Structures are processed in batches of up to 20,000 to limit memory usage.
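The batching behaviour can be pictured with a small generator like the one below. This is a sketch under the assumption that batches are simple consecutive chunks; the helper name batched_items is hypothetical and the transform's real batching code is not shown on this page.

```python
from itertools import islice


def batched_items(items, batch_size=20_000):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 45,000 inputs would be handled as two full batches and one partial batch.
batch_sizes = [len(batch) for batch in batched_items(range(45_000))]
```

Only one batch is held in memory at a time, which is the point of capping the batch size.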

The model can be provided as either:

  • A direct MLModel instance.

  • A PCollection containing a single MLModel (consumed as a singleton side input via AsSingleton).

Example with a direct model:

>>> with beam.Pipeline() as p:
...     predictions = (p
...                    | beam.Create(structures)
...                    | PredictProperties(model=trained_model))

Example with a model PCollection:

>>> with beam.Pipeline() as p:
...     model_pcoll = p | beam.Create([trained_model])
...     predictions = (p
...                    | beam.Create(structures)
...                    | PredictProperties(model=model_pcoll))

Parameters:

model – An MLModel instance or a PCollection containing one.

__init__(*, model)

schrodinger.application.transforms.ml_train.predicted_property_name(prop: str) → str

Build the output property name for a prediction.

Inserts predicted_ after the Schrodinger type prefix (r_ or i_).

Examples:

>>> predicted_property_name('r_test_score')
'r_predicted_test_score'
>>> predicted_property_name('r_i_glide_gscore')
'r_predicted_i_glide_gscore'

Parameters:

prop – Original property name.

Returns:

Predicted property name.
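The naming rule can be reproduced in a few lines of plain Python. The sketch below is a reimplementation for illustration, not the module's source; the name predicted_name_sketch is hypothetical, and raising ValueError for other prefixes is an assumption, since this page does not say how such input is handled.

```python
def predicted_name_sketch(prop: str) -> str:
    """Insert 'predicted_' after the leading r_ or i_ type prefix."""
    type_prefix, _, rest = prop.partition("_")
    if type_prefix not in ("r", "i") or not rest:
        # Assumption: non-numeric or malformed names are rejected.
        raise ValueError(f"expected a name starting with r_ or i_, got {prop!r}")
    return f"{type_prefix}_predicted_{rest}"
```

Only the first underscore is treated as the prefix boundary, so r_i_glide_gscore keeps its i_glide_gscore remainder intact.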

schrodinger.application.transforms.ml_train.uncertainty_property_name(prop: str) → str

Build the output property name for a prediction uncertainty.

Inserts predicted_uncertainty_ after the Schrodinger type prefix (r_ or i_).

Examples:

>>> uncertainty_property_name('r_test_score')
'r_predicted_uncertainty_test_score'
>>> uncertainty_property_name('r_i_glide_gscore')
'r_predicted_uncertainty_i_glide_gscore'

Parameters:

prop – Original property name.

Returns:

Uncertainty property name.
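When reading results back from predicted structures, it can help to build the full set of output names for a model's properties up front. The sketch below does this with plain string handling that mirrors the two naming rules documented in this module; both helper names are hypothetical and the code does not call the module itself.

```python
def insert_after_type_prefix(prop: str, label: str) -> str:
    """Insert '<label>_' after the first underscore-delimited type prefix."""
    type_prefix, _, rest = prop.partition("_")
    return f"{type_prefix}_{label}_{rest}"


def output_property_names(properties):
    """Map each trained property to its (prediction, uncertainty) names."""
    return {
        prop: (
            insert_after_type_prefix(prop, "predicted"),
            insert_after_type_prefix(prop, "predicted_uncertainty"),
        )
        for prop in properties
    }


names = output_property_names(("r_test_score", "r_i_glide_gscore"))
```

In real code, calling predicted_property_name and uncertainty_property_name directly is preferable to duplicating the rule.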