schrodinger.application.transforms.alpareto module

Active learning with Pareto ranking for multi-objective molecule selection.

Combines ML model training (TrainModel/PredictProperties) with Pareto ranking (ParetoRank) for iterative, multi-objective molecule selection. Each cycle scores a batch of molecules, trains an ML model on all scored data, uses the model's predictions together with Pareto ranking to select the next batch, and repeats.

class schrodinger.application.transforms.alpareto.BaseScorer(**kwargs)

Bases: PTransformWithConfig

Base class for scoring transforms.

Subclasses must set config_class, SCORER_ID, and SCORER_PROPERTIES, and implement expand.

SCORER_PROPERTIES values must be valid structure property keys with a numeric prefix (r_ for real or i_ for integer).

SCORER_ID: str = None
SCORER_PROPERTIES: Iterable[str] = None
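
A subclass therefore only declares an identifier, a set of property keys, and a config model. The sketch below illustrates that contract with plain Python stand-ins; it does not import the real PTransformWithConfig base class, and the names MyScorer, MyScoreProperties, and the property keys are hypothetical:

```python
# Illustrative sketch only: the real base class is
# schrodinger.application.transforms.alpareto.BaseScorer; here the pattern is
# shown with plain classes so the example is self-contained.
from enum import Enum


class MyScoreProperties(str, Enum):
    """Property keys must carry a numeric prefix: r_ (real) or i_ (integer)."""
    LOGP = "r_paretoscorer_my_logp"          # hypothetical property key
    RING_COUNT = "i_paretoscorer_my_rings"   # hypothetical property key


class MyScorer:  # in real code: class MyScorer(BaseScorer)
    SCORER_ID = "my_scorer"                  # hypothetical identifier
    SCORER_PROPERTIES = MyScoreProperties
    # config_class would point at a pydantic BaseModel for this scorer.

    def expand(self, pcoll):
        """Attach every SCORER_PROPERTIES value to each incoming structure."""
        raise NotImplementedError
```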
class schrodinger.application.transforms.alpareto.RdkitConfig

Bases: BaseModel

Configuration for RdkitScorer.

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

class schrodinger.application.transforms.alpareto.RdkitScoreProperties

Bases: StrEnum

Properties produced by RdkitScorer.

MOLWT = 'r_paretoscorer_rdkit_molwt'
HEAVY_ATOM_COUNT = 'i_paretoscorer_rdkit_heavy_atom_count'
class schrodinger.application.transforms.alpareto.RdkitScorer(**kwargs)

Bases: BaseScorer

Compute molecular weight and heavy atom count using RDKit.

SCORER_ID: str = 'rdkit'
SCORER_PROPERTIES

alias of RdkitScoreProperties

config_class

alias of RdkitConfig
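
The two properties above map to standard RDKit calls. A standalone sketch of the underlying computation, assuming (not confirmed by this reference) that Descriptors.MolWt and GetNumHeavyAtoms are what back these keys:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCO")  # ethanol

molwt = Descriptors.MolWt(mol)   # -> r_paretoscorer_rdkit_molwt
heavy = mol.GetNumHeavyAtoms()   # -> i_paretoscorer_rdkit_heavy_atom_count
```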

class schrodinger.application.transforms.alpareto.FEPScoreProperties

Bases: StrEnum

Properties produced by FEPScorer.

PRED_DG = 'r_alpareto_fep_pred_dg'
PRED_DG_UNCERTAINTY = 'r_alpareto_fep_pred_dg_uncertainty'
class schrodinger.application.transforms.alpareto.FEPScorerConfig(*, receptor_file: Path, reference_ligands_file: Path, execution_mode: Literal['webservices'] = 'webservices', project_name: str = 'project', forcefield: Literal['OPLS4', 'OPLS5'] = 'OPLS4', simulation_time: int = 5000, equilibration_time: int = 20, mock: bool = False)

Bases: BaseModel

Configuration for FEPScorer.

Parameters:
  • receptor_file – Path to receptor/environment structures (.maegz).

  • reference_ligands_file – Path to reference ligands (.maegz).

  • execution_mode – Execution backend. Only 'webservices' is currently supported.

  • project_name – Web services project name.

  • forcefield – OPLS forcefield version.

  • simulation_time – FEP simulation time in picoseconds.

  • equilibration_time – FEP equilibration time in picoseconds.

  • mock – Use mock FEP (random dG values) for testing.

receptor_file: Path
reference_ligands_file: Path
execution_mode: Literal['webservices']
project_name: str
forcefield: Literal['OPLS4', 'OPLS5']
simulation_time: int
equilibration_time: int
mock: bool
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

class schrodinger.application.transforms.alpareto.FEPScorer(**kwargs)

Bases: BaseScorer

Score structures using FEP+ via web services.

SCORER_ID: str = 'fep'
SCORER_PROPERTIES

alias of FEPScoreProperties

config_class

alias of FEPScorerConfig

schrodinger.application.transforms.alpareto.SCORER_REGISTRY: dict[str, type[BaseScorer]] = {'fep': <class 'schrodinger.application.transforms.alpareto.FEPScorer'>, 'rdkit': <class 'schrodinger.application.transforms.alpareto.RdkitScorer'>}

Registry mapping scorer identifiers to BaseScorer subclasses.

class schrodinger.application.transforms.alpareto.ObjectivePropertyConfig(*, prefer_smaller_values: bool = True)

Bases: BaseModel

Per-property optimization metadata.

Parameters:

prefer_smaller_values – Whether smaller values are better for this property. Used by Pareto ranking to orient objectives.

prefer_smaller_values: bool
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
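
A minimal sketch of how prefer_smaller_values orients objectives before non-dominated sorting. The actual ranking lives in the ParetoRank transform; this is only the standard Pareto-dominance idea the flag feeds into:

```python
def orient(values, prefer_smaller):
    """Flip signs so that smaller is always better in every objective."""
    return [v if small else -v for v, small in zip(values, prefer_smaller)]


def dominates(a, b):
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points, prefer_smaller):
    """Indices of the non-dominated points after orienting each objective."""
    oriented = [orient(p, prefer_smaller) for p in points]
    return [
        i for i, p in enumerate(oriented)
        if not any(dominates(q, p) for j, q in enumerate(oriented) if j != i)
    ]


# Two objectives: minimize the first (e.g. predicted dG), maximize the second.
points = [(-8.0, 0.9), (-7.0, 0.5), (-9.0, 0.2)]
front = pareto_front(points, prefer_smaller=(True, False))
```

Here (-7.0, 0.5) is dominated by (-8.0, 0.9), so only the first and third points survive on the front.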

class schrodinger.application.transforms.alpareto.EndpointSpec(*, scorer_id: str, config: dict[str, typing.Any] = <factory>, objective_properties: dict[str, schrodinger.application.transforms.alpareto.ObjectivePropertyConfig])

Bases: BaseModel

Specification for a single scoring endpoint.

Each endpoint pairs a scorer (identified by scorer_id) with the subset of its properties to use as Pareto objectives. The scorer always computes all of its SCORER_PROPERTIES on every structure; objective_properties selects which of those feed into Pareto ranking and records per-property optimization direction.

Parameters:
  • scorer_id – Scorer identifier from SCORER_REGISTRY.

  • config – Scorer-specific configuration passed to the scorer.

  • objective_properties – Map of property names to optimization metadata. Each key must be a valid structure property name (r_ or i_ prefix) that the scorer advertises in its SCORER_PROPERTIES.

scorer_id: str
config: dict[str, Any]
objective_properties: dict[str, ObjectivePropertyConfig]
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

classmethod validatePropertyNames(v)

Validate that property names have numeric prefixes.

validateScorerAndProperties()

Validate scorer is registered and properties are available.
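
The prefix rule these validators enforce can be sketched as below; the exact implementation in the module may differ, but the documented constraint is just an r_ or i_ prefix on every objective property name:

```python
import re

# r_ marks a real-valued structure property, i_ an integer-valued one.
_NUMERIC_PREFIX = re.compile(r"^[ri]_")


def has_numeric_prefix(name: str) -> bool:
    """True if the property name carries a numeric (r_ or i_) prefix."""
    return bool(_NUMERIC_PREFIX.match(name))
```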

class schrodinger.application.transforms.alpareto.ParetoActiveLearningConfig(*, endpoints: list[EndpointSpec], num_cycles: int = 3, sample_size: int = 100, train_time_hr: float = 1.0, frac_train: float = 0.9, seed: int | None = None)

Bases: BaseModel

Configuration for ParetoActiveLearning.

Parameters:
  • endpoints – List of endpoint specifications (at least one required).

  • num_cycles – Number of AL cycles to run. Defaults to 3.

  • sample_size – Structures to select per round. Defaults to 100.

  • train_time_hr – Training time limit in hours. Defaults to 1.0.

  • frac_train – Fraction of scored data used for training; the remainder is held out. Defaults to 0.9.

  • seed – Random seed for initial sample. Defaults to None.

endpoints: list[EndpointSpec]
num_cycles: int
sample_size: int
train_time_hr: float
frac_train: float
seed: int | None
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

classmethod validateEndpointsNonempty(v)

Validate that at least one endpoint is specified.

propertyNames() → list[str]

Flat list of all objective property names across all endpoints.

classmethod fromYaml(path: str) → ParetoActiveLearningConfig

Load a configuration from a YAML file.

Parameters:

path – Path to the YAML configuration file.

Returns:

Parsed ParetoActiveLearningConfig instance.
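
A hypothetical YAML file for fromYaml, assuming the keys mirror the pydantic field names documented above (the reference does not spell out the YAML schema):

```yaml
# Field names assumed to match ParetoActiveLearningConfig / EndpointSpec.
endpoints:
  - scorer_id: rdkit
    objective_properties:
      r_paretoscorer_rdkit_molwt:
        prefer_smaller_values: true
      i_paretoscorer_rdkit_heavy_atom_count:
        prefer_smaller_values: true
num_cycles: 3
sample_size: 100
train_time_hr: 1.0
frac_train: 0.9
seed: 42
```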

class schrodinger.application.transforms.alpareto.CompositeScorer(scorers: list[tuple[str, dict]])

Bases: PTransform

Chain multiple scorers looked up from SCORER_REGISTRY.

Each (scorer_id, config) pair is resolved to a BaseScorer subclass and applied in sequence.

Parameters:

scorers – List of (scorer_id, config) tuples where scorer_id is a key in SCORER_REGISTRY and config is a dict of keyword arguments forwarded to the scorer constructor.

__init__(scorers: list[tuple[str, dict]])
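
A pure-Python sketch of the chaining CompositeScorer performs. The real class resolves each scorer_id through SCORER_REGISTRY and composes Beam PTransforms; here plain functions and dict "structures" stand in for both:

```python
def composite_score(structures, scorers):
    """Apply each (scorer_fn, config) pair in sequence to the whole batch."""
    for scorer_fn, config in scorers:
        structures = [scorer_fn(st, **config) for st in structures]
    return structures


# Hypothetical stand-in scorers that attach properties to dict structures.
def add_molwt(st, scale=1.0):
    return {**st, "r_paretoscorer_rdkit_molwt": st["mw"] * scale}


def add_count(st):
    return {**st, "i_paretoscorer_rdkit_heavy_atom_count": st["n_heavy"]}


scored = composite_score(
    [{"mw": 46.07, "n_heavy": 3}],
    [(add_molwt, {}), (add_count, {})],
)
```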
class schrodinger.application.transforms.alpareto.ParetoActiveLearning(**kwargs)

Bases: PTransformWithConfig

Active learning pipeline with Pareto-based selection.

Iteratively selects molecules for scoring using a multi-objective Pareto strategy. Round 0 uses random selection; subsequent rounds train an ML model on all scored data and use Pareto ranking of predicted properties to select the next batch.

Parameters:
  • endpoints – Endpoint specifications defining which scorers to run and which properties to optimize.

  • num_cycles – Number of AL cycles. Defaults to 3.

  • sample_size – Number of structures to select per round. Round 0 uses random selection; subsequent rounds use Pareto ranking. Defaults to 100.

  • train_time_hr – Per-round training time limit in hours. Defaults to 1.0.

  • frac_train – Fraction of scored data used for training; the remainder is held out. Defaults to 0.9.

  • seed – Random seed for initial sample. Defaults to None.

config_class

alias of ParetoActiveLearningConfig
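
The cycle described above can be sketched schematically. Stand-in functions replace the real transforms (TrainModel/PredictProperties for the model, ParetoRank for selection), and a single scalar score stands in for the multi-objective case, so this is a shape-of-the-loop illustration only:

```python
import random


def run_al(pool, score_fn, num_cycles=3, sample_size=2, seed=0):
    """Schematic active-learning loop: random round 0, model-guided afterwards."""
    rng = random.Random(seed)
    scored = {}

    # Round 0: random selection from the unscored pool.
    batch = rng.sample(sorted(pool), sample_size)
    for cycle in range(num_cycles):
        # Score the selected batch with the (expensive) oracle.
        scored.update({m: score_fn(m) for m in batch})

        remaining = sorted(set(pool) - set(scored))
        if not remaining or cycle == num_cycles - 1:
            break

        # Stand-in for: train an ML model on all scored data, predict on the
        # remaining pool, then Pareto-rank the predictions.  Here the oracle
        # itself stands in for the model and a plain sort for the ranking.
        batch = sorted(remaining, key=score_fn)[:sample_size]

    return scored


scored = run_al(pool=list(range(10)), score_fn=lambda m: m % 7, sample_size=2)
```

Each of the three cycles scores sample_size new molecules, so six molecules end up scored in this toy run.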