schrodinger.application.transforms.alpareto module

Active learning with Pareto ranking for multi-objective molecule selection.

Combines ML model training (TrainModel/PredictProperties) with Pareto ranking (ParetoRank) for iterative, multi-objective molecule selection. Each cycle scores a batch of molecules, trains an ML model on all scored data, uses the model's predictions together with Pareto ranking to select the next batch, and repeats.

class schrodinger.application.transforms.alpareto.BaseScorer(**kwargs)

Bases: PTransformWithConfig

Base class for scoring transforms.

Subclasses must set config_class, SCORER_ID, and SCORER_PROPERTIES, and implement expand.

SCORER_PROPERTIES values must be valid structure property keys with a numeric prefix (r_ for real or i_ for integer).

SCORER_ID: str = None
SCORER_PROPERTIES: Iterable[str] = None
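
A subclass therefore only declares an identifier, a set of property keys, and a config model. The sketch below illustrates that contract with plain Python stand-ins; it does not import the real PTransformWithConfig base class, and the names MyScorer, MyScoreProperties, and the property keys are hypothetical:

```python
# Illustrative sketch only: the real base class is
# schrodinger.application.transforms.alpareto.BaseScorer; here the pattern is
# shown with plain classes so the example is self-contained.
from enum import Enum


class MyScoreProperties(str, Enum):
    """Property keys must carry a numeric prefix: r_ (real) or i_ (integer)."""
    LOGP = "r_paretoscorer_my_logp"          # hypothetical property key
    RING_COUNT = "i_paretoscorer_my_rings"   # hypothetical property key


class MyScorer:  # in real code: class MyScorer(BaseScorer)
    SCORER_ID = "my_scorer"                  # hypothetical identifier
    SCORER_PROPERTIES = MyScoreProperties
    # config_class would point at a pydantic BaseModel for this scorer.

    def expand(self, pcoll):
        """Attach every SCORER_PROPERTIES value to each incoming structure."""
        raise NotImplementedError
```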
class schrodinger.application.transforms.alpareto.RdkitConfig

Bases: BaseModel

Configuration for RdkitScorer.

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

class schrodinger.application.transforms.alpareto.RdkitScoreProperties

Bases: StrEnum

Properties produced by RdkitScorer.

MOLWT = 'r_paretoscorer_rdkit_molwt'
HEAVY_ATOM_COUNT = 'i_paretoscorer_rdkit_heavy_atom_count'
class schrodinger.application.transforms.alpareto.RdkitScorer(**kwargs)

Bases: BaseScorer

Compute molecular weight and heavy atom count using RDKit.

SCORER_ID: str = 'rdkit'
SCORER_PROPERTIES

alias of RdkitScoreProperties

config_class

alias of RdkitConfig
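
The two properties above map to standard RDKit calls. A standalone sketch of the underlying computation, assuming (not confirmed by this reference) that Descriptors.MolWt and GetNumHeavyAtoms are what back these keys:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCO")  # ethanol

molwt = Descriptors.MolWt(mol)   # -> r_paretoscorer_rdkit_molwt
heavy = mol.GetNumHeavyAtoms()   # -> i_paretoscorer_rdkit_heavy_atom_count
```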

class schrodinger.application.transforms.alpareto.FEPScoreProperties

Bases: StrEnum

Properties produced by FEPScorer.

PRED_DG = 'r_alpareto_fep_pred_dg'
PRED_DG_UNCERTAINTY = 'r_alpareto_fep_pred_dg_uncertainty'
class schrodinger.application.transforms.alpareto.FEPScorerConfig(*, receptor_file: Path, reference_ligands_file: Path, execution_mode: Literal['webservices'] = 'webservices', project_name: str = 'project', forcefield: Literal['OPLS4', 'OPLS5'] = 'OPLS4', simulation_time: int = 5000, equilibration_time: int = 20, mock: bool = False)

Bases: BaseModel

Configuration for FEPScorer.

Parameters:
  • receptor_file – Path to receptor/environment structures (.maegz).

  • reference_ligands_file – Path to reference ligands (.maegz).

  • execution_mode – Execution backend. Only 'webservices' is currently supported.

  • project_name – Web services project name.

  • forcefield – OPLS forcefield version.

  • simulation_time – FEP simulation time in picoseconds.

  • equilibration_time – FEP equilibration time in picoseconds.

  • mock – Use mock FEP (random dG values) for testing.

receptor_file: Path
reference_ligands_file: Path
execution_mode: Literal['webservices']
project_name: str
forcefield: Literal['OPLS4', 'OPLS5']
simulation_time: int
equilibration_time: int
mock: bool
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

class schrodinger.application.transforms.alpareto.FEPScorer(**kwargs)

Bases: BaseScorer

Score structures using FEP+ via web services.

SCORER_ID: str = 'fep'
SCORER_PROPERTIES

alias of FEPScoreProperties

config_class

alias of FEPScorerConfig

schrodinger.application.transforms.alpareto.SCORER_REGISTRY: dict[str, type[BaseScorer]] = {'fep': <class 'schrodinger.application.transforms.alpareto.FEPScorer'>, 'rdkit': <class 'schrodinger.application.transforms.alpareto.RdkitScorer'>}

Registry mapping scorer identifiers to BaseScorer subclasses.

class schrodinger.application.transforms.alpareto.ObjectivePropertyConfig(*, prefer_smaller_values: bool = True)

Bases: BaseModel

Per-property optimization metadata.

Parameters:

prefer_smaller_values – Whether smaller values are better for this property. Used by Pareto ranking to orient objectives.

prefer_smaller_values: bool
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
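
A minimal sketch of how prefer_smaller_values orients objectives before non-dominated sorting. The actual ranking lives in the ParetoRank transform; this is only the standard Pareto-dominance idea the flag feeds into:

```python
def orient(values, prefer_smaller):
    """Flip signs so that smaller is always better in every objective."""
    return [v if small else -v for v, small in zip(values, prefer_smaller)]


def dominates(a, b):
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points, prefer_smaller):
    """Indices of the non-dominated points after orienting each objective."""
    oriented = [orient(p, prefer_smaller) for p in points]
    return [
        i for i, p in enumerate(oriented)
        if not any(dominates(q, p) for j, q in enumerate(oriented) if j != i)
    ]


# Two objectives: minimize the first (e.g. predicted dG), maximize the second.
points = [(-8.0, 0.9), (-7.0, 0.5), (-9.0, 0.2)]
front = pareto_front(points, prefer_smaller=(True, False))
```

Here (-7.0, 0.5) is dominated by (-8.0, 0.9), so only the first and third points survive on the front.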

class schrodinger.application.transforms.alpareto.EndpointSpec(*, scorer_id: str, config: dict[str, typing.Any] = <factory>, objective_properties: dict[str, schrodinger.application.transforms.alpareto.ObjectivePropertyConfig])

Bases: BaseModel

Specification for a single scoring endpoint.

Each endpoint pairs a scorer (identified by scorer_id) with the subset of its properties to use as Pareto objectives. The scorer always computes all of its SCORER_PROPERTIES on every structure; objective_properties selects which of those feed into Pareto ranking and records per-property optimization direction.

Parameters:
  • scorer_id – Scorer identifier from SCORER_REGISTRY.

  • config – Scorer-specific configuration passed to the scorer.

  • objective_properties – Map of property names to optimization metadata. Each key must be a valid structure property name (r_ or i_ prefix) that the scorer advertises in its SCORER_PROPERTIES.

scorer_id: str
config: dict[str, Any]
objective_properties: dict[str, ObjectivePropertyConfig]
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

classmethod validatePropertyNames(v)

Validate that property names have numeric prefixes.

validateScorerAndProperties()

Validate scorer is registered and properties are available.
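
The prefix rule these validators enforce can be sketched as below; the exact implementation in the module may differ, but the documented constraint is just an r_ or i_ prefix on every objective property name:

```python
import re

# r_ marks a real-valued structure property, i_ an integer-valued one.
_NUMERIC_PREFIX = re.compile(r"^[ri]_")


def has_numeric_prefix(name: str) -> bool:
    """True if the property name carries a numeric (r_ or i_) prefix."""
    return bool(_NUMERIC_PREFIX.match(name))
```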

class schrodinger.application.transforms.alpareto.ParetoActiveLearningConfig(*, endpoints: list[EndpointSpec], num_cycles: int = 3, sample_size: int = 100, train_time_hr: float = 1.0, frac_train: float = 0.9, seed: int | None = None)

Bases: BaseModel

Configuration for ParetoActiveLearning.

Parameters:
  • endpoints – List of endpoint specifications (at least one required).

  • num_cycles – Number of AL cycles to run. Defaults to 3.

  • sample_size – Structures to select per round. Defaults to 100.

  • train_time_hr – Training time limit in hours. Defaults to 1.0.

  • frac_train – Fraction of scored data used for training; the remainder is held out. Defaults to 0.9.

  • seed – Random seed for initial sample. Defaults to None.

endpoints: list[EndpointSpec]
num_cycles: int
sample_size: int
train_time_hr: float
frac_train: float
seed: int | None
model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

classmethod validateEndpointsNonempty(v)

Validate that at least one endpoint is specified.

propertyNames() → list[str]

Flat list of all objective property names across all endpoints.

classmethod fromYaml(path: str) → ParetoActiveLearningConfig

Load a configuration from a YAML file.

Parameters:

path – Path to the YAML configuration file.

Returns:

Parsed ParetoActiveLearningConfig instance.
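
A hypothetical YAML file for fromYaml, assuming the keys mirror the pydantic field names documented above (the reference does not spell out the YAML schema):

```yaml
# Field names assumed to match ParetoActiveLearningConfig / EndpointSpec.
endpoints:
  - scorer_id: rdkit
    objective_properties:
      r_paretoscorer_rdkit_molwt:
        prefer_smaller_values: true
      i_paretoscorer_rdkit_heavy_atom_count:
        prefer_smaller_values: true
num_cycles: 3
sample_size: 100
train_time_hr: 1.0
frac_train: 0.9
seed: 42
```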

class schrodinger.application.transforms.alpareto.CompositeScorer(scorers: list[tuple[str, dict]])

Bases: PTransform

Chain multiple scorers looked up from SCORER_REGISTRY.

Each (scorer_id, config) pair is resolved to a BaseScorer subclass and applied in sequence.

Parameters:

scorers – List of (scorer_id, config) tuples where scorer_id is a key in SCORER_REGISTRY and config is a dict of keyword arguments forwarded to the scorer constructor.

__init__(scorers: list[tuple[str, dict]])
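
A pure-Python sketch of the chaining CompositeScorer performs. The real class resolves each scorer_id through SCORER_REGISTRY and composes Beam PTransforms; here plain functions and dict "structures" stand in for both:

```python
def composite_score(structures, scorers):
    """Apply each (scorer_fn, config) pair in sequence to the whole batch."""
    for scorer_fn, config in scorers:
        structures = [scorer_fn(st, **config) for st in structures]
    return structures


# Hypothetical stand-in scorers that attach properties to dict structures.
def add_molwt(st, scale=1.0):
    return {**st, "r_paretoscorer_rdkit_molwt": st["mw"] * scale}


def add_count(st):
    return {**st, "i_paretoscorer_rdkit_heavy_atom_count": st["n_heavy"]}


scored = composite_score(
    [{"mw": 46.07, "n_heavy": 3}],
    [(add_molwt, {}), (add_count, {})],
)
```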
class schrodinger.application.transforms.alpareto.ParetoActiveLearning(**kwargs)

Bases: PTransformWithConfig

Active learning pipeline with Pareto-based selection.

Iteratively selects molecules for scoring using a multi-objective Pareto strategy. Round 0 uses random selection; subsequent rounds train an ML model on all scored data and use Pareto ranking of predicted properties to select the next batch.

Parameters:
  • endpoints – Endpoint specifications defining which scorers to run and which properties to optimize.

  • num_cycles – Number of AL cycles. Defaults to 3.

  • sample_size – Number of structures to select per round. Round 0 uses random selection; subsequent rounds use Pareto ranking. Defaults to 100.

  • train_time_hr – Per-round training time limit in hours. Defaults to 1.0.

  • frac_train – Fraction of scored data used for training; the remainder is held out. Defaults to 0.9.

  • seed – Random seed for initial sample. Defaults to None.

config_class

alias of ParetoActiveLearningConfig
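
The cycle described above can be sketched schematically. Stand-in functions replace the real transforms (TrainModel/PredictProperties for the model, ParetoRank for selection), and a single scalar score stands in for the multi-objective case, so this is a shape-of-the-loop illustration only:

```python
import random


def run_al(pool, score_fn, num_cycles=3, sample_size=2, seed=0):
    """Schematic active-learning loop: random round 0, model-guided afterwards."""
    rng = random.Random(seed)
    scored = {}

    # Round 0: random selection from the unscored pool.
    batch = rng.sample(sorted(pool), sample_size)
    for cycle in range(num_cycles):
        # Score the selected batch with the (expensive) oracle.
        scored.update({m: score_fn(m) for m in batch})

        remaining = sorted(set(pool) - set(scored))
        if not remaining or cycle == num_cycles - 1:
            break

        # Stand-in for: train an ML model on all scored data, predict on the
        # remaining pool, then Pareto-rank the predictions.  Here the oracle
        # itself stands in for the model and a plain sort for the ranking.
        batch = sorted(remaining, key=score_fn)[:sample_size]

    return scored


scored = run_al(pool=list(range(10)), score_fn=lambda m: m % 7, sample_size=2)
```

Each of the three cycles scores sample_size new molecules, so six molecules end up scored in this toy run.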