schrodinger.application.transforms.alpareto module
Active learning with Pareto selection for multi-objective molecule selection.
Combines ML model training (TrainModel/PredictProperties) with
Pareto ranking (ParetoRank) for iterative, multi-objective molecule
selection. Each cycle scores a batch of molecules, trains an ML model on
all scored data, uses the model's predictions together with Pareto ranking to select the next
batch, and repeats.
- class schrodinger.application.transforms.alpareto.BaseScorer(**kwargs)
Bases: PTransformWithConfig
Base class for scoring transforms.
Subclasses must set config_class, SCORER_ID, and SCORER_PROPERTIES, and implement expand. SCORER_PROPERTIES values must be valid structure property keys with a numeric type prefix (r_ for real or i_ for integer).
- SCORER_ID: str = None
- SCORER_PROPERTIES: Iterable[str] = None
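The property-key rule for SCORER_PROPERTIES comes down to a prefix test. The sketch below shows one way to express that check in plain Python. The helper names `has_numeric_type_prefix` and `invalid_property_keys` are hypothetical and are not part of this module.

```python
from __future__ import annotations

import re
from typing import Iterable

# Hypothetical check: a key must start with "r_" (real) or "i_" (integer)
# and have at least one word character after the prefix.
_NUMERIC_KEY = re.compile(r"^[ri]_\w+$")


def has_numeric_type_prefix(key: str) -> bool:
    """Return True if key carries an r_ or i_ type prefix."""
    return _NUMERIC_KEY.match(key) is not None


def invalid_property_keys(keys: Iterable[str]) -> list[str]:
    """Return the keys that would violate the SCORER_PROPERTIES rule."""
    return [k for k in keys if not has_numeric_type_prefix(k)]


print(invalid_property_keys([
    "r_paretoscorer_rdkit_molwt",
    "i_paretoscorer_rdkit_heavy_atom_count",
    "s_m_title",  # string-typed property: not numeric, so rejected
]))
```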
- class schrodinger.application.transforms.alpareto.RdkitConfig
Bases: BaseModel
Configuration for RdkitScorer.
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class schrodinger.application.transforms.alpareto.RdkitScoreProperties
Bases: StrEnum
Properties produced by RdkitScorer.
- MOLWT = 'r_paretoscorer_rdkit_molwt'
- HEAVY_ATOM_COUNT = 'i_paretoscorer_rdkit_heavy_atom_count'
- class schrodinger.application.transforms.alpareto.RdkitScorer(**kwargs)
Bases: BaseScorer
Compute molecular weight and heavy atom count using RDKit.
- SCORER_ID: str = 'rdkit'
- SCORER_PROPERTIES
alias of RdkitScoreProperties
- config_class
alias of RdkitConfig
- class schrodinger.application.transforms.alpareto.FEPScoreProperties
Bases: StrEnum
Properties produced by FEPScorer.
- PRED_DG = 'r_alpareto_fep_pred_dg'
- PRED_DG_UNCERTAINTY = 'r_alpareto_fep_pred_dg_uncertainty'
- class schrodinger.application.transforms.alpareto.FEPScorerConfig(*, receptor_file: Path, reference_ligands_file: Path, execution_mode: Literal['webservices'] = 'webservices', project_name: str = 'project', forcefield: Literal['OPLS4', 'OPLS5'] = 'OPLS4', simulation_time: int = 5000, equilibration_time: int = 20, mock: bool = False)
Bases: BaseModel
Configuration for FEPScorer.
- Parameters:
receptor_file – Path to receptor/environment structures (.maegz).
reference_ligands_file – Path to reference ligands (.maegz).
execution_mode – Execution backend. Only 'webservices' is currently supported.
project_name – Web services project name.
forcefield – OPLS force field version.
simulation_time – FEP simulation time in picoseconds.
equilibration_time – FEP equilibration time in picoseconds.
mock – Use mock FEP (random dG values) for testing.
- receptor_file: Path
- reference_ligands_file: Path
- execution_mode: Literal['webservices']
- project_name: str
- forcefield: Literal['OPLS4', 'OPLS5']
- simulation_time: int
- equilibration_time: int
- mock: bool
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
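When mock is True, FEPScorer substitutes random dG values for real FEP+ results. The sketch below imitates that behavior for offline testing. Several details are assumptions, because the documentation only says the values are random:

- the function name `mock_fep_scores`;
- the use of uniform distributions and their ranges;
- the fact that the uncertainty key is also filled.

The property keys are the ones listed under FEPScoreProperties.

```python
import random

# Property keys as listed under FEPScoreProperties.
PRED_DG = "r_alpareto_fep_pred_dg"
PRED_DG_UNCERTAINTY = "r_alpareto_fep_pred_dg_uncertainty"


def mock_fep_scores(titles, seed=None):
    """Hypothetical mock: assign a random dG and uncertainty to each structure title.

    The uniform ranges below are illustrative assumptions, not the module's values.
    """
    rng = random.Random(seed)  # seeded generator gives reproducible mock runs
    return {
        title: {
            PRED_DG: rng.uniform(-12.0, -4.0),
            PRED_DG_UNCERTAINTY: rng.uniform(0.1, 1.0),
        }
        for title in titles
    }


scores = mock_fep_scores(["lig1", "lig2"], seed=0)
```

Seeding the generator keeps repeated test runs reproducible, which is usually what a mock backend is for.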
- class schrodinger.application.transforms.alpareto.FEPScorer(**kwargs)
Bases: BaseScorer
Score structures using FEP+ via web services.
- SCORER_ID: str = 'fep'
- SCORER_PROPERTIES
alias of FEPScoreProperties
- config_class
alias of FEPScorerConfig
- schrodinger.application.transforms.alpareto.SCORER_REGISTRY: dict[str, type[BaseScorer]] = {'fep': <class 'schrodinger.application.transforms.alpareto.FEPScorer'>, 'rdkit': <class 'schrodinger.application.transforms.alpareto.RdkitScorer'>}
Registry mapping scorer identifiers to BaseScorer subclasses.
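SCORER_REGISTRY maps each SCORER_ID to its scorer class. One way to assemble such a registry is to read the SCORER_ID class attribute of each subclass, and the sketch below does that with placeholder classes. The real module's construction of the dict is not documented here, so treat `build_registry` as a hypothetical helper.

```python
from typing import Optional


class BaseScorer:
    """Placeholder for the real BaseScorer; only SCORER_ID matters here."""
    SCORER_ID: Optional[str] = None


class RdkitScorer(BaseScorer):
    SCORER_ID = "rdkit"


class FEPScorer(BaseScorer):
    SCORER_ID = "fep"


def build_registry(base: type) -> dict:
    """Map SCORER_ID -> class for each direct subclass of base, rejecting duplicates."""
    registry = {}
    for cls in base.__subclasses__():  # direct subclasses only
        if cls.SCORER_ID is None:
            continue  # skip intermediate classes without an identifier
        if cls.SCORER_ID in registry:
            raise ValueError(f"duplicate SCORER_ID {cls.SCORER_ID!r}")
        registry[cls.SCORER_ID] = cls
    return registry


SCORER_REGISTRY = build_registry(BaseScorer)
scorer_cls = SCORER_REGISTRY["fep"]  # look up a scorer class by identifier
```

Keying on a class attribute lets a configuration refer to scorers by short string IDs, such as the scorer_id field of EndpointSpec.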
- class schrodinger.application.transforms.alpareto.ObjectivePropertyConfig(*, prefer_smaller_values: bool = True)
Bases: BaseModel
Per-property optimization metadata.
- Parameters:
prefer_smaller_values – Whether smaller values are better for this property. Pareto ranking uses it to orient the objective (minimize when True, maximize when False).
- prefer_smaller_values: bool
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
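prefer_smaller_values fixes the direction in which each objective is optimized. A common way to apply such a flag is to negate the maximized objectives so that every objective is minimized, and then peel off successive non-dominated fronts. The plain-Python sketch below shows that general technique. The function names are hypothetical, and the sketch is not claimed to match ParetoRank's internals.

```python
from __future__ import annotations

from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points: Sequence[Sequence[float]],
                 prefer_smaller: Sequence[bool]) -> list[int]:
    """Assign each point a front index, where 0 is the non-dominated (best) front."""
    # Negate objectives where larger is better, so every objective is minimized.
    oriented = [[v if smaller else -v for v, smaller in zip(p, prefer_smaller)]
                for p in points]
    ranks = [-1] * len(points)
    remaining = set(range(len(points)))
    front = 0
    while remaining:
        # Points not dominated by any other remaining point form the current front.
        current = {i for i in remaining
                   if not any(dominates(oriented[j], oriented[i])
                              for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks


ranks = pareto_ranks([(1, 5), (2, 2), (3, 1), (4, 4)], prefer_smaller=[True, True])
# ranks == [0, 0, 0, 1]: only (4, 4) is dominated, by (2, 2)
```

Flipping prefer_smaller to [True, False] for the same points makes (1, 5) the sole best point, because its second objective is now the largest.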
- class schrodinger.application.transforms.alpareto.EndpointSpec(*, scorer_id: str, config: dict[str, typing.Any] = <factory>, objective_properties: dict[str, schrodinger.application.transforms.alpareto.ObjectivePropertyConfig])
Bases: BaseModel
Specification for a single scoring endpoint.
Each endpoint pairs a scorer (identified by scorer_id) with the subset of its properties to use as Pareto objectives. The scorer always computes all of its SCORER_PROPERTIES on every structure. objective_properties selects which of those feed into Pareto ranking and records the optimization direction for each one.
- Parameters:
scorer_id – Scorer identifier from SCORER_REGISTRY.
config – Scorer-specific configuration passed to the scorer.
objective_properties – Map of property names to optimization metadata. Each key must be a valid structure property name (r_ or i_ prefix) that the scorer advertises in its SCORER_PROPERTIES.
- scorer_id: str
- config: dict[str, Any]
- objective_properties: dict[str, ObjectivePropertyConfig]
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- classmethod validatePropertyNames(v)
Validate that property names have numeric type prefixes (r_ or i_).
- validateScorerAndProperties()
Validate that the scorer is registered and that the requested properties are among its SCORER_PROPERTIES.
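The scorer-and-properties check on EndpointSpec comes down to a registry lookup plus a subset test. The sketch below expresses it in plain Python and makes two assumptions:

- `ADVERTISED` is a hypothetical stand-in for the properties each registered scorer declares.
- `check_endpoint` is not the module's validator.

```python
# Hypothetical stand-in: scorer_id -> properties that scorer advertises.
ADVERTISED = {
    "rdkit": {"r_paretoscorer_rdkit_molwt", "i_paretoscorer_rdkit_heavy_atom_count"},
    "fep": {"r_alpareto_fep_pred_dg", "r_alpareto_fep_pred_dg_uncertainty"},
}


def check_endpoint(scorer_id: str, objective_properties: dict) -> None:
    """Raise ValueError if the scorer is unknown or an objective is not advertised."""
    if scorer_id not in ADVERTISED:
        raise ValueError(f"unknown scorer_id {scorer_id!r}")
    missing = set(objective_properties) - ADVERTISED[scorer_id]
    if missing:
        raise ValueError(f"{scorer_id!r} does not produce {sorted(missing)}")


# Valid: optimize predicted dG only. The uncertainty is still computed, just not ranked.
check_endpoint("fep", {"r_alpareto_fep_pred_dg": {"prefer_smaller_values": True}})
```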
- class schrodinger.application.transforms.alpareto.ParetoActiveLearningConfig(*, endpoints: list[EndpointSpec], num_cycles: int = 3, sample_size: int = 100, train_time_hr: float = 1.0, frac_train: float = 0.9, seed: int | None = None)
Bases: BaseModel
Configuration for ParetoActiveLearning.
- Parameters:
endpoints – List of endpoint specifications (at least one required).
num_cycles – Number of active learning (AL) cycles to run. Defaults to 3.
sample_size – Number of structures to select per round. Defaults to 100.
train_time_hr – Training time limit in hours. Defaults to 1.0.
frac_train – Fraction of data used for training; the remainder is held out. Defaults to 0.9.
seed – Random seed for the initial sample. Defaults to None.
- endpoints: list[EndpointSpec]
- num_cycles: int
- sample_size: int
- train_time_hr: float
- frac_train: float
- seed: int | None
- model_config: ClassVar[ConfigDict] = {'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- classmethod validateEndpointsNonempty(v)
Validate that at least one endpoint is specified.
- propertyNames() → list[str]
Flat list of all objective property names across all endpoints.
- classmethod fromYaml(path: str) → ParetoActiveLearningConfig
Load a configuration from a YAML file.
- Parameters:
path – Path to the YAML configuration file.
- Returns:
Parsed ParetoActiveLearningConfig instance.
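fromYaml loads the same fields that the constructor accepts. The Python data below sketches a plausible layout for such a file, together with a plain re-implementation of what propertyNames returns. Several parts are assumptions:

- The field names come from the documented signatures.
- The nesting is assumed to follow the pydantic models one-to-one.
- The file paths are placeholders.
- The ordering of the flattened list is this sketch's own choice.

```python
# Assumed layout of a configuration file, written as the equivalent Python data.
config = {
    "endpoints": [
        {
            "scorer_id": "rdkit",
            "objective_properties": {
                "r_paretoscorer_rdkit_molwt": {"prefer_smaller_values": True},
            },
        },
        {
            "scorer_id": "fep",
            "config": {
                "receptor_file": "receptor.maegz",  # placeholder path
                "reference_ligands_file": "reference_ligands.maegz",  # placeholder path
                "mock": True,
            },
            "objective_properties": {
                "r_alpareto_fep_pred_dg": {"prefer_smaller_values": True},
            },
        },
    ],
    "num_cycles": 3,
    "sample_size": 100,
    "seed": 42,
}


def property_names(cfg: dict) -> list:
    """Flatten objective property names across all endpoints, in endpoint order."""
    return [name for ep in cfg["endpoints"] for name in ep["objective_properties"]]
```

In an actual YAML file, the same structure would be written with YAML mappings and sequences.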
- class schrodinger.application.transforms.alpareto.CompositeScorer(scorers: list[tuple[str, dict]])
Bases: PTransform
Chain multiple scorers looked up from SCORER_REGISTRY.
Each (scorer_id, config) pair is resolved to a BaseScorer subclass and applied in sequence.
- Parameters:
scorers – List of (scorer_id, config) tuples, where scorer_id is a key in SCORER_REGISTRY and config is a dict of keyword arguments forwarded to the scorer constructor.
- __init__(scorers: list[tuple[str, dict]])
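CompositeScorer resolves each (scorer_id, config) pair and applies the scorers in order. The sketch below strips away Beam and shows the same idea as plain function composition over records. It differs from the real class in three ways:

- The scorer functions and `REGISTRY` are hypothetical stand-ins.
- Records are plain dicts rather than structures.
- Each config is passed as a call argument instead of as constructor keyword arguments.

```python
from typing import Callable

Record = dict  # stand-in for a structure: property key -> value


def fake_molwt(rec: Record, config: dict) -> Record:
    # Hypothetical scorer: returns a copy with one extra property (toy formula).
    return {**rec, "r_paretoscorer_rdkit_molwt": 12.0 * rec["n_carbons"]}


def fake_dg(rec: Record, config: dict) -> Record:
    # Hypothetical scorer whose output depends on its config dict.
    return {**rec, "r_alpareto_fep_pred_dg": config.get("offset", 0.0) - rec["n_carbons"]}


REGISTRY = {"rdkit": fake_molwt, "fep": fake_dg}


def composite_score(records, scorers):
    """Apply each (scorer_id, config) pair in order; later stages see earlier output."""
    # Resolve every ID up front, so an unknown ID raises KeyError before any work is done.
    steps: list = [(REGISTRY[sid], cfg) for sid, cfg in scorers]
    out = []
    for rec in records:
        for fn, cfg in steps:
            rec = fn(rec, cfg)
        out.append(rec)
    return out


result = composite_score([{"n_carbons": 6}], [("rdkit", {}), ("fep", {"offset": -2.0})])
```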
- class schrodinger.application.transforms.alpareto.ParetoActiveLearning(**kwargs)
Bases: PTransformWithConfig
Active learning pipeline with Pareto-based selection.
Iteratively selects molecules for scoring using a multi-objective Pareto strategy. Round 0 uses random selection; subsequent rounds train an ML model on all scored data and use Pareto ranking of predicted properties to select the next batch.
- Parameters:
endpoints – Endpoint specifications defining which scorers to run and which properties to optimize.
num_cycles – Number of active learning cycles. Defaults to 3.
sample_size – Number of structures to select per round. Round 0 uses random selection; subsequent rounds use Pareto ranking. Defaults to 100.
train_time_hr – Per-round training time limit in hours. Defaults to 1.0.
frac_train – Fraction of data used for training; the remainder is held out. Defaults to 0.9.
seed – Random seed for the initial sample. Defaults to None.
- config_class
alias of ParetoActiveLearningConfig
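The round structure described above (a random round 0, then model-guided Pareto selection) can be sketched end to end in plain Python. Everything in it is a toy assumption:

- Candidates are numbers.
- The scorer is a caller-supplied function returning a tuple of objectives, all minimized.
- A one-nearest-neighbour lookup stands in for TrainModel/PredictProperties.
- Each batch is filled front by front from the Pareto ranking of the predictions.
- frac_train and train_time_hr are omitted.

```python
import random
from typing import Callable, Sequence


def _dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # All objectives are minimized in this sketch.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def _by_pareto_front(preds: list) -> list:
    """Indices ordered front by front (non-dominated first; ties keep input order)."""
    remaining = list(range(len(preds)))
    order = []
    while remaining:
        front = [i for i in remaining
                 if not any(_dominates(preds[j], preds[i]) for j in remaining if j != i)]
        order.extend(front)
        remaining = [i for i in remaining if i not in front]
    return order


def active_learning(pool: list, score: Callable, num_cycles: int = 3,
                    sample_size: int = 100, seed=None) -> dict:
    rng = random.Random(seed)
    scored = {}  # candidate -> tuple of objective values
    for cycle in range(num_cycles):
        unscored = [c for c in pool if c not in scored]
        if not unscored:
            break  # pool exhausted
        k = min(sample_size, len(unscored))
        if cycle == 0:
            batch = rng.sample(unscored, k)  # round 0: random selection
        else:
            # Toy "model" fit on all scored data: copy the nearest scored candidate's values.
            def predict(x):
                return scored[min(scored, key=lambda s: abs(s - x))]
            preds = [predict(c) for c in unscored]
            batch = [unscored[i] for i in _by_pareto_front(preds)[:k]]
        for c in batch:
            scored[c] = score(c)
    return scored


scored = active_learning(list(range(50)), lambda x: ((x - 20) ** 2, (x - 30) ** 2),
                         num_cycles=3, sample_size=5, seed=0)
```

Refitting on the full scored set each round mirrors the "train on all scored data" step. In the real pipeline, that step is a time-limited ML training run rather than a lookup.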