schrodinger.active_learning.scaffold_methods module

schrodinger.active_learning.scaffold_methods.validate_args(args: argparse.Namespace)
schrodinger.active_learning.scaffold_methods.prepare_args(args: argparse.Namespace)
class schrodinger.active_learning.scaffold_methods.ScaffoldData(count: int = 0, example_ligands: typing.List[str] = <factory>)

Bases: object

Represents data for a single scaffold.

Attributes:

count: The number of times this scaffold appears in the dataset.

example_ligands: A list of SMILES representing example ligands from the dataset containing this scaffold.

count: int = 0
example_ligands: List[str]
to_dict()

Converts ScaffoldData to a dictionary.

__init__(count: int = 0, example_ligands: typing.List[str] = <factory>) → None
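The `<factory>` default in the signature indicates a dataclass field with a `default_factory`. For orientation, a minimal stand-in with the same two fields can be written as a standard dataclass. This is a sketch, not the Schrödinger source: the class name `ScaffoldDataSketch` is hypothetical, and the body of `to_dict()` is assumed here to be `dataclasses.asdict`.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ScaffoldDataSketch:
    """Hypothetical stand-in mirroring the documented ScaffoldData fields."""

    count: int = 0
    # default_factory gives every instance its own list ("<factory>" in the signature)
    example_ligands: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Assumption: to_dict() is a plain field-name -> value mapping
        return asdict(self)


data = ScaffoldDataSketch()
data.count += 1
data.example_ligands.append("c1ccccc1O")
print(data.to_dict())  # {'count': 1, 'example_ligands': ['c1ccccc1O']}
```

Using `default_factory=list` rather than a literal `[]` default avoids all instances sharing one mutable list.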
schrodinger.active_learning.scaffold_methods.scaffold_dj(input_file: str, block_size: int = 100000, smiles_column: int = 0, return_ligands: bool = False, sample_size: int = None, logger: Optional[logging.Logger] = None) → list[str]

This function takes an input .smi file, splits it into subjobs, and runs scaffold analysis on them in parallel using the scaffold_analysis_worker script. If return_ligands is False, it returns a list of .json files, each containing a dictionary of scaffold data: the keys are the SMILES of the scaffolds present in the corresponding subfile, and the values are the counts of each scaffold. If return_ligands is True, it returns a list of .smi files containing the selected ligands.

Parameters
  • input_file – path to the input .smi file

  • block_size – maximum number of ligands in one subjob, assuming enough CPUs are available.

  • smiles_column – index of the SMILES column in the input .smi file.

  • return_ligands – whether to return a list of example ligands for all the scaffolds instead of their counts.

  • sample_size – Number of ligands to return. Only used when return_ligands is True.

  • logger – logger to report to.

Returns

list of output files: .smi files if return_ligands is True, otherwise .json files.
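The parallelization step can be pictured as splitting the input into subfiles of at most block_size ligands, one per subjob. The sketch below covers only that splitting, under stated assumptions: the helper `split_smi_file` and the `_partN.smi` naming are hypothetical, and how scaffold_dj actually sizes blocks against available CPUs or launches scaffold_analysis_worker subjobs is not shown.

```python
import os
from typing import List


def split_smi_file(input_file: str, block_size: int, out_dir: str) -> List[str]:
    """Hypothetical helper: write consecutive blocks of at most block_size lines."""
    subfiles: List[str] = []
    block: List[str] = []
    stem = os.path.splitext(os.path.basename(input_file))[0]

    def flush() -> None:
        # Write the current block to a new numbered subfile, then reset the block
        path = os.path.join(out_dir, f"{stem}_part{len(subfiles)}.smi")
        with open(path, "w") as fh:
            fh.writelines(block)
        subfiles.append(path)
        block.clear()

    with open(input_file) as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            block.append(line if line.endswith("\n") else line + "\n")
            if len(block) == block_size:
                flush()
    if block:
        flush()  # remaining partial block
    return subfiles
```

Each returned path would then be handed to one subjob; with 5 ligands and block_size=2 the sketch produces three subfiles.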

schrodinger.active_learning.scaffold_methods.aggregate_scaffold_subjobs(scaff_files: list[str]) → tuple[list, list]

This function takes in a list of .json files containing scaffold dictionaries and aggregates them.

Parameters

scaff_files – list of .json files with scaffold dictionaries created by the scaffold_analysis_worker script.

Returns

a tuple of two lists: the scaffold SMILES and the corresponding scaffold counts.
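Aggregation amounts to summing per-scaffold counts across the subjob files. A minimal sketch, assuming each .json file holds a flat mapping of scaffold SMILES to counts (as described for scaffold_dj); the function name `aggregate_scaffold_counts` is hypothetical, and sorting the output by descending count is an assumption, since the documented function does not specify an order.

```python
import json
from collections import Counter
from typing import List, Tuple


def aggregate_scaffold_counts(scaff_files: List[str]) -> Tuple[List[str], List[int]]:
    """Sketch: merge per-subjob {scaffold SMILES: count} JSON files."""
    totals: Counter = Counter()
    for path in scaff_files:
        with open(path) as fh:
            # Assumption: each file holds a flat {smiles: count} mapping;
            # Counter.update adds counts for scaffolds seen in several files
            totals.update(json.load(fh))
    # Assumption: most common scaffolds first
    ranked = totals.most_common()
    smiles = [smi for smi, _ in ranked]
    counts = [n for _, n in ranked]
    return smiles, counts
```

Returning two parallel lists rather than a dict matches the documented tuple[list, list] return type.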

schrodinger.active_learning.scaffold_methods.select_ligands_from_scaffolds(scaffold_dictionary: dict, sample_size: int, output_smi_file: str)

This function reads scaffold data from a scaffold dictionary, samples ligands from different scaffolds, and writes the selected ligand SMILES to an output file. If the scaffold dictionary has more unique scaffolds than ‘sample_size’, it randomly selects ‘sample_size’ scaffolds and selects one ligand from each. Otherwise, the example ligands from all the scaffolds are pooled and sampled from.

Parameters
  • scaffold_dictionary – scaffold dictionary created by the get_scaffold_dictionary function.

  • sample_size – required number of ligands to be selected and written.

  • output_smi_file – file to write the SMILES of output ligands.
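The two sampling branches described above can be sketched as follows. Assumptions, not confirmed by this reference: the dictionary maps each scaffold SMILES to a non-empty list of example ligand SMILES (as produced with return_ligands=True), fewer than sample_size ligands are written when the pooled list is too small, and the function name `select_ligands_sketch` and its `rng` parameter are hypothetical.

```python
import random
from typing import Dict, List, Optional


def select_ligands_sketch(
    scaffold_ligands: Dict[str, List[str]],
    sample_size: int,
    output_smi_file: str,
    rng: Optional[random.Random] = None,
) -> List[str]:
    if rng is None:
        rng = random.Random()
    if len(scaffold_ligands) > sample_size:
        # More scaffolds than requested: pick sample_size distinct scaffolds,
        # then one example ligand from each
        chosen = rng.sample(sorted(scaffold_ligands), sample_size)
        selected = [rng.choice(scaffold_ligands[s]) for s in chosen]
    else:
        # Otherwise pool every example ligand and sample without replacement
        pool = [lig for ligs in scaffold_ligands.values() for lig in ligs]
        selected = rng.sample(pool, min(sample_size, len(pool)))
    with open(output_smi_file, "w") as fh:
        fh.writelines(f"{smi}\n" for smi in selected)
    return selected
```

Picking one ligand per scaffold in the first branch maximizes scaffold diversity when there are plenty of scaffolds to choose from.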

schrodinger.active_learning.scaffold_methods.get_rdkit_mol(smi: str) → Optional[rdkit.Chem.rdchem.Mol]
Parameters

smi – input SMILES string

Returns

an RDKit Mol object for the input SMILES, or None if RDKit fails to create a Mol object from it.

schrodinger.active_learning.scaffold_methods.get_scaffold_dictionary(input_file: str, smiles_column: int = 0, return_ligands: bool = False, ligands_per_scaffold: int = 10) → dict

This function takes an input .smi file and groups the input ligands by their Bemis-Murcko scaffolds. It returns a dictionary with scaffold SMILES as keys and the corresponding scaffold data as values. The scaffold data for a scaffold consists of either its total count in the input file or up to ligands_per_scaffold (10 by default) example ligands with that scaffold.

Parameters
  • input_file – name of the .smi file to read from.

  • smiles_column – index of the SMILES column in the input .smi file.

  • return_ligands – whether to store example ligands instead of counts in the output dictionary.

  • ligands_per_scaffold – maximum number of example ligands to save per scaffold if return_ligands is True.

Returns

dictionary with scaffold SMILES as keys and scaffold counts as values. If return_ligands is True, each value is instead a list of example ligands for the scaffold, whose length is capped by ligands_per_scaffold.
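The grouping logic (counts versus capped lists of example ligands) can be sketched independently of the chemistry by passing the scaffold function in as a parameter. This is an illustration under assumptions, not the module's implementation: `build_scaffold_dictionary` and `scaffold_of` are hypothetical names, the sketch takes a list of SMILES rather than reading the .smi file, and ligands whose scaffold cannot be computed (signalled by None) are assumed to be skipped.

```python
from typing import Callable, Dict, List, Optional, Union


def build_scaffold_dictionary(
    smiles_list: List[str],
    scaffold_of: Callable[[str], Optional[str]],
    return_ligands: bool = False,
    ligands_per_scaffold: int = 10,
) -> Dict[str, Union[int, List[str]]]:
    result: Dict[str, Union[int, List[str]]] = {}
    for smi in smiles_list:
        scaffold = scaffold_of(smi)
        if scaffold is None:
            continue  # assumption: unparsable ligands are skipped
        if return_ligands:
            # Keep at most ligands_per_scaffold examples per scaffold
            examples = result.setdefault(scaffold, [])
            if len(examples) < ligands_per_scaffold:
                examples.append(smi)
        else:
            # Count every ligand that maps to this scaffold
            result[scaffold] = result.get(scaffold, 0) + 1
    return result
```

With RDKit installed, `scaffold_of` could wrap `MurckoScaffoldSmiles` from `rdkit.Chem.Scaffolds.MurckoScaffold`, returning None for SMILES that RDKit cannot parse, in the spirit of get_rdkit_mol.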