schrodinger.application.transforms.enumerators.substitute module
This module includes steps for running fragment-clique-based transform enumerations.
A fragment clique is defined as a list of fragments (represented as SMARTS). For example:
["F[*:1]", "Cl[*:1]", "Br[*:1]"]
The above clique has three fragments matching the fluoro, chloro, and bromo groups.
For every clique, SubstructureSubstitute does the following:
- for every fragment (FRAG_A, …) that matches the ‘transformable’ part of the input molecule,
  - enumerate the transformation FRAG_A >> FRAG_B of the input compound, where FRAG_B is each of the other fragments in the clique.
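The enumeration loop above can be sketched in plain Python. `clique_transforms` is a hypothetical helper, not part of the module: the `matched` set stands in for the SMARTS substructure matching that SubstructureSubstitute performs on the transformable atoms, and the `FRAG_A>>FRAG_B` strings follow the notation used above rather than a verified internal format.

```python
from itertools import permutations


def clique_transforms(clique, matched):
    """Hypothetical helper: build 'FRAG_A>>FRAG_B' for every fragment FRAG_A
    that matched the molecule and every other fragment FRAG_B in the clique."""
    return [
        f"{frag_a}>>{frag_b}"
        # permutations(..., 2) yields every ordered pair of distinct fragments
        for frag_a, frag_b in permutations(clique, 2)
        if frag_a in matched
    ]


clique = ["F[*:1]", "Cl[*:1]", "Br[*:1]"]
# Pretend only the fluoro fragment matched the transformable part.
print(clique_transforms(clique, matched={"F[*:1]"}))
# ['F[*:1]>>Cl[*:1]', 'F[*:1]>>Br[*:1]']
```

This mirrors the example below, where fluorobenzene yields only the chloro and bromo products.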
If core_smarts is not set, the whole input molecule is considered transformable; otherwise, the atoms outside the core, extended up to n_pair_bonds bonds into the core, are considered transformable.
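One possible reading of this rule, sketched in plain Python: with no core, every atom is transformable; with a core, a breadth-first search starts from every atom outside the core and walks at most n_pair_bonds bonds into it. This interpretation is an assumption drawn from the n_pair_bonds parameter description ("extension of the R-group atoms"), not from the module's code, and the molecule is a hypothetical adjacency dict of atom indices rather than an RDKit molecule.

```python
from collections import deque


def transformable_atoms(adjacency, core=None, n_pair_bonds=3):
    """Assumed reading: atoms outside the core, extended up to
    n_pair_bonds bonds into the core; with no core, every atom."""
    atoms = set(adjacency)
    if core is None:
        return atoms
    core = set(core)
    # Multi-source BFS: every non-core (R-group) atom starts at distance 0.
    dist = {atom: 0 for atom in atoms - core}
    queue = deque(dist)
    while queue:
        atom = queue.popleft()
        if dist[atom] >= n_pair_bonds:
            continue  # do not walk further into the core
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return set(dist)


# Fluorobenzene as a hypothetical adjacency dict: atom 0 is F, 1-6 the ring.
FLUOROBENZENE = {
    0: [1],
    1: [0, 2, 6], 2: [1, 3], 3: [2, 4],
    4: [3, 5], 5: [4, 6], 6: [5, 1],
}
print(transformable_atoms(FLUOROBENZENE, core=range(1, 7), n_pair_bonds=1))
# {0, 1}
```

Under this reading, raising n_pair_bonds to 3 would also pull ring atoms 2, 3, 5, and 6 into the transformable set, leaving only the para carbon (atom 4) untouched.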
Example:
>>> from pathlib import Path
>>> import json
>>> from rdkit import Chem
>>> import apache_beam as beam
>>> from schrodinger.application.transforms.enumerators import SubstructureSubstitute
>>> from schrodinger.seam.testing.util import assert_that, equal_to
>>>
>>> # write the cliques file, so we can use it in the configuration
>>> CLIQUES_PATH = Path('cliques.json')
>>> with open(CLIQUES_PATH, 'w') as fh:
...     json.dump([["F[*:1]", "Cl[*:1]", "Br[*:1]"]], fh)
>>>
>>> # write an empty transforms file, so the default transforms are not used
>>> TRANSFORMS_PATH = Path('transforms.json')
>>> with open(TRANSFORMS_PATH, 'w') as fh:
...     json.dump([], fh)
>>>
>>> PHENYL_SMARTS = 'c1:c:c:c:c:c:1'
>>> fluorobenzene = Chem.MolFromSmiles("Fc1ccccc1")
>>> with beam.Pipeline() as p:
...     outputs = (
...         p
...         | beam.Create([fluorobenzene])
...         | SubstructureSubstitute(
...             core_smarts=PHENYL_SMARTS,
...             transforms_path=TRANSFORMS_PATH,
...             cliques_path=CLIQUES_PATH)
...         | beam.Map(Chem.MolToSmiles))
...     assert_that(outputs, equal_to(["Brc1ccccc1", "Clc1ccccc1"]))
- class schrodinger.application.transforms.enumerators.substitute.SmilesTransformPair(smi: str, transform: str)
Bases: object
- smi: str
- transform: str
- __init__(smi: str, transform: str) → None
- class schrodinger.application.transforms.enumerators.substitute.SubstructureSubstitute(core_smarts: Optional[str] = None, transforms_path: Optional[pathlib.Path] = None, cliques_path: Optional[pathlib.Path] = None, sample_size: int = 500000, n_pair_bonds: int = 3, n_apply_bonds: int = 1)
Bases: apache_beam.transforms.ptransform.PTransform
A PTransform that returns all matched-molecular-pair transformed molecules based on the fragment cliques in cliques_path, optionally protecting the core_smarts.
Because not every transform one may want to apply can be expressed as a clique, combinations with the fragments in the optional transforms_path are always generated.
Note that only the first occurrence of core_smarts in the molecule determines which part is protected. If more than one match is possible, the other matches are never protected and may therefore be modified.
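The first-match caveat can be illustrated with a few lines of plain Python. `protected_core_atoms` is a hypothetical helper, and the match list is assumed to be a sequence of atom-index tuples in the order a substructure search returned them; the module's own protection logic is not shown on this page.

```python
def protected_core_atoms(core_matches):
    """Hypothetical helper: protect only the first core match.

    core_matches is a sequence of atom-index tuples, one per match, in the
    order a substructure search returned them.
    """
    if not core_matches:
        return set()
    return set(core_matches[0])


# Hypothetical atom indices for two phenyl matches in a biphenyl-like molecule.
matches = [(0, 1, 2, 3, 4, 5), (6, 7, 8, 9, 10, 11)]
protected = protected_core_atoms(matches)
# Atoms of the second ring are left unprotected and may be modified.
unprotected_core = set(matches[1]) - protected
print(sorted(protected), sorted(unprotected_core))
# [0, 1, 2, 3, 4, 5] [6, 7, 8, 9, 10, 11]
```

When a particular ring must stay intact, a core_smarts specific enough to match only once avoids depending on match order.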
- __init__(core_smarts: Optional[str] = None, transforms_path: Optional[pathlib.Path] = None, cliques_path: Optional[pathlib.Path] = None, sample_size: int = 500000, n_pair_bonds: int = 3, n_apply_bonds: int = 1)
- Parameters
core_smarts – the optional core SMARTS string used for fragment matching and protection.
transforms_path – optional JSON file containing the list of transforms that are always applied (first). Set to None to use the default file.
cliques_path – optional JSON file of the fragment cliques used for enumeration; if gzipped, the file name must end with ‘gz’. Set to None to use the default file.
sample_size – the maximum number of randomly sampled outputs to yield from the cliques_path file.
n_pair_bonds – the number of bonds beyond which atoms of the core are included for fragment matching (extension of the R-group atoms).
n_apply_bonds – the number of bonds beyond which atoms of the core are protected.
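The cliques file is a JSON list of SMARTS lists, as written in the example above. A minimal standard-library sketch for writing and reading it, gzip-compressing when the file name ends in ‘gz’ to match the cliques_path rule; `write_cliques` and `read_cliques` are hypothetical helpers, and this page does not document how the module detects compression beyond that naming rule.

```python
import gzip
import json
import tempfile
from pathlib import Path


def _opener(path):
    # Mirror the documented naming rule: gzipped files end with 'gz'.
    return gzip.open if path.name.endswith("gz") else open


def write_cliques(path, cliques):
    """Hypothetical helper: write fragment cliques as JSON."""
    path = Path(path)
    with _opener(path)(path, "wt", encoding="utf-8") as fh:
        json.dump(cliques, fh)
    return path


def read_cliques(path):
    """Hypothetical helper: read fragment cliques back from JSON."""
    path = Path(path)
    with _opener(path)(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


cliques = [["F[*:1]", "Cl[*:1]", "Br[*:1]"]]
with tempfile.TemporaryDirectory() as tmp:
    gz_path = write_cliques(Path(tmp) / "cliques.json.gz", cliques)
    print(read_cliques(gz_path) == cliques)  # True
```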
- expand(pcoll)
- default_label() → str
- class schrodinger.application.transforms.enumerators.substitute.Substitute(transform_smarts: apache_beam.pvalue.PCollection[str])
Bases: apache_beam.transforms.ptransform.PTransform
A PTransform that returns unique, sanitized, desalted, and uncharged molecules after applying the transform_smarts.
- __init__(transform_smarts: apache_beam.pvalue.PCollection[str])
- Parameters
transform_smarts – the reaction smarts for the transformation
- expand(pcoll)