schrodinger.application.transforms.enumerators.substitute module

This module includes steps for running fragment-clique-based transform enumerations.

A fragment clique is defined as a list of fragments (represented as SMARTS). For example:

["F[*:1]", "Cl[*:1]", "Br[*:1]"]

The above clique has three fragments matching the fluoro, chloro, and bromo groups.

For every clique, SubstructureSubstitute will:

  1. find every fragment (FRAG_A, …) that matches the ‘transformable’ part of
     the input molecule

  2. enumerate the transformations of the input compound FRAG_A >> FRAG_B,
     where FRAG_B is each of the other fragments in the clique.
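
The pairing in step 2 can be sketched in plain Python. This is only an illustration of the enumeration logic, not the module's implementation: clique_transforms is a hypothetical helper, and it assumes the matching fragment has already been identified (the real step finds it by substructure matching against the input molecule).

```python
from typing import List


def clique_transforms(clique: List[str], frag_a: str) -> List[str]:
    """Return 'FRAG_A>>FRAG_B' strings for every other fragment in the clique.

    Hypothetical helper for illustration only; matching FRAG_A against the
    input molecule is assumed to have happened already.
    """
    if frag_a not in clique:
        # A fragment outside the clique has no partners to transform into.
        return []
    return [f"{frag_a}>>{frag_b}" for frag_b in clique if frag_b != frag_a]


clique = ["F[*:1]", "Cl[*:1]", "Br[*:1]"]
# Fluorobenzene matches the fluoro fragment, so FRAG_A is "F[*:1]".
transforms = clique_transforms(clique, "F[*:1]")
print(transforms)  # ['F[*:1]>>Cl[*:1]', 'F[*:1]>>Br[*:1]']
```

For a fluorobenzene input, these two transforms correspond to the chloro and bromo products.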

If core_smarts is not set, the whole input molecule is considered transformable; otherwise, everything except the core atoms more than n_pair_bonds bonds into the core is considered transformable.

Example:

>>> from pathlib import Path
>>> import json
>>> from rdkit import Chem
>>> import apache_beam as beam
>>> from schrodinger.application.transforms.enumerators import SubstructureSubstitute
>>> from schrodinger.seam.testing.util import assert_that, equal_to
>>>
>>> # write the cliques file, so we can use it in the configuration
>>> CLIQUES_PATH = Path('cliques.json')
>>> with open(CLIQUES_PATH, 'w') as fh:
...     json.dump([["F[*:1]", "Cl[*:1]", "Br[*:1]"]], fh)
>>>
>>> TRANSFORMS_PATH = Path('transforms.json')
>>> with open(TRANSFORMS_PATH, 'w') as fh:
...     json.dump([], fh)
>>>
>>> PHENYL_SMARTS = 'c1:c:c:c:c:c:1'
>>> input_mol = Chem.MolFromSmiles("Fc1ccccc1")
>>> with beam.Pipeline() as p:
...     outputs = (
...         p
...         | beam.Create([input_mol])
...         | SubstructureSubstitute(
...             core_smarts=PHENYL_SMARTS,
...             transforms_path=TRANSFORMS_PATH,
...             cliques_path=CLIQUES_PATH)
...         | beam.Map(lambda mol: Chem.MolToSmiles(mol)))
...     assert_that(outputs, equal_to(["Brc1ccccc1", "Clc1ccccc1"]))
class schrodinger.application.transforms.enumerators.substitute.SmilesTransformPair(smi: str, transform: str)

Bases: object

smi: str
transform: str
__init__(smi: str, transform: str) None
class schrodinger.application.transforms.enumerators.substitute.SubstructureSubstitute(core_smarts: Optional[str] = None, transforms_path: Optional[pathlib.Path] = None, cliques_path: Optional[pathlib.Path] = None, sample_size: int = 500000, n_pair_bonds: int = 3, n_apply_bonds: int = 1)

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform that returns all matched-molecular-pair transformed molecules based on the fragment cliques in cliques_path, optionally protecting the core_smarts.

Because not every transform one wants to apply can be expressed as a clique, combinations with the fragments in the optional transforms_path are always generated as well.

Note that only the first occurrence of the core_smarts in the molecule determines which part is protected. If more than one match is possible, the other matches are never protected and may therefore be modified.

__init__(core_smarts: Optional[str] = None, transforms_path: Optional[pathlib.Path] = None, cliques_path: Optional[pathlib.Path] = None, sample_size: int = 500000, n_pair_bonds: int = 3, n_apply_bonds: int = 1)
Parameters
  • core_smarts – the optional core SMARTS string used for fragment matching and protection.

  • transforms_path – the optional JSON file listing the transforms that are always applied (first). Set to None to use the default file.

  • cliques_path – the optional JSON file of the fragment cliques used for enumeration (if gzipped, the file name must end with ‘gz’). Set to None to use the default file.

  • sample_size – the maximum number of randomly sampled outputs to yield from the cliques in cliques_path.

  • n_pair_bonds – the number of bonds beyond which atoms of the core are included for fragment matching (extension of the R-group atoms).

  • n_apply_bonds – the number of bonds beyond which atoms of the core are protected.
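
A gzipped cliques file is accepted as long as its name ends with ‘gz’. The following is a minimal sketch using only the standard library, assuming the file holds a JSON list of cliques as in the module example; the file name cliques.json.gz is an arbitrary choice, and how the module itself reads the file is not shown here.

```python
import gzip
import json
from pathlib import Path

# One clique per inner list, each fragment given as a SMARTS string.
cliques = [["F[*:1]", "Cl[*:1]", "Br[*:1]"]]

# A gzipped cliques file must have a name ending in 'gz'.
cliques_path = Path("cliques.json.gz")
with gzip.open(cliques_path, "wt", encoding="utf-8") as fh:
    json.dump(cliques, fh)

# Read the file back to confirm the round trip.
with gzip.open(cliques_path, "rt", encoding="utf-8") as fh:
    loaded = json.load(fh)
```

The resulting path could then be passed as cliques_path.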

expand(pcoll)
default_label() str
class schrodinger.application.transforms.enumerators.substitute.Substitute(transform_smarts: apache_beam.pvalue.PCollection[str])

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform that returns unique, sanitized, desalted, and uncharged molecules after applying the transform_smarts.

__init__(transform_smarts: apache_beam.pvalue.PCollection[str])
Parameters

transform_smarts – the reaction SMARTS for the transformation

expand(pcoll)