schrodinger.application.transforms.filters module

class schrodinger.application.transforms.filters.SubstructureFilter(filters: List[SingleSmartsFilter])

Bases: PTransform

A PTransform that returns structures or molecules that match every SingleSmartsFilter in filters.

Note: SMARTS patterns in the filters should be compatible with implicit H since structures are converted to Mol objects with implicit H before filtering.

Example usage:

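>>> import apache_beam as beam
>>> from rdkit import Chem
>>> from schrodinger.application.transforms.filters import SubstructureFilter
>>> from schrodinger.structutils.filter import SingleSmartsFilter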
>>> filters = [SingleSmartsFilter(
...   smarts='Br', min_matches=0, max_matches=1, name='Bromine')]
>>> with beam.Pipeline() as p:
...     output = (p
...     | beam.Create(['CC', 'CBr', 'BrCBr'])
...     | beam.Map(lambda smiles: Chem.MolFromSmiles(smiles))
...     | SubstructureFilter(filters)
...     | beam.Map(lambda mol: Chem.MolToSmiles(mol))
...     | beam.LogElements()
...     )
CC
CBr

Parameters:

filters – The SingleSmartsFilters that must all match.

__init__(filters: List[SingleSmartsFilter])
Parameters:

filters – the SingleSmartsFilters that must all match

classmethod FromFilterFile(path: Union[str, Path]) → Self

Load substructure filters from an optionally encrypted file.

The file should contain one filter per line, in the format:

<smarts> <min_matches> <max_matches>[ <name>]
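
As an illustration of the line format above, here is a minimal sketch of a parser for a single, unencrypted filter line. It assumes the fields are separated by whitespace and that the optional name may itself contain spaces. FilterSpec and parse_filter_line are hypothetical names used only for this example; they are not part of the schrodinger API.

```python
from typing import NamedTuple, Optional


class FilterSpec(NamedTuple):
    """Hypothetical container for one parsed filter line."""
    smarts: str
    min_matches: int
    max_matches: int
    name: Optional[str] = None


def parse_filter_line(line: str) -> FilterSpec:
    """Parse '<smarts> <min_matches> <max_matches>[ <name>]'."""
    # Split into at most four fields so a name containing spaces stays intact.
    # Whitespace-separated fields are an assumption of this sketch.
    fields = line.strip().split(maxsplit=3)
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields: {line!r}")
    name = fields[3] if len(fields) == 4 else None
    return FilterSpec(fields[0], int(fields[1]), int(fields[2]), name)
```

For example, parse_filter_line('Br 0 1 Bromine') yields a FilterSpec with smarts 'Br', a match window of 0 to 1, and the name 'Bromine'.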

writeFilterFile(path: pathlib.Path | str) → Path

Write substructure filters to an optionally encrypted file.

normalize(mols: list[rdkit.Chem.rdchem.Mol])

Modify the substructure filter by adjusting the single SMARTS filters to prevent them from rejecting the provided molecules.

Parameters:

mols – the molecules to adjust the substructure filter for
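
The documentation does not say how the filters are adjusted. One plausible reading, sketched below purely as an assumption, is that each filter's match-count window is widened until it covers the match counts observed in the provided molecules. MatchWindow and widen are hypothetical names; the observed counts are passed in directly rather than computed from SMARTS matches.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class MatchWindow:
    """Hypothetical stand-in for a SingleSmartsFilter's match-count limits."""
    smarts: str
    min_matches: int
    max_matches: int


def widen(window: MatchWindow, observed_counts: List[int]) -> MatchWindow:
    """Return a window that accepts every observed match count.

    Assumption: normalizing means only ever loosening the limits.
    """
    if not observed_counts:
        return window
    return replace(
        window,
        min_matches=min(window.min_matches, min(observed_counts)),
        max_matches=max(window.max_matches, max(observed_counts)),
    )
```

Under this reading, a 'Br' window of 0 to 1 normalized against molecules with 0 and 2 bromine matches becomes 0 to 2.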

exclude(smarts: list[str])

Modify the substructure filter so that any molecule that matches any SMARTS pattern in the smarts argument will be excluded from the output.

Parameters:

smarts – the SMARTS patterns

class schrodinger.application.transforms.filters.PropertySpaceFilter(property_ranges: Dict[str, List[float]], uncharge: bool = False)

Bases: PTransform

A PTransform that returns structures or molecules based on RDKit property ranges, with the option to uncharge the input before filtering.

Parameters:

property_ranges – Dictionary containing property names as keys and lists of two floats as values, representing the minimum and maximum values for the property range.

Possible properties include all RDKit descriptors. For a comprehensive list, see schrodinger.rdkit.descriptors.DESCRIPTORS_DICT.

Example:

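>>> import apache_beam as beam
>>> from rdkit import Chem
>>> from schrodinger.application.transforms.filters import PropertySpaceFilter
>>> property_ranges = {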
...     'MolWt': [0, 100], # from rdkit's Lipinski descriptors
...     'NumAromaticRings': [1, 1]
... }
>>> smiles = ['c1ccccc1', 'Brc1ccccc1', 'CC']
>>> with beam.Pipeline() as p:
...     output = (p
...     | beam.Create(smiles)
...     | beam.Map(lambda smiles: Chem.MolFromSmiles(smiles))
...     | PropertySpaceFilter(property_ranges)
...     | beam.Map(lambda mol: Chem.MolToSmiles(mol))
...     | beam.LogElements()
...     )
c1ccccc1

__init__(property_ranges: Dict[str, List[float]], uncharge: bool = False)

expand(molecules)

class schrodinger.application.transforms.filters.StructurePropertyFilter(property_ranges: Dict[str, List[float]])

Bases: PTransform

A PTransform that rejects structures that have one or more property values outside the allowed range as defined by the property_ranges.

Properties that are not on the structure will not be used as filters.
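
The rule above can be sketched in plain Python. The example assumes that both range bounds are inclusive and represents a structure's properties as a simple mapping. passes_property_ranges and the property names in the usage note are hypothetical.

```python
from typing import Dict, List, Mapping


def passes_property_ranges(
    properties: Mapping[str, float],
    property_ranges: Dict[str, List[float]],
) -> bool:
    """Return True unless a present property falls outside its range."""
    for name, (low, high) in property_ranges.items():
        if name not in properties:
            # Properties missing from the structure are not used as filters.
            continue
        # Inclusive bounds are an assumption of this sketch.
        if not low <= properties[name] <= high:
            return False
    return True
```

With ranges {'r_user_score': [0, 10], 'r_user_other': [1, 2]}, a structure carrying only r_user_score = 5.0 passes, because the missing r_user_other property is ignored.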

__init__(property_ranges: Dict[str, List[float]])
Parameters:

property_ranges – the property ranges to filter on

expand(pcoll)

class schrodinger.application.transforms.filters.FepAmenable(fep_references_path: Path, max_hac_diff: int = 10, core_smarts: str = '')

Bases: PTransform

A PTransform that returns molecules that have a perturbation that is amenable to FEP calculations.

A perturbation is considered acceptable if the number of heavy atoms outside the maximum common substructure (MCS) shared with the FEP references is less than or equal to max_hac_diff.

The core_smarts parameter can be used to specify a SMARTS pattern that is used to speed up filtering by avoiding the MCS calculation if possible.

__init__(fep_references_path: Path, max_hac_diff: int = 10, core_smarts: str = '')
Parameters:
  • fep_references_path – the path to the FEP references SMILES file

  • max_hac_diff – the maximum number of heavy atoms not part of the maximum common substructure with molecules in the FEP references
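
The acceptance criterion can be illustrated with plain integers. This sketch assumes that the difference is counted on the candidate molecule only and that a match against any single reference is sufficient; neither detail is confirmed by the documentation. The MCS sizes are taken as given, and within_hac_limit is a hypothetical name.

```python
from typing import Iterable


def within_hac_limit(
    mol_heavy_atoms: int,
    mcs_sizes: Iterable[int],
    max_hac_diff: int = 10,
) -> bool:
    """Check the heavy-atom difference against each reference's MCS size.

    mcs_sizes holds, per reference, the number of heavy atoms in the MCS
    shared with the candidate (computed elsewhere).
    """
    return any(
        mol_heavy_atoms - mcs_size <= max_hac_diff for mcs_size in mcs_sizes
    )
```

A 30-heavy-atom candidate sharing a 22-atom MCS with one reference has 8 atoms outside the MCS and would pass with the default max_hac_diff of 10.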

expand(pcoll)

validate_mol(mol: Mol, what='Molecule') → None
Raises:

ValueError – if mol is not considered FepAmenable

class schrodinger.application.transforms.filters.DistinctStructures(label: Optional[str] = None)

Bases: PTransform

A PTransform that returns the unique structures based on the SMILES.
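
Outside of a pipeline, deduplication by SMILES amounts to keeping one element per key. The sketch below works on an in-memory sequence of SMILES strings that are assumed to be canonical already. It keeps the first occurrence and preserves order, which a distributed pipeline does not necessarily guarantee. distinct_by_smiles is a hypothetical name.

```python
from typing import Iterable, Iterator


def distinct_by_smiles(smiles_iter: Iterable[str]) -> Iterator[str]:
    """Yield each SMILES string the first time it is seen."""
    seen = set()
    for smiles in smiles_iter:
        if smiles not in seen:
            seen.add(smiles)
            yield smiles
```

For example, list(distinct_by_smiles(['CC', 'CBr', 'CC'])) keeps 'CC' and 'CBr' once each.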

expand(pcoll)

class schrodinger.application.transforms.filters.TanimotoFilter(references: Iterable[Mol], threshold: float, ignored_smarts: str = '', larger_is_better: bool = True)

Bases: PTransform

A PTransform that returns molecules whose Tanimoto similarity score to at least one molecule in references is better than threshold. What is considered better depends on the larger_is_better parameter.

The optional ignored_smarts parameter can be used to ignore certain atoms in the Tanimoto similarity calculation.

The larger_is_better parameter determines whether a similarity score greater than or equal to the threshold is required to pass the filter (default: True).

__init__(references: Iterable[Mol], threshold: float, ignored_smarts: str = '', larger_is_better: bool = True)
Parameters:
  • references – the molecules to compare against

  • ignored_smarts – the SMARTS pattern for the atoms to ignore in the Tanimoto similarity calculation

  • threshold – the Tanimoto similarity threshold

  • larger_is_better – whether a larger similarity score is better
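
The pass/fail logic can be sketched with fingerprints represented as sets of on-bit indices, for which the Tanimoto similarity is the size of the intersection divided by the size of the union. The sketch assumes that with larger_is_better=False a score less than or equal to the threshold passes; the documentation states only the larger_is_better=True case. tanimoto and passes_tanimoto are hypothetical names, and ignored_smarts is not modeled.

```python
from typing import AbstractSet, Iterable


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two sets of fingerprint on-bits."""
    union = len(a | b)
    # Two empty fingerprints score 0.0 here; that convention is an assumption.
    return len(a & b) / union if union else 0.0


def passes_tanimoto(
    candidate: AbstractSet[int],
    references: Iterable[AbstractSet[int]],
    threshold: float,
    larger_is_better: bool = True,
) -> bool:
    """Pass if the score against at least one reference is good enough."""
    scores = (tanimoto(candidate, ref) for ref in references)
    if larger_is_better:
        return any(score >= threshold for score in scores)
    # Assumed mirror image of the documented case.
    return any(score <= threshold for score in scores)
```

The fingerprints {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a similarity of 0.5, so the candidate passes a threshold of 0.5 but not 0.6 when larger_is_better is True.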

expand(pcoll)

class schrodinger.application.transforms.filters.ChargedNHpKaFilter(min_pka: float, max_pka: float, exclude_smarts: Optional[str] = None)

Bases: PTransform

A PTransform that passes a structure only if all known pKa values of the hydrogens on formally charged nitrogen atoms fall within the range from min_pka to max_pka (bounds included).

The pKa values should be stored in the r_epik_H2O_pKa atom property, as is customarily done by ligprep. If the atom property is not defined, the atom is considered to have an acceptable pKa value.
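
The rule can be sketched over a plain list of per-atom pKa values, with None standing for an atom on which the property is not defined. Selecting the relevant atoms and reading r_epik_H2O_pKa from a structure are outside the scope of this sketch, and exclude_smarts is not modeled. passes_pka is a hypothetical name.

```python
from typing import Iterable, Optional


def passes_pka(
    pka_values: Iterable[Optional[float]],
    min_pka: float,
    max_pka: float,
) -> bool:
    """Pass if every known pKa lies in [min_pka, max_pka]."""
    # A missing value (None) counts as acceptable, as described above.
    return all(
        pka is None or min_pka <= pka <= max_pka for pka in pka_values
    )
```

With a range of 6.0 to 9.0, the values [7.5, None] pass, while [7.5, 10.2] do not.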

__init__(min_pka: float, max_pka: float, exclude_smarts: Optional[str] = None)

expand(pcoll)