schrodinger.application.matsci.rdpattern module

Module to generate and evaluate SMILES/SMARTS patterns for both coarse-grained (CG) and all-atom (AA) structures using RDKit.

schrodinger.application.matsci.rdpattern.to_smarts(struct, sanitize=True, include_stereo=True, atom_subset=None, check_connectivity=False)

Get SMARTS for a given structure

  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. This also enables inclusion of stereochemistry information in the SMARTS. Setting it to False can speed up the conversion substantially. For CG systems, include_stereo is always False.

  • check_connectivity (bool) – Whether to check that all the atoms in atom_subset (or in the entire structure if it is None) belong to one molecule. Raises ValueError if that is not the case.

  • atom_subset (list) – List of atom indices. If None, the SMARTS for the full structure is computed.

Return type



SMARTS pattern for the atom ids provided.
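The check_connectivity behavior described above can be pictured with a small pure-Python sketch. This is not the module's implementation: the helper name `check_single_molecule` and the `molecule_of` mapping (atom index to molecule number) are hypothetical stand-ins for information that the structure would normally provide.

```python
def check_single_molecule(atom_subset, molecule_of):
    """Raise ValueError unless all atoms belong to one molecule.

    atom_subset: iterable of atom indices, or None for the whole structure.
    molecule_of: hypothetical dict mapping atom index -> molecule number.
    """
    indices = molecule_of.keys() if atom_subset is None else atom_subset
    molecules = {molecule_of[index] for index in indices}
    if len(molecules) > 1:
        raise ValueError(
            f"Atoms span {len(molecules)} molecules: {sorted(molecules)}")


# Two waters: atoms 1-3 form molecule 1, atoms 4-6 form molecule 2.
molecule_of = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
check_single_molecule([1, 2, 3], molecule_of)  # passes silently

try:
    check_single_molecule([3, 4], molecule_of)
except ValueError as err:
    message = str(err)
```

Running the check before pattern generation turns a silently disconnected pattern into an explicit error, which is presumably why the option exists.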

schrodinger.application.matsci.rdpattern.to_smiles(struct, sanitize=True, include_stereo=True, atom_ids=None, fall_back=False)

Get SMILES for a given structure

  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. This also enables inclusion of stereochemistry information in the SMILES. For CG systems, include_stereo is always False.

  • atom_ids (list) – List of atom indices. If None, the SMILES for the full structure is computed.

  • fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.

Return type



SMILES pattern for the atom ids provided
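The interaction between sanitize and fall_back can be summarized in a short sketch. It only mirrors the documented flag semantics. The `convert` callable, and the assumption that a failed sanitization surfaces as ValueError, are hypothetical and not taken from the module.

```python
def convert_with_fall_back(convert, sanitize=True, fall_back=False):
    """Return (result, sanitized) following the documented flag semantics.

    convert: hypothetical callable accepting sanitize=bool; it is assumed
        to raise ValueError when sanitization fails.
    """
    if not sanitize:
        # fall_back is ignored when no sanitization is requested.
        return convert(sanitize=False), False
    try:
        return convert(sanitize=True), True
    except ValueError:
        if not fall_back:
            raise
        # Retry without sanitization instead of failing.
        return convert(sanitize=False), False


def fake_convert(sanitize):
    """Stand-in converter whose sanitization always fails."""
    if sanitize:
        raise ValueError("sanitization failed")
    return "unsanitized-pattern"


result, sanitized = convert_with_fall_back(fake_convert, fall_back=True)
```

Returning the sanitization status next to the result lets a caller tell whether the fallback path was taken.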


Checks if rdkit molecule has stereo centers.


query (rdkit.Chem.rdchem.Mol) – Input molecule

Return type



Whether molecule has stereo centers

schrodinger.application.matsci.rdpattern.evaluate_smarts(struct, smarts, is_cg=None, sanitize=True, uniquify=True, max_matches=1000000000)

Get the list of matches for the passed SMARTS pattern in the reference structure

  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • smarts (str) – SMARTS pattern to find

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • uniquify (bool) – if True, return only unique sets of matching atoms

  • max_matches (int) – the maximum number of matches to return

Return type

list or None


List of lists of atom/particle indices that match the SMARTS.
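The effect of uniquify and max_matches can be illustrated without any chemistry toolkit. The sketch below assumes that "unique" means "the same set of atoms regardless of order" and that the cap is applied to the returned list. Both are assumptions about the semantics, and `filter_matches` is a hypothetical helper. RDKit's own `Mol.GetSubstructMatches` accepts `uniquify` and `maxMatches` keyword arguments, but whether this function forwards to it is not stated in the source.

```python
def filter_matches(matches, uniquify=True, max_matches=1000000000):
    """Drop permutation duplicates and cap the number of matches."""
    kept = []
    seen = set()
    for match in matches:
        if len(kept) >= max_matches:
            break
        key = frozenset(match)  # atom order is ignored for uniqueness
        if uniquify and key in seen:
            continue
        seen.add(key)
        kept.append(list(match))
    return kept


# A symmetric pattern can match the same atom pair in both directions.
raw = [(1, 2), (2, 1), (5, 6), (6, 5)]
unique = filter_matches(raw)
everything = filter_matches(raw, uniquify=False)
capped = filter_matches(raw, max_matches=1)
```

With uniquify=False every permutation is kept, which matters when the atom order within a match carries meaning for the caller.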

schrodinger.application.matsci.rdpattern.evaluate_smarts_by_molecule(struct, smarts, method=SMARTS_METHOD.internal, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None)

Takes a structure and a SMARTS pattern and returns a list of all matching atom indices, where each element in the list is a group of atoms that match the SMARTS pattern.

  • struct (structure.Structure) – the structure to search

  • smarts (str) – the SMARTS pattern to match

  • method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS

  • uniquify (bool) – if True, return only unique sets of matching atoms

  • matches_by_mol (bool) – If True, return a dictionary of matches keyed by molecule number rather than a list of matches

  • molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure

Return type

list or dict


If matches_by_mol is False, a list in which each item is a list of atom indices matching the SMARTS pattern. If matches_by_mol is True, a dict whose keys are molecule indices and whose values are lists of matches for that molecule.
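The two return shapes, and the effect of molecule_numbers, can be sketched as follows. The helper `group_matches` and the `molecule_of` mapping are hypothetical, and the sketch assumes that every match lies entirely within one molecule.

```python
from collections import defaultdict


def group_matches(matches, molecule_of, matches_by_mol=False,
                  molecule_numbers=None):
    """Return matches as a flat list or as a dict keyed by molecule number.

    molecule_of: hypothetical dict mapping atom index -> molecule number.
    """
    grouped = defaultdict(list)
    for match in matches:
        # Assumption: all atoms of a match share one molecule.
        mol_number = molecule_of[match[0]]
        if molecule_numbers is not None and mol_number not in molecule_numbers:
            continue
        grouped[mol_number].append(list(match))
    if matches_by_mol:
        return dict(grouped)
    return [match for number in sorted(grouped) for match in grouped[number]]


molecule_of = {1: 1, 2: 1, 3: 2, 4: 2}
matches = [(1, 2), (3, 4)]
as_list = group_matches(matches, molecule_of)
as_dict = group_matches(matches, molecule_of, matches_by_mol=True)
only_second = group_matches(matches, molecule_of, molecule_numbers={2})
```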

schrodinger.application.matsci.rdpattern.get_name_element_mapper_cg(struct, is_cg=None)

Gets the mapper between schrodinger particle name and rdkit proxy element name if the structure is CG. None if the structure is AA.

  • struct (structure.Structure) – Structure used for creating the mapper.

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check


Internal mapping dict between schrodinger particle name and rdkit proxy element name if the structure is CG. Return None if it is AA system or no structure is passed.

Return type

dict or None


RuntimeError – If element mapper cannot be generated for the coarse-grained structure

schrodinger.application.matsci.rdpattern.validate_smarts(smarts, struct=None, is_cg=None)

Validate SMARTS. Works with both AA and CG structures.

  • smarts (str) – SMARTS to validate

  • struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

Return type

str or None


Error message on error, None if SMARTS is valid

schrodinger.application.matsci.rdpattern.has_stereo_smarts(smarts, struct=None, is_cg=None)

Check if SMARTS requires stereo. Works both with AA and CG.


  • smarts (str or list[str]) – SMARTS to validate

  • struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

Return type



Whether any SMARTS require stereo or not


Transforms chemical symbols into the corresponding atomic numbers.


symbol (str) – Chemical symbol

Return type



Atomic number corresponding to the entered symbol if it is a valid atomic symbol, otherwise None
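The "atomic number or None" contract can be shown with a dictionary lookup. The table below is deliberately truncated and the function name `symbol_to_atomic_number` is hypothetical. The real function presumably consults a complete periodic table.

```python
# Truncated lookup table, for illustration only.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "Si": 14}


def symbol_to_atomic_number(symbol):
    """Return the atomic number for a symbol, or None if it is unknown."""
    return ATOMIC_NUMBERS.get(symbol)


silicon = symbol_to_atomic_number("Si")
unknown = symbol_to_atomic_number("Xx")
```

Returning None rather than raising lets callers treat CG particle names, which are not element symbols, as an ordinary case.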

schrodinger.application.matsci.rdpattern.get_sdgr_atom_index(mol, rdkit_atom_indices)

Generator that yields Schrodinger atom indices from rdkit indices.

  • mol (rdkit.Chem.rdchem.Mol) – Molecule

  • rdkit_atom_indices (iterable) – Iterable of mol atom indices

Yield int

Schrodinger atom index
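A generator of this shape can be sketched with a plain lookup table. RDKit atom indices are 0-based and Schrodinger atom indices are 1-based, and the two need not differ by a constant offset when atoms are omitted from the RDKit mol. How the real function stores the correspondence is not stated in the source, so the `rdkit_to_sdgr` list and the name `iter_sdgr_indices` are hypothetical.

```python
def iter_sdgr_indices(rdkit_to_sdgr, rdkit_atom_indices):
    """Lazily yield the Schrodinger index for each RDKit index.

    rdkit_to_sdgr: hypothetical sequence where position i holds the
        Schrodinger index of RDKit atom i.
    """
    for rdkit_index in rdkit_atom_indices:
        yield rdkit_to_sdgr[rdkit_index]


# Three heavy atoms whose Schrodinger indices are 1, 3 and 5, for example
# because atoms 2 and 4 are hydrogens that were left implicit.
rdkit_to_sdgr = [1, 3, 5]
match = (0, 2)
sdgr_indices = list(iter_sdgr_indices(rdkit_to_sdgr, match))
```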

class schrodinger.application.matsci.rdpattern.Pattern(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)

Bases: object

A class to calculate SMARTS and SMILES patterns for a structure multiple times. The class can be memory intensive to allow for increased speed.

PROTECTED_PATTERN_BIT = ['D', 'R', 'r', 'v', 'x', 'X', 'H']
__init__(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)

Initiate Pattern class

  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • implicitH (bool) – Should hydrogens be listed implicitly? If False, hydrogens will be included in the connectivity graph, and 3D coordinates and properties of the hydrogens will be translated. Some pattern matching in RDKit requires implicit hydrogens, however.

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. Setting to False can speed this up substantially.

  • fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.


Load rdkit mol in the pattern

property sanitized

Get whether the structure was sanitized or not


sanitization status of the structure

Return type


property smiles

Get the SMILES for the passed structure


SMILES pattern for the passed structure

Return type


getPattern(atom_ids=None, is_smiles_requested=False, isomeric=True)

Get SMILES/SMARTS for full structure or for substructure of given atom ids.

  • atom_ids (list) – list of atom indices

  • is_smiles_requested (bool) – Return a SMILES pattern if True, otherwise a SMARTS pattern

  • isomeric (bool) – include information about stereochemistry in the SMILES/SMARTS

Return type



SMILES or SMARTS pattern for the atom ids provided

property smarts

Get the SMARTS for the passed structure


SMARTS pattern for the passed structure

Return type



Validate the passed smarts pattern


smarts (str) – SMARTS to validate

Return type

str or None


Error message on error, None if SMARTS is valid

static patternTranslate(s_pattern, mapper)

Translate the passed SMARTS/SMILES pattern so that each mapper key found in it is replaced by the corresponding mapper value.

  • s_pattern (str) – The SMARTS/SMILES pattern to change

  • mapper (dict) – The mapper used to convert the SMARTS/SMILES pattern. Each key is an element name to find in the pattern and each value is the name to replace it with.


the converted SMARTS/SMILES pattern

Return type
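The key-to-value replacement described for patternTranslate can be approximated with a single regular-expression pass. This is a sketch of the general technique and not the class's actual code. The particle names in the example mapper are invented, and the real method may restrict replacement to specific positions in the pattern.

```python
import re


def pattern_translate(s_pattern, mapper):
    """Replace every mapper key found in s_pattern with its value."""
    if not mapper:
        return s_pattern
    # Longest keys first, so a short name cannot shadow a longer one.
    keys = sorted(mapper, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(key) for key in keys))
    # One pass: text produced by a replacement is never translated again.
    return regex.sub(lambda found: mapper[found.group(0)], s_pattern)


# Hypothetical CG particle names mapped to proxy element names.
mapper = {"W": "He", "BENZ": "Ne"}
translated = pattern_translate("[W][BENZ][W]", mapper)
```

A single pass avoids the chained-replacement problem of repeated `str.replace` calls, where the output of one substitution can be picked up by the next.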



If the structure is a coarse-grained structure, convert the SMARTS pattern from proxy element names to coarse-grained particle names. Does nothing for atomistic structures.


smarts (str) – The SMARTS pattern


The translated SMARTS pattern

Return type



If the structure is a coarse-grained structure, convert the SMARTS pattern from coarse-grained particle names to proxy element names. Does nothing for atomistic structures.


smarts (str) – The SMARTS pattern


The translated SMARTS pattern

Return type



Convert list of Schrodinger structure particle indices to RDMol atom indices


particle_indices (tuple) – tuple of Schrodinger structure particle indices

Return type



list of RDMol atom indices


Convert list of RDMol atom indices to Schrodinger structure particle indices


particle_indices (tuple) – tuple of RDMol atom indices

Return type



list of Schrodinger structure particle indices


Get SMILES for each molecule in the structure


The dictionary where the key is the molecule number and the value is the corresponding SMILES pattern

Return type



Get SMARTS for each molecule in the structure


The dictionary where the key is the molecule number and the value is the corresponding SMARTS pattern

Return type



Get the unique representative molecules in the structure. This function can be up to 50 times slower than extracting molecules individually for small molecules like water.


use_smarts (bool) – If true the unique molecules will share the same SMARTS pattern. If false the unique molecules will share the same SMILES pattern.

Return type



list of molecule numbers that are unique
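Selecting unique representatives amounts to grouping molecule numbers by their pattern and keeping one number per group. The sketch below assumes the lowest molecule number is kept for each distinct pattern. That choice, the helper name `unique_molecule_numbers`, and the example data are hypothetical.

```python
def unique_molecule_numbers(patterns_by_mol):
    """Return one molecule number per distinct pattern.

    patterns_by_mol: dict mapping molecule number -> SMILES or SMARTS.
    """
    representatives = {}
    for mol_number in sorted(patterns_by_mol):
        # setdefault keeps the first (lowest) number seen for a pattern.
        representatives.setdefault(patterns_by_mol[mol_number], mol_number)
    return sorted(representatives.values())


# Three waters and one ethanol.
smiles_by_mol = {1: "O", 2: "O", 3: "CCO", 4: "O"}
unique = unique_molecule_numbers(smiles_by_mol)
```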


Clear cache

class schrodinger.application.matsci.rdpattern.EvaluateSMARTS(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None)

Bases: object

Class to evaluate SMARTS using the following optimizations:

  1. Split the structure into molecules and evaluate each one
  2. Optionally take a list of SMARTS and cache the converted molecular object

Note: Don’t use directly, see evaluate_smarts_by_molecule.
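The two optimizations can be sketched as a loop that converts each molecule once and reuses the converted object for every pattern. Everything here is a stand-in: `convert` and `find_matches` are hypothetical callables, molecules are modeled as lists of (atom index, symbol) pairs, and a "pattern" is a single element symbol rather than real SMARTS.

```python
def evaluate_per_molecule(molecules, patterns, convert, find_matches):
    """Return one list of matches per pattern, evaluated molecule by molecule."""
    results = [[] for _ in patterns]
    for molecule in molecules:
        converted = convert(molecule)  # one conversion per molecule
        for position, pattern in enumerate(patterns):
            results[position].extend(find_matches(converted, pattern))
    return results


conversions = []  # records how often the stand-in conversion runs


def convert(molecule):
    """Stand-in for an expensive structure-to-mol conversion."""
    conversions.append(molecule)
    return dict(molecule)


def find_matches(converted, pattern):
    """Stand-in matcher: atoms whose symbol equals the pattern."""
    return [[index] for index, symbol in converted.items() if symbol == pattern]


molecules = [[(1, "O"), (2, "H"), (3, "H")], [(4, "C"), (5, "O")]]
results = evaluate_per_molecule(molecules, ["O", "H"], convert, find_matches)
```

With two molecules and two patterns the conversion runs twice rather than four times, which is the saving that caching the converted object is meant to provide.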

__init__(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None)

Initialize object.

  • struct (structure.Structure) – the structure to search

  • smarts (str) – the SMARTS pattern to match

  • method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS

  • uniquify (bool) – if True, return only unique sets of matching atoms

  • matches_by_mol (bool) – If True, return a dictionary of matches keyed by molecule number rather than a list of matches

  • molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure

Return type

list or dict


For the list (if matches_by_mol is False) each value is a list of atom indices matching the SMARTS pattern, for the dict (if matches_by_mol is True) keys are molecule indices and values are lists of matches for that molecule


Evaluate SMARTS and return list of matches.

Return list

List of matches if single SMARTS was passed (as a string) or list of lists of matches for each passed SMARTS


Evaluate multiple smarts on structure using picked method.


structure.Structure – Structure to use

Return list

List of lists of matches for each SMARTS

updateMatches(matches, mol)

Update list of matches with matches for the input molecule.

  • matches (list) – List of matches to be updated

  • mol (structure.Structure) – Structure object of molecule