schrodinger.application.matsci.rdpattern module

Module to generate and evaluate SMILES/SMARTS patterns for both coarse-grained (CG) and all-atom (AA) structures using RDKit.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.application.matsci.rdpattern.detailed_atom_smarts(struct, atom_id, sanitize=True, method=SMARTS_METHOD.rdkit)

Get a detailed SMARTS pattern for an atom in the structure, containing the atom's element, formal charge, degree, and aromaticity. These details are spelled out explicitly because RDKit extracts the atoms to find the pattern, and details are sometimes lost in the process.

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • atom_id (int) – Atom index for which SMARTS pattern is required

  • sanitize (bool) – Whether RDKit sanitization should be performed.

  • method (msconst.SMARTS_METHOD) – Method to use to generate SMARTS

Return type

str

Returns

SMARTS pattern for the given atom
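As an illustration of the information such a pattern carries, the sketch below assembles a single-atom SMARTS primitive from atomic number, formal charge, degree, and aromaticity. The helper `atom_primitive` is hypothetical, and the exact string produced by `detailed_atom_smarts` may differ.

```python
def atom_primitive(atomic_number, formal_charge, degree, aromatic):
    """Build a one-atom SMARTS primitive (hypothetical helper, not the module's code)."""
    parts = [
        f"#{atomic_number}",        # element, given by atomic number
        f"{formal_charge:+d}",      # signed formal charge, e.g. +0 or -1
        f"D{degree}",               # number of explicit connections
        "a" if aromatic else "A",   # aromatic or aliphatic
    ]
    return "[" + ";".join(parts) + "]"


aromatic_carbon = atom_primitive(6, 0, 3, True)
carboxylate_oxygen = atom_primitive(8, -1, 1, False)
```

Each primitive used here (`#n`, signed charge, `D<n>`, `a`/`A`) is standard SMARTS syntax, so the resulting strings can be passed to any SMARTS matcher.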

schrodinger.application.matsci.rdpattern.get_pattern(struct, include_stereo=True, use_canvas=False)

Get pattern object for a given structure

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the mol

  • use_canvas (bool) – Whether to use canvas to generate pattern

Return type

Pattern or CanvasPattern

Returns

Pattern object

schrodinger.application.matsci.rdpattern.get_smiles_and_map(struct, stereo=True, include_hydrogen=False, use_internal=True)

Get SMILES and atom map for a given structure

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which smiles and atom map need to be selected

  • stereo (bool) – Whether the stereochemistry of the structure should be included in the SMILES

  • include_hydrogen (bool) – Whether to include hydrogen in the SMILES and atom map

  • use_internal (bool) – Whether to use internal implementation

Return type

tuple(str, list)

Returns

SMILES and atom map

schrodinger.application.matsci.rdpattern.to_smarts(struct, sanitize=True, include_stereo=True, atom_subset=None, check_connectivity=False, method=SMARTS_METHOD.rdkit)

Get SMARTS for a given structure

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. It also enables inclusion of stereochemistry information in the SMARTS. Setting to False can speed this up substantially. For CG systems, include_stereo is always False.

  • check_connectivity (bool) – Whether to check that all the atoms from the atom_subset (or entire structure if it is None) are from one molecule. Raise ValueError if it is not the case.

  • atom_subset (list) – List of atom indices. If None then SMARTS for full structure is computed

  • method (msconst.SMARTS_METHOD) – Method to use to generate SMARTS

Return type

str

Returns

SMARTS pattern for the atom ids provided.
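The `check_connectivity` behavior can be pictured with a minimal sketch. It assumes a hypothetical mapping `atom_to_molecule` from atom index to molecule number; the module itself works on a `Structure`, and its actual check is not shown in this documentation.

```python
def ensure_single_molecule(atom_to_molecule, atom_subset=None):
    """Raise ValueError unless all selected atoms share one molecule number."""
    # None means "use every atom", mirroring the atom_subset default.
    atoms = list(atom_to_molecule) if atom_subset is None else list(atom_subset)
    molecules = {atom_to_molecule[idx] for idx in atoms}
    if len(molecules) > 1:
        raise ValueError(f"Atoms span {len(molecules)} molecules: {sorted(molecules)}")
    return atoms


# Hypothetical structure: atoms 1-3 in molecule 1, atoms 4-5 in molecule 2.
atom_to_molecule = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
same_molecule = ensure_single_molecule(atom_to_molecule, [1, 3])

try:
    ensure_single_molecule(atom_to_molecule, [3, 4])
    mixed_error = None
except ValueError as err:
    mixed_error = str(err)
```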

schrodinger.application.matsci.rdpattern.to_smiles(struct, sanitize=True, include_stereo=True, atom_ids=None, fall_back=False, implicitH=False)

Get SMILES for a given structure

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. It also enables inclusion of stereochemistry information in the SMILES. For CG systems, include_stereo is always False.

  • atom_ids (list) – List of atom indices. If None then SMILES for the full structure is computed

  • fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.

  • implicitH (bool) – Should hydrogens be listed implicitly? If False, hydrogens will be included in the connectivity graph, and 3D coordinates and properties of the hydrogens will be translated. Some pattern matching in RDKit requires implicit hydrogens, however.

Return type

str

Returns

SMILES pattern for the atom ids provided
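The interaction between `sanitize` and `fall_back` described above can be sketched as follows. The `convert` callable and the use of `ValueError` to signal a sanitization failure are assumptions made for illustration; the module's internal error handling is not documented here.

```python
def convert_with_fallback(convert, struct, sanitize=True, fall_back=False):
    """Return (result, sanitized) using a hypothetical convert(struct, sanitize)."""
    if not sanitize:
        # fall_back is ignored when sanitization was never requested.
        return convert(struct, False), False
    try:
        return convert(struct, True), True
    except ValueError:
        if not fall_back:
            raise
        # Sanitization failed: retry without it.
        return convert(struct, False), False


def toy_convert(struct, sanitize):
    """Stand-in converter that rejects the string 'bad' when sanitizing."""
    if sanitize and struct == "bad":
        raise ValueError("sanitization failed")
    return struct.upper()


ok = convert_with_fallback(toy_convert, "good")
recovered = convert_with_fallback(toy_convert, "bad", fall_back=True)
```

The second element of each result plays the role of a "was it sanitized" flag, so callers can tell whether the fallback path was taken.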

schrodinger.application.matsci.rdpattern.has_query_stereo(query)

Checks if rdkit molecule has stereo centers.

Parameters

query (rdkit.Chem.rdchem.Mol) – Input molecule

Return type

bool

Returns

Whether molecule has stereo centers

schrodinger.application.matsci.rdpattern.evaluate_smarts(struct, smarts, is_cg=None, sanitize=True, uniquify=True, max_matches=1000000000, method=SMARTS_METHOD.rdkit)

Get the list of matches for the passed SMARTS pattern in the reference structure

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • smarts (str) – SMARTS pattern to find

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • uniquify (bool) – if True, return only unique sets of matching atoms

  • max_matches (int) – the maximum number of matches to return

  • method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS

Return type

list or None

Returns

List of lists of atom/particle indices matching the SMARTS pattern.
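The effect of `uniquify` and `max_matches` can be sketched as a filter over raw matches. This assumes, following the RDKit convention, that two matches count as duplicates when they cover the same set of atoms in any order. `filter_matches` is a hypothetical helper, not the module's implementation.

```python
def filter_matches(matches, uniquify=True, max_matches=1000000000):
    """Drop matches covering an already seen atom set, then cap the count."""
    kept, seen = [], set()
    for match in matches:
        if len(kept) >= max_matches:
            break
        key = frozenset(match)
        if uniquify and key in seen:
            continue  # same atoms, different order
        seen.add(key)
        kept.append(list(match))
    return kept


raw = [(1, 2), (2, 1), (3, 4)]
unique = filter_matches(raw)
everything = filter_matches(raw, uniquify=False)
capped = filter_matches(raw, max_matches=1)
```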

schrodinger.application.matsci.rdpattern.evaluate_smarts_by_molecule(struct, smarts, method=SMARTS_METHOD.internal, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)

Takes a structure and a SMARTS pattern and returns a list of all matching atom indices, where each element in the list is a group of atoms that match the SMARTS pattern.

Parameters
  • struct (structure.Structure) – the structure to search

  • smarts (str) – the SMARTS pattern to match

  • method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS

  • uniquify (bool) – if True, return only unique sets of matching atoms

  • matches_by_mol (bool) – if True, return a dictionary of matches keyed by molecule number rather than a list of matches

  • molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure

  • timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.

Return type

list or dict

Returns

If matches_by_mol is False, a list in which each item is a list of atom indices matching the SMARTS pattern. If matches_by_mol is True, a dict whose keys are molecule numbers and whose values are lists of matches for that molecule.
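Because the return shape depends on `matches_by_mol`, downstream code may want to normalize it. The sketch below uses made-up match data and a hypothetical helper `flatten_matches`; it only illustrates the two shapes described above.

```python
def flatten_matches(result):
    """Return a flat list of matches from either return shape."""
    if isinstance(result, dict):
        # Keys are molecule numbers; values are lists of matches.
        return [match for mol in sorted(result) for match in result[mol]]
    return list(result)


# Hypothetical results for the same search in both shapes.
as_list = [[1, 2], [4, 5], [7, 8]]
as_dict = {1: [[1, 2]], 2: [[4, 5], [7, 8]]}

flat_from_list = flatten_matches(as_list)
flat_from_dict = flatten_matches(as_dict)
```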

schrodinger.application.matsci.rdpattern.get_name_element_mapper_cg(struct, is_cg=None)

Gets the mapper between schrodinger particle name and rdkit proxy element name if the structure is CG. None if the structure is AA.

Parameters
  • struct (structure.Structure) – Structure used for creating the mapper.

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

Returns

Internal mapping dict between schrodinger particle name and rdkit proxy element name if the structure is CG. None if it is an AA system or no structure is passed.

Return type

dict or None

Raises

RuntimeError – If element mapper cannot be generated for the coarse-grained structure

schrodinger.application.matsci.rdpattern.validate_smarts(smarts, struct=None, is_cg=None, use_internal=False, use_canvas=False)

Validate smarts. Works both with AA and CG.

Parameters
  • smarts (str) – SMARTS to validate

  • struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

Return type

str or None

Returns

Error message on error, None if SMARTS is valid
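The return convention (an error message on failure, None on success) is easiest to consume with an explicit `is not None` test. In the sketch below, `toy_validate` is a stand-in that only checks bracket balance; it is not the real validator and is used purely to show the calling pattern.

```python
def toy_validate(smarts):
    """Stand-in with the same contract: error message on error, None if valid."""
    depth = 0
    for char in smarts:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if depth < 0:
            return "Unmatched ']' in SMARTS"
    return "Unmatched '[' in SMARTS" if depth else None


def first_invalid(patterns, validate=toy_validate):
    """Return (pattern, message) for the first invalid pattern, else None."""
    for pattern in patterns:
        message = validate(pattern)
        if message is not None:
            return pattern, message
    return None


problem = first_invalid(["[#6]", "[#8", "C"])
```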

schrodinger.application.matsci.rdpattern.has_stereo_smarts(smarts, struct=None, is_cg=None)

Check if SMARTS requires stereo. Works both with AA and CG.

Parameters
  • smarts (str or list[str]) – SMARTS pattern or list of SMARTS patterns to check

  • struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

Return type

bool

Returns

Whether any SMARTS require stereo or not

schrodinger.application.matsci.rdpattern.symbol_to_number(symbol)

Transforms chemical symbols into the corresponding atomic numbers.

Parameters

symbol (str) – Chemical symbol

Return type

int or None

Returns

Atomic number corresponding to the symbol if it is a valid chemical symbol, otherwise None
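The lookup-or-None contract can be sketched with a plain dictionary. The table below is a small excerpt and `lookup_atomic_number` is a hypothetical stand-in; how the module resolves symbols internally is not documented here.

```python
# Small excerpt of the periodic table; a real lookup would cover every element.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "Cl": 17}


def lookup_atomic_number(symbol):
    """Return the atomic number for a symbol, or None if it is not known."""
    return ATOMIC_NUMBERS.get(symbol)


chlorine = lookup_atomic_number("Cl")
unknown = lookup_atomic_number("Xx")
```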

schrodinger.application.matsci.rdpattern.get_sdgr_atom_index(mol, rdkit_atom_indices)

Generator that yields Schrodinger atom indices from rdkit indices.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – Molecule

  • rdkit_atom_indices (iterable) – Iterable of mol atom indices

Yield int

Schrodinger atom index

class schrodinger.application.matsci.rdpattern.Pattern(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)

Bases: object

A class to calculate SMARTS and SMILES patterns for a structure multiple times. The class can be memory intensive to allow for increased speed.

PROTECTED_PATTERN_BIT = ['D', 'R', 'r', 'v', 'x', 'X', 'H']
__init__(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)

Initiate Pattern class

Parameters
  • struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected

  • implicitH (bool) – Should hydrogens be listed implicitly? If False, hydrogens will be included in the connectivity graph, and 3D coordinates and properties of the hydrogens will be translated. Some pattern matching in RDKit requires implicit hydrogens, however.

  • is_cg (bool or None) – Whether structure is CG. If None, perform a check

  • sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.

  • include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. Setting to False can speed this up substantially.

  • fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.

loadMol()

Load rdkit mol in the pattern

property sanitized

Get whether the structure was sanitized or not

Returns

sanitization status of the structure

Return type

bool

property smiles

Get the SMILES for the passed structure

Returns

SMILES pattern for the passed structure

Return type

str

getPattern(atom_ids=None, is_smiles_requested=False, isomeric=True)

Get SMILES/SMARTS for full structure or for substructure of given atom ids.

Parameters
  • atom_ids (list) – list of atom indices

  • is_smiles_requested (bool) – return a SMILES pattern if True, otherwise a SMARTS pattern

  • isomeric (bool) – include information about stereochemistry in the SMILES/SMARTS

Return type

str

Returns

SMILES or SMARTS pattern for the atom ids provided

toSmiles(**kwargs)
property smarts

Get the SMARTS for the passed structure

Returns

SMARTS pattern for the passed structure

Return type

str

toSmarts(**kwargs)
validateSmarts(smarts)

Validate the passed smarts pattern

Parameters

smarts (str) – SMARTS to validate

Return type

str or None

Returns

Error message on error, None if SMARTS is valid

getDetailedAtomSmarts(**kwargs)
evaluateSmiles(**kwargs)
evaluateSmarts(**kwargs)
static patternTranslate(s_pattern, mapper)

Translate the passed SMARTS/SMILES such that each mapper key found in the pattern is replaced by the corresponding mapper value

Parameters
  • s_pattern (str) – The SMARTS/SMILES pattern to change

  • mapper (dict) – The mapper used to convert the SMARTS/SMILES pattern. Each key is an element name to find in the pattern and each value is the name to replace it with

Returns

the converted SMARTS/SMILES pattern

Return type

str
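One way to perform this kind of key-to-value translation is a single regular-expression pass. The sketch below is an assumption about how such a translation could work, not the module's implementation; `translate_pattern` and the particle and proxy names are hypothetical.

```python
import re


def translate_pattern(s_pattern, mapper):
    """Replace every mapper key found in the pattern with its value."""
    if not mapper:
        return s_pattern
    # Longest keys first so that "Cl" is not consumed as "C" followed by "l".
    keys = sorted(mapper, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass means replaced text is never translated a second time.
    return regex.sub(lambda found: mapper[found.group(0)], s_pattern)


# Hypothetical CG particle names mapped to hypothetical proxy element names.
to_proxy = {"BENZ": "Li", "W": "He"}
proxy_smarts = translate_pattern("[BENZ][W]", to_proxy)

# Inverting the mapper gives the reverse direction (proxy back to particle name).
to_particle = {proxy: name for name, proxy in to_proxy.items()}
round_trip = translate_pattern(proxy_smarts, to_particle)
```

The single-pass design matters when a replacement value could itself look like another key; chained `str.replace` calls would translate it twice.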

proxyToParticleName(smarts)

If the structure is coarse-grained, convert the SMARTS pattern from proxy element names to coarse-grained particle names. Does nothing for atomistic structures.

Parameters

smarts (str) – The SMARTS pattern

Returns

The translated SMARTS pattern

Return type

str

particleNameToProxy(smarts)

If the structure is coarse-grained, convert the SMARTS pattern from coarse-grained particle names to proxy element names. Does nothing for atomistic structures.

Parameters

smarts (str) – The SMARTS pattern

Returns

The translated SMARTS pattern

Return type

str

toRdIndices(particle_indices)

Convert list of Schrodinger structure particle indices to RDMol atom indices

Parameters

particle_indices (tuple) – tuple of Schrodinger structure particle indices

Return type

list

Returns

list of RDMol atom indices

toStIndices(particle_indices)

Convert list of RDMol atom indices to Schrodinger structure particle indices

Parameters

particle_indices (tuple) – tuple of RDMol atom indices

Return type

list

Returns

list of Schrodinger structure particle indices
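Schrodinger structure indices start at 1 while RDKit atom indices start at 0, and the two need not differ by a fixed offset when only part of a structure is carried into the mol. The sketch below shows the two conversions with explicit lookup tables; the input list and the helper `build_index_maps` are hypothetical, and how `Pattern` stores this mapping is not documented here.

```python
def build_index_maps(st_indices):
    """Build lookup tables from the Structure indices kept in the RDKit mol.

    st_indices[i] is the Structure index of RDKit atom i (hypothetical input).
    """
    rd_to_st = list(st_indices)
    st_to_rd = {st_idx: rd_idx for rd_idx, st_idx in enumerate(rd_to_st)}
    return rd_to_st, st_to_rd


# Hypothetical case: Structure atoms 1, 2 and 5 were carried into the mol.
rd_to_st, st_to_rd = build_index_maps([1, 2, 5])

rd_indices = [st_to_rd[idx] for idx in (5, 1)]       # Structure -> RDKit
st_indices = [rd_to_st[idx] for idx in rd_indices]   # RDKit -> Structure
```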

getMoleculeSmiles()

Get SMILES for each molecule in the structure

Returns

The dictionary where the key is the molecule number and the value is the corresponding SMILES pattern

Return type

dict

getMoleculeSmarts()

Get SMARTS for each molecule in the structure

Returns

The dictionary where the key is the molecule number and the value is the corresponding SMARTS pattern

Return type

dict

getUniqueMolNums(use_smarts=False)

Get the unique representative molecules in the structure. This function can be up to 50 times slower than extracting molecules individually for small molecules like water.

Parameters

use_smarts (bool) – If true the unique molecules will share the same SMARTS pattern. If false the unique molecules will share the same SMILES pattern.

Return type

list

Returns

list of molecule numbers that are unique
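Given a per-molecule pattern dictionary of the kind returned by `getMoleculeSmiles` or `getMoleculeSmarts`, picking representatives amounts to keeping one molecule number per distinct pattern. The sketch below keeps the lowest molecule number for each pattern, which is an assumption; the documentation does not say which representative the method returns.

```python
def unique_molecule_numbers(patterns_by_molecule):
    """Return one molecule number per distinct pattern (lowest number wins)."""
    representative = {}
    for mol_num in sorted(patterns_by_molecule):
        # setdefault keeps the first molecule seen for each pattern.
        representative.setdefault(patterns_by_molecule[mol_num], mol_num)
    return sorted(representative.values())


# Hypothetical per-molecule SMILES: two waters and one ethanol.
smiles_by_molecule = {1: "O", 2: "O", 3: "CCO"}
unique = unique_molecule_numbers(smiles_by_molecule)
```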

clearCache()

Clear cache

class schrodinger.application.matsci.rdpattern.CanvasPattern(struct, include_stereo=True)

Bases: object

Pattern class for canvas. Used only if the MS_USE_RDKIT feature flag is disabled.

__init__(struct, include_stereo=True)

Initialize CanvasPattern object

Parameters
  • struct (structure.Structure) – Structure for which patterns need to be selected

  • include_stereo (bool) – Whether the stereochemistry of the structure should be included in the generated pattern

evaluateSmarts(smarts)

Evaluate SMARTS on the structure

Parameters

smarts (str) – SMARTS pattern to evaluate

Return type

list

Returns

List of atom indices matching the SMARTS pattern

class schrodinger.application.matsci.rdpattern.EvaluateSMARTS(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)

Bases: object

Class to evaluate SMARTS using the following optimizations:

  1. Split the structure into molecules and evaluate each one
  2. Optionally take a list of SMARTS and cache the converted molecular object

Note: Don’t use directly, see evaluate_smarts_by_molecule.

__init__(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)

Initialize object.

Parameters
  • struct (structure.Structure) – the structure to search

  • smarts (str) – the SMARTS pattern to match

  • method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS

  • uniquify (bool) – if True, return only unique sets of matching atoms

  • matches_by_mol (bool) – if True, return a dictionary of matches keyed by molecule number rather than a list of matches

  • molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure

  • timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.

Return type

list or dict

Returns

If matches_by_mol is False, a list in which each item is a list of atom indices matching the SMARTS pattern. If matches_by_mol is True, a dict whose keys are molecule numbers and whose values are lists of matches for that molecule.
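The `timing_data` dictionary described above maps the number of atoms in a molecule to a list of per-molecule search times in seconds. A minimal sketch of summarizing such a dictionary follows; the numbers are made up and `summarize_timing` is a hypothetical helper.

```python
from statistics import mean


def summarize_timing(timing_data):
    """Map atom count to (number of molecules searched, mean seconds)."""
    return {
        natoms: (len(times), mean(times))
        for natoms, times in sorted(timing_data.items())
    }


# Hypothetical contents after a search: keys are atoms per molecule,
# values are per-molecule search times in seconds.
timing_data = {3: [0.5, 1.5], 12: [2.0]}
summary = summarize_timing(timing_data)
```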

run()

Evaluate SMARTS and return list of matches.

Return list

List of matches if single SMARTS was passed (as a string) or list of lists of matches for each passed SMARTS

evaluateMultipleSmarts(struct)

Evaluate multiple smarts on structure using picked method.

Parameters

struct (structure.Structure) – Structure to use

Return list

List of lists of matches for each SMARTS

updateMatches(matches, mol)

Update list of matches with matches for the input molecule.

Parameters
  • matches (list) – List of matches to be updated

  • mol (structure.Structure) – Structure object of molecule