schrodinger.application.matsci.rdpattern module¶
Module to generate and evaluate SMILES/SMARTS patterns for both coarse-grained (CG) and all-atom (AA) structures using RDKit.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.application.matsci.rdpattern.detailed_atom_smarts(struct, atom_id, sanitize=True, method=SMARTS_METHOD.rdkit)¶
Get a detailed SMARTS pattern of an atom in the structure that contains the atom's element, formal charge, degree, and aromaticity. This is needed because RDKit extracts the atoms to find the pattern, so these details are sometimes lost.
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
atom_id (int) – Atom index for which SMARTS pattern is required
sanitize (bool) – Whether RDKit sanitization should be performed.
method (msconst.SMARTS_METHOD) – Method to use to generate SMARTS
- Return type
str
- Returns
SMARTS pattern for the atom id provided
- schrodinger.application.matsci.rdpattern.get_pattern(struct, include_stereo=True, use_canvas=False)¶
Get pattern object for a given structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the mol
use_canvas (bool) – Whether to use canvas to generate pattern
- Return type
- Returns
Pattern object
- schrodinger.application.matsci.rdpattern.get_smiles_and_map(struct, stereo=True, include_hydrogen=False, use_internal=True)¶
Get SMILES and atom map for a given structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which the SMILES and atom map need to be generated
stereo (bool) – Whether the stereochemistry of the structure should be included in the SMILES
include_hydrogen (bool) – Whether to include hydrogen in the SMILES and atom map
use_internal (bool) – Whether to use internal implementation
- Return type
tuple(str, list)
- Returns
SMILES and atom map
- schrodinger.application.matsci.rdpattern.to_smarts(struct, sanitize=True, include_stereo=True, atom_subset=None, check_connectivity=False, method=SMARTS_METHOD.rdkit)¶
Get SMARTS for a given structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. It also enables inclusion of information about stereochemistry in the SMARTS. Setting to False can speed this up substantially. For CG systems, the include_stereo option is permanently False.
check_connectivity (bool) – Whether to check that all the atoms from the atom_subset (or the entire structure if it is None) are from one molecule. Raises ValueError if that is not the case.
atom_subset (list) – List of atom indices. If None, the SMARTS for the full structure is computed.
method (msconst.SMARTS_METHOD) – Method to use to generate SMARTS
- Return type
str
- Returns
SMARTS pattern for the atom ids provided.
- schrodinger.application.matsci.rdpattern.to_smiles(struct, sanitize=True, include_stereo=True, atom_ids=None, fall_back=False, implicitH=False)¶
Get SMILES for a given structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. It also enables inclusion of information about stereochemistry in the SMILES. For CG systems, the include_stereo option is permanently False.
atom_ids (list) – List of atom indices. If None, the SMILES for the full structure is computed.
fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.
implicitH (bool) – Should hydrogens be listed implicitly? If False, hydrogens will be included in the connectivity graph, and 3D coordinates and properties of the hydrogens will be translated. Some pattern matching in RDKit requires implicit hydrogens, however.
- Return type
str
- Returns
SMILES pattern for the atom ids provided
- schrodinger.application.matsci.rdpattern.has_query_stereo(query)¶
Checks if rdkit molecule has stereo centers.
- Parameters
query (rdkit.Chem.rdchem.Mol) – Input molecule
- Return type
bool
- Returns
Whether molecule has stereo centers
- schrodinger.application.matsci.rdpattern.evaluate_smarts(struct, smarts, is_cg=None, sanitize=True, uniquify=True, max_matches=1000000000, method=SMARTS_METHOD.rdkit)¶
Get the list of matches for the passed SMARTS pattern in the reference structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
smarts (str) – SMARTS pattern to find
is_cg (bool or None) – Whether structure is CG. If None, perform a check
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
uniquify (bool) – if True, return only unique sets of matching atoms
max_matches (int) – the maximum number of matches to return
method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS
- Return type
list or None
- Returns
List of lists of atom/particle indices matching the SMARTS.
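The effect of uniquify can be pictured with a short sketch in plain Python. It assumes that two matches count as duplicates when they contain the same atoms in a different order; the source does not spell out the exact criterion, and the helper name uniquify_matches is hypothetical rather than part of this module.

```python
def uniquify_matches(matches):
    """Keep the first match for every distinct set of atom indices."""
    seen = set()
    unique = []
    for match in matches:
        key = frozenset(match)  # order-independent identity of the match
        if key not in seen:
            seen.add(key)
            unique.append(list(match))
    return unique


# A symmetric pattern can hit the same two atoms in both directions
raw_matches = [[1, 2], [2, 1], [3, 4]]
unique_matches = uniquify_matches(raw_matches)
```

With uniquify disabled, a caller would presumably see both [1, 2] and [2, 1] for a symmetric pattern, which matters when the order of atoms in a match carries meaning.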
- schrodinger.application.matsci.rdpattern.evaluate_smarts_by_molecule(struct, smarts, method=SMARTS_METHOD.internal, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)¶
Takes a structure and a SMARTS pattern and returns a list of all matching atom indices, where each element in the list is a group of atoms that match the SMARTS pattern.
- Parameters
struct (structure.Structure) – the structure to search
smarts (str) – the SMARTS pattern to match
method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS
uniquify (bool) – if True, return only unique sets of matching atoms
matches_by_mol (bool) – if True, return a dictionary of matches keyed by molecule number rather than a list of matches
molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure
timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.
- Return type
list or dict
- Returns
For the list (if matches_by_mol is False) each value is a list of atom indices matching the SMARTS pattern, for the dict (if matches_by_mol is True) keys are molecule indices and values are lists of matches for that molecule
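The two return shapes differ only in how the matches are grouped. The sketch below regroups a flat list of matches into the dictionary form. It assumes that every match lies within a single molecule and that the atom-to-molecule assignment is available as a plain dict; both atom_to_mol and group_matches_by_molecule are hypothetical names used for illustration only.

```python
from collections import defaultdict


def group_matches_by_molecule(matches, atom_to_mol):
    """Turn a flat list of matches into {molecule_number: [match, ...]}."""
    grouped = defaultdict(list)
    for match in matches:
        # All atoms of a match are assumed to belong to the same molecule,
        # so the first atom is enough to identify it
        grouped[atom_to_mol[match[0]]].append(match)
    return dict(grouped)


# Hypothetical structure: atoms 1-3 form molecule 1, atoms 4-6 form molecule 2
atom_to_mol = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
flat_matches = [[1, 2], [2, 3], [4, 5]]
by_mol = group_matches_by_molecule(flat_matches, atom_to_mol)
```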
- schrodinger.application.matsci.rdpattern.get_name_element_mapper_cg(struct, is_cg=None)¶
Gets the mapper between schrodinger particle name and rdkit proxy element name if the structure is CG. None if the structure is AA.
- Parameters
struct (structure.Structure) – Structure used for creating the mapper.
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Returns
Internal mapping dict between schrodinger particle name and rdkit proxy element name if the structure is CG. Return None if it is AA system or no structure is passed.
- Return type
dict or None
- Raises
RuntimeError – If element mapper cannot be generated for the coarse-grained structure
- schrodinger.application.matsci.rdpattern.validate_smarts(smarts, struct=None, is_cg=None, use_internal=False, use_canvas=False)¶
Validate SMARTS. Works with both AA and CG structures.
- Parameters
smarts (str) – SMARTS to validate
struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Return type
str or None
- Returns
Error message on error, None if SMARTS is valid
- schrodinger.application.matsci.rdpattern.has_stereo_smarts(smarts, struct=None, is_cg=None)¶
Check if SMARTS requires stereo. Works both with AA and CG.
- Parameters
smarts (str or list[str]) – SMARTS to validate
struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Return type
bool
- Returns
Whether any SMARTS require stereo or not
- schrodinger.application.matsci.rdpattern.symbol_to_number(symbol)¶
Transforms chemical symbols into the corresponding atomic numbers.
- Parameters
symbol (str) – Chemical symbol
- Return type
int or None
- Returns
Atomic number corresponding to the entered symbol if it is a valid chemical symbol, otherwise None
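Because the function returns None for an unrecognized symbol instead of raising, callers need to check the result before using it as a number. The sketch below imitates that contract with a deliberately tiny lookup table; SYMBOL_TO_NUMBER and symbol_to_number_sketch are hypothetical stand-ins, not the module's implementation.

```python
# Deliberately tiny table for illustration; a real lookup would cover
# the whole periodic table
SYMBOL_TO_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "Cl": 17}


def symbol_to_number_sketch(symbol):
    """Return the atomic number for a symbol, or None if it is unknown."""
    return SYMBOL_TO_NUMBER.get(symbol)


carbon = symbol_to_number_sketch("C")
unknown = symbol_to_number_sketch("Xx")

# Guard against None before doing arithmetic or comparisons
is_heavy = carbon is not None and carbon > 1
```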
- schrodinger.application.matsci.rdpattern.get_sdgr_atom_index(mol, rdkit_atom_indices)¶
Generator that yields Schrodinger atom indices from rdkit indices.
- Parameters
mol (rdkit.Chem.rdchem.Mol) – Molecule
rdkit_atom_indices (iterable) – Iterable of mol atom indices
- Yield int
Schrodinger atom index
- class schrodinger.application.matsci.rdpattern.Pattern(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)¶
Bases: object
A class to calculate SMARTS and SMILES patterns for a structure multiple times. The class can be memory intensive to allow for increased speed.
- PROTECTED_PATTERN_BIT = ['D', 'R', 'r', 'v', 'x', 'X', 'H']¶
- __init__(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)¶
Initiate Pattern class
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
implicitH (bool) – Should hydrogens be listed implicitly? If False, hydrogens will be included in the connectivity graph, and 3D coordinates and properties of the hydrogens will be translated. Some pattern matching in RDKit requires implicit hydrogens, however.
is_cg (bool or None) – Whether structure is CG. If None, perform a check
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. Setting to False can speed this up substantially.
fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.
- loadMol()¶
Load rdkit mol in the pattern
- property sanitized¶
Get whether the structure was sanitized or not
- Returns
sanitization status of the structure
- Return type
bool
- property smiles¶
Get the SMILES for the passed structure
- Returns
SMILES pattern for the passed structure
- Return type
str
- getPattern(atom_ids=None, is_smiles_requested=False, isomeric=True)¶
Get SMILES/SMARTS for full structure or for substructure of given atom ids.
- Parameters
atom_ids (list) – list of atom indices
is_smiles_requested (bool) – Return a SMILES pattern if True, otherwise a SMARTS pattern
isomeric (bool) – include information about stereochemistry in the SMILES/SMARTS
- Return type
str
- Returns
SMILES or SMARTS pattern for the atom ids provided
- toSmiles(**kwargs)¶
- property smarts¶
Get the SMARTS for the passed structure
- Returns
SMARTS pattern for the passed structure
- Return type
str
- toSmarts(**kwargs)¶
- validateSmarts(smarts)¶
Validate the passed smarts pattern
- Parameters
smarts (str) – SMARTS to validate
- Return type
str or None
- Returns
Error message on error, None if SMARTS is valid
- getDetailedAtomSmarts(**kwargs)¶
- evaluateSmiles(**kwargs)¶
- evaluateSmarts(**kwargs)¶
- static patternTranslate(s_pattern, mapper)¶
Replace the passed SMARTS/SMILES such that the mapper keys are replaced by the corresponding mapper values
- Parameters
s_pattern (str) – The SMARTS/SMILES pattern to change
mapper (dict) – The mapper used to convert the SMARTS/SMILES pattern. Each key is an element name to find in the pattern and each value is the name to replace it with
- Returns
the converted SMARTS/SMILES pattern
- Return type
str
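One way to picture this translation is a single-pass regular-expression substitution. The sketch below is not necessarily how patternTranslate is implemented; the function name pattern_translate and the example mapper are hypothetical, and the handling of SMARTS primitives (for example those in PROTECTED_PATTERN_BIT) is deliberately left out.

```python
import re


def pattern_translate(s_pattern, mapper):
    """Replace every mapper key found in s_pattern with its mapped value."""
    if not mapper:
        return s_pattern
    # Longest keys first, so that e.g. "Cl" is tried before "C"
    keys = sorted(mapper, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass never re-scans replaced text, so swapping two names
    # (A -> B and B -> A) cannot cascade
    return regex.sub(lambda match: mapper[match.group(0)], s_pattern)


# Hypothetical mapper from proxy element names to CG particle names
proxy_to_name = {"He": "W", "Li": "BENZ"}
translated = pattern_translate("[He][Li][He]", proxy_to_name)
```

Sorting keys by length and substituting in one pass are design choices of this sketch: they avoid partial replacement of a longer name by a shorter one and keep a swap mapper from undoing itself.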
- proxyToParticleName(smarts)¶
If the structure is coarse-grained, convert the SMARTS pattern from proxy element names to coarse-grained particle names. Does nothing for atomistic structures.
- Parameters
smarts (str) – The SMARTS pattern
- Returns
The translated SMARTS pattern
- Return type
str
- particleNameToProxy(smarts)¶
If the structure is coarse-grained, convert the SMARTS pattern from coarse-grained particle names to proxy element names. Does nothing for atomistic structures.
- Parameters
smarts (str) – The SMARTS pattern
- Returns
The translated SMARTS pattern
- Return type
str
- toRdIndices(particle_indices)¶
Convert list of Schrodinger structure particle indices to RDMol atom indices
- Parameters
particle_indices (tuple) – tuple of Schrodinger structure particle indices
- Return type
list
- Returns
list of RDMol atom indices
- toStIndices(particle_indices)¶
Convert list of RDMol atom indices to Schrodinger structure particle indices
- Parameters
particle_indices (tuple) – tuple of RDMol atom indices
- Return type
list
- Returns
list of Schrodinger structure particle indices
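The two conversions are inverses of each other. The sketch below shows a round trip with an explicit lookup table, assuming 1-based structure indices and 0-based RDKit-style indices, and assuming that some structure atoms (for example hydrogens made implicit) may have no RDKit counterpart. How the Pattern class actually stores this mapping is not stated here; IndexMap is a hypothetical helper.

```python
class IndexMap:
    """Two-way map between structure indices and RDKit-style indices."""

    def __init__(self, st_indices):
        # st_indices[i] is the structure index of the atom at RDKit position i
        self._rd_to_st = list(st_indices)
        self._st_to_rd = {st: rd for rd, st in enumerate(self._rd_to_st)}

    def to_rd_indices(self, particle_indices):
        """Structure indices -> RDKit-style indices."""
        return [self._st_to_rd[idx] for idx in particle_indices]

    def to_st_indices(self, rd_indices):
        """RDKit-style indices -> structure indices."""
        return [self._rd_to_st[idx] for idx in rd_indices]


# Hypothetical molecule in which structure atoms 2 and 4 were dropped
index_map = IndexMap([1, 3, 5])
rd_indices = index_map.to_rd_indices((1, 5))
st_indices = index_map.to_st_indices(rd_indices)
```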
- getMoleculeSmiles()¶
Get SMILES for each molecule in the structure
- Returns
The dictionary where the key is the molecule number and the value is the corresponding SMILES pattern
- Return type
dict
- getMoleculeSmarts()¶
Get SMARTS for each molecule in the structure
- Returns
The dictionary where the key is the molecule number and the value is the corresponding SMARTS pattern
- Return type
dict
- getUniqueMolNums(use_smarts=False)¶
Get the unique representative molecules in the structure. This function can be up to 50 times slower than extracting molecules individually for small molecules like water.
- Parameters
use_smarts (bool) – If true the unique molecules will share the same SMARTS pattern. If false the unique molecules will share the same SMILES pattern.
- Return type
list
- Returns
list of molecule numbers that are unique
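The idea of a unique representative can be sketched on top of a per-molecule pattern dictionary such as the one described for getMoleculeSmiles. The sketch assumes that the representative of each group is its lowest molecule number, which the source does not specify; unique_mol_nums and the example data are hypothetical.

```python
def unique_mol_nums(mol_patterns):
    """Return one representative molecule number per distinct pattern."""
    representatives = {}
    for mol_num in sorted(mol_patterns):
        # setdefault keeps the first (lowest) molecule number per pattern
        representatives.setdefault(mol_patterns[mol_num], mol_num)
    return sorted(representatives.values())


# Hypothetical system: three waters and two ethanols
mol_smiles = {1: "O", 2: "O", 3: "CCO", 4: "O", 5: "CCO"}
unique = unique_mol_nums(mol_smiles)
```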
- clearCache()¶
Clear cache
- class schrodinger.application.matsci.rdpattern.CanvasPattern(struct, include_stereo=True)¶
Bases: object
Pattern class for Canvas. Used only if the MS_USE_RDKIT feature flag is disabled.
- __init__(struct, include_stereo=True)¶
Initialize CanvasPattern object
- Parameters
struct (structure.Structure) – Structure for which patterns need to be selected
include_stereo (bool) – Whether the stereochemistry of the structure should be included in the generated pattern
- evaluateSmarts(smarts)¶
Evaluate SMARTS on the structure
- Parameters
smarts (str) – SMARTS pattern to evaluate
- Return type
list
- Returns
List of atom indices matching the SMARTS pattern
- class schrodinger.application.matsci.rdpattern.EvaluateSMARTS(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)¶
Bases: object
Class to evaluate SMARTS using the following optimizations:
1. Split the structure into molecules and evaluate each one
2. Optionally take a list of SMARTS and cache the converted molecular object
Note: Don’t use directly, see evaluate_smarts_by_molecule.
- __init__(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)¶
Initialize object.
- Parameters
struct (structure.Structure) – the structure to search
smarts (str) – the SMARTS pattern to match
method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS
uniquify (bool) – if True, return only unique sets of matching atoms
matches_by_mol (bool) – if True, return a dictionary of matches keyed by molecule number rather than a list of matches
molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure
timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.
- Return type
list or dict
- Returns
For the list (if matches_by_mol is False) each value is a list of atom indices matching the SMARTS pattern, for the dict (if matches_by_mol is True) keys are molecule indices and values are lists of matches for that molecule
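The timing_data layout described above (atom count mapped to a list of per-molecule search times in seconds) is easy to post-process. The following sketch summarizes such a dict; the numbers are invented for illustration and summarize_timing is a hypothetical helper, not part of this module.

```python
from statistics import mean


def summarize_timing(timing_data):
    """Return {atom_count: (number_of_searches, mean_seconds)}."""
    return {
        natoms: (len(times), mean(times))
        for natoms, times in sorted(timing_data.items())
    }


# Hypothetical contents after a search: two 3-atom molecules, one 9-atom molecule
timing_data = {3: [0.25, 0.75], 9: [2.0]}
summary = summarize_timing(timing_data)
```

A summary like this makes it easier to see whether search time grows with molecule size for a given SMARTS pattern.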
- run()¶
Evaluate SMARTS and return list of matches.
- Return list
List of matches if single SMARTS was passed (as a string) or list of lists of matches for each passed SMARTS
- evaluateMultipleSmarts(struct)¶
Evaluate multiple smarts on structure using picked method.
- Parameters
struct (structure.Structure) – Structure to use
- Return list
List of lists of matches for each SMARTS
- updateMatches(matches, mol)¶
Update list of matches with matches for the input molecule.
- Parameters
matches (list) – List of matches to be updated
mol (structure.Structure) – Structure object of molecule