schrodinger.application.matsci.rdpattern module
Module to generate and evaluate SMILES/SMARTS patterns for both coarse-grained (CG) and all-atom (AA) structures using RDKit.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.application.matsci.rdpattern.detailed_atom_smarts(struct, atom_id, sanitize=True, method=SMARTS_METHOD.rdkit)
Get a detailed SMARTS pattern for an atom in the structure that encodes the atom’s element, formal charge, degree, and aromaticity. These details are written out explicitly because RDKit extracts the atoms to find the pattern, so some of them can otherwise be lost.
- Parameters:
struct (schrodinger.structure.Structure) – Structure containing the atom
atom_id (int) – Atom index for which SMARTS pattern is required
sanitize (bool) – Whether RDKit sanitization should be performed.
method (msconst.SMARTS_METHOD) – Method to use to generate SMARTS
- Return type:
str
- Returns:
SMARTS pattern for the given atom
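To make the four properties concrete, here is a sketch that writes them as standard SMARTS atom primitives: `#n` (atomic number), `a`/`A` (aromatic/aliphatic), `Dn` (explicit connections) and `+n`/`-n` (formal charge). The helper `compose_atom_smarts` is hypothetical, and the primitive order and syntax that `detailed_atom_smarts` actually emits are assumptions, not something this page documents.

```python
def compose_atom_smarts(atomic_number: int, formal_charge: int,
                        degree: int, aromatic: bool) -> str:
    """Build a bracketed atom SMARTS from element, charge, degree, aromaticity.

    Hypothetical helper: it illustrates the kind of detail that
    detailed_atom_smarts encodes, not that function's real output format.
    """
    element = f"#{atomic_number}"           # match by atomic number
    aromaticity = "a" if aromatic else "A"  # 'a' aromatic, 'A' aliphatic
    connectivity = f"D{degree}"             # number of explicit connections
    # '+0' explicitly requires a neutral atom; negative charges use '-n'
    if formal_charge >= 0:
        charge = f"+{formal_charge}"
    else:
        charge = f"-{abs(formal_charge)}"
    # ';' is the low-precedence AND; all primitives must hold at once
    return "[" + ";".join([element, aromaticity, connectivity, charge]) + "]"
```

For example, `compose_atom_smarts(7, 1, 4, False)` gives `[#7;A;D4;+1]`, an aliphatic, positively charged nitrogen with four explicit connections.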
- schrodinger.application.matsci.rdpattern.get_pattern(struct, include_stereo=True, use_canvas=False)
Get the pattern object for a given structure.
- Parameters:
struct (schrodinger.structure.Structure) – Structure for which patterns are generated
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the mol
use_canvas (bool) – Whether to use Canvas to generate the pattern
- Return type:
Pattern or CanvasPattern
- Returns:
Pattern object
- schrodinger.application.matsci.rdpattern.get_smiles_and_map(struct, stereo=True, include_hydrogen=False, use_internal=True)
Get the SMILES and atom map for a given structure.
- Parameters:
struct (schrodinger.structure.Structure) – Structure for which the SMILES and atom map are generated
stereo (bool) – Whether the stereochemistry of the structure should be included in the SMILES
include_hydrogen (bool) – Whether to include hydrogen in the SMILES and atom map
use_internal (bool) – Whether to use the internal implementation
- Return type:
tuple(str, list)
- Returns:
SMILES and atom map
- schrodinger.application.matsci.rdpattern.to_smarts(struct, sanitize=True, include_stereo=True, atom_subset=None, check_connectivity=False, method=SMARTS_METHOD.rdkit)
Get the SMARTS for a given structure.
- Parameters:
struct (schrodinger.structure.Structure) – Structure for which the SMARTS is generated
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable to coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. This also enables inclusion of stereochemistry information in the SMARTS. Setting it to False can speed this up substantially. For CG systems, include_stereo is always False.
atom_subset (list) – List of atom indices. If None, the SMARTS for the full structure is computed.
check_connectivity (bool) – Whether to check that all the atoms from the atom_subset (or the entire structure if it is None) are from one molecule. Raise ValueError if that is not the case.
method (msconst.SMARTS_METHOD) – Method to use to generate SMARTS
- Return type:
str
- Returns:
SMARTS pattern for the structure, or for atom_subset if it is given.
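The `check_connectivity` contract can be sketched as follows, assuming each atom index can be mapped to a molecule number. The plain dict `molecule_of` stands in for a real structure, and the helper `check_single_molecule` is hypothetical.

```python
from typing import Dict, Iterable, Optional


def check_single_molecule(molecule_of: Dict[int, int],
                          atom_subset: Optional[Iterable[int]] = None) -> int:
    """Return the molecule number shared by all atoms, or raise ValueError.

    Hypothetical helper. molecule_of maps atom index -> molecule number and
    stands in for the per-atom molecule numbers of a real structure.
    """
    # None means "check the whole structure", as in to_smarts
    atoms = list(molecule_of) if atom_subset is None else list(atom_subset)
    if not atoms:
        raise ValueError("No atoms to check.")
    molecules = {molecule_of[atom] for atom in atoms}
    if len(molecules) != 1:
        raise ValueError(
            f"Atoms span {len(molecules)} molecules: {sorted(molecules)}")
    return molecules.pop()
```

With two water molecules (atoms 1 to 3 and 4 to 6), `check_single_molecule(molecule_of, [1, 2, 3])` returns 1, while the subset `[3, 4]` raises `ValueError`.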
- schrodinger.application.matsci.rdpattern.to_smiles(struct, sanitize=True, include_stereo=True, atom_ids=None, fall_back=False, implicitH=False)
Get the SMILES for a given structure.
- Parameters:
struct (schrodinger.structure.Structure) – Structure for which the SMILES is generated
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable to coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. This also enables inclusion of stereochemistry information in the SMILES. For CG systems, include_stereo is always False.
atom_ids (list) – List of atom indices. If None, the SMILES for the full structure is computed.
fall_back (bool) – Ignored if sanitize=False. If sanitize=True, fall back to using a non-sanitized structure if sanitization fails.
implicitH (bool) – Whether hydrogens should be represented implicitly. If False, hydrogens are included in the connectivity graph, and the 3D coordinates and properties of the hydrogens are translated. Note, however, that some pattern matching in RDKit requires implicit hydrogens.
- Return type:
str
- Returns:
SMILES for the structure, or for atom_ids if they are given
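The documented interplay of `sanitize` and `fall_back` amounts to a retry without sanitization. In this sketch `build_mol` is a hypothetical stand-in for the RDKit conversion step, and signalling a sanitization failure with `ValueError` is an assumption made for the example.

```python
from typing import Callable, Tuple, TypeVar

Mol = TypeVar("Mol")


def convert_with_fallback(build_mol: Callable[[bool], Mol],
                          sanitize: bool = True,
                          fall_back: bool = False) -> Tuple[Mol, bool]:
    """Return (mol, was_sanitized), following the documented fall_back rule.

    build_mol(sanitize) is a hypothetical converter assumed to raise
    ValueError when sanitization fails.
    """
    if not sanitize:
        # fall_back is ignored when sanitization is not requested
        return build_mol(False), False
    try:
        return build_mol(True), True
    except ValueError:
        if not fall_back:
            raise
        # Sanitization failed: retry without it instead of giving up
        return build_mol(False), False
```

The returned flag plays the same role as the `sanitized` property of `Pattern`, letting callers know which kind of molecule they received.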
- schrodinger.application.matsci.rdpattern.has_query_stereo(query)
Check whether an RDKit molecule has stereo centers.
- Parameters:
query (rdkit.Chem.rdchem.Mol) – Input molecule
- Return type:
bool
- Returns:
Whether molecule has stereo centers
- schrodinger.application.matsci.rdpattern.evaluate_smarts(struct, smarts, is_cg=None, sanitize=True, uniquify=True, max_matches=1000000000, method=SMARTS_METHOD.rdkit)
Get the list of matches for the passed SMARTS pattern in the reference structure.
- Parameters:
struct (schrodinger.structure.Structure) – Structure to search
smarts (str) – SMARTS pattern to find
is_cg (bool or None) – Whether the structure is CG. If None, perform a check
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable to coarse-grained structures.
uniquify (bool) – if True, return only unique sets of matching atoms
max_matches (int) – the maximum number of matches to return
method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS
- Return type:
list or None
- Returns:
List of lists of atom/particle indices matching the SMARTS pattern.
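With `uniquify=True`, matches that cover the same set of atoms in a different order are reported only once, which is also what the `uniquify` flag of RDKit's `Mol.GetSubstructMatches` does. The sketch below applies that rule, plus the `max_matches` cap, to plain tuples of indices; `filter_matches` is a hypothetical helper, not code from this module.

```python
from typing import Iterable, List, Sequence


def filter_matches(matches: Iterable[Sequence[int]], uniquify: bool = True,
                   max_matches: int = 1_000_000_000) -> List[List[int]]:
    """Drop matches over an already-seen atom set and cap the result size.

    Hypothetical helper illustrating the uniquify/max_matches semantics.
    """
    result: List[List[int]] = []
    seen = set()
    for match in matches:
        if len(result) >= max_matches:
            break
        key = frozenset(match)  # order-independent identity of the match
        if uniquify and key in seen:
            continue
        seen.add(key)
        result.append(list(match))
    return result
```

For raw matches `[(1, 2), (2, 1), (2, 3)]`, the default call keeps `[[1, 2], [2, 3]]`, while `uniquify=False` keeps all three.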
- schrodinger.application.matsci.rdpattern.evaluate_smarts_by_molecule(struct, smarts, method=SMARTS_METHOD.internal, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)
Take a structure and a SMARTS pattern and return a list of all matching atom indices, where each element in the list is a group of atoms that matches the SMARTS pattern.
- Parameters:
struct (structure.Structure) – the structure to search
smarts (str) – the SMARTS pattern to match
method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS
uniquify (bool) – if True, return only unique sets of matching atoms
matches_by_mol (bool) – If True, return a dictionary of matches keyed by molecule number instead of a list of matches
molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure
timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.
- Return type:
list or dict
- Returns:
For the list (if matches_by_mol is False) each value is a list of atom indices matching the SMARTS pattern, for the dict (if matches_by_mol is True) keys are molecule indices and values are lists of matches for that molecule
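Because `timing_data` is filled in place, the caller keeps the dict and can summarise it afterwards. A sketch, assuming only the layout documented above (atom count mapped to a list of per-molecule search times in seconds); `summarize_timing` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize_timing(
        timing_data: Dict[int, List[float]]) -> List[Tuple[int, int, float]]:
    """Return (atom_count, molecules_searched, mean_seconds) rows by size.

    Hypothetical helper; timing_data follows the documented layout.
    """
    return [
        (natoms, len(times), mean(times))
        for natoms, times in sorted(timing_data.items())
        if times  # skip empty lists, since mean() raises on them
    ]
```

In practice the input would be the dict passed as `timing_data=` to `evaluate_smarts_by_molecule`; each row then reads (atoms per molecule, molecules searched, mean seconds).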
- schrodinger.application.matsci.rdpattern.get_name_element_mapper_cg(struct, is_cg=None)
Get the mapping between Schrodinger particle names and RDKit proxy element names if the structure is CG. Return None if the structure is AA.
- Parameters:
struct (structure.Structure) – Structure used for creating the mapper.
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Returns:
Internal mapping dict between schrodinger particle name and rdkit proxy element name if the structure is CG. Return None if it is AA system or no structure is passed.
- Return type:
dict or None
- Raises:
RuntimeError – If the element mapper cannot be generated for the coarse-grained structure
- schrodinger.application.matsci.rdpattern.validate_smarts(smarts, struct=None, is_cg=None, use_internal=False, use_canvas=False)
Validate a SMARTS pattern. Works with both AA and CG structures.
- Parameters:
smarts (str) – SMARTS to validate
struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Return type:
str or None
- Returns:
Error message on error, None if SMARTS is valid
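The return convention (an error string, or None when the pattern is valid) can be illustrated with a toy pre-check. This sketch only verifies that brackets and parentheses are balanced, a small subset of what a real SMARTS parser checks; `quick_smarts_check` is hypothetical and does not reproduce `validate_smarts`.

```python
from typing import List, Optional, Tuple

# Closing delimiter -> the opening delimiter it must match
_PAIRS = {"]": "[", ")": "("}


def quick_smarts_check(smarts: str) -> Optional[str]:
    """Return an error message for unbalanced brackets/parentheses, else None.

    Hypothetical toy check: a string that passes may still be invalid SMARTS.
    """
    stack: List[Tuple[str, int]] = []  # (opening char, position)
    for pos, char in enumerate(smarts):
        if char in "[(":
            stack.append((char, pos))
        elif char in _PAIRS:
            if not stack or stack[-1][0] != _PAIRS[char]:
                return f"Unmatched '{char}' at position {pos}."
            stack.pop()
    if stack:
        char, pos = stack[-1]
        return f"Unclosed '{char}' at position {pos}."
    return None
```

For example, `quick_smarts_check("[#6")` returns `"Unclosed '[' at position 0."`, while `"[#6;a](=O)O"` passes.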
- schrodinger.application.matsci.rdpattern.has_stereo_smarts(smarts, struct=None, is_cg=None)
Check whether a SMARTS pattern requires stereochemistry. Works with both AA and CG structures.
- Parameters:
smarts (str or list[str]) – SMARTS pattern, or list of SMARTS patterns, to check
struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Return type:
bool
- Returns:
Whether any SMARTS require stereo or not
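One rough way to tell whether a SMARTS asks for stereochemistry is to scan for chirality marks (`@`, `@@`) inside bracket atoms and for directional bonds (`/`, `\`). Outside brackets, `@` is the ring-bond primitive and must not be counted. `smarts_mentions_stereo` is a hypothetical heuristic, not how `has_stereo_smarts` decides; for instance it reports a false positive for the ring-bond `@` inside a recursive SMARTS such as `[$(C@C)]`.

```python
from typing import Iterable, Union


def smarts_mentions_stereo(smarts: Union[str, Iterable[str]]) -> bool:
    """Heuristically report whether any SMARTS requests stereochemistry.

    Hypothetical text scan; accepts one pattern or a list, like
    has_stereo_smarts.
    """
    patterns = [smarts] if isinstance(smarts, str) else list(smarts)
    for pattern in patterns:
        depth = 0  # > 0 while inside a bracket atom such as [C@H]
        for char in pattern:
            if char == "[":
                depth += 1
            elif char == "]":
                depth -= 1
            elif char == "@" and depth > 0:
                return True  # tetrahedral chirality inside a bracket atom
            elif char in "/\\" and depth == 0:
                return True  # directional bond (double-bond geometry)
    return False
```

Under this heuristic `"[C@H](F)(Cl)Br"` and `"F/C=C/F"` count as stereo, while the ring-bond query `"C@C"` does not.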
- schrodinger.application.matsci.rdpattern.symbol_to_number(symbol)
Transform a chemical symbol into the corresponding atomic number.
- Parameters:
symbol (str) – Chemical symbol
- Return type:
int or None
- Returns:
Atomic number corresponding to the symbol if it is a valid element symbol, otherwise None
- schrodinger.application.matsci.rdpattern.get_sdgr_atom_index(mol, rdkit_atom_indices)
Generator that yields Schrodinger atom indices from rdkit indices.
- Parameters:
mol (rdkit.Chem.rdchem.Mol) – Molecule
rdkit_atom_indices (iterable) – Iterable of mol atom indices
- Yield int:
Schrodinger atom index
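Schrodinger atom indices start at 1 and RDKit indices at 0, but the two numberings need not differ by a fixed offset, for example when some hydrogens are left out of the RDKit mol. The sketch therefore uses an explicit lookup list `sdgr_index_of`. Both that list and the generator `sdgr_indices` are hypothetical, since the real function takes the RDKit `mol` and presumably reads the mapping from data stored on it.

```python
from typing import Iterable, Iterator, Sequence


def sdgr_indices(sdgr_index_of: Sequence[int],
                 rdkit_atom_indices: Iterable[int]) -> Iterator[int]:
    """Yield the Schrodinger (1-based) index for each RDKit (0-based) index.

    Hypothetical generator: sdgr_index_of[i] holds the Schrodinger index of
    RDKit atom i, so no fixed offset between numberings is assumed.
    """
    for rd_index in rdkit_atom_indices:
        yield sdgr_index_of[rd_index]
```

If structure atom 3 is absent from the RDKit mol, `sdgr_index_of` is `[1, 2, 4, 5]` and `list(sdgr_indices(sdgr_index_of, [0, 2, 3]))` gives `[1, 4, 5]`.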
- class schrodinger.application.matsci.rdpattern.Pattern(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)
Bases: object
A class to calculate SMARTS and SMILES patterns for a structure multiple times. The class can be memory intensive in exchange for increased speed.
- PROTECTED_PATTERN_BIT = ['D', 'R', 'r', 'v', 'x', 'X', 'H']
- __init__(struct, *, implicitH=False, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)
Initialize the Pattern object.
- Parameters:
struct (schrodinger.structure.Structure) – Structure for which patterns are generated
implicitH (bool) – Whether hydrogens should be represented implicitly. If False, hydrogens are included in the connectivity graph, and the 3D coordinates and properties of the hydrogens are translated. Note, however, that some pattern matching in RDKit requires implicit hydrogens.
is_cg (bool or None) – Whether the structure is CG. If None, perform a check
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable to coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. Setting it to False can speed this up substantially.
fall_back (bool) – Ignored if sanitize=False. If sanitize=True, fall back to using a non-sanitized structure if sanitization fails.
- loadMol()
Load the RDKit mol into the pattern.
- property sanitized
Whether the structure was sanitized
- Returns:
sanitization status of the structure
- Return type:
bool
- property smiles
Get the SMILES for the passed structure
- Returns:
SMILES pattern for the passed structure
- Return type:
str
- getPattern(atom_ids=None, is_smiles_requested=False, isomeric=True)
Get the SMILES/SMARTS for the full structure or for the substructure of the given atom ids.
- Parameters:
atom_ids (list) – List of atom indices
is_smiles_requested (bool) – Return a SMILES pattern if True, otherwise a SMARTS pattern
isomeric (bool) – Include stereochemistry information in the SMILES/SMARTS
- Return type:
str
- Returns:
SMILES or SMARTS pattern for the given atom ids, or for the full structure if atom_ids is None
- toSmiles(**kwargs)
- property smarts
Get the SMARTS for the passed structure
- Returns:
SMARTS pattern for the passed structure
- Return type:
str
- toSmarts(**kwargs)
- validateSmarts(smarts)
Validate the passed SMARTS pattern.
- Parameters:
smarts (str) – SMARTS to validate
- Return type:
str or None
- Returns:
Error message on error, None if SMARTS is valid
- getDetailedAtomSmarts(**kwargs)
- getDetailedAtomsSmarts(**kwargs)
- evaluateSmiles(**kwargs)
- evaluateSmarts(**kwargs)
- static patternTranslate(s_pattern, mapper)
Translate the passed SMARTS/SMILES pattern by replacing each occurrence of a mapper key with the corresponding mapper value.
- Parameters:
s_pattern (str) – The SMARTS/SMILES pattern to change
mapper (dict) – The mapper used to convert the SMARTS/SMILES pattern. Each key is the element name to find in the pattern and each value is the name to replace it with.
- Returns:
the converted SMARTS/SMILES pattern
- Return type:
str
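A naive chain of `str.replace` calls can corrupt a pattern when one key is a prefix of another: replacing `P1` first turns `P10` into `Ne0`. A safer sketch does all substitutions in one regex pass, trying longer keys first. `translate_pattern` is hypothetical, the particle names and proxy elements in the mapper are made up for illustration, and the real `patternTranslate` may work differently.

```python
import re
from typing import Dict


def translate_pattern(s_pattern: str, mapper: Dict[str, str]) -> str:
    """Replace every mapper key in the pattern with its value in one pass.

    Hypothetical helper; a single pass also prevents a replacement value
    from being translated again by a later key.
    """
    if not mapper:
        return s_pattern
    # Longest keys first so that e.g. 'P10' wins over 'P1' at the same spot
    keys = sorted(mapper, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(key) for key in keys))
    return regex.sub(lambda match: mapper[match.group(0)], s_pattern)
```

With the made-up mapper `{"P1": "Ne", "P10": "Ar"}`, `translate_pattern("[P10]~[P1]", mapper)` gives `"[Ar]~[Ne]"`. Passing the inverted mapper `{v: k for k, v in mapper.items()}` restores the original, which is analogous to converting between particle names and proxy elements in either direction.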
- proxyToParticleName(smarts)
If the structure is coarse-grained, convert the proxy element names in the SMARTS pattern to coarse-grained particle names. Does nothing for atomistic structures.
- Parameters:
smarts (str) – The SMARTS pattern
- Returns:
The translated SMARTS pattern
- Return type:
str
- particleNameToProxy(smarts)
If the structure is coarse-grained, convert the coarse-grained particle names in the SMARTS pattern to proxy element names. Does nothing for atomistic structures.
- Parameters:
smarts (str) – The SMARTS pattern
- Returns:
The translated SMARTS pattern
- Return type:
str
- toRdIndices(particle_indices)
Convert a list of Schrodinger structure particle indices to RDMol atom indices.
- Parameters:
particle_indices (tuple) – tuple of Schrodinger structure particle indices
- Return type:
list
- Returns:
list of RDMol atom indices
- toStIndices(particle_indices)
Convert a list of RDMol atom indices to Schrodinger structure particle indices.
- Parameters:
particle_indices (tuple) – tuple of RDMol atom indices
- Return type:
list
- Returns:
list of Schrodinger structure particle indices
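Supporting both `toRdIndices` and `toStIndices` suggests a two-way index map. The class below is a hypothetical sketch that assumes the atom correspondence is available as (Schrodinger index, RDKit index) pairs; how `Pattern` actually stores it is not documented here.

```python
from typing import Dict, Iterable, List, Tuple


class IndexMap:
    """Hypothetical two-way map between Schrodinger and RDKit atom indices."""

    def __init__(self, pairs: Iterable[Tuple[int, int]]):
        # pairs: (schrodinger_index, rdkit_index) for each atom in the RDKit mol
        self._st_to_rd: Dict[int, int] = {}
        self._rd_to_st: Dict[int, int] = {}
        for st_index, rd_index in pairs:
            self._st_to_rd[st_index] = rd_index
            self._rd_to_st[rd_index] = st_index

    def to_rd_indices(self, particle_indices: Iterable[int]) -> List[int]:
        # KeyError signals a structure atom with no RDKit counterpart
        return [self._st_to_rd[index] for index in particle_indices]

    def to_st_indices(self, rd_indices: Iterable[int]) -> List[int]:
        # KeyError signals an index that is not in the RDKit mol
        return [self._rd_to_st[index] for index in rd_indices]
```

Keeping both dicts costs memory but makes each lookup constant time, in line with the speed-over-memory trade-off described for `Pattern`.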
- getMoleculeSmiles()
Get SMILES for each molecule in the structure
- Returns:
The dictionary where the key is the molecule number and the value is the corresponding SMILES pattern
- Return type:
dict
- getMoleculeSmarts()
Get SMARTS for each molecule in the structure
- Returns:
The dictionary where the key is the molecule number and the value is the corresponding SMARTS pattern
- Return type:
dict
- getUniqueMolNums(use_smarts=False)
Get the unique representative molecules in the structure. For small molecules such as water, this function can be up to 50 times slower than extracting the molecules individually.
- Parameters:
use_smarts (bool) – If True, the unique molecules will share the same SMARTS pattern. If False, the unique molecules will share the same SMILES pattern.
- Return type:
list
- Returns:
list of molecule numbers that are unique
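Given per-molecule patterns in the shape returned by `getMoleculeSmiles` or `getMoleculeSmarts` (molecule number mapped to pattern), picking one representative per distinct pattern is a single pass. In this hypothetical sketch the lowest molecule number wins, which is an assumed tie-breaking rule. Grouping by string only identifies identical molecules if the patterns are canonical.

```python
from typing import Dict, List


def unique_mol_nums(patterns_by_mol: Dict[int, str]) -> List[int]:
    """Return one molecule number per distinct pattern (lowest number wins).

    Hypothetical helper; patterns_by_mol maps molecule number -> pattern.
    """
    first_with_pattern: Dict[str, int] = {}
    for mol_num in sorted(patterns_by_mol):
        # setdefault keeps the first (lowest) molecule seen for each pattern
        first_with_pattern.setdefault(patterns_by_mol[mol_num], mol_num)
    return sorted(first_with_pattern.values())
```

For a mixture `{1: "O", 2: "CO", 3: "O", 4: "CO", 5: "CCO"}` (water, methanol, ethanol), the representatives are molecules `[1, 2, 5]`.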
- clearCache()
Clear the cache.
- class schrodinger.application.matsci.rdpattern.CanvasPattern(struct, include_stereo=True)
Bases: object
Pattern class for Canvas. Used only if the MS_USE_RDKIT feature flag is disabled.
- __init__(struct, include_stereo=True)
Initialize the CanvasPattern object.
- Parameters:
struct (structure.Structure) – Structure for which patterns are generated
include_stereo (bool) – Whether the stereochemistry of the structure should be included in the generated pattern
- evaluateSmarts(smarts)
Evaluate SMARTS on the structure
- Parameters:
smarts (str) – SMARTS pattern to evaluate
- Return type:
list
- Returns:
List of atom indices matching the SMARTS pattern
- class schrodinger.application.matsci.rdpattern.EvaluateSMARTS(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)
Bases: object
Class to evaluate SMARTS using the following optimizations:
1. Split the structure into molecules and evaluate each molecule separately.
2. Optionally take a list of SMARTS and cache the converted molecule objects.
Note: Don’t use directly, see evaluate_smarts_by_molecule.
- __init__(struct, smarts, method, uniquify=True, matches_by_mol=False, molecule_numbers=None, is_cg=None, timing_data=None)
Initialize object.
- Parameters:
struct (structure.Structure) – the structure to search
smarts (str) – the SMARTS pattern to match
method (msconst.SMARTS_METHOD) – Method to use to evaluate SMARTS
uniquify (bool) – if True, return only unique sets of matching atoms
matches_by_mol (bool) – If True, return a dictionary of matches keyed by molecule number instead of a list of matches
molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure
timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.
- Return type:
list or dict
- Returns:
For the list (if matches_by_mol is False) each value is a list of atom indices matching the SMARTS pattern, for the dict (if matches_by_mol is True) keys are molecule indices and values are lists of matches for that molecule
- run()
Evaluate SMARTS and return list of matches.
- Return list:
List of matches if single SMARTS was passed (as a string) or list of lists of matches for each passed SMARTS
- evaluateMultipleSmarts(struct)
Evaluate multiple SMARTS on the structure using the selected method.
- Parameters:
struct (structure.Structure) – Structure to use
- Return list:
List of lists of matches for each SMARTS
- updateMatches(matches, mol)
Update list of matches with matches for the input molecule.
- Parameters:
matches (list) – List of matches to be updated
mol (structure.Structure) – Structure object of molecule
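The first optimization listed for `EvaluateSMARTS` (split the structure into molecules and search each one) can be sketched without any toolkit. Here `molecule_of` maps atom index to molecule number, and `match_fn` is a hypothetical stand-in for the RDKit or internal SMARTS search, assumed to return matches as lists of full-structure atom indices. Leaving molecules without matches out of the dict is a choice of this sketch, not documented behaviour. SMARTS caching and `uniquify` are not shown.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Union

Match = List[int]


def evaluate_by_molecule(
        molecule_of: Dict[int, int],
        match_fn: Callable[[List[int]], List[Match]],
        matches_by_mol: bool = False,
        molecule_numbers: Optional[Set[int]] = None,
) -> Union[List[Match], Dict[int, List[Match]]]:
    """Run match_fn on each molecule's atoms and gather the matches.

    Hypothetical sketch of per-molecule evaluation; match_fn receives the
    sorted atom indices of one molecule.
    """
    # Group atoms by molecule, optionally restricted to molecule_numbers
    atoms_by_mol: Dict[int, List[int]] = defaultdict(list)
    for atom, mol_num in molecule_of.items():
        if molecule_numbers is None or mol_num in molecule_numbers:
            atoms_by_mol[mol_num].append(atom)

    by_mol: Dict[int, List[Match]] = {}
    for mol_num in sorted(atoms_by_mol):
        matches = match_fn(sorted(atoms_by_mol[mol_num]))
        if matches:  # molecules without matches are omitted
            by_mol[mol_num] = matches
    if matches_by_mol:
        return by_mol
    # Flatten in molecule order for the list form
    return [match for mol_num in sorted(by_mol) for match in by_mol[mol_num]]
```

With two three-atom molecules (atoms 1 to 3 and 4 to 6) and a toy matcher that returns the first atom of each molecule, the flat result is `[[1], [4]]`, and `matches_by_mol=True` gives `{1: [[1]], 2: [[4]]}`.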