schrodinger.application.matsci.rdpattern module¶
Module to generate and evaluate SMILES/SMARTS patterns for both coarse-grained (CG) and all-atom (AA) structures using RDKit.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.application.matsci.rdpattern.to_smarts(struct, sanitize=True, include_stereo=True, atom_subset=None, check_connectivity=False)[source]¶
Get SMARTS for a given structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. It also enables inclusion of stereochemistry information in the SMARTS. Setting to False can speed this up substantially.
check_connectivity (bool) – Whether to check that all the atoms in atom_subset (or in the entire structure if it is None) belong to one molecule. Raises ValueError if they do not.
atom_subset (list) – List of atom indices. If None then SMARTS for full structure is computed
- Return type
str
- Returns
SMARTS pattern for the atom ids provided.
- schrodinger.application.matsci.rdpattern.to_smiles(struct, sanitize=True, include_stereo=True, atom_ids=None, fall_back=False)[source]¶
Get SMILES for a given structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. It also enables inclusion of stereochemistry information in the SMILES.
atom_ids (list) – List of atom indices. If None then SMILES for the full structure is computed
fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.
- Return type
str
- Returns
SMILES pattern for the atom ids provided
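The interplay between sanitize and fall_back can be pictured with a small, library-free sketch. Everything in it is hypothetical: build_mol stands in for whatever converts a structure into an RDKit mol, and the assumption that a failed sanitization surfaces as a ValueError is made only for illustration.

```python
def convert_with_fallback(build_mol, sanitize=True, fall_back=False):
    """
    Return (mol, sanitized) using a hypothetical build_mol(sanitize) callable.

    Assumption for this sketch: a failed sanitization raises ValueError.
    """
    if not sanitize:
        # fall_back is ignored when no sanitization is requested
        return build_mol(False), False
    try:
        return build_mol(True), True
    except ValueError:
        if not fall_back:
            raise
        # Retry without sanitization instead of propagating the failure
        return build_mol(False), False


def fake_build_mol(sanitize):
    """Toy converter that always fails when asked to sanitize."""
    if sanitize:
        raise ValueError('sanitization failed')
    return 'unsanitized-mol'


result = convert_with_fallback(fake_build_mol, sanitize=True, fall_back=True)
```

The second element of the returned tuple plays the role that a "was this sanitized?" flag would play for a caller that needs to know which path was taken.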
- schrodinger.application.matsci.rdpattern.has_query_stereo(query)[source]¶
Checks if rdkit molecule has stereo centers.
- Parameters
query (rdkit.Chem.rdchem.Mol) – Input molecule
- Return type
bool
- Returns
Whether molecule has stereo centers
- schrodinger.application.matsci.rdpattern.evaluate_smarts(struct, smarts, is_cg=None, sanitize=True, uniquify=True, max_matches=1000000000)[source]¶
Get the list of matches for the passed SMARTS pattern in the reference structure
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
smarts (str) – SMARTS pattern to find
is_cg (bool or None) – Whether structure is CG. If None, perform a check
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
uniquify (bool) – if True, return only unique sets of matching atoms
max_matches (int) – the maximum number of matches to return
- Return type
list or None
- Returns
List of lists of atom/particle indices matching the SMARTS.
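To make the uniquify and max_matches options concrete, here is a rough, standalone sketch of what "unique sets of matching atoms" can mean: two matches that contain the same atoms in a different order count as one. The helper name and its behavior are assumptions for illustration; the actual matching is delegated to RDKit and may differ in detail.

```python
def uniquify_matches(matches, max_matches=None):
    """
    Keep the first occurrence of each distinct set of atom indices.

    Hypothetical helper, not part of the rdpattern module.
    """
    seen = set()
    unique = []
    for match in matches:
        key = frozenset(match)
        if key in seen:
            # Same atoms in a different order: treat as a duplicate
            continue
        seen.add(key)
        unique.append(list(match))
        if max_matches is not None and len(unique) >= max_matches:
            break
    return unique


raw_matches = [(1, 2, 3), (3, 2, 1), (4, 5, 6)]
unique_matches = uniquify_matches(raw_matches)
first_only = uniquify_matches(raw_matches, max_matches=1)
```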
- schrodinger.application.matsci.rdpattern.evaluate_smarts_by_molecule(struct, smarts, uniquify=True, matches_by_mol=False, molecule_numbers=None)[source]¶
Takes a structure and a SMARTS pattern and returns a list of all matching atom indices, where each element in the list is a group of atoms that match the SMARTS pattern.
- Parameters
struct (structure.Structure) – the structure to search
smarts (str) – the SMARTS pattern to match
uniquify (bool) – if True, return only unique sets of matching atoms
matches_by_mol (bool) – if True, return a dictionary of matches keyed by molecule number rather than a list of matches
molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure
- Return type
list or dict
- Returns
If matches_by_mol is False, a list in which each item is a list of atom indices matching the SMARTS pattern. If matches_by_mol is True, a dict in which keys are molecule numbers and values are lists of matches for that molecule
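The two return shapes can be related with a short sketch that regroups a flat list of matches by molecule number. The function, the atom_to_mol mapping, and the assumption that every match lies entirely within one molecule are illustrative and not taken from the module.

```python
from collections import defaultdict


def group_matches_by_molecule(matches, atom_to_mol, molecule_numbers=None):
    """
    Regroup a flat list of matches into {molecule number: [matches]}.

    Hypothetical helper. Assumes each match lies within a single molecule,
    so the first atom of a match identifies its molecule.
    """
    by_mol = defaultdict(list)
    for match in matches:
        mol_num = atom_to_mol[match[0]]
        if molecule_numbers is not None and mol_num not in molecule_numbers:
            # Skip molecules that were not requested
            continue
        by_mol[mol_num].append(list(match))
    return dict(by_mol)


# Atoms 1-3 belong to molecule 1, atoms 4-6 to molecule 2
atom_to_mol = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
flat_matches = [[1, 2], [4, 5], [5, 6]]
grouped = group_matches_by_molecule(flat_matches, atom_to_mol)
only_second = group_matches_by_molecule(flat_matches, atom_to_mol, molecule_numbers={2})
```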
- schrodinger.application.matsci.rdpattern.validate_smarts(smarts, struct=None, is_cg=None)[source]¶
Validate SMARTS. Works with both AA and CG structures.
- Parameters
smarts (str) – SMARTS to validate
struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Return type
str or None
- Returns
Error message on error, None if SMARTS is valid
- schrodinger.application.matsci.rdpattern.has_stereo_smarts(smarts, struct=None, is_cg=None)[source]¶
Check if SMARTS requires stereo. Works both with AA and CG.
- Parameters
smarts (str or list[str]) – SMARTS pattern(s) to check
struct (structure.Structure or None) – If None, validate as AA SMARTS. If present, validate either as AA or CG
is_cg (bool or None) – Whether structure is CG. If None, perform a check
- Return type
bool or list[bool]
- Returns
Whether the SMARTS requires stereo
- schrodinger.application.matsci.rdpattern.symbol_to_number(symbol)[source]¶
Transforms chemical symbols into the corresponding atomic numbers.
- Parameters
symbol (str) – Chemical symbol
- Return type
int or None
- Returns
Atomic number corresponding to the entered symbol if it is a valid atomic symbol, otherwise None
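The "number or None" contract can be illustrated with a plain dictionary lookup. The table below is deliberately partial and the function name is hypothetical; how the module actually resolves symbols is not stated here.

```python
# Deliberately partial table, for illustration only
ATOMIC_NUMBERS = {
    'H': 1, 'He': 2, 'Li': 3, 'C': 6, 'N': 7, 'O': 8,
    'F': 9, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17,
}


def symbol_to_number_sketch(symbol):
    """Return the atomic number for a known symbol, otherwise None."""
    return ATOMIC_NUMBERS.get(symbol)


chlorine = symbol_to_number_sketch('Cl')
unknown = symbol_to_number_sketch('Xx')
```

Callers should test the result against None rather than relying on truthiness, so that the check stays explicit.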
- class schrodinger.application.matsci.rdpattern.DictCache(maxcount=10000)[source]¶
Bases:
collections.OrderedDict
A first in first out dictionary cache which caches the key and associated value.
- __init__(maxcount=10000)[source]¶
Constructs a new cache.
- Parameters
maxcount (int) – The maximum number of items that the cache can hold
- __contains__(key, /)¶
True if the dictionary has the specified key, else False.
- __len__()¶
Return len(self).
- clear() -> None. Remove all items from od.¶
- copy() -> a shallow copy of od¶
- fromkeys(iterable, value=None)¶
Create a new ordered dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() -> a set-like object providing a view on D’s items¶
- keys() -> a set-like object providing a view on D’s keys¶
- move_to_end(key, last=True)¶
Move an existing element to the end (or beginning if last is false).
Raise KeyError if the element does not exist.
- pop(k[, d]) -> v, remove specified key and return the corresponding value.¶
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem(last=True)¶
Remove and return a (key, value) pair from the dictionary.
Pairs are returned in LIFO order if last is true or FIFO order if false.
- setdefault(key, default=None)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) -> None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
- values() -> an object providing a view on D’s values¶
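As a rough illustration of a first in first out cache built on collections.OrderedDict, the sketch below evicts the oldest inserted key once maxcount is reached. FifoCache is a hypothetical stand-in written for this example and is not the DictCache source; in particular, the choice to evict inside __setitem__ is an assumption.

```python
from collections import OrderedDict


class FifoCache(OrderedDict):
    """Hypothetical FIFO cache sketch (not the Schrodinger implementation)."""

    def __init__(self, maxcount=10000):
        super().__init__()
        self.maxcount = maxcount

    def __setitem__(self, key, value):
        # Evict the oldest entry only when a new key would exceed the limit
        if key not in self and len(self) >= self.maxcount:
            self.popitem(last=False)
        super().__setitem__(key, value)


cache = FifoCache(maxcount=2)
cache['a'] = 'C'
cache['b'] = 'CC'
cache['c'] = 'CCC'  # evicts 'a', the first key inserted
remaining_keys = list(cache)
```

Unlike a least recently used cache, reading a key here does not change its position, so eviction order depends only on insertion order.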
- class schrodinger.application.matsci.rdpattern.Pattern(struct, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)[source]¶
Bases:
object
A class to calculate SMARTS and SMILES patterns for a structure multiple times. The class can be memory intensive to allow for increased speed.
- PROTECTED_PATTERN_BIT = ['D', 'R', 'r', 'v', 'x', 'X', 'H']¶
- __init__(struct, is_cg=None, sanitize=True, include_stereo=True, fall_back=False)[source]¶
Initialize the Pattern class
- Parameters
struct (schrodinger.structure.Structure) – Structure for which patterns need to be selected
is_cg (bool or None) – Whether structure is CG. If None, perform a check
sanitize (bool) – Whether RDKit sanitization should be performed. This option is not applicable for coarse-grained structures.
include_stereo (bool) – Whether the stereochemistry of the structure should be translated into the RDKit mol. Setting to False can speed this up substantially.
fall_back (bool) – Ignored if sanitize=False. If sanitize=True, will fall back to using a non-sanitized structure if sanitization fails.
- property sanitized¶
Get whether the structure was sanitized or not
- Returns
sanitization status of the structure
- Return type
bool
- property smiles¶
Get the SMILES for the passed structure
- Returns
SMILES pattern for the passed structure
- Return type
str
- getPattern(atom_ids=None, is_smiles_requested=False, isomeric=True)[source]¶
Get SMILES/SMARTS for full structure or for substructure of given atom ids.
- Parameters
atom_ids (list) – list of atom indices
is_smiles_requested (bool) – return a SMILES pattern if True, otherwise a SMARTS pattern
isomeric (bool) – include information about stereochemistry in the SMILES/SMARTS
- Return type
str
- Returns
SMILES or SMARTS pattern for the atom ids provided
- toSmiles(atom_ids=None, isomeric=True)[source]¶
Get SMILES for subset of atom ids
- Parameters
atom_ids (iterable) – atom indices
isomeric (bool) – include information about stereochemistry in the SMILES
- Return type
str
- Returns
SMILES pattern for the atom ids provided
- property smarts¶
Get the SMARTS for the passed structure
- Returns
SMARTS pattern for the passed structure
- Return type
str
- toSmarts(atom_ids=None, isomeric=True)[source]¶
Get SMARTS for subset of atom ids
- Parameters
atom_ids (iterable) – atom indices
isomeric (bool) – include information about stereochemistry in the SMARTS
- Return type
str
- Returns
SMARTS pattern for the atom ids provided
- evaluateSmiles(smiles, uniquify=True, use_chirality=True, max_matches=1000000000)[source]¶
Get the list of matches for the passed SMILES pattern in the reference structure
- Parameters
smiles (str) – SMILES pattern to find
uniquify (bool) – if True, return only unique sets of matching atoms
use_chirality (bool) – enables the use of stereochemistry in the matching
max_matches (int) – the maximum number of matches to return
- Return type
list or None
- Returns
List of lists of atom indices matching the SMILES.
- evaluateSmarts(smarts, uniquify=True, use_chirality=True, max_matches=1000000000)[source]¶
Get the list of matches for the passed SMARTS pattern in the reference structure
- Parameters
smarts (str) – SMARTS pattern to find
uniquify (bool) – if True, return only unique sets of matching atoms
use_chirality (bool) – enables the use of stereochemistry in the matching
max_matches (int) – the maximum number of matches to return
- Return type
list or None
- Returns
List of lists of atom/particle indices matching the SMARTS.
- static patternTranslate(s_pattern, mapper)[source]¶
Translate the passed SMARTS/SMILES such that each mapper key found in the pattern is replaced by its corresponding mapper value
- Parameters
s_pattern (str) – The SMARTS/SMILES pattern to change
mapper (dict) – The mapper used to convert the SMARTS/SMILES pattern. Each key is an element name to find in the pattern and each value is the name to replace it with
- Returns
the converted SMARTS/SMILES pattern
- Return type
str
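A minimal sketch of mapper-driven translation, assuming plain string substitution is enough, is shown below. translate_pattern is a hypothetical function; it makes no attempt to protect SMARTS primitives such as those listed in PROTECTED_PATTERN_BIT, which a real implementation would likely need to handle.

```python
import re


def translate_pattern(s_pattern, mapper):
    """Replace every mapper key found in the pattern with its value."""
    if not mapper:
        return s_pattern
    # Try longer keys first so that 'Cl' is not consumed as 'C' followed by 'l'
    keys = sorted(mapper, key=len, reverse=True)
    regex = re.compile('|'.join(re.escape(key) for key in keys))
    # A single pass avoids re-translating text that was already substituted
    return regex.sub(lambda match: mapper[match.group(0)], s_pattern)


# Hypothetical proxy elements mapped to hypothetical particle names
translated = translate_pattern('[Ar][Kr][Ar]', {'Ar': 'BENZ', 'Kr': 'W'})
swapped = translate_pattern('[Ar][Kr]', {'Ar': 'Kr', 'Kr': 'Ar'})
```

Doing the replacement in one regular expression pass, rather than calling str.replace once per key, keeps a value that happens to equal another key from being translated twice.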
- proxyToParticleName(smarts)[source]¶
If the structure is coarse-grained, convert the SMARTS pattern from proxy elements to coarse-grained particle names. Does nothing for atomistic structures.
- Parameters
smarts (str) – The SMARTS pattern
- Returns
The translated SMARTS pattern
- Return type
str
- particleNameToProxy(smarts)[source]¶
If the structure is coarse-grained, convert the SMARTS pattern from coarse-grained particle names to proxy element names. Does nothing for atomistic structures.
- Parameters
smarts (str) – The SMARTS pattern
- Returns
The translated SMARTS pattern
- Return type
str
- toRdIndices(particle_indices)[source]¶
Convert list of Schrodinger structure particle indices to RDMol atom indices
- Parameters
particle_indices (tuple) – tuple of Schrodinger structure particle indices
- Return type
list
- Returns
list of RDMol atom indices
- toStIndices(particle_indices)[source]¶
Convert list of RDMol atom indices to Schrodinger structure particle indices
- Parameters
particle_indices (tuple) – tuple of RDMol atom indices
- Return type
list
- Returns
list of Schrodinger structure particle indices
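Schrodinger structures number atoms from 1, while RDKit numbers them from 0. Assuming the RDKit mol keeps the same atom order as the structure (an assumption of this sketch; the real conversion may use an explicit mapping), the two conversions reduce to an offset. Both function names are hypothetical.

```python
def to_rd_indices_sketch(st_indices):
    """Map 1-based structure indices to 0-based RDKit indices."""
    return [index - 1 for index in st_indices]


def to_st_indices_sketch(rd_indices):
    """Map 0-based RDKit indices to 1-based structure indices."""
    return [index + 1 for index in rd_indices]


rd_indices = to_rd_indices_sketch((1, 2, 5))
round_trip = to_st_indices_sketch(rd_indices)
```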
- getMoleculeSmiles()[source]¶
Get SMILES for each molecule in the structure
- Returns
The dictionary where the key is the molecule number and the value is the corresponding SMILES pattern
- Return type
dict
- getMoleculeSmarts()[source]¶
Get SMARTS for each molecule in the structure
- Returns
The dictionary where the key is the molecule number and the value is the corresponding SMARTS pattern
- Return type
dict
- getUniqueMolNums(use_smarts=False)[source]¶
Get the unique representative molecules in the structure
- Parameters
use_smarts (bool) – If true the unique molecules will share the same SMARTS pattern. If false the unique molecules will share the same SMILES pattern.
- Return type
list
- Returns
list of molecule numbers that are unique
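One way to picture the selection of unique representatives is to walk a molecule-number-to-pattern dictionary, such as the one getMoleculeSmiles or getMoleculeSmarts returns, and keep one molecule per distinct pattern. Keeping the lowest molecule number for each pattern is an assumption of this sketch, and the function name is hypothetical.

```python
def unique_mol_nums(mol_patterns):
    """
    Return one representative molecule number per distinct pattern.

    mol_patterns maps molecule number -> SMILES or SMARTS string.
    """
    seen = set()
    unique = []
    for mol_num in sorted(mol_patterns):
        pattern = mol_patterns[mol_num]
        if pattern not in seen:
            # First molecule with this pattern becomes the representative
            seen.add(pattern)
            unique.append(mol_num)
    return unique


# Three waters and one ethanol
representatives = unique_mol_nums({1: 'O', 2: 'O', 3: 'CCO', 4: 'O'})
```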