schrodinger.application.pathfinder.multiroute module
Functions to support multi-route enumeration (AKA “simple reaction enumeration” or “automated reaction enumeration”).
- class schrodinger.application.pathfinder.multiroute.MultiRouteEnumerator(mol, reactions_dict, *, dedup=True, depth=None, descriptors='MolLogP, MolWt, NumChiralCenters, NumHAcceptors, NumHDonors, TPSA', frozen_atoms=frozenset({}), libpath=None, max_per_route=1000, max_routes=100, no_core_hopping=False, product_property_filter_file=None, product_smarts_filter_file=None, ref_mols=None, ch_dist_tol=1.0, ch_ang_tol=15.0, bond_reactions=None, prefilter_reagents=None, fp_dir=None, fp_url=None, fp_namespace=None, forward=False, prng=random, route_prng=None, **unused_args)
Bases: object
A generator of products following the multiroute enumeration protocol.
- __init__(mol, reactions_dict, *, dedup=True, depth=None, descriptors='MolLogP, MolWt, NumChiralCenters, NumHAcceptors, NumHDonors, TPSA', frozen_atoms=frozenset({}), libpath=None, max_per_route=1000, max_routes=100, no_core_hopping=False, product_property_filter_file=None, product_smarts_filter_file=None, ref_mols=None, ch_dist_tol=1.0, ch_ang_tol=15.0, bond_reactions=None, prefilter_reagents=None, fp_dir=None, fp_url=None, fp_namespace=None, forward=False, prng=random, route_prng=None, **unused_args)
- Parameters
mol (rdkit.Chem.Mol) – input molecule to be used for the initial retrosynthetic or forward analysis.
dedup (bool) – skip duplicate products (using SMILES for comparison)
depth (int or NoneType) – analysis depth (if None, increasing depths will be attempted until enough routes are found)
descriptors (list of str) – names of RDKit descriptors to compute for each product
frozen_atoms (set of int) – indexes (1-based) of atoms to keep in the product
libpath (list of str) – directories to search for reactant files
max_per_route (int) – maximum number of products per route
max_routes (int) – maximum number of routes to sample
no_core_hopping (bool) – don’t use the special core hopping mode even when possible
product_property_filter_file (str) – name of JSON file with product property filters
product_smarts_filter_file (str) – name of .cflt file with SMARTS patterns
ref_mols (list(Mol) or str) – reference molecules for similarity calculations
ch_dist_tol (float) – core-hopping distance tolerance in Angstroms (maximum allowed change in the distance between side chains, relative to the input structure)
ch_ang_tol (float) – core-hopping angle tolerance in degrees (maximum allowed change in the bond vector angle between side chains, relative to the input structure)
bond_reactions ({(int, int): set(str)}) – dict specifying which reactions are allowed to break certain bonds. Keys are tuples of two ints (sorted atom indexes); values are sets of reaction names.
prefilter_reagents (int or NoneType) – number of most similar molecules to return.
fp_dir (str or NoneType) – directory to search for fingerprint files in addition to CWD
forward (bool) – use forward analysis mode (routes start from mol instead of ending there)
prng (random.Random) – pseudo-random number generator to be used for the enumeration, for picking the route and reactants to try at each iteration
route_prng (random.Random or NoneType) – pseudo-random number generator to be used for selecting the subset of routes to use, based on max_routes. If not supplied, prng will be used.
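The interplay between max_routes, prng and route_prng can be illustrated with a minimal sketch. The helper select_routes below is hypothetical, not part of this module; it only assumes, as documented above, that route_prng falls back to prng when not supplied. Keeping a separate generator for route selection means the chosen subset of routes can stay reproducible even if the enumeration PRNG is consumed differently.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_routes(routes: Sequence[T], max_routes: int,
                  prng=random,
                  route_prng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: keep at most max_routes routes, chosen at random."""
    # Fall back to the general-purpose PRNG when no route-specific one is given.
    rng = route_prng if route_prng is not None else prng
    if len(routes) <= max_routes:
        # Nothing to drop; keep every route in its original order.
        return list(routes)
    # Sample without replacement so no route is picked twice.
    return rng.sample(list(routes), max_routes)
```

With two identically seeded route_prng instances, the same subset is selected regardless of how prng is used elsewhere.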
- generate_mols()
- Return type
generator of Mol
- schrodinger.application.pathfinder.multiroute.get_fp_file(reactant_file, cache_dir, url_base=None, subdir='')
Get a fingerprint file for a given reactant file. First look at the cache dir; if not found, and a URL is supplied, try to download it from the server and write it to the cache dir.
For a reactant file named foo.pfx, the fingerprint file must be named foo-<sha1>.fp, where <sha1> is the SHA1 hash of foo.pfx.
- Parameters
reactant_file (str) – reactant file for which we are looking for fingerprints
cache_dir (str) – local directory where fingerprint files are/will be stored
url_base – base URL to try to download fingerprint files from
subdir – optionally, subdirectory of cache_dir and URL to use
- Returns
path to fingerprint file, if found; else None
- Return type
str or NoneType
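The cache-first lookup described for get_fp_file can be sketched as follows. This is not the module's implementation: find_fp_file is a hypothetical name, the download step is injected as a callable (for example, something with the behavior of download_and_decompress), and the way url_base, subdir and the file basename are joined into a URL is an assumption.

```python
import os
from typing import Callable, Optional


def find_fp_file(fp_basename: str, cache_dir: str,
                 url_base: Optional[str] = None, subdir: str = "",
                 download: Optional[Callable[[str, str], Optional[str]]] = None
                 ) -> Optional[str]:
    """Hypothetical cache-first lookup mirroring get_fp_file's documented behavior."""
    local_path = os.path.join(cache_dir, subdir, fp_basename)
    # 1. Cache hit: nothing to download.
    if os.path.exists(local_path):
        return local_path
    # 2. Cache miss and no server configured: report "not found".
    if url_base is None or download is None:
        return None
    # 3. Try the server; this URL layout is an assumption, not the module's.
    parts = [url_base.rstrip("/")]
    if subdir:
        parts.append(subdir.strip("/"))
    parts.append(fp_basename)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    # The downloader is expected to return the destination path, or None on failure.
    return download("/".join(parts), local_path)
```

A second call for the same file then returns the cached copy without touching the network.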
- schrodinger.application.pathfinder.multiroute.download_and_decompress(url, dest)
GET a gzip-compressed file from a URL and write it, decompressed, to the local filesystem. The contents are first downloaded to a temporary file, which is then renamed atomically. A locking mechanism is employed to try to prevent concurrent downloads of the same file.
- Parameters
url (str) – URL to download
dest (str) – destination filename
- Returns
dest if successful, else None (e.g. in case of 404)
- Return type
str or NoneType
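The download-then-rename pattern described for download_and_decompress can be sketched with the standard library alone. This is an illustrative version under stated assumptions, not the module's code: the name download_and_decompress_sketch is hypothetical, only HTTP 404 is mapped to None, and the locking step (see get_lock) is omitted. Writing the temporary file in the destination directory matters because os.replace is only atomic within one filesystem.

```python
import gzip
import os
import shutil
import tempfile
import urllib.error
import urllib.request
from typing import Optional


def download_and_decompress_sketch(url: str, dest: str) -> Optional[str]:
    """Download a gzip resource and atomically write its decompressed bytes to dest."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    try:
        response = urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # missing on the server: report "not found"
        raise
    # Temporary file in the same directory, so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with response, os.fdopen(fd, "wb") as out, \
                gzip.GzipFile(fileobj=response) as gz:
            # Stream-decompress without holding the whole file in memory.
            shutil.copyfileobj(gz, out)
        os.replace(tmp_path, dest)
    except BaseException:
        # Never leave a half-written temporary file behind.
        os.unlink(tmp_path)
        raise
    return dest
```

Readers never observe a partially written dest: it either does not exist yet or already holds the complete decompressed contents.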
- schrodinger.application.pathfinder.multiroute.get_fp_basename(reactant_file)
Return the basename of the fingerprint file corresponding to the given reactant file, following the convention that for a reactant file named foo.pfx, the fingerprint file must be named foo-<sha1>.fp, where <sha1> is the SHA1 hash of foo.pfx.
- Parameters
reactant_file (str) – reactant file
- Returns
basename of fingerprint file
- Return type
str
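The foo.pfx → foo-<sha1>.fp naming convention can be expressed in a few lines. This is a minimal sketch, assuming that "basename" means the directory is dropped and only the last extension is stripped; fp_basename is a hypothetical name, not this module's function.

```python
import hashlib
import os


def fp_basename(reactant_file: str) -> str:
    """Map foo.pfx to foo-<sha1>.fp, where <sha1> hashes the contents of foo.pfx."""
    with open(reactant_file, "rb") as fh:
        digest = hashlib.sha1(fh.read()).hexdigest()
    # Drop the directory and the final extension (".pfx") to get the stem.
    stem = os.path.splitext(os.path.basename(reactant_file))[0]
    return f"{stem}-{digest}.fp"
```

Because the hash covers the file contents, editing foo.pfx automatically invalidates any previously cached fingerprint file.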
- schrodinger.application.pathfinder.multiroute.get_sha1(filename)
Return the SHA1 hash of a file.
- Parameters
filename (str) – input file
- Returns
hex SHA1 digest of file contents
- Return type
str
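Reactant files can be large, so a file hash is usually computed incrementally. The sketch below (sha1_of_file is a hypothetical name; the chunk size is an arbitrary choice) reads the file in fixed-size chunks and yields the same hex digest as hashing the whole file at once.

```python
import hashlib


def sha1_of_file(filename: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA1 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha1()
    with open(filename, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```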
- schrodinger.application.pathfinder.multiroute.get_lock(basename, max_wait, interval=1.0)
Create a <basename>.lock file on entry and remove it on exit. If the file already exists, wait up to max_wait seconds for the lock to clear. If the timeout is exceeded, assume that the lock is stale and ignore it.
This is a very rudimentary mechanism, but it is good enough for the purposes of this module, which is only to _try_ to prevent simultaneous downloads of the same file; occasional collisions cost nothing beyond a slight waste of bandwidth and temporary disk space.
- Parameters
basename (str) – basename of lock file
max_wait (float) – maximum wait in seconds
interval (float) – time to sleep between attempts, in seconds
- Returns
context manager that removes lock file on exit
- Return type
contextlib._GeneratorContextManager
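A lock file with a stale-lock timeout can be sketched with contextlib.contextmanager, which also explains the contextlib._GeneratorContextManager return type. This is an illustrative version, not the module's code: lock_file is a hypothetical name, and creating the file with os.O_CREAT | os.O_EXCL (which fails if the file already exists) is an assumed choice of primitive.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def lock_file(basename: str, max_wait: float, interval: float = 1.0):
    """Best-effort lock: create <basename>.lock; treat it as stale after max_wait seconds."""
    lock_path = basename + ".lock"
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                break  # assume the existing lock is stale and proceed anyway
            time.sleep(interval)
    try:
        yield lock_path
    finally:
        # Remove the lock on exit; another holder may already have removed it.
        with contextlib.suppress(FileNotFoundError):
            os.remove(lock_path)
```

As in the description above, a collision after a timeout only costs a redundant download, so no stronger guarantee is attempted.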
- schrodinger.application.pathfinder.multiroute.has_variable_reactants(route)
Check if the route has at least one variable reactant.
- Parameters
route (schrodinger.application.pathfinder.route.RouteNode) – route to analyze
- Returns
does the route have at least one variable reactant?
- Return type
bool
- schrodinger.application.pathfinder.multiroute.is_core_sm(sm, core_atoms)
Check if a starting material node corresponds to a core.
- Parameters
sm (schrodinger.application.pathfinder.route.ReagentNode) – starting material
core_atoms (set of int) – core atom indices (1-based)
- Return type
bool
- schrodinger.application.pathfinder.multiroute.meta_sample(samples, dedup=True, prng=random)
A generator that, on each cycle, picks a random element of samples and yields the next element from that sample. It stops only when every sample has raised StopIteration.
Each product gets annotated with properties representing the route that was used to make the molecule.
- Parameters
samples (list of iterator of Mol) – molecule samples
dedup (bool) – skip duplicate products
prng (random.Random) – pseudo-random number generator
- Returns
molecule generator
- Return type
generator of rdkit.Chem.Mol
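The random interleaving performed by meta_sample can be sketched as below. This is a simplified, hypothetical version: it works on any hashable items (the enumerator's dedup compares molecules via SMILES, per the MultiRouteEnumerator parameters), it drops a sample once it is exhausted, and it omits the route annotation step.

```python
import random
from typing import Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T", bound=Hashable)


def meta_sample_sketch(samples: List[Iterable[T]], dedup: bool = True,
                       prng=random) -> Iterator[T]:
    """Interleave several iterators, choosing which one to advance at random."""
    active = [iter(s) for s in samples]
    seen = set()
    while active:
        i = prng.randrange(len(active))
        try:
            item = next(active[i])
        except StopIteration:
            del active[i]  # this sample is exhausted; keep drawing from the rest
            continue
        if dedup:
            if item in seen:
                continue  # skip duplicates produced by a different route
            seen.add(item)
        yield item
```

Within each sample the original order is preserved; only the choice of which sample to advance is random.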
- schrodinger.application.pathfinder.multiroute.measure_vectors(st, r1, c1, r2, c2)
Measure the distance and angle between two bond vectors. The distance is measured between atoms r1 and r2; the angle is between the c1-r1 and c2-r2 vectors.
- Parameters
st (schrodinger.structure.Structure) – structure to measure
r1 (int) – R-group attachment atom 1
c1 (int) – core atom 1
r2 (int) – R-group attachment atom 2
c2 (int) – core atom 2
- Returns
distance and angle
- Return type
float, float
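The geometry behind measure_vectors can be sketched on plain 3D coordinates instead of a Structure. This is an assumption-laden illustration: measure_vectors_sketch is a hypothetical name, the bond vectors are taken to point from the core atom to the R-group attachment atom, and the angle is reported in degrees to match ch_ang_tol. Values like these, measured for the input and for a candidate product, could then be compared against ch_dist_tol and ch_ang_tol.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def measure_vectors_sketch(r1: Vec, c1: Vec, r2: Vec, c2: Vec) -> Tuple[float, float]:
    """Distance r1-r2 and angle (degrees) between the c1->r1 and c2->r2 bond vectors."""
    distance = math.dist(r1, r2)
    v1 = [a - b for a, b in zip(r1, c1)]
    v2 = [a - b for a, b in zip(r2, c2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
    return distance, angle
```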
- schrodinger.application.pathfinder.multiroute.st_from_mol(mol)
Convert a Mol into a Structure, with 3D coordinates but no added hydrogens. Missing stereochemistry is tolerated.
- Parameters
mol (rdkit.Chem.rdchem.Mol) – molecule
- Returns
Structure
- Return type
schrodinger.structure.Structure
- schrodinger.application.pathfinder.multiroute.is_core(graph, free_component, frozen_components)
Check whether the free_component subgraph should be considered a core, meaning that it is connected to more than one of the frozen_components.
- Parameters
graph (networkx.classes.graph.Graph) – molecular graph
free_component (set of int) – possible core atom indices
frozen_components (list of set of int) – possible sidechain atom indexes
- Returns
is it a core?
- Return type
bool
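The core test reduces to counting how many frozen regions a free region touches. The sketch below is hypothetical (is_core_sketch is not this module's function) and uses a plain dict-of-sets adjacency map in place of the networkx graph, so it runs without extra dependencies.

```python
from typing import Dict, List, Set


def is_core_sketch(adjacency: Dict[int, Set[int]], free_component: Set[int],
                   frozen_components: List[Set[int]]) -> bool:
    """A free region is a core if it touches atoms from more than one frozen region."""
    # Collect every atom bonded to the free region but not part of it.
    boundary = {nbr for atom in free_component for nbr in adjacency[atom]} - free_component
    # Count the frozen regions (side chains) that share at least one boundary atom.
    touched = sum(1 for comp in frozen_components if comp & boundary)
    return touched > 1
```

For a chain 1-2-3-4-5 with atoms 1 and 5 frozen, the free region {2, 3, 4} bridges two side chains and is therefore a core.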
- schrodinger.application.pathfinder.multiroute.apply_similarity_filters(products, args)
Implement the -sim_keep_percent and -sim_discard_percent functionality.
- Parameters
products (iterator of rdkit.Chem.Mol) – molecules to filter
args (argparse.Namespace) – command-line arguments
- Returns
filtered products and number of products to keep
- Return type
generator of rdkit.Chem.Mol, int
- schrodinger.application.pathfinder.multiroute.analyze_frozen_atoms(mol, frozen_atoms)
Examine the molecular graph of the input molecule to partition it, based on the set of frozen atoms, into free regions and frozen regions. Also determine which free region is the core, if any.
A core in this context is a contiguous set of non-frozen atoms which is adjacent to two or more sets of frozen atoms (the side chains).
Jobs with two or more cores will abort immediately.
When there is a core, also measure the distances and angles between all the pairs of vectors leading from the core to the side chains. The resulting dict has pairs of atoms as keys, and (distance, angle) tuples as values.
In addition to the set of core atoms, a set of “core neighbors” is also returned. These are the non-core atoms that are directly connected to the core.
- Parameters
mol (rdkit.Chem.rdchem.Mol) – input molecule
frozen_atoms (set of int) – frozen atom indices
- Returns
measurements, core atoms, core neighbors
- Return type
dict {(int, int): (float, float)}, set of int, set of int
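The partitioning step of analyze_frozen_atoms can be sketched with a breadth-first search over a dict-of-sets adjacency map. This is a hypothetical illustration (components and find_core are not names from this module): it finds the core and the core neighbors, raises an exception where the real job would abort on multiple cores, and leaves out the vector measurements.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def components(adjacency: Dict[int, Set[int]], atoms: Iterable[int]) -> List[Set[int]]:
    """Connected components of the subgraph induced by `atoms` (breadth-first search)."""
    remaining = set(atoms)
    result = []
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            for nbr in adjacency[queue.popleft()]:
                if nbr in remaining:
                    remaining.discard(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        result.append(comp)
    return result


def find_core(adjacency: Dict[int, Set[int]],
              frozen: Set[int]) -> Tuple[Optional[Set[int]], Set[int]]:
    """Return (core atoms, core neighbors), or (None, set()) if there is no core."""
    frozen_comps = components(adjacency, frozen)
    free_comps = components(adjacency, set(adjacency) - frozen)
    cores = []
    for comp in free_comps:
        boundary = {n for a in comp for n in adjacency[a]} - comp
        # A free region adjacent to two or more side chains is a core.
        if sum(1 for fc in frozen_comps if fc & boundary) > 1:
            cores.append(comp)
    if len(cores) > 1:
        raise ValueError("more than one core found")  # the real job aborts here
    if not cores:
        return None, set()
    core = cores[0]
    # Core neighbors: non-core atoms directly bonded to a core atom.
    neighbors = {n for a in core for n in adjacency[a]} - core
    return core, neighbors
```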
- schrodinger.application.pathfinder.multiroute.generate_mols(*a, **d)
A generator of products following the multiroute enumeration protocol.
This is a thin functional wrapper around the MultiRouteEnumerator class. For arguments, see MultiRouteEnumerator.__init__.
- Return type
generator of Mol
- schrodinger.application.pathfinder.multiroute.write_products(products, filename, max_products, has_frozen_atoms=False)
Write up to max_products molecules from a Mol iterator to a file.
- Parameters
products (iterator of Mol) – molecules to write
filename (str) – filename
max_products (int) – maximum number of structures to write
has_frozen_atoms (bool) – True if user specified frozen atoms
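Because product generators such as meta_sample may never stop on their own, the max_products cap has to be applied lazily. A minimal sketch, assuming products are plain SMILES strings written one per line (write_first_n is a hypothetical name; the real function writes Mols and handles frozen atoms), uses itertools.islice so that no more than max_products items are ever pulled from the generator.

```python
import itertools
from typing import Iterable


def write_first_n(products: Iterable[str], filename: str, max_products: int) -> int:
    """Write at most max_products items (here: SMILES strings) to filename, one per line."""
    count = 0
    with open(filename, "w") as fh:
        # islice stops pulling from the (possibly infinite) iterator after max_products.
        for smiles in itertools.islice(products, max_products):
            fh.write(smiles + "\n")
            count += 1
    return count
```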
- schrodinger.application.pathfinder.multiroute.get_output_writer(align_products, filename)
If we’re generating Maestro structures (.mae, .mae.gz, .maegz), we need to generate structures, align them, and then write them out. If not, we can write products out with the MolWriter.
- Parameters
align_products – flag indicating product alignment
filename (str) – output filename
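The format decision in get_output_writer hinges on the output file extension. The sketch below shows only that check; choose_writer_kind and its "maestro"/"mol" return labels are hypothetical, the case-insensitive comparison is an assumption, and how align_products affects the chosen writer is not modeled.

```python
MAESTRO_SUFFIXES = (".mae", ".mae.gz", ".maegz")


def choose_writer_kind(filename: str) -> str:
    """Return "maestro" for Maestro output files, else "mol" (hypothetical labels)."""
    # str.endswith accepts a tuple of suffixes; lower() makes the check case-insensitive.
    return "maestro" if filename.lower().endswith(MAESTRO_SUFFIXES) else "mol"
```

Note that "products.sdf.gz" falls through to "mol": only the listed Maestro suffixes trigger the structure-generation and alignment path.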