schrodinger.structutils.analyze module

Functions for analyzing Structure objects.

AslLigandSearcher is a class that identifies putative ligands in a structure. Each putative ligand found is contained in a Ligand instance.

There are also a number of functions for working with SMARTS, ASL, and SMILES (e.g. evaluate_smarts_canvas or generate_smiles). Other functions return information about a structure (e.g. get_chiral_atoms or hydrogens_present). There are also several SASA (Solvent Accessible Surface Area) functions (calculate_sasa_by_atom, calculate_sasa_by_residue, and calculate_sasa).

See also the discussion in the Python API overview.

@copyright: Schrodinger, LLC. All rights reserved.

schrodinger.structutils.analyze.get_chiral_atoms(structure)

Return a dictionary of chiral atoms, for which the key is the atom index and the value is one of the following strings: “R”, “S”, “ANR”, “ANS”, “undef”.

ANR and ANS designate “chiralities” of non-chiral atoms that are important for determining the structure of the molecule (e.g. cis/trans rings).

Parameters

structure (Structure) – Chirality of atoms within this structure will be determined.

Return type

dict

Returns

Dictionary of chiralities keyed off atom index.
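
The returned dictionary can be filtered with ordinary dict operations. The sketch below works on a hand-written dictionary in the documented shape (the atom indices and labels are made up for illustration) and uses a hypothetical helper, true_stereocenters, to keep only the atoms labelled R or S.

```python
def true_stereocenters(chiralities):
    """Return the sorted atom indices whose label is "R" or "S"."""
    return sorted(idx for idx, label in chiralities.items()
                  if label in ("R", "S"))


# Hand-written stand-in for the dict returned by get_chiral_atoms(structure).
example = {2: "R", 5: "S", 9: "ANR", 14: "undef"}
stereocenters = true_stereocenters(example)
```

The same filter could select the ANR/ANS entries instead by changing the tuple of accepted labels.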

schrodinger.structutils.analyze.get_chiralities(structure)

Return a dictionary of chiral atoms, for which the key is the atom index and the value is a tuple of one of the following strings: “R”, “S”, “ANR”, “ANS”, “undef” and the list of CIP ranked neighbors.

ANR and ANS designate “chiralities” of non-chiral atoms that are important for determining the structure of the molecule (e.g. cis/trans rings).

Parameters

structure (Structure) – Chirality of atoms within this structure will be determined.

Return type

dict

Returns

Dictionary of chiralities keyed off atom index.

schrodinger.structutils.analyze.enforce_rdkit_smarts(smarts_func)
schrodinger.structutils.analyze.evaluate_smarts(structure, smarts_expression, verbose=False, first_match_only=False, unique_sets=False, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.evaluate_smarts instead.

schrodinger.structutils.analyze.validate_smarts(smarts, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.validate_smarts instead.

schrodinger.structutils.analyze.count_atoms_in_smarts(smarts, *, use_rdkit=False)

Return the number of atoms that the given SMARTS pattern has.

Parameters

smarts (str) – SMARTS pattern

Returns

Number of atoms in the pattern

Return type

int

Raises

ValueError if the pattern is invalid.

schrodinger.structutils.analyze.lazy_canvas_import()

Initialize _canvas and smiles variables. Because the canvas modules may take some time to import, the import is deferred until a function needs them rather than done at module load time.

schrodinger.structutils.analyze.validate_smarts_canvas(smarts, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.validate_smarts instead.

schrodinger.structutils.analyze.evaluate_smarts_canvas(structure, smarts, stereo='annotation_and_geom', start_index=1, uniqueFilter=True, allowRelativeStereo=False, rigorousValidationOfSource=False, hydrogensInterchangeable=False, multiple_smarts=False, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.evaluate_smarts instead.

schrodinger.structutils.analyze.evaluate_smarts_by_molecule(structure, smarts, timing_data=None, canvas=True, use_rdkit=False, matches_by_mol=False, molecule_numbers=None, **kwargs)

Takes a structure and a SMARTS pattern and returns a list of all matching atom indices, where each element in the list is a group of atoms that match the SMARTS pattern.

The advantage of this function over evaluate_smarts_canvas is that it does a SMARTS match for each molecule in a structure rather than over the entire structure at once. SMARTS evaluation scales as N^2 with the size of the structure searched, so doing many SMARTS evaluations over small molecules gives a significant speedup over one SMARTS evaluation over a composite structure.

The return value of this function is identical to the return value of the evaluate_smarts_canvas function (or the evaluate_smarts function if canvas=False), with the possible exception of the order of the matches. Do not use this function if the SMARTS match can span molecules. Invalid SMARTS patterns simply fail to match, and any empty matches are discarded.

Additional keyword arguments are passed to the SMARTS matching function.

Parameters
  • structure (structure.Structure) – the structure to search

  • smarts (str) – the SMARTS pattern to match

  • timing_data (dict or None) – If supplied this dict will be filled with timing data for the SMARTS finding. Data will be recorded for each molecule searched. Keys will be the number of atoms in a molecule, each value will be a list. Each item in the list will be the time in seconds it took to search a molecule with that many atoms.

  • canvas (bool) – If True, use Canvas SMARTS matching, if False, use mmpatty

  • use_rdkit (bool) – Whether to use RDKit. Cannot be used together with canvas.

  • matches_by_mol (bool) – if True then rather than returning a list of matches return a dictionary of matches key-ed by molecule number

  • molecule_numbers (set) – set of molecule numbers in the structure to be used instead of the entire structure

Return type

list or dict

Returns

For the list (if matches_by_mol is False) each value is a list of atom indices matching the SMARTS pattern, for the dict (if matches_by_mol is True) keys are molecule indices and values are lists of matches for that molecule
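
As a rough illustration of the N^2 scaling argument: if cost grows with the square of the number of atoms searched, per-molecule searching costs the sum of the squared molecule sizes, while a composite search costs the square of the total size. The function names and the unit-free cost model below are assumptions for illustration only, not timings of the real function.

```python
def composite_cost(molecule_sizes):
    """Model cost of one search over the whole structure: (sum of N)^2."""
    return sum(molecule_sizes) ** 2


def per_molecule_cost(molecule_sizes):
    """Model cost of searching each molecule separately: sum of N^2."""
    return sum(n * n for n in molecule_sizes)


# Hypothetical system: 100 molecules of 30 atoms each.
sizes = [30] * 100
speedup = composite_cost(sizes) / per_molecule_cost(sizes)
```

In this idealized model the speedup equals the number of equally sized molecules; real gains depend on the matching backend and on per-call overhead.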

schrodinger.structutils.analyze.evaluate_multiple_smarts(structure, smarts_list, verbose=False, first_match_only=False, unique_sets=False, keep_nested=False, *, use_rdkit=False)

Search for multiple SMARTS substructures in Structure structure.

Return a list of lists of ints. Each list of ints is a list of atom indices matching a SMARTS pattern. The multiple SMARTS patterns are combined into one list.

Parameters
  • structure (Structure) – Structure to search for matching substructures.

  • smarts_list (list) – List of SMARTS patterns to look for.

  • verbose (bool) – If True, print additional progress reports from the C implementation.

  • first_match_only (bool) – If False, return all matches for a given starting atom - e.g. [1, 2, 3, 4, 5, 6] and [1, 6, 5, 4, 3, 2] from atom 1 of benzene with the smarts_expression ‘c1ccccc1’. If True, return only the first match found for a given starting atom. Note that setting first_match_only to True does not affect matches with different starting atoms - i.e. benzene will still return six lists of ints for ‘c1ccccc1’, one for each starting atom. To match only once per set of atoms, use unique_sets=True. Note also that setting first_match_only to True does not guarantee that all matching atoms will be found.

  • unique_sets (bool) – If True, the returned list of matches will contain a single (arbitrary) match for any given set of atoms. If False, return the uniquely ordered matches, subject to the behavior specified by the first_match_only parameter.

Return type

list

Returns

Each value is a list of atom indices matching the SMARTS patterns.
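
The difference between first_match_only and unique_sets can be seen on the benzene example from the parameter description. The sketch below does not call the library; it builds the twelve possible traversals of a six-membered ring by hand and then filters them the way the two options are described. All function names are hypothetical.

```python
def ring_matches(ring):
    """All traversals of a ring: every starting atom, both directions."""
    n = len(ring)
    matches = []
    for start in range(n):
        forward = [ring[(start + k) % n] for k in range(n)]
        backward = [ring[(start - k) % n] for k in range(n)]
        matches.extend([forward, backward])
    return matches


def keep_first_per_start(matches):
    """Mimic first_match_only=True: one match per starting atom."""
    seen, kept = set(), []
    for match in matches:
        if match[0] not in seen:
            seen.add(match[0])
            kept.append(match)
    return kept


def keep_unique_sets(matches):
    """Mimic unique_sets=True: one match per distinct set of atoms."""
    seen, kept = set(), []
    for match in matches:
        key = frozenset(match)
        if key not in seen:
            seen.add(key)
            kept.append(match)
    return kept


benzene = ring_matches([1, 2, 3, 4, 5, 6])
```

For benzene this gives twelve matches in total, six after the per-start filter, and a single match after the unique-set filter.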

schrodinger.structutils.analyze.evaluate_substructure(st, subs_expression, first_match_only=False, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.evaluate_smarts instead.

schrodinger.structutils.analyze.generate_asl(st, atom_list)

Generate and return an atom expression for the atoms in Structure st which are listed in atom_list. The ASL expression will be as compact as possible using mol, res and atom expressions where appropriate.

Parameters
  • st (Structure) – Structure holding the ASL atoms.

  • atom_list (list) – List of indices of atoms for which ASL is desired.

Return type

str

Returns

ASL compactly describing the atoms in atom_list.

schrodinger.structutils.analyze.generate_residue_asl(residues)

Create an ASL representing the residues.

Inscode will only be included if at least one of the residues has a non-blank inscode.

Parameters

residues (collections.abc.Iterable(structure._Residue)) – Residue objects to create ASL

Return type

str

schrodinger.structutils.analyze.validate_asl(asl)

Validate the given ASL expression. This is useful for validating an ASL when a structure object is not available - for example when validating a command line option. NOTE: A warning is also printed to stdout if the ASL is not valid.

Parameters

asl (str) – ASL expression.

Returns

True if ASL is valid, False otherwise.

Return type

bool

schrodinger.structutils.analyze.evaluate_asl(st, asl_expr)

Search for substructures matching the ASL (Atom Specification Language) string asl_expr in Structure st.

Parameters
  • st (Structure) – Structure to search for matching substructures.

  • asl_expr (str) – ASL search string.

Return type

list

Returns

List containing indices of matching atoms.

Raises

schrodinger.infra.mm.MmException – If the ASL expression is invalid.

schrodinger.structutils.analyze.get_atoms_from_asl(st, asl_expr)

Return atoms matching the ASL string asl_expr in Structure st.

Parameters
  • st (Structure) – Structure to search for matching substructures.

  • asl_expr (str) – ASL search string.

Return type

generator

Returns

Generator of matching StructureAtom objects.

Raises

schrodinger.infra.mm.MmException – If the ASL expression is invalid.

schrodinger.structutils.analyze.hydrogens_present(st)

Return True if all hydrogens are present in Structure st, False otherwise.

Since all modern force fields require hydrogens, this is a good check to make sure that a structure is ready for force field calculations. This function is implemented by checking to see if the structure can be used as-is in a calculation with OPLS2003.

Warning

Requires atom types to be correct. Consider calling Structure.retype first.

Parameters

st (Structure) – Structure to be tested.

Return type

bool

Returns

Whether all hydrogens are present.

schrodinger.structutils.analyze.has_valid_lewis_structure(st: schrodinger.structure._structure.Structure) → bool

Check whether a valid Lewis structure for the structure is possible. Possible causes of an invalid Lewis structure may be invalid bond orders, charges, or missing hydrogens or other atoms.

This check may be useful before attempting to run any backend calculations on the given structure.

schrodinger.structutils.analyze.generate_tautomer_code(st, considerEZStereo=True, considerRSStereo=True, stereo='annotation_and_geom', strip=False)
Deprecated

schrodinger.structutils.analyze.create_chmmol_from_structure(structure, stereo='annotation_and_geom')

Creates a ChmMol object for a given structure and returns the ChmMol.

Deprecated; use schrodinger.adapter.to_rdkit() and other RDKit API instead. ChmMol is deprecated in favor of RDKit.

Parameters
Return type

schrodinger.application.canvas.base.ChmMol

Returns

The created ChmMol.

schrodinger.structutils.analyze.generate_smiles(st, unique=True, stereo='annotation_and_geom')
Deprecated

Use schrodinger.adapter.to_smiles instead.

schrodinger.structutils.analyze.generate_smarts(st, atom_subset=None, check_connectivity=True, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.to_smarts instead.

schrodinger.structutils.analyze.generate_smarts_canvas(st, atom_subset=None, check_connectivity=True, include_hydrogens=False, honor_maestro_prefs=False, include_stereo=False, *, use_rdkit=False)
Deprecated

Use schrodinger.adapter.to_smarts instead.

schrodinger.structutils.analyze.can_atom_hydrogen_bond(atom)

Returns True if the given atom can be involved in a hydrogen bond.

Parameters

atom (structure._StructureAtom) – Atom in question

Returns

Whether atom can H-bond

Return type

bool

schrodinger.structutils.analyze.generate_crystal_mates(st, radius=10.0, space_group=None, a=None, b=None, c=None, alpha=None, beta=None, gamma=None, group_radius=14.0)

Generate crystal mates for the input Structure st.

Return a list of structures that represent the crystal mates. (Note that the first item in the list represents the identity transformation and as such will be identical to the input structure.)

All crystal mates within radius of the input structure are generated.

The crystal parameters can be specified as parameters to this function or can be standard PDB properties of the input structure. If the structure was read from a PDB file then these crystal properties will usually be present.

The group_radius is used in the crystal mates calculation to determine whether a symmetric element is in contact with the ASU. There should be little reason to change the default value of 14.0.

Parameters
  • st (Structure) – Structure for which crystal mates will be generated.

  • radius (float) – Distance within which to generate crystal mates.

  • space_group (str) – Space group of the crystal. If None, uses st’s s_pdb_PDB_CRYST1_Space_Group.

  • a (float) – Crystal ‘a’ length. If None, uses st’s s_pdb_PDB_CRYST1_a.

  • b (float) – Crystal ‘b’ length. If None, uses st’s s_pdb_PDB_CRYST1_b.

  • c (float) – Crystal ‘c’ length. If None, uses st’s s_pdb_PDB_CRYST1_c.

  • alpha (float) – Crystal ‘alpha’ angle. If None, uses st’s s_pdb_PDB_CRYST1_alpha.

  • beta (float) – Crystal ‘beta’ angle. If None, uses st’s s_pdb_PDB_CRYST1_beta.

  • gamma (float) – Crystal ‘gamma’ angle. If None, uses st’s s_pdb_PDB_CRYST1_gamma.

  • group_radius (float) – Used to determine whether a symmetric element is in contact with the ASU. There should be little reason to change the default value of 14.0.

Return type

list

Returns

list of Structure objects that represent the crystal mates. (Note that the first item in the list represents the identity transformation and as such will be identical to the input structure.)

schrodinger.structutils.analyze.find_overlapping_atoms(st, ignore_hydrogens=False, ignore_waters=False, dist_threshold=0.8)

Search the specified structure for overlapping atoms. Returns a list of (atom1index, atom2index) tuples.

Parameters
  • st (Structure) – Structure to search for overlapping atoms

  • ignore_hydrogens – Whether to ignore hydrogens.

  • ignore_waters – Whether to ignore waters.

  • dist_threshold – Atoms are considered overlapping if their centers are within this distance of each other.

Return type

list

Returns

Each value is a tuple containing the indices of overlapping atoms.
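
The overlap test described above is a pairwise distance check. A minimal sketch on plain coordinates follows; the function name, the dict-of-coordinates input, and the treatment of a distance exactly equal to the threshold as overlapping are assumptions, and the ignore_hydrogens / ignore_waters filters are not modelled.

```python
import math
from itertools import combinations


def overlapping_pairs(coords, dist_threshold=0.8):
    """Return (index1, index2) pairs whose centers lie within the threshold.

    coords maps atom index -> (x, y, z) in Angstroms.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(coords.items()), 2):
        if math.dist(a, b) <= dist_threshold:
            pairs.append((i, j))
    return pairs


# Made-up coordinates: atoms 1 and 2 sit 0.5 A apart, atom 3 is far away.
toy = {1: (0.0, 0.0, 0.0), 2: (0.5, 0.0, 0.0), 3: (3.0, 0.0, 0.0)}
found = overlapping_pairs(toy)
```

This brute-force loop is quadratic in the number of atoms; it is meant to show the criterion, not an efficient search.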

schrodinger.structutils.analyze.generate_molecular_formula(st)

Return a string for the molecular formula of st in Hill notation. The structure must contain only one molecule.

Parameters

st (Structure) – Find the molecular formula for this structure. Must contain only one molecule.

Return type

str

Returns

The molecular formula for st.
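
For readers unfamiliar with Hill notation: when carbon is present, carbon comes first, hydrogen second, and the remaining elements follow alphabetically; without carbon, all elements (hydrogen included) are alphabetical. The sketch below applies that ordering to a plain list of element symbols. The helper name is hypothetical, and the exact string format produced by generate_molecular_formula (spacing, for example) may differ.

```python
from collections import Counter


def hill_formula(elements):
    """Build a Hill-ordered formula from a list of element symbols."""
    counts = Counter(elements)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in order)


ethanol = hill_formula(["C", "C", "O"] + ["H"] * 6)
```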

schrodinger.structutils.analyze.is_bond_rotatable(bond, *, allow_methyl=False)

Return True if specified bond is rotatable, False otherwise.

A bond is considered rotatable if all of the following are true…

  1. It is a single bond.

  2. It is not adjacent to a triple bond.

  3. It is not in a ring.

  4. Neither atom is a hydrogen or other terminal atom.

  5. Neither atom is a carbon or nitrogen with three hydrogens attached.

Parameters
  • bond (structure.StructureBond) – bond to test for rotatability.

  • rings (list) – List of ring atom index lists. As an optimization, provide the (sorted) rings list from the find_rings function if you already have it. Otherwise, an SSSR calculation will be done.

  • allow_methyl (bool) – allow -CH3 and -NH3 as rotatable bonds, if True

Return type

bool

Returns

Is the bond rotatable?
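
The five rules listed above can be sketched on a toy bond table. This is not the library's implementation: the data model (a dict of frozenset atom pairs to bond orders plus a dict of element symbols), the reading of "adjacent to a triple bond" as "either atom takes part in a triple bond", and all function names are assumptions for illustration.

```python
def neighbors(bonds, atom):
    """Atoms bonded to `atom`; `bonds` maps frozenset({i, j}) -> bond order."""
    return [next(iter(pair - {atom})) for pair in bonds if atom in pair]


def in_ring(bonds, a, b):
    """True if b is still reachable from a when the a-b bond is ignored."""
    stack, seen = [a], {a}
    while stack:
        current = stack.pop()
        for nbr in neighbors(bonds, current):
            if {current, nbr} == {a, b}:
                continue  # do not walk across the bond being tested
            if nbr == b:
                return True
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return False


def is_rotatable(bonds, elements, a, b, allow_methyl=False):
    """Apply rules 1-5 to the bond between atoms a and b."""
    if bonds.get(frozenset((a, b))) != 1:  # rule 1: single bond
        return False
    for atom in (a, b):
        nbrs = neighbors(bonds, atom)
        if any(bonds[frozenset((atom, n))] == 3 for n in nbrs):  # rule 2
            return False
        if len(nbrs) < 2:  # rule 4: hydrogen or other terminal atom
            return False
        n_hydrogens = sum(elements[n] == "H" for n in nbrs)
        if (not allow_methyl and elements[atom] in ("C", "N")
                and n_hydrogens == 3):  # rule 5: -CH3 / -NH3
            return False
    return not in_ring(bonds, a, b)  # rule 3: not in a ring


def butane():
    """Toy butane: carbons 1-4 in a chain, hydrogens 5-14."""
    elements = {i: "C" for i in range(1, 5)}
    elements.update({i: "H" for i in range(5, 15)})
    pairs = [(1, 2), (2, 3), (3, 4),
             (1, 5), (1, 6), (1, 7), (2, 8), (2, 9),
             (3, 10), (3, 11), (4, 12), (4, 13), (4, 14)]
    return {frozenset(p): 1 for p in pairs}, elements
```

On the toy butane, only the central C2-C3 bond passes all five rules by default; the two terminal C-C bonds pass as well when allow_methyl is True.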

schrodinger.structutils.analyze.rotatable_bonds_iterator(st, *, max_size=None, allow_methyl=False)

Return an iterator for rotatable bonds (atomnum1, atomnum2) in the structure.

See the is_bond_rotatable function description for which bonds are considered rotatable.

Parameters
  • st (Structure) – The structure to search for rotatable bonds.

  • rings (list) – List of ring atom index lists. As an optimization, provide the (sorted) rings list from the find_rings function if you already have it. Otherwise, an SSSR calculation will be done.

  • max_size (int) – If specified, yield only rings that have up to this number of bonds. Use this option to exclude large rings; e.g. those in macrocycle-like molecules.

  • allow_methyl (bool) – allow -CH3 and -NH3 as rotatable bonds, if True

Return type

iterator of tuples

Returns

yields tuples of atom index pairs describing a rotatable bond in st.

schrodinger.structutils.analyze.get_num_rotatable_bonds(st, *, max_size=None)

Return the number of rotatable bonds in the Structure st. The count does not include trivial rotors such as terminal methyls, or rotors within rings.

Parameters
  • st (Structure) – The structure to search for rotatable bonds.

  • rings (list of lists of ints) – List of ring atom index lists. As an optimization, provide the (sorted) rings list from the find_rings function if you already have it. Otherwise, an SSSR calculation will be done.

  • max_size (int) – If specified, yield only rings that have up to this number of bonds. Use this option to exclude large rings; e.g. those in macrocycle-like molecules.

Return type

int

Returns

The number of rotatable bonds in st.

schrodinger.structutils.analyze.hbond_iterator(st, atoms=None)

Iterate over hydrogen bonds between the atoms specified by atoms and the other atoms in st. Each yielded item is a tuple of (atom-index-1, atom-index-2).

NOTE: This function has been updated to simply act as a wrapper to hbond.get_hydrogen_bonds to ensure that hbonds are determined consistently.

Parameters
  • st (Structure) – The structure to search for H-bonds.

  • atoms (list of int or None) – A list of atom indices (or _StructureAtom objects) to analyze. If not specified, then all H-bonds present in the structure are returned.

Return type

list of (_StructureAtom, _StructureAtom)

Returns

list of (donor atom object, acceptor atom object) for each hydrogen bond identified.

schrodinger.structutils.analyze.find_equivalent_atoms(st, span_molecules=True)

Find atoms in the structure that are equivalent. For example, all three hydrogens on a methyl group are equivalent.

Returns a list, each value of which is a list of atoms that are equivalent.

Parameters
  • st (Structure) – The structure to search for equivalent atoms.

  • span_molecules (bool) – If True, don’t consider molecules to be separate entities. If False, constructs global equivalence classes for all atoms in a ct, but will never return an equivalence class across molecules.

Return type

list

Returns

Each value is a list of indices of equivalent atoms.

schrodinger.structutils.analyze.get_approximate_sasa(st, atom_indexes=None, cutoff=7.0)
Deprecated

This function only returns a rough approximation to the solvent accessible surface area. Please use the calculate_sasa function instead.

schrodinger.structutils.analyze.get_approximate_atomic_sasa(st, iat, cutoff=7.0, sasa_probe_radius=1.4, hard_sphere_s=2.5, scale_factor=2.32)
Deprecated

Deprecated in favor of calculate_sasa, which is more accurate.

schrodinger.structutils.analyze.calculate_sasa_by_atom(st, atoms=None, cutoff=8.0, probe_radius=1.4, resolution=0.2, exclude_water=False)

Calculate the solvent-accessible surface area (SASA) for the whole structure, or an atom subset, and return a list of floats.

Parameters
  • st (Structure) – Structure for which SASA is desired.

  • atoms (list) – List of atom indices or of _StructureAtom objects for the atoms to count. If None, calculates SASA for all atoms. (default: None)

  • cutoff (float) – Atoms within this distance of atoms will be considered for occlusion. Requires atoms to be specified. (default: 8.0A)

  • probe_radius (float) – Probe radius, in units of angstroms, of the solvent molecule. (default: 1.4A)

  • resolution (float) – Resolution to use. Decreasing this number will yield better results, increasing it will speed up the calculation. NOTE: This is NOT the same option as Maestro’s surface resolution, which uses a different algorithm to calculate the surface area. (default: 0.2)

  • exclude_water (bool) – If set to True then explicitly exclude waters in the method. This option only works when the ‘atoms’ argument is passed. (default: False)

Return type

list

Returns

A list of solvent accessible surface area of selected atoms in st.

schrodinger.structutils.analyze.calculate_sasa_by_residue(st, atoms=None, cutoff=8.0, probe_radius=1.4, resolution=0.2, exclude_water=False)

Calculate the solvent-accessible surface area (SASA) for the whole structure, or an atom subset, and then group them by residue.

Parameters
  • st (Structure) – Structure for which SASA is desired.

  • atoms (list) – List of atom indices or of _StructureAtom objects for the atoms to count. If None, calculates SASA for all atoms. (default: None)

  • cutoff (float) – Atoms within this distance of atoms will be considered for occlusion. Requires atoms to be specified. (default: 8.0A)

  • probe_radius (float) – Probe radius, in units of angstroms, of the solvent molecule. (default: 1.4A)

  • resolution (float) – Resolution to use. Decreasing this number will yield better results, increasing it will speed up the calculation. NOTE: This is NOT the same option as Maestro’s surface resolution, which uses a different algorithm to calculate the surface area. (default: 0.2)

  • exclude_water (bool) – If set to True then explicitly exclude waters in the method. This option only works when the ‘atoms’ argument is passed. (default: False)

Return type

list

Returns

a list of solvent accessible surface area of residues (ordered by connectivity) within st.
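
Conceptually, the per-residue values are the per-atom values summed within each residue. The sketch below shows that grouping on plain lists; the helper name, the residue labels, and the numbers are made up, and it returns a dict for readability rather than the connectivity-ordered list the real function returns.

```python
from collections import defaultdict


def sum_by_residue(atom_sasa, atom_residue):
    """Sum per-atom SASA values into per-residue totals.

    atom_sasa and atom_residue are parallel lists: one area and one
    residue label per atom.
    """
    totals = defaultdict(float)
    for area, residue in zip(atom_sasa, atom_residue):
        totals[residue] += area
    return dict(totals)


# Made-up per-atom areas (in square Angstroms) for three atoms.
per_residue = sum_by_residue([1.0, 2.0, 3.5], ["A:1", "A:1", "A:2"])
```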

schrodinger.structutils.analyze.calculate_sasa(st, atoms=None, cutoff=8.0, probe_radius=1.4, resolution=0.2, exclude_water=False, exclude_atoms=None)

Calculate the solvent-accessible surface area (SASA) for the whole structure, or an atom subset.

Parameters
  • st (Structure) – Structure for which SASA is desired.

  • atoms (list) – List of atom indices or of _StructureAtom objects for the atoms to count. If None, calculates SASA for all atoms. (default: None)

  • cutoff (float) – Atoms within this distance of atoms will be considered for occlusion. Requires atoms to be specified. (default: 8.0A)

  • probe_radius (float) – Probe radius, in units of angstroms, of the solvent molecule. (default: 1.4A)

  • resolution (float) – Resolution to use. Decreasing this number will yield better results, increasing it will speed up the calculation. NOTE: This is NOT the same option as Maestro’s surface resolution, which uses a different algorithm to calculate the surface area. (default: 0.2)

  • exclude_water (bool) – If set to True then explicitly exclude waters in the method. This option only works when the ‘atoms’ argument is passed. (default: False)

  • exclude_atoms (list) – Indices of atoms to exclude from the SASA calculation. Useful for FEP-type systems where a second ‘image’ of a molecule is present.

Return type

float

Returns

The solvent accessible surface area of the selected atoms within st.

schrodinger.structutils.analyze.calc_buried_sasa_by_residue(st, group1_atoms, group2_atoms, resolution=0.5)

Calculate the buried SASA ratio (delta SASA upon binding) for each residue, which measures how much of the residue’s surface is interacting with the other binding partner. Value of 1.0 means that all of residue’s area, as calculated in a subunit, is no longer accessible to solvent when in complex with the other group. Value of 0.0 means that all of that residue is accessible as a complex as well (residue is not on the interaction surface). Value of 0.0 is also used for residues that are fully buried in their subunits.

Parameters
  • st (structure.Structure) – Structure object

  • group1_atoms (list of ints) – Atoms of the first binding partners group.

  • group2_atoms (list of ints) – Atoms of the second binding partners group.

  • resolution (float) – Resolution to use. See calculate_sasa_by_atom().

Returns

Dictionary where keys are residue strings (e.g. “A:123”), and values are the buried SASA ratio for that residue (0.0-1.0).

Return type

dict
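
One formula consistent with the description above is ratio = (SASA in subunit - SASA in complex) / SASA in subunit, with 0.0 returned when the residue has no exposed area in its own subunit. This formula and the helper name are assumptions inferred from the text, not taken from the library source.

```python
def buried_ratio(sasa_in_subunit, sasa_in_complex):
    """Fraction of a residue's subunit SASA that is lost in the complex."""
    if sasa_in_subunit <= 0.0:
        # Fully buried within its own subunit: reported as 0.0.
        return 0.0
    return (sasa_in_subunit - sasa_in_complex) / sasa_in_subunit


# Made-up areas: 40 A^2 exposed alone, 10 A^2 still exposed in the complex.
example_ratio = buried_ratio(40.0, 10.0)
```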

schrodinger.structutils.analyze.find_ligands(st, **kwargs) → list

Simple function interface for AslLigandSearcher class.

Parameters

st (Structure) – Structure to search.

Return type

list

Returns

a list of Ligand instances for putative ligands within st.

schrodinger.structutils.analyze.center_of_mass(st, atom_indices: Optional[list] = None)

Gets the structure’s center of mass. If specified, this can be limited to a subset of atoms.

NOTE: Periodic boundary conditions (PBC) are NOT honored.

Parameters
Returns

centroid given as 3-element array [x, y, z]

Return type

numpy.array(float)

See schrodinger/geometry/centroid.h
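
For reference, the center of mass is the mass-weighted average of the atomic positions. A minimal stdlib sketch on plain lists follows; the helper name and inputs are illustrative, and it returns a Python list rather than a numpy array.

```python
def center_of_mass_sketch(masses, coords):
    """Mass-weighted mean position, returned as [x, y, z]."""
    total = sum(masses)
    return [sum(m * xyz[k] for m, xyz in zip(masses, coords)) / total
            for k in range(3)]


# Two made-up point masses on the x axis.
com = center_of_mass_sketch([1.0, 3.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
```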

schrodinger.structutils.analyze.radius_of_gyration(st: structure.Structure, atom_indices: Optional[List[int]] = None, mass_weighted: bool = False) → float

Calculate the radius of gyration (R_gyr or Rg), a measure of the size of an object of arbitrary shape.

NOTE: Periodic boundary conditions (PBC) are NOT honored.

Returns

float value in Angstrom units
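
The usual definition is Rg = sqrt(sum_i w_i |r_i - c|^2 / sum_i w_i), where c is the weighted center and w_i is either the atomic mass (mass-weighted) or 1. The sketch below implements that textbook formula on plain lists; whether the library uses exactly this form is an assumption, and the helper name is hypothetical.

```python
import math


def radius_of_gyration_sketch(coords, masses=None):
    """Radius of gyration; pass masses for the mass-weighted variant."""
    weights = masses if masses is not None else [1.0] * len(coords)
    total = sum(weights)
    center = [sum(w * r[k] for w, r in zip(weights, coords)) / total
              for k in range(3)]
    mean_sq = sum(w * sum((r[k] - center[k]) ** 2 for k in range(3))
                  for w, r in zip(weights, coords)) / total
    return math.sqrt(mean_sq)


# Two made-up points 2 A apart, unweighted.
rg = radius_of_gyration_sketch([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```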

schrodinger.structutils.analyze.calculate_principal_moments(struct=None, atoms=None, massless=False)

Calculate the principal moments of inertia for a list of atoms. This is calculated with respect to the x, y, and z coordinates of the atoms’ center of mass.

Parameters
  • struct (Structure) – If given the moments will be calculated for the entire structure. This overrides any atoms given with the atoms keyword. Either atoms or struct must be given.

  • atoms (list) – list of schrodinger.structure._StructureAtom objects. Atom objects to compute the tensor for. Either atoms or struct must be given.

  • massless (bool) – True if the calculations should be independent of the atomic masses (all mass=1), False (default) if atomic mass should be used.

Return type

tuple

Returns

A tuple of (eigenvalues, eigenvectors) of the inertial tensor. The eigenvalues are the principal moments of inertia, given as a list of 3 floats. The eigenvectors are a list of lists, where each inner list contains 3 floats.
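
The principal moments are the eigenvalues of the inertia tensor taken about the center of mass. The stdlib-only sketch below builds that 3x3 tensor from plain coordinates; the eigen-decomposition step (for example with numpy.linalg.eigh) is left out so the example has no third-party dependency. The helper name and inputs are illustrative, and passing masses=None stands in for the massless option.

```python
def inertia_tensor(coords, masses=None):
    """3x3 inertia tensor about the (weighted) center of the points."""
    weights = masses if masses is not None else [1.0] * len(coords)
    total = sum(weights)
    center = [sum(w * r[k] for w, r in zip(weights, coords)) / total
              for k in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for w, r in zip(weights, coords):
        d = [r[k] - center[k] for k in range(3)]
        r2 = sum(x * x for x in d)
        for a in range(3):
            for b in range(3):
                # I_ab = sum_i w_i * (|d_i|^2 * delta_ab - d_ia * d_ib)
                tensor[a][b] += w * ((r2 if a == b else 0.0) - d[a] * d[b])
    return tensor


# Two unit-weight points on the x axis: the tensor is already diagonal.
linear = inertia_tensor([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

For this axis-aligned toy system the tensor is diagonal, so its diagonal entries are the principal moments: zero about the molecular axis and equal values about the two perpendicular axes.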

schrodinger.structutils.analyze.get_largest_moment_normalized_vector(**kwargs)

Return the normalized eigenvector of the largest moment of inertia. This will be the vector normal to the plane of the largest moment. See calculate_principal_moments for parameters.

Return type

numpy.array

Returns

The normalized vector for the largest moment of inertia

schrodinger.structutils.analyze.find_shortest_bond_path(struct, index1, index2, atom_ids=None)

Find the shortest path of bonded atoms that connects atom1 to atom2.

The conversion of this routine to use networkx rather than scipy resulted in a dramatic reduction in both time and memory usage.

Parameters
  • struct (schrodinger.structure.Structure) – The structure containing the atoms index1 and index2

  • index1 (int) – The index of the first atom in the path

  • index2 (int) – The index of the second atom in the path

  • atom_ids (list of int) – the atom_ids to search the path from

Return type

list

Returns

A list of indexes of atoms that connect atom index1 to atom index2 along the shortest bond path. Index1 will be the first item in the list and index2 will be the last. The second item in the list will be bonded to index1, the third will be bonded to the second, etc. If index1 == index2, a single item list is returned: [index1]

Raises
  • ValueError – if index1 and index2 are not part of the same molecule

  • MemoryError – if the system is too large
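
The documented behaviour (endpoints included, a single-item list when index1 == index2, ValueError when the atoms are not connected) can be reproduced with a plain breadth-first search over an adjacency dict. This is a stdlib sketch, not the networkx-based routine the function uses; the helper name and the adjacency format are assumptions.

```python
from collections import deque


def shortest_bond_path(adjacency, index1, index2):
    """Breadth-first search for the shortest bonded path between two atoms.

    adjacency maps atom index -> iterable of bonded atom indices.
    """
    if index1 == index2:
        return [index1]
    previous = {index1: None}
    queue = deque([index1])
    while queue:
        current = queue.popleft()
        for nbr in adjacency.get(current, ()):
            if nbr in previous:
                continue
            previous[nbr] = current
            if nbr == index2:
                # Walk back from the target to the start, then reverse.
                path = [nbr]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nbr)
    raise ValueError("atoms are not part of the same molecule")


# Made-up connectivity: a four-atom chain plus a separate two-atom molecule.
toy_bonds = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
chain_path = shortest_bond_path(toy_bonds, 1, 4)
```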

schrodinger.structutils.analyze.create_nx_graph(struct, atoms=None)

Generate a networkx undirected graph of the structure based on bonds

Parameters
Return type

networkx.Graph

Returns

An undirected graph of the structure with edges in place of each bond. Edges are identical regardless of the bond order of the bond.

schrodinger.structutils.analyze.improper_dihedral_iterator(struct=None, atoms=None, nx_graph=None, include_proper_improper=True, include_proper=True)

An iterator over all the improper dihedral angles in a structure or group of atoms.

Parameters
  • struct (schrodinger.structure.Structure) – The structure to find improper dihedrals in. Either struct or nx_graph must be given.

  • atoms (iterable) – Optionally, improper dihedrals will be restricted to this group of atoms - items are schrodinger.structure._StructureAtom objects or atom indexes

  • nx_graph (networkx.Graph) – A networkx graph of the structure with edges representing bonds. If not supplied one will be generated. If nx_graph is supplied, struct and atoms are ignored.

  • include_proper_improper (bool) – whether to include improper dihedrals that define the same degree of freedom as a proper dihedral obtained from torsion_iterator. For example, in the diagrams below the neighboring atoms are not bound, so the defined impropers offer new degrees of freedom. If they were bound (suppose (R’’,12) was bound to (R’,26) in the first diagram), the degree of freedom defined by the improper (7, 37, 12, 26) would be redundant with that defined by the proper (7, 37, 12, 26) obtained from torsion_iterator. This boolean controls whether such impropers can be returned.

  • include_proper (bool) – whether to include proper dihedrals obtained from torsion_iterator that define a new degree of freedom not defined by any improper dihedral. For example, in the diagram below, suppose (R’’,12) was bound to (R’,26) and the bond between (X,37) and (R’,26) did not exist. This boolean controls whether such propers can be returned as impropers.

Return type

tuple

Returns

Each iteration yields a 4-integer tuple of atom indexes for an improper dihedral. For each given quadruple all unique topologies are enumerated. For example, for a standard quadruple ordering (i,j,k,l) all 6 of the following topologies are returned:

  1. (i,j,k,l)

  2. (j,i,k,l) # switch 1,2

  3. (i,j,l,k) # switch 3,4

  4. (j,i,l,k) # switch 1,2 and 3,4

  5. (k,j,i,l) # switch 1,3

  6. (i,l,k,j) # switch 2,4

The standard quadruple ordering (i,j,k,l) considers the indices of the first three atoms as the central atom plus the two lowest index bonding atoms in bond-path ordering as lowest bonding atom, central atom, other bonding atom. The last index in the quadruple is the remaining atom which is not bonded to the last atom in the previously mentioned triple but is bonded to the central atom.

If calling with include_proper then those quadruples do not have their topologies enumerated and the ordering will be such that the index of the first atom in the tuple will be smaller than the index of the last atom in the tuple.

For example:

(R,7)
  \
   (X,37)-(R',26)
  /
(R'',12)
  1. (7, 37, 12, 26) (standard)

  2. (37, 7, 12, 26)

  3. (7, 37, 26, 12)

  4. (37, 7, 26, 12)

  5. (12, 37, 7, 26)

  6. (7, 26, 12, 37)

or:

(R,34)
  \
   (X,1)-(R',4)
  /
(R'',78)

(1) (4, 1, 34, 78) (standard) etc.
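
The six topologies are fixed permutations of the standard quadruple, so they can be written down directly. The helper below is a hypothetical illustration of that enumeration only; it does not find the standard quadruple in a structure.

```python
def improper_topologies(i, j, k, l):
    """Enumerate the six orderings listed above for a standard (i, j, k, l)."""
    return [
        (i, j, k, l),  # standard
        (j, i, k, l),  # switch 1,2
        (i, j, l, k),  # switch 3,4
        (j, i, l, k),  # switch 1,2 and 3,4
        (k, j, i, l),  # switch 1,3
        (i, l, k, j),  # switch 2,4
    ]


first_example = improper_topologies(7, 37, 12, 26)
```

Applied to (7, 37, 12, 26), this reproduces the six tuples listed for the first diagram.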

schrodinger.structutils.analyze.torsion_iterator(struct=None, atoms=None, nx_graph=None)

An iterator over all the bonded torsions in a structure or group of atoms.

Parameters
  • struct (schrodinger.structure.Structure) – The structure to find torsions in. Either struct or nx_graph must be given.

  • atoms (iterable) – Optionally, torsions will be restricted to this group of atoms - items are schrodinger.structure._StructureAtom objects or atom indexes

  • nx_graph (networkx.Graph) – A networkx graph of the structure with edges representing bonds. If not supplied one will be generated. If nx_graph is supplied, struct and atoms are ignored.

Return type

tuple

Returns

Each iteration yields a 4-integer tuple of atom indexes for a dihedral formed by bonded atoms. The index of the first atom in the tuple will be smaller than the index of the last atom in the tuple.
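
A bonded torsion is a chain i-j-k-l of three consecutive bonds. The sketch below enumerates such chains from a plain adjacency dict and applies the documented ordering rule (first index smaller than last). It is a stdlib illustration with hypothetical names, not the library's networkx-based code, and the order in which tuples are produced may differ.

```python
def torsions(adjacency):
    """Return sorted 4-tuples (i, j, k, l) for every bonded torsion.

    adjacency maps atom index -> iterable of bonded atom indices.
    """
    found = []
    for j in sorted(adjacency):
        for k in adjacency[j]:
            if k <= j:
                continue  # visit each central bond j-k only once
            for i in adjacency[j]:
                if i == k:
                    continue
                for l in adjacency[k]:
                    if l == j or l == i:
                        continue
                    # Orient the tuple so the first index is the smaller end.
                    found.append((i, j, k, l) if i < l else (l, k, j, i))
    return sorted(found)


# Made-up connectivity: chain 1-2-3-4 with an extra atom 5 bonded to atom 2.
branched = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
branched_torsions = torsions(branched)
```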

schrodinger.structutils.analyze.angle_iterator(struct=None, atoms=None, nx_graph=None)

An iterator over all the bonded angles in a structure or group of atoms.

Parameters
  • struct (schrodinger.structure.Structure) – The structure to find angles in. Either struct or nx_graph must be given.

  • atoms (iterable) – Optionally, angles will be restricted to this group of atoms - items are schrodinger.structure._StructureAtom objects or atom indexes

  • nx_graph (networkx.Graph) – A networkx graph of the structure with edges representing bonds. If not supplied, one will be generated. If nx_graph is supplied, struct and atoms are ignored.

Return type

tuple

Returns

Each iteration yields a 3-integer tuple of atom indexes for an angle formed by bonded atoms. The index of the first atom in the tuple will be smaller than the index of the last atom in the tuple.
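The same idea applies to angles. The sketch below is illustrative only: iter_angles is a hypothetical helper and the adjacency dict is an assumed stand-in for the networkx graph. Every pair of neighbors of a central atom forms one angle.

```python
from itertools import combinations


def iter_angles(adjacency):
    """Yield (first, center, last) tuples with first < last.

    adjacency is an assumed dict of atom index -> set of bonded atom indexes.
    """
    for center in sorted(adjacency):
        # combinations() over sorted neighbors guarantees first < last.
        for first, last in combinations(sorted(adjacency[center]), 2):
            yield (first, center, last)


# Central atom 1 bonded to atoms 2, 3 and 4.
trigonal = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
angles = list(iter_angles(trigonal))
print(angles)
```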

schrodinger.structutils.analyze.bond_iterator(struct=None, atoms=None, nx_graph=None)

An iterator over all the bonds in a structure or group of atoms.

Note: It may seem unnecessary to have this function as one must iterate over bonds to form the nx_graph that is then iterated over in this function. However, it may be the case that an nx_graph has already been created for other reasons such as when iterating over all bonds, angles and torsions.

Parameters
  • struct (schrodinger.structure.Structure) – The structure to find bonds in. Either struct or nx_graph must be given.

  • atoms (iterable) – Optionally, bonds will be restricted to this group of atoms - items are schrodinger.structure._StructureAtom objects or atom indexes

  • nx_graph (networkx.Graph) – A networkx graph of the structure with edges representing bonds. If not supplied, one will be generated. If nx_graph is supplied, struct and atoms are ignored.

Return type

tuple

Returns

Each iteration yields a 2-integer tuple of atom indexes for a bond formed by bonded atoms. The index of the first atom in the tuple will be smaller than the index of the last atom in the tuple.
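As a rough illustration of restricting the iteration to a group of atoms, the sketch below filters bonds so that both ends lie in the requested subset. iter_bonds and the adjacency dict are assumptions made for this example; how the real function treats bonds that leave the subset is not stated in the text.

```python
def iter_bonds(adjacency, atoms=None):
    """Yield (first, second) bonded pairs with first < second.

    adjacency is an assumed dict of atom index -> set of bonded atom indexes.
    If atoms is given, only bonds with both ends in atoms are yielded
    (an assumption of this sketch).
    """
    allowed = set(adjacency) if atoms is None else set(atoms)
    for first in sorted(allowed):
        for second in sorted(adjacency.get(first, ())):
            if first < second and second in allowed:
                yield (first, second)


chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
all_bonds = list(iter_bonds(chain))
subset_bonds = list(iter_bonds(chain, atoms=[2, 3, 4]))
print(all_bonds, subset_bonds)
```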

schrodinger.structutils.analyze.get_average_structure(sts)

Calculate the average structure between the given conformers.

Parameters

sts (Iterable of structure.Structure objects) – Structures to average

Return type

structure.Structure

Returns

Average structure
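One plausible reading of "average structure" is the per-atom arithmetic mean of coordinates over conformers that share the same atom ordering. The sketch below works under that assumption only; the text does not say whether the real function superimposes the conformers first. average_coordinates is a hypothetical helper operating on plain coordinate lists rather than Structure objects.

```python
def average_coordinates(conformers):
    """Return the per-atom mean (x, y, z) over a list of conformers.

    conformers: list of conformers, each a list of (x, y, z) tuples,
    all with the same atom ordering (an assumption of this sketch).
    """
    conformers = list(conformers)
    if not conformers:
        raise ValueError("at least one conformer is required")
    n_atoms = len(conformers[0])
    if any(len(conf) != n_atoms for conf in conformers):
        raise ValueError("all conformers must have the same number of atoms")
    count = len(conformers)
    return [
        tuple(sum(conf[atom][axis] for conf in conformers) / count
              for axis in range(3))
        for atom in range(n_atoms)
    ]


conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.0, 2.0, 0.0), (3.0, 0.0, 0.0)]
averaged = average_coordinates([conf_a, conf_b])
print(averaged)
```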

schrodinger.structutils.analyze.find_common_substructure(sts, atomTyping=11, allow_broken_rings=True)

Find the maximum substructure that is common to all specified CTs. If any of the structures matches the substructure SMARTS more than once, all matches are reported, which is why the output is a “triple” list: the outer list represents the input structures, the middle list represents the matches, and the inner list holds the atom indices for each match. It is up to the calling code to decide which of the multiple matches to use (one method is to use the one whose center of mass is closest to the COM of the whole ligand).

NOTE: This function becomes exponentially slower as the number of structures grows. The recommended maximum is around 30 structures.

NOTE: This function checks CANVAS_SHARED exists and checks out CANVAS_FULL

Parameters
  • sts (Iterable of structure.Structure objects) – Structures to search for a common substructure

  • atomTyping (int) – Atom typing scheme to use. For list of available schemes, see $SCHRODINGER/utilities/canvasMCS -h

  • allow_broken_rings (bool) – Whether to allow partial mapping of rings

Return type

List of list of list of ints

Returns

Substructure atoms from each structure. Outer list represents input structures - in order of input; middle list represents matches, inner list represents atom indices for that match.
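The match-selection heuristic mentioned in the description (prefer the match closest to the center of the whole ligand) could be sketched as follows. This is a simplified illustration: it uses the unweighted geometric centroid instead of a mass-weighted center of mass, and both helper names and the coordinate dicts are hypothetical stand-ins for data read from each structure.

```python
import math


def centroid(coords, atom_indices):
    """Unweighted centroid of the given atoms (simplification of a true COM)."""
    points = [coords[index] for index in atom_indices]
    return tuple(sum(p[axis] for p in points) / len(points) for axis in range(3))


def pick_match_closest_to_center(coords, matches):
    """Choose one match (list of atom indices) for a single structure.

    coords: assumed dict of atom index -> (x, y, z) for that structure.
    matches: the middle-level list returned for that structure.
    """
    center = centroid(coords, list(coords))
    return min(matches, key=lambda match: math.dist(centroid(coords, match), center))


# Two structures; the first has two matches, the second has one.
coords_a = {1: (0, 0, 0), 2: (1, 0, 0), 3: (2, 0, 0), 4: (3, 0, 0), 5: (4, 0, 0)}
coords_b = {1: (0, 0, 0), 2: (0, 1, 0), 3: (0, 2, 0)}
all_matches = [[[1, 2], [2, 3, 4]], [[1, 2, 3]]]  # outer / middle / inner lists

chosen = [
    pick_match_closest_to_center(coords, matches)
    for coords, matches in zip([coords_a, coords_b], all_matches)
]
print(chosen)
```

The outer loop walks the input structures in order, which mirrors the documented layout of the returned list.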

schrodinger.structutils.analyze.group_by_connectivity(st, atoms)

Groups the atoms by molecule connectivity. Returns a list of atom groups. Each group is a list of atoms that are in the same “molecule” - that are bonded to each other, counting only atoms in specified list. If multiple atoms are in the same molecule, but are separated by atoms that are not in the list (e.g. 2 covalent ligands bound to same protein), they will be grouped separately.

Parameters
  • st (structure.Structure) – Structure that atoms are from.

  • atoms (list of ints) – List of atom indices that are to be grouped.
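The grouping rule can be illustrated with a small graph traversal that only follows bonds between listed atoms. The sketch is an assumption-laden stand-in, not the module's code: group_connected is a hypothetical name and the adjacency dict replaces the Structure's bond information.

```python
def group_connected(adjacency, atoms):
    """Group atoms into connected sets, following only bonds between listed atoms.

    adjacency is an assumed dict of atom index -> set of bonded atom indexes.
    """
    allowed = set(atoms)
    seen = set()
    groups = []
    for start in atoms:
        if start in seen:
            continue
        seen.add(start)
        group = []
        stack = [start]
        while stack:
            atom = stack.pop()
            group.append(atom)
            for neighbor in adjacency.get(atom, ()):
                # Atoms outside the list act as separators between groups.
                if neighbor in allowed and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


# Chain 1-2-3-4-5; atom 3 is not in the list, so it splits the selection.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
groups = group_connected(chain, [1, 2, 4, 5])
print(groups)
```

Here atoms 1-2 and 4-5 belong to the same molecule but end up in separate groups, matching the covalent-ligand case described above.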

schrodinger.structutils.analyze.find_common_properties(sts: Iterable[schrodinger.structure._structure.Structure]) → Set[str]

Return a set of property names that are common to all selected structures.

Parameters

sts (Iterable of structure.Structure objects) – Structures to analyze

Returns

set of property data names

Return type

set of str
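Conceptually this is a set intersection over the property names of each structure. A minimal sketch, using plain dicts in place of each structure's property mapping (the helper name and the example property names are illustrative assumptions):

```python
def common_property_names(property_dicts):
    """Return the property names present in every dict.

    property_dicts: iterable of dict-like objects standing in for the
    property mapping of each structure (an assumption of this sketch).
    """
    common = None
    for props in property_dicts:
        names = set(props)
        common = names if common is None else common & names
    return common or set()


props_a = {"s_m_title": "ligand A", "r_user_energy": -12.3}
props_b = {"s_m_title": "ligand B", "i_user_count": 2}
shared = common_property_names([props_a, props_b])
print(shared)
```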

schrodinger.structutils.analyze.read_seqres_from_ct(st: schrodinger.structure._structure.Structure)

Read the SEQRES data from a ct and return it as a pair of lists of the same size. The first list holds the chain names and the second the sequences, e.g. ['A'] and ['ALA ALA ALA '].

Parameters

st (schrodinger.structure.Structure) – Input ct to process
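Since the two returned lists are parallel, a caller will often want to pair them up. The sketch below assumes the return format given in the example above (space-separated three-letter residue names per chain); seqres_to_residue_lists is a hypothetical convenience helper, not part of the module.

```python
def seqres_to_residue_lists(chain_names, sequences):
    """Map each chain name to its list of three-letter residue names."""
    return {
        chain: sequence.split()  # split() also drops the trailing space
        for chain, sequence in zip(chain_names, sequences)
    }


by_chain = seqres_to_residue_lists(["A"], ["ALA ALA ALA "])
print(by_chain)
```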

schrodinger.structutils.analyze.seq_align_match(fullseq, fragseq, pdbnum, breaklist=None, allow_frag_gaps=False)

Restricted Needleman-Wunsch alignment of a fragment sequence against a full sequence.

Parameters
  • fullseq (str) – Full sequence to work with. Positions in the alignment that match this and not fragseq are given light penalties. Positions in the alignment that match fragseq and not this are either not allowed or given large penalties, depending on allow_frag_gaps. This is intended to be the full protein sequence (from the SEQRES records) when aligning full protein sequences to those actually resolved in the experiment.

  • fragseq (str) – Fragment sequence to work with. This is intended to be the fragment of the protein sequence actually resolved in the experiment (ATOM records) when aligning full protein sequences to those actually resolved in the experiment.

  • pdbnum (list(tuple)) – List of tuples, each holding an integer and a character, with the same length as fragseq, e.g. [(1, ' '), (1, 'A'), (2, ' ')]. These are the residue numbers and insertion codes of the residues in fragseq. This allows the gap penalties to be disregarded when the residue number suggests a gap.

  • breaklist (list(bool)) – List of Booleans with the same length as fragseq. True values in this list mean that there is a known break after that residue in fragseq, so gap penalties are disregarded.

  • allow_frag_gaps (bool) – see fullseq for a description

Return value is a string with the same length as the alignment. An M will be at any position that matches both fullseq and fragseq. A U will be at any position that matches fullseq, but not fragseq. An R will be at any position that matches fragseq, but not fullseq.
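One way to consume the returned M/U/R string is to locate the runs of U, which mark stretches present in the full sequence but absent from the fragment. The sketch below does only that post-processing; unresolved_runs is a hypothetical helper and the example string is made up for illustration.

```python
from itertools import groupby


def unresolved_runs(alignment):
    """Return (start, length) for each run of 'U' in an M/U/R alignment string."""
    runs = []
    position = 0
    for code, group in groupby(alignment):
        length = len(list(group))
        if code == "U":
            runs.append((position, length))
        position += length
    return runs


# Made-up alignment: a missing tail of 2 residues and a missing loop of 3.
runs = unresolved_runs("UUMMMUUUMM")
print(runs)
```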

class schrodinger.structutils.analyze.AslLigandSearcher(copy_props=True, **kwargs)

Bases: object

Search a Structure instance for putative ligands with an Atom Selection Language expression. Results are returned as a list of Ligand instances.

API example:

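from schrodinger import structure
from schrodinger.structutils.analyze import AslLigandSearcher

# Read the complex and open a writer for the extracted ligands.
st = structure.Structure.read('file.mae')
st_writer = structure.StructureWriter('out.mae')
asl_searcher = AslLigandSearcher()
ligands = asl_searcher.search(st)
# Each hit is a Ligand instance; lig.st is the ligand substructure.
for lig in ligands:
    st_writer.append(lig.st)
st_writer.close()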

ASL evaluates molecules in a strict sense. Ligands with zero-order bonds to metal and covalently-attached ligands are difficult to find with this naive approach. See __init__ for options that work around these limitations.

‘sidechain’, ‘backbone’, and ‘ion’ aliases are used by this module. They are taken from first mmasl.ini in the path, but are assumed to be defined as a list of PDB atom names that correspond to atoms of the protein side chains, protein backbone, and small ions respectively.

Since the precise definition of a ligand is context specific and impossible to generally formulate, this class attempts to provide customizable tools for identifying ligands within a structure. It is the caller’s responsibility to customize the search parameters and verify that the hits are appropriate.

For all keyword args, the configured value from Maestro will be used by default. Only specify a keyword arg to override this value.

All kwargs (except copy_props) are passed directly to LigandParameters, as defined in the LigandParameters class in mmasl.h

Variables
  • copy_props (bool) – If True then copy the ct-level properties from the searched structure to all the found ligand substructures. If False, only the title will be copied.

  • min_heavy_atom_count (int) – Minimum number of heavy atoms required in each ligand molecule.

  • max_atom_count (int) – Maximum number of heavy atoms for a ligand molecule (does not include hydrogens).

  • allow_ion_only_molecules (bool) – Consider charged molecules to be ligands.

  • allow_amino_acid_only_molecules (bool) – If True, consider small molecules containing only amino acids to be ligands.

  • excluded_residue_names (set[str]) – Set of PDB residue names corresponding to atoms which will never be ligands.

  • included_residue_names (set[str]) – Set of PDB residue names corresponding to atoms which will always be considered ligands.

See

find_ligands for a simple functional interface to this class.

__init__(copy_props=True, **kwargs)

Initialize searcher.

property min_atom_count
property max_atom_count
property exclude_ions
property exclude_amino_acids
property excluded_residues
search(st) → list

Find list of putative ligands matching either ligand_asl or the default internally generated ASL.

Parameters

st (Structure) – Structure to search for ligands.

Return type

list

Returns

a list of Ligand instances. These are putative ligands that match the ASL expression. See Ligand for attributes.

class schrodinger.structutils.analyze.Ligand(complex_st, st, mol_num=None, atom_indexes=None, lig_asl=None, is_covalently_bound=None)

Bases: object

A putative AslLigandSearcher ligand structure with read-only data and convenience methods.

Ligand items sort from smallest to largest, by total number of atoms, then by SMILES.

Parameters:

  • complex_st: Original complex structure.

  • st: Ligand substructure.

  • mol_num: Ligand molecule number in the original structure. NOTE: the molecule contains non-ligand atoms for covalently bound ligands.

  • atom_indexes: Atom indices into the original structure for this ligand.

  • atom_objects: List of ligand atom objects from the original structure.

  • lig_asl: ASL that matches the ligand atoms in the original structure.

  • is_covalently_bound: Whether the ligand is covalently bound. Deprecated.

  • pdbres: PDB residue name identifier.

  • centroid: Centroid of the ligand as a 4-element numpy array: [x, y, z, 0.0]

  • unique_smiles: SMILES string representing this ligand structure.

__init__(complex_st, st, mol_num=None, atom_indexes=None, lig_asl=None, is_covalently_bound=None)
Parameters
  • complex_st (Structure) – Original complex structure.

  • st (Structure) – Ligand structure.

  • mol_num (int) – Molecular index identifier. Typically, the mol.n from the original structure from whence this ligand structure was derived. Note, depending on the nature of the ligand and the treatment of the original structure this mol.n index may not be valid.

  • atom_indexes (list) – Atom index identifiers. Typically, the at.n from the original structure from whence this ligand structure was derived.

  • lig_asl (str) – ASL identifier. Typically, the expression is defined in terms of the original structure from whence this ligand structure was derived.

Deprecated is_covalently_bound

Whether this ligand is bonded to other atoms (including zero-order bonds). Will be False if the ligand spans a whole molecule.

property is_covalently_bound

The Ligand.is_covalently_bound property returns True if this ligand has any bonds (including zero-order) to any other atoms, and returns False if the ligand spans a complete molecule.

sort_key()

Enable sorting for Ligand objects:

ligands.sort(key=lambda l: l.sort_key())

Comparison criteria for sorting Ligands: total number of atoms, unique smiles string, centroid.

Returns

sort key

Return type

list
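The comparison criteria can be mimicked with an ordinary list-valued key, since Python compares lists element by element. The class below is a hypothetical stand-in for Ligand with only the three fields named above; the actual contents of the real sort key are not specified beyond those criteria.

```python
from dataclasses import dataclass


@dataclass
class LigandSketch:
    """Hypothetical stand-in for Ligand, holding only the sort criteria."""
    num_atoms: int
    unique_smiles: str
    centroid: list  # [x, y, z, 0.0]

    def sort_key(self):
        # Total atom count first, then SMILES, then centroid as a tie-breaker.
        return [self.num_atoms, self.unique_smiles, list(self.centroid)]


ligands = [
    LigandSketch(9, "COC", [5.0, 5.0, 5.0, 0.0]),
    LigandSketch(9, "CCO", [0.0, 0.0, 0.0, 0.0]),
    LigandSketch(3, "O", [1.0, 0.0, 0.0, 0.0]),
    LigandSketch(9, "COC", [1.0, 1.0, 1.0, 0.0]),
]
ligands.sort(key=lambda lig: lig.sort_key())
ordered = [(lig.unique_smiles, lig.centroid[0]) for lig in ligands]
print(ordered)
```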

property mol_num

Ligand’s molecule number as defined upon instantiation.

Return type

int

Warning: Depending on the nature of the ligand and the treatment of the original structure, e.g. zero-order bonds cut, this mol.n index may not be valid.

property atom_indexes

Indices of the Ligand atoms as defined upon instantiation.

Return type

list

property atom_objects

Atom objects from the original structure for the ligand atoms.

Return type

list

property pdbres

PDB residue name identifier. If the ligand is composed of multiple residues then the names are joined with a ‘-’ separator.

Return type

str

property centroid

Centroid of the Ligand as a 4-element numpy array: [x, y, z, 0.0]

Return type

4-element numpy array

property unique_smiles

Unique SMILES string representing this ligand structure.

Return type

str

property st

Copy of the ligand Structure.

Return type

Structure

property ligand_asl

The ASL used when searching for the ligand. It defines the ligand in the context of its original structure.

Return type

str

class schrodinger.structutils.analyze.MissingLoopFinder

Bases: object

Compare the SEQRES and ATOM records to find missing loops. Does not use the order of the residues in the CT, to avoid issues that occur when missing loops are searched for after some loops have been added by another program.

__init__()
run(ct, include_tails=False, legacy_output=False, debug=False)

Compare the SEQRES records with the structural atom records to find missing loops.

Returns

tuples of (<residue object of the residue before the missing loop, or None if this is a missing N-terminal tail>, <residue object of the residue after the missing loop, or None if this is a missing C-terminal tail>, <list of the missing residue types as 3-character strings>)

Parameters
  • debug (bool) – if True, the alignments will be printed to stdout

  • include_tails (bool) – if True, missing N- and C-terminal tails will be included

schrodinger.structutils.analyze.get_low_energy_reps(sts, eps, key=None)

Cluster the given structures by energy using the given energy key function and precision and return the lowest energy structures from each cluster sorted by energy.

Parameters
  • sts (list[schrodinger.structure.Structure]) – the structures

  • eps (float) – the precision that controls the size of the clusters, see sklearn.cluster.DBSCAN documentation for more details

  • key (function or None) – the function to get the energy from the structure, if None minimize.compute_energy will be used

Raises

ValueError – if there is an issue

Return type

list[schrodinger.structure.Structure]

Returns

representative structures sorted by increasing energy
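For one-dimensional data such as energies, density clustering reduces to splitting the sorted values wherever the gap to the previous value exceeds eps, provided every point is allowed to seed a cluster (DBSCAN with min_samples=1). The sketch below works under that assumption, which the text does not confirm; the helper name and the dict records standing in for structures are hypothetical.

```python
def low_energy_representatives(items, eps, key):
    """Return the lowest-energy item of each energy cluster, sorted by energy.

    Assumes clusters are chains of items whose consecutive energy gaps
    are at most eps (equivalent to 1-D DBSCAN with min_samples=1).
    """
    ranked = sorted(items, key=key)
    representatives = []
    previous_energy = None
    for item in ranked:
        energy = key(item)
        # A gap larger than eps starts a new cluster; its first member
        # is the lowest-energy one because the items are sorted.
        if previous_energy is None or energy - previous_energy > eps:
            representatives.append(item)
        previous_energy = energy
    return representatives


records = [
    {"name": "c", "energy": 1.2},
    {"name": "a", "energy": 1.0},
    {"name": "e", "energy": 3.02},
    {"name": "b", "energy": 1.05},
    {"name": "d", "energy": 3.0},
    {"name": "f", "energy": 7.5},
]
reps = low_energy_representatives(records, eps=0.1, key=lambda r: r["energy"])
rep_names = [r["name"] for r in reps]
print(rep_names)
```

Passing the energy accessor as key mirrors the key parameter described above, so the same sketch works for any per-structure energy function.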