schrodinger.structutils.structalign2 module
A module for performing protein structure alignment.
Performing the alignment with the SKA program requires Prime to be installed and licensed appropriately.
After the structural alignment is performed, some results are added as properties of the input structures. A more comprehensive Alignment object is returned per template-mobile structure pair.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.structutils.structalign2.Alignment(rmsd: float, length: int, transform_matrix: numpy.array, align: tuple[tuple[int, int]], score: float = None)
Bases: NamedTuple
Container to store alignment data.
- Parameters
rmsd – RMSD of the aligned atoms after superposition.
length – number of residues in the alignment.
transform_matrix – 4x4 array in which the upper-left 3x3 block holds the rotation matrix, the first three rows of the last column hold the translation vector, and the fourth row is a spectator row without meaningful information.
align – corresponding atom indices for each aligned structure. Only one atom per residue (e.g. Calpha).
score – pairwise alignment score produced by the underlying method.
- rmsd: float
Alias for field number 0
- length: int
Alias for field number 1
- transform_matrix: numpy.array
Alias for field number 2
- align: tuple[tuple[int, int]]
Alias for field number 3
- score: float
Alias for field number 4
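As a rough illustration of the transform_matrix layout described above, the sketch below applies such a matrix to a single coordinate. AlignmentSketch and apply_transform are hypothetical names, not part of this module, and plain nested lists stand in for the numpy array. It assumes the usual convention that the rotation is applied first and the translation is added afterwards.

```python
from typing import NamedTuple, Optional


class AlignmentSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""
    rmsd: float
    length: int
    transform_matrix: list  # 4x4 nested list here; the real field is a numpy array
    align: tuple
    score: Optional[float] = None


def apply_transform(matrix, xyz):
    """Apply x' = R.x + t, reading R from the upper-left 3x3 block and t
    from the last column of the first three rows (assumed convention)."""
    return tuple(
        sum(matrix[i][j] * xyz[j] for j in range(3)) + matrix[i][3]
        for i in range(3)
    )


# 90-degree rotation about z followed by a shift of +1 along x.
matrix = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],  # spectator row, ignored by apply_transform
]
result = AlignmentSketch(rmsd=0.0, length=1, transform_matrix=matrix,
                         align=((1, 1),))

# Fields are reachable by name or by position, as with any named tuple.
moved = apply_transform(result.transform_matrix, (1.0, 0.0, 0.0))
```

Because the fourth row carries no information, the helper never reads it.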
- schrodinger.structutils.structalign2.get_guide_atoms(st: schrodinger.structure._structure.Structure) -> list[schrodinger.structure._structure.StructureAtom]
Return a list of guide atoms for a structure.
Guide atoms are CA atoms for protein residues, and C4’ atoms for nucleic acid residues. Any other type of residue (e.g. small molecule) is ignored.
- Parameters
st – Input structure to process.
- Returns
List of guide atoms.
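The selection rule above can be sketched on plain records. AtomRecord, GUIDE_ATOM_NAME and pick_guide_atoms are hypothetical names used only for illustration; the real function works on Structure and StructureAtom objects, and how it classifies residues is not shown here.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    """Hypothetical minimal atom record; not the StructureAtom class."""
    index: int
    name: str          # PDB-style atom name, e.g. "CA" or "C4'"
    residue_kind: str  # "protein", "nucleic" or "other"


# Guide atom name per residue kind, as described above.
GUIDE_ATOM_NAME = {"protein": "CA", "nucleic": "C4'"}


def pick_guide_atoms(atoms):
    """Keep the atoms whose name is the guide atom name for their residue kind."""
    return [
        atom for atom in atoms
        if GUIDE_ATOM_NAME.get(atom.residue_kind) == atom.name
    ]


atoms = [
    AtomRecord(1, "N", "protein"),
    AtomRecord(2, "CA", "protein"),
    AtomRecord(3, "C4'", "nucleic"),
    AtomRecord(4, "CA", "other"),  # e.g. a calcium ion, which is ignored
]
guide_indices = [atom.index for atom in pick_guide_atoms(atoms)]
```

Checking the residue kind as well as the atom name is what keeps an ion named CA out of the result.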
- schrodinger.structutils.structalign2.map_residues_by_shortest_dist(st_ref: schrodinger.structure._structure.Structure, st_mobile: schrodinger.structure._structure.Structure, max_dist: float = 5.0) -> list[tuple[int | None, int | None]]
Create a pairwise residue mapping from two pre-aligned structures.
Considers two residues equivalent (aligned) if the distance between their respective guide atoms is smaller than max_dist. If there are multiple possible matches for a particular residue, picks the one closest to the reference guide atom. This is used by ska_pairwise_align to map residues after the alignment, as the original mapping is lost.
- Parameters
st_ref – Reference structure; always remains fixed.
st_mobile – Mobile structure to be aligned onto the reference.
max_dist – Maximum distance in Angstroms allowed between the guide atoms of two residues for them to be considered equivalent. Default value is 5.0 Angstroms, which is a common threshold for residue matching in other structural alignment tools, e.g. TM-align.
- Returns
List of tuples with equivalent atom indices. The first value always corresponds to an atom on the reference structure. If a residue is not mapped (i.e. no atoms within max_dist), its guide atom will be mapped to None.
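A minimal sketch of this kind of distance-based mapping, working on plain coordinate dictionaries rather than Structure objects. The function name map_by_shortest_dist and the greedy closest-pair-first conflict resolution are assumptions made for illustration; the module's actual tie-breaking may differ.

```python
import math


def map_by_shortest_dist(ref_atoms, mobile_atoms, max_dist=5.0):
    """Greedy one-to-one mapping of guide atoms by distance.

    ref_atoms and mobile_atoms are dicts of {atom_index: (x, y, z)}.
    The greedy conflict resolution used here is an assumption.
    """
    # Collect every candidate pair closer than max_dist.
    candidates = []
    for i, xyz_ref in ref_atoms.items():
        for j, xyz_mob in mobile_atoms.items():
            dist = math.dist(xyz_ref, xyz_mob)
            if dist < max_dist:
                candidates.append((dist, i, j))

    # Accept the closest pairs first so each atom is used at most once.
    ref_to_mobile = {}
    used_mobile = set()
    for dist, i, j in sorted(candidates):
        if i not in ref_to_mobile and j not in used_mobile:
            ref_to_mobile[i] = j
            used_mobile.add(j)

    # Unmatched guide atoms are paired with None.
    mapping = [(i, ref_to_mobile.get(i)) for i in ref_atoms]
    mapping += [(None, j) for j in mobile_atoms if j not in used_mobile]
    return mapping


ref = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
mobile = {10: (0.5, 0.0, 0.0), 11: (4.0, 0.0, 0.0), 12: (50.0, 0.0, 0.0)}
mapping = map_by_shortest_dist(ref, mobile)
```

In this example atom 1 is within max_dist of both mobile atoms 10 and 11, and the closer one (10) is chosen; atoms 3 and 12 have no partner and are mapped to None.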
- schrodinger.structutils.structalign2.refine_alignment(st_ref: schrodinger.structure._structure.Structure, st_mobile: schrodinger.structure._structure.Structure, init_align: schrodinger.structutils.structalign2.Alignment, max_dist: float = 3.8) -> schrodinger.structutils.structalign2.Alignment
Refine a structural alignment by removing outlier atom pairs.
The procedure is similar to that described by Gerstein and Levitt in Protein Sci, 1998 (PMC2143933). Beginning with an initial set of equivalent atoms derived from aligning two structures, perform an outlier analysis based on a simple distance criterion and re-fit the structures considering only the remaining atom pairs.
Unlike the procedure described in the paper, the algorithm here does the refinement in one shot, instead of iteratively removing one atom pair at a time. Also, importantly, criterion (i) in the paper is not applied here - pairs can be removed anywhere in the alignment.
The main reason for doing a single-shot removal, rather than using the iterative approach, is that it avoids multiple rounds of superimposition and RMSD calculation. In a benchmark with HOMSTRAD (1031 protein families), this simple one-step procedure was able to improve the fitting of a large variety of protein folds at minimal compute cost.
- Parameters
st_ref – Reference structure; always remains fixed.
st_mobile – Mobile structure to be aligned onto the reference.
init_align – Initial alignment to be refined. This may come from any method that produces an Alignment object, such as cealign.
max_dist – Maximum distance in Angstroms allowed between equivalent atoms. Pairs with distances larger than this will be considered outliers and removed from the alignment. The default value is 3.8 Angstroms and is taken from the original paper.
- Returns
A refined Alignment object.
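The single-pass outlier removal described above can be sketched as a simple filter over the aligned atom pairs. This covers only the filtering step: the re-fit of the structures on the remaining pairs is omitted, and drop_outlier_pairs and pair_rmsd are hypothetical helpers operating on plain coordinate dictionaries, not functions from this module.

```python
import math


def drop_outlier_pairs(ref_xyz, mobile_xyz, pairs, max_dist=3.8):
    """Single-pass filter: keep pairs whose atoms lie within max_dist."""
    return [
        (i, j) for i, j in pairs
        if math.dist(ref_xyz[i], mobile_xyz[j]) <= max_dist
    ]


def pair_rmsd(ref_xyz, mobile_xyz, pairs):
    """RMSD over the paired atoms, without any re-superposition."""
    total = sum(math.dist(ref_xyz[i], mobile_xyz[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Coordinates of already superposed guide atoms, keyed by atom index.
ref_xyz = {1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0), 3: (8.0, 0.0, 0.0)}
mobile_xyz = {1: (1.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (8.0, 9.0, 0.0)}
pairs = [(1, 1), (2, 2), (3, 3)]

kept = drop_outlier_pairs(ref_xyz, mobile_xyz, pairs)
rmsd_before = pair_rmsd(ref_xyz, mobile_xyz, pairs)
rmsd_after = pair_rmsd(ref_xyz, mobile_xyz, kept)
```

All pairs are tested against the same superposition in one pass, so a pair anywhere in the alignment can be dropped and no intermediate re-fitting is needed.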
- schrodinger.structutils.structalign2.cealign(st_ref: schrodinger.structure._structure.Structure, st_mobile: schrodinger.structure._structure.Structure, *, atlist_ref: Optional[list[int]] = None, atlist_mobile: Optional[list[int]] = None, transform_mobile: bool = True, window_size: int = 8, max_gap: int = 30, refine: bool = False) -> schrodinger.structutils.structalign2.Alignment
Align two structures using the CEAlign method.
Based on the method described in:
Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998 Sep;11(9):739-47. doi: 10.1093/protein/11.9.739. PMID: 9796821.
- Parameters
st_ref – Reference structure, always remains fixed.
st_mobile – Mobile structure to be aligned onto the reference.
atlist_ref – List of atom indices from reference structure to be used for the alignment and superposition. If None, uses guide atoms.
atlist_mobile – List of atom indices from mobile structure to be used for the alignment and superposition. If None, uses guide atoms.
transform_mobile – If True (default), the coordinates of the mobile structure will be modified after calculating the optimal alignment.
window_size – CE algorithm parameter. Defines the length of the fragments to be used for the alignment. Smaller values may yield more accurate alignments, up to a certain point, at the expense of computation time. Default is 8, as defined in the original publication.
max_gap – CE algorithm parameter. Maximum gap size allowed between two aligned fragment pairs, in number of residues. Smaller values yield faster computation times, but might prevent similar fragments from being found by the algorithm. Default is 30, as defined in the original publication and found to be a reasonable compromise between computational cost and alignment quality.
refine – If True, will attempt to refine the alignment by removing outlier atom pairs. Default is False. See refine_alignment for more details.
- Returns
A named tuple with the RMSD (in Angstrom) between the two structures after superposition, the alignment length, a mapping of the aligned residues, and the transform matrix as a 4x4 numpy array.
- schrodinger.structutils.structalign2.ska_pairwise_align(st_ref: schrodinger.structure._structure.Structure, st_mobile: schrodinger.structure._structure.Structure, *, asl_ref: Optional[str] = None, asl_mobile: Optional[str] = None, transform_mobile: bool = True, use_scanning_alignment: bool = False, use_automatic_settings: bool = False, use_standard_residues: bool = False, reorder_by_connectivity: bool = False, reckless_align: bool = False, window_length: int = 5, minimum_length: int = 2, minimum_similarity: float = 1.0, gap_penalty: float = 2.0, deletion_penalty: float = 1.0, custom_logger: Optional[logging.Logger] = None) -> schrodinger.structutils.structalign2.Alignment
Align two structures using SKA.
- Parameters
st_ref – Reference structure, remains fixed.
st_mobile – Mobile structure, will be transformed to minimize RMSD to the reference after the alignment.
asl_ref – ASL to identify residues in the reference structure to be used for the alignment. If None, will consider all residues.
asl_mobile – ASL to identify residues in the mobile structure to be used for the alignment. If None, will consider all residues.
transform_mobile – If True (default), the coordinates of the mobile structure will be modified after calculating the optimal alignment.
use_scanning_alignment – Will use a scanning alignment based on the specified window length if the sequences have low homology. Default is False.
use_automatic_settings – Use automatic settings to produce a better alignment. Default is False.
use_standard_residues – If True, residue names will be translated to standard forms (e.g. HIE -> HIS). Default is False.
reorder_by_connectivity – If True, structure will be reordered by atom connectivity (N to C terminus in proteins). Default is False.
reckless_align – Run SKA alignment in reckless mode, which produces an alignment even when similarity is very low. Default is False.
window_length – Number of residues to consider in each window when scanning alignment mode is enabled. Default is 5.
minimum_length – Minimum length of sequences needed to enable scanning alignment mode. Default is 2.
minimum_similarity – Minimum similarity needed to enable alignment. Default is 1.0.
gap_penalty – Sequence gap penalty when building the alignment. Default is 2.0.
deletion_penalty – Sequence deletion penalty when building the alignment. Default is 1.0.
custom_logger – Logger object used to emit log messages. If None, use the module logger object.
- Returns
A named tuple with the RMSD (in Angstrom) between the two structures after superposition, the alignment length, a mapping of the aligned residues, and the transform matrix as a 4x4 numpy array.
- schrodinger.structutils.structalign2.align_many(st_ref: schrodinger.structure._structure.Structure, mobile_st_list: list[schrodinger.structure._structure.Structure], **kwargs) -> list[schrodinger.structutils.structalign2.Alignment]
Aligns each structure in mobile_st_list to st_ref.
The coordinates of each mobile structure are modified to minimize RMSD after superposition.
- Parameters
st_ref – Reference structure; always remains fixed.
mobile_st_list – List of mobile structures to be aligned onto the reference.
kwargs – Additional keyword arguments to pass to the alignment method. These are specific to the underlying alignment method.
- Returns
A list of Alignment objects containing multiple data fields about the alignments (e.g. rmsd, score).
- schrodinger.structutils.structalign2.align_pair(st_ref: schrodinger.structure._structure.Structure, st_mobile: schrodinger.structure._structure.Structure, **kwargs) -> schrodinger.structutils.structalign2.Alignment
Aligns st_mobile to st_ref.
The coordinates of the mobile structure are modified to minimize RMSD after superposition.
- Parameters
st_ref – Reference structure; always remains fixed.
st_mobile – Mobile structure to be aligned onto the reference.
kwargs – Additional keyword arguments to pass to the alignment method. These are specific to the underlying alignment method.
- Returns
An Alignment object containing multiple data fields about the alignment (e.g. rmsd, score).