schrodinger.protein.align module
- class schrodinger.protein.align.ASLResult(ref_ok, other_ok, other_skips)
Bases: tuple
- other_ok
Alias for field number 1
- other_skips
Alias for field number 2
- ref_ok
Alias for field number 0
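ASLResult subclasses tuple and documents its fields as "Alias for field number N", which is how collections.namedtuple describes its fields. The sketch below rebuilds an equivalent local namedtuple (an assumption; it does not import anything from schrodinger) to show that named and positional access agree. The other_skips value is made-up illustration data.

```python
from collections import namedtuple

# Local stand-in for ASLResult; field names and order follow the documented
# signature ASLResult(ref_ok, other_ok, other_skips).
ASLResult = namedtuple("ASLResult", ["ref_ok", "other_ok", "other_skips"])

# Illustrative values only; the real field contents are not documented here.
result = ASLResult(ref_ok=True, other_ok=False, other_skips=["seq_b"])

# "Alias for field number N": attribute access and indexing agree.
print(result.other_ok is result[1])  # True

# Tuple unpacking works as with any tuple.
ref_ok, other_ok, other_skips = result
print(ref_ok, other_ok, other_skips)  # True False ['seq_b']
```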
- exception schrodinger.protein.align.CantAlignException
Bases: Exception
Exception raised when an aligner cannot start, e.g. because there are not enough sequences.
- class schrodinger.protein.align.AbstractAligner
Bases: object
Base class of objects that can perform an alignment.
- abstract run(aln)
Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
- Parameters:
aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
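The run(aln) contract and the CantAlignException failure mode can be illustrated with a small, self-contained sketch. It does not use the Schrodinger API: AbstractAligner and CantAlignException are local stand-ins, aln is modeled as a plain list of strings rather than a BaseAlignment, and PadToLongestAligner is a hypothetical toy subclass that only right-pads sequences with gaps.

```python
import abc


class CantAlignException(Exception):
    """Local stand-in for schrodinger.protein.align.CantAlignException."""


class AbstractAligner(abc.ABC):
    """Local stand-in mirroring the documented run(aln) contract."""

    @abc.abstractmethod
    def run(self, aln):
        """Align the sequences in ``aln`` in place."""


class PadToLongestAligner(AbstractAligner):
    """Hypothetical toy aligner: right-pads every sequence with gaps."""

    def __init__(self, gap="~"):
        # Parameters are supplied on init, as the docstring describes.
        self.gap = gap

    def run(self, aln):
        # Mirror the documented "not enough seqs" failure mode.
        if len(aln) < 2:
            raise CantAlignException("Need at least two sequences to align")
        width = max(len(seq) for seq in aln)
        for i, seq in enumerate(aln):
            aln[i] = seq + self.gap * (width - len(seq))


aln = ["ACDEF", "ACD"]
PadToLongestAligner().run(aln)
print(aln)  # ['ACDEF', 'ACD~~']
```

Because run is declared with abc.abstractmethod, trying to instantiate the base class directly raises TypeError, which enforces the "subclasses need to override" rule at construction time.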
- class schrodinger.protein.align.RescodeAligner
Bases: AbstractAligner
Aligns sequences by rescode.
- run(aln)
Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
- Parameters:
aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
- class schrodinger.protein.align.AbstractPairwiseAligner(preserve_reference_gaps=False)
Bases: AbstractAligner
Abstract class for pairwise alignment in which gaps can be merged into the entire alignment, preserving the relative alignment of all non-reference sequences to the reference sequence.
Subclasses must implement _getPairwiseGaps to align one sequence to the reference sequence. Subclasses may override _run to customize aligning (e.g. validation or setup of additional data needed by _getPairwiseGaps).
- __init__(preserve_reference_gaps=False)
- Parameters:
preserve_reference_gaps (bool) – Whether to preserve the gaps in the reference sequence.
- run(aln, seqs_to_align=None, **kwargs)
kwargs are additional arguments that will be passed to _run.
- Parameters:
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises:
CantAlignException – If seqs_to_align contains a sequence not found in aln.
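The defaulting and validation rules documented for run can be sketched in plain Python. resolve_seqs_to_align is a hypothetical helper, aln is modeled as a list whose first item is the reference sequence, and membership is checked with `in` (equality), which may differ from how the real class compares Sequence objects.

```python
class CantAlignException(Exception):
    """Local stand-in for schrodinger.protein.align.CantAlignException."""


def resolve_seqs_to_align(aln, seqs_to_align=None):
    """Hypothetical helper mirroring the documented run() defaults.

    ``aln`` is modeled as a list whose first item is the reference sequence.
    """
    if seqs_to_align is None:
        # Documented default: the first non-reference sequence, aln[1].
        return [aln[1]]
    for seq in seqs_to_align:
        # Documented error: a requested sequence is not part of aln.
        if seq not in aln:
            raise CantAlignException(f"{seq!r} is not in the alignment")
    return list(seqs_to_align)


aln = ["MKTAYIAK", "MKSAYIAK", "MRTAYLAK"]
print(resolve_seqs_to_align(aln))                # ['MKSAYIAK']
print(resolve_seqs_to_align(aln, ["MRTAYLAK"]))  # ['MRTAYLAK']
```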
- class schrodinger.protein.align.AbstractNWPairwiseAligner(preserve_reference_gaps=False, gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, ss_constraints=False, penalize_end_gaps=True)
Bases: AbstractPairwiseAligner
Abstract class for the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties.
- Variables:
CONSTRAINT_SCORE (float) – Reward for keeping constrained residues aligned.
RES_MATCH_BONUS (float) – Reward for aligning matching residues. Used by default if a substitution matrix is not specified.
RES_MISMATCH_PENALTY (float) – Penalty for aligning differing residues. Used by default if a substitution matrix is not specified.
- CONSTRAINT_SCORE = 10000
- RES_MATCH_BONUS = 1.0
- RES_MISMATCH_PENALTY = 1.0
- __init__(preserve_reference_gaps=False, gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, ss_constraints=False, penalize_end_gaps=True)
- Parameters:
preserve_reference_gaps (bool) – Whether to preserve the gaps in the reference sequence.
gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N×M matrix, where N and M are the lengths of the two sequences, rather than as the default 20×20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
penalize_end_gaps (bool) – Whether to penalize start/end gaps.
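The recurrence behind these parameters can be sketched in plain Python. This is a score-only, Gotoh-style sketch, not the Schrodinger implementation: it assumes a gap of length k costs gap_open_penalty + (k - 1) * gap_extend_penalty, reuses the documented RES_MATCH_BONUS and RES_MISMATCH_PENALTY defaults for the identity measure, and does not model CONSTRAINT_SCORE, ss_constraints, or traceback. The names make_scorer and nw_affine_score are hypothetical.

```python
NEG_INF = float("-inf")

# Defaults taken from the documented class attributes.
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0


def make_scorer(seq_a, seq_b, sub_matrix=None, direct_scores=False):
    """Return score(i, j) for aligning seq_a[i] with seq_b[j]."""
    if direct_scores:
        # sub_matrix is an N x M table indexed by residue positions.
        return lambda i, j: sub_matrix[i][j]
    if sub_matrix is not None:
        # sub_matrix maps (char, char) -> float.
        return lambda i, j: sub_matrix[(seq_a[i], seq_b[j])]
    # Identity measure used when no matrix is supplied.
    return lambda i, j: (RES_MATCH_BONUS if seq_a[i] == seq_b[j]
                         else -RES_MISMATCH_PENALTY)


def nw_affine_score(seq_a, seq_b, gap_open_penalty=1.0, gap_extend_penalty=0.0,
                    sub_matrix=None, direct_scores=False,
                    penalize_end_gaps=True):
    """Global alignment score with affine gaps (three-matrix recurrence)."""
    score = make_scorer(seq_a, seq_b, sub_matrix, direct_scores)
    n, m = len(seq_a), len(seq_b)
    go, ge = gap_open_penalty, gap_extend_penalty

    def end_gap(k):
        # Cost of a leading gap of length k (free if end gaps are unpenalized).
        return -(go + (k - 1) * ge) if penalize_end_gaps else 0.0

    # M: residue vs residue; X: seq_a residue vs gap; Y: gap vs seq_b residue.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = end_gap(i)
    for j in range(1, m + 1):
        Y[0][j] = end_gap(j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(i - 1, j - 1) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - go, X[i - 1][j] - ge, Y[i - 1][j] - go)
            Y[i][j] = max(M[i][j - 1] - go, Y[i][j - 1] - ge, X[i][j - 1] - go)

    def best(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    if penalize_end_gaps:
        return best(n, m)
    # Free trailing gaps: the alignment may end anywhere on the last row/column.
    return max([best(n, j) for j in range(m + 1)] +
               [best(i, m) for i in range(n + 1)])


print(nw_affine_score("ACGT", "AGT"))                           # 2.0
print(nw_affine_score("ACGT", "CGT", penalize_end_gaps=False))  # 3.0
```

Passing a scorer callable keeps the three sub_matrix modes (identity measure, a (char, char) dict, and direct N×M scores) behind one interface, so the recurrence itself does not change between them.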
- class schrodinger.protein.align.SchrodingerPairwiseAligner(**kwargs)
Bases: AbstractNWPairwiseAligner
Implementation of the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties. It provides the ability to merge a new sequence with an existing alignment, to penalize gaps in secondary structure elements, and to use a custom substitution matrix generated from a family of proteins or provided by the user.
- Note:
Any residues with variant residue types will have their short codes uppercased, so they are treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if variant residues need special treatment, _getMatrixValue will have to be changed.
- __init__(**kwargs)
- Parameters:
preserve_reference_gaps (bool) – Whether to preserve the gaps in the reference sequence.
gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N×M matrix, where N and M are the lengths of the two sequences, rather than as the default 20×20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
penalize_end_gaps (bool) – Whether to penalize start/end gaps.
- getAlignmentScore()
Get the score of the alignment, found by taking the highest value in the scoring matrix.
- Returns:
Score of the pairwise alignment.
- Return type:
float
- class schrodinger.protein.align.BiopythonPairwiseAligner(*args, **kwargs)
Bases: AbstractNWPairwiseAligner
Pairwise alignment using Biopython.
- Note:
Any residues with variant residue types will have their short codes uppercased, so they are treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if variant residues need special treatment, _getMatrixValue will have to be changed.
- __init__(*args, **kwargs)
- Parameters:
preserve_reference_gaps (bool) – Whether to preserve the gaps in the reference sequence.
gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N×M matrix, where N and M are the lengths of the two sequences, rather than as the default 20×20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
penalize_end_gaps (bool) – Whether to penalize start/end gaps.
- generateSubMatrix()
Generate the identity substitution matrix if one is not provided.
- generateIdentitySubMatrix(res_keys)
Generate the basic identity substitution matrix based on existing values.
- Parameters:
res_keys – list of values to be included in the substitution matrix
- getAlignmentScore()
Get the score of the alignment, found by taking the highest value in the scoring matrix.
- Returns:
Score of the pairwise alignment.
- Return type:
float
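generateIdentitySubMatrix builds an identity matrix over a list of residue keys, and sub_matrix is documented as accepting a dict mapping (char, char) to float. A minimal sketch of such a dict, assuming a score of 1.0 for identical keys and 0.0 otherwise (the library's actual values are not documented here); generate_identity_sub_matrix is a hypothetical name.

```python
from itertools import product


def generate_identity_sub_matrix(res_keys, match=1.0, mismatch=0.0):
    """Hypothetical identity substitution matrix keyed by (char, char).

    Identical residue keys score ``match``; every other pair scores
    ``mismatch``. Both values are assumptions, not the library's values.
    """
    return {(a, b): (match if a == b else mismatch)
            for a, b in product(res_keys, repeat=2)}


matrix = generate_identity_sub_matrix(["A", "C", "D"])
print(len(matrix))         # 9
print(matrix[("A", "A")])  # 1.0
print(matrix[("A", "C")])  # 0.0
```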
- class schrodinger.protein.align.FamilyPairwiseAligner(anno_type: ANNOTATION_TYPES, cdr_scheme: AntibodyCDRScheme = None, custom_annotation: CustomAnnotation = None, *args, **kwargs)
Bases: BiopythonPairwiseAligner
Pairwise alignment for family features using Biopython.
- __init__(anno_type: ANNOTATION_TYPES, cdr_scheme: AntibodyCDRScheme = None, custom_annotation: CustomAnnotation = None, *args, **kwargs)
- Parameters:
anno_type – Annotation type; one of: antibody_cdr, gpcr_segment, kinase_features
cdr_scheme – Antibody CDR scheme; only required if the annotation type is antibody_cdr
custom_annotation – Custom annotation with descriptions; only required if the annotation type is custom_annotation
- run(aln, seqs_to_align=None, **kwargs)
Aligns the sequences and removes redundant aligned gaps.
- generateSubMatrix()
Generate the identity substitution matrix for the annotation type.
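FamilyPairwiseAligner.run is documented as removing redundant aligned gaps. Assuming that means alignment columns in which every sequence has a gap, the cleanup step can be sketched on equal-length gapped strings. The "~" gap character follows the convention used elsewhere on this page; remove_all_gap_columns is a hypothetical helper, not part of the module.

```python
GAP = "~"


def remove_all_gap_columns(rows, gap=GAP):
    """Drop columns in which every row holds a gap (hypothetical helper).

    Assumes all rows have the same length.
    """
    keep = [col for col in range(len(rows[0]))
            if any(row[col] != gap for row in rows)]
    return ["".join(row[col] for col in keep) for row in rows]


print(remove_all_gap_columns(["AC~~DE", "A~~CDE"]))  # ['AC~DE', 'A~CDE']
```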
- class schrodinger.protein.align.PrimeSTAAligner(protein_family=None)
Bases: AbstractAligner
Sequence alignment using $SCHRODINGER/sta.
- __init__(protein_family=None)
- Parameters:
protein_family (str or NoneType) – ‘GPCR’ for specialized alignment, or None for default templates.
- run(aln, structured_seq=None, constraints=None)
- Parameters:
aln (alignment.Alignment) – The alignment containing sequences to align.
structured_seq (ProteinSequence or NoneType) – Structured sequence to use as reference. If None, the first non-reference sequence will be aligned.
constraints (list(tuple(Residue, Residue)) or NoneType) – Pairs of (reference_seq, structured_seq) residues to constrain.
- class schrodinger.protein.align.ClustalAligner
Bases: AbstractAligner
Aligns sequences using the Clustal alignment algorithm.
- run(aln)
Aligns the sequences in an alignment.
- Parameters:
aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
- class schrodinger.protein.align.SuperpositionAligner(gap_open_penalty=None, gap_extend_penalty=None)
Bases: BiopythonPairwiseAligner
Align structured sequences based on their superposition.
- __init__(gap_open_penalty=None, gap_extend_penalty=None)
This constructor accepts only gap_open_penalty and gap_extend_penalty; the remaining parameters listed below are those of BiopythonPairwiseAligner.
- Parameters:
preserve_reference_gaps (bool) – Whether to preserve the gaps in the reference sequence.
gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N×M matrix, where N and M are the lengths of the two sequences, rather than as the default 20×20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
penalize_end_gaps (bool) – Whether to penalize start/end gaps.
- class schrodinger.protein.align.AbstractStructureAligner(keywords=None, **kwargs)
Bases: AbstractAligner
Subclasses must reimplement run as follows:
1. Call _setUpSeqs to set up instance attributes for the current alignment.
2. Call _setASLs to validate and store ASLs.
3. Call _getUniqueEidSeqs to get the sequences to align.
4. Call _runStructureAlignment to call the backend.
- class Result(ref_seq, other_seq, psd, rmsd)
Bases: tuple
- other_seq
Alias for field number 1
- psd
Alias for field number 2
- ref_seq
Alias for field number 0
- rmsd
Alias for field number 3
- __init__(keywords=None, **kwargs)
- Parameters:
keywords (dict) – Keywords to pass to the ska backend
- getResultSeqs()
- class schrodinger.protein.align.StructureAligner(keywords=None, **kwargs)
Bases: AbstractStructureAligner
Run structure alignment using the specified sequences to create chain ASLs.
- run(aln, seqs_to_align, **kwargs)
Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
- Parameters:
aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
- class schrodinger.protein.align.CustomASLStructureAligner(keywords=None, ref_asl=None, other_asl=None)
Bases: AbstractStructureAligner
Run structure alignment using the specified ASLs.
- SENTINEL = <object object>
- __init__(keywords=None, ref_asl=None, other_asl=None)
- Parameters:
keywords (dict) – Keywords to pass to the ska backend
- evaluateASLs(aln, seqs_to_align)
Determine whether the ASLs match any atoms in the sequences’ structures.
- Parameters:
aln – Alignment
seqs_to_align – Sequences to align
- Return type:
- run(aln, seqs_to_align, **kwargs)
Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
- Parameters:
aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
- class schrodinger.protein.align.MaxIdentityAligner
Bases: BiopythonPairwiseAligner
Pairwise aligner that maximizes the number of matching residues between two sequences. There are no penalties for mismatches or gaps.
- __init__()
This constructor takes no arguments; the parameters listed below are those of BiopythonPairwiseAligner.
- Parameters:
preserve_reference_gaps (bool) – Whether to preserve the gaps in the reference sequence.
gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N×M matrix, where N and M are the lengths of the two sequences, rather than as the default 20×20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
penalize_end_gaps (bool) – Whether to penalize start/end gaps.
- run(aln)
kwargs are additional arguments that will be passed to _run.
- Parameters:
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises:
CantAlignException – If seqs_to_align contains a sequence not found in aln.
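With a positive reward for identical pairs and zero cost for both mismatches and gaps, the best achievable number of identities equals the length of the longest common subsequence (LCS) of the two sequences. The sketch below computes that upper bound with the standard LCS dynamic program; it illustrates the objective MaxIdentityAligner describes, not its Biopython-based implementation, and max_identity_count is a hypothetical name.

```python
def max_identity_count(seq_a, seq_b):
    """Largest number of identical residue pairs in any gapped alignment.

    With zero cost for gaps and mismatches this equals the length of the
    longest common subsequence (LCS).
    """
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                # Extend the best alignment of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Leave one of the two residues unmatched.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(max_identity_count("MKTAYIAK", "MKAYLAK"))  # 6
```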
- class schrodinger.protein.align.StructurelessGapAligner
Bases: AbstractAligner
Align all structureless residues with gaps.
For example, given the following alignment (where circled letters are structureless residues):
Resnum: 0 1 2 3 4 5
Seq1:   Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ A D A
The result will be:
Resnum: 0 1 2 3 4 5 6 7 8
Seq1:   ~ ~ ~ Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ ~ ~ ~ A D A
- run(aln, seqs_to_align=None)
Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
- Parameters:
aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
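The two-sequence example above can be reproduced with a short sketch. It only handles the case the example shows (a leading run of structureless residues in each sequence), uses plain letters plus a separate boolean mask instead of circled letters, and places Seq2's run first, opposite gaps in Seq1, as in the documented result. The helper names are hypothetical and the general behavior of StructurelessGapAligner on interior runs is not modeled.

```python
GAP = "~"


def leading_structureless_count(structured):
    """Number of consecutive structureless residues at the start."""
    count = 0
    while count < len(structured) and not structured[count]:
        count += 1
    return count


def gap_leading_structureless(seq1, structured1, seq2, structured2):
    """Hypothetical sketch reproducing the documented two-sequence example.

    Only the leading run of structureless residues is handled: seq2's run is
    placed opposite gaps in seq1, then seq1's run opposite gaps in seq2.
    """
    n1 = leading_structureless_count(structured1)
    n2 = leading_structureless_count(structured2)
    out1 = GAP * n2 + seq1
    out2 = seq2[:n2] + GAP * n1 + seq2[n2:]
    return out1, out2


# The documented example; True marks residues that have structure.
mask = [False, False, False, True, True, True]
print(gap_leading_structureless("ARCADE", mask, "CANADA", mask))
# ('~~~ARCADE', 'CAN~~~ADA')
```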
- schrodinger.protein.align.align_seqs_from_structs(structs: list[Structure], AlignerClass: type[AbstractAligner], init_kwargs: dict = None, run_kwargs: dict = None) → ProteinAlignment
- schrodinger.protein.align.superposition_alignment_from_structs(structs: list[Structure], gap_open_penalty: float = None, gap_extend_penalty: float = None) → ProteinAlignment
Given a list of protein structures, performs a multiple sequence alignment of the residue sequences of those proteins.
- Parameters:
structs – the list of structures for which to create a sequence alignment. The first structure in the list is used as the reference.
gap_open_penalty – penalty for opening a gap in the alignment. Default is 1.0.
gap_extend_penalty – penalty for extending a gap in the alignment. Default is 0.
- Returns:
a ProteinAlignment object containing the aligned sequences. The sequences are in the same order as the input structures.