schrodinger.ui.sequencealignment.sequence module¶
Implementation of multiple sequence viewer Sequence class.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.ui.sequencealignment.sequence.delete_from_str(inp_str, delete_chars)[source]¶
Delete characters from a string.
Note: replaces Python 2 inp_str.translate(None, delete_chars)
- Parameters
inp_str (str) – A string to delete characters from. In Python 2, unicode input will be cast to str
delete_chars (str) – Characters to delete from the string
- Returns
The input string with the delete_chars removed
- Return type
str
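A minimal Python 3 sketch of the same idea, not the module's actual source. Passing a third argument to str.maketrans marks characters for deletion, which is the usual replacement for the Python 2 translate(None, ...) idiom. The name delete_chars_sketch is hypothetical.

```python
def delete_chars_sketch(inp_str, delete_chars):
    """Return inp_str with every character in delete_chars removed."""
    # The third argument of str.maketrans maps each listed character to None,
    # so str.translate drops it from the result.
    return inp_str.translate(str.maketrans("", "", delete_chars))


print(delete_chars_sketch("AC-GT..A~", "-.~"))  # prints ACGTA
```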
- class schrodinger.ui.sequencealignment.sequence.Sequence[source]¶
Bases:
object
The Sequence class represents a single basic sequence object. A Sequence object can correspond to an amino acid sequence, a nucleic acid sequence, an annotation (such as a secondary structure assignment or hydrophobicity plot) or a helper object (for example, a ruler).
- appendResidue(residue)[source]¶
Appends a new residue to self.
- Parameters
residue (Residue) – sequence alignment Residue object
- appendResidues(codes, use_numbers=False)[source]¶
Creates new residues from a string of single-letter codes and appends them to the existing sequence. Converts upper-case characters to lower-case, recognizes gaps (‘.’, ‘-’, ‘~’) and ignores other characters.
- Parameters
codes (string) – string of single-letter amino acid codes
use_numbers (boolean) – If true, this function will try to recognize residue numbers included in the sequence and assign them to the residues.
- removeStructureless()[source]¶
Removes structureless (SEQRES) residues from the sequence and its children.
- replaceSequence(new_sequence)[source]¶
Replaces the current sequence with the provided string.
- Parameters
new_sequence (str) – Must have the same gapless length as the old sequence.
- Return type
bool
- Returns
True if successful
- toString(with_gaps=True)[source]¶
Returns a string representation of self.
- Parameters
with_gaps (boolean (default=True)) – optional parameter; if True, the returned string includes gaps; if False, it contains only the actual residue codes.
- unpaddedLength()[source]¶
Returns the length of the sequence with rightmost gaps stripped out.
- Return type
int
- Returns
length of the stripped sequence
- gaplessLength()[source]¶
Returns the length of the sequence excluding gaps.
- Return type
int
- Returns
actual sequence length (number of residues)
- numberOfGaps()[source]¶
Returns the number of gaps in the sequence.
- Return type
int
- Returns
number of gaps in the sequence
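The three length methods above differ only in which gaps they ignore. The sketch below models a sequence as a plain string and assumes the gap symbols listed for appendResidues. The helper names are hypothetical, and the real class works on Residue objects rather than characters.

```python
GAP_CHARS = ".-~"  # gap symbols named in the appendResidues() description


def unpadded_length(seq):
    """Length after stripping gaps from the right-hand end only."""
    return len(seq.rstrip(GAP_CHARS))


def number_of_gaps(seq):
    """Count every gap symbol, wherever it occurs."""
    return sum(1 for ch in seq if ch in GAP_CHARS)


def gapless_length(seq):
    """Count only the positions that hold a residue code."""
    return len(seq) - number_of_gaps(seq)


aligned = "--AC-GT.A~~"
print(unpadded_length(aligned))  # 9
print(gapless_length(aligned))   # 5
print(number_of_gaps(aligned))   # 6
```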
- getResidue(index, ungapped=False, hidden=True)[source]¶
Returns a residue at a given sequence position, or None if the given position is invalid.
- Parameters
index (int) – sequence position
- Return type
Residue
- Returns
residue for a given position, or None if the position is invalid
- getResidueIndex(id)[source]¶
Returns the index of the residue with the given id.
- Parameters
id (string) – str(res.num) + str(res.icode)
- Return type
int if valid id, None if not
- Returns
index of res if valid id, None if not
- getUngappedIndex(index)[source]¶
Returns the residue index in the ungapped sequence that corresponds to a position in the gapped sequence.
- Parameters
index (int) – Residue index in gapped sequence
- Return type
int
- Returns
Index in ungapped sequence.
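One way to picture the gapped-to-ungapped mapping is to count the residues that precede the gapped position. This is a sketch on plain strings, with a hypothetical helper name. How the real method treats an index that points at a gap is not stated here, so the behaviour for that case is an assumption.

```python
GAP_CHARS = ".-~"  # gap symbols named in the appendResidues() description


def ungapped_index(seq, index):
    """Count the non-gap characters that precede position index.

    Assumption: if index points at a gap, the result is the ungapped
    index of the next residue to the right.
    """
    return sum(1 for ch in seq[:index] if ch not in GAP_CHARS)


aligned = "A--CG"
print(ungapped_index(aligned, 0))  # 0 (A)
print(ungapped_index(aligned, 3))  # 1 (C)
print(ungapped_index(aligned, 4))  # 2 (G)
```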
- insertGaps(position, n_gaps, active=True)[source]¶
Inserts a specified number of gaps at a specified position.
- Parameters
position (int) – sequence position where the gaps will be inserted
n_gaps (int) – number of gaps to be inserted at the position
- Return type
int
- Returns
number of gaps actually inserted at the position
- removeGaps(position, n_gaps)[source]¶
Removes a specified number of gaps (or fewer) at a given position, starting from the position and going towards the C-terminus (towards higher index).
- Parameters
position (int) – sequence position from where the gaps will be removed
n_gaps (int) – number of gaps to be removed at the position
- Return type
int
- Returns
number of gaps actually removed at the position
- removeGapsBackwards(position, n_gaps)[source]¶
Removes a specified number of gaps (or fewer) at a given position, starting at the position and going towards the N-terminus (towards lower index).
- Parameters
position (int) – sequence position from where the gaps will be removed
n_gaps (int) – number of gaps to be removed at the position
- Return type
int
- Returns
number of gaps actually removed at the position
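The gap editing methods above can be modelled on plain strings as follows. This is a sketch under stated assumptions: only the consecutive run of gaps touching the position is removed, the active flag of insertGaps is not modelled, and each helper returns the new string together with the count so the example stays side-effect free. All helper names are hypothetical.

```python
GAP_CHARS = ".-~"  # gap symbols named in the appendResidues() description


def insert_gaps(seq, position, n_gaps, gap="-"):
    """Insert n_gaps gap symbols before the character at position."""
    return seq[:position] + gap * n_gaps + seq[position:]


def remove_gaps(seq, position, n_gaps):
    """Remove up to n_gaps consecutive gaps, scanning towards higher index."""
    end = position
    while end < len(seq) and end - position < n_gaps and seq[end] in GAP_CHARS:
        end += 1
    return seq[:position] + seq[end:], end - position


def remove_gaps_backwards(seq, position, n_gaps):
    """Remove up to n_gaps consecutive gaps, scanning towards lower index."""
    start = position
    while start >= 0 and position - start < n_gaps and seq[start] in GAP_CHARS:
        start -= 1
    # start now sits one position left of the removed run
    return seq[:start + 1] + seq[position + 1:], position - start


print(insert_gaps("ACGT", 2, 3))               # AC---GT
print(remove_gaps("AC---GT", 2, 2))            # ('AC-GT', 2)
print(remove_gaps("AC---GT", 2, 5))            # ('ACGT', 3)
print(remove_gaps_backwards("AC---GT", 4, 2))  # ('AC-GT', 2)
```

Asking for more gaps than are present removes only the available ones, which matches the "number of gaps actually removed" return value described above.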
- removeAllGaps(selected_only=False)[source]¶
Removes all gaps from the sequence. If selected_only is True, removes only gaps that are selected.
- hasSelectedResidues()[source]¶
- Return type
bool
- Returns
True if any of the residues are selected, False otherwise
- hasAllSelectedResidues()[source]¶
Checks if all residues in the sequence are selected.
- Return type
bool
- Returns
True if all residues are selected, False otherwise
- calculatePlotValues(half_window_size, min_value=None, max_value=None)[source]¶
Calculates window-averaged plot values, and the plot value extrema.
- Parameters
half_window_size (int) – half-size of the window (can be 0 if not averaging)
min_value (float) – optional minimum value, if None then the minimum will be calculated
max_value (float) – optional maximum value, if None then the maximum will be calculated
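Window averaging with a half-window size can be sketched as below. This is an illustration, not the method's source: it assumes the window is simply truncated at the sequence ends and that the extrema are taken from the averaged values. The function names are hypothetical.

```python
def window_average(values, half_window_size):
    """Average each value with neighbours up to half_window_size away."""
    averaged = []
    for i in range(len(values)):
        lo = max(0, i - half_window_size)
        hi = min(len(values), i + half_window_size + 1)
        window = values[lo:hi]  # truncated at the ends (assumption)
        averaged.append(sum(window) / len(window))
    return averaged


def plot_values(values, half_window_size, min_value=None, max_value=None):
    """Return the averaged values and the extrema, computing missing ones."""
    averaged = window_average(values, half_window_size)
    if min_value is None:
        min_value = min(averaged)
    if max_value is None:
        max_value = max(averaged)
    return averaged, min_value, max_value


print(plot_values([1.0, 2.0, 3.0, 4.0], 1))
# ([1.5, 2.0, 3.0, 3.5], 1.5, 3.5)
```

With half_window_size set to 0 each window holds a single value, so the input is returned unchanged.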
- propagateGapsToChildren(target_child=None)[source]¶
Propagates gaps from a parent sequence to all children. This method should be called after loading multiple alignment in order to ensure gap consistency between parent sequence and its children.
- Parameters
target_child (Sequence) – If specified, only this child sequence will be used.
- propagateGaps(sequence, parent_sequence=None, replace=False)[source]¶
Propagates gaps from self to a given sequence. Sequence is supposed to be a subset of self.
- Return type
list of Residue
- Returns
list of residues including gaps at matching positions
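Gap propagation can be pictured with a gapped parent string and an ungapped child annotation that holds one symbol per parent residue. The sketch below is only a model of that idea: the one-symbol-per-residue relationship, the padding of a short child with gaps, and the helper name are all assumptions, and the parent_sequence and replace parameters are not modelled.

```python
GAP_CHARS = ".-~"  # gap symbols named in the appendResidues() description


def propagate_gaps(parent, child_ungapped, gap="-"):
    """Copy the parent's gap pattern onto an ungapped child string."""
    symbols = iter(child_ungapped)
    result = []
    for ch in parent:
        if ch in GAP_CHARS:
            result.append(gap)  # gap at the same column as in the parent
        else:
            # take the next child symbol; pad with a gap if the child runs out
            result.append(next(symbols, gap))
    return "".join(result)


print(propagate_gaps("AC--GT", "HHEE"))  # HH--EE
```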
- calcIdentity(reference_sequence, consider_gaps, in_columns)[source]¶
This method calculates sequence identity between self and a specified reference sequence, assuming that both sequences are already aligned.
- Parameters
reference_sequence (Sequence) – reference sequence
consider_gaps (bool) – Whether gaps should be included in the calculation.
- Return type
float
- Returns
sequence identity (between 0.0 and 1.0)
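For two pre-aligned sequences, an identity fraction can be computed column by column. The sketch below works on plain strings and makes its own choices where this page is silent: columns that are gaps in both sequences are always skipped, and consider_gaps decides whether a residue paired with a gap counts as a mismatch. The in_columns parameter is not modelled, and the function name is hypothetical.

```python
GAP_CHARS = ".-~"  # gap symbols named in the appendResidues() description


def identity(seq, ref, consider_gaps=False):
    """Fraction of compared columns in which both residues are equal."""
    if len(seq) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq, ref):
        a_gap = a in GAP_CHARS
        b_gap = b in GAP_CHARS
        if a_gap and b_gap:
            continue  # column is empty in both sequences
        if (a_gap or b_gap) and not consider_gaps:
            continue  # residue paired with a gap is ignored
        compared += 1
        if not a_gap and not b_gap and a == b:
            matches += 1
    return matches / compared if compared else 0.0


print(identity("AC-GT", "ACAGA"))                      # 0.75
print(identity("AC-GT", "ACAGA", consider_gaps=True))  # 0.6
```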
- calcSimilarity(reference_sequence, consider_gaps, in_columns)[source]¶
This method calculates sequence similarity between self and a specified reference sequence, assuming that both sequences are already aligned.
- Parameters
reference_sequence (Sequence) – reference sequence
consider_gaps (bool) – Whether gaps should be included in the calculation.
- Return type
float
- Returns
sequence similarity (between 0.0 and 1.0)
- calcHomology(reference_sequence, consider_gaps, in_columns)[source]¶
This method calculates sequence homology between self and a specified reference sequence, assuming that both sequences are already aligned. The homology criterion is based on “side chain chemistry” descriptor matching.
- Parameters
reference_sequence (Sequence) – reference sequence
consider_gaps (bool) – Whether gaps should be included in the calculation.
- Return type
float
- Returns
sequence homology (between 0.0 and 1.0)
- calcScore(reference_sequence, consider_gaps, in_columns)[source]¶
This method calculates sequence similarity score between self and a specified reference sequence, assuming that both sequences are already aligned.
- Parameters
reference_sequence (Sequence) – reference sequence
- Return type
float
- Returns
sequence similarity score
- ungappedId(position, start, end, backwards=False)[source]¶
Returns residue ID for the first ungapped position in a specified region, starting from position and going forward or backwards. If no valid position is found (i.e. all residues in the specified region are gaps), returns an empty string.
- Parameters
start (int) – lower boundary of the search region
end (int) – upper boundary of the search region
position (int) – initial position
backwards (bool) – if True, search the sequence backwards
- Return type
string
- Returns
ungapped residue ID, or empty string if no valid residue is found
- hasAnnotationType(annotation_type)[source]¶
Checks if the sequence already has this annotation type.
- Parameters
annotation_type (int) – annotation type
- Return type
bool
- Returns
True if the sequence has this annotation type already, False otherwise
- inactivePosition(pos)[source]¶
Finds first inactive residue position after given position.
- Parameters
pos (int) – start position in sequence to begin search
- Return type
int
- Returns
position of the first inactive residue, or -1 if there is none
- makeShortName(name=None)[source]¶
This method converts a long sequence name into a short name that is displayed on a screen.
- compare(sequence)[source]¶
Compares the gapless version of self with another sequence and calculates the identity between the two.
- getPDBId(with_chain=True)[source]¶
This function tries to generate a PDB ID based on the sequence name.
It supports different name formats: 1abcD, pdb|1abc|D, 1ABCD. If the conversion fails, it returns an empty string.
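The three name formats listed above can be recognized with two regular expressions. This is a hypothetical sketch, not the method's source: the patterns, the upper-casing of the result and the way the chain is appended are assumptions made for illustration.

```python
import re

# "pdb|1abc|D" style: four-character ID starting with a digit, optional chain
_PIPE_FORMAT = re.compile(r"^pdb\|([0-9][A-Za-z0-9]{3})\|([A-Za-z0-9])?",
                          re.IGNORECASE)
# "1abcD" / "1ABCD" style: ID immediately followed by an optional chain
_PLAIN_FORMAT = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])?$")


def pdb_id_from_name(name, with_chain=True):
    """Return an upper-case PDB ID (plus chain if requested), or ''."""
    name = name.strip()
    match = _PIPE_FORMAT.match(name) or _PLAIN_FORMAT.match(name)
    if match is None:
        return ""  # conversion failed
    pdb_id = match.group(1).upper()
    chain = match.group(2) or ""
    return pdb_id + chain if with_chain else pdb_id


print(pdb_id_from_name("1abcD"))                    # 1ABCD
print(pdb_id_from_name("pdb|1abc|D"))               # 1ABCD
print(pdb_id_from_name("1ABCD", with_chain=False))  # 1ABC
print(repr(pdb_id_from_name("my protein")))         # ''
```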
- translateDNA(translation_table={'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N', 'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T', 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGT': 'S', 'ATA': 'I', 'ATC': 'I', 'ATG': 'M', 'ATT': 'I', 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAT': 'H', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R', 'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L', 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAT': 'D', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G', 'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V', 'TAA': 'X', 'TAC': 'Y', 'TAG': 'X', 'TAT': 'Y', 'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S', 'TGA': 'X', 'TGC': 'C', 'TGG': 'W', 'TGT': 'C', 'TTA': 'L', 'TTC': 'F', 'TTG': 'L', 'TTT': 'F'})[source]¶
Translates the sequence from nucleotide codes to amino acids.
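Codon-table translation can be sketched on a plain string as below. The table is a small subset of the default translation_table shown in the signature, where stop codons map to 'X'. The reading frame starting at the first nucleotide, the dropping of a trailing incomplete codon and the '?' placeholder for codons missing from the table are assumptions, and the function name is hypothetical.

```python
# Subset of the default DNA table from the translateDNA() signature
CODON_TABLE = {"ATG": "M", "AAA": "K", "GGC": "G", "TTT": "F", "TAA": "X"}


def translate(nucleotides, table, unknown="?"):
    """Translate consecutive codons, starting at the first nucleotide."""
    nucleotides = nucleotides.upper()
    # ignore a trailing incomplete codon (assumption)
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(table.get(nucleotides[i:i + 3], unknown)
                   for i in range(0, usable, 3))


print(translate("atgaaaggctaa", CODON_TABLE))  # MKGX
print(translate("ATGTT", CODON_TABLE))         # M
```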
- translateRNA(translation_table={'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAU': 'N', 'ACA': 'U', 'ACC': 'U', 'ACG': 'U', 'ACU': 'U', 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGU': 'S', 'AUA': 'I', 'AUC': 'I', 'AUG': 'M', 'AUU': 'I', 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAU': 'H', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCU': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGU': 'R', 'CUA': 'L', 'CUC': 'L', 'CUG': 'L', 'CUU': 'L', 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAU': 'D', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCU': 'A', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGU': 'G', 'GUA': 'V', 'GUC': 'V', 'GUG': 'V', 'GUU': 'V', 'UAA': 'X', 'UAC': 'Y', 'UAG': 'X', 'UAU': 'Y', 'UCA': 'S', 'UCC': 'S', 'UCG': 'S', 'UCU': 'S', 'UGA': 'X', 'UGC': 'C', 'UGG': 'W', 'UGU': 'C', 'UUA': 'L', 'UUC': 'F', 'UUG': 'L', 'UUU': 'F'})[source]¶
Translates the sequence from nucleotide codes to amino acids.