schrodinger.ui.sequencealignment.sequence module

Implementation of multiple sequence viewer Sequence class.

schrodinger.ui.sequencealignment.sequence.delete_from_str(inp_str, delete_chars)

Delete characters from a string.

Note: replaces the Python 2 idiom inp_str.translate(None, delete_chars).

Parameters

inp_str (str) – A string to delete characters from. In Python 2, unicode input will be cast to str.
delete_chars (str) – Characters to delete from the string

Returns

The input string with the delete_chars removed

Return type

str
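In Python 3, the same deletion is done with str.maketrans and str.translate. The following is a minimal sketch of an equivalent function. It is not necessarily the module's own implementation.

```python
def delete_from_str(inp_str: str, delete_chars: str) -> str:
    """Return inp_str with every character in delete_chars removed."""
    # str.maketrans("", "", x) maps each character of x to None,
    # and str.translate deletes characters mapped to None.
    return inp_str.translate(str.maketrans("", "", delete_chars))


print(delete_from_str("AC-DE.F~G", "-.~"))  # ACDEFG
```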

class schrodinger.ui.sequencealignment.sequence.Sequence

Bases: object

The Sequence class represents a single basic sequence object. A Sequence object can correspond to an amino acid sequence, a nucleic acid sequence, an annotation (such as a secondary structure assignment or a hydrophobicity plot) or a helper object (for example, a ruler).

__init__()

appendResidue(residue)

Appends a new residue to self.

Parameters: residue (Residue) – sequence alignment Residue object to append

appendResidues(codes, use_numbers=False)

Creates new residues from a string of single-letter codes and appends them to the existing sequence. Converts upper-case characters to lower-case, recognizes gaps (‘.’, ‘-’, ‘~’) and ignores other characters.

Parameters

codes (string) – string of single-letter amino acid codes
use_numbers (boolean) – If True, this function will try to recognize residue numbers included in the sequence and assign them to the residues.
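The parsing rules above can be sketched in plain Python. This is an illustration only and does not reproduce the module's implementation. The helper parse_codes is hypothetical and represents gaps as None. The sketch leaves out both the case conversion and the use_numbers residue-number parsing.

```python
from typing import List, Optional

GAP_CHARS = ".-~"  # gap characters listed in the description


def parse_codes(codes: str) -> List[Optional[str]]:
    """Hypothetical helper: split a single-letter code string into residues.

    Letters become residue codes, gap characters become None, and every
    other character (digits, whitespace, ...) is skipped.
    """
    parsed: List[Optional[str]] = []
    for ch in codes:
        if ch in GAP_CHARS:
            parsed.append(None)  # gap placeholder
        elif ch.isalpha():
            parsed.append(ch)    # residue code, case left untouched here
        # any other character is ignored
    return parsed


print(parse_codes("MK-L 12.V~"))  # ['M', 'K', None, 'L', None, 'V', None]
```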

removeStructureless()

Removes structureless (SEQRES) residues from the sequence and its children.

replaceSequence(new_sequence)

Replaces the current sequence with the provided string.

Parameters: new_sequence (str) – must have the same gapless length as the old sequence
Return type: bool
Returns: True if successful

toString(with_gaps=True)

Returns a string representation of self.

Parameters: with_gaps (boolean (default=True)) – if True, the returned string includes gaps; if False, it includes only actual residue codes.

text()

Returns self as a string.

gaplessText()

Returns self as a gapless string.

copyForUndo(deep_copy=True)

length()

Returns the length of the sequence.

Return type: int
Returns: length of the sequence

unpaddedLength()

Returns the length of the sequence with trailing (rightmost) gaps stripped.

Return type: int
Returns: length of the stripped sequence

gaplessLength()

Returns the length of the sequence excluding gaps.

Return type: int
Returns: actual sequence length (number of residues)

gaplessResidues()

Returns a list of the residues in the sequence, excluding gaps.

numberOfGaps()

Returns the number of gaps in the sequence.

Return type: int
Returns: number of gaps in the sequence
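The relationship between length(), unpaddedLength(), gaplessLength() and numberOfGaps() can be shown on a plain string. This is a minimal sketch that models the sequence as a str in which '.', '-' and '~' are gaps. The snake_case functions are hypothetical stand-ins and are not part of this module.

```python
GAP_CHARS = ".-~"  # assumption: the gap characters recognized by appendResidues


def length(seq: str) -> int:
    return len(seq)  # every alignment column, gaps included


def unpadded_length(seq: str) -> int:
    return len(seq.rstrip(GAP_CHARS))  # trailing (rightmost) gaps dropped


def gapless_length(seq: str) -> int:
    return sum(1 for ch in seq if ch not in GAP_CHARS)  # residues only


def number_of_gaps(seq: str) -> int:
    return length(seq) - gapless_length(seq)


s = "-MK--LV---"
print(length(s), unpadded_length(s), gapless_length(s), number_of_gaps(s))  # 10 7 4 6
```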

countActiveGaps(pos)

getResidue(index, ungapped=False, hidden=True)

Returns the residue at a given sequence position, or None if the position is invalid.

Parameters: index (int) – sequence position
Return type: Residue
Returns: residue at the given position, or None if the position is invalid

getResidueIndex(id)

Returns the index of the residue with the given ID.

Parameters: id (str) – residue ID, formed as str(res.num) + str(res.icode)
Return type: int or None
Returns: index of the residue if the ID is valid, None otherwise

getUngappedIndex(index)

Returns the ungapped residue index corresponding to a position in the gapped sequence.

Parameters: index (int) – residue index in the gapped sequence
Return type: int
Returns: index in the ungapped sequence
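The following sketch shows the mapping from a gapped index to an ungapped index on a plain string that uses the same assumed gap characters. The function ungapped_index is hypothetical. Returning None for a gap column is an assumption of this sketch, because the documented method only states that it returns an int.

```python
from typing import Optional

GAP_CHARS = ".-~"  # assumption: gap characters


def ungapped_index(seq: str, index: int) -> Optional[int]:
    """Map a column of the gapped string to a 0-based residue index."""
    if seq[index] in GAP_CHARS:
        return None  # sketch-only choice for gap columns
    # The ungapped index equals the number of residues before this column.
    return sum(1 for ch in seq[:index] if ch not in GAP_CHARS)


print(ungapped_index("-MK--LV", 5))  # 2 (the 'L')
```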

insertGaps(position, n_gaps, active=True)

Inserts a specified number of gaps at a specified position.

Parameters

position (int) – sequence position where the gaps will be inserted
n_gaps (int) – number of gaps to be inserted at the position

Return type

int

Returns

number of gaps actually inserted at the position

removeGaps(position, n_gaps)

Removes up to a specified number of gaps at a given position, starting at the position and moving toward the C-terminus (higher indices).

Parameters

position (int) – sequence position from where the gaps will be removed
n_gaps (int) – number of gaps to be removed at the position

Return type

int

Returns

number of gaps actually removed at the position

removeGapsBackwards(position, n_gaps)

Removes up to a specified number of gaps at a given position, starting at the position and moving toward the N-terminus (lower indices).

Parameters

position (int) – sequence position from where the gaps will be removed
n_gaps (int) – number of gaps to be removed at the position

Return type

int

Returns

number of gaps actually removed at the position
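The three gap-editing operations can be sketched on a Python list of one-letter codes with '-' as the gap. The helpers are hypothetical. The sketch assumes that removal stops at the first non-gap position, which is one reading of "or less". In this sketch, insert_gaps ignores the active flag and always inserts every requested gap.

```python
from typing import List

GAP = "-"  # assumption: single gap symbol


def insert_gaps(seq: List[str], position: int, n_gaps: int) -> int:
    """Insert n_gaps gaps before index position and return how many were inserted."""
    seq[position:position] = [GAP] * n_gaps
    return n_gaps


def remove_gaps(seq: List[str], position: int, n_gaps: int) -> int:
    """Remove up to n_gaps consecutive gaps from position toward higher indices."""
    end = position
    while 0 <= end < len(seq) and end - position < n_gaps and seq[end] == GAP:
        end += 1
    del seq[position:end]
    return end - position


def remove_gaps_backwards(seq: List[str], position: int, n_gaps: int) -> int:
    """Remove up to n_gaps consecutive gaps from position toward lower indices."""
    start = position
    while 0 <= start < len(seq) and position - start < n_gaps and seq[start] == GAP:
        start -= 1
    del seq[start + 1:position + 1]
    return position - start


seq = list("AB-C")
insert_gaps(seq, 1, 2)                                 # seq is now A--B-C
print(remove_gaps(seq, 1, 5), "".join(seq))            # 2 AB-C
print(remove_gaps_backwards(seq, 2, 1), "".join(seq))  # 1 ABC
```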

removeAllGaps(selected_only=False)

Removes all gaps from the sequence. If selected_only is True, only selected gaps are removed.

unselectResidues()

Unselects all residues in the sequence.

selectAllResidues()

invertSelection()

hasSelectedResidues()

Return type: bool
Returns: True if any of the residues are selected, False otherwise

hasSelectedChildren()

Returns True if any of its children are selected.

hasAllSelectedResidues()

Checks if all residues in the sequence are selected.

Return type: bool
Returns: True if all residues are selected, False otherwise

deleteSelectedResidues()

Removes all selected residues from the sequence.

hideChildren()

Hides all child sequences (effectively collapsing the sequence).

showChildren()

Shows all child sequences (effectively expanding the sequence).

calculatePlotValues(half_window_size, min_value=None, max_value=None)

Calculates window-averaged plot values and the plot value extrema.

Parameters

half_window_size (int) – half-size of the averaging window (can be 0 if not averaging)
min_value (float) – optional minimum value; if None, the minimum will be calculated
max_value (float) – optional maximum value; if None, the maximum will be calculated
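Window averaging with a half-window size can be sketched as follows. The sketch makes three assumptions. The window is truncated at the sequence ends rather than padded. The extrema are taken over the averaged values. The input is a non-empty list of per-residue floats. The function name calculate_plot_values is hypothetical.

```python
from typing import List, Optional, Tuple


def calculate_plot_values(values: List[float], half_window_size: int,
                          min_value: Optional[float] = None,
                          max_value: Optional[float] = None
                          ) -> Tuple[List[float], float, float]:
    """Average each value over [i - half_window_size, i + half_window_size]."""
    n = len(values)
    averaged = []
    for i in range(n):
        # Clip the window at both ends of the sequence.
        lo = max(0, i - half_window_size)
        hi = min(n, i + half_window_size + 1)
        window = values[lo:hi]
        averaged.append(sum(window) / len(window))
    # Compute an extremum only when the caller did not supply one.
    if min_value is None:
        min_value = min(averaged)
    if max_value is None:
        max_value = max(averaged)
    return averaged, min_value, max_value


print(calculate_plot_values([1, 2, 3, 4, 5], half_window_size=1))
# ([1.5, 2.0, 3.0, 4.0, 4.5], 1.5, 4.5)
```

With half_window_size=0, each value is returned unchanged (as a float).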

propagateGapsToChildren(target_child=None)

Propagates gaps from a parent sequence to all its children. Call this method after loading a multiple alignment to ensure gap consistency between the parent sequence and its children.

Parameters: target_child (Sequence) – if specified, only this child sequence will be used

propagateGaps(sequence, parent_sequence=None, replace=False)

Propagates gaps from self to a given sequence. The sequence is expected to be a subset of self.

Return type: list of Residue
Returns: list of residues including gaps at matching positions
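One way to picture gap propagation is a child annotation that holds one entry per residue of the parent. A gap is then inserted into the child wherever the parent has one. This sketch works under that assumption and uses None as the child gap marker. The helper propagate_gaps is hypothetical and does not model the parent_sequence or replace arguments.

```python
from typing import List, Optional

GAP_CHARS = ".-~"  # assumption: gap characters in the parent string


def propagate_gaps(parent: str, child: List[str]) -> List[Optional[str]]:
    """Align child to parent by inserting None wherever parent has a gap."""
    n_residues = sum(1 for ch in parent if ch not in GAP_CHARS)
    if len(child) != n_residues:
        raise ValueError("child must have one entry per parent residue")
    entries = iter(child)
    # Consume one child entry per parent residue; emit None for parent gaps.
    return [None if ch in GAP_CHARS else next(entries) for ch in parent]


print(propagate_gaps("M-KL--V", ["H", "H", "C", "E"]))
# ['H', None, 'H', 'C', None, None, 'E']
```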

calcIdentity(reference_sequence, consider_gaps, in_columns)

Calculates sequence identity between self and a specified reference sequence, assuming that both sequences are already aligned.

Parameters

reference_sequence (Sequence) – reference sequence
consider_gaps (bool) – whether to include gaps in the calculation

Return type

float

Returns

sequence identity (between 0.0 and 1.0)
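The following is a minimal identity calculation over two aligned strings. Its treatment of gaps is an assumption. Columns where both sequences have a gap are skipped. A column with a gap on only one side is skipped when consider_gaps is False and counted as a mismatch when it is True. The sketch does not model the in_columns argument, and calc_identity is hypothetical.

```python
GAP_CHARS = ".-~"  # assumption: gap characters


def calc_identity(seq: str, ref: str, consider_gaps: bool) -> float:
    """Fraction of compared alignment columns holding identical residues."""
    if len(seq) != len(ref):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(seq, ref):
        a_gap, b_gap = a in GAP_CHARS, b in GAP_CHARS
        if a_gap and b_gap:
            continue  # gap-gap columns are never counted
        if (a_gap or b_gap) and not consider_gaps:
            continue  # one-sided gap: skipped unless gaps are considered
        compared += 1
        if not (a_gap or b_gap) and a.upper() == b.upper():
            matches += 1
    return matches / compared if compared else 0.0


print(calc_identity("MKTL-V", "MRT-AV", consider_gaps=False))  # 0.75
print(calc_identity("MKTL-V", "MRT-AV", consider_gaps=True))   # 0.5
```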

calcSimilarity(reference_sequence, consider_gaps, in_columns)

Calculates sequence similarity between self and a specified reference sequence, assuming that both sequences are already aligned.

Parameters

reference_sequence (Sequence) – reference sequence
consider_gaps (bool) – whether to include gaps in the calculation

Return type

float

Returns

sequence similarity (between 0.0 and 1.0)

calcHomology(reference_sequence, consider_gaps, in_columns)

Calculates sequence homology between self and a specified reference sequence, assuming that both sequences are already aligned. The homology criterion is based on matching “side chain chemistry” descriptors.

Parameters

reference_sequence (Sequence) – reference sequence
consider_gaps (bool) – whether to include gaps in the calculation

Return type

float

Returns

sequence homology (between 0.0 and 1.0)

calcScore(reference_sequence, consider_gaps, in_columns)

Calculates the sequence similarity score between self and a specified reference sequence, assuming that both sequences are already aligned.

Parameters: reference_sequence (Sequence) – reference sequence
Return type: float
Returns: sequence similarity score

previousUngappedResidue(position)

nextUngappedResidue(position)

ungappedId(position, start, end, backwards=False)

Returns the residue ID of the first ungapped position in a specified region, starting from position and searching forward (or backward). If no valid position is found (i.e. all residues in the specified region are gaps), returns an empty string.

Parameters

start (int) – lower boundary of the search region
end (int) – upper boundary of the search region
position (int) – initial position
backwards (bool) – if True, search the sequence backwards

Return type

string

Returns

ungapped residue ID, or empty string if no valid residue is found
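The search performed by ungappedId can be sketched on a list of residue IDs in which None marks a gap. The helper ungapped_id is hypothetical. It treats start and end as inclusive bounds, which the description does not state explicitly.

```python
from typing import List, Optional


def ungapped_id(ids: List[Optional[str]], position: int, start: int, end: int,
                backwards: bool = False) -> str:
    """Return the first non-gap ID met when walking from position within [start, end]."""
    start = max(start, 0)
    end = min(end, len(ids) - 1)  # keep the search inside the list
    step = -1 if backwards else 1
    pos = position
    while start <= pos <= end:
        if ids[pos] is not None:
            return ids[pos]
        pos += step
    return ""  # every position in the region was a gap


ids = ["10", None, None, "11A", "12"]
print(ungapped_id(ids, 1, 0, 4))                  # 11A
print(ungapped_id(ids, 1, 0, 4, backwards=True))  # 10
print(repr(ungapped_id(ids, 1, 1, 2)))            # ''
```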

hasAnnotationType(annotation_type)

Checks if the sequence already has this annotation type.

Parameters: annotation_type (int) – annotation type
Return type: bool
Returns: True if the sequence has this annotation type already, False otherwise

sanitize()

Removes all gaps and illegal residue codes from self.

makeInactive()

makeActive()

haveAnchors(pos)

inactivePosition(pos)

Finds the first inactive residue position after the given position.

Parameters: pos (int) – start position in the sequence to begin the search
Return type: int
Returns: position of the first inactive residue, or -1 if there is none

makeShortName(name=None)

Converts a long sequence name into the short name that is displayed on screen.

createAnnotationSequence()

Creates an empty annotation.

createSecondaryAssignment()

Creates an empty secondary structure assignment annotation.

createSSBondAssignment()

Creates an empty disulfide bond assignment annotation.

compare(sequence)

Compares the gapless version of self with another sequence and calculates the identity between the two.

getPDBId(with_chain=True)

Tries to generate a PDB ID based on the sequence name.

It supports the following name formats: 1abcD, pdb|1abc|D, 1ABCD. If the conversion fails, it returns an empty string.
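The three name formats can be recognized with regular expressions. This is only a sketch: get_pdb_id is hypothetical. Normalizing the ID to lower case and the chain to upper case is an assumption about the output format.

```python
import re

# pdb|1abc|D  (prefix matched case-insensitively)
_PIPE_RE = re.compile(r"^pdb\|([0-9][A-Za-z0-9]{3})\|([A-Za-z0-9]?)", re.IGNORECASE)
# 1abcD or 1ABCD: four-character ID followed by an optional one-character chain
_PLAIN_RE = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9]?)$")


def get_pdb_id(name: str, with_chain: bool = True) -> str:
    """Return a PDB ID (optionally with chain) parsed from name, or ''."""
    match = _PIPE_RE.match(name) or _PLAIN_RE.match(name)
    if not match:
        return ""
    pdb_id, chain = match.group(1).lower(), match.group(2).upper()
    return pdb_id + chain if with_chain else pdb_id


for name in ("1abcD", "pdb|1abc|D", "1ABCD", "lysozyme"):
    print(repr(get_pdb_id(name)))  # '1abcD' three times, then ''
```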

isValidTemplate(reference=None)

isValidProtein(global_annotation=False)

isRuler()

isDNA()

Returns True if the sequence is a DNA sequence.

translateDNA(translation_table={'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N', 'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T', 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGT': 'S', 'ATA': 'I', 'ATC': 'I', 'ATG': 'M', 'ATT': 'I', 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAT': 'H', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R', 'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L', 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAT': 'D', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G', 'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V', 'TAA': 'X', 'TAC': 'Y', 'TAG': 'X', 'TAT': 'Y', 'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S', 'TGA': 'X', 'TGC': 'C', 'TGG': 'W', 'TGT': 'C', 'TTA': 'L', 'TTC': 'F', 'TTG': 'L', 'TTT': 'F'})

Translates the sequence from nucleotide codes to amino acids.
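The default translation_table is the standard genetic code with stop codons mapped to 'X', so it can be generated compactly instead of spelled out. The sketch below builds that table and translates complete codons in the first reading frame. Dropping gap characters and any trailing partial codon, and mapping unknown codons to 'X', are assumptions. The function translate_dna is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS.replace("*", "X")))


def translate_dna(nucleotides: str, translation_table: dict = STANDARD_TABLE) -> str:
    """Translate complete codons of a gapped DNA string (the table is never mutated)."""
    clean = "".join(ch for ch in nucleotides.upper() if ch not in "-.~")
    # Step through complete codons only; a trailing partial codon is dropped.
    codons = (clean[i:i + 3] for i in range(0, len(clean) - 2, 3))
    return "".join(translation_table.get(codon, "X") for codon in codons)


print(translate_dna("ATGAAA-TGGTGA"))  # MKWX
```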

isRNA()

Returns True if the sequence is an RNA sequence.

translateRNA(translation_table={'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAU': 'N', 'ACA': 'U', 'ACC': 'U', 'ACG': 'U', 'ACU': 'U', 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGU': 'S', 'AUA': 'I', 'AUC': 'I', 'AUG': 'M', 'AUU': 'I', 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAU': 'H', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCU': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGU': 'R', 'CUA': 'L', 'CUC': 'L', 'CUG': 'L', 'CUU': 'L', 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAU': 'D', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCU': 'A', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGU': 'G', 'GUA': 'V', 'GUC': 'V', 'GUG': 'V', 'GUU': 'V', 'UAA': 'X', 'UAC': 'Y', 'UAG': 'X', 'UAU': 'Y', 'UCA': 'S', 'UCC': 'S', 'UCG': 'S', 'UCU': 'S', 'UGA': 'X', 'UGC': 'C', 'UGG': 'W', 'UGU': 'C', 'UUA': 'L', 'UUC': 'F', 'UUG': 'L', 'UUU': 'F'})

Translates the sequence from nucleotide codes to amino acids.
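In the default table shown above, the ACA, ACC, ACG and ACU codons map to 'U'. The standard genetic code assigns these codons to threonine ('T'). It looks as if T was replaced by U in the amino-acid letters as well as in the codons. A standard RNA table can be derived from the DNA code by rewriting only the codon keys, as in the sketch below. DNA_TABLE and RNA_TABLE are hypothetical names.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
DNA_TABLE = dict(zip(CODONS, AMINO_ACIDS.replace("*", "X")))

# Replace T with U in the codon keys only; the amino-acid letters are left as is.
RNA_TABLE = {codon.replace("T", "U"): aa for codon, aa in DNA_TABLE.items()}

print(RNA_TABLE["ACU"], RNA_TABLE["UGG"], RNA_TABLE["UAA"])  # T W X
```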

renumberResidues(start, incr, preserve_ins_codes=False)

getValues(gapless=False)

Returns a list of residue values.

isSortable(reference=None)

Returns True if the sequence is sortable, False otherwise.

repair()

Repairs the sequence by setting sequence-residue associations for all residues. Also adds missing attributes (with default values) to the sequence.