schrodinger.ui.sequencealignment.sequence module

Implementation of multiple sequence viewer Sequence class.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.ui.sequencealignment.sequence.delete_from_str(inp_str, delete_chars)[source]

Delete characters from a string.

Note: replaces Python 2 inp_str.translate(None, delete_chars)

Parameters
  • inp_str (str) – A string to delete characters from. In Python 2, unicode input will be cast to str

  • delete_chars (str) – Characters to delete from the string

Returns

The input string with the delete_chars removed

Return type

str
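
The Python 2 idiom mentioned in the note has a direct Python 3 counterpart in the standard library. The following is a minimal sketch of that behaviour, not the module's actual source; the helper name `delete_chars_sketch` is hypothetical.

```python
def delete_chars_sketch(inp_str, delete_chars):
    """Return inp_str with every character in delete_chars removed."""
    # The third argument of str.maketrans lists characters that map to None,
    # which str.translate treats as "delete this character".
    return inp_str.translate(str.maketrans("", "", delete_chars))


print(delete_chars_sketch("AC-D.E~F", "-.~"))  # prints ACDEF
```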

class schrodinger.ui.sequencealignment.sequence.Sequence[source]

Bases: object

The Sequence class represents a single basic sequence object. A Sequence object can correspond to an amino acid sequence, a nucleic acid sequence, an annotation (such as a secondary structure assignment or hydrophobicity plot), or a helper object (for example, a ruler).

__init__()[source]
appendResidue(residue)[source]

Appends a new residue to self.

Parameters

residue (Residue) – sequence alignment Residue object

appendResidues(codes, use_numbers=False)[source]

Creates new residues from a string of single-letter codes and appends them to the existing sequence. Converts upper-case characters to lower-case, recognizes gaps (‘.’, ‘-’, ‘~’) and ignores other characters.

Parameters
  • codes (string) – string of single-letter amino acid codes

  • use_numbers (boolean) – If true, this function will try to recognize residue numbers included in the sequence and assign them to the residues.

removeStructureless()[source]

Removes structureless (SEQRES) residues from the sequence and its children.

replaceSequence(new_sequence)[source]

Replaces the current sequence with the provided string.

Parameters

new_sequence (str) – Must have the same gapless length as the old sequence.

Return type

bool

Returns

True if successful

toString(with_gaps=True)[source]

Returns a string representation of self.

Parameters

with_gaps (boolean (default=True)) – optional; if True, the returned string includes gaps; if False, it contains only actual residue codes.

text()[source]

Returns self as a string.

gaplessText()[source]

Returns self as a gapless string.

copyForUndo(deep_copy=True)[source]
length()[source]

Returns the length of the sequence.

Return type

int

Returns

length of the sequence

unpaddedLength()[source]

Returns the length of the sequence with trailing (rightmost) gaps stripped.

Return type

int

Returns

length of the stripped sequence

gaplessLength()[source]

Returns the length of the sequence excluding gaps.

Return type

int

Returns

actual sequence length (number of residues)

gaplessResidues()[source]

Returns a list of gapless residues.

numberOfGaps()[source]

Returns the number of gaps in the sequence.

Return type

int

Returns

number of gaps in the sequence
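
The four length-related methods (length, unpaddedLength, gaplessLength, numberOfGaps) differ only in how gaps are counted. The sketch below illustrates the distinction on a plain string, assuming the gap symbols listed for appendResidues; the real class stores Residue objects, and the function name `length_variants` is hypothetical.

```python
GAP_CHARS = ".-~"  # assumption: the gap symbols listed for appendResidues


def length_variants(text):
    """Return (length, unpadded, gapless, n_gaps) for a gapped string."""
    length = len(text)
    # Only gaps at the right-hand end are stripped for the unpadded length.
    unpadded = len(text.rstrip(GAP_CHARS))
    # The gapless length counts actual residue codes anywhere in the string.
    gapless = sum(1 for ch in text if ch not in GAP_CHARS)
    return length, unpadded, gapless, length - gapless


print(length_variants("AC--DE---"))  # prints (9, 6, 4, 5)
```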

countActiveGaps(pos)[source]
getResidue(index, ungapped=False, hidden=True)[source]

Returns a residue at a given sequence position, or None if the given position is invalid.

Parameters

index (int) – sequence position

Return type

Residue

Returns

residue for a given position, or None if the position is invalid

getResidueIndex(id)[source]

Returns index of residue with given id

Parameters

id (string) – str(res.num) + str(res.icode)

Return type

int if valid id, None if not

Returns

index of res if valid id, None if not

getUngappedIndex(index)[source]

Returns a residue index corresponding to ungapped position.

Parameters

index (int) – Residue index in gapped sequence

Return type

int

Returns

Index in ungapped sequence.
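
One way to picture the gapped-to-ungapped mapping is to count the residues that precede a position. This is a sketch on a plain string, not the class's implementation; the function name and the treatment of a position that is itself a gap are assumptions.

```python
def ungapped_index(text, index, gap_chars=".-~"):
    """Count the non-gap characters that precede `index` in a gapped string.

    For a residue at gapped position `index`, this count is its position
    in the gapless sequence.
    """
    return sum(1 for ch in text[:index] if ch not in gap_chars)


print(ungapped_index("A--CD", 3))  # prints 1
```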

insertGaps(position, n_gaps, active=True)[source]

Inserts a specified number of gaps at a specified position.

Parameters
  • position (int) – sequence position where the gaps will be inserted

  • n_gaps (int) – number of gaps to be inserted at the position

Return type

int

Returns

number of gaps actually inserted at the position

removeGaps(position, n_gaps)[source]

Removes up to the specified number of gaps at a given position, starting from the position and going towards the C-terminus (towards higher index).

Parameters
  • position (int) – sequence position from where the gaps will be removed

  • n_gaps (int) – number of gaps to be removed at the position

Return type

int

Returns

number of gaps actually removed at the position

removeGapsBackwards(position, n_gaps)[source]

Removes up to the specified number of gaps at a given position, starting at the position and going towards the N-terminus (towards lower index).

Parameters
  • position (int) – sequence position from where the gaps will be removed

  • n_gaps (int) – number of gaps to be removed at the position

Return type

int

Returns

number of gaps actually removed at the position
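
The "up to n_gaps" behaviour and the returned count can be sketched on a list of one-character strings. This illustration assumes that only a contiguous run of gaps starting at the position is removed; the documentation does not state this, and the function names are hypothetical.

```python
def remove_gaps(residues, position, n_gaps, gap="-"):
    """Remove up to n_gaps contiguous gaps, walking towards higher index.

    Mutates `residues` in place and returns the number of gaps removed.
    """
    removed = 0
    while (removed < n_gaps and 0 <= position < len(residues)
           and residues[position] == gap):
        # Deleting shifts the next element into `position`.
        del residues[position]
        removed += 1
    return removed


def remove_gaps_backwards(residues, position, n_gaps, gap="-"):
    """Same idea, walking from `position` towards index 0."""
    removed = 0
    while (removed < n_gaps and 0 <= position < len(residues)
           and residues[position] == gap):
        del residues[position]
        position -= 1  # step towards the N-terminus
        removed += 1
    return removed


seq = list("AC---DE")
print(remove_gaps(seq, 2, 5), "".join(seq))  # prints 3 ACDE
```

Requesting five gaps removes only the three that exist, which is why the return value reports the number actually removed.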

removeAllGaps(selected_only=False)[source]

Removes all gaps from the sequence. If selected_only is True, only selected gaps are removed.

unselectResidues()[source]

Unselects all residues in the sequence

selectAllResidues()[source]
invertSelection()[source]
hasSelectedResidues()[source]
Return type

bool

Returns

True if any of the residues are selected, False otherwise

hasSelectedChildren()[source]

Returns True if any of its children are selected.

hasAllSelectedResidues()[source]

Checks if all residues in the sequence are selected.

Return type

bool

Returns

True if all residues are selected, False otherwise

deleteSelectedResidues()[source]

Removes all selected residues from the sequence.

hideChildren()[source]

Hides all child sequences (effectively collapsing the sequence).

showChildren()[source]

Shows all child sequences (effectively expanding the sequence).

calculatePlotValues(half_window_size, min_value=None, max_value=None)[source]

Calculates window-averaged plot values, and the plot value extrema.

Parameters
  • half_window_size (int) – half-size of the window (can be 0 if not averaging)

  • min_value (float) – optional minimum value, if None then the minimum will be calculated

  • max_value (float) – optional maximum value, if None then the maximum will be calculated
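
Window averaging with a half-window size can be sketched as follows. This is an illustration under stated assumptions rather than the method's implementation: the window is truncated at the sequence ends, the extrema are taken over the averaged values, and the function name `window_average` is hypothetical.

```python
def window_average(values, half_window_size, min_value=None, max_value=None):
    """Return (averaged_values, min_value, max_value).

    Each output value is the mean of the inputs within half_window_size
    positions on either side; a half-window of 0 leaves values unchanged.
    """
    averaged = []
    for i in range(len(values)):
        lo = max(0, i - half_window_size)
        hi = min(len(values), i + half_window_size + 1)
        window = values[lo:hi]
        averaged.append(sum(window) / len(window))
    # Extrema are computed only when the caller did not supply them.
    if min_value is None:
        min_value = min(averaged, default=0.0)
    if max_value is None:
        max_value = max(averaged, default=0.0)
    return averaged, min_value, max_value


print(window_average([1.0, 2.0, 3.0, 4.0], 1))
# prints ([1.5, 2.0, 3.0, 3.5], 1.5, 3.5)
```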

propagateGapsToChildren(target_child=None)[source]

Propagates gaps from a parent sequence to all children. This method should be called after loading multiple alignment in order to ensure gap consistency between parent sequence and its children.

Parameters

target_child (Sequence) – If specified, only this child sequence will be used.

propagateGaps(sequence, parent_sequence=None, replace=False)[source]

Propagates gaps from self to a given sequence. The given sequence is expected to be a subset of self.

Return type

list of Residue

Returns

list of residues including gaps at matching positions
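
Gap propagation can be pictured as threading the child's values through the parent's gap pattern. The sketch below works on plain strings and assumes the child carries one value per parent residue; the real method operates on Residue objects, and the name `propagate_gaps_sketch` is hypothetical.

```python
def propagate_gaps_sketch(parent, child_values, gap="-", gap_chars=".-~"):
    """Return child values with gaps inserted wherever the parent has gaps."""
    values = iter(child_values)
    result = []
    for ch in parent:
        if ch in gap_chars:
            result.append(gap)
        else:
            # Pad with a gap if the child runs out of values early.
            result.append(next(values, gap))
    return result


print("".join(propagate_gaps_sketch("AC--DE", "HHEE")))  # prints HH--EE
```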

calcIdentity(reference_sequence, consider_gaps, in_columns)[source]

This method calculates sequence identity between self and a specified reference sequence, assuming that both sequences are already aligned.

Parameters
  • reference_sequence (Sequence) – reference sequence

  • consider_gaps (bool) – Whether to include gaps in the calculation.

Return type

float

Returns

sequence identity (between 0.0 and 1.0)
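
The effect of consider_gaps on an identity value can be sketched on two pre-aligned strings. This is one plausible definition, not necessarily the one the method uses: columns where both sequences have a gap are skipped, and the in_columns parameter, which is not documented here, is not modelled. The function name `identity_sketch` is hypothetical.

```python
def identity_sketch(seq, ref, consider_gaps, gap_chars=".-~"):
    """Fraction of compared columns in which both sequences agree."""
    matches = 0
    compared = 0
    for a, b in zip(seq, ref):
        a_gap, b_gap = a in gap_chars, b in gap_chars
        if a_gap and b_gap:
            continue  # nothing to compare in this column
        if not consider_gaps and (a_gap or b_gap):
            continue  # ignore residue-versus-gap columns
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


print(identity_sketch("AC-DE", "ACGDF", consider_gaps=False))  # prints 0.75
print(identity_sketch("AC-DE", "ACGDF", consider_gaps=True))   # prints 0.6
```

Counting the gapped column lowers the value because it enlarges the denominator without adding a match.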

calcSimilarity(reference_sequence, consider_gaps, in_columns)[source]

This method calculates sequence similarity between self and a specified reference sequence, assuming that both sequences are already aligned.

Parameters
  • reference_sequence (Sequence) – reference sequence

  • consider_gaps (bool) – Whether to include gaps in the calculation.

Return type

float

Returns

sequence similarity (between 0.0 and 1.0)

calcHomology(reference_sequence, consider_gaps, in_columns)[source]

This method calculates sequence homology between self and a specified reference sequence, assuming that both sequences are already aligned. The homology criterion is based on “side chain chemistry” descriptor matching.

Parameters
  • reference_sequence (Sequence) – reference sequence

  • consider_gaps (bool) – Whether to include gaps in the calculation.

Return type

float

Returns

sequence homology (between 0.0 and 1.0)

calcScore(reference_sequence, consider_gaps, in_columns)[source]

This method calculates sequence similarity score between self and a specified reference sequence, assuming that both sequences are already aligned.

Parameters

reference_sequence (Sequence) – reference sequence

Return type

float

Returns

sequence similarity score

previousUngappedResidue(position)[source]
nextUngappedResidue(position)[source]
ungappedId(position, start, end, backwards=False)[source]

Returns residue ID for the first ungapped position in a specified region, starting from position and going forward or backwards. If no valid position is found (i.e. all residues in the specified region are gaps), returns an empty string.

Parameters
  • start (int) – lower boundary of the search region

  • end (int) – upper boundary of the search region

  • position (int) – initial position

  • backwards (bool) – if True, search the sequence backwards

Return type

string

Returns

ungapped residue ID, or empty string if no valid residue is found

hasAnnotationType(annotation_type)[source]

Checks if the sequence already has this annotation type.

Parameters

annotation_type (int) – annotation type

Return type

bool

Returns

True if the sequence has this annotation type already, False otherwise

sanitize()[source]

Removes all gaps and illegal residue codes from self.

makeInactive()[source]
makeActive()[source]
haveAnchors(pos)[source]
inactivePosition(pos)[source]

Finds first inactive residue position after given position.

Parameters

pos (int) – start position in sequence to begin search

Return type

int

Returns

position of the first inactive residue, or -1 if there is none

makeShortName(name=None)[source]

Converts a long sequence name into a short name for display on screen.

createAnnotationSequence()[source]

Creates an empty annotation.

createSecondaryAssignment()[source]

Creates an empty secondary structure assignment annotation.

createSSBondAssignment()[source]

Creates an empty disulfide bond assignment annotation.

compare(sequence)[source]

Compares the gapless version of self with another sequence and calculates the identity between the two.

getPDBId(with_chain=True)[source]

This function tries to generate a PDB ID based on the sequence name.

It supports different name formats: 1abcD, pdb|1abc|D, 1ABCD. If the conversion fails, it will return an empty string.
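
The three name formats listed above can be recognised with two regular expressions. The sketch below is an assumption about how such parsing might look, not the method's actual logic; in particular, the returned format (four-character ID followed by the chain, original case preserved) and the function name `pdb_id_from_name` are hypothetical.

```python
import re

# Four-character ID starting with a digit, optionally followed by a chain.
_PLAIN = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])?$")
# Pipe-delimited form such as pdb|1abc|D.
_PIPED = re.compile(r"^pdb\|([0-9][A-Za-z0-9]{3})\|([A-Za-z0-9])?$",
                    re.IGNORECASE)


def pdb_id_from_name(name, with_chain=True):
    """Return a PDB ID parsed from `name`, or '' if no format matches."""
    match = _PLAIN.match(name) or _PIPED.match(name)
    if match is None:
        return ""
    pdb_id = match.group(1)
    chain = match.group(2) or ""
    return pdb_id + chain if with_chain else pdb_id


print(pdb_id_from_name("pdb|1abc|D"))               # prints 1abcD
print(pdb_id_from_name("1ABCD", with_chain=False))  # prints 1ABC
```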

isValidTemplate(reference=None)[source]
isValidProtein(global_annotation=False)[source]
isRuler()[source]
isDNA()[source]

Returns True if the sequence is a DNA sequence.

translateDNA(translation_table={'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N', 'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T', 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGT': 'S', 'ATA': 'I', 'ATC': 'I', 'ATG': 'M', 'ATT': 'I', 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAT': 'H', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R', 'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L', 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAT': 'D', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G', 'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V', 'TAA': 'X', 'TAC': 'Y', 'TAG': 'X', 'TAT': 'Y', 'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S', 'TGA': 'X', 'TGC': 'C', 'TGG': 'W', 'TGT': 'C', 'TTA': 'L', 'TTC': 'F', 'TTG': 'L', 'TTT': 'F'})[source]

Translates the sequence from nucleotide codes to amino acids.
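
Codon-table translation of this kind amounts to reading the nucleotide string three letters at a time. The sketch below uses a small excerpt of the default table shown in the signature and works on plain strings; the reading frame (starting at index 0), the dropping of an incomplete trailing codon, and the mapping of unknown codons to 'X' are assumptions, and the function name is hypothetical.

```python
# Excerpt of the default translation_table shown in the signature;
# stop codons map to 'X' there.
CODON_TABLE = {"ATG": "M", "AAA": "K", "GGT": "G", "TGG": "W", "TAA": "X"}


def translate_sketch(nucleotides, table=CODON_TABLE, unknown="X"):
    """Translate a nucleotide string codon by codon into amino acid codes."""
    # Step through complete codons only; a trailing 1-2 letters are dropped.
    codons = (nucleotides[i:i + 3]
              for i in range(0, len(nucleotides) - 2, 3))
    return "".join(table.get(codon.upper(), unknown) for codon in codons)


print(translate_sketch("ATGAAATGGTAA"))  # prints MKWX
```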

isRNA()[source]

Returns True if the sequence is an RNA sequence.

translateRNA(translation_table={'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAU': 'N', 'ACA': 'U', 'ACC': 'U', 'ACG': 'U', 'ACU': 'U', 'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGU': 'S', 'AUA': 'I', 'AUC': 'I', 'AUG': 'M', 'AUU': 'I', 'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAU': 'H', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCU': 'P', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGU': 'R', 'CUA': 'L', 'CUC': 'L', 'CUG': 'L', 'CUU': 'L', 'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAU': 'D', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCU': 'A', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGU': 'G', 'GUA': 'V', 'GUC': 'V', 'GUG': 'V', 'GUU': 'V', 'UAA': 'X', 'UAC': 'Y', 'UAG': 'X', 'UAU': 'Y', 'UCA': 'S', 'UCC': 'S', 'UCG': 'S', 'UCU': 'S', 'UGA': 'X', 'UGC': 'C', 'UGG': 'W', 'UGU': 'C', 'UUA': 'L', 'UUC': 'F', 'UUG': 'L', 'UUU': 'F'})[source]

Translates the sequence from nucleotide codes to amino acids.

renumberResidues(start, incr, preserve_ins_codes=False)[source]
getValues(gapless=False)[source]

Returns a list of residue values.

isSortable(reference=None)[source]

Returns True if the sequence is sortable, False otherwise.

repair()[source]

Repairs the sequence by setting sequence-residue associations for all residues. Also, adds missing attributes (using default values) to the sequence.