schrodinger.protein.sequence module

Implementation of the ProteinSequence, Sequence, and StructureSequence classes.

StructureSequence allows iteration over all sequences in a given protein CT, and iteration over residues of each (in sequence order).

schrodinger.protein.sequence.guess_seq_type(res_names)

Takes an iterable of residue names and returns the appropriate sequence class

Parameters:

res_names (Iterable(str)) – An iterable of residue names. Note that all residue names must be uppercase.

Returns:

The appropriate class for the input sequence

Return type:

Type[AbstractSingleChainSequence]

Note that we use a pretty simple heuristic here.
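The heuristic itself is not described here. Purely as an illustration of what a name-based heuristic could look like, the sketch below classifies residue names by set membership. The function name, the name sets and the returned labels are all hypothetical and are not taken from guess_seq_type.

```python
# Hypothetical sketch only: this is NOT the heuristic used by guess_seq_type.
DNA_NAMES = {"DA", "DC", "DG", "DT"}
RNA_NAMES = {"A", "C", "G", "U"}


def guess_kind(res_names):
    """Return 'dna', 'rna' or 'protein' for an iterable of uppercase names."""
    names = set(res_names)
    if names and names <= DNA_NAMES:
        return "dna"
    if names and names <= RNA_NAMES:
        return "rna"
    # Anything else (including empty input) is treated as protein here.
    return "protein"
```

A real implementation would return a sequence class rather than a string label.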

schrodinger.protein.sequence.make_sequence(elements, *args, **kwargs)

Guesses the appropriate Sequence type from the names of residues and returns an instance with those residues.

Parameters:

elements (list(str)) – A list of strings to examine

Return type:

protein.Sequence

Returns:

An instance of the appropriate type

class schrodinger.protein.sequence.Sequence

Bases: object

class schrodinger.protein.sequence.AbstractSequence(*args, **kwargs)

Bases: Sequence, QObject

A base class for single-chain and combined-chain biological sequences.

Variables:
  • ORIGIN (enum.Enum) – Possible sequence origins

  • AnnotationClass (annotation.SequenceAnnotations) – Class to use for annotations

  • ElementClass (Type[residue.Residue]) – Class to use for elements

  • ALPHABET (dict(str, residue.ElementType)) – A mapping of string representations of elements to element types. Concrete subclasses must define this.

  • _gap_chars (tuple(str)) – A tuple of permissible gap characters in the element list; the first item will be used for serialization.

  • _unknown_res_type (residue.ElementType) – The type for an unknown residue

  • residuesChanged (QtCore.pyqtSignal) – A signal emitted when sequence residues are changed. Emitted with the indices of the first and last changed residues.

  • lengthAboutToChange (QtCore.pyqtSignal) – A signal emitted when the sequence length is about to change. Emitted with the old and new lengths.

  • lengthChanged (QtCore.pyqtSignal) – A signal emitted when the sequence length is changed. Emitted with the old and new lengths.

  • nameChanged (QtCore.pyqtSignal) – A signal emitted when the sequence name is changed.

  • visibilityChanged (QtCore.pyqtSignal) – A signal emitted when the visibility is changed.

  • structureChanged (QtCore.pyqtSignal) – A signal emitted when the structure changes.

  • annotationTitleChanged (QtCore.pyqtSignal) – A signal emitted when an annotation title is changed.

class ORIGIN

Bases: Enum

Maestro = 1
PyMOL = 2
AnnotationClass = None
ElementClass

alias of Residue

ALPHABET = {}
residuesRemoved

A pyqtSignal emitted by instances of the class.

residuesAdded

A pyqtSignal emitted by instances of the class.

residuesChanged

A pyqtSignal emitted by instances of the class.

lengthAboutToChange

A pyqtSignal emitted by instances of the class.

lengthChanged

A pyqtSignal emitted by instances of the class.

nameChanged

A pyqtSignal emitted by instances of the class.

visibilityChanged

A pyqtSignal emitted by instances of the class.

structureChanged

A pyqtSignal emitted by instances of the class.

annotationTitleChanged

A pyqtSignal emitted by instances of the class.

pfamChanged

A pyqtSignal emitted by instances of the class.

descriptorsCleared

A pyqtSignal emitted by instances of the class.

__init__(*args, **kwargs)
residues()

Return an iterable of residues, ignoring gaps.

Returns:

Iterable of residues

Return type:

iter(Residue)

getNextResidue(res)

Return the next residue in the sequence (ignoring gaps) or None if this is the last residue.

Parameters:

res (schrodinger.protein.residue.Residue) – A given residue in the sequence

Returns:

The next residue in the sequence

Return type:

schrodinger.protein.residue.Residue

getPreviousResidue(res)

Return the previous residue in the sequence (ignoring gaps) or None if this is the first residue.

Parameters:

res (schrodinger.protein.residue.Residue) – A given residue in the sequence

Returns:

The previous residue in the sequence

Return type:

schrodinger.protein.residue.Residue

iterNeighbors()

Return an iterable of three element tuples consisting of (prev_res, curr_res, next_res), ignoring gaps.

None is used for neighbors of first and last residues in the sequence, and does not indicate gaps here.

Returns:

Iterable of 3-tuples, each element of each tuple being either a schrodinger.protein.residue.Residue or None

Return type:

iter(tuple(Residue or NoneType, Residue, Residue or NoneType))
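The neighbor iteration described above can be sketched in plain Python. This is a stand-in that works on a string with "-" as the gap character (an assumption); the real method yields Residue objects.

```python
def iter_neighbors(elements, gap="-"):
    """Yield (prev, curr, next) over non-gap elements; None marks the ends."""
    residues = [e for e in elements if e != gap]
    # Pad both ends so the first and last residues get a None neighbor.
    padded = [None] + residues + [None]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


triples = list(iter_neighbors("A-CD"))
```

Note that the gap between A and C is skipped, so C is reported as the next residue after A.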

index(res)

Returns the index of the specified residue.

Parameters:

res (residue.Residue) – The residue to find

Returns:

The index of the residue

Return type:

int

indices(residues)

Returns the indices of all specified residues. Note that there is no guarantee that the returned integers will be in the same order as the input residues. (For combined-chain sequences, it’s highly likely that they won’t be.)

Parameters:

residues (Iterable(residue.Residue)) – The residues to find indices of

Returns:

The indices of the residues

Return type:

list[int]

getRun(res)

For a given residue or gap, return a set of all adjacent element indices in the sequence that are also residues or gaps.

Parameters:

res (residue.AbstractSequenceElement) – Residue to get the run of

Returns:

Set of residue indices in the run

Return type:

set(int)
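One reading of "run" is a maximal contiguous stretch of elements of the same kind (all gaps or all residues). The sketch below implements that reading on a plain string; both the interpretation and the index-based signature are assumptions, since the real method takes an element object.

```python
def get_run(elements, index, gap="-"):
    """Return the indices of the contiguous same-kind run containing index."""
    is_gap = elements[index] == gap
    start = index
    # Extend left while the neighbor is the same kind (gap vs. residue).
    while start > 0 and (elements[start - 1] == gap) == is_gap:
        start -= 1
    end = index
    # Extend right in the same way.
    while end < len(elements) - 1 and (elements[end + 1] == gap) == is_gap:
        end += 1
    return set(range(start, end + 1))
```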

insertElements(index, elements)

Insert a list of elements, or a single sequence element, into this sequence.

If elements is a string or iterable of strings, residue numbers will be automatically assigned.

Parameters:
  • index (int) – The index at which to insert elements

  • elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert

mutate(start, end, elements)

Mutate sequence elements starting at the given index to the provided elements.

Parameters:
  • start (int) – The index at which to start mutating

  • end (int) – The index of the last mutated element (exclusive)

  • elements (iterable(self.ElementClass) or iterable(str)) – The elements to which to mutate the sequence
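Because end is exclusive, the mutated range behaves like a Python slice. A minimal stand-in on a plain list is shown below; whether the replacement must have the same length as the range is not stated here, and this sketch does not enforce it.

```python
def mutate(elements, start, end, new_elements):
    """Return a copy with elements[start:end] replaced by new_elements."""
    result = list(elements)
    # Slice assignment uses the same half-open [start, end) convention.
    result[start:end] = list(new_elements)
    return result


mutated = "".join(mutate("ABCDE", 1, 3, "XY"))
```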

append(element)

Appends an element to the sequence

Parameters:

element (self.ElementClass or str) – The element to append to this sequence

extend(elements)

Extends the sequence with elements from an iterable

Parameters:

elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence

getSubsequence(start, end)

Return a sequence containing a subset of the elements in this one

Parameters:
  • start (int) – The index at which the subsequence should start

  • end (int) – The index at which the subsequence should end (exclusive)

Returns:

The requested subsequence

Return type:

AbstractSequence

removeElements(eles)

Remove elements from the sequence.

Parameters:

eles (list(residue.AbstractSequenceElement)) – A list of elements to remove from the sequence.

Raises:

ValueError – If any of the given elements are not in the sequence.

property gap_char
getGaplessLength()
Returns:

Length of this sequence ignoring gaps

Return type:

int

getGaps()
Return type:

list(residue.Gap)

Returns:

The gaps in the sequence.

addGapsByIndices(gap_idxs)

Add gaps to the sequence from a list of gap indices. Note that these indices are based on numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.

Parameters:

gap_idxs (list(int)) – A list of gap indices

validateGapIndices(gap_idxs)

Make sure that the specified gap indices are valid input for addGapsByIndices. If the indices are invalid, a ValueError will be raised. Indices are considered invalid if:

  • they refer to a position that’s more than one residue past the end of the sequence

  • they are negative

Parameters:

gap_idxs (list[int]) – The gap indices to validate
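The two validity rules can be sketched as follows, assuming the indices are applied in ascending order and that each insertion lengthens the sequence by one. The function below is a hypothetical stand-in that takes the sequence length rather than a sequence object.

```python
def validate_gap_indices(seq_len, gap_idxs):
    """Raise ValueError if gap_idxs could not be post-insertion gap positions."""
    length = seq_len
    for idx in sorted(gap_idxs):
        if idx < 0:
            raise ValueError("negative gap index: %d" % idx)
        # idx == length appends a gap; anything larger is past the end.
        if idx > length:
            raise ValueError("gap index %d is past the end" % idx)
        length += 1
```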

addGapsBeforeIndices(indices)

Add one gap to the alignment before each of the specified residue positions. Note that these indices are based on numbering before the insertion. To insert gaps using indices based on numbering after the insertion, see addGapsByIndices.

Parameters:

indices (list(int)) – A list of indices to insert gaps before.
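The difference between the two indexing conventions is easiest to see side by side. The functions below are plain-list stand-ins (with "-" as an assumed gap character), not the library methods.

```python
def add_gaps_by_indices(elements, gap_idxs, gap="-"):
    """Insert gaps so that they end up AT the given indices afterwards."""
    result = list(elements)
    # Ascending order: each index already accounts for earlier insertions.
    for idx in sorted(gap_idxs):
        result.insert(idx, gap)
    return result


def add_gaps_before_indices(elements, indices, gap="-"):
    """Insert one gap before each ORIGINAL position in indices."""
    result = list(elements)
    # Work from the right so earlier insertions do not shift later targets.
    for idx in sorted(indices, reverse=True):
        result.insert(idx, gap)
    return result


after_numbering = "".join(add_gaps_by_indices("ABCD", [1, 2]))
before_numbering = "".join(add_gaps_before_indices("ABCD", [1, 2]))
```

With the same input indices, the first call leaves gaps at positions 1 and 2 of the result, while the second places one gap before the original B and one before the original C.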

removeTerminalGaps()

Remove gaps from the end of the sequence

getTerminalGaps()

Return terminal gaps.

Returns:

A list of terminal gaps (in ascending index order)

Return type:

list(residue.Gap)
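As a rough illustration, terminal gaps are the trailing run of gaps at the end of the sequence. The sketch below returns their indices in ascending order for a plain string; the real method returns Gap objects, and "-" as the gap character is an assumption.

```python
def terminal_gap_indices(elements, gap="-"):
    """Return the indices of the trailing gaps, in ascending order."""
    end = len(elements)
    start = end
    # Walk left from the end while the elements are gaps.
    while start > 0 and elements[start - 1] == gap:
        start -= 1
    return list(range(start, end))
```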

getGapCount()
Returns:

the number of gaps in the sequence

Return type:

int

removeAllGaps()

Remove gaps from the sequence

property annotations
getAnnotation(index, annotation)

Returns the annotation at the specified index or None for a gap.

Raises:

ValueError – if the annotation is not available

getAnnotationValueForComparison(col, anno, ann_index=0, cdr_scheme=None, vernier=False)

Get an annotation value appropriate for determining whether two residues have the same annotation value. e.g. for binding site, only same-distance contacts of the same ligand should compare equal.

Parameters:
  • anno (annotation.ProteinSequenceAnnotations.ANNOTATION_TYPES) – Protein sequence annotation enum member

  • ann_index (int) – Annotation index for multi-value annotations

  • cdr_scheme (annotation.AntibodyCDRScheme) – CDR scheme for antibody annotation. (Will be ignored if anno is not SEQ_ANNO_TYPES.antibody_cdr.)

  • vernier (bool) – Whether Vernier Zones are enabled

Returns:

Annotation value for comparison or None if the corresponding residue is a gap or has no value for this annotation

Return type:

object or None

getNumAnnValues(ann)
clearAllCaching()

This method should be implemented in subclasses that cache any data.

setPfam(new_pfam, pfam_name)
clearPfam()
hasPfam()
hasStructure()
Returns:

Whether this sequence has an associated structure.

Return type:

bool

getStructure()
Returns:

The associated structure. Will return None if there is no associated structure.

Return type:

schrodinger.structure.Structure or NoneType

setStructure(struc)

Set the associated structure. Can only be used on sequences with an associated structure.

Parameters:

struc (schrodinger.structure.Structure) – The new structure for this sequence

Raises:

RuntimeError – If there’s no structure associated with this sequence object.

getIdentity(other, consider_gaps=True, only_consider=None)

Return a float scoring the identity between the sequence and another sequence, assuming that they’re already aligned

Returns:

The sequence identity score (between 0.0 and 1.0)

Return type:

float
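The parameters are not documented here, so the following is only a sketch of one common definition of identity for pre-aligned sequences. How consider_gaps is interpreted (gap-versus-residue columns count as mismatches when it is True and are skipped otherwise) is an assumption, as is the "-" gap character.

```python
def identity(seq_a, seq_b, consider_gaps=True, gap="-"):
    """Fraction of aligned columns that hold the same residue."""
    matches = 0
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # columns that are gaps in both sequences are skipped
        if a == gap or b == gap:
            if consider_gaps:
                total += 1  # counted as a mismatch
            continue
        total += 1
        matches += a == b
    return matches / total if total else 0.0
```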

getSimilarity(other, consider_gaps=True, only_consider=None)

Return a float scoring the similarity between the sequence and another sequence, assuming that they’re already aligned.

Returns:

The sequence similarity score (between 0.0 and 1.0)

Return type:

float

getConservation(other, consider_gaps=True, only_consider=None)

Return a float scoring the homology conservation between the sequence and another sequence, assuming that they’re already aligned.

The homology criterion is based on “side chain chemistry” descriptor matching.

Returns:

The sequence conservation score (between 0.0 and 1.0)

Return type:

float

getSimilarityScore(other, consider_gaps=True, only_consider=None)

Return the total score of similarity between the sequence and another sequence, assuming that they’re already aligned.

Parameters:
  • other (schrodinger.protein.sequence.Sequence) – A sequence to compare against

  • consider_gaps (bool) – Ignored because the similarity with a gap is 0.0

  • only_consider (set or None) – A set of residues to restrict attention to

Returns:

The total sequence similarity score

Return type:

int

getStructureResForRes(res)
Parameters:

res (residue.Residue) – Residue to get structure residue for

Returns:

Structure residue or None if no matching residue is found

Return type:

schrodinger.structure._Residue or NoneType

structuredResidueCount()

Get the number of residues in this sequence with an associated structured residue.

Return type:

int

hasStructuredResidues()

Return whether this sequence has any structured residues. This method is equivalent to bool(seq.structuredResidueCount()) but doesn’t (typically) require iterating through the entire sequence.

Return type:

bool

getProperty(seq_prop)

Get the sequence’s value corresponding to the given SequenceProperty object

Parameters:

seq_prop (schrodinger.protein.properties.SequenceProperty) – The object describing the sequence property

Returns:

The value of the sequence property

Return type:

float or None

updateDescriptors(descriptors, property_source)

Updates the descriptor dicts with new descriptor values

Parameters:
  • descriptors (dict[str, float]) – A dict mapping descriptor names to their values

  • property_source (properties.PropertySource) – The source of the descriptors

clearDescriptors()
property descriptors
hasDescriptors()
class schrodinger.protein.sequence.AbstractSingleChainSequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=None)

Bases: AbstractSequence

Base class for single-chain biological sequences

Note: Protein-specific functionality should go in ProteinSequence.

Variables:

sequenceCopied (QtCore.pyqtSignal) – A signal emitted when this sequence is copied. Emitted with the sequence being copied and the newly created copy. This signal is used by the structure model to make sure that the newly created copy is kept in sync with the structure.

sequenceCopied

A pyqtSignal emitted by instances of the class.

__init__(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=None)

Make a sequence object from a list of strings and/or self.ElementClass

Strings are converted to self.ElementClass using a mapping of strings to element types.

Parameters:
  • elements (iterable(self.ElementClass or None) or iterable(str or None)) – An iterable of elements making up the sequence. None values are interpreted as gaps.

  • name (str) – The name of the sequence (possibly shortened), used for display purposes. For a FASTA sequence, this should be a short identifier such as the Uniprot ID.

  • origin (Sequence.ORIGIN or None) – A piece of metadata indicating where the sequence came from

  • entry_id (str) – An entry associated with the sequence, if any

  • entry_name (str) – An entry name associated with the sequence, if any

  • pdb_id (str) – An id associated with the sequence, if any

  • chain (str) – The chain to which the sequence belongs

  • structure_chain (str) – The chain of the structure this sequence is associated with. This is usually the same as chain if the sequence has a structure but isn’t necessarily.

  • long_name (str) – The full name of the sequence. For a FASTA sequence, this should be the full FASTA header.

  • resnums (Iterable(int)) – Residue numbers to assign to elements. If not given, residues will be numbered starting from one. Regardless of what’s given here, no residue numbering will occur if elements is an iterable of ElementClass and any element already has a residue number set. If this iterable is shorter than elements, additional numbers will be generated by incrementing the last number present.
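The resnums rule (number from one by default, and extend a short list by incrementing its last value) can be sketched as below. This is a hypothetical helper, not part of the class; it assigns a number to every element and truncates an over-long list, both of which are assumptions.

```python
def fill_resnums(n_elements, resnums=None):
    """Return n_elements residue numbers following the rule described above."""
    nums = list(resnums or [])[:n_elements]
    # With no numbers given, numbering starts from one.
    next_num = nums[-1] + 1 if nums else 1
    while len(nums) < n_elements:
        nums.append(next_num)
        next_num += 1
    return nums
```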

__len__()
__contains__(item)
classmethod makeSeqElement(element)
Parameters:

element (str or cls.ElementClass) – A sequence element or string representation thereof

Returns:

sequence element

Return type:

cls.ElementClass

classmethod isValid(elements)
Parameters:

elements (iterable(str) or str) – An iterable of string representations of elements making up the sequence

Returns:

Tuple indicating whether valid and a set of invalid characters, if any

Return type:

tuple(bool, set(str))
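The shape of the return value can be illustrated with a small stand-in that checks each element against an alphabet. The alphabet argument and the permitted gap characters are assumptions made for this sketch; the class method uses the class's own ALPHABET.

```python
def is_valid(elements, alphabet, gap_chars=("-", ".")):
    """Return (valid, invalid_chars) for string representations of elements."""
    allowed = set(alphabet) | set(gap_chars)
    invalid = {e for e in elements if e not in allowed}
    return (not invalid, invalid)
```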

property name
property chain
property fullname
Returns:

a formatted name + optional chain name for the sequence

Return type:

str

property visibility
index(res, ignore_gaps=False)

Returns the index of the specified residue

Parameters:
  • res (residue.Residue) – The residue to find

  • ignore_gaps (bool) – Whether the index returned should ignore gaps in the sequence or not.

Raises:

ValueError – If the residue is not present

Return type:

int

Returns:

The index of the residue
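The effect of ignore_gaps can be sketched on a plain list, where "-" stands for a gap (an assumption). The real method looks up a Residue object rather than a string.

```python
def index_of(elements, res, ignore_gaps=False, gap="-"):
    """Return the position of res, optionally not counting preceding gaps."""
    position = 0
    for element in elements:
        if element == res:
            return position
        # Gaps only advance the counter when they are not being ignored.
        if not (ignore_gaps and element == gap):
            position += 1
    raise ValueError("residue not in sequence")


seq = ["A", "-", "-", "C", "D"]
```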

indices(residues, ignore_gaps=False)

Returns the indices of all specified residues. Note that there is no guarantee that the returned integers will be in the same order as the input residues. (For combined-chain sequences, it’s highly likely that they won’t be.)

Parameters:
  • residues (Iterable(residue.Residue)) – The residues to find indices of

  • ignore_gaps (bool) – Whether the indices returned should ignore gaps in the sequence or not.

Returns:

The indices of the residues

Return type:

list[int]

insertElements(index, elements)

Insert a list of elements, or a single sequence element, into this sequence.

If elements is a string or iterable of strings, residue numbers will be automatically assigned.

Parameters:
  • index (int) – The index at which to insert elements

  • elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert

mutate(start, end, elements)

Mutate sequence elements starting at the given index to the provided elements.

Parameters:
  • start (int) – The index at which to start mutating

  • end (int) – The index of the last mutated element (exclusive)

  • elements (iterable(self.ElementClass) or iterable(str)) – The elements to which to mutate the sequence

append(element)

Appends an element to the sequence

Parameters:

element (self.ElementClass or str) – The element to append to this sequence

extend(elements)

Extends the sequence with elements from an iterable

Parameters:

elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence

getSubsequence(start, end)

Return a sequence containing a subset of the elements in this one

Parameters:
  • start (int) – The index at which the subsequence should start

  • end (int) – The index at which the subsequence should end (exclusive)

Returns:

The requested subsequence

Return type:

AbstractSequence

removeElements(eles)

Remove elements from the sequence.

Parameters:

eles (list(residue.AbstractSequenceElement)) – A list of elements to remove from the sequence.

Raises:

ValueError – If any of the given elements are not in the sequence.

getGaplessLength()
Returns:

Length of this sequence ignoring gaps

Return type:

int

addGapsByIndices(gap_idxs)

Add gaps to the sequence from a list of gap indices. Note that these indices are based on numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.

Parameters:

gap_idxs (list(int)) – A list of gap indices

setSSA(new_ssa)
setSSAPredictions(pred)
setDisorderedRegionsPredictions(pred)
setDomainArrangementPredictions(pred)
setSolventAccessibilityPredictions(pred)
deletePrediction(prediction_type)
deleteAllPredictions()
property origin
Returns:

A piece of metadata indicating where the sequence came from

Return type:

Sequence.ORIGIN or None

hasStructure()
Returns:

Whether this sequence has an associated structure.

Return type:

bool

getStructure()
Returns:

The associated structure. Will return None if there is no associated structure.

Return type:

schrodinger.structure.Structure or NoneType

setStructure(struc)

Set the associated structure. Can only be used on sequences with an associated structure.

Parameters:

struc (schrodinger.structure.Structure) – The new structure for this sequence

Raises:

RuntimeError – If there’s no structure associated with this sequence object.

onStructureChanged()
setResidueMap(residue_map)

Set a new mapping between ResidueKey and structured residues.

Note: the only intended user of this method is schrodinger.application.msv.seqio.StructureConverter, where the ResidueKey is computed from the structure._Residue used to create the residue.Residue. If the sequence has a structure, the map can be generated using generateResidueMap.

Parameters:

residue_map (dict(residue.ResidueKey, residue.Residue)) – Mapping between residue key and Residue

generateResidueMap()

Create residue map based on current structured residues.

Note: this method requires self.hasStructure() to be True and self.entry_id to be set. If this sequence was produced by schrodinger.application.msv.seqio.StructureConverter, there should already be a residue map and this method does not need to be called.

Raises:

RuntimeError – If sequence has no structure or entry id

getResByKey(res_key)
Parameters:

res_key (residue.ResidueKey) – Residue key: (entry_id, chain, resnum, inscode)

Returns:

Residue matching key or None if no matching residue is found

Return type:

residue.Residue or NoneType

Raises:

RuntimeError – If sequence has no structure

getStructureResForRes(res)
Parameters:

res (residue.Residue) – Residue to get structure residue for

Returns:

Structure residue or None if no matching residue is found

Return type:

schrodinger.structure._Residue or NoneType

class schrodinger.protein.sequence.ProteinSequenceMeta

Bases: wrappertype

Metaclass for split-chain and combined-chain protein sequences

property ALPHABET
class schrodinger.protein.sequence.AbstractProteinSequenceMixin(*args, **kwargs)

Bases: object

A mixin for code shared between split-chain and combined-chain protein sequences.

disulfideBondsCacheCleared

A pyqtSignal emitted by instances of the class.

predictionsChanged

A pyqtSignal emitted by instances of the class.

secondaryStructureChanged

A pyqtSignal emitted by instances of the class.

kinaseFeaturesChanged

A pyqtSignal emitted by instances of the class.

kinaseConservationChanged

A pyqtSignal emitted by instances of the class.

property ALPHABET
__init__(*args, **kwargs)
property disulfide_bonds
Returns:

A sorted tuple of the valid disulfide bonds.

Return type:

tuple(residue.DisulfideBond)

property secondary_structures

A list of _SecondaryStructure namedtuples containing the type of secondary structure and where the secondary structures begin and end.

Returns:

A list of namedtuples containing an SS_TYPE from schrodinger.structure and the residue indexes marking the limits of the secondary structure.

Return type:

list(_SecondaryStructure)
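To show the general idea of collapsing per-residue assignments into (type, limits) records, here is a sketch using itertools.groupby. The namedtuple fields, the inclusive limits and the use of None for "no secondary structure" are all assumptions; the actual fields of _SecondaryStructure are not listed here.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for _SecondaryStructure; field names are assumed.
SecondaryStructure = namedtuple("SecondaryStructure", ["ss_type", "limits"])


def group_secondary_structure(ss_per_residue):
    """Collapse per-residue SS labels into runs with (first, last) indices."""
    runs = []
    index = 0
    for ss_type, group in groupby(ss_per_residue):
        length = len(list(group))
        if ss_type is not None:
            limits = (index, index + length - 1)
            runs.append(SecondaryStructure(ss_type, limits))
        index += length
    return runs
```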

clearDisulfideBondsCache()
isKinaseChain() → bool
hasDisorderedRegionsPredictions()
hasDisulfideBondPredictions()
hasDomainArrangementPredictions()
hasSolventAccessibility()
hasSSAPredictions()
property pred_secondary_structures
class schrodinger.protein.sequence.ProteinSequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)

Bases: JsonableClassMixin, AbstractProteinSequenceMixin, AbstractSingleChainSequence

A single-chain protein sequence.

Variables:

secondaryStructureCacheCleared – A signal emitted when the secondary structure cache has been cleared. Used to keep the CombinedChainProteinSequence cache in sync. If listening for changes in the secondary structure values, use secondaryStructureChanged instead.

AnnotationClass

alias of ProteinSequenceAnnotations

ElementClass

alias of Residue

secondaryStructureCacheCleared

A pyqtSignal emitted by instances of the class.

__init__(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)

See AbstractSingleChainSequence for additional documentation.

Parameters:
  • disulfide_bonds (Iterable(tuple(int, int))) – A list of pairs of residue indices to link via disulfide bonds.

  • pred_disulfide_bonds (Iterable(tuple(int, int))) – A list of pairs of residue indices to link via predicted disulfide bonds.

toJsonImplementation()

Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.

Returns:

A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders

classmethod fromJsonImplementation(json_obj)

Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.

Parameters:

json_obj (dict) – A dictionary loaded from a JSON string or file.

Returns:

An instance of the derived class.

Return type:

cls

classmethod adapter47007(json_dict)
classmethod adapter48002(json_dict)
classmethod adapter52065(json_dict)
classmethod isValid(elements)
Parameters:

elements (iterable(str) or str) – An iterable of string representations of elements making up the sequence

Returns:

Tuple indicating whether valid and a set of invalid characters, if any

Return type:

tuple(bool, set(str))

renumberResidues(new_rescode_map)

Renumber residues in this sequence given a dictionary mapping old rescodes to the new desired rescodes.

isKinaseChain()
property is_kinase_annotated
setKinaseFeatures(feature_map: Dict[Residue, KinaseFeatureLabel])
Parameters:

feature_map – Map of residue to kinase feature label

property is_kinase_cons_annotated
setKinaseConservation(cons_map, lig_asl)
property disulfide_bonds
Returns:

A sorted tuple of the valid disulfide bonds.

Return type:

tuple(residue.DisulfideBond)

property pred_disulfide_bonds
removeStructurelessResidues(start=0, end=None)

Remove any structureless residues

Parameters:
  • start (int) – The index at which to start filtering structureless residues.

  • end (int) – The index at which to end filtering

encodeForPatternSearch(with_ss=False, with_flex=False, with_asa=False)

Convert to sequence dict expected by find_generalized_pattern.

Parameters:
  • with_ss (bool) – Whether to include secondary structure information.

  • with_flex (bool) – Whether to include flexibility information.

  • with_asa (bool) – Whether to include accessible surface area information.

Return type:

dict

Returns:

dictionary of sequence data

clearAllCaching()

This method should be implemented in subclasses that cache any data.

setGPCRAnnotations(gpcr_annotation_map)
class schrodinger.protein.sequence.NucleicSequenceMeta

Bases: ProteinSequenceMeta

ALPHABET = None
class schrodinger.protein.sequence.NucleicAcidSequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)

Bases: ProteinSequence

AnnotationClass

alias of NucleicAcidSequenceAnnotations

ElementClass

alias of Nucleotide

ALPHABET = None
COMPLEMENT_FN = None
REVERSE_COMPLEMENT_FN = None
property is_kinase_cons_annotated
property is_kinase_annotated
isKinaseChain()
getTranslation()

Get a translated sequence. This method uses BioPython’s translate method to convert a nucleic acid sequence into an amino acid sequence

Returns:

A translated protein sequence. The name and chain from the nucleic acid sequence are copied over

Return type:

ProteinSequence

getReverse()

Get the reverse of a DNA or RNA sequence. Residues are renumbered. Creates a new Sequence; original object is unmodified.

Returns:

The reversed nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.

Return type:

NucleicAcidSequence

getComplement()

Get the complement of a DNA or RNA sequence. This method uses BioPython’s complement method for nucleic acid sequences. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.

Returns:

The complementary nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.

Return type:

NucleicAcidSequence

getReverseComplement()

Get the reverse complement of a DNA or RNA sequence. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.

Returns:

The reverse complement nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.

Return type:

NucleicAcidSequence
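The reverse, complement and reverse-complement operations can be illustrated on plain DNA strings using only the standard library. This is a sketch rather than the library's implementation (which delegates to BioPython and returns a new sequence object); gaps ("-") and unknown characters pass through unchanged because they are absent from the translation table.

```python
# Translation table for unambiguous DNA bases only.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each base; characters not in the table are left as-is."""
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]
```

In both functions the input string is left unmodified and a new string is returned, mirroring the "original object is unmodified" behavior described above.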

encodeForPatternSearch(with_ss=False, with_flex=False, with_asa=False)

Convert to sequence dict expected by find_generalized_pattern.

Parameters:
  • with_ss (bool) – Whether to include secondary structure information.

  • with_flex (bool) – Whether to include flexibility information.

  • with_asa (bool) – Whether to include accessible surface area information.

Return type:

dict

Returns:

dictionary of sequence data

class schrodinger.protein.sequence.DNASequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)

Bases: NucleicAcidSequence

ALPHABET = mappingproxy({'DA': DeoxyribonucleotideType('A', 'DA', 'Adenine'), 'DC': DeoxyribonucleotideType('C', 'DC', 'Cytosine'), 'DG': DeoxyribonucleotideType('G', 'DG', 'Guanine'), 'DT': DeoxyribonucleotideType('T', 'DT', 'Thymine'), 'A': DeoxyribonucleotideType('A', 'DA', 'Adenine'), 'C': DeoxyribonucleotideType('C', 'DC', 'Cytosine'), 'G': DeoxyribonucleotideType('G', 'DG', 'Guanine'), 'T': DeoxyribonucleotideType('T', 'DT', 'Thymine')})
static COMPLEMENT_FN(sequence, inplace=False)

Return the complement as a DNA sequence.

If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.

>>> my_seq = "CGA"
>>> complement(my_seq)
'GCT'
>>> my_seq = Seq("CGA")
>>> complement(my_seq)
Seq('GCT')
>>> my_seq = MutableSeq("CGA")
>>> complement(my_seq)
MutableSeq('GCT')
>>> my_seq
MutableSeq('CGA')

Any U in the sequence is treated as a T:

>>> complement(Seq("CGAUT"))
Seq('GCTAA')

In contrast, complement_rna returns an RNA sequence:

>>> complement_rna(Seq("CGAUT"))
Seq('GCUAA')

Supports lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:

>>> complement("ACGTUacgtuXYZxyz")
'TGCAAtgcaaXRZxrz'

The sequence is modified in-place and returned if inplace is True:

>>> my_seq = MutableSeq("CGA")
>>> complement(my_seq, inplace=True)
MutableSeq('GCT')
>>> my_seq
MutableSeq('GCT')

As strings and Seq objects are immutable, a TypeError is raised if complement is called on a Seq object with inplace=True.

static REVERSE_COMPLEMENT_FN(sequence, inplace=False)

Return the reverse complement as a DNA sequence.

If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.

>>> my_seq = "CGA"
>>> reverse_complement(my_seq)
'TCG'
>>> my_seq = Seq("CGA")
>>> reverse_complement(my_seq)
Seq('TCG')
>>> my_seq = MutableSeq("CGA")
>>> reverse_complement(my_seq)
MutableSeq('TCG')
>>> my_seq
MutableSeq('CGA')

Any U in the sequence is treated as a T:

>>> reverse_complement(Seq("CGAUT"))
Seq('AATCG')

In contrast, reverse_complement_rna returns an RNA sequence:

>>> reverse_complement_rna(Seq("CGAUT"))
Seq('AAUCG')

Supports lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:

>>> reverse_complement("ACGTUacgtuXYZxyz")
'zrxZRXaacgtAACGT'

The sequence is modified in-place and returned if inplace is True:

>>> my_seq = MutableSeq("CGA")
>>> reverse_complement(my_seq, inplace=True)
MutableSeq('TCG')
>>> my_seq
MutableSeq('TCG')

As strings and Seq objects are immutable, a TypeError is raised if reverse_complement is called on a Seq object with inplace=True.
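The plain-string behaviour shown in the doctests above can be reproduced without BioPython. The following is a minimal sketch, not the BioPython implementation: the function names and the translation table are illustrative assumptions, and only `str` input is handled (no Seq, MutableSeq or SeqRecord).

```python
# Hypothetical stand-ins for illustration; not the BioPython implementation.
_IUPAC = "ACGTURYSWKMBDHVN"
_IUPAC_COMPLEMENT = "TGCAAYRSWMKVHDBN"  # U is complemented to A, like T

# One table covering both upper- and lower-case letters.
_DNA_TABLE = str.maketrans(
    _IUPAC + _IUPAC.lower(),
    _IUPAC_COMPLEMENT + _IUPAC_COMPLEMENT.lower(),
)


def complement_dna_sketch(sequence: str) -> str:
    """Return the DNA complement; characters outside the table pass through."""
    return sequence.translate(_DNA_TABLE)


def reverse_complement_dna_sketch(sequence: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement_dna_sketch(sequence)[::-1]


print(complement_dna_sketch("ACGTUacgtuXYZxyz"))          # TGCAAtgcaaXRZxrz
print(reverse_complement_dna_sketch("ACGTUacgtuXYZxyz"))  # zrxZRXaacgtAACGT
```

Using a single translation table keeps unknown characters such as X and Z untouched, which matches the "all other characters are not converted" behaviour described above.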

class schrodinger.protein.sequence.RNASequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)

Bases: NucleicAcidSequence

ALPHABET = mappingproxy({'A': RibonucleotideType('A', 'A', 'Adenine'), 'C': RibonucleotideType('C', 'C', 'Cytosine'), 'G': RibonucleotideType('G', 'G', 'Guanine'), 'U': RibonucleotideType('U', 'U', 'Uracil')})
static COMPLEMENT_FN(sequence, inplace=False)

Return the complement as an RNA sequence.

If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.

>>> my_seq = "CGA"
>>> complement_rna(my_seq)
'GCU'
>>> my_seq = Seq("CGA")
>>> complement_rna(my_seq)
Seq('GCU')
>>> my_seq = MutableSeq("CGA")
>>> complement_rna(my_seq)
MutableSeq('GCU')
>>> my_seq
MutableSeq('CGA')

Any T in the sequence is treated as a U:

>>> complement_rna(Seq("CGAUT"))
Seq('GCUAA')

In contrast, complement returns a DNA sequence:

>>> complement(Seq("CGAUT"))
Seq('GCTAA')

Supports both lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:

>>> complement_rna("ACGTUacgtuXYZxyz")
'UGCAAugcaaXRZxrz'

The sequence is modified in-place and returned if inplace is True:

>>> my_seq = MutableSeq("CGA")
>>> complement_rna(my_seq, inplace=True)
MutableSeq('GCU')
>>> my_seq
MutableSeq('GCU')

As strings and Seq objects are immutable, a TypeError is raised if complement_rna is called on a Seq object with inplace=True.

static REVERSE_COMPLEMENT_FN(sequence, inplace=False)

Return the reverse complement as an RNA sequence.

If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.

>>> my_seq = "CGA"
>>> reverse_complement_rna(my_seq)
'UCG'
>>> my_seq = Seq("CGA")
>>> reverse_complement_rna(my_seq)
Seq('UCG')
>>> my_seq = MutableSeq("CGA")
>>> reverse_complement_rna(my_seq)
MutableSeq('UCG')
>>> my_seq
MutableSeq('CGA')

Any T in the sequence is treated as a U:

>>> reverse_complement_rna(Seq("CGAUT"))
Seq('AAUCG')

In contrast, reverse_complement returns a DNA sequence:

>>> reverse_complement(Seq("CGAUT"), inplace=False)
Seq('AATCG')

Supports both lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:

>>> reverse_complement_rna("ACGTUacgtuXYZxyz")
'zrxZRXaacguAACGU'

The sequence is modified in-place and returned if inplace is True:

>>> my_seq = MutableSeq("CGA")
>>> reverse_complement_rna(my_seq, inplace=True)
MutableSeq('UCG')
>>> my_seq
MutableSeq('UCG')

As strings and Seq objects are immutable, a TypeError is raised if reverse_complement_rna is called on a Seq object with inplace=True.
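The difference between the copying and in-place modes can be illustrated with built-in types. This is a sketch under stated assumptions: `bytearray` stands in for MutableSeq and `str` for the immutable string/Seq case, and the function name and table are hypothetical rather than taken from BioPython.

```python
# Hypothetical illustration: bytearray plays the role of a mutable sequence.
_BASES = b"ACGTURYSWKMBDHVN"
_RNA_COMPLEMENT = b"UGCAAYRSWMKVHDBN"  # A pairs with U; T is read as U, so T -> A

_RNA_TABLE = bytes.maketrans(
    _BASES + _BASES.lower(),
    _RNA_COMPLEMENT + _RNA_COMPLEMENT.lower(),
)


def reverse_complement_rna_sketch(sequence, inplace=False):
    """Reverse-complement a str (immutable) or bytearray (mutable) as RNA."""
    if isinstance(sequence, bytearray):
        result = sequence.translate(_RNA_TABLE)  # new bytearray
        result.reverse()
        if inplace:
            sequence[:] = result  # overwrite the caller's object
            return sequence
        return result
    if inplace:
        # Mirrors the documented behaviour for immutable inputs.
        raise TypeError("immutable sequences cannot be modified in place")
    # ASCII-only input is assumed for this sketch.
    return sequence.encode("ascii").translate(_RNA_TABLE)[::-1].decode("ascii")


mutable = bytearray(b"CGA")
reverse_complement_rna_sketch(mutable, inplace=True)
print(mutable)  # bytearray(b'UCG')
```

With `inplace=False` the original object is left untouched and a new object of the same type is returned, as in the doctests above.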

class schrodinger.protein.sequence.NASequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)

Bases: NucleicAcidSequence

A nucleic acid sequence. Agnostic to backbone type and capable of representing bases typical to either DNA or RNA.

ALPHABET = mappingproxy({'DA': DeoxyribonucleotideType('A', 'DA', 'Adenine'), 'DC': DeoxyribonucleotideType('C', 'DC', 'Cytosine'), 'DG': DeoxyribonucleotideType('G', 'DG', 'Guanine'), 'DT': DeoxyribonucleotideType('T', 'DT', 'Thymine'), 'A': RibonucleotideType('A', '6MA', 'Adenine'), 'C': RibonucleotideType('C', 'OMC', 'Cytosine'), 'G': RibonucleotideType('G', 'OMG', 'Guanine'), 'U': RibonucleotideType('U', 'DU', 'Uracil'), 'AMP': RibonucleotideType('A', 'AMP', 'Adenine'), 'ADP': RibonucleotideType('A', 'ADP', 'Adenine'), 'ATP': RibonucleotideType('A', 'ATP', 'Adenine'), '1MA': RibonucleotideType('A', '1MA', 'Adenine'), '6MA': RibonucleotideType('A', '6MA', 'Adenine'), 'CMP': RibonucleotideType('C', 'CMP', 'Cytosine'), 'CDP': RibonucleotideType('C', 'CDP', 'Cytosine'), 'CTP': RibonucleotideType('C', 'CTP', 'Cytosine'), '5MC': RibonucleotideType('C', '5MC', 'Cytosine'), '5HC': RibonucleotideType('C', '5HC', 'Cytosine'), '5FC': RibonucleotideType('C', '5FC', 'Cytosine'), '1CC': RibonucleotideType('C', '1CC', 'Cytosine'), 'OMC': RibonucleotideType('C', 'OMC', 'Cytosine'), 'GMP': RibonucleotideType('G', 'GMP', 'Guanine'), 'GDP': RibonucleotideType('G', 'GDP', 'Guanine'), 'GTP': RibonucleotideType('G', 'GTP', 'Guanine'), '1MG': RibonucleotideType('G', '1MG', 'Guanine'), '2MG': RibonucleotideType('G', '2MG', 'Guanine'), 'M2G': RibonucleotideType('G', 'M2G', 'Guanine'), '7MG': RibonucleotideType('G', '7MG', 'Guanine'), 'OMG': RibonucleotideType('G', 'OMG', 'Guanine'), 'UMP': RibonucleotideType('U', 'UMP', 'Uracil'), 'UDP': RibonucleotideType('U', 'UDP', 'Uracil'), 'UTP': RibonucleotideType('U', 'UTP', 'Uracil'), 'PSU': RibonucleotideType('Ψ', 'PSU', 'Uracil'), 'H2U': RibonucleotideType('U', 'H2U', 'Uracil'), '5MU': RibonucleotideType('U', '5MU', 'Uracil'), 'DU': RibonucleotideType('U', 'DU', 'Uracil'), 'TMP': DeoxyribonucleotideType('T', 'TMP', 'Thymine'), 'TDP': DeoxyribonucleotideType('T', 'TDP', 'Thymine'), 'TTP': DeoxyribonucleotideType('T', 'TTP', 'Thymine'), 'T': 
DeoxyribonucleotideType('T', 'TTP', 'Thymine'), 'Ψ': RibonucleotideType('Ψ', 'PSU', 'Uracil')})
getTranslation()

Get a translated sequence. This method uses BioPython’s translate method to convert a nucleic acid sequence into an amino acid sequence

Returns:

A translated protein sequence. The name and chain from the nucleic acid sequence are copied over

Return type:

ProteinSequence

getComplement()

Get the complement of a DNA or RNA sequence. This method uses BioPython’s complement method for nucleic acid sequences. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.

Returns:

The complementary nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.

Return type:

NucleicAcidSequence

getReverseComplement()

Get the reverse complement of a DNA or RNA sequence. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.

Returns:

The reverse complement nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.

Return type:

NucleicAcidSequence

class schrodinger.protein.sequence.CombinedChainSequenceMeta(cls, bases, classdict, *, wraps=None, wrapped_constants=(), wrapped_properties=(), wrapped_getters=(), wrapped_setters=())

Bases: DocstringWrapperMetaClass, ProteinSequenceMeta

The metaclass for CombinedChainProteinSequence. This metaclass wraps the specified class attributes.

class schrodinger.protein.sequence.GapRegion(from_start: int, from_end: int)

Bases: object

Container for information about gaps to add to or remove from the start and end of a chain

from_start: int
from_end: int
__init__(from_start: int, from_end: int) -> None
class schrodinger.protein.sequence.CombinedChainProteinSequence(seqs)

Bases: AbstractProteinSequenceMixin, AbstractSequence

A sequence that contains multiple chains from the same protein. Instances of this class do not directly contain any residues themselves and instead wrap one or several ProteinSequence objects.

Note:

CombinedChainProteinSequence.visibility properly reports entry inclusion state, but it may not correctly report entry visibility (e.g. partially visible vs. fully visible). The MSV structure icons only report inclusion state and the visibility of included entries isn’t reported anywhere in the panel, though, so this limitation doesn’t have any impact on functionality.

AnnotationClass

alias of CombinedChainProteinSequenceAnnotations

__init__(seqs)
Parameters:

seqs (list(ProteinSequence)) – A list of the split-chain sequences to wrap.

__len__()
property fullname
index(res)

Returns the index of the specified residue

Parameters:
  • res (residue.Residue) – The residue to find

  • ignore_gaps (bool) – Whether the index returned should ignore gaps in the sequence or not.

Raises:

A ValueError if the residue is not present

Return type:

int

Returns:

The index of the residue

indices(residues)

Returns the indices of all specified residues. Note that the returned integers will likely not be in the same order as the input residues.

Parameters:

residues (Iterable(residue.CombinedChainResidueWrapper)) – The residues to find indices of

Returns:

The indices of the residues

Return type:

list[int]

insertElements(index, elements)

Insert a list of elements or sequence element into this sequence.

If elements is a string or iterable of strings, residue numbers will be automatically assigned.

Parameters:
  • index (int) – The index at which to insert elements

  • elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert

mutate(start, end, elements)

Mutate sequence elements. See parent class for additional method documentation.

Raises:

MultipleChainsError – If the specified residue range spans multiple chains.

assertCanMutateResidues(start, end)

Make sure that we can mutate the specified residues. If not, raise an exception.

Parameters:
  • start (int) – The index at which to start mutating

  • end (int) – The index of the last mutated element (exclusive)

Raises:

MultipleChainsError – If the specified residue range spans multiple chains.
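The single-chain check can be pictured as a lookup of the chain that owns each end of the range. The sketch below is an assumption about how such a check could work, not the library's code: chains are represented only by their lengths, and `MultipleChainsErrorSketch` is a hypothetical stand-in for MultipleChainsError.

```python
from bisect import bisect_right


class MultipleChainsErrorSketch(ValueError):
    """Hypothetical stand-in for MultipleChainsError."""


def assert_single_chain_range(chain_lengths, start, end):
    """Raise if the half-open range [start, end) touches more than one chain.

    Returns the position of the chain that contains the whole range.
    """
    # Combined-chain index at which each chain begins, e.g. [0, 5] for [5, 3].
    offsets = []
    total = 0
    for length in chain_lengths:
        offsets.append(total)
        total += length
    if not 0 <= start < end <= total:
        raise IndexError("range is empty or outside the sequence")
    first_chain = bisect_right(offsets, start) - 1
    last_chain = bisect_right(offsets, end - 1) - 1  # end is exclusive
    if first_chain != last_chain:
        raise MultipleChainsErrorSketch(
            f"range [{start}, {end}) spans chains {first_chain}..{last_chain}"
        )
    return first_chain


print(assert_single_chain_range([5, 3], 5, 8))  # 1
```

Because `end` is exclusive, the last mutated element is `end - 1`, which is why that index is used for the second lookup.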

append(element)

Appends an element to the sequence

Parameters:

element (self.ElementClass or basestring) – The element to append to this sequence

extend(elements)

Extends the sequence with elements from an iterable

Parameters:

elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence

getSubsequence(start, end)

Return a sequence containing a subset of the elements in this one. Note that the new sequence will be a split-chain sequence and will ignore any chain breaks present in the requested subset of elements.

Parameters:
  • start (int) – The index at which the subsequence should start

  • end (int) – The index at which the subsequence should end (exclusive)

Returns:

The requested subsequence

Return type:

ProteinSequence

removeElements(eles)

Remove elements from the sequence.

Parameters:

eles (list(residue.AbstractSequenceElement)) – A list of elements to remove from the sequence.

Raises:

ValueError – If any of the given elements are not in the sequence.

getStructureResForRes(res)
Parameters:

res (residue.Residue) – Residue to get structure residue for

Returns:

Structure residue or None if no matching residue is found

Return type:

schrodinger.structure._Residue or NoneType

getGaplessLength()
Returns:

Length of this sequence ignoring gaps

Return type:

int

addGapsByIndices(gap_idxs)

Add gaps to the sequence from a list of gap indices. Note that these indices are based on numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.

Parameters:

gap_idxs (list(int)) – A list of gap indices
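To make "numbering after the insertion" concrete, here is a minimal sketch on a plain list. The function name and the "-" gap character are assumptions for illustration; the real method operates on sequence objects.

```python
def add_gaps_by_indices_sketch(elements, gap_idxs, gap="-"):
    """Insert gaps so that each index in gap_idxs holds a gap afterwards."""
    result = list(elements)
    # Ascending order matters: each insertion shifts later elements right,
    # which is what indices based on post-insertion numbering expect.
    for idx in sorted(gap_idxs):
        result.insert(idx, gap)
    return result


print(add_gaps_by_indices_sketch("ABCD", [1, 3]))
# ['A', '-', 'B', '-', 'C', 'D']
```

After the call, positions 1 and 3 of the result are gaps, which is the sense in which the indices refer to the sequence as it looks once all gaps are in place.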

addGapsToChainStartsAndEnds(gaps: List[GapRegion])

Add the specified numbers of gaps to the starts and ends of each chain.

Parameters:

gaps – The numbers of gaps to add

removeGapsFromChainStartsAndEnds(gaps: List[GapRegion])

Remove the specified numbers of gaps from the starts and ends of each chain.

Parameters:

gaps – The numbers of gaps to remove

validateGapsToRemoveFromChainStartAndEnds(gaps: List[GapRegion])

Make sure that we can remove the specified numbers of gaps from the starts and ends of each chain.

Parameters:

gaps – The numbers of gaps to remove

Raises:

AssertionError – If some of the sequence elements to be removed aren’t actually gaps.
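The three chain-end gap methods above fit together as add, validate, remove. The following sketch shows that relationship on plain lists; `GapRegionSketch`, the helper names and the "-" gap character are hypothetical stand-ins, and one GapRegion per chain (in chain order) is assumed.

```python
from dataclasses import dataclass
from typing import List

GAP = "-"  # assumed gap character for this sketch


@dataclass
class GapRegionSketch:
    """Hypothetical stand-in for GapRegion."""
    from_start: int
    from_end: int


def add_gaps_to_chain_ends(chains: List[List[str]], gaps: List[GapRegionSketch]) -> None:
    """Pad each chain in place with gaps at its start and end."""
    for chain, region in zip(chains, gaps):
        chain[:0] = [GAP] * region.from_start
        chain.extend([GAP] * region.from_end)


def validate_gaps_to_remove(chains: List[List[str]], gaps: List[GapRegionSketch]) -> None:
    """Raise AssertionError if any element slated for removal is not a gap."""
    for chain, region in zip(chains, gaps):
        if region.from_start + region.from_end > len(chain):
            raise AssertionError("more gaps requested than elements in chain")
        leading = chain[:region.from_start]
        trailing = chain[len(chain) - region.from_end:]
        if any(element != GAP for element in leading + trailing):
            raise AssertionError("some elements to be removed are not gaps")


def remove_gaps_from_chain_ends(chains: List[List[str]], gaps: List[GapRegionSketch]) -> None:
    """Validate first, then strip the requested gaps in place."""
    validate_gaps_to_remove(chains, gaps)
    for chain, region in zip(chains, gaps):
        del chain[len(chain) - region.from_end:]
        del chain[:region.from_start]


chains = [list("AB"), list("C")]
regions = [GapRegionSketch(2, 1), GapRegionSketch(0, 3)]
add_gaps_to_chain_ends(chains, regions)
print(chains)  # [['-', '-', 'A', 'B', '-'], ['C', '-', '-', '-']]
remove_gaps_from_chain_ends(chains, regions)
print(chains)  # [['A', 'B'], ['C']]
```

Validating before any deletion means a failed check leaves every chain unmodified.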

property disulfide_bonds
Returns:

A sorted tuple of the valid disulfide bonds.

Return type:

tuple(residue.CombinedChainDisulfideBond)

isKinaseChain()
property pred_disulfide_bonds
indexToSeqAndIndex(index)

Convert a combined-chain residue index to a split-chain sequence and a residue index within the specified sequence.

Parameters:

index (int) – A valid combined-chain residue index

Returns:

A tuple of: the split-chain sequence, the residue index within that sequence, and the starting index of the split-chain sequence

Return type:

tuple(ProteinSequence, int, int)
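The conversion from a combined-chain index to a (sequence, index, offset) triple amounts to a lookup in the list of chain start offsets. A minimal sketch, assuming chains are laid out back to back and represented here by plain strings; the function name is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def index_to_chain_and_index(chains, index):
    """Return (chain, index within chain, combined-chain offset of the chain)."""
    lengths = [len(chain) for chain in chains]
    # offsets[i] is the combined-chain index of the first element of chain i.
    offsets = [0] + list(accumulate(lengths))[:-1]
    if not 0 <= index < sum(lengths):
        raise IndexError("combined-chain index out of range")
    chain_number = bisect_right(offsets, index) - 1
    offset = offsets[chain_number]
    return chains[chain_number], index - offset, offset


chains = ["MKV", "GA", "LLQW"]
print(index_to_chain_and_index(chains, 4))  # ('GA', 1, 3)
```

The third value of the tuple corresponds to what offsetForChain would report for the returned chain, under the same back-to-back layout assumption.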

property chain
property chains
property chain_offsets
hasChain(chain_name)

Does this sequence contain a chain with the specified name?

Parameters:

chain_name (str) – The chain name to check

Return type:

bool

addChain(seq)

Add a new chain to this sequence.

Parameters:

seq (ProteinSequence) – The chain to add

removeChain(seq)

Remove a chain from this sequence. Note that you should not remove the last chain; instead, remove this sequence from the alignment.

Parameters:

seq (ProteinSequence) – The chain to remove

removeChains(seqs)

Remove multiple chains from this sequence. Note that you should not remove all chains from a combined-chain sequence; instead, remove the sequence from the alignment.

Parameters:

seqs (list[ProteinSequence]) – The chains to remove

insertElementByChain(index, chain, element)

Add the given element to the specified chain of a sequence.

Parameters:
offsetForChain(chain)

Get the combined-chain residue index for the first residue of the specified chain.

Parameters:

chain (ProteinSequence) – The chain

Returns:

The offset

Return type:

int

clearAllCaching()

This method should be implemented in subclasses that cache any data.

ElementClass

alias of Residue

property entry_id
property entry_name
getStructure(*args, **kwargs)
Returns:

The associated structure. Will return None if there is no associated structure.

Return type:

schrodinger.structure.Structure or NoneType

hasStructure(*args, **kwargs)
Returns:

Whether this sequence has an associated structure.

Return type:

bool

property long_name
property name
property origin
property pdb_id
setStructure(*args, **kwargs)

Set the associated structure. Can only be used on sequences with an associated structure.

Parameters:

struc (schrodinger.structure.Structure) – The new structure for this sequence

Raises:

RuntimeError – If there’s no structure associated with this sequence object.

property visibility
exception schrodinger.protein.sequence.MultipleChainsError

Bases: ValueError

An exception raised when the specified indices span multiple chains but the operation can only be carried out on a single chain.

class schrodinger.protein.sequence.StructureSequence(st, atoms)

Bases: _AtomCollection

Class representing a sequence of protein residues.

property residue

Returns residue iterator for all residues in the sequence

schrodinger.protein.sequence.get_structure_sequences(st)

Iterates over all sequences in the given structure.

schrodinger.protein.sequence.find_generalized_pattern(sequence_list, pattern, validate_pattern=False)

Finds a generalized sequence pattern within specified sequences. NOTE: The search is performed in the forward direction only.

Parameters:
  • sequence_list – list of sequence dictionaries to search.

  • pattern (str) –

    Pattern defined using extended PROSITE syntax.

    • standard IUPAC one-letter codes are used for all amino acids

    • each element in a pattern is separated using ‘-’ symbol

    • symbol ‘x’ is used for position where any amino acid is accepted

    • ambiguities are listed using the acceptable amino acids between square brackets, e.g. [ACT] means Ala, Cys or Thr

    • amino acids not accepted for a given position are indicated by listing them between curly brackets, e.g. {GP} means ‘not Gly and not Pro’

    • repetition is indicated using parentheses, e.g. A(3) means Ala-Ala-Ala, x(2,4) means between 2 and 4 residues of any type

    • the following lowercase characters can be used as additional flags:

      • ’x’ means any amino acid

      • ’a’ means acidic residue: [DE]

      • ’b’ means basic residue: [KR]

      • ’o’ means hydrophobic residue: [ACFILPWVY]

      • ’p’ means aromatic residue: [WYF]

      • ’s’ means solvent exposed residue

      • ’h’ means helical residue

      • ’e’ means extended residue

      • ’f’ means flexible residue

    • Each position can optionally be followed by @<res_num> expression that will match the position with a given residue number.

    • Entire pattern can be followed by :<index> expression that defines a ‘hotspot’ in the pattern. When the hotspot is defined, only a single residue corresponding to (pattern_match_start+index-1) will be returned as a match. The index is 1-based and can be used to place the hotspot outside of the pattern (can also be a negative number).

    Pattern examples:

    • N-{P}-[ST] : Asn-X-Ser or Thr (X != Pro)

    • N[sf]-{P}[sf]-[ST][sf] : as above, but all residues flexible OR solvent exposed

    • Nsf-{P}sf-[ST]sf : as above, but all residues flexible AND solvent exposed

    • Ns{f} : Asn solvent exposed AND not in flexible region

    • N[s{f}] : Asn solvent exposed OR not in flexible region

    • [ab]{K}{s}f : acidic OR basic, with exception of Lys, flexible AND not solvent exposed

    • Ahe : Ala helical AND extended - no match possible

    • A[he] : Ala helical OR extended

    • A{he} : Ala coiled chain conformation (not helical nor extended)

    • [ST] : Ser OR Thr

    • ST : Ser AND Thr - no match possible

  • validate_pattern (boolean) – If True, the function will validate the pattern without performing the search (the sequence_list parameter will be ignored) and return True if the pattern is valid, or False otherwise. The default is False.

Return type:

list of lists of integer tuples or False if the pattern is invalid

Returns:

False if the specified input pattern was incorrect. Otherwise, it returns a list of lists of matches for each input sequence. Each match is a (start, end) tuple where start and end are matching sequence positions.
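The standard PROSITE part of the syntax maps directly onto regular expressions. The sketch below covers only that subset (single letters, x, [..], {..} and repetition counts) and searches a plain one-letter string; it does not handle the lowercase property flags, @<res_num> or :<index> hotspots. The function names are hypothetical, and the 0-based inclusive (start, end) convention and non-overlapping matching are assumptions of this sketch, not documented behaviour of find_generalized_pattern.

```python
import re

# One pattern element: a body, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(
    r"(?P<body>x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<min>\d+)(?:,(?P<max>\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        body = match["body"]
        if body == "x":
            regex = "."                      # any amino acid
        elif body.startswith("{"):
            regex = "[^" + body[1:-1] + "]"  # excluded amino acids
        else:
            regex = body                     # a letter or [..] is valid as-is
        if match["min"] is not None:
            if match["max"] is not None:
                regex += "{%s,%s}" % (match["min"], match["max"])
            else:
                regex += "{%s}" % match["min"]
        parts.append(regex)
    return "".join(parts)


def find_basic_pattern(sequence, pattern):
    """Return 0-based inclusive (start, end) tuples of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end() - 1) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]"))               # N[^P][ST]
print(find_basic_pattern("ANGSNPT", "N-{P}-[ST]"))  # [(1, 3)]
```

In the example, the second Asn (position 4) is followed by Pro, so the {P} element rejects it and only one match is reported.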

Converts a StructureSequence object to the dictionary required by the find_generalized_pattern function. Because the conversion can be time-consuming, it should be done once per sequence.

Optionally a list of atom SASAs for each atom in the CT can be specified. If it’s not specified, it will get calculated by calling analyze.calculate_sasa_by_atom().

Parameters:
  • seq (StructureSequence) – StructureSequence object

  • sasa_by_atom (list) – list of atom SASAs

Return type:

dict

Returns:

Dictionary of sequence information

schrodinger.protein.sequence.find_pattern(seq, pattern)

Find pattern matches in a specified StructureSequence object. Returns a list of matching positions.

Parameters:
  • seq (StructureSequence) – StructureSequence object

  • pattern (string) – Sequence pattern. The syntax is described in find_generalized_pattern.

Return type:

list of lists of integer tuples or None

Returns:

None if the specified input pattern was incorrect. Otherwise, it returns a list of lists of matches for each residue position in the input structure. Each match is a (start, end) tuple where start and end are matching sequence positions. If ‘hotspot’ is specified then start = end.

schrodinger.protein.sequence.assign_residue_numbers(residues, start_res=None, end_res=None)

Assign residue numbers to the given residues based on the residues before and after

Parameters:
  • residues (list[residue.Residue]) – Residues that need numbering. Will be modified in-place.

  • start_res (residue.Residue or NoneType) – Previous residue. Pass None if the residues are N-terminal

  • end_res (residue.Residue or NoneType) – Next residue. Pass None if the residues are C-terminal

schrodinger.protein.sequence.gen_resnums_and_inscodes(start_resnum, start_inscode, end_resnum, end_inscode)

Create a list of all residue numbers/insertion code combinations possible between the given endpoints. If the ending residue number and insertion code are less than or equal to the starting residue number and insertion code, then an empty list will be returned.

Parameters:
  • start_resnum (int) – The starting residue number.

  • start_inscode (str) – The starting insertion code.

  • end_resnum (int) – The ending residue number.

  • end_inscode (str) – The ending insertion code.

Returns:

A list of residue numbers and insertion codes

Return type:

list[tuple[int, str]]

schrodinger.protein.sequence.get_pairwise_sequence_similarity(chain1, chain2, consider_gap=True, method='muscle')

Given two single-chain sequences, align them and return the sequence similarity between them.

Parameters:
  • chain1 (structure._Chain) – The first sequence chain.

  • chain2 (structure._Chain) – The second sequence chain.

  • consider_gap (bool) – Whether or not to consider gaps in the alignment. Defaults to True.

  • method (string) – Which alignment method to use (‘muscle’ or ‘clustalw’)

Returns:

Sequence similarity of the alignment of the two.

Return type:

float, between 0.0 and 1.0

schrodinger.protein.sequence.create_alignment_from_chains(chains)

Return ProteinAlignment object comprised of two chains

Parameters:

chains (iterable(structure._Chain)) – Chains to be aligned

schrodinger.protein.sequence.align_alignment(aln, second_aln=None, method='muscle')

Perform alignment from a ProteinAlignment object

Parameters:
  • aln (ProteinAlignment) – Alignment data

  • method (string) – Which method/program to use

Returns:

Aligned sequences

Return type:

ProteinAlignment

schrodinger.protein.sequence.align_from_chains(chains, method='muscle')

Perform alignment on a series of chains

Parameters:
  • chains (iterable(structure._Chain)) – Chains to be aligned

  • method (string) – Which method/program to use (choices ‘muscle’, ‘clustalw’)

Returns:

Aligned sequences

Return type:

ProteinAlignment

schrodinger.protein.sequence.get_aligned_residues(st1, st2, method='muscle')

This generator will yield 2 structure._Residue objects - one from each structure - for each position in aligned sequences.

Parameters:
  • st1 (structure.Structure) – First structure.

  • st2 (structure.Structure) – Second structure

Returns:

Generates tuples of 2 residues that align at each position.

Return type:

generator(structure._Residue or None, structure._Residue or None)

Raises:

ValueError – if structures don’t have equivalent chains.
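The pairing of residues by alignment column can be sketched independently of the alignment program. In the example below the aligned strings and residue lists are supplied by hand, the "-" gap character and the function name are assumptions, and string labels stand in for structure._Residue objects.

```python
def pair_aligned_residues(aligned1, aligned2, residues1, residues2, gap="-"):
    """Yield (res1, res2) for each alignment column; a gap contributes None."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    i = j = 0  # positions in the ungapped residue lists
    for char1, char2 in zip(aligned1, aligned2):
        res1 = res2 = None
        if char1 != gap:
            res1 = residues1[i]
            i += 1
        if char2 != gap:
            res2 = residues2[j]
            j += 1
        yield res1, res2


pairs = list(pair_aligned_residues(
    "AC-T", "A-GT",
    ["A1", "C2", "T3"], ["A1", "G2", "T3"],
))
print(pairs)
# [('A1', 'A1'), ('C2', None), (None, 'G2'), ('T3', 'T3')]
```

Each column yields exactly one tuple, so the number of tuples equals the alignment length rather than the length of either structure's sequence.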

schrodinger.protein.sequence.get_aligned_structure_residues(sts, method='muscle')

This generator will yield a list of structure._Residue objects, one from each structure, for each position in the aligned sequences.

Parameters:

sts (list(structure.Structure)) – Structures to align

Returns:

Generates lists of residues that align at each position.

Return type:

generator(list[structure._Residue or None])

schrodinger.protein.sequence.offset_indices(indices)

Offset insertion indices based on numbering before insertion to reflect numbering after insertion.

For example, [1, 1, 2, 3, 5, 8] would be changed to [1, 2, 4, 6, 9, 13]
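The example above is consistent with shifting the n-th index (counting from zero) by n, since each earlier insertion moves everything after it one position to the right. A minimal sketch of that reading, assuming the indices are given in ascending order; the function name is hypothetical.

```python
def offset_indices_sketch(indices):
    """Shift each pre-insertion index by the number of insertions before it."""
    return [index + shift for shift, index in enumerate(indices)]


print(offset_indices_sketch([1, 1, 2, 3, 5, 8]))  # [1, 2, 4, 6, 9, 13]
```

The result is the form of index list that a method numbering gaps after insertion, such as addGapsByIndices, expects.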

schrodinger.protein.sequence.get_fasta_sequences(st, one_letter_codes={'2AS': 'D', '3AH': 'H', '5HP': 'E', 'ACL': 'R', 'AGM': 'R', 'AIB': 'A', 'ALA': 'A', 'ALM': 'A', 'ALO': 'T', 'ALY': 'K', 'ARG': 'R', 'ARM': 'R', 'ARN': 'R', 'ASA': 'D', 'ASB': 'D', 'ASH': 'D', 'ASK': 'D', 'ASL': 'D', 'ASN': 'N', 'ASP': 'D', 'ASQ': 'D', 'ASX': 'B', 'AYA': 'A', 'BCS': 'C', 'BHD': 'D', 'BMT': 'T', 'BNN': 'A', 'BUC': 'C', 'BUG': 'L', 'C5C': 'C', 'C6C': 'C', 'CCS': 'C', 'CEA': 'C', 'CGU': 'E', 'CHG': 'A', 'CLE': 'L', 'CME': 'C', 'CSD': 'A', 'CSO': 'C', 'CSP': 'C', 'CSS': 'C', 'CSW': 'C', 'CSX': 'C', 'CXM': 'M', 'CY1': 'C', 'CY3': 'C', 'CYG': 'C', 'CYM': 'C', 'CYQ': 'C', 'CYS': 'C', 'CYX': 'C', 'DAH': 'F', 'DAL': 'A', 'DAR': 'R', 'DAS': 'D', 'DCY': 'C', 'DGL': 'E', 'DGN': 'Q', 'DHA': 'A', 'DHI': 'H', 'DIL': 'I', 'DIV': 'V', 'DLE': 'L', 'DLY': 'K', 'DNP': 'A', 'DPN': 'F', 'DPR': 'P', 'DSN': 'S', 'DSP': 'D', 'DTH': 'T', 'DTR': 'W', 'DTY': 'Y', 'DVA': 'V', 'EFC': 'C', 'FLA': 'A', 'FME': 'M', 'GGL': 'E', 'GL3': 'G', 'GLH': 'E', 'GLN': 'Q', 'GLU': 'E', 'GLX': 'Z', 'GLY': 'G', 'GLZ': 'G', 'GMA': 'E', 'GSC': 'G', 'HAC': 'A', 'HAR': 'R', 'HIC': 'H', 'HID': 'H', 'HIE': 'H', 'HIP': 'H', 'HIS': 'H', 'HMR': 'R', 'HPQ': 'F', 'HSD': 'H', 'HSE': 'H', 'HSP': 'H', 'HTR': 'W', 'HYP': 'P', 'IIL': 'I', 'ILE': 'I', 'IYR': 'Y', 'KCX': 'K', 'LEU': 'L', 'LLP': 'K', 'LLY': 'K', 'LTR': 'W', 'LYM': 'K', 'LYN': 'K', 'LYS': 'K', 'LYZ': 'K', 'MAA': 'A', 'MEN': 'N', 'MET': 'M', 'MHS': 'H', 'MIS': 'S', 'MLE': 'L', 'MMO': 'R', 'MPQ': 'G', 'MSA': 'G', 'MSE': 'M', 'MVA': 'V', 'NEM': 'H', 'NEP': 'H', 'NLE': 'L', 'NLN': 'L', 'NLP': 'L', 'NMC': 'G', 'OAS': 'S', 'OCS': 'C', 'OMT': 'M', 'PAQ': 'Y', 'PCA': 'E', 'PEC': 'C', 'PHE': 'F', 'PHI': 'F', 'PHL': 'F', 'PR3': 'C', 'PRO': 'P', 'PRR': 'A', 'PTR': 'Y', 'SAC': 'S', 'SAR': 'G', 'SCH': 'C', 'SCS': 'C', 'SCY': 'C', 'SEL': 'S', 'SEP': 'S', 'SER': 'S', 'SET': 'S', 'SHC': 'C', 'SHR': 'K', 'SMC': 'C', 'SOC': 'C', 'STY': 'Y', 'SVA': 'S', 'THO': 'T', 'THR': 'T', 'TIH': 'A', 'TPL': 
'W', 'TPO': 'T', 'TPQ': 'A', 'TRG': 'K', 'TRO': 'W', 'TRP': 'W', 'TYB': 'Y', 'TYM': 'Y', 'TYO': 'Y', 'TYQ': 'Y', 'TYR': 'Y', 'TYS': 'Y', 'TYY': 'Y', 'VAL': 'V'})

Get the fasta sequences for each chain in the structure, paired with chain name

Parameters:
  • st – a structure object

  • one_letter_codes – a dictionary of residue names to one-letter codes

Returns:

tuples of chain name and fasta sequence

schrodinger.protein.sequence.get_chain_fasta_sequence(chain, one_letter_codes={'2AS': 'D', '3AH': 'H', '5HP': 'E', 'ACL': 'R', 'AGM': 'R', 'AIB': 'A', 'ALA': 'A', 'ALM': 'A', 'ALO': 'T', 'ALY': 'K', 'ARG': 'R', 'ARM': 'R', 'ARN': 'R', 'ASA': 'D', 'ASB': 'D', 'ASH': 'D', 'ASK': 'D', 'ASL': 'D', 'ASN': 'N', 'ASP': 'D', 'ASQ': 'D', 'ASX': 'B', 'AYA': 'A', 'BCS': 'C', 'BHD': 'D', 'BMT': 'T', 'BNN': 'A', 'BUC': 'C', 'BUG': 'L', 'C5C': 'C', 'C6C': 'C', 'CCS': 'C', 'CEA': 'C', 'CGU': 'E', 'CHG': 'A', 'CLE': 'L', 'CME': 'C', 'CSD': 'A', 'CSO': 'C', 'CSP': 'C', 'CSS': 'C', 'CSW': 'C', 'CSX': 'C', 'CXM': 'M', 'CY1': 'C', 'CY3': 'C', 'CYG': 'C', 'CYM': 'C', 'CYQ': 'C', 'CYS': 'C', 'CYX': 'C', 'DAH': 'F', 'DAL': 'A', 'DAR': 'R', 'DAS': 'D', 'DCY': 'C', 'DGL': 'E', 'DGN': 'Q', 'DHA': 'A', 'DHI': 'H', 'DIL': 'I', 'DIV': 'V', 'DLE': 'L', 'DLY': 'K', 'DNP': 'A', 'DPN': 'F', 'DPR': 'P', 'DSN': 'S', 'DSP': 'D', 'DTH': 'T', 'DTR': 'W', 'DTY': 'Y', 'DVA': 'V', 'EFC': 'C', 'FLA': 'A', 'FME': 'M', 'GGL': 'E', 'GL3': 'G', 'GLH': 'E', 'GLN': 'Q', 'GLU': 'E', 'GLX': 'Z', 'GLY': 'G', 'GLZ': 'G', 'GMA': 'E', 'GSC': 'G', 'HAC': 'A', 'HAR': 'R', 'HIC': 'H', 'HID': 'H', 'HIE': 'H', 'HIP': 'H', 'HIS': 'H', 'HMR': 'R', 'HPQ': 'F', 'HSD': 'H', 'HSE': 'H', 'HSP': 'H', 'HTR': 'W', 'HYP': 'P', 'IIL': 'I', 'ILE': 'I', 'IYR': 'Y', 'KCX': 'K', 'LEU': 'L', 'LLP': 'K', 'LLY': 'K', 'LTR': 'W', 'LYM': 'K', 'LYN': 'K', 'LYS': 'K', 'LYZ': 'K', 'MAA': 'A', 'MEN': 'N', 'MET': 'M', 'MHS': 'H', 'MIS': 'S', 'MLE': 'L', 'MMO': 'R', 'MPQ': 'G', 'MSA': 'G', 'MSE': 'M', 'MVA': 'V', 'NEM': 'H', 'NEP': 'H', 'NLE': 'L', 'NLN': 'L', 'NLP': 'L', 'NMC': 'G', 'OAS': 'S', 'OCS': 'C', 'OMT': 'M', 'PAQ': 'Y', 'PCA': 'E', 'PEC': 'C', 'PHE': 'F', 'PHI': 'F', 'PHL': 'F', 'PR3': 'C', 'PRO': 'P', 'PRR': 'A', 'PTR': 'Y', 'SAC': 'S', 'SAR': 'G', 'SCH': 'C', 'SCS': 'C', 'SCY': 'C', 'SEL': 'S', 'SEP': 'S', 'SER': 'S', 'SET': 'S', 'SHC': 'C', 'SHR': 'K', 'SMC': 'C', 'SOC': 'C', 'STY': 'Y', 'SVA': 'S', 'THO': 'T', 'THR': 'T', 'TIH': 'A', 
'TPL': 'W', 'TPO': 'T', 'TPQ': 'A', 'TRG': 'K', 'TRO': 'W', 'TRP': 'W', 'TYB': 'Y', 'TYM': 'Y', 'TYO': 'Y', 'TYQ': 'Y', 'TYR': 'Y', 'TYS': 'Y', 'TYY': 'Y', 'VAL': 'V'})

Get the fasta sequence for a single chain from a structure

Parameters:
  • chain (structure._Chain) – a structure chain object

  • one_letter_codes – a dictionary of residue names to one-letter codes

Returns:

a string of the chain’s fasta sequence
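The residue-name lookup behind the FASTA string can be sketched with a small subset of the default mapping. The function name is hypothetical, the input is a plain list of residue names rather than a structure._Chain, and emitting "X" for names missing from the mapping is an assumption of this sketch, not documented behaviour.

```python
# A handful of entries copied from the default one_letter_codes mapping.
ONE_LETTER_SUBSET = {
    "ALA": "A", "CYS": "C", "HID": "H",
    "MSE": "M", "SEP": "S", "VAL": "V",
}


def chain_to_fasta_sketch(residue_names, one_letter_codes=ONE_LETTER_SUBSET, unknown="X"):
    """Join one-letter codes for the given residue names.

    Names absent from the mapping become `unknown` (an assumption here).
    """
    return "".join(
        one_letter_codes.get(name.strip().upper(), unknown)
        for name in residue_names
    )


print(chain_to_fasta_sketch(["ALA", "MSE", "HID", "XYZ", "SEP"]))  # AMHXS
```

Non-standard residues such as MSE, HID and SEP map to the one-letter code of their parent amino acid, which is how the default mapping treats modified residues.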

schrodinger.protein.sequence.get_single_letter_code(res, one_letter_codes={'2AS': 'D', '3AH': 'H', '5HP': 'E', 'ACL': 'R', 'AGM': 'R', 'AIB': 'A', 'ALA': 'A', 'ALM': 'A', 'ALO': 'T', 'ALY': 'K', 'ARG': 'R', 'ARM': 'R', 'ARN': 'R', 'ASA': 'D', 'ASB': 'D', 'ASH': 'D', 'ASK': 'D', 'ASL': 'D', 'ASN': 'N', 'ASP': 'D', 'ASQ': 'D', 'ASX': 'B', 'AYA': 'A', 'BCS': 'C', 'BHD': 'D', 'BMT': 'T', 'BNN': 'A', 'BUC': 'C', 'BUG': 'L', 'C5C': 'C', 'C6C': 'C', 'CCS': 'C', 'CEA': 'C', 'CGU': 'E', 'CHG': 'A', 'CLE': 'L', 'CME': 'C', 'CSD': 'A', 'CSO': 'C', 'CSP': 'C', 'CSS': 'C', 'CSW': 'C', 'CSX': 'C', 'CXM': 'M', 'CY1': 'C', 'CY3': 'C', 'CYG': 'C', 'CYM': 'C', 'CYQ': 'C', 'CYS': 'C', 'CYX': 'C', 'DAH': 'F', 'DAL': 'A', 'DAR': 'R', 'DAS': 'D', 'DCY': 'C', 'DGL': 'E', 'DGN': 'Q', 'DHA': 'A', 'DHI': 'H', 'DIL': 'I', 'DIV': 'V', 'DLE': 'L', 'DLY': 'K', 'DNP': 'A', 'DPN': 'F', 'DPR': 'P', 'DSN': 'S', 'DSP': 'D', 'DTH': 'T', 'DTR': 'W', 'DTY': 'Y', 'DVA': 'V', 'EFC': 'C', 'FLA': 'A', 'FME': 'M', 'GGL': 'E', 'GL3': 'G', 'GLH': 'E', 'GLN': 'Q', 'GLU': 'E', 'GLX': 'Z', 'GLY': 'G', 'GLZ': 'G', 'GMA': 'E', 'GSC': 'G', 'HAC': 'A', 'HAR': 'R', 'HIC': 'H', 'HID': 'H', 'HIE': 'H', 'HIP': 'H', 'HIS': 'H', 'HMR': 'R', 'HPQ': 'F', 'HSD': 'H', 'HSE': 'H', 'HSP': 'H', 'HTR': 'W', 'HYP': 'P', 'IIL': 'I', 'ILE': 'I', 'IYR': 'Y', 'KCX': 'K', 'LEU': 'L', 'LLP': 'K', 'LLY': 'K', 'LTR': 'W', 'LYM': 'K', 'LYN': 'K', 'LYS': 'K', 'LYZ': 'K', 'MAA': 'A', 'MEN': 'N', 'MET': 'M', 'MHS': 'H', 'MIS': 'S', 'MLE': 'L', 'MMO': 'R', 'MPQ': 'G', 'MSA': 'G', 'MSE': 'M', 'MVA': 'V', 'NEM': 'H', 'NEP': 'H', 'NLE': 'L', 'NLN': 'L', 'NLP': 'L', 'NMC': 'G', 'OAS': 'S', 'OCS': 'C', 'OMT': 'M', 'PAQ': 'Y', 'PCA': 'E', 'PEC': 'C', 'PHE': 'F', 'PHI': 'F', 'PHL': 'F', 'PR3': 'C', 'PRO': 'P', 'PRR': 'A', 'PTR': 'Y', 'SAC': 'S', 'SAR': 'G', 'SCH': 'C', 'SCS': 'C', 'SCY': 'C', 'SEL': 'S', 'SEP': 'S', 'SER': 'S', 'SET': 'S', 'SHC': 'C', 'SHR': 'K', 'SMC': 'C', 'SOC': 'C', 'STY': 'Y', 'SVA': 'S', 'THO': 'T', 'THR': 'T', 'TIH': 'A', 
'TPL': 'W', 'TPO': 'T', 'TPQ': 'A', 'TRG': 'K', 'TRO': 'W', 'TRP': 'W', 'TYB': 'Y', 'TYM': 'Y', 'TYO': 'Y', 'TYQ': 'Y', 'TYR': 'Y', 'TYS': 'Y', 'TYY': 'Y', 'VAL': 'V'})