schrodinger.protein.sequence module¶
Implementation of the ProteinSequence, Sequence, and StructureSequence classes.
StructureSequence allows iteration over all sequences in a given protein CT, and iteration over residues of each (in sequence order).
- schrodinger.protein.sequence.guess_seq_type(res_names)¶
Takes an iterable of residue names and returns the appropriate sequence class. Note that the class is chosen with a fairly simple heuristic.
- Parameters
res_names (Iterable(str)) – An iterable of residue names. Note that all residue names must be uppercase.
- Returns
The appropriate class for the input sequence
- Return type
- schrodinger.protein.sequence.make_sequence(elements, *args, **kwargs)¶
Guesses the appropriate Sequence type from the names of the residues and returns an instance containing those residues.
- Parameters
elements (list(str)) – A list of strings to examine
- Return type
protein.Sequence
- Returns
An instance of the appropriate type
- class schrodinger.protein.sequence.Sequence¶
Bases:
object
- class schrodinger.protein.sequence.AbstractSequence(*args, **kwargs)¶
Bases:
schrodinger.protein.sequence.Sequence, PyQt6.QtCore.QObject
A base class for single-chain and combined-chain biological sequences.
- Variables
ORIGIN (enum.Enum) – Possible sequence origins
AnnotationClass (annotation.SequenceAnnotations) – Class to use for annotations
ElementClass (Type[residue.Residue]) – Class to use for elements
ALPHABET (dict(str, residue.ElementType)) – A mapping of string representations of elements to element types. Concrete subclasses must define this.
_gap_chars (tuple(str)) – A tuple of permissible gap characters in the element list; the first item will be used for serialization.
_unknown_res_type (residue.ElementType) – The type for an unknown residue
residuesChanged (QtCore.pyqtSignal) – A signal emitted when sequence residues are changed. Emitted with the indices of the first and last changed residues.
lengthAboutToChange (QtCore.pyqtSignal) – A signal emitted when the sequence length is about to change. Emitted with the old and new lengths.
lengthChanged (QtCore.pyqtSignal) – A signal emitted when the sequence length is changed. Emitted with the old and new lengths.
nameChanged (QtCore.pyqtSignal) – A signal emitted when the sequence name is changed.
visibilityChanged (QtCore.pyqtSignal) – A signal emitted when the visibility is changed.
structureChanged (QtCore.pyqtSignal) – A signal emitted when the structure changes.
annotationTitleChanged (QtCore.pyqtSignal) – A signal emitted when an annotation title is changed.
- class ORIGIN(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- Maestro = 1¶
- PyMOL = 2¶
- AnnotationClass = None¶
- ElementClass¶
alias of
schrodinger.protein.residue.Residue
- ALPHABET = {}¶
- residuesRemoved¶
pyqtSignal(*types, name: str = …, revision: int = …, arguments: Sequence = …) -> PYQT_SIGNAL
types is normally a sequence of individual types. Each type is either a type object or a string that is the name of a C++ type. Alternatively each type could itself be a sequence of types each describing a different overloaded signal. name is the optional C++ name of the signal. If it is not specified then the name of the class attribute that is bound to the signal is used. revision is the optional revision of the signal that is exported to QML. If it is not specified then 0 is used. arguments is the optional sequence of the names of the signal’s arguments.
- residuesAdded¶
- residuesChanged¶
- lengthAboutToChange¶
- lengthChanged¶
- nameChanged¶
- visibilityChanged¶
- structureChanged¶
- annotationTitleChanged¶
- pfamChanged¶
- descriptorsCleared¶
- __init__(*args, **kwargs)¶
- residues()¶
Return an iterable of residues, ignoring gaps.
- Returns
Iterable of residues
- Return type
iter(Residue)
- getNextResidue(res)¶
Return the next residue in the sequence (ignoring gaps) or None if this is the last residue.
- Parameters
res (schrodinger.protein.residue.Residue) – A given residue in the sequence
- Returns
The next residue in the sequence
- Return type
- getPreviousResidue(res)¶
Return the previous residue in the sequence (ignoring gaps) or None if this is the first residue.
- Parameters
res (schrodinger.protein.residue.Residue) – A given residue in the sequence
- Returns
The previous residue in the sequence
- Return type
- iterNeighbors()¶
Return an iterable of three element tuples consisting of (prev_res, curr_res, next_res), ignoring gaps.
None is used for neighbors of first and last residues in the sequence, and does not indicate gaps here.
- Returns
Iterable of 3-tuples, each element of each tuple being either a schrodinger.protein.residue.Residue or None
- Return type
iter(tuple(Residue or NoneType, Residue, Residue or NoneType))
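The tuple layout described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name iter_neighbors, the use of plain strings for residues, and the use of None for gaps in the input list are all assumptions made for illustration.

```python
from typing import Iterator, Optional, Sequence, Tuple

GAP = None  # assumption: gaps are represented by None in this sketch


def iter_neighbors(
    elements: Sequence[Optional[str]],
) -> Iterator[Tuple[Optional[str], str, Optional[str]]]:
    """Yield (prev, curr, next) for every non-gap element."""
    # Drop gaps first so neighbors are always residues.
    residues = [e for e in elements if e is not GAP]
    for i, curr in enumerate(residues):
        # None marks "no neighbor" at either end, not a gap.
        prev_res = residues[i - 1] if i > 0 else None
        next_res = residues[i + 1] if i + 1 < len(residues) else None
        yield prev_res, curr, next_res


triples = list(iter_neighbors(["A", None, "C", "D"]))
```

Here the gap between "A" and "C" is skipped, so "C" is reported as the next neighbor of "A".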
- index(res)¶
Returns the index of the specified residue.
- Parameters
res (residue.Residue) – The residue to find
- Returns
The index of the residue
- Return type
int
- indices(residues)¶
Returns the indices of all specified residues. Note that there is no guarantee that the returned integers will be in the same order as the input residues. (For combined-chain sequences, it’s highly likely that they won’t be.)
- Parameters
residues (Iterable(residue.Residue)) – The residues to find indices of
- Returns
The indices of the residues
- Return type
list[int]
- getRun(res)¶
For a given residue or gap, return a set of all adjacent element indices in the sequence that are also residues or gaps.
- Parameters
res (residue.AbstractSequenceElement) – Residue to get the run of
- Returns
Set of residue indices in the run
- Return type
set(int)
- insertElements(index, elements)¶
Insert a list of elements or sequence element into this sequence.
If elements is a string or iterable of strings, residue numbers will be automatically assigned.
- Parameters
index (int) – The index at which to insert elements
elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert
- mutate(start, end, elements)¶
Mutate sequence elements starting at the given index to the provided elements.
- Parameters
start (int) – The index at which to start mutating
end (int) – The index of the last mutated element (exclusive)
elements (iterable(self.ElementClass) or iterable(str)) – The elements to which to mutate the sequence
- append(element)¶
Appends an element to the sequence
- Parameters
element (self.ElementClass or str) – The element to append to this sequence
- extend(elements)¶
Extends the sequence with elements from an iterable
- Parameters
elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence
- getSubsequence(start, end)¶
Return a sequence containing a subset of the elements in this one
- Parameters
start (int) – The index at which the subsequence should start
end (int) – The index at which the subsequence should end (exclusive)
- Returns
The requested subsequence
- Return type
- removeElements(eles)¶
Remove elements from the sequence.
- Parameters
eles (list(residue.AbstractSequenceElement)) – A list of elements to remove from the sequence.
- Raises
ValueError – If any of the given elements are not in the sequence.
- property gap_char¶
- getGaplessLength()¶
- Returns
Length of this sequence ignoring gaps
- Return type
int
- getGaps()¶
- Return type
list(residue.Gap)
- Returns
The gaps in the sequence.
- addGapsByIndices(gap_idxs)¶
Add gaps to the sequence from a list of gap indices. Note that these indices are based on numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.
- Parameters
gap_idxs (list(int)) – A list of gap indices
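To illustrate what "numbering after the insertion" means, here is a minimal pure-Python sketch on a plain list of characters. It is not the library's implementation; the function name add_gaps_by_indices and the "-" gap character are assumptions, and no index validation is performed.

```python
def add_gaps_by_indices(elements, gap_idxs, gap="-"):
    """Return a copy in which every index in gap_idxs holds a gap afterwards."""
    result = list(elements)
    # Inserting in ascending order keeps gaps that were already placed
    # at their final positions.
    for idx in sorted(gap_idxs):
        result.insert(idx, gap)
    return result


aligned = add_gaps_by_indices("ABC", [1, 3])
```

In the result, positions 1 and 3 are gaps, which is exactly what the input indices asked for.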
- validateGapIndices(gap_idxs)¶
Make sure that the specified gap indices are valid input for addGapsByIndices. If the indices are invalid, a ValueError will be raised. Indices are considered invalid if they are negative or if they refer to a position that’s more than one residue past the end of the sequence.
- Parameters
gap_idxs (list[int]) – The gap indices to validate
- addGapsBeforeIndices(indices)¶
Add one gap to the alignment before each of the specified residue positions. Note that these indices are based on numbering before the insertion. To insert gaps using indices based on numbering after the insertion, see addGapsByIndices.
- Parameters
indices (list(int)) – A list of indices to insert gaps before.
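The pre-insertion numbering can be sketched in plain Python as follows. This is an illustration only, not the library's code; the function name add_gaps_before_indices and the "-" gap character are assumptions.

```python
def add_gaps_before_indices(elements, indices, gap="-"):
    """Return a copy with one gap inserted before each original position."""
    result = list(elements)
    # Working from the highest index down means an insertion never
    # shifts a position that still has to be processed.
    for idx in sorted(indices, reverse=True):
        result.insert(idx, gap)
    return result


shifted = add_gaps_before_indices("ABC", [1, 2])
```

The indices 1 and 2 refer to "B" and "C" in the original, ungapped list, so a gap ends up in front of each of them.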
- removeTerminalGaps()¶
Remove gaps from the end of the sequence
- getTerminalGaps()¶
Return terminal gaps.
- Returns
A list of terminal gaps (in ascending index order)
- Return type
list(residue.Gap)
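A minimal sketch of locating terminal gaps, assuming (as in removeTerminalGaps) that "terminal" means the trailing end of the sequence. The function name terminal_gap_indices and the "-" gap character are hypothetical, and the sketch returns indices rather than residue.Gap objects.

```python
def terminal_gap_indices(elements, gap="-"):
    """Return the indices of the trailing gaps, in ascending order."""
    end = len(elements)
    start = end
    # Walk backwards until the first non-gap element is found.
    while start > 0 and elements[start - 1] == gap:
        start -= 1
    return list(range(start, end))


trailing = terminal_gap_indices(list("A-B--"))
```

Only the two gaps after "B" are reported; the internal gap at index 1 is not terminal.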
- getGapCount()¶
- Returns
the number of gaps in the sequence
- Return type
int
- removeAllGaps()¶
Remove gaps from the sequence
- property annotations¶
- getAnnotation(index, annotation)¶
Returns the annotation at the specified index or None for a gap.
- Raises
ValueError – if the annotation is not available
- getAnnotationValueForComparison(col, anno, ann_index=0, cdr_scheme=None, vernier=False)¶
Get an annotation value appropriate for determining whether two residues have the same annotation value. e.g. for binding site, only same-distance contacts of the same ligand should compare equal.
- Parameters
anno (annotation.ProteinSequenceAnnotations.ANNOTATION_TYPES) – Protein sequence annotation enum member
ann_index (int) – Annotation index for multi-value annotations
cdr_scheme (annotation.AntibodyCDRScheme) – CDR scheme for antibody annotation. (Will be ignored if anno is not SEQ_ANNO_TYPES.antibody_cdr.)
vernier (bool) – Whether Vernier Zones are enabled
- Returns
Annotation value for comparison or None if the corresponding residue is a gap or has no value for this annotation
- Return type
object or None
- getNumAnnValues(ann)¶
- clearAllCaching()¶
This method should be implemented in subclasses that cache any data.
- setPfam(new_pfam, pfam_name)¶
- clearPfam()¶
- hasPfam()¶
- hasStructure()¶
- Returns
Whether this sequence has an associated structure.
- Return type
bool
- getStructure()¶
- Returns
The associated structure. Will return None if there is no associated structure.
- Return type
schrodinger.structure.Structure or NoneType
- setStructure(struc)¶
Set the associated structure. Can only be used on sequences with an associated structure.
- Parameters
struc (schrodinger.structure.Structure) – The new structure for this sequence
- Raises
RuntimeError – If there’s no structure associated with this sequence object.
- getIdentity(other, consider_gaps=True, only_consider=None)¶
Return a float scoring the identity between the sequence and another sequence, assuming that they’re already aligned
- Parameters
other (schrodinger.protein.sequence.Sequence) – A sequence to compare against
consider_gaps (bool) – Whether we should count gaps when we’re calculating the average score.
- Returns
The sequence identity score (between 0.0 and 1.0)
- Return type
float
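As a rough illustration of an identity score over two pre-aligned sequences, here is a pure-Python sketch. It is an assumption about the general idea, not the library's algorithm: the function name identity, the "-" gap character, the treatment of gapped columns as mismatches when consider_gaps is True, and the omission of only_consider are all choices made for this example.

```python
def identity(seq_a, seq_b, consider_gaps=True, gap="-"):
    """Fraction of aligned columns that hold the same residue."""
    matches = 0
    columns = 0
    for a, b in zip(seq_a, seq_b):
        has_gap = a == gap or b == gap
        if has_gap and not consider_gaps:
            continue  # leave gapped columns out of the average
        columns += 1
        if not has_gap and a == b:
            matches += 1
    # Guard against empty input or all-gap alignments.
    return matches / columns if columns else 0.0


with_gaps = identity("AC-D", "ACGD", consider_gaps=True)
without_gaps = identity("AC-D", "ACGD", consider_gaps=False)
```

With gaps counted, three of four columns match; with gapped columns skipped, all three remaining columns match.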
- getSimilarity(other, consider_gaps=True, only_consider=None)¶
Return a float score of the similarity count between the sequence and another sequence, assuming that they’re already aligned.
- Parameters
other (schrodinger.protein.sequence.Sequence) – A sequence to compare against
consider_gaps (bool) – Whether we should count gaps when we’re calculating the average score.
- Returns
The sequence similarity score (between 0.0 and 1.0)
- Return type
float
- getConservation(other, consider_gaps=True, only_consider=None)¶
Return a float scoring the homology conservation between the sequence and another sequence, assuming that they’re already aligned.
The homology criterion is based on “side chain chemistry” descriptor matching.
- Parameters
other (schrodinger.protein.sequence.Sequence) – A sequence to compare against
consider_gaps (bool) – Whether we should count gaps when we’re calculating the average score.
- Returns
The sequence conservation score (between 0.0 and 1.0)
- Return type
float
- getSimilarityScore(other, consider_gaps=True, only_consider=None)¶
Return the total score of similarity between the sequence and another sequence, assuming that they’re already aligned.
- Parameters
other (schrodinger.protein.sequence.Sequence) – A sequence to compare against
consider_gaps (bool) – Ignored because the similarity with a gap is 0.0
only_consider (set or None) – A set of residues to restrict attention to
- Returns
The total sequence similarity score
- Return type
int
- getStructureResForRes(res)¶
- Parameters
res (residue.Residue) – Residue to get structure residue for
- Returns
Structure residue or None if no matching residue is found
- Return type
schrodinger.structure._Residue or NoneType
- structuredResidueCount()¶
Get the number of residues in this sequence with an associated structured residue.
- Return type
int
- hasStructuredResidues()¶
Return whether this sequence has any structured residues. This method is equivalent to bool(seq.structuredResidueCount()) but doesn’t (typically) require iterating through the entire sequence.
- Return type
bool
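The difference between counting and short-circuiting can be sketched in plain Python. The Res class and its structured flag are hypothetical stand-ins for real residues; this is not how the library stores structure information.

```python
from dataclasses import dataclass


@dataclass
class Res:
    name: str
    structured: bool  # hypothetical flag for "has a structured residue"


def structured_residue_count(residues):
    # Always visits every residue.
    return sum(1 for r in residues if r.structured)


def has_structured_residues(residues):
    # any() stops at the first structured residue it finds.
    return any(r.structured for r in residues)


chain = [Res("ALA", False), Res("GLY", True), Res("SER", True)]
same_answer = has_structured_residues(chain) == bool(structured_residue_count(chain))
```

Both functions agree on the answer; the second can simply return earlier on long sequences.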
- getProperty(seq_prop)¶
Get the sequence’s value corresponding to the given SequenceProperty object
- Parameters
seq_prop (schrodinger.protein.properties.SequenceProperty) – The object describing the sequence property
- Returns
The value of the sequence property
- Return type
float or None
- updateDescriptors(descriptors, property_source)¶
Updates the descriptor dicts with new descriptor values
- Parameters
descriptors (dict[str, float]) – A dict mapping descriptor names to their values
property_source (properties.PropertySource) – The source of the descriptors
- clearDescriptors()¶
- property descriptors¶
- hasDescriptors()¶
- class schrodinger.protein.sequence.AbstractSingleChainSequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=None)¶
Bases:
schrodinger.protein.sequence.AbstractSequence
Base class for single-chain biological sequences
Note: Protein-specific functionality should go in ProteinSequence.
- Variables
sequenceCopied (QtCore.pyqtSignal) – A signal emitted when this sequence is copied. Emitted with the sequence being copied and the newly created copy. This signal is used by the structure model to make sure that the newly created copy is kept in sync with the structure.
- sequenceCopied¶
- __init__(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=None)¶
Make a sequence object from a list of strings and/or self.ElementClass. Strings are converted to self.ElementClass using a mapping of strings to element types.
- Parameters
elements (iterable(self.ElementClass or None) or iterable(str or None)) – An iterable of elements making up the sequence. None values are interpreted as gaps.
name (str) – The name of the sequence (possibly shortened), used for display purposes. For a FASTA sequence, this should be a short identifier such as the UniProt ID.
origin (Sequence.ORIGIN or None) – A piece of metadata indicating where the sequence came from
entry_id (str) – An entry associated with the sequence, if any
entry_name (str) – An entry name associated with the sequence, if any
pdb_id (str) – An id associated with the sequence, if any
chain (str) – The chain to which the sequence belongs
structure_chain (str) – The chain of the structure this sequence is associated with. If the sequence has a structure, this is usually, but not necessarily, the same as chain.
long_name (str) – The full name of the sequence. For a FASTA sequence, this should be the full FASTA header.
resnums (Iterable(int)) – Residue numbers to assign to elements. If not given, residues will be numbered starting from one. Regardless of what’s given here, no residue numbering will occur if elements is an iterable of ElementClass and any element already has a residue number set. If this iterable is shorter than elements, additional numbers will be generated by incrementing the last number present.
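The resnums numbering rule can be sketched in plain Python. This is an illustration under stated assumptions, not the constructor's actual code: the function name assign_resnums is hypothetical, gaps are ignored, and truncating a resnums iterable that is longer than the sequence is a choice made for this example.

```python
def assign_resnums(n_elements, resnums=None):
    """Return n_elements residue numbers following the documented rule."""
    # Assumption: surplus numbers are simply dropped.
    numbers = list(resnums or [])[:n_elements]
    # Start from one if nothing was given, else continue after the last number.
    next_num = numbers[-1] + 1 if numbers else 1
    missing = n_elements - len(numbers)
    numbers.extend(range(next_num, next_num + missing))
    return numbers


default_numbers = assign_resnums(4)
continued_numbers = assign_resnums(4, [10, 11])
```

With no resnums the numbering starts at one; with a short list it continues from the last number supplied.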
- __len__()¶
- __contains__(item)¶
- classmethod makeSeqElement(element)¶
- Parameters
element (str or cls.ElementClass) – A sequence element or string representation thereof
- Returns
sequence element
- Return type
cls.ElementClass
- classmethod isValid(elements)¶
- Parameters
elements (iterable(str) or str) – An iterable of string representations of elements making up the sequence
- Returns
Tuple indicating whether valid and a set of invalid characters, if any
- Return type
tuple(bool, set(str))
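The shape of the return value can be shown with a small pure-Python sketch. The alphabet and gap characters below are hypothetical placeholders, not the library's ALPHABET or _gap_chars, and accepting gap characters as valid is an assumption.

```python
# Hypothetical alphabet: the 20 standard amino acid one-letter codes.
ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")
GAP_CHARS = ("-", ".")  # hypothetical gap characters


def is_valid(elements):
    """Return (valid, invalid_characters) for an iterable of element strings."""
    allowed = ALPHABET | set(GAP_CHARS)
    invalid = {e for e in elements if e not in allowed}
    return (not invalid, invalid)


ok, bad_chars = is_valid("AC-D")
rejected, offenders = is_valid("AXZ")
```

A caller can use the returned set to report exactly which characters were rejected.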
- property name¶
- property chain¶
- property fullname¶
- Returns
a formatted name + optional chain name for the sequence
- Return type
str
- property visibility¶
- index(res, ignore_gaps=False)¶
Returns the index of the specified residue
- Parameters
res (residue.Residue) – The residue to find
ignore_gaps (bool) – Whether the index returned should ignore gaps in the sequence or not.
- Raises
ValueError – If the residue is not present
- Return type
int
- Returns
The index of the residue
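The effect of ignore_gaps can be sketched on a plain list of characters. The function name index_of and the "-" gap character are assumptions for illustration; real sequences hold residue objects rather than strings.

```python
def index_of(elements, target, ignore_gaps=False, gap="-"):
    """Return the position of target, optionally counting only non-gap elements."""
    items = list(elements)
    if ignore_gaps:
        items = [e for e in items if e != gap]
    # list.index raises ValueError when target is absent.
    return items.index(target)


gapped_index = index_of("A--C", "C")
gapless_index = index_of("A--C", "C", ignore_gaps=True)
```

The same residue sits at index 3 in the gapped sequence but at index 1 once gaps are left out of the count.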
- indices(residues, ignore_gaps=False)¶
Returns the indices of all specified residues. Note that there is no guarantee that the returned integers will be in the same order as the input residues. (For combined-chain sequences, it’s highly likely that they won’t be.)
- Parameters
residues (Iterable(residue.Residue)) – The residues to find indices of
ignore_gaps (bool) – Whether the indices returned should ignore gaps in the sequence or not.
- Returns
The indices of the residues
- Return type
list[int]
- insertElements(index, elements)¶
Insert a list of elements or sequence element into this sequence.
If elements is a string or iterable of strings, residue numbers will be automatically assigned.
- Parameters
index (int) – The index at which to insert elements
elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert
- mutate(start, end, elements)¶
Mutate sequence elements starting at the given index to the provided elements.
- Parameters
start (int) – The index at which to start mutating
end (int) – The index of the last mutated element (exclusive)
elements (iterable(self.ElementClass) or iterable(str)) – The elements to which to mutate the sequence
- append(element)¶
Appends an element to the sequence
- Parameters
element (self.ElementClass or str) – The element to append to this sequence
- extend(elements)¶
Extends the sequence with elements from an iterable
- Parameters
elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence
- getSubsequence(start, end)¶
Return a sequence containing a subset of the elements in this one
- Parameters
start (int) – The index at which the subsequence should start
end (int) – The index at which the subsequence should end (exclusive)
- Returns
The requested subsequence
- Return type
- removeElements(eles)¶
Remove elements from the sequence.
- Parameters
eles (list(residue.AbstractSequenceElement)) – A list of elements to remove from the sequence.
- Raises
ValueError – If any of the given elements are not in the sequence.
- getGaplessLength()¶
- Returns
Length of this sequence ignoring gaps
- Return type
int
- addGapsByIndices(gap_idxs)¶
Add gaps to the sequence from a list of gap indices. Note that these indices are based on numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.
- Parameters
gap_idxs (list(int)) – A list of gap indices
- setSSA(new_ssa)¶
- setSSAPredictions(pred)¶
- setDisorderedRegionsPredictions(pred)¶
- setDomainArrangementPredictions(pred)¶
- setSolventAccessibilityPredictions(pred)¶
- deletePrediction(prediction_type)¶
- deleteAllPredictions()¶
- property origin¶
- Returns
A piece of metadata indicating where the sequence came from
- Return type
Sequence.ORIGIN or None
- hasStructure()¶
- Returns
Whether this sequence has an associated structure.
- Return type
bool
- getStructure()¶
- Returns
The associated structure. Will return None if there is no associated structure.
- Return type
schrodinger.structure.Structure or NoneType
- setStructure(struc)¶
Set the associated structure. Can only be used on sequences with an associated structure.
- Parameters
struc (schrodinger.structure.Structure) – The new structure for this sequence
- Raises
RuntimeError – If there’s no structure associated with this sequence object.
- onStructureChanged()¶
- setResidueMap(residue_map)¶
Set a new mapping between ResidueKey and structured residues.
Note: the only intended user of this method is schrodinger.application.msv.seqio.StructureConverter, where the ResidueKey is computed from the structure._Residue used to create the residue.Residue. If the sequence has a structure, the map can be generated using generateResidueMap.
- Parameters
residue_map (dict(residue.ResidueKey, residue.Residue)) – Mapping between residue key and Residue
- generateResidueMap()¶
Create residue map based on current structured residues.
Note: this method requires self.hasStructure() to be True and self.entry_id to be set. If this sequence was produced by schrodinger.application.msv.seqio.StructureConverter, there should already be a residue map and this method does not need to be called.
- Raises
RuntimeError – If sequence has no structure or entry id
- getResByKey(res_key)¶
- Parameters
res_key (residue.ResidueKey) – Residue key: (entry_id, chain, resnum, inscode)
- Returns
Residue matching key or None if no matching residue is found
- Return type
residue.Residue or NoneType
- Raises
RuntimeError – If sequence has no structure
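The key-based lookup can be pictured as a dictionary keyed by (entry_id, chain, resnum, inscode). The sketch below uses a namedtuple as a stand-in for residue.ResidueKey and plain strings as stand-ins for residues; both, along with the function name get_res_by_key, are assumptions for illustration.

```python
from collections import namedtuple

# Stand-in for residue.ResidueKey, using the field order given above.
ResidueKey = namedtuple("ResidueKey", "entry_id chain resnum inscode")

# Hypothetical residue map; values would be residue.Residue objects.
residue_map = {
    ResidueKey("1", "A", 10, " "): "ALA 10",
    ResidueKey("1", "A", 11, " "): "GLY 11",
}


def get_res_by_key(mapping, key):
    """Return the residue for key, or None if there is no match."""
    return mapping.get(key)


found = get_res_by_key(residue_map, ResidueKey("1", "A", 11, " "))
missing = get_res_by_key(residue_map, ResidueKey("1", "B", 11, " "))
```

A key that differs in any field, such as the chain here, yields None rather than raising.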
- getStructureResForRes(res)¶
- Parameters
res (residue.Residue) – Residue to get structure residue for
- Returns
Structure residue or None if no matching residue is found
- Return type
schrodinger.structure._Residue or NoneType
- class schrodinger.protein.sequence.ProteinSequenceMeta¶
Bases:
PyQt6.sip.wrappertype
Metaclass for split-chain and combined-chain protein sequences
- property ALPHABET¶
- class schrodinger.protein.sequence.AbstractProteinSequenceMixin(*args, **kwargs)¶
Bases:
object
A mixin for code shared between split-chain and combined-chain protein sequences.
- disulfideBondsCacheCleared¶
- predictionsChanged¶
- secondaryStructureChanged¶
- kinaseFeaturesChanged¶
- kinaseConservationChanged¶
- property ALPHABET¶
- __init__(*args, **kwargs)¶
- property disulfide_bonds¶
- Returns
A sorted tuple of the valid disulfide bonds.
- Return type
tuple(residue.DisulfideBond)
- property secondary_structures¶
A list of _SecondaryStructure namedtuples containing the type of secondary structure and where the secondary structures begin and end.
- Returns
A list of namedtuples containing an SS_TYPE from schrodinger.structure and the residue indexes marking the limits of the secondary structure.
- Return type
list(_SecondaryStructure)
- clearDisulfideBondsCache()¶
- isKinaseChain() → bool¶
- hasDisorderedRegionsPredictions()¶
- hasDisulfideBondPredictions()¶
- hasDomainArrangementPredictions()¶
- hasSolventAccessibility()¶
- hasSSAPredictions()¶
- property pred_secondary_structures¶
- class schrodinger.protein.sequence.ProteinSequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)¶
Bases:
schrodinger.models.json.JsonableClassMixin
,schrodinger.protein.sequence.AbstractProteinSequenceMixin
,schrodinger.protein.sequence.AbstractSingleChainSequence
A single-chain protein sequence.
- Variables
secondaryStructureCacheCleared – A signal emitted when the secondary structure cache has been cleared. Used to keep the CombinedChainProteinSequence cache in sync. If listening for changes in the secondary structure values, use secondaryStructureChanged instead.
- AnnotationClass¶
alias of
schrodinger.protein.annotation.ProteinSequenceAnnotations
- ElementClass¶
alias of
schrodinger.protein.residue.Residue
- secondaryStructureCacheCleared¶
- __init__(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)¶
See AbstractSingleChainSequence for additional documentation.
- Parameters
disulfide_bonds (Iterable(tuple(int, int))) – A list of pairs of residue indices to link via disulfide bonds.
pred_disulfide_bonds (Iterable(tuple(int, int))) – A list of pairs of residue indices to link via predicted disulfide bonds.
- toJsonImplementation()¶
Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.
- Returns
A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders
- classmethod fromJsonImplementation(json_obj)¶
Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.
- Parameters
json_obj (dict) – A dictionary loaded from a JSON string or file.
- Returns
An instance of the derived class.
- Return type
cls
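The round-trip contract between toJsonImplementation and fromJsonImplementation can be sketched with a plain class. This is a minimal illustration only: SimpleSequence and its three fields are hypothetical stand-ins and do not use JsonableClassMixin or reproduce what ProteinSequence actually serializes.

```python
import json


class SimpleSequence:
    """Hypothetical stand-in; not part of schrodinger.protein.sequence."""

    def __init__(self, elements="", name="", chain=""):
        self.elements = elements
        self.name = name
        self.chain = chain

    def toJsonImplementation(self):
        # Return only JSON-native types (str, int, list, dict, ...).
        return {
            "elements": self.elements,
            "name": self.name,
            "chain": self.chain,
        }

    @classmethod
    def fromJsonImplementation(cls, json_obj):
        # Rebuild an instance from the dictionary produced above.
        return cls(**json_obj)


original = SimpleSequence("ACDE", name="demo", chain="A")
payload = json.dumps(original.toJsonImplementation())
restored = SimpleSequence.fromJsonImplementation(json.loads(payload))
```

After the round trip, restored carries the same elements, name, and chain as original.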
- classmethod adapter47007(json_dict)¶
- classmethod adapter48002(json_dict)¶
- classmethod adapter52065(json_dict)¶
- classmethod isValid(elements)¶
- Parameters
elements (iterable(str) or str) – An iterable of string representations of elements making up the sequence
- Returns
Tuple indicating whether valid and a set of invalid characters, if any
- Return type
tuple(bool, set(str))
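The documented return shape, a validity flag plus the set of offending characters, can be sketched in a few lines. The function name is_valid, the 20-letter alphabet, and the gap character below are assumptions made for illustration; the real method checks against the class's own ALPHABET and gap characters.

```python
def is_valid(elements, alphabet, gap_chars=("-",)):
    """Return (valid, invalid_elements) for an iterable of element strings.

    `alphabet` and `gap_chars` are illustrative assumptions, not the
    values used by ProteinSequence.
    """
    allowed = set(alphabet) | set(gap_chars)
    invalid = {ele for ele in elements if ele not in allowed}
    return (not invalid, invalid)


protein_letters = "ACDEFGHIKLMNPQRSTVWY"
ok = is_valid("ACD-EF", protein_letters)
bad = is_valid("AC1D?", protein_letters)
```

Here ok reports a valid sequence with an empty set, while bad reports the two characters that fall outside the assumed alphabet.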
- renumberResidues(new_rescode_map)¶
Renumber residues in this sequence given a dictionary mapping old rescodes to the new desired rescodes.
- isKinaseChain()¶
- property is_kinase_annotated¶
- setKinaseFeatures(feature_map: Dict[schrodinger.protein.residue.Residue, schrodinger.protein.annotation.KinaseFeatureLabel])¶
- Parameters
feature_map – Map of residue to kinase feature label
- property is_kinase_cons_annotated¶
- setKinaseConservation(cons_map, lig_asl)¶
- Parameters
cons_map (dict[residue.Residue, annotation.KinaseConservation]) – Map of residue to conservation
lig_asl (str) – ASL of the associated ligand
- property disulfide_bonds¶
- Returns
A sorted tuple of the valid disulfide bonds.
- Return type
tuple(residue.DisulfideBond)
- property pred_disulfide_bonds¶
- removeStructurelessResidues(start=0, end=None)¶
Remove any structureless residues
- Parameters
start (int) – The index at which to start filtering structureless residues.
end (int) – The index at which to end filtering
- encodeForPatternSearch(with_ss=False, with_flex=False, with_asa=False)¶
Convert to the sequence dict expected by find_generalized_pattern.
- Parameters
with_ss (bool) – Whether to include secondary structure information.
with_flex (bool) – Whether to include flexibility information.
with_asa (bool) – Whether to include accessible surface area information.
- Return type
dict
- Returns
dictionary of sequence data
- clearAllCaching()¶
This method should be implemented in subclasses that cache any data.
- setGPCRAnnotations(gpcr_annotation_map)¶
- class schrodinger.protein.sequence.NucleicSequenceMeta¶
Bases:
schrodinger.protein.sequence.ProteinSequenceMeta
- ALPHABET = None¶
- class schrodinger.protein.sequence.NucleicAcidSequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)¶
Bases:
schrodinger.protein.sequence.ProteinSequence
- AnnotationClass¶
alias of
schrodinger.protein.annotation.NucleicAcidSequenceAnnotations
- ElementClass¶
- ALPHABET = None¶
- COMPLEMENT_FN = None¶
- REVERSE_COMPLEMENT_FN = None¶
- property is_kinase_cons_annotated¶
- property is_kinase_annotated¶
- isKinaseChain()¶
- getTranslation()¶
Get a translated sequence. This method uses BioPython’s translate method to convert a nucleic acid sequence into an amino acid sequence
- Returns
A translated protein sequence. The name and chain from the nucleic acid sequence are copied over
- Return type
- getReverse()¶
Get the reverse of a DNA or RNA sequence. Residues are renumbered. Creates a new Sequence; original object is unmodified.
- Returns
The reversed nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.
- Return type
- getComplement()¶
Get the complement of a DNA or RNA sequence. This method uses BioPython’s complement method for nucleic acid sequences. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.
- Returns
The complementary nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.
- Return type
- getReverseComplement()¶
Get the reverse complement of a DNA or RNA sequence. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.
- Returns
The reverse complement nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.
- Return type
- encodeForPatternSearch(with_ss=False, with_flex=False, with_asa=False)¶
Convert to the sequence dict expected by find_generalized_pattern.
- Parameters
with_ss (bool) – Whether to include secondary structure information.
with_flex (bool) – Whether to include flexibility information.
with_asa (bool) – Whether to include accessible surface area information.
- Return type
dict
- Returns
dictionary of sequence data
- class schrodinger.protein.sequence.DNASequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)¶
Bases:
schrodinger.protein.sequence.NucleicAcidSequence
- ALPHABET = mappingproxy({'DA': DeoxyribonucleotideType('A', 'DA', 'Adenine'), 'DC': DeoxyribonucleotideType('C', 'DC', 'Cytosine'), 'DG': DeoxyribonucleotideType('G', 'DG', 'Guanine'), 'DT': DeoxyribonucleotideType('T', 'DT', 'Thymine'), 'A': DeoxyribonucleotideType('A', 'DA', 'Adenine'), 'C': DeoxyribonucleotideType('C', 'DC', 'Cytosine'), 'G': DeoxyribonucleotideType('G', 'DG', 'Guanine'), 'T': DeoxyribonucleotideType('T', 'DT', 'Thymine')})¶
- static COMPLEMENT_FN(sequence, inplace=False)¶
Return the complement as a DNA sequence.
If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.
>>> my_seq = "CGA"
>>> complement(my_seq)
'GCT'
>>> my_seq = Seq("CGA")
>>> complement(my_seq)
Seq('GCT')
>>> my_seq = MutableSeq("CGA")
>>> complement(my_seq)
MutableSeq('GCT')
>>> my_seq
MutableSeq('CGA')
Any U in the sequence is treated as a T:
>>> complement(Seq("CGAUT"))
Seq('GCTAA')
In contrast, complement_rna returns an RNA sequence:
>>> complement_rna(Seq("CGAUT"))
Seq('GCUAA')
Supports lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:
>>> complement("ACGTUacgtuXYZxyz")
'TGCAAtgcaaXRZxrz'
The sequence is modified in-place and returned if inplace is True:
>>> my_seq = MutableSeq("CGA")
>>> complement(my_seq, inplace=True)
MutableSeq('GCT')
>>> my_seq
MutableSeq('GCT')
As strings and Seq objects are immutable, a TypeError is raised if complement is called on a Seq object with inplace=True.
- static REVERSE_COMPLEMENT_FN(sequence, inplace=False)¶
Return the reverse complement as a DNA sequence.
If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.
>>> my_seq = "CGA"
>>> reverse_complement(my_seq)
'TCG'
>>> my_seq = Seq("CGA")
>>> reverse_complement(my_seq)
Seq('TCG')
>>> my_seq = MutableSeq("CGA")
>>> reverse_complement(my_seq)
MutableSeq('TCG')
>>> my_seq
MutableSeq('CGA')
Any U in the sequence is treated as a T:
>>> reverse_complement(Seq("CGAUT"))
Seq('AATCG')
In contrast, reverse_complement_rna returns an RNA sequence:
>>> reverse_complement_rna(Seq("CGAUT"))
Seq('AAUCG')
Supports lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:
>>> reverse_complement("ACGTUacgtuXYZxyz")
'zrxZRXaacgtAACGT'
The sequence is modified in-place and returned if inplace is True:
>>> my_seq = MutableSeq("CGA")
>>> reverse_complement(my_seq, inplace=True)
MutableSeq('TCG')
>>> my_seq
MutableSeq('TCG')
As strings and Seq objects are immutable, a TypeError is raised if reverse_complement is called on a Seq object with inplace=True.
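For readers without BioPython at hand, the DNA complement and reverse complement described above can be approximated on plain strings with the standard library. This is a sketch covering only unambiguous bases; the helper names complement_dna and reverse_complement_dna are hypothetical and are not what DNASequence binds to COMPLEMENT_FN or REVERSE_COMPLEMENT_FN.

```python
# Translation table for unambiguous bases. U is treated as T, mirroring
# the behaviour documented above; any other character passes through.
_DNA_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def complement_dna(seq: str) -> str:
    """Return the complement of a DNA string as a new string."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement_dna(seq: str) -> str:
    """Return the reverse complement of a DNA string as a new string."""
    return complement_dna(seq)[::-1]


forward = complement_dna("CGAUT")
backward = reverse_complement_dna("CGAUT")
gapped = complement_dna("A-C")
```

Characters missing from the table, such as the gap in "A-C", are left unchanged, which matches the statement that unrecognized characters are not converted.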
- class schrodinger.protein.sequence.RNASequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)¶
Bases:
schrodinger.protein.sequence.NucleicAcidSequence
- ALPHABET = mappingproxy({'A': RibonucleotideType('A', 'A', 'Adenine'), 'C': RibonucleotideType('C', 'C', 'Cytosine'), 'G': RibonucleotideType('G', 'G', 'Guanine'), 'U': RibonucleotideType('U', 'U', 'Uracil')})¶
- static COMPLEMENT_FN(sequence, inplace=False)¶
Return the complement as an RNA sequence.
If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.
>>> my_seq = "CGA"
>>> complement_rna(my_seq)
'GCU'
>>> my_seq = Seq("CGA")
>>> complement_rna(my_seq)
Seq('GCU')
>>> my_seq = MutableSeq("CGA")
>>> complement_rna(my_seq)
MutableSeq('GCU')
>>> my_seq
MutableSeq('CGA')
Any T in the sequence is treated as a U:
>>> complement_rna(Seq("CGAUT"))
Seq('GCUAA')
In contrast, complement returns a DNA sequence:
>>> complement(Seq("CGAUT"))
Seq('GCTAA')
Supports lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:
>>> complement_rna("ACGTUacgtuXYZxyz")
'UGCAAugcaaXRZxrz'
The sequence is modified in-place and returned if inplace is True:
>>> my_seq = MutableSeq("CGA")
>>> complement_rna(my_seq, inplace=True)
MutableSeq('GCU')
>>> my_seq
MutableSeq('GCU')
As strings and Seq objects are immutable, a TypeError is raised if complement_rna is called on a Seq object with inplace=True.
- static REVERSE_COMPLEMENT_FN(sequence, inplace=False)¶
Return the reverse complement as an RNA sequence.
If given a string, returns a new string object. Given a Seq object, returns a new Seq object. Given a MutableSeq, returns a new MutableSeq object. Given a SeqRecord object, returns a new SeqRecord object.
>>> my_seq = "CGA"
>>> reverse_complement_rna(my_seq)
'UCG'
>>> my_seq = Seq("CGA")
>>> reverse_complement_rna(my_seq)
Seq('UCG')
>>> my_seq = MutableSeq("CGA")
>>> reverse_complement_rna(my_seq)
MutableSeq('UCG')
>>> my_seq
MutableSeq('CGA')
Any T in the sequence is treated as a U:
>>> reverse_complement_rna(Seq("CGAUT"))
Seq('AAUCG')
In contrast, reverse_complement returns a DNA sequence:
>>> reverse_complement(Seq("CGAUT"), inplace=False)
Seq('AATCG')
Supports lower- and upper-case characters, and unambiguous and ambiguous nucleotides. All other characters are not converted:
>>> reverse_complement_rna("ACGTUacgtuXYZxyz")
'zrxZRXaacguAACGU'
The sequence is modified in-place and returned if inplace is True:
>>> my_seq = MutableSeq("CGA")
>>> reverse_complement_rna(my_seq, inplace=True)
MutableSeq('UCG')
>>> my_seq
MutableSeq('UCG')
As strings and Seq objects are immutable, a TypeError is raised if reverse_complement_rna is called on a Seq object with inplace=True.
- class schrodinger.protein.sequence.NASequence(elements='', name='', origin=None, entry_id=None, entry_name='', pdb_id='', chain='', structure_chain=None, long_name='', resnums=(), disulfide_bonds=None, pred_disulfide_bonds=None)¶
Bases:
schrodinger.protein.sequence.NucleicAcidSequence
A nucleic acid sequence. Agnostic to backbone type and capable of representing bases typical to either DNA or RNA.
- ALPHABET = mappingproxy({'DA': DeoxyribonucleotideType('A', 'DA', 'Adenine'), 'DC': DeoxyribonucleotideType('C', 'DC', 'Cytosine'), 'DG': DeoxyribonucleotideType('G', 'DG', 'Guanine'), 'DT': DeoxyribonucleotideType('T', 'DT', 'Thymine'), 'A': RibonucleotideType('A', '6MA', 'Adenine'), 'C': RibonucleotideType('C', 'OMC', 'Cytosine'), 'G': RibonucleotideType('G', 'OMG', 'Guanine'), 'U': RibonucleotideType('U', 'DU', 'Uracil'), 'AMP': RibonucleotideType('A', 'AMP', 'Adenine'), 'ADP': RibonucleotideType('A', 'ADP', 'Adenine'), 'ATP': RibonucleotideType('A', 'ATP', 'Adenine'), '1MA': RibonucleotideType('A', '1MA', 'Adenine'), '6MA': RibonucleotideType('A', '6MA', 'Adenine'), 'CMP': RibonucleotideType('C', 'CMP', 'Cytosine'), 'CDP': RibonucleotideType('C', 'CDP', 'Cytosine'), 'CTP': RibonucleotideType('C', 'CTP', 'Cytosine'), '5MC': RibonucleotideType('C', '5MC', 'Cytosine'), '5HC': RibonucleotideType('C', '5HC', 'Cytosine'), '5FC': RibonucleotideType('C', '5FC', 'Cytosine'), '1CC': RibonucleotideType('C', '1CC', 'Cytosine'), 'OMC': RibonucleotideType('C', 'OMC', 'Cytosine'), 'GMP': RibonucleotideType('G', 'GMP', 'Guanine'), 'GDP': RibonucleotideType('G', 'GDP', 'Guanine'), 'GTP': RibonucleotideType('G', 'GTP', 'Guanine'), '1MG': RibonucleotideType('G', '1MG', 'Guanine'), '2MG': RibonucleotideType('G', '2MG', 'Guanine'), 'M2G': RibonucleotideType('G', 'M2G', 'Guanine'), '7MG': RibonucleotideType('G', '7MG', 'Guanine'), 'OMG': RibonucleotideType('G', 'OMG', 'Guanine'), 'UMP': RibonucleotideType('U', 'UMP', 'Uracil'), 'UDP': RibonucleotideType('U', 'UDP', 'Uracil'), 'UTP': RibonucleotideType('U', 'UTP', 'Uracil'), 'PSU': RibonucleotideType('Ψ', 'PSU', 'Uracil'), 'H2U': RibonucleotideType('U', 'H2U', 'Uracil'), '5MU': RibonucleotideType('U', '5MU', 'Uracil'), 'DU': RibonucleotideType('U', 'DU', 'Uracil'), 'TMP': DeoxyribonucleotideType('T', 'TMP', 'Thymine'), 'TDP': DeoxyribonucleotideType('T', 'TDP', 'Thymine'), 'TTP': DeoxyribonucleotideType('T', 'TTP', 'Thymine'), 
'YYG': NucleotideType('N', 'YYG', 'Unknown'), 'I': NucleotideType('I', 'I', 'Unknown'), 'DI': NucleotideType('DI', 'DI', 'Unknown'), 'T': DeoxyribonucleotideType('T', 'TTP', 'Thymine'), 'Ψ': RibonucleotideType('Ψ', 'PSU', 'Uracil'), 'N': NucleotideType('N', 'YYG', 'Unknown')})¶
- getTranslation()¶
Get a translated sequence. This method uses BioPython’s translate method to convert a nucleic acid sequence into an amino acid sequence
- Returns
A translated protein sequence. The name and chain from the nucleic acid sequence are copied over
- Return type
- getComplement()¶
Get the complement of a DNA or RNA sequence. This method uses BioPython’s complement method for nucleic acid sequences. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.
- Returns
The complementary nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.
- Return type
- getReverseComplement()¶
Get the reverse complement of a DNA or RNA sequence. Supports gaps and unknown residues; does not support ambiguous residues (since NucleicAcidSequence doesn’t). Creates a new Sequence; original object is unmodified.
- Returns
The reverse complement nucleic acid sequence, of the same type as self. The name and chain from the nucleic acid sequence are copied over.
- Return type
- class schrodinger.protein.sequence.CombinedChainSequenceMeta(cls, bases, classdict, *, wraps=None, wrapped_constants=(), wrapped_properties=(), wrapped_getters=(), wrapped_setters=())¶
Bases:
schrodinger.application.msv.utils.DocstringWrapperMetaClass
,schrodinger.protein.sequence.ProteinSequenceMeta
The metaclass for CombinedChainProteinSequence. This metaclass wraps the specified class attributes.
- class schrodinger.protein.sequence.GapRegion(from_start: int, from_end: int)¶
Bases:
object
Container for information about gaps to add to or remove from the start and end of a chain
- from_start: int¶
- from_end: int¶
- __init__(from_start: int, from_end: int) → None¶
- class schrodinger.protein.sequence.CombinedChainProteinSequence(seqs)¶
Bases:
schrodinger.protein.sequence.AbstractProteinSequenceMixin
,schrodinger.protein.sequence.AbstractSequence
A sequence that contains multiple chains from the same protein. Instances of this class do not directly contain any residues themselves and instead wrap one or several ProteinSequence objects.
- Note
CombinedChainProteinSequence.visibility properly reports entry inclusion state, but it may not correctly report entry visibility (e.g. partially visible vs. fully visible). The MSV structure icons only report inclusion state and the visibility of included entries isn’t reported anywhere in the panel, though, so this limitation doesn’t have any impact on functionality.
- AnnotationClass¶
alias of
schrodinger.protein.annotation.CombinedChainProteinSequenceAnnotations
- __init__(seqs)¶
- Parameters
seqs (list(ProteinSequence)) – A list of the split-chain sequences to wrap.
- __len__()¶
- property fullname¶
- index(res)¶
Returns the index of the specified residue
- Parameters
res (residue.Residue) – The residue to find
ignore_gaps (bool) – Whether the index returned should ignore gaps in the sequence or not.
- Raises
A ValueError if the residue is not present
- Return type
int
- Returns
The index of the residue
- indices(residues)¶
Returns the indices of all specified residues. Note that the returned integers will likely not be in the same order as the input residues.
- Parameters
residues (Iterable(residue.CombinedChainResidueWrapper)) – The residues to find indices of
- Returns
The indices of the residues
- Return type
list[int]
- insertElements(index, elements)¶
Insert a list of elements or sequence element into this sequence.
If elements is a string or iterable of strings, residue numbers will be automatically assigned.
- Parameters
index (int) – The index at which to insert elements
elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert
- mutate(start, end, elements)¶
Mutate sequence elements. See parent class for additional method documentation.
- Raises
MultipleChainsError – If the specified residue range spans multiple chains.
- assertCanMutateResidues(start, end)¶
Make sure that we can mutate the specified residues. If not, raise an exception.
- Parameters
start (int) – The index at which to start mutating
end (int) – The index of the last mutated element (exclusive)
- Raises
MultipleChainsError – If the specified residue range spans multiple chains.
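The single-chain restriction can be illustrated with a small standalone check. This is a sketch under stated assumptions: chain boundaries are given as a sorted list of starting offsets, the range is half-open as documented for end, and the MultipleChainsError class and assert_single_chain helper defined here are local stand-ins rather than the module's own code.

```python
import bisect


class MultipleChainsError(ValueError):
    """Local stand-in for the exception of the same name."""


def assert_single_chain(chain_offsets, start, end):
    """Raise if the half-open range [start, end) crosses a chain boundary.

    chain_offsets holds the combined-chain index of each chain's first
    element in ascending order, e.g. [0, 5, 9].
    """
    if end <= start:
        return  # empty range, nothing to mutate
    # Locate the chain holding the first and the last affected element.
    first = bisect.bisect_right(chain_offsets, start) - 1
    last = bisect.bisect_right(chain_offsets, end - 1) - 1
    if first != last:
        raise MultipleChainsError(
            f"Range [{start}, {end}) spans chains {first} through {last}")


offsets = [0, 5, 9]
assert_single_chain(offsets, 1, 5)  # stays inside the first chain
try:
    assert_single_chain(offsets, 4, 6)  # crosses from chain 0 into chain 1
    crossed = False
except MultipleChainsError:
    crossed = True
```

Because the stand-in exception derives from ValueError, as the documented one does, callers that already catch ValueError would also catch it.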
- append(element)¶
Appends an element to the sequence
- Parameters
element (self.ElementClass or str) – The element to append to this sequence
- extend(elements)¶
Extends the sequence with elements from an iterable
- Parameters
elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence
- getSubsequence(start, end)¶
Return a sequence containing a subset of the elements in this one. Note that the new sequence will be a split-chain sequence and will ignore any chain breaks present in the requested subset of elements.
- Parameters
start (int) – The index at which the subsequence should start
end (int) – The index at which the subsequence should end (exclusive)
- Returns
The requested subsequence
- Return type
- removeElements(eles)¶
Remove elements from the sequence.
- Parameters
eles (list(residue.AbstractSequenceElement)) – A list of elements to remove from the sequence.
- Raises
ValueError – If any of the given elements are not in the sequence.
- getStructureResForRes(res)¶
- Parameters
res (residue.Residue) – Residue to get structure residue for
- Returns
Structure residue or None if no matching residue is found
- Return type
schrodinger.structure._Residue or NoneType
- getGaplessLength()¶
- Returns
Length of this sequence ignoring gaps
- Return type
int
- addGapsByIndices(gap_idxs)¶
Add gaps to the sequence from a list of gap indices. Note that these indices are based on numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.
- Parameters
gap_idxs (list(int)) – A list of gap indices
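The difference between the two index conventions is easiest to see on a plain list. The sketch below is illustrative only: both helper functions are hypothetical, operate on strings rather than sequence objects, and the behaviour shown for the "before" variant is an assumption inferred from the name addGapsBeforeIndices.

```python
def add_gaps_by_indices(elements, gap_idxs, gap="-"):
    """Insert gaps so that they sit at `gap_idxs` in the result."""
    result = list(elements)
    # Ascending order: each insert lands at its final position because
    # every gap that precedes it is already in place.
    for idx in sorted(gap_idxs):
        result.insert(idx, gap)
    return result


def add_gaps_before_indices(elements, indices, gap="-"):
    """Insert a gap in front of each element, using original numbering."""
    result = list(elements)
    # Descending order keeps the not-yet-processed original indices valid.
    for idx in sorted(indices, reverse=True):
        result.insert(idx, gap)
    return result


after_numbering = "".join(add_gaps_by_indices("ABCD", [1, 3]))
before_numbering = "".join(add_gaps_before_indices("ABCD", [1, 3]))
```

With post-insertion numbering the gaps occupy positions 1 and 3 of the result, whereas with pre-insertion numbering they precede the elements that were originally at positions 1 and 3.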
- addGapsToChainStartsAndEnds(gaps: List[schrodinger.protein.sequence.GapRegion])¶
Add the specified numbers of gaps to the starts and ends of each chain.
- Parameters
gaps – The numbers of gaps to add
- removeGapsFromChainStartsAndEnds(gaps: List[schrodinger.protein.sequence.GapRegion])¶
Remove the specified numbers of gaps from the starts and ends of each chain.
- Parameters
gaps – The numbers of gaps to remove
- validateGapsToRemoveFromChainStartAndEnds(gaps: List[schrodinger.protein.sequence.GapRegion])¶
Make sure that we can remove the specified numbers of gaps from the starts and ends of each chain.
- Parameters
gaps – The numbers of gaps to remove
- Raises
AssertionError – If some of the sequence elements to be removed aren’t actually gaps.
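How a list of GapRegion values lines up with the chains can be sketched with strings standing in for chains. Everything below is an assumption-laden illustration: the GapRegion dataclass only mirrors the documented from_start and from_end fields, and the three helper functions are hypothetical analogues of the methods above, not their implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GapRegion:
    """Local stand-in mirroring the documented from_start/from_end fields."""
    from_start: int
    from_end: int


def add_gaps_to_chain_ends(chains: List[str], gaps: List[GapRegion],
                           gap: str = "-") -> List[str]:
    # One GapRegion per chain, matched by position.
    return [gap * g.from_start + chain + gap * g.from_end
            for chain, g in zip(chains, gaps)]


def validate_gaps_to_remove(chains: List[str], gaps: List[GapRegion],
                            gap: str = "-") -> None:
    for chain, g in zip(chains, gaps):
        head = chain[:g.from_start]
        tail = chain[len(chain) - g.from_end:]
        # Everything scheduled for removal must be a gap character.
        assert set(head + tail) <= {gap}, "non-gap element would be removed"


def remove_gaps_from_chain_ends(chains: List[str], gaps: List[GapRegion],
                                gap: str = "-") -> List[str]:
    validate_gaps_to_remove(chains, gaps, gap)
    return [chain[g.from_start:len(chain) - g.from_end]
            for chain, g in zip(chains, gaps)]


regions = [GapRegion(1, 2), GapRegion(0, 1)]
padded = add_gaps_to_chain_ends(["ACD", "EF"], regions)
stripped = remove_gaps_from_chain_ends(padded, regions)
```

Removing the same regions that were added restores the original chains, and asking to remove a region that covers a real residue trips the assertion.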
- property disulfide_bonds¶
- Returns
A sorted tuple of the valid disulfide bonds.
- Return type
- isKinaseChain()¶
- property pred_disulfide_bonds¶
- indexToSeqAndIndex(index)¶
Convert a combined-chain residue index to a split-chain sequence and a residue index within the specified sequence.
- Parameters
index (int) – A valid combined-chain residue index
- Returns
A tuple of the split-chain sequence, the residue index within that sequence, and the starting index of the split-chain sequence
- Return type
tuple(ProteinSequence, int, int)
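The index conversion amounts to a lookup in the list of chain starting offsets. The following sketch uses strings as stand-ins for split-chain sequences; the function names chain_offsets and index_to_seq_and_index are hypothetical, and the use of bisect is an illustrative choice rather than a statement about how the class is implemented.

```python
import bisect
from itertools import accumulate


def chain_offsets(chains):
    """Combined-chain index of the first element of each chain."""
    lengths = [len(chain) for chain in chains]
    return [0] + list(accumulate(lengths))[:-1]


def index_to_seq_and_index(chains, index):
    """Return (chain, index within chain, starting offset of chain)."""
    offsets = chain_offsets(chains)
    total = offsets[-1] + len(chains[-1])
    if not 0 <= index < total:
        raise IndexError(index)
    # The owning chain is the last one whose offset is <= index.
    chain_num = bisect.bisect_right(offsets, index) - 1
    offset = offsets[chain_num]
    return chains[chain_num], index - offset, offset


chains = ["ACDEF", "GHIK", "LM"]
offsets = chain_offsets(chains)
hit = index_to_seq_and_index(chains, 6)
```

The third tuple member is the same value that offsetForChain is documented to return for the chain in question.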
- property chain¶
- property chains¶
- property chain_offsets¶
- hasChain(chain_name)¶
Does this sequence contain a chain with the specified name?
- Parameters
chain_name (str) – The chain name to check
- Return type
bool
- addChain(seq)¶
Add a new chain to this sequence.
- Parameters
seq (ProteinSequence) – The chain to add
- removeChain(seq)¶
Remove a chain from this sequence. Note that you should not remove the last chain; instead, remove this sequence from the alignment.
- Parameters
seq (ProteinSequence) – The chain to remove
- removeChains(seqs)¶
Remove multiple chains from this sequence. Note that you should not remove all chains from a combined-chain sequence; instead, remove the sequence from the alignment.
- Parameters
seqs (list[ProteinSequence]) – The chains to remove
- insertElementByChain(index, chain, element)¶
Add the given element to the specified chain of a sequence.
- Parameters
index (int) – The index to insert the element at. Note that this is the index in the sequence, not the chain.
chain (sequence.ProteinSequence) – The chain to insert into
element (str or residue.AbstractSequenceElement) – The element to insert
- offsetForChain(chain)¶
Get the combined-chain residue index for the first residue of the specified chain.
- Parameters
chain (ProteinSequence) – The chain
- Returns
The offset
- Return type
int
- clearAllCaching()¶
This method should be implemented in subclasses that cache any data.
- ElementClass¶
alias of
schrodinger.protein.residue.Residue
- property entry_id¶
- property entry_name¶
- getStructure(*args, **kwargs)¶
- Returns
The associated structure. Will return None if there is no associated structure.
- Return type
schrodinger.structure.Structure or NoneType
- hasStructure(*args, **kwargs)¶
- Returns
Whether this sequence has an associated structure.
- Return type
bool
- property long_name¶
- property name¶
- property origin¶
- property pdb_id¶
- setStructure(*args, **kwargs)¶
Set the associated structure. Can only be used on sequences with an associated structure.
- Parameters
struc (schrodinger.structure.Structure) – The new structure for this sequence
- Raises
RuntimeError – If there’s no structure associated with this sequence object.
- property visibility¶
- exception schrodinger.protein.sequence.MultipleChainsError¶
Bases:
ValueError
An exception raised when the specified indices span multiple chains but the operation can only be carried out on a single chain.
- class schrodinger.protein.sequence.StructureSequence(st, atoms)¶
Bases:
schrodinger.structure._structure._AtomCollection
Class representing a sequence of protein residues.
- property residue¶
Returns residue iterator for all residues in the sequence
- schrodinger.protein.sequence.get_structure_sequences(st)¶
Iterates over all sequences in the given structure.
- schrodinger.protein.sequence.find_generalized_pattern(sequence_list, pattern, validate_pattern=False)¶
Finds a generalized sequence pattern within specified sequences. NOTE: The search is performed in the forward direction only.
- Parameters
sequence_list – list of sequence dictionaries to search.
pattern (str) –
Pattern defined using extended PROSITE syntax.
standard IUPAC one-letter codes are used for all amino acids
each element in a pattern is separated using ‘-’ symbol
symbol ‘x’ is used for position where any amino acid is accepted
ambiguities are listed using the acceptable amino acids between square brackets, e.g. [ACT] means Ala, Cys or Thr
amino acids not accepted for a given position are indicated by listing them between curly brackets, e.g. {GP} means ‘not Gly and not Pro’
repetition is indicated using parentheses, e.g. A(3) means Ala-Ala-Ala, x(2,4) means between 2 and 4 residues of any type
the following lowercase characters can be used as additional flags:
‘x’ means any amino acid
‘a’ means acidic residue: [DE]
‘b’ means basic residue: [KR]
‘o’ means hydrophobic residue: [ACFILPWVY]
‘p’ means aromatic residue: [WYF]
‘s’ means solvent exposed residue
‘h’ means helical residue
‘e’ means extended residue
‘f’ means flexible residue
Each position can optionally be followed by an @<res_num> expression that will match the position with a given residue number.
Entire pattern can be followed by :<index> expression that defines a ‘hotspot’ in the pattern. When the hotspot is defined, only a single residue corresponding to (pattern_match_start+index-1) will be returned as a match. The index is 1-based and can be used to place the hotspot outside of the pattern (can also be a negative number).
Pattern examples:
N-{P}-[ST] : Asn-X-Ser or Thr (X != Pro)
N[sf]-{P}[sf]-[ST][sf] : as above, but all residues flexible OR solvent exposed
Nsf-{P}sf-[ST]sf : as above, but all residues flexible AND solvent exposed
Ns{f} : Asn solvent exposed AND not in flexible region
N[s{f}] : Asn solvent exposed OR not in flexible region
[ab]{K}{s}f : acidic OR basic, with exception of Lys, flexible AND not solvent exposed
Ahe : Ala helical AND extended - no match possible
A[he] : Ala helical OR extended
A{he} : Ala coiled chain conformation (neither helical nor extended)
[ST] : Ser OR Thr
ST : Ser AND Thr - no match possible
validate_pattern (boolean) – If True, the function will validate the pattern without performing the search (the sequence_list parameter will be ignored) and return True if the pattern is valid, or False otherwise. The default is False.
- Return type
list of lists of integer tuples or False if the pattern is invalid
- Returns
False if the specified input pattern was incorrect. Otherwise, it returns a list of lists of matches for each input sequence. Each match is a (start, end) tuple where start and end are matching sequence positions.
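The core of the pattern syntax maps naturally onto regular expressions. The sketch below handles only a small subset (single letters, x, square and curly bracket sets, and repetition counts, each separated by ‘-’) and searches a plain one-letter string; the lowercase property flags, @<res_num>, and hotspots are not covered. The function names are hypothetical, and the 0-based inclusive (start, end) convention is an assumption, since the text above does not state one.

```python
import re

# One pattern element: x, a letter, [SET] or {SET}, optionally followed
# by a repetition such as (3) or (2,4).
_ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"Unsupported pattern element: {element!r}")
        body, low, high = match.groups()
        if body == "x":
            regex = "."                       # any residue
        elif body.startswith("{"):
            regex = "[^" + body[1:-1] + "]"   # excluded residues
        else:
            regex = body                      # single letter or [SET]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_matches(sequence, pattern):
    """Return 0-based inclusive (start, end) tuples, overlaps included."""
    # The lookahead lets matches starting at adjacent positions overlap.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1), m.end(1) - 1) for m in regex.finditer(sequence)]


glyco_sites = find_matches("MNGSANPTNAT", "N-{P}-[ST]")
translated = prosite_to_regex("A(3)-x(2,4)")
```

In the example sequence the Asn at position 5 is followed by Pro and is therefore skipped, leaving two matches for the N-{P}-[ST] motif.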
- schrodinger.protein.sequence.convert_structure_sequence_for_pattern_search(seq, sasa_by_atom=None)¶
Converts a StructureSequence object to dictionary required by find_generalized_pattern function. Because the conversion can be time consuming, it should be done once per sequence.
Optionally a list of atom SASAs for each atom in the CT can be specified. If it’s not specified, it will get calculated by calling analyze.calculate_sasa_by_atom().
- Parameters
seq (StructureSequence) – StructureSequence object
sasa_by_atom (list) – list of atom SASAs
- Return type
dict
- Returns
Dictionary of sequence information
- schrodinger.protein.sequence.find_pattern(seq, pattern)¶
Find pattern matches in a specified StructureSequence object. Returns a list of matching positions.
- Parameters
seq (StructureSequence) – StructureSequence object
pattern (string) – Sequence pattern. The syntax is described in find_generalized_pattern.
- Return type
list of lists of integer tuples or None
- Returns
None if the specified input pattern was incorrect. Otherwise, it returns a list of lists of matches for each residue position in the input structure. Each match is a (start, end) tuple where start and end are matching sequence positions. If ‘hotspot’ is specified then start = end.
- schrodinger.protein.sequence.assign_residue_numbers(residues, start_res=None, end_res=None)¶
Assign residue numbers to the given residues based on the residues before and after
- Parameters
residues (list[residue.Residue]) – Residues that need numbering. Will be modified in-place.
start_res (residue.Residue or NoneType) – Previous residue. Pass None if the residues are N-terminal
end_res (residue.Residue or NoneType) – Next residue. Pass None if the residues are C-terminal
- schrodinger.protein.sequence.gen_resnums_and_inscodes(start_resnum, start_inscode, end_resnum, end_inscode)¶
Create a list of all residue numbers/insertion code combinations possible between the given endpoints. If the ending residue number and insertion code are less than or equal to the starting residue number and insertion code, then an empty list will be returned.
- Parameters
start_resnum (int) – The starting residue number.
start_inscode (str) – The starting insertion code.
end_resnum (int) – The ending residue number.
end_inscode (str) – The ending insertion code.
- Returns
A list of residue numbers and insertion codes
- Return type
list[tuple[int, str]]
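The enumeration described above can be sketched in plain Python. This is an illustrative reading only, not the library's implementation: the name `gen_resnums_and_inscodes_sketch`, the ordering of insertion codes (blank first, then `A` to `Z`), and the choice to exclude both endpoints are assumptions that the text above does not confirm.

```python
import string

# Assumption: a blank insertion code sorts first, followed by 'A'..'Z'.
INSCODES = [" "] + list(string.ascii_uppercase)


def gen_resnums_and_inscodes_sketch(start_resnum, start_inscode,
                                    end_resnum, end_inscode):
    """Hypothetical sketch: list (resnum, inscode) pairs strictly between
    the two endpoints, in ascending order."""
    start = (start_resnum, INSCODES.index(start_inscode))
    end = (end_resnum, INSCODES.index(end_inscode))
    # Documented behavior: an end at or before the start gives an empty list.
    if end <= start:
        return []
    pairs = []
    for resnum in range(start_resnum, end_resnum + 1):
        for rank, code in enumerate(INSCODES):
            # Assumption: both endpoints are excluded from the result.
            if start < (resnum, rank) < end:
                pairs.append((resnum, code))
    return pairs
```

Under these assumptions, the gap between residue 5 (no insertion code) and residue 5C holds only 5A and 5B.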
- schrodinger.protein.sequence.get_pairwise_sequence_similarity(chain1, chain2, consider_gap=True, method='muscle')¶
Given two single-chain sequences, align them and return the sequence similarity between them.
- Parameters
chain1 (structure._Chain) – The first sequence chain.
chain2 (structure._Chain) – The second sequence chain.
consider_gap (bool) – Whether or not to consider gaps in the alignment. Defaults to True.
method (string) – Which alignment method to use (‘muscle’ or ‘clustalw’)
- Returns
Sequence similarity of the alignment of the two.
- Return type
float, between 0.0 and 1.0
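To illustrate how a `consider_gap` flag can change a score in the 0.0 to 1.0 range, here is a minimal sketch that works on two already-aligned, gapped strings. It uses simple fractional identity as a stand-in; the scoring the library actually applies is not stated above, and the function name, the `gap` parameter, and the handling of gapped columns are all assumptions.

```python
def pairwise_identity_sketch(aligned1, aligned2, consider_gap=True, gap="-"):
    """Hypothetical sketch: fraction of identical columns in a pairwise
    alignment given as two equal-length gapped strings."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for c1, c2 in zip(aligned1, aligned2):
        has_gap = c1 == gap or c2 == gap
        # Assumption: with consider_gap=False, gapped columns are skipped
        # entirely; with consider_gap=True they count as mismatches.
        if has_gap and not consider_gap:
            continue
        compared += 1
        if not has_gap and c1 == c2:
            matches += 1
    return matches / compared if compared else 0.0
```

For the alignment `AC-D` / `AGGD`, this sketch gives 2 matches out of 4 columns when gaps are considered and 2 out of 3 when they are not.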
- schrodinger.protein.sequence.create_alignment_from_chains(chains)¶
Return a ProteinAlignment object comprised of two chains.
- Parameters
chains (iterable(structure._Chain)) – Chains to be aligned
- schrodinger.protein.sequence.align_alignment(aln, second_aln=None, method='muscle')¶
Perform alignment on a ProteinAlignment object.
- Parameters
aln (ProteinAlignment) – Alignment data
method (string) – Which method/program to use
- Returns
Aligned sequences
- Return type
ProteinAlignment
- schrodinger.protein.sequence.align_from_chains(chains, method='muscle')¶
Perform alignment on a series of chains
- Parameters
chains (iterable(structure._Chain)) – Chains to be aligned
method (string) – Which method/program to use (choices ‘muscle’, ‘clustalw’)
- Returns
Aligned sequences
- Return type
ProteinAlignment
- schrodinger.protein.sequence.get_aligned_residues(st1, st2, method='muscle')¶
This generator will yield 2 structure._Residue objects - one from each structure - for each position in aligned sequences.
- Parameters
st1 (structure.Structure) – First structure.
st2 (structure.Structure) – Second structure.
- Returns
Generates tuples of 2 residues that align at each position.
- Return type
generator(structure._Residue or None, structure._Residue or None)
- Raises
ValueError – if structures don’t have equivalent chains.
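The pairing of residues by alignment position, with `None` standing in for a gap, can be sketched without the library. The function below is hypothetical: it assumes the alignment is already available as two equal-length gapped strings and that each residue list is in sequence order. It says nothing about how the real generator obtains the alignment or matches chains.

```python
def pair_aligned_sketch(aligned1, aligned2, residues1, residues2, gap="-"):
    """Hypothetical sketch: yield (res1 or None, res2 or None) for each
    column of a pairwise alignment."""
    it1, it2 = iter(residues1), iter(residues2)
    for c1, c2 in zip(aligned1, aligned2):
        # Consume a residue only where the aligned string has no gap.
        r1 = next(it1) if c1 != gap else None
        r2 = next(it2) if c2 != gap else None
        yield r1, r2
```

In this sketch any objects can play the role of residues; plain strings are enough to see which positions pair up and which are left unmatched.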
- schrodinger.protein.sequence.get_aligned_structure_residues(sts, method='muscle')¶
This generator will yield a list of structure._Residue objects, one from each structure, for each position in the aligned sequences.
- Parameters
sts (list(structure.Structure)) – Structures to align
- Returns
Generates lists of residues that align at each position.
- Return type
generator(list[structure._Residue or None])
- schrodinger.protein.sequence.offset_indices(indices)¶
Offset insertion indices based on numbering before insertion to reflect numbering after insertion.
For example, [1, 1, 2, 3, 5, 8] would be changed to [1, 2, 4, 6, 9, 13]
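The example above is consistent with shifting each index by the number of insertions that precede it. The sketch below reproduces that arithmetic under two assumptions not stated in the text: each entry inserts exactly one element, and the input is in ascending order. The name `offset_indices_sketch` is illustrative, and this may not match the library's handling of other inputs.

```python
def offset_indices_sketch(indices):
    """Hypothetical sketch: shift each pre-insertion index by the number of
    insertions that come before it in the list."""
    # The n-th entry (0-based) has n earlier insertions ahead of it.
    return [idx + n for n, idx in enumerate(indices)]


print(offset_indices_sketch([1, 1, 2, 3, 5, 8]))  # [1, 2, 4, 6, 9, 13]
```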