schrodinger.protein.alignment module¶
Classes for working with sequences containing alignment information (gaps) and collections thereof.
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.protein.alignment.ResidueSimilarity(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- Identical = 1¶
- Similar = 2¶
- Dissimilar = 3¶
- NA = 4¶
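As a rough illustration of how the four members might be used, the sketch below defines a stand-in enum with the same names and values (it does not import the real class) and a hypothetical `classify` helper. The similarity groups and the choice to map gaps to `NA` are assumptions made for the example, not documented behavior.

```python
import enum


class ResidueSimilarity(enum.Enum):
    """Stand-in mirroring the documented members; not the real class."""
    Identical = 1
    Similar = 2
    Dissimilar = 3
    NA = 4


# Hypothetical similarity groups, invented for this example only.
_EXAMPLE_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("FWY")]


def classify(res, ref, gap="-"):
    """Classify one-letter code `res` against reference code `ref`."""
    if res == gap or ref == gap:
        # Assumption: a comparison involving a gap is "not applicable".
        return ResidueSimilarity.NA
    if res == ref:
        return ResidueSimilarity.Identical
    if any(res in grp and ref in grp for grp in _EXAMPLE_GROUPS):
        return ResidueSimilarity.Similar
    return ResidueSimilarity.Dissimilar
```

Members can also be looked up by value, so `ResidueSimilarity(2)` is `ResidueSimilarity.Similar`.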
- exception schrodinger.protein.alignment.AnchoredResidueError(message=None, blocking_anchors=<object object>)¶
Bases:
RuntimeError
Exception to indicate that an action would break anchors.
When possible, the specific anchors that block the action are stored on the exception instance.
- Variables
blocking_anchors (list[residue.Residue] or object) – Anchors that block the action, or ALL_ANCHORS if all anchors block the action.
- ALL_ANCHORS = <object object>¶
- __init__(message=None, blocking_anchors=<object object>)¶
- Parameters
message (str) – Exception message
blocking_anchors (list[residue.Residue]) – Anchored residues that block the action
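The `<object object>` default in the signature suggests a sentinel object. The following is a minimal sketch of that pattern using a stand-in class (not the real exception) and a hypothetical `describe` helper, showing how calling code might distinguish "all anchors" from a specific list.

```python
class AnchoredResidueError(RuntimeError):
    """Stand-in for the documented exception; not the real class."""

    # Unique sentinel meaning "every anchor blocks the action".
    ALL_ANCHORS = object()

    def __init__(self, message=None, blocking_anchors=ALL_ANCHORS):
        super().__init__(message)
        self.blocking_anchors = blocking_anchors


def describe(exc):
    """Hypothetical helper: summarize which anchors blocked an action."""
    if exc.blocking_anchors is AnchoredResidueError.ALL_ANCHORS:
        return "all anchors block the action"
    return f"{len(exc.blocking_anchors)} anchor(s) block the action"
```

Comparing with `is` rather than `==` is the usual way to test for a sentinel, since the sentinel is a unique object.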
- exception schrodinger.protein.alignment.StructuredResidueError¶
Bases:
RuntimeError
- class schrodinger.protein.alignment.AlignmentSignals¶
Bases:
PyQt6.QtCore.QObject
A collection of signals that can be emitted by an alignment
- Variables
domainsChanged (QtCore.pyqtSignal) – TODO
sequencesAboutToBeInserted (QtCore.pyqtSignal) – A signal emitted before sequences are inserted into the alignment. Emitted with: (The index of the first sequence to be inserted, The index of the last sequence to be inserted)
sequencesInserted (QtCore.pyqtSignal) – A signal emitted after sequences are inserted into the alignment. Emitted with: (The index of the first sequence inserted, The index of the last sequence inserted)
sequencesAboutToBeRemoved (QtCore.pyqtSignal) – A signal emitted before sequences are removed from the alignment. Emitted with: (The index of the first sequence to be removed, The index of the last sequence to be removed)
sequencesRemoved (QtCore.pyqtSignal) – A signal emitted after sequences are removed from the alignment. Emitted with: (The index of the first sequence removed, The index of the last sequence removed)
sequenceResiduesChanged (QtCore.pyqtSignal) – A signal emitted after the contents of a sequence have changed. Note that this signal may also be emitted in response to a sequence changing length, as positions in the alignment may switch from blank to occupied or vice versa.
sequencesAboutToBeReordered (QtCore.pyqtSignal) – A signal emitted before reordering sequences.
sequencesReordered (QtCore.pyqtSignal) – A signal emitted after sequences have been reordered.
sequenceNameChanged (QtCore.pyqtSignal) – A signal emitted after a sequence has changed names. Emitted with: (The modified sequence)
annotationTitleChanged (QtCore.pyqtSignal) – A signal emitted after a sequence’s annotation has changed titles. Emitted with: (The sequence whose annotation title has been modified)
alignmentNumColumnsAboutToChange (QtCore.pyqtSignal) – A signal emitted before the alignment changes length. Emitted with: (The current length of the alignment, The new length of the alignment)
alignmentNumColumnsChanged (QtCore.pyqtSignal) – A signal emitted after the alignment changes length. Emitted with: (The old length of the alignment, The current length of the alignment)
residuesAboutToBeRemoved (QtCore.pyqtSignal) – A signal emitted before residues are to be removed. Emitted with a list of the residues to be removed.
residuesRemoved (QtCore.pyqtSignal) – A signal emitted after residues are removed. This signal is not emitted with any parameters, but the residues that were removed were listed with the corresponding residuesAboutToBeRemoved signal.
residuesAdded (QtCore.pyqtSignal) – A signal emitted with added residues. Note that this signal will only be emitted once even if residues are added to multiple sequences. In addition, each individual sequence will emit a lengthChanged signal.
sequenceVisibilityChanged (QtCore.pyqtSignal) – A signal emitted when visibility of a sequence changes. Emitted with: (the sequence whose visibility is changing, the index of the sequence)
sequenceStructureChanged (QtCore.pyqtSignal) – A signal emitted when structure of a sequence changes. Emitted with: (the sequence whose structure is changing, the index of the sequence)
alignmentAboutToBeCleared (QtCore.pyqtSignal) – A signal emitted just before all sequences are removed from the alignment.
alignmentCleared (QtCore.pyqtSignal) – A signal emitted just after all sequences have been removed from the alignment.
anchoredResiduesChanged (QtCore.pyqtSignal) – A signal emitted when one or more residues are anchored or unanchored.
alnSetsChanged (QtCore.pyqtSignal) – A signal emitted when the alignment set for one or more sequences changes.
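Several of these signals come in "about to" and "done" pairs that carry the same index range. The sketch below illustrates that pairing without PyQt: `Signal` and `ToyAlignment` are hypothetical plain-Python stand-ins, and only the signal names and the (first index, last index) payload are taken from the descriptions above.

```python
class Signal:
    """Minimal plain-Python stand-in for a Qt signal (hypothetical)."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)


class ToyAlignment:
    """Hypothetical container that announces insertions before and after."""

    def __init__(self):
        self.seqs = []
        self.sequencesAboutToBeInserted = Signal()
        self.sequencesInserted = Signal()

    def addSeqs(self, seqs, start=None):
        start = len(self.seqs) if start is None else start
        last = start + len(seqs) - 1
        # Listeners see the old state here ...
        self.sequencesAboutToBeInserted.emit(start, last)
        self.seqs[start:start] = seqs
        # ... and the new state here, with the same index range.
        self.sequencesInserted.emit(start, last)


log = []
aln = ToyAlignment()
aln.sequencesAboutToBeInserted.connect(
    lambda first, last: log.append(("before", first, last, len(aln.seqs))))
aln.sequencesInserted.connect(
    lambda first, last: log.append(("after", first, last, len(aln.seqs))))
aln.addSeqs(["ACD", "A-D"])
```

A listener such as a table model can use the first signal to prepare for the change and the second to finish it, which is why both carry the same range.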
- domainsChanged¶
pyqtSignal(*types, name: str = …, revision: int = …, arguments: Sequence = …) -> PYQT_SIGNAL
types is normally a sequence of individual types. Each type is either a type object or a string that is the name of a C++ type. Alternatively each type could itself be a sequence of types each describing a different overloaded signal. name is the optional C++ name of the signal. If it is not specified then the name of the class attribute that is bound to the signal is used. revision is the optional revision of the signal that is exported to QML. If it is not specified then 0 is used. arguments is the optional sequence of the names of the signal’s arguments.
- invalidatedDomains¶
- descriptorsCleared¶
- sequencesAboutToBeInserted¶
- sequencesInserted¶
- sequencesAboutToBeRemoved¶
- sequencesRemoved¶
- sequenceResiduesChanged¶
- sequencesAboutToBeReordered¶
- sequencesReordered¶
- sequenceNameChanged¶
- annotationTitleChanged¶
- alignmentNumColumnsAboutToChange¶
- alignmentNumColumnsChanged¶
- residuesAboutToBeRemoved¶
- residuesRemoved¶
- residuesAdded¶
- sequenceVisibilityChanged¶
- sequenceStructureChanged¶
- alignmentAboutToBeCleared¶
- alignmentCleared¶
- anchoredResiduesChanged¶
- alnSetChanged¶
- secondaryStructureChanged¶
- predictionsChanged¶
- pfamChanged¶
- kinaseFeaturesChanged¶
- kinaseConservationChanged¶
- property aln¶
Return the alignment that this signals object is reporting for.
- Return type
BaseAlignment
- emitSeqResChanged()¶
- emitSeqNameChanged()¶
- emitAnnTitleChanged()¶
- allSignals()¶
Iterate over all signals in this object in alphabetical order.
- Return type
Iter(QtCore.pyqtBoundSignal)
- allSignalsAndNames()¶
Iterate over all signals in this object and their names in alphabetical order.
- Return type
Iter(tuple(QtCore.pyqtBoundSignal, str))
- class schrodinger.protein.alignment.BaseAlignment(sequences=None)¶
Bases:
PyQt6.QtCore.QObject
Abstract base class for classes which handle alignment of various sequences and corresponding annotations.
This is a pure domain object intended to make it easy to work with aligned collections of sequences.
Some methods are decorated with @msv_utils.const in order to make it easy to write a wrapper for this class that supports undo/redo operations.
- Variables
_ALN_ANNOTATION_CLASS (type) – The class for alignment annotations. This value should be overridden in subclasses.
_SEQ_ANNOTATION_CLASS (type) – The class for sequence annotations. This value should be overridden in subclasses.
- __init__(sequences=None)¶
- Parameters
sequences (list) – An optional iterable of sequences
- __len__()¶
Returns the number of sequences in the alignment
- __contains__(seq)¶
Returns whether the sequence is present in the alignment
- property annotations¶
- property global_annotations¶
Returns the alignment-level annotations available for the alignment
- property seq_annotations¶
Returns the sequence-level annotations available for sequences held in the alignment
- getGlobalAnnotationData(index, annotation)¶
Returns column-level annotation data at an index in the alignment
- Parameters
index (int) – The index in the alignment
annotation (enum.Enum) – An enum representing the requested annotation, if any
- property num_columns¶
- getWorkspaceCounts()¶
Summarize the visibility status of the alignment’s sequences
- Returns
Counts of each type of visibility
- Return type
collections.Counter
- index(seq)¶
Returns the index of the specified sequence.
- Parameters
seq (sequence.Sequence) – The requested sequence
- Return type
int
- Returns
The index of the requested sequence
- reorderSequences(seq_indices)¶
Reorder the sequences in the alignment using the specified list of indices.
In the undoable version of this class, the private function is needed to perform the reordering as an undoable operation.
- Parameters
seq_indices (list of int) – A list with the new indices for sequences
- Raises
ValueError – In the event that the list of indices does not match the length of the alignment
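The length check can be pictured with a small stand-alone function. This is a sketch only: `reorder` is a hypothetical helper operating on a plain list, and the direction of the mapping (position i of the result takes the sequence at `seq_indices[i]`) is an assumption, since the description above does not state it. The permutation check is likewise an addition for the example.

```python
def reorder(sequences, seq_indices):
    """Return `sequences` reordered so position i holds sequences[seq_indices[i]].

    The direction of the mapping is an assumption made for this sketch.
    """
    if len(seq_indices) != len(sequences):
        raise ValueError("seq_indices must have one entry per sequence")
    if sorted(seq_indices) != list(range(len(sequences))):
        # Extra safety check added for this example only.
        raise ValueError("seq_indices must be a permutation of the indices")
    return [sequences[i] for i in seq_indices]
```

For example, `reorder(["a", "b", "c"], [2, 0, 1])` gives `["c", "a", "b"]` under this assumption.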
- sortByProperty(seq_prop, reverse=False)¶
Sort the sequences by a sequence property. Sequences that do not have the sequence property defined will be grouped at the end of the alignment (regardless of reverse).
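One way to get the "missing values always last" behavior is to sort only the sequences that have the property and append the rest afterwards. The sketch below uses plain dictionaries in place of sequence objects; `sort_by_property` and `get_score` are hypothetical names, and a property is treated as undefined when the getter returns `None`.

```python
def sort_by_property(seqs, get_prop, reverse=False):
    """Sort by get_prop(seq); items where it returns None always go last."""
    have = [s for s in seqs if get_prop(s) is not None]
    missing = [s for s in seqs if get_prop(s) is None]
    # Only the items with a value are affected by `reverse`.
    return sorted(have, key=get_prop, reverse=reverse) + missing


def get_score(seq):
    """Hypothetical property getter for the toy records below."""
    return seq.get("score")


seqs = [{"name": "b", "score": 2}, {"name": "x"}, {"name": "a", "score": 5}]
ascending = [s["name"] for s in sort_by_property(seqs, get_score)]
descending = [s["name"] for s in sort_by_property(seqs, get_score, reverse=True)]
```

In both orders the record without a score ends up last.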
- sort(*, key, reverse=False)¶
Sort the alignment by the specified criteria.
NOTE: Query sequence is not included in the sort.
- Parameters
key (function) – A function that takes a sequence and returns a value to sort by for each sequence. (required keyword-only argument)
reverse (bool) – Whether to sort in reverse (descending) order.
- addSeq(seq, index=None)¶
- Parameters
seq (sequence.Sequence) – The sequence to add
index (int) – The index at which to insert; if None, seq is appended
- addSeqs(sequences, start=None)¶
Add multiple sequences to the alignment
- Parameters
sequences (list of sequence.Sequence) – Sequences to add
start (int) – The index at which to insert; if None, seqs are appended
- removeSeq(seq)¶
Remove a sequence from the alignment
- Parameters
seq (sequence.Sequence) – The sequence to remove
- removeSeqs(seqs)¶
Remove multiple sequences from the alignment
- clear()¶
Clears the entire alignment of sequences
- setReferenceSeq(seq)¶
Set the specified sequence as the reference sequence.
- Parameters
seq (sequence) – Sequence to set as reference sequence
- getReferenceSeq()¶
Returns the sequence that has been set as reference sequence or None if there is no reference sequence.
- Returns
The reference sequence or None
- Return type
Sequence or None
- isReferenceSeq(seq)¶
Return whether or not a sequence is the reference sequence.
- Parameters
seq (Sequence) – Sequence to check
- Returns
True if the sequence is the reference sequence, False otherwise.
- Return type
bool
- getResidueIndices(residues, sort=True)¶
Returns the indices (in the alignment) of the specified residues
- Parameters
residues (list[residue.AbstractSequenceElement]) – The list of residues and gaps to get indices for.
sort (bool) – Whether the returned list should be sorted.
- Returns
A list of (sequence index, residue index) tuples
- Return type
list[tuple(int, int)]
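The shape of the return value can be illustrated on a toy alignment stored as a list of lists. This is a sketch under assumptions: `Elem` and `get_residue_indices` are hypothetical stand-ins, and matching elements by identity is a choice made for the example rather than documented behavior.

```python
class Elem:
    """Hypothetical stand-in for a residue or gap object."""

    def __init__(self, code):
        self.code = code


def get_residue_indices(alignment, residues, sort=True):
    """Return (sequence index, residue index) for each requested element.

    Elements are matched by identity, since equal-looking residues
    (or gaps) may occur many times in an alignment.
    """
    wanted = {id(res) for res in residues}
    found = {}
    for seq_i, seq in enumerate(alignment):
        for res_i, elem in enumerate(seq):
            if id(elem) in wanted:
                found[id(elem)] = (seq_i, res_i)
    indices = [found[id(res)] for res in residues]
    return sorted(indices) if sort else indices


aln = [[Elem(c) for c in "AC-D"], [Elem(c) for c in "A-CD"]]
picked = [aln[1][2], aln[0][3]]
sorted_indices = get_residue_indices(aln, picked)
unsorted_indices = get_residue_indices(aln, picked, sort=False)
```

With `sort=False` the tuples follow the order of the input list instead of alignment order.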
- removeElements(elements)¶
Removes the specified elements from the alignment.
- Parameters
elements (iterable(residue.AbstractSequenceElement)) – An iterable of elements.
- Raises
AnchoredResidueError – if the elements cannot be removed
StructuredResidueError – In the event that this method attempts to mutate structured residues
- mutateResidues(seq_i, start, end, elements)¶
Mutate a sequence.
- Parameters
seq_i (int) – Index of seq to mutate
start (int) – Start index of seq region to mutate
end (int) – End index of seq region to mutate
elements (iterable(str) or iterable(ElementClass)) – Elements to mutate to
- Raises
AnchoredResidueError – if the mutation violates anchoring
StructuredResidueError – In the event that this method attempts to mutate structured residues
- replaceResiduesWithGaps(residues)¶
Replaces the specified residues with gaps
- Parameters
residues (list) – A list of residues to replace with gaps
- addElements(seq, res_i, elements)¶
Adds the specified elements (residues and/or gaps) to the alignment.
- Parameters
seq (sequence.Sequence) – A sequence in the alignment
res_i (int) – Index to insert the elements
elements (iterable(str or residue.AbstractSequenceElement)) – elements to insert
- getResiduesWithStructure()¶
Returns a list of all residues with structure
- modifyingStructure()¶
- suspendAnchors()¶
While inside this context, all anchors will be ignored. Upon exit, the anchors will be restored and an exception will be raised if any of the anchors are not aligned to the same reference residues they were aligned to at the start.
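The "ignore inside, verify on exit" behavior maps naturally onto a context manager. The following is a minimal sketch with a hypothetical two-row model (`ToyPair`) whose elements are uniquely labelled strings; it raises a plain `RuntimeError`, and none of the names are part of the documented API.

```python
from contextlib import contextmanager


class ToyPair:
    """Hypothetical two-row alignment with uniquely labelled elements."""

    def __init__(self, ref, seq):
        self.ref = list(ref)
        self.seq = list(seq)
        self.anchors = {}  # sequence label -> reference label

    def anchor(self, label):
        """Anchor `label` to the reference element in the same column."""
        self.anchors[label] = self.ref[self.seq.index(label)]

    def _broken(self):
        """Return anchored labels no longer aligned to their reference."""
        return [s for s, r in self.anchors.items()
                if self.seq.index(s) != self.ref.index(r)]

    @contextmanager
    def suspend_anchors(self):
        """Ignore anchors inside the block; verify them on exit."""
        yield
        broken = self._broken()
        if broken:
            raise RuntimeError(f"anchors broken: {broken}")


pair = ToyPair(ref=["R0", "R1", "R2", "R3"], seq=["s0", "s1", "-", "s2"])
pair.anchor("s1")  # s1 is aligned with R1
with pair.suspend_anchors():
    pair.seq.insert(0, "-")  # temporarily misaligns s1
    del pair.seq[0]          # alignment restored before exit
```

If the block exited with `s1` still shifted, the check after `yield` would raise instead of returning silently.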
- anchorResidues(residues)¶
Anchor the specified residues. If passed reference residues, all residues aligned to the reference residues will be anchored.
Anchored residues are constrained to stay aligned to the reference residue with the same column index at the time of anchoring. If elements are removed from the alignment, gaps are added before anchors to maintain alignment. If any other modifications are made to the alignment that would break an anchor, an exception is raised. However, calling code can temporarily take responsibility for maintaining the anchors within the suspendAnchors context.
- Parameters
residues (list(residue.Residue)) – Residues to anchor.
- getAnchoredResidues()¶
- Returns
A frozenset of residues that are currently anchored.
- Return type
frozenset(residue.Residue)
- getAnchoredResiduesWithRef()¶
- Returns
A frozenset of residues that are currently anchored with the corresponding reference sequence residues
- Return type
frozenset(residue.Residue)
- clearAnchors()¶
- removeAnchors(residues)¶
Unanchor residues. If passed reference residues, all residues anchored to those reference residues will be unanchored. Any given unanchored residues will be ignored.
- Parameters
residues (iterable(residue.Residue)) – The residues to unanchor.
- getSubalignment(start, end)¶
Return another alignment containing the elements within the specified start and end indices
- Parameters
start (int) – The index at which the subalignment should start
end (int) – The index at which the subalignment should end (exclusive)
- Returns
An alignment corresponding to the start and end points specified
- Return type
BaseAlignment
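Because `end` is exclusive, the column range behaves like a Python slice. A minimal sketch on rows stored as strings (`get_subalignment` here is a hypothetical free function, not the method itself):

```python
def get_subalignment(rows, start, end):
    """Return columns start..end-1 of every row (end is exclusive)."""
    return [row[start:end] for row in rows]


rows = ["ACD-EF", "A-DGEF", "ACDGE-"]
sub = get_subalignment(rows, 1, 4)
```

Here `sub` holds columns 1, 2 and 3 of each row: `["CD-", "-DG", "CDG"]`.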
- getDiscontinuousSubalignment(indices)¶
Given a list of indices, return a new alignment of sequences made up of the residues at those specified indices within this alignment.
- Parameters
indices (list of (int, int)) – List of (seq index, residue index) tuples
- Returns
A new subalignment
- Return type
- removeSubalignment(start, end)¶
Remove the block of columns between the start and end indices.
- Parameters
start (int) – The start index of the columns to remove
end (int) – The end index of the columns to remove (exclusive)
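Because the end index is exclusive, column ranges behave like Python slices. The following is a minimal sketch on plain strings, assuming one string per sequence; `get_subalignment` and `remove_subalignment` are hypothetical helpers that return new lists, whereas the documented `removeSubalignment` modifies the alignment itself.

```python
def get_subalignment(rows, start, end):
    """Return columns start..end-1 of every row (end is exclusive)."""
    return [row[start:end] for row in rows]


def remove_subalignment(rows, start, end):
    """Return the rows with columns start..end-1 removed."""
    return [row[:start] + row[end:] for row in rows]


rows = ["MKT-AY", "MRTQAY"]
block = get_subalignment(rows, 2, 4)    # columns 2 and 3
rest = remove_subalignment(rows, 2, 4)  # everything except columns 2 and 3
```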
- property is_rectangular¶
- insertSubalignment(aln, start)¶
Insert an alignment into the current alignment at the specified index
- Parameters
aln (BaseAlignment) – The alignment to insert
start (int) – The index at which to insert the alignment
- Raises
ValueError – if either alignment is not rectangular
- replaceSubalignment(aln, start, end)¶
Replace a subsection of the alignment indicated by start and end indices with the specified alignment
- Parameters
aln (BaseAlignment) – The alignment to insert
start (int) – The starting index of the subsection to replace.
end (int) – The ending index of the subsection to replace.
- Raises
ValueError – if either alignment is not rectangular
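The rectangularity requirement can be sketched on plain strings. This is an illustration only: `is_rectangular`, `insert_subalignment`, and `replace_subalignment` are hypothetical helpers, and treating the end index as exclusive is an assumption carried over from `getSubalignment`.

```python
def is_rectangular(rows):
    """True when every row has the same length."""
    return len({len(row) for row in rows}) <= 1


def insert_subalignment(rows, block, start):
    """Insert `block` (one row per sequence) before column `start`."""
    if not (is_rectangular(rows) and is_rectangular(block)):
        raise ValueError("Both alignments must be rectangular")
    return [row[:start] + new + row[start:] for row, new in zip(rows, block)]


def replace_subalignment(rows, block, start, end):
    """Replace columns start..end-1 with `block` (end assumed exclusive)."""
    if not (is_rectangular(rows) and is_rectangular(block)):
        raise ValueError("Both alignments must be rectangular")
    return [row[:start] + new + row[end:] for row, new in zip(rows, block)]


inserted = insert_subalignment(["MKAY", "MRAY"], ["T-", "TQ"], 2)
replaced = replace_subalignment(inserted, ["--", "--"], 2, 4)
```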
- getGaps()¶
Returns a list of lists of gaps.
- Returns
list(list(residue.Gap))
- Return type
list
- getTerminalGaps()¶
Returns the terminal gaps in all the sequences
- Return type
list
- Returns
list(list(residue.Gap))
- removeAllGaps()¶
Removes all the gaps of the sequences in the alignment.
- removeTerminalGaps()¶
Removes the gaps from the ends of every sequence in the alignment
- addGapsByIndices(gap_indices)¶
Adds gaps to the alignment
- Note
the length of the gap_indices list must match the number of sequences in the alignment.
- Parameters
gap_indices (list[list[int]]) – A list of lists of gap indices, one for each sequence in the alignment. Note that these indices are based on residue/gap numbering after the insertion. To insert gaps using indices based on numbering before the insertion, see addGapsBeforeIndices.
- Raises
ValueError – if gap_indices is the wrong length
AnchoredResidueError – if any gap index is before an anchored column
- addGapsBeforeIndices(gap_indices)¶
Add one gap to the alignment before each of the specified residue positions.
- Note
the length of the gap_indices list must match the number of sequences in the alignment.
- Parameters
gap_indices – A list of lists of indices to insert gaps before, one for each sequence in the alignment. Note that these indices are based on residue/gap numbering before the insertion. To insert gaps using indices based on numbering after the insertion, see addGapsByIndices.
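The difference between the two index conventions is easiest to see on a single sequence. This is a minimal sketch on a plain string, not the library code: `add_gaps_by_indices` and `add_gaps_before_indices` are hypothetical helpers that handle one sequence, whereas the documented methods take one index list per sequence.

```python
GAP = "-"


def add_gaps_by_indices(seq, gap_indices):
    """Insert gaps so that they end up at the given indices."""
    out = list(seq)
    # Ascending order: each gap lands at its final (post-insertion) index.
    for idx in sorted(gap_indices):
        out.insert(idx, GAP)
    return "".join(out)


def add_gaps_before_indices(seq, gap_indices):
    """Insert one gap before each original position in gap_indices."""
    out = list(seq)
    # Right to left, so earlier insertions do not shift later targets.
    for idx in sorted(gap_indices, reverse=True):
        out.insert(idx, GAP)
    return "".join(out)


after = add_gaps_by_indices("ACDE", [1, 4])       # post-insertion numbering
before = add_gaps_before_indices("ACDE", [1, 3])  # pre-insertion numbering
```

Both calls describe the same result: the second gap sits at index 4 in the gapped sequence, which is the position before residue 3 of the ungapped one.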
- padAlignment()¶
Insert gaps into an alignment so that it forms a rectangular block
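As a rough illustration of padding to a rectangular block, the sketch below works on plain strings. `pad_alignment` is a hypothetical helper, and appending the gaps at the end of the shorter sequences is an assumption; the documentation does not say where the gaps are placed.

```python
def pad_alignment(rows, gap="-"):
    """Right-pad every row with gaps to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [row.ljust(width, gap) for row in rows]


padded = pad_alignment(["MKTAY", "MK", "MKT-"])
```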
- getGapOnlyColumns()¶
For each sequence, return a list of the indices in that sequence for which the entire alignment contains gaps. (Indices will be omitted for a sequence if the sequence is shorter than the index.)
- Returns
List of list of indices
- Return type
list[list[int]]
- minimizeAlignment()¶
Minimizes the alignment, i.e. removes all gaps from the gap-only columns.
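Gap-only column detection and minimization can be sketched on plain strings as follows. This is not the library implementation: `gap_only_columns` and `minimize` are hypothetical helpers, and treating positions past the end of a short sequence as gaps is an assumption.

```python
GAP = "-"


def gap_only_columns(rows):
    """Per row, list the column indices where every row has a gap."""
    width = max((len(row) for row in rows), default=0)
    # Positions past the end of a short row are treated like gaps here.
    gap_cols = [
        col for col in range(width)
        if all(col >= len(row) or row[col] == GAP for row in rows)
    ]
    # Omit indices that fall beyond the end of a given row.
    return [[col for col in gap_cols if col < len(row)] for row in rows]


def minimize(rows):
    """Drop the gap-only columns from every row."""
    result = []
    for row, cols in zip(rows, gap_only_columns(rows)):
        drop = set(cols)
        result.append("".join(c for i, c in enumerate(row) if i not in drop))
    return result


rows = ["A--CD", "G--T-", "T-"]
gap_cols = gap_only_columns(rows)
minimized = minimize(rows)
```

Column 4 is kept because the first sequence has a residue there, even though the second sequence has a gap at that position.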
- getAlignmentMinimizedWithSpaces()¶
Return a new alignment with gap-only columns removed, except that one gap column is left between blocks.
- Returns
the new, minimized alignment
- Return type
- getColumn(index, omit_gaps=False)¶
Returns single alignment column at index position. Optionally, filters out gaps if omit_gaps is True.
- Parameters
index (int) – The index in the alignment
omit_gaps (bool) – Whether to omit the gaps
- Returns
Single alignment column at index position. Returns None to represent terminal gaps.
- Return type
tuple(residue.Residue or residue.Gap or None)
- columns(omit_gaps=False, *, match_type=False)¶
A generator over all columns.
- Parameters
omit_gaps (bool) – Whether to omit gaps
match_type (bool) – Whether to match reference sequence type
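Column access can be sketched with `itertools.zip_longest`, which pads short rows with `None` in the same way the documented return value uses `None` for terminal gaps. The helpers `columns` and `get_column` below are hypothetical, operate on plain strings, and do not model the `match_type` option.

```python
from itertools import zip_longest

GAP = "-"


def columns(rows, omit_gaps=False):
    """Yield each column as a tuple, top row first."""
    # Rows shorter than the longest row contribute None at the missing positions.
    for col in zip_longest(*rows):
        if omit_gaps:
            col = tuple(el for el in col if el is not None and el != GAP)
        yield col


def get_column(rows, index, omit_gaps=False):
    """Return the single column at `index`."""
    return list(columns(rows, omit_gaps))[index]


rows = ["MK-A", "MR", "M-TA"]
with_gaps = get_column(rows, 2)
without_gaps = get_column(rows, 2, omit_gaps=True)
```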
- seqMatchesRefType(seq)¶
- getSeqsMatchingRefType()¶
- columnHasAllSameResidues(index)¶
Return whether or not the column at a specified index has all the same residues (excluding gaps).
Note that if any unknown residues are present, the column will not be considered to be of all the same residue type.
- Parameters
index (int) – Index to check for uniformity
- Returns
True if the column is of uniform identity, False otherwise.
- Return type
bool
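The uniformity rule can be sketched on a single column. The helper name is hypothetical, and two details are assumptions: `X` is used as the code for an unknown residue, and a column with no residues at all is reported as not uniform.

```python
GAP = "-"
UNKNOWN = "X"  # assumed one-letter code for an unknown residue


def column_has_all_same_residues(column):
    """True if all non-gap elements are the same known residue."""
    residues = [el for el in column if el is not None and el != GAP]
    # Assumption: an empty column is not considered uniform.
    if not residues or UNKNOWN in residues:
        return False
    return len(set(residues)) == 1
```

For example, `("A", "-", "A")` is uniform because the gap is ignored, while `("A", "X", "A")` is not because of the unknown residue.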
- getResidueSimilarity(res)¶
Return the similarity score of a residue to the current reference residue at the residue’s position in the alignment.
- Parameters
res (residue.Residue) – Residue to get the similarity score for
- Returns
Similarity score for this residue
- Return type
float or None
- elementsToContiguousColumns(elements, invert=False, additional_breaks=None, last_col=None)¶
Get elements marking contiguous columns containing any of the passed elements
- Parameters
elements (iterable(AbstractSequenceElement)) – Elements to convert to columns
invert (bool) – Whether to invert logic (i.e. return columns not containing the passed elements)
additional_breaks (list[int] or None) – If given, contiguous columns will be broken at the specified indices. I.e., no contiguous set of columns will contain both column i and column i-1.
last_col (int or None) – If given, the last column to consider when constructing contiguous columns. If not given, all columns will be considered.
- Returns
[start, end] elements of contiguous columns. Will be from the ref sequence unless the ref sequence is shorter than num_columns
- Return type
iterable(tuple(AbstractSequenceElement, AbstractSequenceElement))
- clearAllCaching()¶
- addSeqsToAlnSet(seqs, set_name)¶
Add all given sequences to the specified alignment set (i.e. a named group of sequences that are always kept together in the alignment). Sequences already in the set will be ignored. All other sequences will be moved to the end of the set. (Except for the reference sequence: The specified set will be moved to the top of the alignment if the reference sequence is added.)
- Parameters
seqs (Iterable[sequence.Sequence]) – The sequences to add to the set.
set_name (str) – The name of the set to add the sequences to. If no set of this name exists, one will be created.
- removeSeqsFromAlnSet(seqs)¶
Remove all given sequences from any alignment sets they’re part of. Sequences not in a set will be ignored. All other sequences will be moved to the end of the set that they were in.
- Parameters
seqs (Iterable[sequence.Sequence]) – The sequences to remove from alignment sets.
- renameAlnSet(old_name, new_name)¶
Rename the specified alignment set.
- Parameters
old_name (str) – The old name of the alignment set.
new_name (str) – The new name of the alignment set.
- alnSetForSeq(seq)¶
Return the alignment set that contains the given sequence.
- Parameters
seq (sequence.Sequence) – The sequence to retrieve the alignment set for.
- Returns
The requested set. The calling scope must not modify the returned value. Will return None if seq is not part of any set.
- Return type
AlignmentSet or None
- hasAlnSets()¶
Does this alignment contain any alignment sets?
- Return type
bool
- alnSetNames()¶
Return all alignment set names.
- Return type
set(str)
- alnSets()¶
Iterate through all alignment sets.
- Returns
An iterator through all alignment sets. The calling scope must not modify any of the sets.
- Return type
dict_keys
- getAlnSet(set_name)¶
Return the requested set.
- Parameters
set_name (str) – The name of the set to retrieve.
- Returns
The requested set. The calling scope must not modify the returned value.
- Return type
- Raises
ValueError – If no set with the given name was found.
- gatherAlnSets()¶
- getFrequencies(normalize=True)¶
Returns the frequencies of each residue in each column. Residues are sorted by decreasing frequency. Gapped positions are not counted when calculating frequencies.
- Parameters
normalize (bool) – Whether to normalize the values; i.e. divide by the number of non-gaps in the column
- Returns
frequencies of each residue in each alignment column
- Return type
tuple(tuple(tuple(residue.Residue, float or int)))
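The frequency calculation can be sketched with `collections.Counter`. This is an illustration on plain strings rather than the library code; `get_frequencies` here is a hypothetical stand-alone function, and short rows are assumed to be padded with gaps.

```python
from collections import Counter
from itertools import zip_longest

GAP = "-"


def get_frequencies(rows, normalize=True):
    """Per column, return (residue, frequency) pairs, most frequent first."""
    result = []
    for col in zip_longest(*rows, fillvalue=GAP):
        # Gapped positions are not counted.
        counts = Counter(el for el in col if el != GAP)
        total = sum(counts.values())
        if normalize and total:
            pairs = tuple((res, n / total) for res, n in counts.most_common())
        else:
            pairs = tuple(counts.most_common())
        result.append(pairs)
    return tuple(result)


rows = ["AC-", "AG-", "GCT", "A--"]
counts = get_frequencies(rows, normalize=False)
freqs = get_frequencies(rows)
```

With `normalize=True` each count is divided by the number of non-gap elements in its column, so the last column, which holds a single `T`, gets a frequency of 1.0.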
- getResidueSeqProps(value_types=None)¶
Get a list of all sequence properties that any residue has. If ‘value_types’ is defined, get only the specific property types listed.
- Parameters
value_types (List) – List of specific property types to include (str, int, float, etc.), given as structure.PROP_STRING, structure.PROP_INTEGER, etc.
- Returns
All the sequence properties
- Return type
- getSeqsDescriptors()¶
Return a list of all the calculated descriptors of the sequences in the alignment.
- Returns
All the sequence descriptors
- Return type
- property all_structures¶
Return an iterator over all sequence structures in the alignment. This does not repeat structures that belong to multiple sequences.
- class schrodinger.protein.alignment.AlignmentSet(name, set_id)¶
Bases:
set
A named group of sequences that are always kept together in the alignment.
- __init__(name, set_id)¶
- Parameters
name (str) – The name of the alignment set.
set_id (int) – A unique integer ID for the alignment set. Used to determine the color of the icon and text.
- class schrodinger.protein.alignment.ProteinAlignment(sequences=None)¶
Bases:
schrodinger.models.json.JsonableClassMixin, schrodinger.protein.alignment._ProteinAlignment
- toJsonImplementation()¶
Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.
- Returns
A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders
- classmethod fromJsonImplementation(json_obj)¶
Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.
- Parameters
json_obj (dict) – A dictionary loaded from a JSON string or file.
- Returns
An instance of the derived class.
- Return type
cls
- classmethod adapter48002(json_dict)¶
- addDisulfideBond(res1, res2, known=True)¶
Add a disulfide bond if both residues’ sequences are in the alignment
- Parameters
res1 (residue.Residue) – A residue to link with a disulfide bond
res2 (residue.Residue) – Another residue to link with a disulfide bond
known (bool) – Whether the bond is known or predicted
- Raises
ValueError – if either sequence is not in the alignment
- removeDisulfideBond(bond)¶
Disconnect a disulfide bond. The bond may be either known or predicted.
- Parameters
bond (residue.DisulfideBond) – The bond to disconnect
- Raises
ValueError – if either sequence is not in the alignment
- classmethod fromStructure(ct, eid=None)¶
- Parameters
ct (schrodinger.structure.Structure) – The structure to convert
eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure, if any, will be used.
- Return type
cls
- Returns
An alignment containing the sequences in the structure
- classmethod fromClustalFile(file_name)¶
Returns alignment read from file in Clustal .aln format preserving order of sequences.
- Parameters
file_name (str) – Source file name.
- Raises
IOError – If the input file cannot be read.
- Returns
An alignment
- Note
The alignment can be empty if no sequence was present in the input file.
- toClustalFile(file_name, use_unique_names=True)¶
Writes the alignment to a Clustal alignment file.
- Raises
IOError – If output file cannot be written.
- Parameters
file_name (str) – Destination file name.
use_unique_names (bool) – If True, write unique name for each sequence.
- classmethod fromFastaFile(file_name)¶
Returns alignment read from file in FASTA format preserving order of sequences.
- Raises
IOError – If the input file cannot be read.
- Parameters
file_name (str) – name of input FASTA file
- Returns
Read alignment. The alignment can be empty if no sequence was present in the input file.
- Return type
- classmethod fromFastaString(lines)¶
Read sequences from FASTA-formatted text, creates sequences and appends them to alignment. Splits sequence name from the FASTA header.
- Parameters
lines (list of str) – list of strings representing FASTA file
- Returns
The alignment
- Return type
- classmethod fromFastaStringList(strings)¶
Return an alignment object created from an iterable of sequence strings
- Parameters
strings (Iterable of strings) – Sequences as iterable of strings (1D codes)
- Returns
The alignment
- Return type
- toFastaString(use_unique_names=True, maxl=50)¶
Convert ProteinAlignment object to list of sequence strings
- Parameters
aln (ProteinAlignment) – Alignment data
- toFastaStringList()¶
Convert self to list of fasta sequence strings
- Return type
list
- Returns
list of str
- toFastaFile(file_name, use_unique_names=True, maxl=50)¶
Write self to specified FASTA file
- Raises
IOError – If output file cannot be written.
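A FASTA round trip can be sketched in plain Python. The helpers `parse_fasta` and `to_fasta` are hypothetical and work on (name, sequence) pairs instead of alignment objects. Two points are assumptions: the name is taken to be the first whitespace-delimited token of the header, and `maxl` is taken to be the maximum number of characters per sequence line.

```python
def parse_fasta(lines):
    """Return a list of (name, sequence) pairs from FASTA-formatted lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed convention: the first token of the header is the name.
            header = line[1:].split(maxsplit=1)
            records.append([header[0] if header else "", []])
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(chunks)) for name, chunks in records]


def to_fasta(records, maxl=50):
    """Format (name, sequence) pairs as FASTA text, wrapping at maxl."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        # Slice instead of textwrap so that gap characters never affect wrapping.
        out.extend(seq[i:i + maxl] for i in range(0, len(seq), maxl))
    return "\n".join(out) + "\n"


text = ">seq1 human\nMKT-\nAY\n>seq2\nMRTQAY\n"
records = parse_fasta(text.splitlines())
wrapped = to_fasta(records, maxl=4)
```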
- findPattern(pattern)¶
Finds a specified PROSITE pattern in all sequences.
- Parameters
pattern (str) – PROSITE pattern to search in sequences. See protein.sequence.find_generalized_pattern for documentation.
- Returns
List of matching residues
- Return type
list of protein.residue.Residue
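For orientation, a basic PROSITE pattern can be translated into a regular expression. The sketch below covers only the standard core syntax (`x`, `[...]`, `{...}`, repeat counts, and terminal anchors) and is not the generalized syntax accepted by `find_generalized_pattern`; `prosite_to_regex` and `find_pattern` are hypothetical helpers that search a plain string.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".")                         # optional trailing period
    regex = regex.replace("-", "")                      # element separator
    regex = regex.replace("x", ".")                     # x matches any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P}: anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4): 2 to 4 repeats
    regex = regex.replace("<", "^").replace(">", "$")   # terminal anchors
    return regex


def find_pattern(seq, pattern):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(seq)]


# N-glycosylation motif: N, then not P, then S or T, then not P.
hits = find_pattern("MKNGSAANPTC", "N-{P}-[ST]-{P}")
```

The second `N` in the example sequence is followed by `P`, so it does not match.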
- class schrodinger.protein.alignment.NucleicAcidAlignment(sequences=None)¶
- class schrodinger.protein.alignment.CombinedChainProteinAlignment(sequences=None, *, chains_to_combine=None)¶
Bases:
schrodinger.protein.alignment._ProteinAlignment
An alignment containing combined-chain sequences (sequence.CombinedChainProteinSequence objects).
- __init__(sequences=None, *, chains_to_combine=None)¶
- Parameters
sequences (list[sequence.ProteinSequence] or list[sequence.CombinedChainProteinSequence]) – A list of split-chain or combined-chain sequences to add to the alignment. If not given, an empty alignment will be created.
chains_to_combine (list[list[int]]) – Information about which split-chain sequences in split_undoable_aln should be included in which combined-chain sequence. Should be a list of lists of indices. Each index refers to the split-chain sequence at that position of split_undoable_aln, and split-chain sequences that are listed together will be combined into the same combined-chain sequence. Each split-chain sequence from split_undoable_aln must be referenced exactly once.
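The shape of `chains_to_combine` can be illustrated with a toy grouping function. This is a sketch only: `combine_chains` is a hypothetical helper, chains are plain strings, and a combined-chain sequence is modeled as a list of its chains.

```python
def combine_chains(split_seqs, chains_to_combine):
    """Group split-chain sequences into combined-chain sequences."""
    used = sorted(i for group in chains_to_combine for i in group)
    # Every index must appear exactly once across all groups.
    if used != list(range(len(split_seqs))):
        raise ValueError(
            "Each split-chain sequence must be referenced exactly once")
    return [[split_seqs[i] for i in group] for group in chains_to_combine]


split_seqs = ["HEAVY1", "LIGHT1", "HEAVY2", "LIGHT2"]
combined = combine_chains(split_seqs, [[0, 1], [2, 3]])
```

Here `[[0, 1], [2, 3]]` pairs the first two chains and the last two chains, giving two combined-chain sequences.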
- addSeqs(seqs, start=None)¶
Add multiple sequences to the alignment. Note that either single-chain sequences or combined-chain sequences may be added (but not both at the same time).
- Parameters
seqs (list[sequence.ProteinSequence] or list[sequence.CombinedChainProteinSequence]) – Sequences to add
start (int) – The index at which to insert; if None, seqs are appended. Must be None if adding single-chain sequences.
- removeSeqs(seqs)¶
Remove multiple sequences from the alignment. Note that either single-chain sequences or combined-chain sequences may be removed (but not both at the same time).
- Parameters
seqs (Iterable[sequence.Sequence] or Iterable[sequence.CombinedChainProteinSequence]) – Sequences to remove
- combinedSeqForSplitSeq(split_seq)¶
Get the combined-chain sequence that contains the given split-chain sequence.
- Parameters
split_seq (sequence.Sequence) – The split-chain sequence
- Returns
The combined-chain sequence
- Return type
- combinedResForSplitRes(split_res)¶
Get the combined-chain residue for the given split-chain residue.
- Parameters
split_res (residue.AbstractSequenceElement) – The split-chain residue
- Returns
The combined-chain residue
- Return type
- getInterChainAnchors()¶
Return all residues that are anchored to a different chain of the reference sequence (e.g. a residue in the second chain anchored to a reference residue from the first chain).
- Returns
The anchored residues.
- Return type
set[residue.Residue]
- alignChainStarts()¶
Align chain starting positions (e.g. make sure that the start of the N-th chain occurs in the same column for all sequences). This method will add gaps at the starts and/or ends of chains to preserve anchoring.
- Returns
A tuple of:
A list of chain starting indices. This will not include the starting index of the first chain, which is always 0.
The starting index of the first chain for which there’s no corresponding reference chain (e.g. the starting index for the third chain if there are only two chains in the reference sequence). This will be None if there are no chains without a corresponding reference chain.
- Return type
tuple(list[int], int or None)
- adjustChainStarts(adjust_by)¶
Move each chain break position to the right by the specified number of gaps. Note that chain breaks can only be moved along gaps, not residues.
- Parameters
adjust_by (list[list[int]]) – The number of gaps to move each chain break by, given as adjust_by[sequence_index][chain_break_index] = adjustment. Note that no adjustment is given for the start of the first chain or the end of the last chain.
- Raises
AssertionError – If some of the sequence elements to be removed aren’t actually gaps.
- schrodinger.protein.alignment.get_contiguous_groups(nums)¶
Group numbers in a given list by contiguity. Each group that is returned will be a list of numbers where every value is an int that only differs from its neighbors by one.
- e.g. [1, 2, 4] -> [[1, 2], [4]]
[1, 2, 4, 5, 10] -> [[1, 2], [4, 5], [10]]
- Parameters
nums (list(int)) – A list of numbers to group
- Returns
A list of groups of numbers, where the numbers in each group are contiguous
- Return type
list(list(int))
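The grouping behavior shown in the examples can be reproduced with a short loop. This is a sketch and not the library source: `contiguous_groups` is a hypothetical name, and the input is assumed to be sorted in ascending order.

```python
def contiguous_groups(nums):
    """Split numbers into runs where each value is one more than the previous."""
    groups = []
    for num in nums:
        if groups and num == groups[-1][-1] + 1:
            groups[-1].append(num)  # extends the current run
        else:
            groups.append([num])    # starts a new run
    return groups


groups = contiguous_groups([1, 2, 4, 5, 10])
```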