schrodinger.protein.annotation module

Annotations for biological sequences

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.protein.annotation.BINDING_SITE(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

CloseContact = 1
FarContact = 2
NoContact = 3
class schrodinger.protein.annotation.AntibodyCDRLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

NotCDR = 1
L1 = 2
L2 = 3
L3 = 4
H1 = 5
H2 = 6
H3 = 7
class schrodinger.protein.annotation.AntibodyCDR(label, start, end)

Bases: tuple

end

Alias for field number 2

label

Alias for field number 0

start

Alias for field number 1

class schrodinger.protein.annotation.Region(label, value, start, end)

Bases: tuple

end

Alias for field number 3

label

Alias for field number 0

start

Alias for field number 2

value

Alias for field number 1

class schrodinger.protein.annotation.AntibodyRegionLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

H = 1
HFR = 2
CH = 3
L = 4
LFR = 5
CL = 6
Hinge = 7
class schrodinger.protein.annotation.TCRRegionLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

A = 1
AFR = 2
B = 3
BFR = 4
class schrodinger.protein.annotation.GPCRSegmentLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

NTerm = 1
CTerm = 2
ICL = 3
ECL = 4
H8 = 5
TM = 6
Other = 7
class schrodinger.protein.annotation.Domains(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

Domain = 1
NoDomain = 2
class schrodinger.protein.annotation.KinaseConservation(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: schrodinger.models.jsonable.JsonableEnum

VeryLow = 'Very Low'
Low = 'Low'
Medium = 'Medium'
High = 'High'
VeryHigh = 'Very High'
class schrodinger.protein.annotation.KinaseFeatureLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: schrodinger.models.jsonable.JsonableEnum

GLYCINE_RICH_LOOP = 'Glycine Rich Loop'
ALPHA_C = 'Alpha-C'
GATE_KEEPER = 'Gate Keeper'
HINGE = 'Hinge'
LINKER = 'Linker'
HRD = 'HRD'
CATALYTIC_LOOP = 'Catalytic Loop'
DFG = 'DFG'
ACTIVATION_LOOP = 'Activation Loop'
NO_FEATURE = 'No Feature'
class schrodinger.protein.annotation.Consensus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

not_conserved = ' '
fully_conserved = '*'
strongly_conserved = ':'
weakly_conserved = '.'
property tooltip
class schrodinger.protein.annotation.TupleWithRange(iterable=(), /)

Bases: tuple

property range

The range of data contained in this tuple. Returns a tuple of (minimum value or zero, whichever is less; maximum value or zero, whichever is greater). None values are ignored. If the tuple contains no non-None values, returns (0, 0).

Return type

tuple(int or float, int or float)
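The range semantics described above can be sketched in a few lines. This is a minimal stand-in written for illustration, not the library's implementation; the class name TupleWithRangeSketch is hypothetical.

```python
class TupleWithRangeSketch(tuple):
    """Stand-in illustrating the documented ``range`` semantics."""

    @property
    def range(self):
        # None entries are ignored.
        values = [v for v in self if v is not None]
        if not values:
            # Nothing usable in the tuple.
            return (0, 0)
        # Zero is always part of the reported range.
        return (min(0, min(values)), max(0, max(values)))


print(TupleWithRangeSketch([0.2, None, 0.9]).range)
print(TupleWithRangeSketch([-3, 5]).range)
print(TupleWithRangeSketch([None, None]).range)
```

Because zero is always included, an all-positive tuple reports a range that starts at 0 rather than at its smallest value.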

class schrodinger.protein.annotation.AbstractSequenceAnnotations(seq)

Bases: PyQt6.QtCore.QObject

A base class for single-chain and combined-chain sequence annotations

Variables

titleChanged (QtCore.pyqtSignal) – A signal emitted after an annotation’s title (row header) changes.

titleChanged

pyqtSignal(*types, name: str = …, revision: int = …, arguments: Sequence = …) -> PYQT_SIGNAL

types is normally a sequence of individual types. Each type is either a type object or a string that is the name of a C++ type. Alternatively each type could itself be a sequence of types each describing a different overloaded signal. name is the optional C++ name of the signal. If it is not specified then the name of the class attribute that is bound to the signal is used. revision is the optional revision of the signal that is exported to QML. If it is not specified then 0 is used. arguments is the optional sequence of the names of the signal’s arguments.

__init__(seq)
Parameters

seq (sequence.Sequence) – The sequence to store annotations for.

sequence

A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.

Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.
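A descriptor of this kind might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: WeakRefAttributeSketch, Target and Holder are hypothetical names. The descriptor defines __set__, so it is a data descriptor and wins over the entry of the same name in the instance dictionary.

```python
import weakref


class WeakRefAttributeSketch:
    """Data descriptor that stores its value as a weak reference."""

    def __set_name__(self, owner, name):
        # Remember the attribute name the descriptor is bound to.
        self._name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        ref = instance.__dict__.get(self._name)
        # Dereference; yields None once the target has been collected.
        return None if ref is None else ref()

    def __set__(self, instance, value):
        # Stored on the instance under the descriptor's own name.
        instance.__dict__[self._name] = (
            None if value is None else weakref.ref(value))


class Target:
    pass


class Holder:
    target = WeakRefAttributeSketch()

    def __init__(self, target):
        self.target = target


obj = Target()
holder = Holder(obj)
print(holder.target is obj)
print(hash(holder.target) == hash(obj))
```

Reading the attribute returns the real object rather than a proxy, which is why it can be hashed and compared by identity.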

class schrodinger.protein.annotation.AbstractProteinSequenceAnnotationsMixin(*args, **kwargs)

Bases: object

domainsChanged


invalidatedDomains


__init__(*args, **kwargs)
Parameters

seq (sequence.Sequence) – The sequence to store annotations for.

property max_b_factor
property min_b_factor
invalidateMaxMinBFactor()
getAntibodyCDR(col, scheme)

Returns the antibody CDR information of the col’th index in the sequence under a given antibody CDR numbering scheme.

Parameters
  • col (int) – index into the sequence

  • scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use

Returns

Antibody CDR label, start, and end positions

Return type

AntibodyCDR, which is a named tuple of (AntibodyCDRLabel, int, int) if col is in a CDR, otherwise (AntibodyCDRLabel.NotCDR, None, None)

getAntibodyCDRs(scheme)

Returns a list of antibody CDR information for the entire sequence.

Parameters

scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use

Returns

A list of antibody CDR labels with their start and end positions

Return type

list(AntibodyCDR)
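The relationship between the per-column lookup and the full CDR list can be illustrated with stand-in types. The enum and named tuple below mirror the documented field order and member values, but they are local re-creations; find_cdr is a hypothetical helper, the CDR positions are made up, and treating start and end as inclusive is an assumption.

```python
from collections import namedtuple
from enum import Enum

# Stand-ins mirroring the documented enum and named tuple.
AntibodyCDRLabel = Enum("AntibodyCDRLabel", "NotCDR L1 L2 L3 H1 H2 H3")
AntibodyCDR = namedtuple("AntibodyCDR", "label start end")


def find_cdr(cdrs, col):
    """Return the CDR containing ``col`` (assuming inclusive bounds)."""
    for cdr in cdrs:
        if cdr.start <= col <= cdr.end:
            return cdr
    # Documented fallback for columns outside every CDR.
    return AntibodyCDR(AntibodyCDRLabel.NotCDR, None, None)


# Hypothetical CDR positions, chosen only for illustration.
cdrs = [
    AntibodyCDR(AntibodyCDRLabel.H1, 25, 34),
    AntibodyCDR(AntibodyCDRLabel.H2, 49, 65),
]
print(find_cdr(cdrs, 30).label.name)
print(find_cdr(cdrs, 40))
```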

getResID(res) -> str

Get the structure residue ID under the Kabat numbering scheme.

Parameters

res – the residue

Returns

the residue ID

getGPCRSegment(col: int) -> Optional[schrodinger.protein.annotation.Region]

Return the GPCR segment information of the col’th index in the sequence.

Parameters

col – index into the sequence

Returns

GPCR Segment label, value, start, and end positions or None if col is not in a GPCR segment

getGPCRSegments() -> List[schrodinger.protein.annotation.Region]

Return a list of GPCR segment information for the entire sequence.

Returns

a list of GPCR segment labels, values, start and end positions
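One way to picture how a list of Region tuples could be assembled is to group runs of identical per-residue segment names. The sketch below is only an illustration under several assumptions: the per-residue name list, the regions_from_names helper, zero-based inclusive indices, and the idea that Region.value holds the raw name while Region.label holds the enum member are not confirmed by this documentation. The mapping is a subset of the documented GPCRSegmentFinder.VALUE_MAP.

```python
from collections import namedtuple
from enum import Enum
from itertools import groupby

# Stand-ins mirroring the documented enum and named tuple.
GPCRSegmentLabel = Enum("GPCRSegmentLabel", "NTerm CTerm ICL ECL H8 TM Other")
Region = namedtuple("Region", "label value start end")

# Subset of the documented GPCRSegmentFinder.VALUE_MAP.
VALUE_MAP = {
    "N-term": GPCRSegmentLabel.NTerm,
    "TM1": GPCRSegmentLabel.TM,
    "ICL1": GPCRSegmentLabel.ICL,
}


def regions_from_names(names):
    """Group consecutive identical segment names into Region tuples."""
    regions = []
    col = 0
    for name, run in groupby(names):
        length = len(list(run))
        if name is not None:
            # Assumed layout: label is the enum member, value the raw name.
            regions.append(
                Region(VALUE_MAP[name], name, col, col + length - 1))
        col += length
    return regions


# Hypothetical per-residue names; None marks an unassigned residue.
names = ["N-term"] * 3 + ["TM1"] * 4 + [None] + ["ICL1"] * 2
for region in regions_from_names(names):
    print(region.label.name, region.value, region.start, region.end)
```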

getAntibodyRegion(col: int, scheme: schrodinger.infra.util.AntibodyCDRScheme) -> Optional[schrodinger.protein.annotation.Region]

Return the antibody region of the given residue based on the numbering scheme.

The regex will strip trailing numbers and get the label according to the AntibodyRegionLabel enum.

Example values: H1, CL, HINGE, LFR4, CH3

Parameters
  • col – index into the sequence

  • scheme – the antibody CDR numbering scheme to use

Returns

A Region carrying the antibody region's label and value, or None if there is no region
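The label lookup described for getAntibodyRegion (strip trailing digits, then resolve against AntibodyRegionLabel) can be sketched as follows. The exact regular expression used by the library is not shown in this documentation, so the pattern here is an assumption, as is the case-insensitive lookup that lets the example value HINGE resolve to the Hinge member. region_label is a hypothetical helper.

```python
import re
from enum import Enum

# Stand-in mirroring the documented enum members and values.
AntibodyRegionLabel = Enum("AntibodyRegionLabel", "H HFR CH L LFR CL Hinge")

# Case-insensitive lookup so that 'HINGE' resolves to the Hinge member.
_BY_UPPER_NAME = {member.name.upper(): member for member in AntibodyRegionLabel}


def region_label(value):
    """Map a region value such as 'LFR4' to its AntibodyRegionLabel."""
    # Drop any run of digits at the end of the value.
    stripped = re.sub(r"\d+$", "", value)
    return _BY_UPPER_NAME[stripped.upper()]


for value in ("H1", "CL", "HINGE", "LFR4", "CH3"):
    print(value, "->", region_label(value).name)
```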

getAntibodyRegions(scheme: schrodinger.infra.util.AntibodyCDRScheme) -> List[schrodinger.protein.annotation.Region]

Return the list of all antibody regions based on the numbering scheme.

Parameters

scheme – the antibody CDR numbering scheme to use

Returns

a list of antibody regions, each with its label, value, start and end positions

getTCRRegion(col: int) -> Optional[schrodinger.protein.annotation.Region]

Return the TCR region information of the col’th index in the sequence.

Parameters

col – index into the sequence

Returns

TCR Region label, value, start, and end positions or None if col is not in a TCR Region

getTCRRegions() -> List[schrodinger.protein.annotation.Region]

Return a list of TCR region information for the entire sequence.

Returns

a list of TCR region labels, values, start and end positions

isAntibodyChain()
Returns

Whether the sequence described is an antibody chain

Return type

bool

isAntibodyHeavyChain()
Returns

Whether the sequence described is an antibody heavy chain

Return type

bool

isAntibodyLightChain()
Returns

Whether the sequence described is an antibody light chain

Return type

bool

property binding_sites
property ligands
property ligand_asls
setLigandDistance(distance)

Updates the ligand distance and invalidates the cache

property domains
getSSBondPartner(index)

Return the residue’s intra-sequence disulfide bond partner, if any.

If the residue is not involved in a disulfide bond, its partner has been deleted, or its partner is in another sequence, it will return None.

Parameters

index (int) – Index of the residue to check

Returns

the other Residue in the disulfide bond or None

Return type

residue.Residue or None

clearAllCaching()
getNumAnnValues(ann)
class schrodinger.protein.annotation.SequenceAnnotations(seq)

Bases: schrodinger.protein.annotation.AbstractSequenceAnnotations

Knows how to annotate a single-chain sequence

Annotations can be set at the level of the sequence as a whole, or be per sequence element annotations. If an attribute is accessed on the SequenceAnnotations object, the attribute is first looked for on the object and if not found is assumed to be a per sequence element annotation. If the elements in the sequence lack the attribute, an AttributeError will be raised.
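The attribute fallback described above can be illustrated with __getattr__, which Python only calls after normal lookup has failed. This is a simplified sketch under assumptions: Element and AnnotationsSketch are hypothetical stand-ins, and returning a plain list of per-element values is a guess at the shape of the result.

```python
class Element:
    def __init__(self, hydrophobicity):
        self.hydrophobicity = hydrophobicity


class AnnotationsSketch:
    """Falls back to per-element attributes for unknown names."""

    def __init__(self, elements):
        self._elements = list(elements)
        self.title = "demo"  # a sequence-level annotation

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has already failed.
        if name.startswith("_"):
            # Avoid recursing on private names such as _elements.
            raise AttributeError(name)
        # getattr raises AttributeError if an element lacks the attribute.
        return [getattr(elem, name) for elem in self._elements]


ann = AnnotationsSketch([Element(0.5), Element(-1.2)])
print(ann.title)
print(ann.hydrophobicity)
try:
    ann.no_such_annotation
except AttributeError as err:
    print("AttributeError:", err)
```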

class schrodinger.protein.annotation.ProteinSequenceAnnotations(seq)

Bases: schrodinger.protein.annotation.AbstractProteinSequenceAnnotationsMixin, schrodinger.protein.annotation.SequenceAnnotations

Knows how to annotate a ProteinSequence

annotationInvalidated


invalidatedLigandContacts


invalidatedMaxMinBFactor


class ANNOTATION_TYPES(*args, **kwargs)

Bases: schrodinger.models.json.JsonableClassMixin

alignment_set = 2
antibody_cdr = 21
antibody_regions = 35
b_factor = 15
beta_strand_propensity = 7
binding_sites = 19
custom_annotation = 34
disulfide_bonds = 5
domains = 20
exposure_tendency = 10
classmethod fromJsonImplementation(json_obj)

Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.

Parameters

json_obj (dict) – A dictionary loaded from a JSON string or file.

Returns

An instance of the derived class.

Return type

cls

gpcr_generic_number = 33
gpcr_segment = 32
helix_propensity = 6
helix_termination_tendency = 9
hydrophobicity = 13
isoelectric_point = 14
kinase_conservation = 31
kinase_features = 30
pairwise_constraints = 1
pfam = 23
pred_accessibility = 26
pred_disordered = 27
pred_disulfide_bonds = 24
pred_domain_arr = 28
pred_secondary_structure = 25
proximity_constraints = 29
rescode = 4
resnum = 3
sasa = 22
secondary_structure = 18
side_chain_chem = 12
steric_group = 11
tcr_regions = 36
toJsonImplementation()

Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.

Returns

A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders

turn_propensity = 8
window_hydrophobicity = 16
window_isoelectric_point = 17
RES_PROPENSITY_ANNOTATIONS = {<ANNOTATION_TYPES.helix_termination_tendency: 9>, <ANNOTATION_TYPES.exposure_tendency: 10>, <ANNOTATION_TYPES.side_chain_chem: 12>, <ANNOTATION_TYPES.helix_propensity: 6>, <ANNOTATION_TYPES.turn_propensity: 8>, <ANNOTATION_TYPES.beta_strand_propensity: 7>, <ANNOTATION_TYPES.steric_group: 11>}
PRED_ANNOTATION_TYPES = {<ANNOTATION_TYPES.pred_domain_arr: 28>, <ANNOTATION_TYPES.pred_secondary_structure: 25>, <ANNOTATION_TYPES.pred_disulfide_bonds: 24>, <ANNOTATION_TYPES.pred_disordered: 27>, <ANNOTATION_TYPES.pred_accessibility: 26>}
__init__(seq)
Parameters

seq (sequence.Sequence) – The sequence to store annotations for.

invalidateMaxMinBFactor()
property window_hydrophobicity
property hydrophobicity_window_padding
property binding_site_residues

Binding site residues of the sequence, as a map from ligand name (str) to the set of residues (protein.Residue) in that ligand's binding site.

property isoelectric_point_window_padding
invalidateWindowHydrophobicity()

Invalidate the cached window hydrophobicity data. Note that this method is also called from the sequence when the window size changes.

property window_isoelectric_point
invalidateWindowIsoelectricPoint()

Invalidate the cached window isoelectric point data. Note that this method is also called from the sequence when the window size changes.

property sasa
getAntibodyCDR(col, scheme)

Returns the antibody CDR information of the col’th index in the sequence under a given antibody CDR numbering scheme.

Parameters
  • col (int) – index into the sequence

  • scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use

Returns

Antibody CDR label, start, and end positions

Return type

AntibodyCDR, which is a named tuple of (AntibodyCDRLabel, int, int) if col is in a CDR, otherwise (AntibodyCDRLabel.NotCDR, None, None)

getAntibodyCDRs(scheme)

Returns a list of antibody CDR information for the entire sequence.

Parameters

scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use

Returns

A list of antibody CDR labels with their start and end positions

Return type

list(AntibodyCDR)

getGPCRSegment(col: int) -> Optional[schrodinger.protein.annotation.Region]

Return the GPCR segment information of the col’th index in the sequence.

Parameters

col – index into the sequence

Returns

GPCR Segment label, value, start, and end positions or None if col is not in a GPCR segment

getGPCRSegments() -> List[schrodinger.protein.annotation.Region]

Return a list of GPCR segment information for the entire sequence.

Returns

a list of GPCR segment labels, values, start and end positions

getTCRRegion(col: int) -> Optional[schrodinger.protein.annotation.Region]

Return the TCR region information of the col’th index in the sequence.

Parameters

col – index into the sequence

Returns

TCR Region label, value, start, and end positions or None if col is not in a TCR Region

getTCRRegions() -> List[schrodinger.protein.annotation.Region]

Return a list of TCR region information for the entire sequence.

Returns

a list of TCR region labels, values, start and end positions

isAntibodyChain()
Returns

Whether the sequence described is an antibody chain

Return type

bool

getAntibodyRegion(col: int, scheme: schrodinger.infra.util.AntibodyCDRScheme)

Return the antibody region of the given residue based on the numbering scheme.

The regex will strip trailing numbers and get the label according to the AntibodyRegionLabel enum.

Example values: H1, CL, HINGE, LFR4, CH3

Parameters
  • col – index into the sequence

  • scheme – the antibody CDR numbering scheme to use

Returns

A Region carrying the antibody region's label and value, or None if there is no region

getAntibodyRegions(scheme: schrodinger.infra.util.AntibodyCDRScheme)

Return the list of all antibody regions based on the numbering scheme.

Parameters

scheme – the antibody CDR numbering scheme to use

Returns

a list of antibody regions, each with its label, value, start and end positions

isAntibodyHeavyChain()
Returns

Whether the sequence described is an antibody heavy chain

Return type

bool

isAntibodyLightChain()
Returns

Whether the sequence described is an antibody light chain

Return type

bool

getSparseRescodes(modulo)
onStructureChanged()
setLigandDistance(distance)

Updates the ligand distance and invalidates the cache

parseDomains(filename)

Parse XML file from UniProt database to get domain information.

Parameters

filename (str) – the XML file to parse for domain information

Returns

a list of the domains (names) for the sequence in order

Return type

list(str)
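As a rough illustration of extracting domain names from UniProt-style XML, the sketch below uses the standard library's xml.etree.ElementTree. It is not the parser used by parseDomains: the sample XML is hand-written and abbreviated, domain_names is a hypothetical helper that takes XML text rather than a filename, and reading domain names from feature elements with type "domain" is an assumption about which part of the file matters.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written XML in the general shape of a UniProt entry.
SAMPLE_XML = """\
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <feature type="domain" description="SH3">
      <location><begin position="84"/><end position="145"/></location>
    </feature>
    <feature type="domain" description="Protein kinase">
      <location><begin position="270"/><end position="523"/></location>
    </feature>
    <feature type="active site" description="Proton acceptor">
      <location><position position="389"/></location>
    </feature>
  </entry>
</uniprot>
"""


def domain_names(xml_text):
    """Return the descriptions of all domain features, in document order."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        # Compare local tag names so the XML namespace does not matter.
        local_tag = elem.tag.rsplit("}", 1)[-1]
        if local_tag == "feature" and elem.get("type") == "domain":
            names.append(elem.get("description"))
    return names


print(domain_names(SAMPLE_XML))
```

For a file on disk, ET.parse(filename).getroot() would replace the ET.fromstring call.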

resetAnnotation(ann)

Force a reset of an annotation’s cache.

clearAllCaching()
property inscode
property resnum
getCDRResidueList(scheme)
Returns

List of CDR Residues.

Return type

List[str]

class schrodinger.protein.annotation.NucleicAcidSequenceAnnotations(seq)

Bases: schrodinger.protein.annotation.ProteinSequenceAnnotations

isAntibodyChain()
Returns

Whether the sequence described is an antibody chain

Return type

bool

class schrodinger.protein.annotation.ProteinAlignmentAnnotations(aln)

Bases: object

Knows how to annotate an alignment (a collection of aligned sequences)

class ANNOTATION_TYPES(*args, **kwargs)

Bases: schrodinger.models.json.JsonableClassMixin

consensus_freq = 6
consensus_seq = 5
consensus_symbols = 4
classmethod fromJsonImplementation(json_obj)

Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.

Parameters

json_obj (dict) – A dictionary loaded from a JSON string or file.

Returns

An instance of the derived class.

Return type

cls

indices = 1
mean_hydrophobicity = 2
mean_isoelectric_point = 3
toJsonImplementation()

Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.

Returns

A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders

__init__(aln)
Parameters

aln (alignment.Alignment) – The alignment to annotate.

alignment

A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.

Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.

property indices

A numbering of all the column indices in an alignment

property mean_hydrophobicity

Returns

A list of floats representing per-column averages of the hydrophobicity of residues in the alignment

property mean_isoelectric_point

Returns

A list of floats representing per-column averages of the isoelectric point of residues in the alignment

property consensus_seq

Consensus sequence of the alignment. If several residues tie for the highest frequency in a column, all of them are kept.

Returns

consensus sequence

Return type

list(list(Residue))

property consensus_freq

Returns the frequency of the consensus residue in each alignment column as a list. Gaps are not used for calculation.

Returns

consensus residue frequencies

Return type

TupleWithRange(float)
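The consensus residues and their frequency for a single column can be sketched as below. This is an illustration on plain strings, not the library's code: consensus_column is a hypothetical helper, "-" is an assumed gap character, and dividing by the number of non-gap residues is one reading of "gaps are not used for calculation".

```python
from collections import Counter

GAP = "-"  # assumed gap character


def consensus_column(column):
    """Return (consensus residues, frequency) for one alignment column."""
    residues = [res for res in column if res != GAP]
    if not residues:
        return [], 0.0
    counts = Counter(residues)
    top = max(counts.values())
    # Keep every residue tied for the highest count.
    consensus = sorted(res for res, n in counts.items() if n == top)
    # Assumed denominator: only the non-gap residues in the column.
    return consensus, top / len(residues)


alignment = ["ACD-", "ACE-", "GC-K"]
columns = list(zip(*alignment))
for col in columns:
    print(consensus_column(col))
```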

property consensus_symbols

Consensus symbols in the alignment based on pre-defined residue sets, same as in ClustalW

Returns

consensus symbols for each alignment position

Return type

A list of Consensus enums.
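A ClustalW-style symbol assignment for one column can be sketched with the strong and weak residue groups that ClustalW documents. The function below is an illustration, not the library's implementation: consensus_symbol is a hypothetical helper, and returning a blank for any column that contains a gap is an assumption. The returned characters correspond to the values of the Consensus enum listed earlier.

```python
# Residue groups as listed in the ClustalW documentation.
STRONG = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")
WEAK = ("CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY")


def consensus_symbol(column, gap="-"):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if not residues or gap in residues:
        # Assumption: columns containing gaps are not scored.
        return " "
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG):
        return ":"  # strongly conserved
    if any(residues <= set(group) for group in WEAK):
        return "."  # weakly conserved
    return " "  # not conserved


for column in ("AAA", "STS", "CSA", "AWA", "A-A"):
    print(repr(column), "->", repr(consensus_symbol(column)))
```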

Calculates normalized frequencies of individual amino acids per alignment position, and overall estimate of column composition conservation (‘bits’). Bit values are weighted by the number of gaps in the column.

Schneider TD, Stephens RM (1990). “Sequence Logos: A New Way to Display Consensus Sequences”. Nucleic Acids Res 18 (20): 6097–6100. doi:10.1093/nar/18.20.6097

Returns

the list of bits and frequencies (in decreasing order) of the residues in each column of the alignment.

Return type

list(tuple(float, tuple(tuple(str, float))))
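The per-column calculation behind a sequence logo can be sketched as follows, after Schneider and Stephens: information content is the maximum entropy, log2 of the alphabet size, minus the observed entropy. This is a simplified illustration under assumptions: column_logo is a hypothetical helper, the small-sample correction is omitted, a 20-letter alphabet is assumed, and the gap weighting is taken to be the fraction of non-gap rows, which may differ from what the library does.

```python
import math
from collections import Counter


def column_logo(column, gap="-", alphabet_size=20):
    """Return (bits, ((residue, frequency), ...)) for one column."""
    residues = [res for res in column if res != gap]
    if not residues:
        return 0.0, ()
    total = len(residues)
    counts = Counter(residues)
    # Frequencies in decreasing order, ties broken alphabetically.
    freqs = sorted(((res, n / total) for res, n in counts.items()),
                   key=lambda item: (-item[1], item[0]))
    entropy = -sum(freq * math.log2(freq) for _, freq in freqs)
    bits = math.log2(alphabet_size) - entropy
    # Assumed weighting: scale by the fraction of non-gap rows.
    bits *= total / len(column)
    return bits, tuple(freqs)


print(column_logo("AAAA"))
print(column_logo("AC--"))
```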

clearAllCaching()
class schrodinger.protein.annotation.AbstractRegionFinder(seq)

Bases: object

Abstract class to help with finding annotated regions from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.

VALUE_MAP = None
__init__(seq)
Parameters

seq (schrodinger.protein.sequence.ProteinSequence) – The sequence to find the regions on

seq

A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.

Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.

forceIndexReassignment()

Force a recalculation of the region start and end indices. This is required when gaps are inserted/removed.

This will always do a full recalculation, but is here to match _AntibodyCDRFinder’s API.

class schrodinger.protein.annotation.AntibodyRegionsFinder(*args, **kwargs)

Bases: schrodinger.protein.annotation.AbstractRegionFinder

Class to help with finding Antibody Regions from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.

VALUE_MAP = {'H1': AntibodyRegionLabel.H, 'H2': AntibodyRegionLabel.H, 'H3': AntibodyRegionLabel.H, 'HFR1': AntibodyRegionLabel.HFR, 'HFR2': AntibodyRegionLabel.HFR, 'HFR3': AntibodyRegionLabel.HFR, 'HFR4': AntibodyRegionLabel.HFR, 'L1': AntibodyRegionLabel.L, 'L2': AntibodyRegionLabel.L, 'L3': AntibodyRegionLabel.L, 'LFR1': AntibodyRegionLabel.LFR, 'LFR2': AntibodyRegionLabel.LFR, 'LFR3': AntibodyRegionLabel.LFR, 'LFR4': AntibodyRegionLabel.LFR}
__init__(*args, **kwargs)
Parameters

seq (schrodinger.protein.sequence.ProteinSequence) – The sequence to find the regions on

getAntibodyRegions(scheme)
class schrodinger.protein.annotation.TCRRegionFinder(seq)

Bases: schrodinger.protein.annotation.AbstractRegionFinder

Class to help with finding TCR Regions from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.

VALUE_MAP = {'A1': TCRRegionLabel.A, 'A2': TCRRegionLabel.A, 'A3': TCRRegionLabel.A, 'AFR1': TCRRegionLabel.AFR, 'AFR2': TCRRegionLabel.AFR, 'AFR3': TCRRegionLabel.AFR, 'AFR4': TCRRegionLabel.AFR, 'B1': TCRRegionLabel.B, 'B2': TCRRegionLabel.B, 'B3': TCRRegionLabel.B, 'BFR1': TCRRegionLabel.BFR, 'BFR2': TCRRegionLabel.BFR, 'BFR3': TCRRegionLabel.BFR, 'BFR4': TCRRegionLabel.BFR}
getTCRRegions()
class schrodinger.protein.annotation.GPCRSegmentFinder(seq)

Bases: schrodinger.protein.annotation.AbstractRegionFinder

Class to help with finding GPCR Segments from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.

VALUE_MAP = {'C-term': GPCRSegmentLabel.CTerm, 'ECL1': GPCRSegmentLabel.ECL, 'ECL2': GPCRSegmentLabel.ECL, 'ECL3': GPCRSegmentLabel.ECL, 'H8': GPCRSegmentLabel.H8, 'ICL1': GPCRSegmentLabel.ICL, 'ICL2': GPCRSegmentLabel.ICL, 'ICL3': GPCRSegmentLabel.ICL, 'N-term': GPCRSegmentLabel.NTerm, 'TM1': GPCRSegmentLabel.TM, 'TM2': GPCRSegmentLabel.TM, 'TM3': GPCRSegmentLabel.TM, 'TM4': GPCRSegmentLabel.TM, 'TM5': GPCRSegmentLabel.TM, 'TM6': GPCRSegmentLabel.TM, 'TM7': GPCRSegmentLabel.TM, 'TM8': GPCRSegmentLabel.TM}
getGPCRs()
class schrodinger.protein.annotation.SeqTypeMixin(seq, *args, **kwargs)

Bases: object

Mixin to customize antibody.SeqType for MSV2. See _delayed_antibody_import for class declaration.

__init__(seq, *args, **kwargs)
isHeavyChain()
isLightChain()
class schrodinger.protein.annotation.CombinedChainSequenceAnnotationMeta(cls, bases, classdict, *, wraps=None, cached_annotations=(), wrapped_properties=())

Bases: schrodinger.application.msv.utils.QtDocstringWrapperMetaClass

The metaclass for CombinedChainSequenceAnnotations. This metaclass automatically wraps getters for all sequence annotations.
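The idea of a metaclass that generates annotation getters can be sketched as below. This is a heavily simplified illustration and not the library's metaclass: WrapAnnotationsMeta, ChainAnnotations, CombinedAnnotations and the chain_annotations attribute are hypothetical, only a wrapped_properties keyword is handled, and concatenating per-chain lists is an assumption about what a wrapped getter returns.

```python
class WrapAnnotationsMeta(type):
    """Adds one concatenating property per name in wrapped_properties."""

    def __new__(mcs, name, bases, classdict, *, wrapped_properties=()):
        for prop_name in wrapped_properties:
            classdict[prop_name] = mcs._make_property(prop_name)
        return super().__new__(mcs, name, bases, classdict)

    def __init__(cls, name, bases, classdict, *, wrapped_properties=()):
        # Swallow the keyword so type.__init__ does not receive it.
        super().__init__(name, bases, classdict)

    @staticmethod
    def _make_property(prop_name):
        def getter(self):
            # Concatenate the per-chain values in chain order.
            values = []
            for chain_ann in self.chain_annotations:
                values.extend(getattr(chain_ann, prop_name))
            return values
        return property(getter)


class ChainAnnotations:
    def __init__(self, hydrophobicity):
        self.hydrophobicity = hydrophobicity


class CombinedAnnotations(metaclass=WrapAnnotationsMeta,
                          wrapped_properties=("hydrophobicity",)):
    def __init__(self, chain_annotations):
        self.chain_annotations = chain_annotations


combined = CombinedAnnotations(
    [ChainAnnotations([0.1, 0.2]), ChainAnnotations([0.3])])
print(combined.hydrophobicity)
```

Generating the getters in the metaclass keeps the combined class free of one hand-written property per annotation type.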

class schrodinger.protein.annotation.CombinedChainProteinSequenceAnnotations(seq)

Bases: schrodinger.protein.annotation.AbstractProteinSequenceAnnotationsMixin, schrodinger.protein.annotation.AbstractSequenceAnnotations

Sequence annotations for a sequence.CombinedChainProteinSequence. Annotations will be fetched from the ProteinSequenceAnnotations objects for each split-chain sequence.

sequence

A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.

Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.

__init__(seq)
Parameters

seq (sequence.CombinedChainProteinSequence) – The sequence to store annotations for.

chainAdded(chain)

Respond to a new chain being added to the sequence. The sequence is responsible for calling this method whenever a chain is added.

Parameters

chain (sequence.ProteinSequence) – The newly added chain.

chainRemoved(chain)

Respond to a chain being removed from the sequence. The sequence is responsible for calling this method whenever a chain is removed.

Parameters

chain (sequence.ProteinSequence) – The removed chain.

class ANNOTATION_TYPES(*args, **kwargs)

Bases: schrodinger.models.json.JsonableClassMixin

alignment_set = 2
antibody_cdr = 21
antibody_regions = 35
b_factor = 15
beta_strand_propensity = 7
binding_sites = 19
custom_annotation = 34
disulfide_bonds = 5
domains = 20
exposure_tendency = 10
classmethod fromJsonImplementation(json_obj)

Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.

Parameters

json_obj (dict) – A dictionary loaded from a JSON string or file.

Returns

An instance of the derived class.

Return type

cls

gpcr_generic_number = 33
gpcr_segment = 32
helix_propensity = 6
helix_termination_tendency = 9
hydrophobicity = 13
isoelectric_point = 14
kinase_conservation = 31
kinase_features = 30
pairwise_constraints = 1
pfam = 23
pred_accessibility = 26
pred_disordered = 27
pred_disulfide_bonds = 24
pred_domain_arr = 28
pred_secondary_structure = 25
proximity_constraints = 29
rescode = 4
resnum = 3
sasa = 22
secondary_structure = 18
side_chain_chem = 12
steric_group = 11
tcr_regions = 36
toJsonImplementation()

Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.

Returns

A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders

turn_propensity = 8
window_hydrophobicity = 16
window_isoelectric_point = 17
PRED_ANNOTATION_TYPES = {<ANNOTATION_TYPES.pred_domain_arr: 28>, <ANNOTATION_TYPES.pred_secondary_structure: 25>, <ANNOTATION_TYPES.pred_disulfide_bonds: 24>, <ANNOTATION_TYPES.pred_disordered: 27>, <ANNOTATION_TYPES.pred_accessibility: 26>}
RES_PROPENSITY_ANNOTATIONS = {<ANNOTATION_TYPES.helix_termination_tendency: 9>, <ANNOTATION_TYPES.exposure_tendency: 10>, <ANNOTATION_TYPES.side_chain_chem: 12>, <ANNOTATION_TYPES.helix_propensity: 6>, <ANNOTATION_TYPES.turn_propensity: 8>, <ANNOTATION_TYPES.beta_strand_propensity: 7>, <ANNOTATION_TYPES.steric_group: 11>}
property alignment_set
property antibody_cdr
property antibody_regions
property b_factor
property beta_strand_propensity
property custom_annotation
property disulfide_bonds
property domains
property exposure_tendency
property gpcr_generic_number
property gpcr_segment
property helix_propensity
property helix_termination_tendency
property hydrophobicity
property hydrophobicity_window_padding
property isoelectric_point
property isoelectric_point_window_padding
property kinase_conservation
property kinase_features
property pairwise_constraints
property pfam
property pred_accessibility
property pred_disordered
property pred_disulfide_bonds
property pred_domain_arr
property pred_secondary_structure
property proximity_constraints
property rescode
property resnum
property sasa
property secondary_structure
property side_chain_chem
property steric_group
property tcr_regions
property turn_propensity
property window_hydrophobicity
property window_isoelectric_point
getAntibodyCDR(col, scheme)

Returns the antibody CDR information of the col’th index in the sequence under a given antibody CDR numbering scheme.

Parameters
  • col (int) – index into the sequence

  • scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use

Returns

Antibody CDR label, start, and end positions

Return type

AntibodyCDR, which is a named tuple of (AntibodyCDRLabel, int, int) if col is in a CDR, otherwise (AntibodyCDRLabel.NotCDR, None, None)

getAntibodyCDRs(scheme)

Returns a list of antibody CDR information for the entire sequence.

Parameters

scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use

Returns

A list of antibody CDR labels with their start and end positions

Return type

list(AntibodyCDR)

getGPCRSegment(col: int) -> Optional[schrodinger.protein.annotation.Region]

Return the GPCR segment information of the col’th index in the sequence.

Parameters

col – index into the sequence

Returns

GPCR Segment label, value, start, and end positions or None if col is not in a GPCR segment

getGPCRSegments() -> List[schrodinger.protein.annotation.Region]

Return a list of GPCR segment information for the entire sequence.

Returns

a list of GPCR segment labels, values, start and end positions

getTCRRegion(col: int) -> Optional[schrodinger.protein.annotation.Region]

Return the TCR region information of the col’th index in the sequence.

Parameters

col – index into the sequence

Returns

TCR Region label, value, start, and end positions or None if col is not in a TCR Region

getTCRRegions() -> List[schrodinger.protein.annotation.Region]

Return a list of TCR region information for the entire sequence.

Returns

a list of TCR region labels, values, start and end positions

getAntibodyRegion(col: int, scheme) -> Optional[schrodinger.protein.annotation.Region]

Return the antibody region of the given residue based on the numbering scheme.

The regex will strip trailing numbers and get the label according to the AntibodyRegionLabel enum.

Example values: H1, CL, HINGE, LFR4, CH3

Parameters
  • col – index into the sequence

  • scheme – the antibody CDR numbering scheme to use

Returns

A Region carrying the antibody region's label and value, or None if there is no region

getAntibodyRegions(scheme) -> List[schrodinger.protein.annotation.Region]

Return the list of all antibody regions based on the numbering scheme.

Parameters

scheme – the antibody CDR numbering scheme to use

Returns

a list of antibody regions, each with its label, value, start and end positions

isAntibodyChain()
Returns

Whether the sequence described is an antibody chain

Return type

bool

setLigandDistance(distance)

Updates the ligand distance and invalidates the cache

clearAllCaching()
schrodinger.protein.annotation.make_ligand_name_atom(ct, atom_index)

Make a unique, human-readable name for a ligand identified by atom index.

Parameters
Returns

The name for the ligand

Return type

str

schrodinger.protein.annotation.make_ligand_name(ct, ligand)

Make a unique, human-readable name for a ligand. This name matches the ligand name in the structure hierarchy.

Parameters
Returns

The name for the ligand

Return type

str

schrodinger.protein.annotation.parse_antibody_rescode(newcode)

Extract the resnum and inscode from a residue code under the numbering scheme. A numeric inscode is converted to a letter, e.g. ‘H101.1’ -> ‘101A’. Residues that fall outside the numbering scheme catalog (FV), or that cannot be assigned properly, get the residue number ‘-1’, e.g. ‘H-1’.

Parameters

newcode (str) – Residue code by the Antibody CDR numbering scheme.

Returns

new residue number and insertion code.

Return type

tuple

Raises

KeyError – if newcode doesn’t follow the expected pattern.
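The conversion described for parse_antibody_rescode can be sketched with a regular expression. Everything about the accepted pattern is an assumption inferred from the two examples in the description: a single chain letter, a possibly negative number, and an optional insertion given either as ".N" or as a letter. parse_rescode_sketch is a hypothetical name, and returning an int residue number with a blank insertion code when none is present is also assumed.

```python
import re
import string

# Assumed shape: chain letter, number, optional '.N' or letter insertion.
_RESCODE = re.compile(r"^[A-Za-z](-?\d+)(?:\.(\d+)|([A-Za-z]))?$")


def parse_rescode_sketch(newcode):
    """Return (resnum, inscode) for codes such as 'H101.1' or 'H-1'."""
    match = _RESCODE.match(newcode)
    if match is None:
        raise KeyError(newcode)
    resnum, numeric_ins, letter_ins = match.groups()
    if numeric_ins is not None:
        index = int(numeric_ins)
        if not 1 <= index <= len(string.ascii_uppercase):
            raise KeyError(newcode)
        # 1 -> 'A', 2 -> 'B', and so on.
        inscode = string.ascii_uppercase[index - 1]
    else:
        inscode = letter_ins or " "
    return int(resnum), inscode


print(parse_rescode_sketch("H101.1"))
print(parse_rescode_sketch("H-1"))
```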