schrodinger.protein.annotation module¶
Annotations for biological sequences
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.protein.annotation.BINDING_SITE(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- CloseContact = 1¶
- FarContact = 2¶
- NoContact = 3¶
- class schrodinger.protein.annotation.AntibodyCDRLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- NotCDR = 1¶
- L1 = 2¶
- L2 = 3¶
- L3 = 4¶
- H1 = 5¶
- H2 = 6¶
- H3 = 7¶
- class schrodinger.protein.annotation.AntibodyCDR(label, start, end)¶
Bases:
tuple
- end¶
Alias for field number 2
- label¶
Alias for field number 0
- start¶
Alias for field number 1
- class schrodinger.protein.annotation.Region(label, value, start, end)¶
Bases:
tuple
- end¶
Alias for field number 3
- label¶
Alias for field number 0
- start¶
Alias for field number 2
- value¶
Alias for field number 1
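The field order documented for AntibodyCDR and Region can be mirrored with collections.namedtuple. This is a minimal stand-in sketch, not the module's own definition; the sample values are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Stand-ins that mirror the documented field order. They are NOT imported
# from schrodinger.protein.annotation.
AntibodyCDR = namedtuple("AntibodyCDR", ["label", "start", "end"])
Region = namedtuple("Region", ["label", "value", "start", "end"])

# Hypothetical sample values, used only for illustration.
region = Region(label="TM", value="TM3", start=95, end=120)

# Fields can be read by name, by position, or by unpacking.
same_value = region.value == region[1]
label, value, start, end = region
```

Because both types are tuples, they can be compared, hashed, and unpacked like any other tuple.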
- class schrodinger.protein.annotation.AntibodyRegionLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- H = 1¶
- HFR = 2¶
- CH = 3¶
- L = 4¶
- LFR = 5¶
- CL = 6¶
- Hinge = 7¶
- class schrodinger.protein.annotation.TCRRegionLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- A = 1¶
- AFR = 2¶
- B = 3¶
- BFR = 4¶
- class schrodinger.protein.annotation.GPCRSegmentLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- NTerm = 1¶
- CTerm = 2¶
- ICL = 3¶
- ECL = 4¶
- H8 = 5¶
- TM = 6¶
- Other = 7¶
- class schrodinger.protein.annotation.Domains(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- Domain = 1¶
- NoDomain = 2¶
- class schrodinger.protein.annotation.KinaseConservation(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
schrodinger.models.jsonable.JsonableEnum
- VeryLow = 'Very Low'¶
- Low = 'Low'¶
- Medium = 'Medium'¶
- High = 'High'¶
- VeryHigh = 'Very High'¶
- class schrodinger.protein.annotation.KinaseFeatureLabel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
schrodinger.models.jsonable.JsonableEnum
- GLYCINE_RICH_LOOP = 'Glycine Rich Loop'¶
- ALPHA_C = 'Alpha-C'¶
- GATE_KEEPER = 'Gate Keeper'¶
- HINGE = 'Hinge'¶
- LINKER = 'Linker'¶
- HRD = 'HRD'¶
- CATALYTIC_LOOP = 'Catalytic Loop'¶
- DFG = 'DFG'¶
- ACTIVATION_LOOP = 'Activation Loop'¶
- NO_FEATURE = 'No Feature'¶
- class schrodinger.protein.annotation.Consensus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
enum.Enum
- not_conserved = ' '¶
- fully_conserved = '*'¶
- strongly_conserved = ':'¶
- weakly_conserved = '.'¶
- property tooltip¶
- class schrodinger.protein.annotation.TupleWithRange(iterable=(), /)¶
Bases:
tuple
- property range¶
The range of data contained in this tuple, returned as a tuple of (minimum value or zero, whichever is less; maximum value or zero, whichever is greater). None values are ignored. If the tuple contains no non-None values, (0, 0) is returned.
- Return type
tuple(int or float, int or float)
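The behaviour described for the range property can be sketched as follows. TupleWithRangeSketch is a hypothetical stand-in written from the description alone; the real class may differ in detail.

```python
class TupleWithRangeSketch(tuple):
    """Illustrative stand-in for the documented ``range`` behaviour."""

    @property
    def range(self):
        # Ignore None entries, as the description requires.
        values = [v for v in self if v is not None]
        if not values:
            return (0, 0)
        # Zero is always included, so the minimum is never above 0
        # and the maximum is never below 0.
        return (min(0, min(values)), max(0, max(values)))


all_positive = TupleWithRangeSketch((2, None, 5)).range
mixed_sign = TupleWithRangeSketch((-1.5, 3)).range
only_none = TupleWithRangeSketch((None, None)).range
```

Including zero in both bounds keeps a plot axis anchored at the origin even when every value lies on one side of it.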
- class schrodinger.protein.annotation.AbstractSequenceAnnotations(seq)¶
Bases:
PyQt6.QtCore.QObject
A base class for single-chain and combined-chain sequence annotations
- Variables
titleChanged (QtCore.pyqtSignal) – A signal emitted after an annotation’s title (row header) changes.
- titleChanged¶
pyqtSignal(*types, name: str = …, revision: int = …, arguments: Sequence = …) -> PYQT_SIGNAL
types is normally a sequence of individual types. Each type is either a type object or a string that is the name of a C++ type. Alternatively each type could itself be a sequence of types each describing a different overloaded signal. name is the optional C++ name of the signal. If it is not specified then the name of the class attribute that is bound to the signal is used. revision is the optional revision of the signal that is exported to QML. If it is not specified then 0 is used. arguments is the optional sequence of the names of the signal’s arguments.
- __init__(seq)¶
- Parameters
seq (sequence.Sequence) – The sequence to store annotations for.
- sequence¶
A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.
Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.
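A descriptor of this kind could look roughly like the sketch below. WeakRefAttributeSketch, Target, and Holder are hypothetical names; the code is written from the description above and is not the descriptor used by the module.

```python
import weakref


class WeakRefAttributeSketch:
    """Data descriptor that stores its value as a weak reference."""

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        ref = instance.__dict__.get(self._name)
        # Dereference; yields None once the target has been collected.
        return None if ref is None else ref()

    def __set__(self, instance, value):
        # Stored under the same name as the descriptor. Because the
        # descriptor defines __set__, it still wins attribute lookup.
        instance.__dict__[self._name] = (
            None if value is None else weakref.ref(value))


class Target:
    pass


class Holder:
    target = WeakRefAttributeSketch()

    def __init__(self, target):
        self.target = target
```

Reading holder.target returns the real object rather than a proxy, which is why the value stays hashable; once the target is garbage collected, the attribute reads as None.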
- class schrodinger.protein.annotation.AbstractProteinSequenceAnnotationsMixin(*args, **kwargs)¶
Bases:
object
- domainsChanged¶
- invalidatedDomains¶
- __init__(*args, **kwargs)¶
- Parameters
seq (sequence.Sequence) – The sequence to store annotations for.
- property max_b_factor¶
- property min_b_factor¶
- invalidateMaxMinBFactor()¶
- getAntibodyCDR(col, scheme)¶
Returns the antibody CDR information of the col’th index in the sequence under a given antibody CDR numbering scheme.
- Parameters
col (int) – index into the sequence
scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use
- Returns
Antibody CDR label, start, and end positions
- Return type
AntibodyCDR, which is a named tuple of (AntibodyCDRLabel, int, int) if col is in a CDR, otherwise (AntibodyCDRLabel.NotCDR, None, None)
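The documented return convention can be illustrated with a small lookup. This is a sketch under assumptions: the enum and named tuple are local stand-ins, find_cdr is a hypothetical helper, the CDR positions are made up, and start and end are treated as inclusive indices.

```python
from collections import namedtuple
from enum import Enum


class AntibodyCDRLabel(Enum):  # stand-in with the documented members
    NotCDR = 1
    L1 = 2
    L2 = 3
    L3 = 4
    H1 = 5
    H2 = 6
    H3 = 7


AntibodyCDR = namedtuple("AntibodyCDR", ["label", "start", "end"])


def find_cdr(cdrs, col):
    """Return the CDR containing ``col`` or the documented NotCDR tuple.

    Assumes ``start`` and ``end`` are inclusive column indices.
    """
    for cdr in cdrs:
        if cdr.start <= col <= cdr.end:
            return cdr
    return AntibodyCDR(AntibodyCDRLabel.NotCDR, None, None)


# Hypothetical CDR positions for a heavy chain.
cdrs = [
    AntibodyCDR(AntibodyCDRLabel.H1, 25, 34),
    AntibodyCDR(AntibodyCDRLabel.H2, 49, 64),
]
inside = find_cdr(cdrs, 30)
outside = find_cdr(cdrs, 40)
```

Returning a NotCDR tuple instead of None lets callers always read .label without a None check.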
- getAntibodyCDRs(scheme)¶
Returns a list of antibody CDR information for the entire sequence.
- Parameters
scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use
- Returns
A list of Antibody CDR labels, starts, and end positions
- Return type
list(AntibodyCDR)
- getResID(res) str ¶
Get the structure residue ID with the Kabat numbering scheme.
- Parameters
res – the residue
- Returns
the residue ID
- getGPCRSegment(col: int) Optional[schrodinger.protein.annotation.Region] ¶
Return the GPCR segment information of the col’th index in the sequence.
- Parameters
col – index into the sequence
- Returns
GPCR Segment label, value, start, and end positions or None if col is not in a GPCR segment
- getGPCRSegments() List[schrodinger.protein.annotation.Region] ¶
Return a list of GPCR segment information for the entire sequence.
- Returns
a list of GPCR Segments labels, values, start and end positions
- getAntibodyRegion(col: int, scheme: schrodinger.infra.util.AntibodyCDRScheme) Optional[schrodinger.protein.annotation.Region] ¶
Return the antibody region of the given residue based on the numbering scheme.
The regex will strip trailing numbers and get the label according to the AntibodyRegionLabel enum.
Example values: H1, CL, HINGE, LFR4, CH3
- Parameters
col – index into the sequence
scheme – the antibody CDR numbering scheme to use
- Returns
A Region giving the antibody region’s label, value, start, and end positions, or None if col is not in an antibody region
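The digit-stripping step described above can be sketched as follows. The enum is a local stand-in with the documented members, region_label is a hypothetical helper, and the case-insensitive match (so that "HINGE" resolves to Hinge) is an assumption of this sketch rather than documented behaviour.

```python
import re
from enum import Enum


class AntibodyRegionLabel(Enum):  # stand-in with the documented members
    H = 1
    HFR = 2
    CH = 3
    L = 4
    LFR = 5
    CL = 6
    Hinge = 7


# Case-insensitive lookup so that "HINGE" resolves to the Hinge member;
# this matching rule is an assumption of the sketch.
_LABELS_BY_NAME = {label.name.upper(): label for label in AntibodyRegionLabel}


def region_label(value):
    """Strip trailing digits from ``value`` and look up the label."""
    stem = re.sub(r"\d+$", "", value)
    return _LABELS_BY_NAME[stem.upper()]


labels = [region_label(v) for v in ("H1", "CL", "HINGE", "LFR4", "CH3")]
```

Each of the five example values maps to one enum member, so several numbered regions (for example LFR1 to LFR4) share a single label.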
- getAntibodyRegions(scheme: schrodinger.infra.util.AntibodyCDRScheme) List[schrodinger.protein.annotation.Region] ¶
Return the list of all antibody regions based on the numbering scheme.
- Parameters
scheme – the antibody CDR numbering scheme to use
- Returns
a list of antibody region labels, values, start and end positions
- getTCRRegion(col: int) Optional[schrodinger.protein.annotation.Region] ¶
Return the TCR region information of the col’th index in the sequence.
- Parameters
col – index into the sequence
- Returns
TCR Region label, value, start, and end positions or None if col is not in a TCR Region
- getTCRRegions() List[schrodinger.protein.annotation.Region] ¶
Return a list of TCR region information for the entire sequence.
- Returns
a list of TCR region labels, values, start and end positions
- isAntibodyChain()¶
- Returns
Whether the sequence described is an antibody chain
- Return type
bool
- isAntibodyHeavyChain()¶
- Returns
Whether the sequence described is an antibody heavy chain
- Return type
bool
- isAntibodyLightChain()¶
- Returns
Whether the sequence described is an antibody light chain
- Return type
bool
- property binding_sites¶
- property ligands¶
- property ligand_asls¶
- setLigandDistance(distance)¶
Updates the ligand distance and invalidates the cache
- property domains¶
- getSSBondPartner(index)¶
Return the residue’s intra-sequence disulfide bond partner, if any.
If the residue is not involved in a disulfide bond, its partner has been deleted, or its partner is in another sequence, it will return None.
- Parameters
index (int) – Index of the residue to check
- Returns
the other Residue in the disulfide bond or None
- Return type
residue.Residue or None
- clearAllCaching()¶
- getNumAnnValues(ann)¶
- class schrodinger.protein.annotation.SequenceAnnotations(seq)¶
Bases:
schrodinger.protein.annotation.AbstractSequenceAnnotations
Knows how to annotate a single-chain sequence
Annotations can be set for the sequence as a whole or per sequence element. When an attribute is accessed on the SequenceAnnotations object, it is first looked up on the object itself; if it is not found there, it is assumed to be a per-element annotation. If the elements in the sequence lack the attribute, an AttributeError is raised.
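The lookup order described above resembles Python's __getattr__ fallback. The classes below are hypothetical stand-ins written to illustrate that pattern; they are not the module's implementation, and returning a list of per-element values is an assumption.

```python
class ElementSketch:
    """Hypothetical sequence element carrying one annotation."""

    def __init__(self, hydrophobicity):
        self.hydrophobicity = hydrophobicity


class SequenceAnnotationsSketch:
    def __init__(self, elements, title):
        self._elements = list(elements)
        self.title = title  # sequence-level annotation

    def __getattr__(self, name):
        # Called only when normal lookup on the object fails, so
        # sequence-level attributes take priority.
        if name.startswith("_"):
            raise AttributeError(name)
        # getattr raises AttributeError if an element lacks the attribute.
        return [getattr(elem, name) for elem in self._elements]


ann = SequenceAnnotationsSketch(
    [ElementSketch(1.8), ElementSketch(-3.5)], title="seq1")
per_element = ann.hydrophobicity
```

Here ann.title is found on the object itself, while ann.hydrophobicity falls through to the elements.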
- class schrodinger.protein.annotation.ProteinSequenceAnnotations(seq)¶
Bases:
schrodinger.protein.annotation.AbstractProteinSequenceAnnotationsMixin, schrodinger.protein.annotation.SequenceAnnotations
Knows how to annotate a ProteinSequence
- annotationInvalidated¶
- invalidatedLigandContacts¶
- invalidatedMaxMinBFactor¶
- class ANNOTATION_TYPES(*args, **kwargs)¶
Bases:
schrodinger.models.json.JsonableClassMixin
- alignment_set = 2¶
- antibody_cdr = 21¶
- antibody_regions = 35¶
- b_factor = 15¶
- beta_strand_propensity = 7¶
- binding_sites = 19¶
- custom_annotation = 34¶
- disulfide_bonds = 5¶
- domains = 20¶
- exposure_tendency = 10¶
- classmethod fromJsonImplementation(json_obj)¶
Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.
- Parameters
json_dict (dict) – A dictionary loaded from a JSON string or file.
- Returns
An instance of the derived class.
- Return type
cls
- gpcr_generic_number = 33¶
- gpcr_segment = 32¶
- helix_propensity = 6¶
- helix_termination_tendency = 9¶
- hydrophobicity = 13¶
- isoelectric_point = 14¶
- kinase_conservation = 31¶
- kinase_features = 30¶
- pairwise_constraints = 1¶
- pfam = 23¶
- pred_accessibility = 26¶
- pred_disordered = 27¶
- pred_disulfide_bonds = 24¶
- pred_domain_arr = 28¶
- pred_secondary_structure = 25¶
- proximity_constraints = 29¶
- rescode = 4¶
- resnum = 3¶
- sasa = 22¶
- secondary_structure = 18¶
- side_chain_chem = 12¶
- steric_group = 11¶
- tcr_regions = 36¶
- toJsonImplementation()¶
Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.
- Returns
A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders
- turn_propensity = 8¶
- window_hydrophobicity = 16¶
- window_isoelectric_point = 17¶
- RES_PROPENSITY_ANNOTATIONS = {<ANNOTATION_TYPES.steric_group: 11>, <ANNOTATION_TYPES.side_chain_chem: 12>, <ANNOTATION_TYPES.turn_propensity: 8>, <ANNOTATION_TYPES.helix_propensity: 6>, <ANNOTATION_TYPES.helix_termination_tendency: 9>, <ANNOTATION_TYPES.exposure_tendency: 10>, <ANNOTATION_TYPES.beta_strand_propensity: 7>}¶
- PRED_ANNOTATION_TYPES = {<ANNOTATION_TYPES.pred_disulfide_bonds: 24>, <ANNOTATION_TYPES.pred_accessibility: 26>, <ANNOTATION_TYPES.pred_secondary_structure: 25>, <ANNOTATION_TYPES.pred_disordered: 27>, <ANNOTATION_TYPES.pred_domain_arr: 28>}¶
- __init__(seq)¶
- Parameters
seq (sequence.Sequence) – The sequence to store annotations for.
- invalidateMaxMinBFactor()¶
- property window_hydrophobicity¶
- property hydrophobicity_window_padding¶
- property binding_site_residues¶
Binding site residues of the sequence, as a map from ligand name (str) to the set of binding site residues (protein.Residue).
- property isoelectric_point_window_padding¶
- invalidateWindowHydrophobicity()¶
Invalidate the cached window hydrophobicity data. Note that this method is also called from the sequence when the window size changes.
- property window_isoelectric_point¶
- invalidateWindowIsoelectricPoint()¶
Invalidate the cached window isoelectric point data. Note that this method is also called from the sequence when the window size changes.
- property sasa¶
- getAntibodyCDR(col, scheme)¶
Returns the antibody CDR information of the col’th index in the sequence under a given antibody CDR numbering scheme.
- Parameters
col (int) – index into the sequence
scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use
- Returns
Antibody CDR label, start, and end positions
- Return type
AntibodyCDR, which is a named tuple of (AntibodyCDRLabel, int, int) if col is in a CDR, otherwise (AntibodyCDRLabel.NotCDR, None, None)
- getAntibodyCDRs(scheme)¶
Returns a list of antibody CDR information for the entire sequence.
- Parameters
scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use
- Returns
A list of Antibody CDR labels, starts, and end positions
- Return type
list(AntibodyCDR)
- getGPCRSegment(col: int) Optional[schrodinger.protein.annotation.Region] ¶
Return the GPCR segment information of the col’th index in the sequence.
- Parameters
col – index into the sequence
- Returns
GPCR Segment label, value, start, and end positions or None if col is not in a GPCR segment
- getGPCRSegments() List[schrodinger.protein.annotation.Region] ¶
Return a list of GPCR segment information for the entire sequence.
- Returns
a list of GPCR Segments labels, values, start and end positions
- getTCRRegion(col: int) Optional[schrodinger.protein.annotation.Region] ¶
Return the TCR region information of the col’th index in the sequence.
- Parameters
col – index into the sequence
- Returns
TCR Region label, value, start, and end positions or None if col is not in a TCR Region
- getTCRRegions() List[schrodinger.protein.annotation.Region] ¶
Return a list of TCR region information for the entire sequence.
- Returns
a list of TCR region labels, values, start and end positions
- isAntibodyChain()¶
- Returns
Whether the sequence described is an antibody chain
- Return type
bool
- getAntibodyRegion(col: int, scheme: schrodinger.infra.util.AntibodyCDRScheme)¶
Return the antibody region of the given residue based on the numbering scheme.
The regex will strip trailing numbers and get the label according to the AntibodyRegionLabel enum.
Example values: H1, CL, HINGE, LFR4, CH3
- Parameters
col – index into the sequence
scheme – the antibody CDR numbering scheme to use
- Returns
A Region giving the antibody region’s label, value, start, and end positions, or None if col is not in an antibody region
- getAntibodyRegions(scheme: schrodinger.infra.util.AntibodyCDRScheme)¶
Return the list of all antibody regions based on the numbering scheme.
- Parameters
scheme – the antibody CDR numbering scheme to use
- Returns
a list of antibody region labels, values, start and end positions
- isAntibodyHeavyChain()¶
- Returns
Whether the sequence described is an antibody heavy chain
- Return type
bool
- isAntibodyLightChain()¶
- Returns
Whether the sequence described is an antibody light chain
- Return type
bool
- getSparseRescodes(modulo)¶
- onStructureChanged()¶
- setLigandDistance(distance)¶
Updates the ligand distance and invalidates the cache
- parseDomains(filename)¶
Parse XML file from UniProt database to get domain information.
- Parameters
filename (str) – the XML file to parse for domain information
- Returns
a list of the domains (names) for the sequence in order
- Return type
list(str)
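Extracting domain names from UniProt XML might look like the sketch below. It is not the method's implementation: domain_names is a hypothetical helper that takes XML text rather than a filename, the sample entry is made up, and the element layout (feature elements with type="domain", a description attribute, and a location/begin position) together with the namespace are assumptions about the UniProt XML format.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for UniProt XML entries.
NS = {"up": "http://uniprot.org/uniprot"}

# Made-up entry, used only for illustration.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <feature type="domain" description="SH2">
      <location><begin position="150"/><end position="240"/></location>
    </feature>
    <feature type="domain" description="SH3">
      <location><begin position="80"/><end position="140"/></location>
    </feature>
    <feature type="chain" description="Example chain">
      <location><begin position="1"/><end position="500"/></location>
    </feature>
  </entry>
</uniprot>"""


def domain_names(xml_text):
    """Return domain descriptions ordered by start position."""
    root = ET.fromstring(xml_text)
    found = []
    for feature in root.iterfind(".//up:feature[@type='domain']", NS):
        begin = feature.find("up:location/up:begin", NS)
        if begin is None:
            continue  # skip features without a simple begin position
        found.append((int(begin.get("position")), feature.get("description")))
    return [name for _, name in sorted(found)]


names = domain_names(SAMPLE)
```

Sorting by start position is one reading of "in order"; the real method may instead return one entry per residue.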
- resetAnnotation(ann)¶
Force a reset of an annotation’s cache.
- clearAllCaching()¶
- property inscode¶
- property resnum¶
- class schrodinger.protein.annotation.NucleicAcidSequenceAnnotations(seq)¶
Bases:
schrodinger.protein.annotation.ProteinSequenceAnnotations
- isAntibodyChain()¶
- Returns
Whether the sequence described is an antibody chain
- Return type
bool
- class schrodinger.protein.annotation.ProteinAlignmentAnnotations(aln)¶
Bases:
object
Knows how to annotate an alignment (a collection of aligned sequences)
- class ANNOTATION_TYPES(*args, **kwargs)¶
Bases:
schrodinger.models.json.JsonableClassMixin
- consensus_freq = 6¶
- consensus_seq = 5¶
- consensus_symbols = 4¶
- classmethod fromJsonImplementation(json_obj)¶
Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.
- Parameters
json_dict (dict) – A dictionary loaded from a JSON string or file.
- Returns
An instance of the derived class.
- Return type
cls
- indices = 1¶
- mean_hydrophobicity = 2¶
- mean_isoelectric_point = 3¶
- sequence_logo = 7¶
- toJsonImplementation()¶
Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.
- Returns
A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders
- __init__(aln)¶
- Parameters
aln (alignment.Alignment) – The alignment to annotate
- alignment¶
A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.
Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.
- property indices¶
A numbering of all the column indices in an alignment
- property mean_hydrophobicity¶
- Returns
A list of floats representing per-column averages of the hydrophobicity of residues in the alignment
- property mean_isoelectric_point¶
- Returns
A list of floats representing per-column averages of the isoelectric point of residues in the alignment
- property consensus_seq¶
Consensus sequence in the alignment. If several residues tie for the highest frequency in a column, all of them are kept.
- Returns
consensus sequence
- Return type
list(list(Residue))
- property consensus_freq¶
Returns the frequency of the consensus residue in each alignment column as a list. Gaps are not used for calculation.
- Returns
consensus residue frequencies
- Return type
TupleWithRange(float)
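The consensus residues and their frequencies can be sketched on plain strings. This is an illustrative stand-in, not the module's code: consensus_columns is a hypothetical helper, "-" is assumed to be the gap character, and excluding gaps from the denominator is one interpretation of "gaps are not used for calculation".

```python
from collections import Counter

GAP = "-"  # gap character assumed by this sketch


def consensus_columns(rows):
    """Return (consensus residues, consensus frequency) per column."""
    residues, freqs = [], []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != GAP)
        if not counts:
            # Column made up entirely of gaps.
            residues.append([])
            freqs.append(0.0)
            continue
        top = max(counts.values())
        # Keep every residue tied for the highest count.
        residues.append(sorted(r for r, n in counts.items() if n == top))
        # Gaps are excluded from the denominator (an assumption).
        freqs.append(top / sum(counts.values()))
    return residues, freqs


# Hypothetical four-sequence alignment.
rows = ["ACD-", "ACE-", "AGE-", "-GD-"]
residues, freqs = consensus_columns(rows)
```

In this sample the first column has a single consensus residue, while the second and third columns each keep two tied residues.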
- property consensus_symbols¶
Consensus symbols in the alignment based on pre-defined residue sets, same as in ClustalW
- Returns
consensus symbols for each alignment position
- Return type
A list of Consensus enum members.
- property sequence_logo¶
Calculates normalized frequencies of individual amino acids per alignment position, and overall estimate of column composition conservation (‘bits’). Bit values are weighted by the number of gaps in the column.
Schneider TD, Stephens RM (1990). “Sequence Logos: A New Way to Display Consensus Sequences”. Nucleic Acids Res 18 (20): 6097–6100. doi:10.1093/nar/18.20.6097
- Returns
the list of bits and frequencies (in decreasing order) of the residues in each column of the alignment.
- Return type
list(tuple(float, tuple(tuple(str, float))))
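A per-column calculation in the spirit of Schneider and Stephens can be sketched as below. The assumptions are labeled in the code: a 20-letter alphabet, no small-sample correction, "-" as the gap character, and bits scaled by the fraction of non-gap residues as one possible reading of the gap weighting. logo_column is a hypothetical helper, not the module's implementation.

```python
import math
from collections import Counter

GAP = "-"  # gap character assumed by this sketch
ALPHABET_SIZE = 20  # standard amino acids


def logo_column(column):
    """Return (bits, ((residue, frequency), ...)) for one column."""
    counts = Counter(c for c in column if c != GAP)
    total = sum(counts.values())
    if total == 0:
        return 0.0, ()
    freqs = {res: n / total for res, n in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Information content without the small-sample correction,
    # scaled by the non-gap fraction (assumed weighting).
    bits = (math.log2(ALPHABET_SIZE) - entropy) * total / len(column)
    # Decreasing frequency; ties are broken alphabetically.
    ordered = tuple(
        sorted(freqs.items(), key=lambda item: (-item[1], item[0])))
    return bits, ordered


conserved_bits, conserved_freqs = logo_column("AAAA")
split_bits, split_freqs = logo_column("AAGG")
gapped_bits, _ = logo_column("A-")
```

A fully conserved column reaches the maximum of log2(20) bits, and a column split evenly between two residues loses exactly one bit.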
- clearAllCaching()¶
- class schrodinger.protein.annotation.AbstractRegionFinder(seq)¶
Bases:
object
Abstract class to help with finding annotated regions from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.
- VALUE_MAP = None¶
- __init__(seq)¶
- Parameters
seq (schrodinger.protein.sequence.ProteinSequence) – The sequence to find the regions on
- seq¶
A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.
Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.
- forceIndexReassignment()¶
Force a recalculation of the region start and end indices. This is required when gaps are inserted/removed.
This will always do a full recalculation, but is here to match _AntibodyCDRFinder’s API.
- class schrodinger.protein.annotation.AntibodyRegionsFinder(*args, **kwargs)¶
Bases:
schrodinger.protein.annotation.AbstractRegionFinder
Class to help with finding Antibody Regions from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.
- VALUE_MAP = {'H1': AntibodyRegionLabel.H, 'H2': AntibodyRegionLabel.H, 'H3': AntibodyRegionLabel.H, 'HFR1': AntibodyRegionLabel.HFR, 'HFR2': AntibodyRegionLabel.HFR, 'HFR3': AntibodyRegionLabel.HFR, 'HFR4': AntibodyRegionLabel.HFR, 'L1': AntibodyRegionLabel.L, 'L2': AntibodyRegionLabel.L, 'L3': AntibodyRegionLabel.L, 'LFR1': AntibodyRegionLabel.LFR, 'LFR2': AntibodyRegionLabel.LFR, 'LFR3': AntibodyRegionLabel.LFR, 'LFR4': AntibodyRegionLabel.LFR}¶
- __init__(*args, **kwargs)¶
- Parameters
seq (schrodinger.protein.sequence.ProteinSequence) – The sequence to find the regions on
- getAntibodyRegions(scheme)¶
- class schrodinger.protein.annotation.TCRRegionFinder(seq)¶
Bases:
schrodinger.protein.annotation.AbstractRegionFinder
Class to help with finding TCR Regions from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.
- VALUE_MAP = {'A1': TCRRegionLabel.A, 'A2': TCRRegionLabel.A, 'A3': TCRRegionLabel.A, 'AFR1': TCRRegionLabel.AFR, 'AFR2': TCRRegionLabel.AFR, 'AFR3': TCRRegionLabel.AFR, 'AFR4': TCRRegionLabel.AFR, 'B1': TCRRegionLabel.B, 'B2': TCRRegionLabel.B, 'B3': TCRRegionLabel.B, 'BFR1': TCRRegionLabel.BFR, 'BFR2': TCRRegionLabel.BFR, 'BFR3': TCRRegionLabel.BFR, 'BFR4': TCRRegionLabel.BFR}¶
- getTCRRegions()¶
- class schrodinger.protein.annotation.GPCRSegmentFinder(seq)¶
Bases:
schrodinger.protein.annotation.AbstractRegionFinder
Class to help with finding GPCR Segments from the sequence. Values should be cached to reduce load on multiple requests from the table helpers and annotation requests.
- VALUE_MAP = {'C-term': GPCRSegmentLabel.CTerm, 'ECL1': GPCRSegmentLabel.ECL, 'ECL2': GPCRSegmentLabel.ECL, 'ECL3': GPCRSegmentLabel.ECL, 'H8': GPCRSegmentLabel.H8, 'ICL1': GPCRSegmentLabel.ICL, 'ICL2': GPCRSegmentLabel.ICL, 'ICL3': GPCRSegmentLabel.ICL, 'N-term': GPCRSegmentLabel.NTerm, 'TM1': GPCRSegmentLabel.TM, 'TM2': GPCRSegmentLabel.TM, 'TM3': GPCRSegmentLabel.TM, 'TM4': GPCRSegmentLabel.TM, 'TM5': GPCRSegmentLabel.TM, 'TM6': GPCRSegmentLabel.TM, 'TM7': GPCRSegmentLabel.TM, 'TM8': GPCRSegmentLabel.TM}¶
- getGPCRs()¶
- class schrodinger.protein.annotation.SeqTypeMixin(seq, *args, **kwargs)¶
Bases:
object
Mixin to customize antibody.SeqType for MSV2. See _delayed_antibody_import for class declaration.
- __init__(seq, *args, **kwargs)¶
- isHeavyChain()¶
- isLightChain()¶
- class schrodinger.protein.annotation.CombinedChainSequenceAnnotationMeta(cls, bases, classdict, *, wraps=None, cached_annotations=(), wrapped_properties=())¶
Bases:
schrodinger.application.msv.utils.QtDocstringWrapperMetaClass
The metaclass for CombinedChainSequenceAnnotations. This metaclass automatically wraps getters for all sequence annotations.
- class schrodinger.protein.annotation.CombinedChainProteinSequenceAnnotations(seq)¶
Bases:
schrodinger.protein.annotation.AbstractProteinSequenceAnnotationsMixin, schrodinger.protein.annotation.AbstractSequenceAnnotations
Sequence annotations for a sequence.CombinedChainProteinSequence. Annotations will be fetched from the ProteinSequenceAnnotations objects for each split-chain sequence.
- sequence¶
A descriptor for an instance attribute that should be stored as a weakref. Unlike weakref.proxy, this descriptor allows the attribute to be hashed.
Note that the weakref is stored on the instance using the same name as the descriptor (which is stored on the class). Since this descriptor implements __set__, it will always take precedence over the value stored on the instance.
- __init__(seq)¶
- Parameters
seq (sequence.CombinedChainProteinSequence) – The sequence to store annotations for.
- chainAdded(chain)¶
Respond to a new chain being added to the sequence. The sequence is responsible for calling this method whenever a chain is added.
- Parameters
chain (sequence.ProteinSequence) – The newly added chain.
- chainRemoved(chain)¶
Respond to a chain being removed from the sequence. The sequence is responsible for calling this method whenever a chain is removed.
- Parameters
chain (sequence.ProteinSequence) – The removed chain.
- class ANNOTATION_TYPES(*args, **kwargs)¶
Bases:
schrodinger.models.json.JsonableClassMixin
- alignment_set = 2¶
- antibody_cdr = 21¶
- antibody_regions = 35¶
- b_factor = 15¶
- beta_strand_propensity = 7¶
- binding_sites = 19¶
- custom_annotation = 34¶
- disulfide_bonds = 5¶
- domains = 20¶
- exposure_tendency = 10¶
- classmethod fromJsonImplementation(json_obj)¶
Abstract method that must be defined by all derived classes. Takes in a dictionary and constructs an instance of the derived class.
- Parameters
json_dict (dict) – A dictionary loaded from a JSON string or file.
- Returns
An instance of the derived class.
- Return type
cls
- gpcr_generic_number = 33¶
- gpcr_segment = 32¶
- helix_propensity = 6¶
- helix_termination_tendency = 9¶
- hydrophobicity = 13¶
- isoelectric_point = 14¶
- kinase_conservation = 31¶
- kinase_features = 30¶
- pairwise_constraints = 1¶
- pfam = 23¶
- pred_accessibility = 26¶
- pred_disordered = 27¶
- pred_disulfide_bonds = 24¶
- pred_domain_arr = 28¶
- pred_secondary_structure = 25¶
- proximity_constraints = 29¶
- rescode = 4¶
- resnum = 3¶
- sasa = 22¶
- secondary_structure = 18¶
- side_chain_chem = 12¶
- steric_group = 11¶
- tcr_regions = 36¶
- toJsonImplementation()¶
Abstract method that must be defined by all derived classes. Converts an instance of the derived class into a jsonifiable object.
- Returns
A dict made up of JSON native datatypes or Jsonable objects. See the link below for a table of such types. https://docs.python.org/2/library/json.html#encoders-and-decoders
- turn_propensity = 8¶
- window_hydrophobicity = 16¶
- window_isoelectric_point = 17¶
- PRED_ANNOTATION_TYPES = {<ANNOTATION_TYPES.pred_disulfide_bonds: 24>, <ANNOTATION_TYPES.pred_accessibility: 26>, <ANNOTATION_TYPES.pred_secondary_structure: 25>, <ANNOTATION_TYPES.pred_disordered: 27>, <ANNOTATION_TYPES.pred_domain_arr: 28>}¶
- RES_PROPENSITY_ANNOTATIONS = {<ANNOTATION_TYPES.steric_group: 11>, <ANNOTATION_TYPES.side_chain_chem: 12>, <ANNOTATION_TYPES.turn_propensity: 8>, <ANNOTATION_TYPES.helix_propensity: 6>, <ANNOTATION_TYPES.helix_termination_tendency: 9>, <ANNOTATION_TYPES.exposure_tendency: 10>, <ANNOTATION_TYPES.beta_strand_propensity: 7>}¶
- property alignment_set¶
- property antibody_cdr¶
- property antibody_regions¶
- property b_factor¶
- property beta_strand_propensity¶
- property custom_annotation¶
- property disulfide_bonds¶
- property domains¶
- property exposure_tendency¶
- property gpcr_generic_number¶
- property gpcr_segment¶
- property helix_propensity¶
- property helix_termination_tendency¶
- property hydrophobicity¶
- property hydrophobicity_window_padding¶
- property isoelectric_point¶
- property isoelectric_point_window_padding¶
- property kinase_conservation¶
- property kinase_features¶
- property pairwise_constraints¶
- property pfam¶
- property pred_accessibility¶
- property pred_disordered¶
- property pred_disulfide_bonds¶
- property pred_domain_arr¶
- property pred_secondary_structure¶
- property proximity_constraints¶
- property rescode¶
- property resnum¶
- property sasa¶
- property secondary_structure¶
- property side_chain_chem¶
- property steric_group¶
- property tcr_regions¶
- property turn_propensity¶
- property window_hydrophobicity¶
- property window_isoelectric_point¶
- getAntibodyCDR(col, scheme)¶
Returns the antibody CDR information of the col’th index in the sequence under a given antibody CDR numbering scheme.
- Parameters
col (int) – index into the sequence
scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use
- Returns
Antibody CDR label, start, and end positions
- Return type
AntibodyCDR, which is a named tuple of (AntibodyCDRLabel, int, int) if col is in a CDR, otherwise (AntibodyCDRLabel.NotCDR, None, None)
- getAntibodyCDRs(scheme)¶
Returns a list of antibody CDR information for the entire sequence.
- Parameters
scheme (AntibodyCDRScheme) – The antibody CDR numbering scheme to use
- Returns
A list of Antibody CDR labels, starts, and end positions
- Return type
list(AntibodyCDR)
- getGPCRSegment(col: int) Optional[schrodinger.protein.annotation.Region] ¶
Return the GPCR segment information of the col’th index in the sequence.
- Parameters
col – index into the sequence
- Returns
GPCR Segment label, value, start, and end positions or None if col is not in a GPCR segment
- getGPCRSegments() List[schrodinger.protein.annotation.Region] ¶
Return a list of GPCR segment information for the entire sequence.
- Returns
a list of GPCR Segments labels, values, start and end positions
- getTCRRegion(col: int) Optional[schrodinger.protein.annotation.Region] ¶
Return the TCR region information of the col’th index in the sequence.
- Parameters
col – index into the sequence
- Returns
TCR Region label, value, start, and end positions or None if col is not in a TCR Region
- getTCRRegions() List[schrodinger.protein.annotation.Region] ¶
Return a list of TCR region information for the entire sequence.
- Returns
a list of TCR region labels, values, start and end positions
- getAntibodyRegion(col: int, scheme) Optional[schrodinger.protein.annotation.Region] ¶
Return the antibody region of the given residue based on the numbering scheme.
The regex will strip trailing numbers and get the label according to the AntibodyRegionLabel enum.
Example values: H1, CL, HINGE, LFR4, CH3
- Parameters
col – index into the sequence
scheme – the antibody CDR numbering scheme to use
- Returns
A Region giving the antibody region’s label, value, start, and end positions, or None if col is not in an antibody region
- getAntibodyRegions(scheme) List[schrodinger.protein.annotation.Region] ¶
Return the list of all antibody regions based on the numbering scheme.
- Parameters
scheme – the antibody CDR numbering scheme to use
- Returns
a list of antibody region labels, values, start and end positions
- isAntibodyChain()¶
- Returns
Whether the sequence described is an antibody chain
- Return type
bool
- setLigandDistance(distance)¶
Updates the ligand distance and invalidates the cache
- clearAllCaching()¶
- schrodinger.protein.annotation.make_ligand_name_atom(ct, atom_index)¶
Make a unique, human-readable name for a ligand identified by atom index.
- Parameters
ct (schrodinger.structure.Structure) – Structure the ligand belongs to
atom_index (int) – the atom index of the ligand to make a name for
- Returns
The name for the ligand
- Return type
str
- schrodinger.protein.annotation.make_ligand_name(ct, ligand)¶
Make a unique, human-readable name for a ligand. This name matches the ligand name in the structure hierarchy.
- Parameters
ct (schrodinger.structure.Structure) – Structure the ligand belongs to
ligand (schrodinger.structutils.analyze.Ligand) – the ligand to make a name for
- Returns
The name for the ligand
- Return type
str
- schrodinger.protein.annotation.parse_antibody_rescode(newcode)¶
Extract the resnum and inscode from a residue number as per the scheme. If the inscode is a number, it is converted to a letter, e.g. ‘H101.1’ -> ‘101A’. Residues that fall outside the numbering scheme catalog (FV), or that cannot be assigned properly, are given the residue number ‘-1’, e.g. ‘H-1’.
- Parameters
newcode (str) – Residue code by the Antibody CDR numbering scheme.
- Returns
new residue number and insertion code.
- Return type
tuple
- Raises
KeyError – if newcode doesn’t follow the expected pattern.
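The conversion described above can be sketched with a regular expression. parse_rescode_sketch is a hypothetical stand-in, not the module's function; the accepted pattern, the (int, str) return shape, and the empty string for a missing insertion code are all assumptions of this sketch.

```python
import re
import string

# One chain letter, a (possibly negative) number, then an optional
# insertion code given either as ".<digits>" or as a single letter.
_PATTERN = re.compile(r"^[A-Za-z](-?\d+)(?:\.(\d+)|([A-Za-z]))?$")


def parse_rescode_sketch(newcode):
    """Split a code such as 'H101.1' into (resnum, inscode)."""
    match = _PATTERN.match(newcode)
    if match is None:
        raise KeyError(newcode)
    resnum, numeric_ins, letter_ins = match.groups()
    if numeric_ins is not None:
        index = int(numeric_ins)
        if not 1 <= index <= 26:
            raise KeyError(newcode)
        # 1 -> 'A', 2 -> 'B', and so on.
        inscode = string.ascii_uppercase[index - 1]
    else:
        inscode = letter_ins or ""
    return int(resnum), inscode


with_insertion = parse_rescode_sketch("H101.1")
unassigned = parse_rescode_sketch("H-1")
```

With these assumptions, 'H101.1' yields residue number 101 with insertion code 'A', and 'H-1' yields residue number -1 with no insertion code.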