schrodinger.application.bioluminate.classify module
- schrodinger.application.bioluminate.classify.renumber_st_chain(chain: Structure, scheme: AntibodyCDRScheme = AntibodyCDRScheme.Kabat, resid_map: dict[str, str] = None) → dict[str, str]
Renumber the residues in a structure chain. If a resid_map (a mapping from the original residue ID to the current residue ID) is provided, its resids are renumbered as well.
- Parameters:
chain – The structure chain to renumber
scheme – The numbering scheme to use
resid_map – Mapping original residue ID to current residue ID
- Returns:
The updated resid_map
- schrodinger.application.bioluminate.classify.update_resid_map(chain: Structure, resid_map: dict[str, str] = None) → dict[str, str]
Update the resid_map with the current residue IDs from the chain. If no resid_map is provided, a new one will be created.
- Parameters:
chain – The structure chain to update
resid_map – A mapping from the original residue ID to the current residue ID; if not provided, a new mapping will be created
- Returns:
The updated resid_map
- schrodinger.application.bioluminate.classify.apply_numbering(chain: Structure, numbered_sequence: NumberedSequence, resid_map: dict[str, str] = None) → dict[str, str]
Apply the numbering to the chain residues and to the resid_map.
- Parameters:
chain – The structure chain to renumber
numbered_sequence – The sequence numbering to apply
resid_map – A mapping from the original residue ID to the current residue ID; the mapping will be updated and returned
- Returns:
The updated resid_map
- schrodinger.application.bioluminate.classify.get_sequence_numbering(sequence, ab_scheme=AntibodyCDRScheme.Kabat, chain_name=None, classifiers=(<class 'schrodinger.application.bioluminate.anarci_classifier.AnarciClassifier'>, )) → NumberedSequence
Get the sequence numbering for a sequence.
- schrodinger.application.bioluminate.classify.get_protein_family_data(sequence: str, ab_scheme=AntibodyCDRScheme.Kabat, chain_name: str = None, classifiers=(<class 'schrodinger.application.bioluminate.anarci_classifier.AnarciClassifier'>, )) → ProteinClass
Get the protein family data for a sequence.
- schrodinger.application.bioluminate.classify.get_region_lengths_from_bounds(region_bounds: dict[str, tuple[int, int]]) → dict[str, int]
Calculate the lengths of the regions based on their bounds.
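As a rough illustration of this calculation, the sketch below computes region lengths in plain Python. It is not the module's implementation: the helper name `region_lengths_from_bounds` and the region names are hypothetical, and it assumes 0-based half-open `(start, end)` bounds, which the documentation does not specify.

```python
def region_lengths_from_bounds(
    region_bounds: dict[str, tuple[int, int]],
) -> dict[str, int]:
    """Hypothetical stand-in: length of each region from its bounds.

    Assumes half-open bounds, so the length is end - start. With
    inclusive bounds the length would be end - start + 1 instead.
    """
    return {name: end - start for name, (start, end) in region_bounds.items()}


# Illustrative region names and bounds (not taken from the library).
example_bounds = {"FR1": (0, 25), "CDR1": (25, 35), "FR2": (35, 49)}
example_lengths = region_lengths_from_bounds(example_bounds)
```

If the real function uses inclusive bounds, each length would be one larger than in this sketch.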
- schrodinger.application.bioluminate.classify.validate_sequence_for_format(sequence: str, ab_format: AntibodyFormat, ab_scheme=AntibodyCDRScheme.Kabat) → None
Check if the sequence contains the necessary regions for the specified antibody format.
- Parameters:
sequence – The full sequence to check
ab_format – The antibody format specifying the required regions
ab_scheme – The antibody numbering scheme to use for determining the regions; defaults to DEFAULT_ANTIBODY_SCHEME (shown as AntibodyCDRScheme.Kabat in the signature)
- Raises:
ValueError – If the sequence is not valid for the specified antibody format
- schrodinger.application.bioluminate.classify.get_sequence_for_format(sequence: str, ab_format: AntibodyFormat, ab_scheme=AntibodyCDRScheme.Kabat) → str
Get the sequence segment corresponding to the specified antibody format.
- Parameters:
sequence – The full sequence to extract the segment from
ab_format – The antibody format specifying the segment to extract
ab_scheme – The antibody numbering scheme to use for determining the regions; defaults to DEFAULT_ANTIBODY_SCHEME (shown as AntibodyCDRScheme.Kabat in the signature)
- schrodinger.application.bioluminate.classify.get_regions_for_format(ab_format: AntibodyFormat, anarci_type: str) → list[str]
Get the region names corresponding to the specified antibody format and ANARCI type.
- Parameters:
ab_format – The antibody format whose regions are requested
anarci_type – The ANARCI type of the sequence (e.g., ‘VH’, ‘VL’)
- Returns:
A list of region names corresponding to the specified format and ANARCI type
- schrodinger.application.bioluminate.classify.trim_sequence_to_regions(protein_data: ProteinClass, regions: tuple[str]) → str
Get the sequence segment corresponding to the specified regions.
Checks that all specified regions are present in the protein data and that the regions are contiguous in the sequence, meaning there are no gaps between them.
- Parameters:
protein_data – The ProteinClass instance containing the sequence and region boundaries
regions – A tuple of region names to extract from the sequence
- Returns:
The concatenated sequence segments corresponding to the specified regions
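The presence and contiguity checks described for this function can be sketched in plain Python as follows. This is a minimal illustration, not the library's code: `trim_to_regions` is a hypothetical helper that takes a plain sequence string and a bounds dictionary in place of a ProteinClass instance, assumes half-open `(start, end)` bounds, and raises ValueError on failure, none of which is confirmed by the documentation.

```python
def trim_to_regions(
    sequence: str,
    region_bounds: dict[str, tuple[int, int]],
    regions: tuple[str, ...],
) -> str:
    """Hypothetical sketch: extract the segment covering `regions`."""
    # Check 1: every requested region must be present.
    missing = [name for name in regions if name not in region_bounds]
    if missing:
        raise ValueError(f"Regions not found: {missing}")

    # Order the requested regions by their position in the sequence.
    bounds = sorted(region_bounds[name] for name in regions)

    # Check 2: each region must start where the previous one ends
    # (half-open bounds assumed), so there are no gaps between them.
    for (_, prev_end), (next_start, _) in zip(bounds, bounds[1:]):
        if next_start != prev_end:
            raise ValueError("Regions are not contiguous")

    return "".join(sequence[start:end] for start, end in bounds)


# Illustrative data only; region names and bounds are made up.
demo_sequence = "AAAABBBCCDDDD"
demo_bounds = {"FR1": (0, 4), "CDR1": (4, 7), "FR2": (7, 9), "CDR2": (9, 13)}
trimmed = trim_to_regions(demo_sequence, demo_bounds, ("CDR1", "FR2"))
```

In this sketch, requesting `("FR1", "FR2")` fails because CDR1 lies between the two regions, and requesting a name absent from `demo_bounds` fails the presence check.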
- schrodinger.application.bioluminate.classify.get_merged_region_bounds(bounds_list: list[tuple[int, int]]) → list[tuple[int, int]]
Merges overlapping or adjacent regions from the given list of (start, end) bounds, returning a single region for each set of adjacent or overlapping regions.
- Parameters:
bounds_list – list of (start, end) bounds for regions
- Returns:
list of (start, end) bounds for each merged continuous region
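The merging behaviour described here matches the standard sort-and-sweep interval merge, sketched below. This is an illustration under assumptions rather than the library's code: `merge_region_bounds` is a hypothetical name, bounds are taken to be half-open, and "adjacent" is taken to mean that one region starts exactly where the previous one ends.

```python
def merge_region_bounds(
    bounds_list: list[tuple[int, int]],
) -> list[tuple[int, int]]:
    """Hypothetical sketch: merge overlapping or adjacent bounds."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(bounds_list):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            # (With inclusive bounds the test would be
            # start <= merged[-1][1] + 1.)
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


merged_example = merge_region_bounds([(10, 15), (0, 5), (5, 8), (12, 20)])
# merged_example == [(0, 8), (10, 20)]
```

Sorting first means each region only has to be compared with the most recently merged one, so the sweep itself is a single pass.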