schrodinger.application.bioluminate.classify module

schrodinger.application.bioluminate.classify.renumber_st_chain(chain: Structure, scheme: AntibodyCDRScheme = AntibodyCDRScheme.Kabat, resid_map: dict[str, str] = None) → dict[str, str]

Renumber the residues in a structure chain according to the given numbering scheme. If a resid_map (mapping original residue IDs to current residue IDs) is provided, its residue IDs are updated as well.

Parameters:
  • chain – The structure chain to renumber

  • scheme – The numbering scheme to use

  • resid_map – Mapping original residue ID to current residue ID

Returns:

The updated resid_map

schrodinger.application.bioluminate.classify.update_resid_map(chain: Structure, resid_map: dict[str, str] = None) → dict[str, str]

Update the resid_map with the current residue IDs from the chain. If no resid_map is provided, a new one will be created.

Parameters:
  • chain – The structure chain to update

  • resid_map – A mapping from the original residue ID to the current residue ID; if not provided, a new mapping will be created

Returns:

The updated resid_map
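
The idea of a mapping from original to current residue IDs can be illustrated with a small standalone sketch. It does not use the Schrodinger API: the helper name compose_resid_map, the renames argument, and the "H:2A" style ID strings are all assumptions made for illustration, and the real function reads the current IDs from the chain instead.

```python
def compose_resid_map(resid_map, renames):
    """Return an original -> current residue ID map after one renumbering step.

    Hypothetical helper, not part of the library.
    resid_map: dict of original ID -> current ID, or None to start fresh.
    renames:   dict of current ID -> new ID for this renumbering step
               (assumed to list every residue when resid_map is None).
    """
    if resid_map is None:
        # No history yet: the IDs being renamed are the original IDs.
        return dict(renames)
    # Keep the original keys; swap each value for its renamed ID, if any.
    return {orig: renames.get(cur, cur) for orig, cur in resid_map.items()}


# First step creates the map, second step updates only the values.
first = compose_resid_map(None, {"H:1": "H:1", "H:2": "H:2A"})
second = compose_resid_map(first, {"H:2A": "H:3"})
```

The keys never change across steps, so a residue can always be traced back to its original ID however many times the chain is renumbered.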

schrodinger.application.bioluminate.classify.apply_numbering(chain: Structure, numbered_sequence: NumberedSequence, resid_map: dict[str, str] = None) → dict[str, str]

Apply the numbering to the chain residues and to the resid_map.

Parameters:
  • chain – The structure chain to renumber

  • numbered_sequence – The sequence numbering to apply

  • resid_map – A mapping from the original residue ID to the current residue ID; the mapping will be updated and returned

Returns:

The updated resid_map

schrodinger.application.bioluminate.classify.get_sequence_numbering(sequence, ab_scheme=AntibodyCDRScheme.Kabat, chain_name=None, classifiers=(<class 'schrodinger.application.bioluminate.anarci_classifier.AnarciClassifier'>, )) → NumberedSequence

Get the sequence numbering for a sequence.

schrodinger.application.bioluminate.classify.get_protein_family_data(sequence: str, ab_scheme=AntibodyCDRScheme.Kabat, chain_name: str = None, classifiers=(<class 'schrodinger.application.bioluminate.anarci_classifier.AnarciClassifier'>, )) → ProteinClass

Get the protein family data for a sequence.

schrodinger.application.bioluminate.classify.get_region_lengths_from_bounds(region_bounds: dict[str, tuple[int, int]]) → dict[str, int]

Calculate the lengths of the regions based on their bounds.
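
A minimal sketch of this kind of calculation follows. It assumes the bounds are inclusive (start, end) integer pairs, which the documentation does not state; with exclusive end bounds the "+ 1" would be dropped. The function name region_lengths and the sample bounds are illustrative only.

```python
def region_lengths(region_bounds):
    """Map each region name to its length.

    Assumption: bounds are inclusive on both ends, so a region spanning
    positions 25..31 has 7 residues.
    """
    return {
        name: end - start + 1
        for name, (start, end) in region_bounds.items()
    }


lengths = region_lengths({"FR1": (0, 24), "CDR1": (25, 31)})
```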

schrodinger.application.bioluminate.classify.validate_sequence_for_format(sequence: str, ab_format: AntibodyFormat, ab_scheme=AntibodyCDRScheme.Kabat) → None

Check if the sequence contains the necessary regions for the specified antibody format.

Parameters:
  • sequence – The full sequence to check

  • ab_format – The antibody format specifying the required regions

  • ab_scheme – The antibody numbering scheme to use for determining the regions; defaults to DEFAULT_ANTIBODY_SCHEME (shown as AntibodyCDRScheme.Kabat in the signature)

Raises:

ValueError – If the sequence is not valid for the specified antibody format

schrodinger.application.bioluminate.classify.get_sequence_for_format(sequence: str, ab_format: AntibodyFormat, ab_scheme=AntibodyCDRScheme.Kabat) → str

Get the sequence segment corresponding to the specified antibody format.

Parameters:
  • sequence – The full sequence to extract the segment from

  • ab_format – The antibody format specifying the segment to extract

  • ab_scheme – The antibody numbering scheme to use for determining the regions; defaults to DEFAULT_ANTIBODY_SCHEME (shown as AntibodyCDRScheme.Kabat in the signature)

schrodinger.application.bioluminate.classify.get_regions_for_format(ab_format: AntibodyFormat, anarci_type: str) → list[str]

Get the region names corresponding to the specified antibody format and ANARCI type.

Parameters:
  • ab_format – The antibody format specifying the segment to extract

  • anarci_type – The ANARCI type of the sequence (e.g., ‘VH’, ‘VL’)

Returns:

A list of region names corresponding to the specified format and ANARCI type

schrodinger.application.bioluminate.classify.trim_sequence_to_regions(protein_data: ProteinClass, regions: tuple[str]) → str

Get the sequence segment corresponding to the specified regions.

Checks that:

  • All specified regions are present in the protein data.

  • The regions are contiguous in the sequence, meaning there are no gaps between them.

Parameters:
  • protein_data – The ProteinClass instance containing the sequence and region boundaries

  • regions – A tuple of region names to extract from the sequence

Returns:

The concatenated sequence segments corresponding to the specified regions
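
The two checks and the extraction can be sketched without the library as shown below. The sketch takes the sequence and a plain dict of region bounds instead of a ProteinClass instance, because the attributes of ProteinClass are not documented here. It also assumes 0-based inclusive bounds and assumes that a failed check raises ValueError; the function name trim_to_regions is hypothetical.

```python
def trim_to_regions(sequence, region_bounds, regions):
    """Return the part of sequence covered by the named regions.

    Assumptions: region_bounds maps name -> (start, end), 0-based and
    inclusive; failed checks raise ValueError.
    """
    # Check 1: every requested region must be known.
    missing = [name for name in regions if name not in region_bounds]
    if missing:
        raise ValueError(f"Missing regions: {missing}")

    # Check 2: sorted by position, each region must start right after
    # the previous one ends.
    ordered = sorted(region_bounds[name] for name in regions)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start != prev_end + 1:
            raise ValueError("Regions are not contiguous")

    # Contiguous regions form one slice from first start to last end.
    return sequence[ordered[0][0]:ordered[-1][1] + 1]


bounds = {"FR1": (0, 2), "CDR1": (3, 5), "FR2": (6, 9)}
segment = trim_to_regions("ABCDEFGHIJ", bounds, ("FR1", "CDR1"))
```

Because the regions are required to be contiguous, concatenating their segments is equivalent to taking a single slice, which is what the sketch does.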

schrodinger.application.bioluminate.classify.get_merged_region_bounds(bounds_list: list[tuple[int, int]]) → list[tuple[int, int]]

Merges overlapping or adjacent regions from the given list of (start, end) bounds, returning a single region for each set of adjacent or overlapping regions.

Parameters:

bounds_list – list of (start, end) bounds for regions

Returns:

list of (start, end) bounds for each merged continuous region
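
A standard sort-and-sweep interval merge produces this kind of result. The sketch below is not the library's implementation. It assumes inclusive integer bounds and treats two regions as adjacent when one starts exactly one position after the other ends; the function name merge_bounds is illustrative.

```python
def merge_bounds(bounds_list):
    """Merge overlapping or adjacent (start, end) bounds.

    Assumptions: bounds are inclusive integers, and (1, 3) and (4, 6)
    count as adjacent because 4 == 3 + 1.
    """
    merged = []
    for start, end in sorted(bounds_list):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap before this region: start a new merged region.
            merged.append((start, end))
    return merged


result = merge_bounds([(10, 12), (1, 3), (4, 6), (11, 15)])
```

Sorting first means each region only has to be compared with the most recently merged one, so the sweep itself is a single pass.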