schrodinger.livedesign.biologics.registration module¶
This module is the primary entry point for LiveDesign biologics. In particular, it provides an API for: (1) converting user input data into a canonical data structure for storage and database manipulation, and (2) computing a default set of properties and descriptors.
- class schrodinger.livedesign.biologics.registration.BioPolymer(polymer_id: 'str', monomer_string: 'str', reg_data: 'RegistrationData', connections: "Set['BioPolymer']" = <factory>, neighbors: "Set['BioPolymer']" = <factory>)¶
Bases:
object
- polymer_id: str¶
- monomer_string: str¶
- reg_data: RegistrationData¶
- connections: Set[BioPolymer]¶
- neighbors: Set[BioPolymer]¶
- addConnection(neighbor: BioPolymer)¶
Instead of tracking inter-polymer connections directly, maintain a connection list to find non-self neighbors.
- property bio_class¶
- property deduplication_hash¶
- __init__(polymer_id: str, monomer_string: str, reg_data: RegistrationData, connections: Set[BioPolymer] = <factory>, neighbors: Set[BioPolymer] = <factory>) None ¶
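As a rough illustration of the field layout above, the following stand-alone sketch mirrors the shape of BioPolymer with a plain dataclass. It does not import the Schrödinger package: the name `BioPolymerSketch`, the use of `eq=False` for identity-based hashing, the `None` placeholder for `reg_data`, and the body of `addConnection` are all assumptions made for illustration, and the real implementation may differ.

```python
from dataclasses import dataclass, field
from typing import Set


# Hypothetical stand-in for BioPolymer; this is not the library class.
@dataclass(eq=False)  # assumption: identity-based hashing so instances can be stored in sets
class BioPolymerSketch:
    polymer_id: str
    monomer_string: str
    reg_data: object  # placeholder for a RegistrationData instance
    connections: Set["BioPolymerSketch"] = field(default_factory=set)
    neighbors: Set["BioPolymerSketch"] = field(default_factory=set)

    def addConnection(self, neighbor: "BioPolymerSketch") -> None:
        # Assumed behavior: record every connection, but track the other
        # polymer as a neighbor only when it is not this polymer itself.
        self.connections.add(neighbor)
        if neighbor is not self:
            self.neighbors.add(neighbor)


heavy = BioPolymerSketch("PEPTIDE1", "E.V.Q.L", reg_data=None)
light = BioPolymerSketch("PEPTIDE2", "D.I.Q.M", reg_data=None)
heavy.addConnection(light)
heavy.addConnection(heavy)  # e.g. an intra-chain link; adds no neighbor
```

Using `default_factory=set` gives each instance its own sets, which matches the `<factory>` defaults shown in the signature.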
- schrodinger.livedesign.biologics.registration.get_registration_data(data: str, input_format: Format, options: Optional[RegistrationOptions] = None) Iterator[RegistrationData] ¶
Given an input in the form of FASTA or HELM text, yields either one RegistrationData per sequence or, if given a specially formatted single-entity FASTA, processes the sequences hierarchically and yields data for the subunits and their larger construct.
Notably, this function uses the BioLuminate antibody detection modules to detect antibody sequences and classify connected protein chains into their respective antibody class, if any, and then combines the antibody chains into well-defined larger constructs.
- Parameters:
data – input text string to be deserialized into RDKit CG mols
input_format – input format of the data
options – registration options
- Returns:
an iterator over the hierarchy of biologic entities in increasing complexity. For example, an antibody-drug conjugate would return:
1. antibody heavy and light chains
2. a small molecule
3. the arms of the antibody
4. the antibody
5. the antibody-drug conjugate
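The "increasing complexity" ordering described above can be pictured as yielding every entity in a containment hierarchy sorted by its height (leaves first, the full construct last). The sketch below illustrates only that ordering, using hypothetical stand-in types: `Entity`, `height`, and `iter_in_increasing_complexity` are not part of this module, and the real function derives the hierarchy from the input text rather than receiving it prebuilt.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Entity:  # hypothetical stand-in for one registered entity
    name: str
    children: List["Entity"] = field(default_factory=list)


def height(entity: Entity) -> int:
    # Leaves have height 0; a parent sits one level above its tallest child.
    return 1 + max(map(height, entity.children)) if entity.children else 0


def iter_in_increasing_complexity(root: Entity) -> Iterator[Entity]:
    # Collect every entity once, then yield them ordered by height.
    seen, stack, found = set(), [root], []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        found.append(node)
        stack.extend(node.children)
    yield from sorted(found, key=height)


heavy, light, drug = Entity("heavy chain"), Entity("light chain"), Entity("small molecule")
arm = Entity("arm", [heavy, light])
antibody = Entity("antibody", [arm])
adc = Entity("antibody-drug conjugate", [antibody, drug])

ordered = list(iter_in_increasing_complexity(adc))
```

In this toy hierarchy the chains and the small molecule come first, followed by the arm, the antibody, and finally the conjugate.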
- schrodinger.livedesign.biologics.registration.add_input_property_data(mol, input_property_data)¶
Add input property data to the rdmol object.
- Parameters:
mol – rdmol object
input_property_data – input property data to be added
- schrodinger.livedesign.biologics.registration.get_entity_class(helm_model: HelmModel) EntityClass ¶
Gets the overall classification of the given helm model.
- schrodinger.livedesign.biologics.registration.extract_biopolymer_graph(helm_model)¶
- schrodinger.livedesign.biologics.registration.create_registration_data(helm_model: HelmModel, bio_class: EntityClass, registered_children: List[RegistrationData]) RegistrationData ¶
Package the relevant information from the input helm model and return a RegistrationData object.
- Parameters:
helm_model – input helm model from which all properties are derived
bio_class – the entity class of the input model
registered_children – the list of child registration data
- Returns:
RegistrationData with the relevant fields populated.
- schrodinger.livedesign.biologics.registration.count_residues_in_model(helm_model: HelmModel) int ¶
Simple function to count residues in a helm model.
- schrodinger.livedesign.biologics.registration.extract_registration_data(polymer: HelmPolymer, helm_model: HelmModel) RegistrationData ¶
Helper function to extract the connectivity metadata and the Registration data for a particular polymer from given helm model.
- Parameters:
polymer – the helm polymer to extract, in RegistrationData format
helm_model – the helm model to extract the RegistrationData from
- Returns:
the RegistrationData instance for the polymer
- schrodinger.livedesign.biologics.registration.build_polymer_graph(connections: Iterable[HelmConnection], polymer_dict: Dict[str, BioPolymer]) None ¶
Simple function to store neighbors as a list in each BioPolymer instead of storing connections, in order to explore polymer neighborhoods efficiently.
- Parameters:
connections – list of HelmConnections defining the BioPolymer graph
polymer_dict – the dictionary of BioPolymers for easy BioPolymer management
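The idea of replacing a connection list with per-polymer neighbor sets can be sketched as follows. This is a simplified stand-in, not the library code: it uses pairs of polymer ID strings in place of HelmConnection objects and returns a new dictionary instead of mutating BioPolymer instances in place, and the name `build_neighbor_map` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_neighbor_map(
    connections: Iterable[Tuple[str, str]],
    polymer_ids: Iterable[str],
) -> Dict[str, Set[str]]:
    # Give every polymer an empty neighbor set so isolated polymers appear too.
    neighbors: Dict[str, Set[str]] = {pid: set() for pid in polymer_ids}
    for source_id, target_id in connections:
        if source_id == target_id:
            continue  # an intra-polymer link adds no neighbor
        # Connections are undirected, so record both directions.
        neighbors[source_id].add(target_id)
        neighbors[target_id].add(source_id)
    return neighbors


connections = [
    ("PEPTIDE1", "PEPTIDE2"),  # heavy to light
    ("PEPTIDE1", "PEPTIDE3"),  # heavy to heavy
    ("PEPTIDE3", "PEPTIDE4"),  # heavy to light
    ("PEPTIDE1", "PEPTIDE1"),  # intra-chain link
]
graph = build_neighbor_map(
    connections, ["PEPTIDE1", "PEPTIDE2", "PEPTIDE3", "PEPTIDE4", "CHEM1"]
)
```

With neighbor sets in place, looking up the polymers adjacent to a given chain is a single dictionary access rather than a scan over all connections.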
- schrodinger.livedesign.biologics.registration.find_antibodies(polymers: Iterable[BioPolymer], helm_model: HelmModel) List[RegistrationData] ¶
Find and classify all contiguous antibody polymer combinations if connectivity is provided, otherwise just assemble them into one large antibody.
- Parameters:
polymers – list of BioPolymers to search for antibodies
helm_model – source HelmModel, used to generate new HelmModels
- Returns:
a list of RegistrationData for each found antibody construct
- schrodinger.livedesign.biologics.registration.is_antibody_subunit(polymer: BioPolymer) bool ¶
Utility function to increase readability.
- schrodinger.livedesign.biologics.registration.find_abs_by_connectivity(polymer_graph: Iterable[BioPolymer], helm_model: HelmModel) List[RegistrationData] ¶
Depth-first search approach to finding all directly connected antibody components in a helm polymer network.
- Parameters:
helm_model – source helm model to query for connectivity and extract HelmPolymers from for output subunits
polymer_graph – the list of all BioPolymers
- Returns:
a list of RegistrationData, one for each found (and recognized) antibody construct
- schrodinger.livedesign.biologics.registration.find_ab_chain(biopolymer: BioPolymer, used_polymers: Set[BioPolymer]) List[BioPolymer] ¶
Recursive function to grow an antibody chain from one polymer to reach all reachable antibody fragments.
- Parameters:
used_polymers – the set of “seen” polymers to avoid infinite recursion
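The recursive growth with a shared "seen" set described above can be sketched like this. It is a minimal illustration under stated assumptions: polymers are represented by ID strings, the `neighbors` and `is_antibody` dictionaries are hypothetical inputs, and `grow_chain` and `find_chains` are illustrative names rather than functions of this module.

```python
from typing import Dict, List, Set


def grow_chain(
    polymer_id: str,
    neighbors: Dict[str, Set[str]],
    is_antibody: Dict[str, bool],
    used: Set[str],
) -> List[str]:
    # Mark the polymer before recursing so that cycles cannot revisit it.
    used.add(polymer_id)
    chain = [polymer_id]
    for neighbor_id in sorted(neighbors[polymer_id]):
        if neighbor_id not in used and is_antibody[neighbor_id]:
            chain.extend(grow_chain(neighbor_id, neighbors, is_antibody, used))
    return chain


def find_chains(
    neighbors: Dict[str, Set[str]], is_antibody: Dict[str, bool]
) -> List[List[str]]:
    # Start a new depth-first search from every unvisited antibody polymer.
    used: Set[str] = set()
    chains = []
    for polymer_id in sorted(neighbors):
        if polymer_id not in used and is_antibody[polymer_id]:
            chains.append(grow_chain(polymer_id, neighbors, is_antibody, used))
    return chains


neighbors = {
    "H1": {"L1", "H2"},
    "L1": {"H1"},
    "H2": {"H1", "L2", "DRUG"},
    "L2": {"H2"},
    "DRUG": {"H2"},
    "H3": set(),
}
is_antibody = {pid: pid != "DRUG" for pid in neighbors}
chains = find_chains(neighbors, is_antibody)
```

Sharing one `used` set across all searches guarantees that each polymer lands in at most one chain, and the non-antibody polymer is never followed.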
- schrodinger.livedesign.biologics.registration.create_combined_ab_data(ab_polymer_chain: List[BioPolymer], helm_model: HelmModel) RegistrationData ¶
Given a set of BioPolymers, extract them from the given helm_model and create a new RegistrationData object consisting only of the set.
- Parameters:
ab_polymer_chain – the collection of antibody Biopolymers to combine
helm_model – the source helm_model containing connections and HelmPolymer objects to extract
- Returns:
the RegistrationData of the final combined object
- schrodinger.livedesign.biologics.registration.classify_ab_assembly(ab_polymer_chain: List[BioPolymer]) EntityClass ¶
Given a chain of connected antibody subunits, find the matching recognized antibody construct and return its class.
- Parameters:
ab_polymer_chain – the chain of connected antibody subunits
- Returns:
the overall antibody class the collective subunits create, if any
- schrodinger.livedesign.biologics.registration.get_arms(ab_polymer_chain: List[BioPolymer], helm_model: HelmModel) Iterable[RegistrationData] ¶
Given a list of connected BioPolymers, find the arms (light/heavy pairs) and return the registration data for each arm.
- Parameters:
ab_polymer_chain – the full set of connected antibody chains
helm_model – source helm model to extract helm strings and connections from.
- Returns:
generator over the available arms in the antibody polymer chain.
- schrodinger.livedesign.biologics.registration.get_arms_by_connectivity(ab_polymer_chain: List[BioPolymer]) Iterable[BioPolymer] ¶
Extract heavy/light arm pairs via HELM model connections. Each arm must be from a F(ab’)2, monospecific antibody, or bispecific antibody. An arm thus consists of a light Fab domain, connected to a heavy full or heavy Fab’ chain. The heavy chain can connect to other heavy chains, but two light chains sharing a heavy chain would not fall into any of the currently supported classes.
- Parameters:
ab_polymer_chain – the full set of connected antibody chains
helm_model – source helm model to extract helm strings and connections from.
- Returns:
generator over the available arms in the antibody polymer chain.
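One way to picture pairing arms by connectivity is to match each light chain with the single heavy chain it is bonded to. The sketch below is an assumption-laden simplification, not the module's algorithm: chain types are plain strings, `pair_arms` is a hypothetical name, and layouts outside the supported classes (such as two light chains sharing one heavy chain) are only partially screened out.

```python
from typing import Dict, Iterator, Set, Tuple


def pair_arms(
    chain_types: Dict[str, str],
    neighbors: Dict[str, Set[str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (light_id, heavy_id) for each light chain bonded to one heavy chain."""
    claimed_heavies: Set[str] = set()
    for polymer_id in sorted(chain_types):
        if chain_types[polymer_id] != "light":
            continue
        heavy_partners = sorted(
            n for n in neighbors.get(polymer_id, set())
            if chain_types.get(n) == "heavy"
        )
        # Skip light chains with zero or several heavy partners, and heavy
        # chains already claimed by an earlier light chain.
        if len(heavy_partners) != 1 or heavy_partners[0] in claimed_heavies:
            continue
        claimed_heavies.add(heavy_partners[0])
        yield polymer_id, heavy_partners[0]


chain_types = {"H1": "heavy", "H2": "heavy", "L1": "light", "L2": "light"}
neighbors = {
    "H1": {"L1", "H2"},
    "H2": {"L2", "H1"},
    "L1": {"H1"},
    "L2": {"H2"},
}
arms = list(pair_arms(chain_types, neighbors))
```

The heavy-heavy link between `H1` and `H2` is ignored here because only light-to-heavy bonds define an arm.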
- schrodinger.livedesign.biologics.registration.get_arms_by_annotation(ab_polymer_chain: List[BioPolymer], helm_model: HelmModel) Iterable[BioPolymer] ¶
Extract heavy/light arm pairs via HELM annotation data instead of HELM model connections.
- Parameters:
ab_polymer_chain – the full set of connected antibody chains
helm_model – source helm model to extract helm strings and connections from.
- schrodinger.livedesign.biologics.registration.replace_base_chains(subunit_reg_data: RegistrationData, children: List[RegistrationData], hash_dict: Dict[str, RegistrationData]) None ¶
Helper function to recursively traverse the set of returned RegistrationData and replace base chain data with their combined subunit in the top-level child hash list.
- Parameters:
child_hashes – top-level hash list
hash_dict – all returned child hashes so far
parent – the current subunit to insert into child_hashes
- schrodinger.livedesign.biologics.registration.find_na_entities(base_biopolymers, model: HelmModel)¶
Generate registration data for nucleic acid entities, if any. Note that until nucleic acid entities more complex than single- and double-stranded DNA/RNA are supported, this function will yield a single entity.
- Parameters:
base_biopolymers – list of BioPolymers to search for NA entities
model – source HELM model
- Returns:
registration data for each NA entity
- schrodinger.livedesign.biologics.registration.determine_na_type(base_biopolymers)¶
- schrodinger.livedesign.biologics.registration.get_display_string(canonical_helm_model: HelmModel) str ¶
Externally available API for serializing a HELM model.
- Parameters:
canonical_helm_model – canonicalized helm model
- Returns:
a HELM string
- schrodinger.livedesign.biologics.registration.clear_annotations(helm_model: HelmModel)¶
- Parameters:
helm_model – a HelmModel
- schrodinger.livedesign.biologics.registration.get_deduplication_hash(helm_model: HelmModel) str ¶
Unified HELM model hasher for biologics registration, which discards all annotations before generating a hash.
- Parameters:
helm_model – a HelmModel to hash
- Returns:
SHA-1 hash of the HELM string.
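The general approach of hashing an annotation-free HELM string can be sketched with the standard library. This is a naive illustration, not the module's implementation: it works on a raw HELM2 string rather than a HelmModel, it only blanks the extended-annotation section (inline annotations are not handled), it assumes no `$` characters occur inside a section, and the function names are hypothetical.

```python
import hashlib


def strip_extended_annotations(helm: str) -> str:
    # HELM2 layout: polymers $ connections $ groups $ annotations $ version
    sections = helm.split("$")
    if len(sections) == 5:
        sections[3] = ""  # discard the extended-annotation section
    return "$".join(sections)


def sketch_deduplication_hash(helm: str) -> str:
    cleaned = strip_extended_annotations(helm)
    return hashlib.sha1(cleaned.encode("utf-8")).hexdigest()


plain = "PEPTIDE1{A.C.D}$$$$V2.0"
annotated = 'PEPTIDE1{A.C.D}$$${"name":"demo"}$V2.0'

plain_hash = sketch_deduplication_hash(plain)
annotated_hash = sketch_deduplication_hash(annotated)
```

Because annotations are removed before hashing, the two strings above produce the same digest, which is the property a deduplication key needs.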
- schrodinger.livedesign.biologics.registration.combine_helm(biopolymers: List[BioPolymer], helm_model: HelmModel) HelmModel ¶
Function to ensure recombined HELM strings always canonicalize the same way. Simple polymers that are indistinguishable except for their connections can sometimes be swapped, which changes the connection section and the resulting final hash.
- Parameters:
biopolymers – BioPolymer instances whose corresponding polymers will be extracted
- Returns:
a HelmModel consisting only of the specified biopolymers
- schrodinger.livedesign.biologics.registration.light_heavy_heavy_light(light_chains: List[BioPolymer], heavy_chains: List[BioPolymer]) bool ¶
Ensure that the connectivity is light-heavy-heavy-light, which corresponds to a full antibody.
- Parameters:
light_chains – the list of connected light chains
heavy_chains – the list of connected heavy chains
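A light-heavy-heavy-light check can be sketched as a test on a small neighbor map. The exact criteria used by the real function are not given here, so the conditions below (exactly two light and two heavy chains, the heavy chains bonded to each other, and each light chain bonded to a different heavy chain) are assumptions, as are the ID-string representation and the function name.

```python
from typing import Dict, List, Set


def is_light_heavy_heavy_light(
    light_ids: List[str],
    heavy_ids: List[str],
    neighbors: Dict[str, Set[str]],
) -> bool:
    if len(light_ids) != 2 or len(heavy_ids) != 2:
        return False
    first_heavy, second_heavy = heavy_ids
    # The two heavy chains must be bonded to each other.
    if second_heavy not in neighbors[first_heavy]:
        return False
    partners = []
    for light_id in light_ids:
        bonded = neighbors[light_id] & set(heavy_ids)
        if len(bonded) != 1:
            return False  # each light chain bonds to exactly one heavy chain
        partners.append(next(iter(bonded)))
    # The two light chains must sit on different heavy chains.
    return set(partners) == set(heavy_ids)


full_antibody = {
    "L1": {"H1"},
    "H1": {"L1", "H2"},
    "H2": {"H1", "L2"},
    "L2": {"H2"},
}
shared_heavy = {
    "L1": {"H1"},
    "L2": {"H1"},
    "H1": {"L1", "L2", "H2"},
    "H2": {"H1"},
}
```

Calling the function with `["L1", "L2"]` and `["H1", "H2"]` accepts the first map and rejects the second, where both light chains share one heavy chain.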
- schrodinger.livedesign.biologics.registration.is_deoxyribose(helm_monomer: HelmMonomer) bool ¶
- schrodinger.livedesign.biologics.registration.is_standard_ribose(helm_monomer: HelmMonomer) bool ¶
- schrodinger.livedesign.biologics.registration.classify_polymer(helm_polymer: HelmPolymer) EntityClass ¶
Given a HelmPolymer (a polymer consisting of only one type), infer its type.
- Parameters:
helm_polymer – helm_polymer to determine class of
- schrodinger.livedesign.biologics.registration.classify_nucleotide_polymer(helm_polymer) EntityClass ¶
- schrodinger.livedesign.biologics.registration.classify_antibody(antibody_type: str, region_names: List[str]) EntityClass ¶
- schrodinger.livedesign.biologics.registration.classify_protein(helm_polymer) EntityClass ¶
- schrodinger.livedesign.biologics.registration.classify_overall_molecule(polymer_chains: List[BioPolymer]) EntityClass ¶
Given a collection of polymers, return the corresponding EntityClass.
- Parameters:
polymer_chains – the set of polymers to classify
- schrodinger.livedesign.biologics.registration.helm_mol_to_binary(helm_model: HelmModel) str ¶
Converts given Helm model to rdmol binary, and adds sequence annotation data for any antibodies detected.
- Parameters:
helm_model – input helm model to convert to rdmol binary
- Returns:
serialized binary with sequence annotations embedded as serialized json in the “antibody_regions” property.