schrodinger.livedesign.biologics.registration module
This module is the primary entry point for Live Design biologics. In particular, it provides an API for: (1) converting user input data into a canonical data structure for storage and database manipulation, and (2) computing a default set of properties and descriptors.
- class schrodinger.livedesign.biologics.registration.BioPolymer(polymer_id: 'str', monomer_string: 'str', reg_data: 'RegistrationData', connections: "Set['BioPolymer']" = <factory>, neighbors: "Set['BioPolymer']" = <factory>)
Bases: object
- polymer_id: str
- monomer_string: str
- connections: Set[schrodinger.livedesign.biologics.registration.BioPolymer]
- neighbors: Set[schrodinger.livedesign.biologics.registration.BioPolymer]
- addConnection(neighbor: schrodinger.livedesign.biologics.registration.BioPolymer)
Record a connection to neighbor. Rather than tracking inter-polymer connections directly, a connection list is maintained and used to find non-self neighbors.
- property bio_class
- property deduplication_hash
- __init__(polymer_id: str, monomer_string: str, reg_data: schrodinger.livedesign.registration.RegistrationData, connections: typing.Set[schrodinger.livedesign.biologics.registration.BioPolymer] = <factory>, neighbors: typing.Set[schrodinger.livedesign.biologics.registration.BioPolymer] = <factory>) → None
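The neighbor bookkeeping that addConnection describes can be sketched with a plain dataclass. This is a minimal, hypothetical stand-in (the name `SketchBioPolymer` and the exact update rule are assumptions, and reg_data is omitted), not the library's implementation: every connection is recorded, but only a connection to a different polymer adds a neighbor.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(eq=False)  # keep identity-based __hash__ so instances fit in sets
class SketchBioPolymer:
    """Hypothetical stand-in for BioPolymer (reg_data omitted)."""

    polymer_id: str
    monomer_string: str
    connections: Set["SketchBioPolymer"] = field(default_factory=set)
    neighbors: Set["SketchBioPolymer"] = field(default_factory=set)

    def addConnection(self, neighbor: "SketchBioPolymer") -> None:
        # Record every connection, including self-connections such as an
        # intra-chain disulfide bond.
        self.connections.add(neighbor)
        # Only a connection to a different polymer makes it a neighbor.
        if neighbor is not self:
            self.neighbors.add(neighbor)


heavy = SketchBioPolymer("PEPTIDE1", "EVQLVESGG")
light = SketchBioPolymer("PEPTIDE2", "DIQMTQSPS")
heavy.addConnection(light)
heavy.addConnection(heavy)  # intra-chain bond
print(sorted(p.polymer_id for p in heavy.neighbors))  # ['PEPTIDE2']
```

Passing `eq=False` keeps Python's default identity-based hashing; with the dataclass default (`eq=True`), instances would be unhashable and could not be stored in the connections and neighbors sets.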
- schrodinger.livedesign.biologics.registration.get_registration_data(data: str, input_format: schrodinger.rdkit_extensions.Format, options: Optional[schrodinger.livedesign.registration.RegistrationOptions] = None) → Iterator[schrodinger.livedesign.registration.RegistrationData]
Given input in the form of FASTA or HELM text, yields one RegistrationData per sequence or, for a specially formatted single-entity FASTA input, processes the sequences hierarchically and yields data for the subunits and for the larger construct they form.
Notably, this function uses the BioLuminate antibody detection modules to detect antibody sequences and classify connected protein chains into their respective antibody classes, if any, and then combines the antibody chains into well-defined larger constructs.
- Parameters
data – input text string to be deserialized into RDKit CG mols
input_format – input format of the data
options – registration options
- Returns
an iterator over the hierarchy of biologic entities, in order of increasing complexity. For example, an antibody-drug conjugate would yield:
1. the antibody heavy and light chains
2. a small molecule
3. the arms of the antibody
4. the antibody
5. the antibody-drug conjugate
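One way to produce the documented ordering is to sort entities by their height in the hierarchy, so that base chains and small molecules (height 0) come before the arms, the antibody, and finally the conjugate. The sketch below assumes a hypothetical `Entity` tree and a hypothetical `iter_by_complexity` helper; it only illustrates the ordering and is not how get_registration_data is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(eq=False)  # identity hashing: each node is a distinct entity
class Entity:
    """Hypothetical node in a biologic entity hierarchy."""

    name: str
    children: List["Entity"] = field(default_factory=list)


def iter_by_complexity(root: Entity) -> Iterator[Entity]:
    """Yield each entity under (and including) root once, simplest first.

    Complexity is approximated by height: 0 for entities without children.
    """
    heights: Dict[Entity, int] = {}

    def height(node: Entity) -> int:
        if node not in heights:
            # Children are measured (and recorded) before their parent.
            heights[node] = 1 + max((height(c) for c in node.children), default=-1)
        return heights[node]

    height(root)
    # sorted() is stable, so entities of equal height keep discovery order.
    yield from sorted(heights, key=heights.__getitem__)


hc1, lc1, hc2, lc2 = (Entity(n) for n in ("HC1", "LC1", "HC2", "LC2"))
arm1, arm2 = Entity("arm1", [hc1, lc1]), Entity("arm2", [hc2, lc2])
antibody = Entity("antibody", [arm1, arm2])
adc = Entity("ADC", [antibody, Entity("drug")])
print([e.name for e in iter_by_complexity(adc)])
# ['HC1', 'LC1', 'HC2', 'LC2', 'drug', 'arm1', 'arm2', 'antibody', 'ADC']
```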
- schrodinger.livedesign.biologics.registration.get_entity_class(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → schrodinger.livedesign.entity_type.EntityClass
Gets the overall classification of the given helm model.
- schrodinger.livedesign.biologics.registration.extract_biopolymer_graph(helm_model)
- schrodinger.livedesign.biologics.registration.create_registration_data(helm_model: schrodinger.protein.helm._helm_parser.HelmModel, bio_class: schrodinger.livedesign.entity_type.EntityClass, registered_children: List[schrodinger.livedesign.registration.RegistrationData]) → schrodinger.livedesign.registration.RegistrationData
Package the relevant information from the input helm model and return a RegistrationData object.
- Parameters
helm_model – input helm model from which all properties are derived
bio_class – the entity class of the input model
registered_children – the list of child registration data
- Returns
RegistrationData with the relevant fields populated.
- schrodinger.livedesign.biologics.registration.count_residues_in_model(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → int
Simple function to count residues in a helm model.
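As a rough illustration of residue counting, the sketch below works directly on a HELM string rather than on a HelmModel. The `count_residues` helper is hypothetical; it assumes every dot-separated unit of a simple polymer is one residue (so an RNA nucleotide such as R(A)P counts once), and it ignores HELM repeat notation.

```python
import re

# A simple polymer has the form TYPE<n>{unit.unit...}, e.g. PEPTIDE1{A.C.D}.
_SIMPLE_POLYMER = re.compile(r"\b(?:PEPTIDE|RNA|CHEM|BLOB)\d+\{([^}]*)\}")


def count_residues(helm: str) -> int:
    """Hypothetical sketch: count the dot-separated units of every simple
    polymer in the first section of a HELM string."""
    simple_polymers = helm.split("$", 1)[0]
    return sum(len(body.split(".")) for body in _SIMPLE_POLYMER.findall(simple_polymers))


print(count_residues("PEPTIDE1{A.C.D}|PEPTIDE2{E.F}$$$$V2.0"))  # 5
```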
- schrodinger.livedesign.biologics.registration.extract_registration_data(polymer: schrodinger.protein.helm._helm_parser.HelmPolymer, helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → schrodinger.livedesign.registration.RegistrationData
Helper function to extract the connectivity metadata and the RegistrationData for a particular polymer from the given HELM model.
- Parameters
polymer – the HELM polymer to extract as RegistrationData
helm_model – the helm model to extract the RegistrationData from
- Returns
the RegistrationData instance for the polymer
- schrodinger.livedesign.biologics.registration.build_polymer_graph(connections: Iterable[schrodinger.protein.helm._helm_parser.HelmConnection], polymer_dict: Dict[str, schrodinger.livedesign.biologics.registration.BioPolymer]) → None
Store each polymer's neighbors in its BioPolymer, rather than storing the connections themselves, so that polymer neighborhoods can be explored efficiently.
- Parameters
connections – list of HelmConnections defining the BioPolymer graph
polymer_dict – the dictionary of BioPolymers for easy BioPolymer management
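The adjacency structure that build_polymer_graph maintains can be sketched with plain polymer IDs. Here connections are assumed to be (polymer_id, polymer_id) pairs, a simplification of HelmConnection, and `build_neighbor_map` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_neighbor_map(connections: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Hypothetical sketch: turn (polymer_id, polymer_id) connection pairs
    into an adjacency map of neighboring polymer IDs."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for first, second in connections:
        if first == second:
            continue  # an intra-polymer bond adds no neighbor
        # Connections are undirected, so record both directions.
        neighbors[first].add(second)
        neighbors[second].add(first)
    return dict(neighbors)


# Two disulfides between the same pair of chains collapse into one neighbor.
graph = build_neighbor_map(
    [("PEPTIDE1", "PEPTIDE2"), ("PEPTIDE1", "PEPTIDE2"), ("PEPTIDE3", "PEPTIDE3")]
)
print(graph)  # {'PEPTIDE1': {'PEPTIDE2'}, 'PEPTIDE2': {'PEPTIDE1'}}
```

Storing neighbors as sets makes a neighborhood lookup a single dictionary access, and repeated bonds between the same two chains do not inflate the neighbor count.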
- schrodinger.livedesign.biologics.registration.find_antibodies(polymers: Iterable[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → List[schrodinger.livedesign.registration.RegistrationData]
If connectivity is provided, find and classify all contiguous combinations of antibody polymers; otherwise, assemble all of them into one large antibody.
- Parameters
polymers – list of BioPolymers to search for antibodies
helm_model – source HelmModel, used to generate new HelmModels
- Returns
a list of RegistrationData for each found antibody construct
- schrodinger.livedesign.biologics.registration.is_antibody_subunit(polymer: schrodinger.livedesign.biologics.registration.BioPolymer) → bool
Return whether the given polymer is an antibody subunit (a small utility function that improves readability).
- schrodinger.livedesign.biologics.registration.find_abs_by_connectivity(polymer_graph: Iterable[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → List[schrodinger.livedesign.registration.RegistrationData]
Use a depth-first search to find all directly connected antibody components in a HELM polymer network.
- Parameters
polymer_graph – the list of all BioPolymers
helm_model – source HELM model, used to query connectivity and to extract the HelmPolymers for the output subunits
- Returns
a list of RegistrationData, one for each antibody construct found and recognized
- schrodinger.livedesign.biologics.registration.find_ab_chain(biopolymer: schrodinger.livedesign.biologics.registration.BioPolymer, used_polymers: Set[schrodinger.livedesign.biologics.registration.BioPolymer]) → List[schrodinger.livedesign.biologics.registration.BioPolymer]
Recursively grow an antibody chain from a single polymer until it includes every reachable antibody fragment.
- Parameters
biopolymer – the polymer from which to grow the chain
used_polymers – the set of “seen” polymers, used to avoid infinite recursion
- Returns
the list of BioPolymers in the grown chain
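The depth-first search performed by find_abs_by_connectivity and find_ab_chain can be sketched over an adjacency map of polymer IDs. The helpers `grow_ab_chain` and `find_ab_groups`, and the representation of antibody subunits as a set of IDs, are assumptions for illustration; as in find_ab_chain, a shared set of used polymers prevents infinite recursion.

```python
from typing import Dict, List, Set


def grow_ab_chain(start: str, neighbors: Dict[str, Set[str]],
                  antibody_ids: Set[str], used: Set[str]) -> List[str]:
    """Hypothetical sketch: recursively collect every antibody subunit
    reachable from start through other antibody subunits."""
    used.add(start)  # mark as seen before recursing to avoid infinite loops
    chain = [start]
    for neighbor in sorted(neighbors.get(start, set())):
        if neighbor in antibody_ids and neighbor not in used:
            chain.extend(grow_ab_chain(neighbor, neighbors, antibody_ids, used))
    return chain


def find_ab_groups(neighbors: Dict[str, Set[str]],
                   antibody_ids: Set[str]) -> List[List[str]]:
    """Split antibody subunits into groups of directly connected subunits."""
    used: Set[str] = set()
    groups = []
    for polymer_id in sorted(antibody_ids):
        if polymer_id not in used:
            groups.append(grow_ab_chain(polymer_id, neighbors, antibody_ids, used))
    return groups


neighbors = {
    "H1": {"H2", "L1", "CHEM1"},
    "H2": {"H1", "L2"},
    "L1": {"H1"},
    "L2": {"H2"},
    "CHEM1": {"H1"},
}
print(find_ab_groups(neighbors, {"H1", "H2", "L1", "L2", "NB1"}))
# [['H1', 'H2', 'L2', 'L1'], ['NB1']]
```

Because growth only passes through antibody subunits, the small molecule CHEM1 attached to a heavy chain is left out of the group, and the unconnected fragment NB1 forms a group of its own.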
- schrodinger.livedesign.biologics.registration.create_combined_ab_data(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → schrodinger.livedesign.registration.RegistrationData
Given a collection of BioPolymers, extract them from the given helm_model and create a new RegistrationData object consisting only of those polymers.
- Parameters
ab_polymer_chain – the collection of antibody BioPolymers to combine
helm_model – the source helm_model containing connections and HelmPolymer objects to extract
- Returns
the RegistrationData of the final combined object
- schrodinger.livedesign.biologics.registration.classify_ab_assembly(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer]) → schrodinger.livedesign.entity_type.EntityClass
Given a chain of connected antibody subunits, find the matching recognized antibody construct, if any, and return its class.
- Parameters
ab_polymer_chain – the chain of connected antibody subunits
- Returns
the overall antibody class the collective subunits create, if any
- schrodinger.livedesign.biologics.registration.get_arms(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → Iterable[schrodinger.livedesign.registration.RegistrationData]
Given a list of connected BioPolymers, find the arms (light/heavy pairs) and return the registration data for each arm.
- Parameters
ab_polymer_chain – the full set of connected antibody chains
helm_model – source helm model to extract helm strings and connections from.
- Returns
generator over the available arms in the antibody polymer chain.
- schrodinger.livedesign.biologics.registration.get_arms_by_connectivity(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer]) → Iterable[schrodinger.livedesign.biologics.registration.BioPolymer]
Extract heavy/light arm pairs via HELM model connections. Each arm must come from an F(ab’)2, a monospecific antibody, or a bispecific antibody. An arm therefore consists of a light Fab domain connected to either a full heavy chain or a heavy Fab’ chain. A heavy chain can connect to other heavy chains, but two light chains sharing one heavy chain would not fall into any of the currently supported classes.
- Parameters
ab_polymer_chain – the full set of connected antibody chains
- Returns
generator over the available arms in the antibody polymer chain.
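The pairing rule described for get_arms_by_connectivity (each light chain bound to one heavy chain, and no heavy chain shared by two light chains) can be sketched as follows. The `pair_arms` helper, its use of polymer-ID sets, and the choice to return None for unsupported connectivity are assumptions, not the library's behavior.

```python
from typing import Dict, List, Optional, Set, Tuple


def pair_arms(light_ids: Set[str], heavy_ids: Set[str],
              neighbors: Dict[str, Set[str]]) -> Optional[List[Tuple[str, str]]]:
    """Hypothetical sketch: pair each light chain with the one heavy chain it
    is bonded to. Returns None when the connectivity cannot form arms."""
    arms = []
    claimed: Set[str] = set()
    for light in sorted(light_ids):
        partners = neighbors.get(light, set()) & heavy_ids
        if len(partners) != 1:
            return None  # a light chain must bond to exactly one heavy chain
        (heavy,) = partners
        if heavy in claimed:
            return None  # two light chains sharing a heavy chain is unsupported
        claimed.add(heavy)
        arms.append((light, heavy))
    return arms


neighbors = {"L1": {"H1"}, "L2": {"H2"}, "H1": {"L1", "H2"}, "H2": {"L2", "H1"}}
print(pair_arms({"L1", "L2"}, {"H1", "H2"}, neighbors))  # [('L1', 'H1'), ('L2', 'H2')]
```

Heavy-heavy bonds do not affect the pairing, which matches the note that heavy chains may connect to other heavy chains.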
- schrodinger.livedesign.biologics.registration.get_arms_by_annotation(ab_polymer_chain: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → Iterable[schrodinger.livedesign.biologics.registration.BioPolymer]
Extract heavy/light arm pairs via HELM annotation data instead of HELM model connections.
- Parameters
ab_polymer_chain – the full set of connected antibody chains
helm_model – source helm model to extract helm strings and connections from.
- schrodinger.livedesign.biologics.registration.replace_base_chains(subunit_reg_data: schrodinger.livedesign.registration.RegistrationData, children: List[schrodinger.livedesign.registration.RegistrationData], hash_dict: Dict[str, schrodinger.livedesign.registration.RegistrationData]) → None
Helper function that recursively traverses the returned RegistrationData and, in the top-level child list, replaces base chain data with the combined subunit that contains them.
- Parameters
subunit_reg_data – the current subunit to insert into children
children – top-level child list
hash_dict – all child RegistrationData returned so far, keyed by hash
- schrodinger.livedesign.biologics.registration.find_na_entities(base_biopolymers, model: schrodinger.protein.helm._helm_parser.HelmModel)
Generate registration data for nucleic acid entities, if any. Note that until nucleic acid entities more complex than single- and double-stranded DNA/RNA are supported, this function will yield a single entity.
- Parameters
base_biopolymers – list of BioPolymers to search for NA entities
model – source HELM model
- Returns
registration data for each NA entity
- schrodinger.livedesign.biologics.registration.determine_na_type(base_biopolymers)
- schrodinger.livedesign.biologics.registration.get_display_string(canonical_helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → str
Public API for serializing a HELM model to a HELM string.
- Parameters
canonical_helm_model – canonicalized helm model
- Returns
a HELM string
- schrodinger.livedesign.biologics.registration.clear_annotations(helm_model: schrodinger.protein.helm._helm_parser.HelmModel)
Clear the annotations of the given HELM model.
- Parameters
helm_model – a HelmModel
- schrodinger.livedesign.biologics.registration.get_deduplication_hash(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → str
Unified HELM model hasher for biologics registration, which discards all annotations before generating a hash.
- Parameters
helm_model – a HelmModel to hash
- Returns
the SHA-1 hash of the HELM string.
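A string-level sketch of annotation-insensitive hashing is shown below. The real function clears annotations on a HelmModel before hashing; the hypothetical `dedup_hash` instead assumes a HELM 2 string laid out as simple_polymers$connections$groups$extended_annotations$V2.0 and treats double-quoted text as inline annotations, both simplifications.

```python
import hashlib
import re

# Inline HELM annotations are double-quoted, e.g. PEPTIDE1{A.C.D}"HC".
_INLINE_ANNOTATION = re.compile(r'"[^"]*"')


def dedup_hash(helm: str) -> str:
    """Hypothetical sketch: strip annotations from a HELM 2 string, then
    return the SHA-1 hex digest of what remains."""
    sections = helm.split("$")
    if len(sections) >= 4:
        sections[3] = ""  # drop the extended (JSON) annotation section
    # Remove inline quoted annotations from the polymer, connection and
    # grouping sections.
    sections[:3] = [_INLINE_ANNOTATION.sub("", s) for s in sections[:3]]
    return hashlib.sha1("$".join(sections).encode("utf-8")).hexdigest()


plain = "PEPTIDE1{A.C.D}$$$$V2.0"
inline = 'PEPTIDE1{A.C.D}"HC"$$$$V2.0'
extended = 'PEPTIDE1{A.C.D}$$${"PEPTIDE1":{"ChainType":"hc"}}$V2.0'
print(dedup_hash(plain) == dedup_hash(inline) == dedup_hash(extended))  # True
```

Strings that differ only in their annotations therefore map to the same 40-character hex digest, while any change to the monomers changes the hash.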
- schrodinger.livedesign.biologics.registration.combine_helm(biopolymers: List[schrodinger.livedesign.biologics.registration.BioPolymer], helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → schrodinger.protein.helm._helm_parser.HelmModel
Combine the polymers corresponding to the given BioPolymers into a new HELM model, ensuring that recombined HELM strings always canonicalize the same way. Otherwise, simple polymers that are indistinguishable except for what they are connected to can be swapped during canonicalization, changing the connection section and therefore the final hash.
- Parameters
biopolymers – BioPolymer instances whose corresponding polymers will be extracted
helm_model – source HELM model from which the polymers are extracted
- Returns
a HelmModel consisting only of the specified biopolymers
- schrodinger.livedesign.biologics.registration.light_heavy_heavy_light(light_chains: List[schrodinger.livedesign.biologics.registration.BioPolymer], heavy_chains: List[schrodinger.livedesign.biologics.registration.BioPolymer]) → bool
Ensure that the connectivity is light-heavy-heavy-light, which corresponds to a full antibody.
- Parameters
light_chains – the list of connected light chains
heavy_chains – the list of connected heavy chains
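The light-heavy-heavy-light pattern can be checked on an adjacency map of polymer IDs. In this hypothetical sketch, `is_light_heavy_heavy_light` requires the two heavy chains to be bonded to each other and each heavy chain to be bonded to exactly one of two distinct light chains; the real function's exact criteria may differ.

```python
from typing import Dict, List, Set


def is_light_heavy_heavy_light(light_ids: List[str], heavy_ids: List[str],
                               neighbors: Dict[str, Set[str]]) -> bool:
    """Hypothetical sketch of a light-heavy-heavy-light connectivity check."""
    if len(light_ids) != 2 or len(heavy_ids) != 2:
        return False
    heavy_a, heavy_b = heavy_ids
    if heavy_b not in neighbors.get(heavy_a, set()):
        return False  # the two heavy chains must be bonded to each other
    lights = set(light_ids)
    # Light chains bonded to each heavy chain.
    partners = [neighbors.get(heavy, set()) & lights for heavy in heavy_ids]
    # Each heavy chain needs exactly one light partner, and they must differ.
    return all(len(p) == 1 for p in partners) and partners[0] != partners[1]


full_antibody = {"H1": {"H2", "L1"}, "H2": {"H1", "L2"}, "L1": {"H1"}, "L2": {"H2"}}
print(is_light_heavy_heavy_light(["L1", "L2"], ["H1", "H2"], full_antibody))  # True
```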
- schrodinger.livedesign.biologics.registration.is_deoxyribose(helm_monomer: schrodinger.protein.helm._helm_parser.HelmMonomer) → bool
- schrodinger.livedesign.biologics.registration.is_standard_ribose(helm_monomer: schrodinger.protein.helm._helm_parser.HelmMonomer) → bool
- schrodinger.livedesign.biologics.registration.classify_polymer(helm_polymer: schrodinger.protein.helm._helm_parser.HelmPolymer) → schrodinger.livedesign.entity_type.EntityClass
Given a HelmPolymer (a polymer of a single polymer type), infer its entity class.
- Parameters
helm_polymer – the HelmPolymer to classify
- schrodinger.livedesign.biologics.registration.classify_nucleotide_polymer(helm_polymer) → schrodinger.livedesign.entity_type.EntityClass
- schrodinger.livedesign.biologics.registration.classify_antibody(prot_data) → schrodinger.livedesign.entity_type.EntityClass
- schrodinger.livedesign.biologics.registration.classify_protein(helm_polymer) → schrodinger.livedesign.entity_type.EntityClass
- schrodinger.livedesign.biologics.registration.classify_overall_molecule(polymer_chains: List[schrodinger.livedesign.biologics.registration.BioPolymer]) → schrodinger.livedesign.entity_type.EntityClass
Given a collection of polymers, return the corresponding EntityClass.
- Parameters
polymer_chains – the set of polymers to classify
- schrodinger.livedesign.biologics.registration.helm_mol_to_binary(helm_model: schrodinger.protein.helm._helm_parser.HelmModel) → str
Convert the given HELM model to an RDKit mol binary, adding sequence annotation data for any detected antibodies.
- Parameters
helm_model – input HELM model to convert to an RDKit mol binary
- Returns
the serialized binary, with sequence annotations embedded as serialized JSON in the “antibody_regions” property.