schrodinger.livedesign.biologics.sequence module
- class schrodinger.livedesign.biologics.sequence.AlignedSequence(sequence: str, identity: float = None, similarity: float = None)
Bases: object
- sequence: str
- identity: float = None
- similarity: float = None
- fromProteinSequence(ref_seq: Optional[schrodinger.protein.sequence.ProteinSequence] = None)
- __init__(sequence: str, identity: Optional[float] = None, similarity: Optional[float] = None) → None
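The signature and field list above are consistent with a plain dataclass. For illustration only, the hypothetical `AlignedSequenceSketch` below mirrors the documented fields. It is not the real class and it omits `fromProteinSequence`. Treating `sequence` as one gapped alignment row is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignedSequenceSketch:
    """Hypothetical stand-in mirroring AlignedSequence's documented fields."""

    sequence: str                       # assumed: one aligned (possibly gapped) row
    identity: Optional[float] = None    # stays None when no value is supplied
    similarity: Optional[float] = None


unscored = AlignedSequenceSketch("EVQLVESGG")
scored = AlignedSequenceSketch("EVQL-ESGG", identity=1.0)  # arbitrary example value
```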
- schrodinger.livedesign.biologics.sequence.subsequence_matches(match_mol: rdkit.Chem.rdchem.Mol, query_mol: rdkit.Chem.rdchem.Mol) → Iterator[rdkit.Chem.rdchem.Mol]
Return matches of query_mol in match_mol using RDKit’s topology-based substructure searching. This is possible because we mark monomers with unique isotopes to differentiate monomer types, so a topological match also requires the monomer types to agree.
- Parameters
match_mol – molecule to search for matches
query_mol – molecule to find matches of
- Returns
substructure matches of query_mol found in match_mol
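RDKit’s substructure search matches graph topology. Without extra labels, every monomer node would look alike. The sketch below is a simplified pure-Python analog that handles linear chains only. A hypothetical `MONOMER_LABELS` table plays the role of the unique isotopes, and a window counts as a match only when both adjacency and labels agree. It yields monomer index lists rather than RDKit `Mol` objects, and it is not the module’s implementation.

```python
from typing import Iterator, List, Sequence

# Hypothetical monomer-type -> label table, standing in for the unique isotopes.
MONOMER_LABELS = {"A": 1, "C": 2, "G": 3, "K": 4}


def label_chain(monomers: Sequence[str]) -> List[int]:
    """Encode each monomer by its type label (the role the isotope plays)."""
    return [MONOMER_LABELS[m] for m in monomers]


def linear_subsequence_matches(
    match: Sequence[str], query: Sequence[str]
) -> Iterator[List[int]]:
    """Yield the monomer indices in `match` of every occurrence of `query`.

    In a linear chain, connectivity is simply consecutive positions, so a
    match needs consecutive monomers whose labels equal the query's labels.
    """
    match_labels = label_chain(match)
    query_labels = label_chain(query)
    n = len(query_labels)
    for start in range(len(match_labels) - n + 1):
        if match_labels[start:start + n] == query_labels:
            yield list(range(start, start + n))
```

For example, `list(linear_subsequence_matches("GACAC", "AC"))` finds the two occurrences at indices `[1, 2]` and `[3, 4]`.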
- schrodinger.livedesign.biologics.sequence.get_sequence_viewer_data(mol: rdkit.Chem.rdchem.Mol, scheme: schrodinger.infra.util.AntibodyCDRScheme = AntibodyCDRScheme.Kabat)
- Parameters
mol – RDKit molecule to extract sequence data from
scheme – antibody CDR scheme used to assign regions (defaults to AntibodyCDRScheme.Kabat)
- Returns
a map from polymer id to a dictionary mapping antibody regions to monomer indices in the corresponding simple polymer
- Raises
RuntimeError – if the molecule contains nonlinear peptides
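The return value is a nested map. The sketch below shows one way a caller might consume it. It uses made-up data in the shape that get_annotations_for_helm_model declares: a polymer id, then a region name, then a value that is either a `(start, end)` pair or a list of strings. The 0-based, end-inclusive range convention, the `sequences` lookup table, and the `region_residues` helper are all assumptions for illustration.

```python
from typing import Dict, List, Tuple, Union

RegionValue = Union[Tuple[int, int], List[str]]

# Made-up data: polymer id -> region name -> value.
# Assumed convention: (start, end) is 0-based and end-inclusive.
viewer_data: Dict[str, Dict[str, RegionValue]] = {
    "PEPTIDE1": {"HFR1": (0, 4), "H1": (5, 9)},
}
# Hypothetical lookup of each polymer's one-letter sequence.
sequences: Dict[str, str] = {"PEPTIDE1": "EVQLVGFTFS"}


def region_residues(polymer_id: str, region: str) -> str:
    """Return the residues covered by one (start, end) region."""
    value = viewer_data[polymer_id][region]
    if not isinstance(value, tuple):
        raise TypeError(f"{region!r} is not a (start, end) range")
    start, end = value
    return sequences[polymer_id][start:end + 1]
```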
- schrodinger.livedesign.biologics.sequence.get_annotations_for_helm_model(model: schrodinger.protein.helm._helm_parser.HelmModel, scheme: schrodinger.infra.util.AntibodyCDRScheme) → Dict[str, Dict[str, Union[Tuple[int, int], List[str]]]]
HelmModels reorder polymer chains to canonicalize their input. As a result, the same polymer can have different polymer ids in two models if those models contain different peptide polymers. This function walks back through a HELM model and computes the mapping between each antibody chain and its region annotations.
- Parameters
model – HelmModel to extract annotations for
scheme – antibody CDR scheme used to assign regions
- Returns
a map from polymer id to a dictionary mapping antibody regions to monomer indices in the corresponding simple polymer.
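The reason for recomputing annotations per model can be sketched with plain dictionaries. Here a model is reduced to a hypothetical `{polymer id: chain sequence}` map. Annotations are derived from each chain's sequence and then keyed by whatever id that chain carries in the given model. This keeps a chain correctly annotated even when canonical ordering gives it a different id. The `toy_annotate` function is a placeholder, not an antibody annotator.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def annotations_by_polymer_id(
    model: Dict[str, str], annotate: Callable[[str], T]
) -> Dict[str, T]:
    """Annotate each chain from its sequence, then key the result by this model's id.

    `model` is a hypothetical {polymer id: chain sequence} map standing in for a HelmModel.
    """
    return {polymer_id: annotate(seq) for polymer_id, seq in model.items()}


def toy_annotate(seq: str) -> str:
    # Placeholder classifier, not a real antibody annotator.
    return "heavy" if seq.startswith("EVQ") else "light"


heavy, light = "EVQLVESGG", "DIQMTQSPS"
model_a = {"PEPTIDE1": heavy, "PEPTIDE2": light}
model_b = {"PEPTIDE1": light, "PEPTIDE2": heavy}  # same chains, ids swapped
```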
- schrodinger.livedesign.biologics.sequence.get_sequence_filter_chain_name(entity_class: schrodinger.livedesign.entity_type.EntityClass) → str
Simplifies the chain names shown in the sequence viewer's filter combo box.
- Parameters
entity_class – the entity class of the given polymer chain
- Returns
chain name to label the given entity’s sequence viewer data
- schrodinger.livedesign.biologics.sequence.get_polymer_annotations(polymer: schrodinger.protein.helm._helm_parser.HelmPolymer, scheme: schrodinger.infra.util.AntibodyCDRScheme) → Tuple[str, Dict[str, Union[Tuple[int, int], List[str]]]]
Returns the chain ID and sequence annotations for a HelmPolymer.
- schrodinger.livedesign.biologics.sequence.get_monomer_data(polymer: HelmPolymer) → dict[str, dict[str, Any]]
Returns monomer information for each monomer in the polymer, as a dictionary whose string keys map to per-monomer data dictionaries.
- schrodinger.livedesign.biologics.sequence.get_ab_annotations(fasta_sequence: str, scheme: schrodinger.infra.util.AntibodyCDRScheme) → Dict[str, Union[Tuple[int, int], List[str]]]
Lightweight caching wrapper around antibody.SeqType. It reduces the cost of calling get_annotations for each RegistrationData object.
- schrodinger.livedesign.biologics.sequence.split_by_hierarchy(region_dict: Dict[str, List[int]]) → Dict[str, Dict[str, List[int]]]
Splits a region dictionary into a dictionary of antibody domain boundaries (e.g., VH, CH1) and a dictionary of subdomain boundaries (e.g., HFR1, H1).
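One plausible way to perform such a split is to classify region names against a set of domain names. The `DOMAIN_NAMES` set, the `"domains"`/`"subdomains"` output keys, and the `[start, end]` bound lists are assumptions. This page does not document the real function's classification rules or output keys.

```python
from typing import Dict, List

# Hypothetical classification: names listed here are treated as domains,
# and every other region name is treated as a subdomain.
DOMAIN_NAMES = {"VH", "CH1", "CH2", "CH3", "VL", "CL"}


def split_regions(region_dict: Dict[str, List[int]]) -> Dict[str, Dict[str, List[int]]]:
    """Separate domain-level boundaries from subdomain-level boundaries."""
    domains: Dict[str, List[int]] = {}
    subdomains: Dict[str, List[int]] = {}
    for name, bounds in region_dict.items():
        target = domains if name in DOMAIN_NAMES else subdomains
        target[name] = bounds
    return {"domains": domains, "subdomains": subdomains}
```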
- schrodinger.livedesign.biologics.sequence.get_arm_indices(model: schrodinger.protein.helm._helm_parser.HelmModel) → Dict[str, int]
Returns a mapping from polymer id to arm pair index. If no arm pairing is provided, assigns a unique arm pair to each polymer id.
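The documented fallback (one unique arm per polymer when no pairing is given) can be sketched as follows. The `pairing` argument and the choice to number arms from 0 are hypothetical.

```python
from typing import Dict, Iterable, Optional


def arm_indices(
    polymer_ids: Iterable[str], pairing: Optional[Dict[str, int]] = None
) -> Dict[str, int]:
    """Map polymer ids to arm indices.

    `pairing` is a hypothetical explicit assignment. When it is absent or
    empty, each polymer gets its own arm index.
    """
    ids = list(polymer_ids)
    if pairing:
        return {pid: pairing[pid] for pid in ids}
    return {pid: i for i, pid in enumerate(ids)}
```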
- schrodinger.livedesign.biologics.sequence.align_sequences(sequences: List[str], ref_seq_index: Optional[int] = None) → List[schrodinger.livedesign.biologics.sequence.AlignedSequence]
Aligns the given sequences and returns them as AlignedSequence objects.
- Parameters
sequences – sequences to align
ref_seq_index – if not None, every sequence is pairwise aligned against the sequence at ref_seq_index, which serves as the reference
- Returns
list of AlignedSequence objects holding the aligned sequences
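The `identity` field of `AlignedSequence` suggests that each sequence is compared against a reference. The function below computes percent identity between two rows of one alignment. It uses one common convention: matching residues divided by the columns where neither row has a gap. The module may use a different convention, or may report a fraction rather than a percentage.

```python
def percent_identity(aligned_ref: str, aligned_seq: str, gap: str = "-") -> float:
    """Percent identity between two equal-length rows of the same alignment.

    Assumed convention: identical residues / columns where neither row is a gap.
    """
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("aligned rows must have equal length")
    columns = [(a, b) for a, b in zip(aligned_ref, aligned_seq) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```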
- schrodinger.livedesign.biologics.sequence.align_all_to_reference(aln: schrodinger.protein.alignment.ProteinAlignment, ref_seq_index: int) → None
Aligns a given ProteinAlignment pairwise against the specified reference sequence. Because of how alignments are implemented (see protein.alignment.BaseAlignment), the reference sequence must already be in the alignment. The input ProteinAlignment is modified in place and not returned.
- Parameters
aln – the alignment to be aligned
ref_seq_index – index of the reference ProteinSequence within aln, i.e. the value aln.index(ref_seq) returns for that sequence
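Pairwise alignment against a reference can be illustrated with a textbook Needleman-Wunsch global alignment that uses a linear gap penalty and simple match/mismatch scores. This technique is swapped in for illustration only, because this page does not document the module's scoring (for example, substitution matrices or affine gaps). Unlike `align_all_to_reference`, the sketch returns independent pairwise row pairs instead of merging them into one alignment.

```python
from typing import List, Tuple


def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> Tuple[str, str]:
    """Global alignment of a and b with a linear gap penalty; returns two gapped rows."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    row_a: List[str] = []
    row_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def align_to_reference(seqs: List[str], ref_index: int) -> List[Tuple[str, str]]:
    """Pairwise-align every sequence (the reference included) to seqs[ref_index]."""
    ref = seqs[ref_index]
    return [needleman_wunsch(ref, s) for s in seqs]
```

For example, `needleman_wunsch("ACGT", "ACT")` places a single gap opposite the `G`. Each pair carries its own gap pattern for the reference, which is why merging pairs into one shared alignment is a separate step.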
- schrodinger.livedesign.biologics.sequence.multiple_align(aln: schrodinger.protein.alignment.ProteinAlignment) → None
Aligns a given ProteinAlignment via multiple sequence alignment. The input ProteinAlignment is modified and not returned.
- Parameters
aln – the alignment to be aligned