schrodinger.protein.process_residue_db module
This script processes a residue database to extract only amino acids compatible with mmshare workflows. This includes:
creating and assigning unique residue names to each residue
validating that the residue is an alpha amino acid
standardizing the structure
generating 3D coordinates
applying natural analog properties
validating that the residue is usable in mutation
Outputs are a processed residue database file containing only the residues that were processed successfully, plus CSV files that log the processing outcome for each successful and failed residue, including the failure reason for each failed residue.
- exception schrodinger.protein.process_residue_db.ValidationError
Bases: ValueError
- class schrodinger.protein.process_residue_db.PreprocessingOutcomes(good_res_filename='successful_residues.csv', bad_res_filename='processing_errors.csv')
Bases: object
A class to track the outcomes of a preprocessing pipeline for all processed residues, including any failed stages and their failure reasons. This can be used to generate logs and reports on the preprocessing outcomes.
- __init__(good_res_filename='successful_residues.csv', bad_res_filename='processing_errors.csv')
- record_success(res_st, preprocessor)
- record_failure(res_st, preprocessor)
- log_results()
Log the results of preprocessing, including any failed residues and their failure reasons, as well as summary statistics on the number of successes and failures.
- write_res_log_csvs()
Write residue processing logs to CSV files for both successful and failed residues.
- property titles_by_failure
Get a mapping from each failure reason to the titles of the residues that failed for that reason.
- property total_successes
- property total_failures
- property total_processed
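The bookkeeping that PreprocessingOutcomes describes can be sketched with the standard library alone. This is a hypothetical stand-in (`OutcomeTracker`), not the module's implementation. It records residue titles rather than structures and preprocessor objects. The CSV column layout in `write_csvs` is an assumption; only the default filenames come from the documented signature.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OutcomeTracker:
    """Hypothetical stand-in for PreprocessingOutcomes, keyed by residue title."""

    successes: list = field(default_factory=list)
    # Each failure is stored as (residue title, list of failure reasons)
    failures: list = field(default_factory=list)

    def record_success(self, title):
        self.successes.append(title)

    def record_failure(self, title, reasons):
        self.failures.append((title, list(reasons)))

    @property
    def total_successes(self):
        return len(self.successes)

    @property
    def total_failures(self):
        return len(self.failures)

    @property
    def total_processed(self):
        return self.total_successes + self.total_failures

    @property
    def titles_by_failure(self):
        # Group residue titles under every failure reason they hit
        grouped = defaultdict(list)
        for title, reasons in self.failures:
            for reason in reasons:
                grouped[reason].append(title)
        return dict(grouped)

    def write_csvs(self, good_path="successful_residues.csv",
                   bad_path="processing_errors.csv"):
        # Assumed layout: a title column for successes; title plus joined reasons for failures
        with open(good_path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["title"])
            writer.writerows([title] for title in self.successes)
        with open(bad_path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["title", "failure_reasons"])
            for title, reasons in self.failures:
                writer.writerow([title, "; ".join(reasons)])
```

Deriving the totals from the stored lists, rather than keeping separate counters, keeps total_processed consistent with the logged rows by construction.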
- class schrodinger.protein.process_residue_db.StageFailure(stage_name: 'str', failure_reason: 'str')
Bases: object
- stage_name: str
- failure_reason: str
- __init__(stage_name: str, failure_reason: str) → None
- class schrodinger.protein.process_residue_db.ResiduePreprocessor(res_db_idx: int, res_name: str, stages: list[PreprocessorStage], outcome_tracker: PreprocessingOutcomes = None)
Bases: object
A class representing a residue preprocessing pipeline: a series of preprocessing stages that are applied to a residue structure to prepare it for use in mmshare workflows.
- __init__(res_db_idx: int, res_name: str, stages: list[PreprocessorStage], outcome_tracker: PreprocessingOutcomes = None)
- run(res_st: Structure) → Structure
Run the preprocessing pipeline on a residue structure.
- Parameters:
res_st – a residue structure to preprocess
- Raises:
ValidationError – if any of the preprocessing stages raises a ValidationError
- static log_start_processing(res_db_idx, res_name, res_title)
Log the start time of processing a residue.
- property failure_reasons: list[str]
Get a list of failure reasons for any failed stages.
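The following is a minimal sketch of how a ResiduePreprocessor-style pipeline might run its stages. It assumes three things:

- stages run in list order;
- each stage returns the structure passed to the next stage;
- the first ValidationError is recorded as a StageFailure and re-raised.

`Pipeline`, `Stage`, the example stages, and the "stage: reason" format of `failure_reasons` are all hypothetical. Plain strings stand in for Structure objects, and the outcome tracker is omitted.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Stand-in for the module's ValidationError (documented as a ValueError subclass)."""


@dataclass
class StageFailure:
    stage_name: str
    failure_reason: str


class Stage:
    """Hypothetical base stage: subclasses set NAME and override run()."""

    NAME = "Base preprocessor stage"

    def run(self, res):
        return res


class Pipeline:
    """Hypothetical ResiduePreprocessor-like runner (no outcome tracker)."""

    def __init__(self, stages):
        self.stages = stages
        self.failed_stages = []

    def run(self, res):
        for stage in self.stages:
            try:
                # Each stage's output becomes the next stage's input
                res = stage.run(res)
            except ValidationError as exc:
                # Record which stage failed and why, then stop the pipeline
                self.failed_stages.append(StageFailure(stage.NAME, str(exc)))
                raise
        return res

    @property
    def failure_reasons(self):
        return [f"{f.stage_name}: {f.failure_reason}" for f in self.failed_stages]


# Example stages operating on plain strings instead of Structure objects
class Uppercase(Stage):
    NAME = "Uppercase"

    def run(self, res):
        return res.upper()


class RejectEmpty(Stage):
    NAME = "Non-empty check"

    def run(self, res):
        if not res:
            raise ValidationError("empty residue")
        return res
```

Re-raising after recording lets the caller decide whether a failed residue goes to the success log or the error log. The pipeline itself keeps the reason for reporting.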
- class schrodinger.protein.process_residue_db.PreprocessorStage(**kwargs)
Bases: object
A class representing a single residue preprocessing step, which takes a residue structure and performs some processing or validation on it.
- NAME = 'Base preprocessor stage'
- __init__(**kwargs)
- run(res_st: Structure) → Structure
Run the preprocessor function on a residue structure.
- Parameters:
res_st – a residue structure to preprocess
- Raises:
ValidationError – if the preprocessor function raises a ValidationError
- class schrodinger.protein.process_residue_db.UpdateTitle(**kwargs)
Bases: PreprocessorStage
Update the title of a residue structure to the value of the SDF ID property if the title is not already set.
- NAME = 'Title update'
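A sketch of a custom stage in the style of UpdateTitle, showing the "only fill in a missing title" rule. `FakeStructure`, `UpdateTitleSketch`, and the `SDF_ID_KEY` property key are hypothetical stand-ins. The actual property key the real stage reads is not documented here.

```python
from dataclasses import dataclass, field

# Hypothetical property key; the key the real stage reads is not documented here
SDF_ID_KEY = "sdf_id"


@dataclass
class FakeStructure:
    """Hypothetical stand-in for a Structure: just a title and a property dict."""

    title: str = ""
    property: dict = field(default_factory=dict)


class UpdateTitleSketch:
    NAME = "Title update"

    def run(self, res_st):
        # Fill in the title only when it is empty and an SDF ID is available;
        # an existing title is never overwritten
        if not res_st.title and SDF_ID_KEY in res_st.property:
            res_st.title = str(res_st.property[SDF_ID_KEY])
        return res_st
```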
- class schrodinger.protein.process_residue_db.ValidateChirality(**kwargs)
Bases: PreprocessorStage
Validates that the chirality property of a residue structure is allowed (for example, that it is not racemic).
- NAME = 'Chirality validation'
- class schrodinger.protein.process_residue_db.ValidateNoBuiltInNameCollision(**kwargs)
Bases: PreprocessorStage
Validates that a residue structure does not contain any built-in amino acids.
- NAME = 'Built-in amino acid check'
- class schrodinger.protein.process_residue_db.ValidateSubclass(allow_non_alpha_aa=False)
Bases: PreprocessorStage
Validates that the subclass property of a residue structure is allowed (i.e. alpha amino acid).
- NAME = 'Subclass validation'
- __init__(allow_non_alpha_aa=False)
- class schrodinger.protein.process_residue_db.StandardizeAAStructure(skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False)
Bases: PreprocessorStage
- NAME = 'Standardize amino acid structure'
- __init__(skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False)
- schrodinger.protein.process_residue_db.parse_args(args=None)
- schrodinger.protein.process_residue_db.main(options)
- schrodinger.protein.process_residue_db.add_logger_file_handler(filename=None)
Get a file handler for the logger.
- schrodinger.protein.process_residue_db.process_residue_db(residues: Iterable[Structure], skip_minimization: bool = False, allow_non_alpha_aa: bool = False, apply_natural_analogs: bool = True, use_iupac_chirality: bool = False, unique_names: Optional[Iterable[str]] = None) → PreprocessingOutcomes
Process a residue database to extract only the amino acids that are compatible with mmshare workflows.
- Parameters:
residues – iterable of residue structures
skip_minimization – if True, skip the minimization step
allow_non_alpha_aa – if True, allow non-alpha amino acids to be processed
apply_natural_analogs – if True, apply natural analog properties to the processed residues
use_iupac_chirality – if True, use IUPAC names to determine chirality when it is otherwise undefined. This is a last-resort option mainly intended for the Enamine database, which contains some residues with undefined stereochemistry but defined IUPAC names.
- Returns:
a PreprocessingOutcomes object
- schrodinger.protein.process_residue_db.get_built_in_aa_names() → set[str]
Get the names of all built-in standard and non-standard amino acids.
- Returns:
a set of built-in amino acid names
- schrodinger.protein.process_residue_db.get_reserved_aa_names() → set[str]
Get the amino acid names that are reserved, either for built-in amino acids or for custom amino acids from the residue sketcher in the nonstandard amino acid panel.
- Returns:
a set of reserved amino acid names
- schrodinger.protein.process_residue_db.get_unique_aa_names(reserved=None) → Iterable[str]
Generate unique amino acid names.
- Parameters:
reserved – a set of reserved amino acid names that should not be generated
- schrodinger.protein.process_residue_db.get_unique_aa_names_default() → Iterable[str]
Generate unique amino acid names, using the default set of reserved names.
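A sketch of a generator that skips reserved names, in the spirit of get_unique_aa_names. The naming scheme is an assumption: three-character codes built from uppercase letters and digits, in lexicographic order. Three characters is chosen to fit the PDB format's three-character residue name field. `unique_names` is a hypothetical helper, and the real function may generate names differently.

```python
import itertools
import string


def unique_names(reserved=None, length=3):
    """Lazily yield fixed-length alphanumeric codes that are not in `reserved`.

    Hypothetical scheme: uppercase letters then digits, in lexicographic order.
    """
    reserved = set(reserved or ())
    alphabet = string.ascii_uppercase + string.digits
    for chars in itertools.product(alphabet, repeat=length):
        name = "".join(chars)
        if name not in reserved:
            yield name


print(list(itertools.islice(unique_names({"AAA"}), 2)))  # ['AAB', 'AAC']
```

Because the generator is lazy, a caller can draw exactly one name per residue without materializing all 36³ candidates.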
- schrodinger.protein.process_residue_db.get_natural_analog_map(residue_sts: list[Structure]) → dict[str, tuple[str, float]]
Get a map of residue names to their natural analogs.
- Parameters:
residue_sts – a list of processed residues
- Returns:
a map of custom amino acid names to their natural analogs
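get_natural_analog_map returns a (name, float) pair for each custom residue. The sketch here assumes two things:

- the float is a fingerprint similarity score;
- the analog is the most similar natural residue.

Under those assumptions, the lookup could look like this. Plain Python sets stand in for Canvas fingerprints, the Tanimoto metric is an assumption, and `tanimoto` and `natural_analog_map` are hypothetical helpers.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def natural_analog_map(custom_fps, natural_fps):
    """Map each custom residue name to (most similar natural residue, similarity)."""
    result = {}
    for name, fp in custom_fps.items():
        best = max(natural_fps, key=lambda nat: tanimoto(fp, natural_fps[nat]))
        result[name] = (best, tanimoto(fp, natural_fps[best]))
    return result


# Toy bit sets: X01 shares 3 of 5 bits with ALA (0.6) and 4 of 5 with SER (0.8)
natural = {"ALA": {1, 2, 3}, "SER": {1, 2, 3, 4}}
print(natural_analog_map({"X01": {1, 2, 3, 4, 5}}, natural))  # {'X01': ('SER', 0.8)}
```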
- schrodinger.protein.process_residue_db.write_preprocessed_tmp_db(res_sts: list[Structure], filename: str) → None
Preprocess the standard residue database to remove capping groups and standardize the backbone for comparison with the custom residue database.
Residues in peptide.bld are stored as ACE-X-NMA tripeptides. Without this preprocessing, Canvas fingerprinting produces similarity scores below 1 even for residues with identical side chains.
- Parameters:
res_sts – a list of residue structures
filename – output file path
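The effect of the ACE/NMA caps on similarity can be shown with toy feature sets. The extra cap-derived features enlarge the union, so the Tanimoto score drops below 1 even though the side chain is identical. The feature names are illustrative, not real Canvas fingerprint bits, and Tanimoto is assumed as the metric.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    return len(a & b) / len(a | b)


side_chain = {"CB", "OG"}
backbone = {"N", "CA", "C", "O"}
caps = {"ACE_CH3", "NMA_CH3"}  # hypothetical features contributed by the ACE/NMA caps

capped_std = backbone | side_chain | caps  # residue as stored in the capped tripeptide
custom = backbone | side_chain             # same side chain, no caps

print(tanimoto(capped_std, custom))         # 0.75 (6 shared / 8 total)
print(tanimoto(capped_std - caps, custom))  # 1.0 once the caps are removed
```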
- schrodinger.protein.process_residue_db.apply_natural_analog_properties(processed_residues: list[Structure]) → None
Apply natural analog properties to processed residues.
- Parameters:
processed_residues – a list of processed residues
- schrodinger.protein.process_residue_db.get_standardized_aa_structure(st, skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False) → Structure
Standardize an amino acid structure by removing any solvent molecules, applying PDB atom names, and minimizing. Confirm that the structure is a valid alpha amino acid and not a built-in amino acid.
- Parameters:
st – a structure containing a single amino acid
skip_minimization – if True, skip the minimization step
allow_non_alpha_aa – if True, allow non-alpha amino acids to be processed
use_iupac_chirality – if True, use IUPAC names to determine chirality when it is otherwise undefined
- Raises:
ValidationError – if the structure is not a valid alpha amino acid for any reason
- schrodinger.protein.process_residue_db.process_one_aa(mol_st, skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False)
Process a structure containing one molecule and confirm that it is a valid amino acid.
- Parameters:
mol_st – a structure containing one molecule
skip_minimization – if True, skip the minimization step
allow_non_alpha_aa – if True, allow non-alpha amino acids to be processed
use_iupac_chirality – if True, use IUPAC names to determine chirality when it is otherwise undefined
- Raises:
ValidationError – if the molecule is not a valid amino acid for any reason
- schrodinger.protein.process_residue_db.generate_3d_structure(mol_st: Structure, use_iupac_chirality=False) → Structure
- schrodinger.protein.process_residue_db.iupac_to_smiles(iupac)
Convert an IUPAC name to a SMILES string using the NCI Cactus server.
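The NCI CACTUS Chemical Identifier Resolver accepts requests of the form https://cactus.nci.nih.gov/chemical/structure/<identifier>/smiles. The sketch below builds such a request with the standard library. It is not necessarily how iupac_to_smiles is implemented, and `cactus_smiles_url` and `iupac_to_smiles_sketch` are hypothetical helpers. The fetch needs network access. For a name the service cannot resolve, it typically responds with an HTTP error status, which urlopen raises as urllib.error.HTTPError.

```python
from urllib.parse import quote
from urllib.request import urlopen

CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cactus_smiles_url(iupac_name):
    # Percent-encode the whole name (safe="") so spaces, brackets and slashes
    # cannot break the URL path
    return f"{CACTUS_BASE}/{quote(iupac_name, safe='')}/smiles"


def iupac_to_smiles_sketch(iupac_name, timeout=10):
    """Query the resolver for a SMILES string (requires network access)."""
    with urlopen(cactus_smiles_url(iupac_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


print(cactus_smiles_url("2-aminopropanoic acid"))
# https://cactus.nci.nih.gov/chemical/structure/2-aminopropanoic%20acid/smiles
```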
- schrodinger.protein.process_residue_db.rebuild_res_from_smiles(mol_st: Structure, smiles_str: str) → Structure
Rebuild a structure from a SMILES string, preserving its properties. This is sometimes necessary for SDF structures with undefined stereochemistry.
- Parameters:
mol_st – original structure
smiles_str – SMILES string to rebuild from
- Returns:
new structure rebuilt from SMILES
- schrodinger.protein.process_residue_db.validate_mutability(mol_st: Structure) → Structure
Attempt to mutate the middle residue of a tripeptide to the residue in the provided structure. This is the final validity test: a residue is considered valid only if a mutation can be performed with it.
- Parameters:
mol_st – a structure containing a single molecule
- Returns:
the tripeptide structure with the middle residue mutated, for debugging purposes
- schrodinger.protein.process_residue_db.confirm_residue_unmodified(processed_st: Structure, original_st: Structure) → None
Confirm that a residue structure has not been modified during processing.
- Parameters:
processed_st – processed residue structure
original_st – original residue structure
- Raises:
ValidationError – if the residue structure has been modified
- schrodinger.protein.process_residue_db.check_enamine_subclass(st: Structure) → None
Raise a ValidationError if the structure has the enamine subclass property but is not listed as an alpha amino acid.
- Parameters:
st – a structure containing a single residue
- Raises:
ValidationError – if the subclass is not alpha amino acid