schrodinger.protein.process_residue_db module

This script processes a residue database to extract only amino acids compatible with mmshare workflows. This includes:

  • creating and assigning unique residue names to each residue

  • validating that the residue is an alpha amino acid

  • standardizing the structure

  • generating 3D coordinates

  • applying natural analog properties

  • validating that the residue is usable in mutation

Output includes a processed residue database file containing only successfully processed residues, as well as CSV files logging the outcomes of processing for both successful and failed residues, including failure reasons for any failed residues.
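The CSV logging described above can be illustrated with a minimal sketch. The column names, the "; " separator for multiple reasons, and the helper name write_failure_log are assumptions made for illustration; the actual layout of the files written by this module is not documented here.

```python
import csv
import io


def write_failure_log(rows, handle):
    """Write one CSV row per failed residue (hypothetical column layout)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["title", "failure_reasons"])
    for title, reasons in rows:
        # Join multiple reasons into a single cell; the separator is an assumption.
        writer.writerow([title, "; ".join(reasons)])


# An in-memory buffer stands in for a file such as processing_errors.csv.
buffer = io.StringIO()
write_failure_log(
    [("res_b", ["racemic"]), ("res_c", ["racemic", "not alpha"])],
    buffer,
)
log_text = buffer.getvalue()
```

Writing to a file-like handle rather than a path keeps the sketch easy to exercise without touching the filesystem.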

exception schrodinger.protein.process_residue_db.ValidationError

Bases: ValueError

class schrodinger.protein.process_residue_db.PreprocessingOutcomes(good_res_filename='successful_residues.csv', bad_res_filename='processing_errors.csv')

Bases: object

A class to track the outcomes of a preprocessing pipeline, including any failed stages and their reasons, for all residues processed. This can be used to generate logs and reports.

__init__(good_res_filename='successful_residues.csv', bad_res_filename='processing_errors.csv')
record_success(res_st, preprocessor)
record_failure(res_st, preprocessor)
log_results()

Log the results of preprocessing, including any failed residues and their failure reasons, as well as summary statistics on the number of successes and failures

write_res_log_csvs()

Write residue processing logs to CSV files for both successful and failed residues

property titles_by_failure

Get a mapping of failure reasons to lists of titles of residues that failed

property total_successes
property total_failures
property total_processed
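As a rough illustration of how such a tracker might group results, the sketch below inverts a title-to-reasons record into a reason-to-titles mapping and derives the three totals. OutcomeTrackerSketch is a hypothetical stand-in: the documented record_success and record_failure take a residue structure and a preprocessor, whereas this version takes plain titles and reason lists so that it runs without a Schrodinger installation.

```python
from collections import defaultdict


class OutcomeTrackerSketch:
    """Hypothetical stand-in for an outcome tracker; not the real class."""

    def __init__(self):
        self.successes = []  # titles of residues that passed every stage
        self.failures = {}   # title -> list of failure reasons

    def record_success(self, title):
        self.successes.append(title)

    def record_failure(self, title, reasons):
        self.failures[title] = list(reasons)

    @property
    def titles_by_failure(self):
        # Invert title -> reasons into reason -> titles.
        grouped = defaultdict(list)
        for title, reasons in self.failures.items():
            for reason in reasons:
                grouped[reason].append(title)
        return dict(grouped)

    @property
    def total_successes(self):
        return len(self.successes)

    @property
    def total_failures(self):
        return len(self.failures)

    @property
    def total_processed(self):
        return self.total_successes + self.total_failures


tracker = OutcomeTrackerSketch()
tracker.record_success("res_a")
tracker.record_failure("res_b", ["racemic"])
tracker.record_failure("res_c", ["racemic", "not alpha"])
```

Grouping by reason rather than by residue makes it easy to see which validation rejects the most entries in a database.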
class schrodinger.protein.process_residue_db.StageFailure(stage_name: str, failure_reason: str)

Bases: object

stage_name: str
failure_reason: str
__init__(stage_name: str, failure_reason: str) None
class schrodinger.protein.process_residue_db.ResiduePreprocessor(res_db_idx: int, res_name: str, stages: list[PreprocessorStage], outcome_tracker: PreprocessingOutcomes = None)

Bases: object

A class to represent a residue preprocessing pipeline, which consists of a series of preprocessing steps that are applied to a residue structure in order to prepare it for use in mmshare workflows.

__init__(res_db_idx: int, res_name: str, stages: list[PreprocessorStage], outcome_tracker: PreprocessingOutcomes = None)
static set_res_name(res_st: Structure, name: str)

Set the residue name for a residue structure

run(res_st: Structure) Structure

Run the preprocessing pipeline on a residue structure

Parameters:

res_st – a residue structure to preprocess

Raises:

ValidationError – if any of the preprocessing steps raise a ValidationError

static log_start_processing(res_db_idx, res_name, res_title)

Log the start time of processing a residue

property failure_reasons: list[str]

Get a list of failure reasons for any failed stages

class schrodinger.protein.process_residue_db.PreprocessorStage(**kwargs)

Bases: object

A class to represent a residue preprocessing step, which is a function that takes in a residue structure and performs some processing or validation on it.

NAME = 'Base preprocessor stage'
__init__(**kwargs)
run(res_st: Structure) Structure

Run the preprocessor function on a residue structure

Parameters:

res_st – a residue structure to preprocess

Raises:

ValidationError – if the preprocessor function raises a ValidationError
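The stage-and-pipeline pattern described by these classes can be sketched as follows. Everything here is a local stand-in: residues are plain dicts instead of Structure objects, the Stage, RequireTitle, UppercaseName and Pipeline classes are hypothetical, and the choice to record a StageFailure before re-raising is an assumption about how failure reasons might be collected.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Local stand-in mirroring the documented exception's base class."""


@dataclass
class StageFailure:
    stage_name: str
    failure_reason: str


class Stage:
    NAME = "Base stage"

    def run(self, residue):
        return residue


class RequireTitle(Stage):
    NAME = "Title check"

    def run(self, residue):
        if not residue.get("title"):
            raise ValidationError("missing title")
        return residue


class UppercaseName(Stage):
    NAME = "Name normalization"

    def run(self, residue):
        # Return a modified copy rather than mutating the input.
        return {**residue, "name": residue["name"].upper()}


class Pipeline:
    def __init__(self, stages):
        self.stages = stages
        self.failures = []

    def run(self, residue):
        for stage in self.stages:
            try:
                residue = stage.run(residue)
            except ValidationError as err:
                # Remember which stage failed and why, then propagate.
                self.failures.append(StageFailure(stage.NAME, str(err)))
                raise
        return residue

    @property
    def failure_reasons(self):
        return [failure.failure_reason for failure in self.failures]


ok = Pipeline([RequireTitle(), UppercaseName()]).run(
    {"title": "res_a", "name": "x01"}
)

bad = Pipeline([RequireTitle(), UppercaseName()])
try:
    bad.run({"title": "", "name": "x02"})
except ValidationError:
    pass
```

Each stage returns the (possibly modified) residue, so stages can be reordered or swapped without changing the loop that drives them.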

class schrodinger.protein.process_residue_db.UpdateTitle(**kwargs)

Bases: PreprocessorStage

Update the title of a residue structure to the value of the SDF ID property if it is not already set

NAME = 'Title update'
class schrodinger.protein.process_residue_db.ValidateChirality(**kwargs)

Bases: PreprocessorStage

Validates that the chirality property of a residue structure is allowed (e.g. not racemic)

NAME = 'Chirality validation'
class schrodinger.protein.process_residue_db.ValidateNoBuiltInNameCollision(**kwargs)

Bases: PreprocessorStage

Validates that a residue structure does not contain any built-in amino acids

NAME = 'Built-in amino acid check'
class schrodinger.protein.process_residue_db.ValidateSubclass(allow_non_alpha_aa=False)

Bases: PreprocessorStage

Validates that the subclass property of a residue structure is allowed (i.e. alpha amino acid)

NAME = 'Subclass validation'
__init__(allow_non_alpha_aa=False)
class schrodinger.protein.process_residue_db.StandardizeAAStructure(skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False)

Bases: PreprocessorStage

NAME = 'Standardize amino acid structure'
__init__(skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False)
schrodinger.protein.process_residue_db.parse_args(args=None)
schrodinger.protein.process_residue_db.main(options)
schrodinger.protein.process_residue_db.add_logger_file_handler(filename=None)

Add a file handler to the logger

schrodinger.protein.process_residue_db.process_residue_db(residues: Iterable[Structure], skip_minimization: bool = False, allow_non_alpha_aa: bool = False, apply_natural_analogs: bool = True, use_iupac_chirality: bool = False, unique_names: Optional[Iterable[str]] = None) PreprocessingOutcomes

Process residue database to extract only the amino acids that are compatible with mmshare workflows

Parameters:
  • residues – iterable of residue structures

  • skip_minimization – if True, skip minimization step

  • allow_non_alpha_aa – if True, allow non-alpha amino acids to be processed

  • use_iupac_chirality – if True, use IUPAC names to determine chirality when the chirality is otherwise undefined. This is a last resort option mainly intended for the enamine database, which contains some residues with undefined stereochemistry but defined IUPAC names.

Returns:

a PreprocessingOutcomes object

schrodinger.protein.process_residue_db.get_built_in_aa_names() set[str]

Get the names of all built-in standard and non-standard amino acids

Returns:

a set of built-in amino acid names

schrodinger.protein.process_residue_db.get_reserved_aa_names() set[str]

Get amino acid names that are reserved for built-in amino acids or custom amino acids from the nonstandard amino acid panel residue sketcher

Returns:

a set of reserved amino acid names

schrodinger.protein.process_residue_db.get_unique_aa_names(reserved=None) Iterable[str]

Generate unique amino acid names

Parameters:

reserved – a set of reserved amino acid names that should not be generated

schrodinger.protein.process_residue_db.get_unique_aa_names_default() Iterable[str]

Generate unique amino acid names, using the default set of reserved names
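One way a lazy name generator of this kind could work is sketched below. The three-character length, the uppercase-plus-digits alphabet, and the function name unique_names_sketch are assumptions for illustration; the naming scheme actually used by get_unique_aa_names is not specified here.

```python
import itertools
import string
from typing import Iterable, Iterator, Optional


def unique_names_sketch(reserved: Optional[Iterable[str]] = None) -> Iterator[str]:
    """Lazily yield three-character names that are not in `reserved`."""
    taken = {name.upper() for name in (reserved or ())}
    alphabet = string.ascii_uppercase + string.digits
    for chars in itertools.product(alphabet, repeat=3):
        name = "".join(chars)
        if name not in taken:
            yield name


# Take only as many names as needed; the generator is never fully expanded.
names = unique_names_sketch(reserved={"AAA", "AAC"})
first_three = list(itertools.islice(names, 3))
```

Because the result is an iterator, a caller can draw one name per residue with next() and never produce the same name twice.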

schrodinger.protein.process_residue_db.get_natural_analog_map(residue_sts: list[Structure]) dict[str, tuple[str, float]]

Get a map of residue names to their natural analogs

Parameters:

residue_sts – a list of processed residues

Returns:

a map of custom amino acid names to their natural analogs
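The return type dict[str, tuple[str, float]] suggests that each custom residue name maps to a natural residue name paired with a number, plausibly a similarity score. The sketch below illustrates that reading under stated assumptions: fingerprints are plain sets of feature labels, similarity is a Tanimoto coefficient, and natural_analog_map_sketch is a hypothetical helper rather than the module's implementation.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def natural_analog_map_sketch(custom: dict, natural: dict) -> dict:
    """Map each custom name to (best natural name, similarity score)."""
    result = {}
    for name, fingerprint in custom.items():
        best = max(natural, key=lambda nat: tanimoto(fingerprint, natural[nat]))
        result[name] = (best, tanimoto(fingerprint, natural[best]))
    return result


# Toy fingerprints: sets of side-chain atom labels, purely illustrative.
natural = {"ALA": frozenset({"CB"}), "SER": frozenset({"CB", "OG"})}
custom = {"X01": frozenset({"CB", "OG", "CD"})}
analogs = natural_analog_map_sketch(custom, natural)
```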

schrodinger.protein.process_residue_db.write_preprocessed_tmp_db(res_sts: list[Structure], filename: str) None

Preprocess standard database to remove capping groups and standardize the backbone for comparison with custom residue database

Residues in peptide.bld are stored as ACE-X-NMA tripeptides. This causes the canvas fingerprinting to produce scores of <1 even for residues with identical side chains, unless the residues are preprocessed accordingly

Parameters:
  • res_sts – a list of residue structures

  • filename – output file path

schrodinger.protein.process_residue_db.apply_natural_analog_properties(processed_residues: list[Structure]) None

Apply natural analog properties to processed residues

Parameters:

processed_residues – a list of processed residues

schrodinger.protein.process_residue_db.get_standardized_aa_structure(st, skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False) Structure

Standardize an amino acid structure by removing any solvent molecules, applying pdb atom names, and minimizing. Confirm the structure is a valid alpha amino acid and not a built-in amino acid.

Parameters:
  • st – a structure containing a single amino acid

  • skip_minimization – if True, skip minimization step

  • allow_non_alpha_aa – if True, allow non-alpha amino acids to be processed

Raises:

ValidationError – if the structure is not a valid alpha amino acid for any reason

schrodinger.protein.process_residue_db.process_one_aa(mol_st, skip_minimization=False, allow_non_alpha_aa=False, use_iupac_chirality=False)

Process a structure containing one molecule and confirm that it is a valid amino acid

Parameters:
  • mol_st – a structure containing one molecule

  • skip_minimization – if True, skip minimization step

  • allow_non_alpha_aa – if True, allow non-alpha amino acids to be processed

Raises:

ValidationError – if the molecule is not a valid amino acid for any reason

schrodinger.protein.process_residue_db.generate_3d_structure(mol_st: Structure, use_iupac_chirality=False) Structure
schrodinger.protein.process_residue_db.iupac_to_smiles(iupac)

Convert an IUPAC name to a SMILES string using the NCI Cactus server
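A name-to-SMILES lookup against the NCI Cactus Chemical Identifier Resolver can be sketched with the standard library alone. The URL pattern below is the resolver's public one; the helper names, the timeout, and the use of urllib are assumptions, and the documented iupac_to_smiles may be implemented differently.

```python
from urllib.parse import quote
from urllib.request import urlopen

CACTUS_URL = "https://cactus.nci.nih.gov/chemical/structure/{name}/smiles"


def cactus_smiles_url(iupac_name: str) -> str:
    """Build the resolver URL, percent-encoding the name for use in a path."""
    return CACTUS_URL.format(name=quote(iupac_name, safe=""))


def iupac_to_smiles_sketch(iupac_name: str, timeout: float = 10.0) -> str:
    """Fetch the SMILES string for a name; requires network access."""
    with urlopen(cactus_smiles_url(iupac_name), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


url = cactus_smiles_url("2-aminopropanoic acid")
```

Because the lookup depends on an external server, callers would likely want to handle urllib.error.URLError and HTTP errors for names the resolver does not recognize.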

schrodinger.protein.process_residue_db.rebuild_res_from_smiles(mol_st: Structure, smiles_str: str) Structure

Rebuild a structure from a SMILES string, preserving properties. This is sometimes necessary for sdf structures with undefined stereochemistry.

Parameters:
  • mol_st – original structure

  • smiles_str – SMILES string to rebuild from

Returns:

new structure rebuilt from SMILES

schrodinger.protein.process_residue_db.validate_mutability(mol_st: Structure) Structure

Attempt to mutate the middle residue of a tripeptide to the residue in the provided structure. The final test to see if a residue is valid is to confirm that we can perform a mutation with it

Parameters:

mol_st – a structure containing a single molecule

Returns:

the tripeptide structure with the middle residue mutated, for debugging purposes

schrodinger.protein.process_residue_db.confirm_residue_unmodified(processed_st: Structure, original_st: Structure) None

Confirm that a residue structure has not been modified during processing

Parameters:
  • processed_st – processed residue structure

  • original_st – original residue structure

Raises:

ValidationError – if the residue structure has been modified

schrodinger.protein.process_residue_db.check_enamine_subclass(st: Structure) None

Raise an error if the structure has the enamine subclass property, but is not listed as an alpha amino acid

Parameters:

st – a structure containing a single residue

Raises:

ValidationError – if the subclass is not alpha amino acid

schrodinger.protein.process_residue_db.write_residues_to_file(processed_residues: Iterable[Structure], output_filename: str) None

Write processed residues to a file

Parameters:
  • processed_residues – an iterable of processed residues

  • output_filename – path to output file