schrodinger.livedesign.preprocessor module

Preprocess a molecule for ingestion into LiveDesign.

Preprocessing adjusts the input molecule to a standard form so that LiveDesign behaves consistently. This adjusts both the depiction (e.g. 2D layout) and the data associated with the molecule (properties, stereochemistry).

Customers often think of this step in terms of “deduplication”: if I have two molecules, when should they be considered “the same”? The answer is collected in schrodinger.livedesign.registration.RegistrationOptions, but much of it is implemented here.

Different customer sites have different business logic, so most of this module exists to support that diversity of configurations.

We should encourage customers to stay as close to the default configuration as possible. The default is the most validated, and it generates the most consistent behavior for internal and external tools linked to LiveDesign.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.livedesign.preprocessor.ExplicitHydrogens

Bases: enum.Enum

REMOVE_ALL = 1

Remove all hydrogens. Default; recommended for most users

KEEP_WEDGED = 2

Keep only hydrogens that have wedge/dash bonds

ADD_ALL = 3

Add hydrogens to all atoms

AS_IS = 4

Preserve input hydrogens. Does not add or remove hydrogens

ON_HETERO_AND_KEEP_WEDGED = 5

Add hydrogens to heteroatoms and keep hydrogens that have wedge/dash bonds

class schrodinger.livedesign.preprocessor.GenerateCoordinates

Bases: enum.Enum

NONE = 1

Use input coordinates

FULL = 2

Generate totally new 2D coordinates/layout

FULL_ALIGNED = 3

Generate totally new 2D coordinates/layout, but align with input coordinates. Default; recommended for most users

class schrodinger.livedesign.preprocessor.ChiralFlag0Meaning

Bases: enum.Enum

SDF files have a “chiral flag” that can be 0 or 1. According to BIOVIA’s specification, chiral flag 0 means that the molecule refers to both itself and its enantiomer.

However, different companies have used the chiral flag to mean a variety of things (in part because the documentation of the file format postdates its actual use…)

UNGROUPED_ARE_ABSOLUTE = 1

Ignore the Chiral Flag, treating all chiral centers as absolute. Default; recommended for most users

UNGROUPED_ARE_RACEMIC = 2

Chiral flag 0 molecules are turned into AND (aka racemic) groups. This is the BIOVIA standard

UNGROUPED_ARE_RELATIVE = 3

Chiral flag 0 molecules are turned into OR (aka unknown) groups
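The three options can be read as a small decision function. The sketch below uses a hypothetical helper, ungrouped_stereo_group, which is not part of this module. It returns the enhanced-stereo group that ungrouped chiral centers would end up in. It also assumes that a chiral flag of 1 always means absolute stereochemistry. The descriptions above imply this but do not state it for every option.

```python
from enum import Enum


class ChiralFlag0Meaning(Enum):
    """Local copy of the documented enum so this sketch runs standalone."""
    UNGROUPED_ARE_ABSOLUTE = 1
    UNGROUPED_ARE_RACEMIC = 2
    UNGROUPED_ARE_RELATIVE = 3


def ungrouped_stereo_group(chiral_flag: int, meaning: ChiralFlag0Meaning) -> str:
    """Hypothetical helper: return "ABS", "AND" or "OR" for chiral centers
    that are not already in an explicit stereo group.

    Assumes a chiral flag of 1 always means absolute stereochemistry.
    """
    if chiral_flag == 1 or meaning is ChiralFlag0Meaning.UNGROUPED_ARE_ABSOLUTE:
        return "ABS"  # flag ignored (the default) or molecule marked absolute
    if meaning is ChiralFlag0Meaning.UNGROUPED_ARE_RACEMIC:
        return "AND"  # BIOVIA reading: the molecule and its enantiomer
    return "OR"  # UNGROUPED_ARE_RELATIVE: one of the two, unknown which


for meaning in ChiralFlag0Meaning:
    print(meaning.name, ungrouped_stereo_group(0, meaning))
```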

class schrodinger.livedesign.preprocessor.PreprocessorOptions

Bases: NamedTuple

Configuration options for the preprocessor. These options are typically stored as a JSON dict on a LiveDesign instance. Any setting omitted from the dict takes its default value.

MAX_NUM_ATOMS: Optional[int]

Reject compounds with more than MAX_NUM_ATOMS atoms. Default: None (no limit)

KEEP_ONLY_LARGEST_STRUCTURE: bool

Should additional discontiguous structures be removed? Default: True

STRIP_SALTS: Optional[Tuple[str]]

Which salt molecules to avoid registering. Run after removal of small molecules. Default: None (no salts are removed)

NEUTRALIZE: bool

Should the input be neutralized? Default: True

TRANSFORMATIONS: Optional[Tuple[str]]

Standardize the representation of certain functional groups. Default transformations are:

[Figure: default transformations (transformations.png)]

CHOOSE_CANONICAL_TAUTOMER: bool

Canonicalize the tautomeric representation. Slow, but recommended if the TAUTOMER_INSENSITIVE_LAYERS hash scheme is chosen. Default: False

RESOLVE_AMBIGUOUS_TAUTOMERS: bool

If the input includes aromatic or conjugated bonds, should we guess which tautomer was intended? If False, these compounds will fail with a kekulization error, allowing the user to correct the input.

Default: False unless CHOOSE_CANONICAL_TAUTOMER=True

CHIRAL_FLAG_0_MEANING: schrodinger.livedesign.preprocessor.ChiralFlag0Meaning

What does a chiral flag of 0 mean? Default: UNGROUPED_ARE_ABSOLUTE

STRIP_AND_GROUPS_ON_SINGLE_ATOM: bool

If an AND group includes only one atom, remove the stereo annotation on that atom. Default: False

PRESERVE_ENHANCED_STEREO_GROUP_IDS: bool

Preserve the group IDs of enhanced stereo groups. Default: False

REMOVE_PROPERTIES: bool

Should properties be removed? Properties don’t affect deduplication, but the properties and SGroup data from the first registered instance of a compound are stored in LiveDesign. Default: False

GENERATE_COORDINATES: schrodinger.livedesign.preprocessor.GenerateCoordinates

Should 2D coordinates be generated? Default: FULL_ALIGNED

EXPLICIT_HYDROGENS: schrodinger.livedesign.preprocessor.ExplicitHydrogens

How should hydrogens be treated? Default: REMOVE_ALL

CLEAR_INVALID_WEDGE_BONDS: bool

Should wedged bonds on achiral atoms be removed? Default: True

WEDGE_TWO_BONDS_AROUND_CHIRAL_ATOMS: bool

Should we draw two wedged bonds on chiral atoms with 4 neighbors? Default: False

HEAVY_HYDROGEN_DT: bool

Should deuterium and tritium be written with the symbols D and T? Default: False

static fromConfig(config: dict)

Create a PreprocessorOptions object from a configuration dictionary in the format stored in LiveDesign.

Also rejects or updates deprecated options.

Parameters

config – configuration from which to build options

Raises
  • KeyError – if an unknown key is present

  • ValueError – if an unknown value is present

toConfig() → dict

Write a LiveDesign configuration dictionary from an options object
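Here is a rough illustration of the fromConfig/toConfig contract described above. Omitted keys take their defaults, unknown keys raise KeyError, and unknown values raise ValueError. The sketch uses a cut-down stand-in with only three of the options. MiniOptions is a hypothetical class, not the module's PreprocessorOptions. The sketch also assumes that enum-valued options are stored in the configuration dict by member name; the actual storage format in LiveDesign may differ.

```python
from enum import Enum
from typing import NamedTuple, Optional


class ExplicitHydrogens(Enum):
    """Local copy of the documented enum so the sketch runs standalone."""
    REMOVE_ALL = 1
    KEEP_WEDGED = 2
    ADD_ALL = 3
    AS_IS = 4
    ON_HETERO_AND_KEEP_WEDGED = 5


class MiniOptions(NamedTuple):
    """Hypothetical three-field stand-in for PreprocessorOptions."""
    MAX_NUM_ATOMS: Optional[int] = None
    NEUTRALIZE: bool = True
    EXPLICIT_HYDROGENS: ExplicitHydrogens = ExplicitHydrogens.REMOVE_ALL

    @staticmethod
    def fromConfig(config: dict) -> "MiniOptions":
        kwargs = {}
        for key, value in config.items():
            if key not in MiniOptions._fields:
                raise KeyError(f"unknown preprocessor option: {key}")
            default = MiniOptions._field_defaults[key]
            if isinstance(default, Enum):
                # Assumption: enum-valued options are stored by member name.
                try:
                    value = type(default)[value]
                except KeyError:
                    raise ValueError(f"unknown value {value!r} for {key}") from None
            kwargs[key] = value
        # Keys missing from config fall back to the field defaults.
        return MiniOptions(**kwargs)

    def toConfig(self) -> dict:
        # Enums are written back by name so the dict stays JSON-friendly.
        return {key: value.name if isinstance(value, Enum) else value
                for key, value in self._asdict().items()}


opts = MiniOptions.fromConfig({"EXPLICIT_HYDROGENS": "ADD_ALL"})
print(opts.toConfig())
```

A NamedTuple keeps the options immutable and hashable. That suits a configuration object that is built once and then passed down through every preprocessing step.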

schrodinger.livedesign.preprocessor.initialize_audit_log(verbose: bool)

Initialize the global audit log (audit_changes_log) that audit_changes uses to record changes to the molecule.

schrodinger.livedesign.preprocessor.remove_invalid_config_options(config: dict) → Tuple[str]
Parameters

config – configuration from which to build options, from which all invalid keys and values will be stripped

Returns

tuple of errors encountered
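Below is a minimal sketch of the in-place stripping behavior. It assumes a hypothetical SCHEMA that maps each allowed option name to a validity check. The three entries are examples only; the real module decides which keys and values are valid based on PreprocessorOptions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical schema: option name -> predicate that accepts valid values.
SCHEMA: Dict[str, Callable[[object], bool]] = {
    "NEUTRALIZE": lambda v: isinstance(v, bool),
    "MAX_NUM_ATOMS": lambda v: v is None or (
        isinstance(v, int) and not isinstance(v, bool) and v > 0),
    "EXPLICIT_HYDROGENS": lambda v: v in (
        "REMOVE_ALL", "KEEP_WEDGED", "ADD_ALL", "AS_IS",
        "ON_HETERO_AND_KEEP_WEDGED"),
}


def remove_invalid_config_options_sketch(config: dict) -> Tuple[str, ...]:
    """Delete unknown keys and invalid values from config in place and
    return a tuple describing each problem found."""
    errors = []
    for key in list(config):  # copy the keys: we delete while iterating
        if key not in SCHEMA:
            errors.append(f"unknown option {key!r}")
            del config[key]
        elif not SCHEMA[key](config[key]):
            errors.append(f"invalid value {config[key]!r} for {key!r}")
            del config[key]
    return tuple(errors)


cfg = {"NEUTRALIZE": True, "BOGUS": 1, "EXPLICIT_HYDROGENS": "SOMETIMES"}
print(remove_invalid_config_options_sketch(cfg), cfg)
```

Unlike the raising behavior of fromConfig, this approach reports every problem at once. The cleaned dict is then safe to load.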

schrodinger.livedesign.preprocessor.audit_changes(func: Callable, mol: rdkit.Chem.rdchem.Mol, *args)

When the global audit_changes_log is initialized, compares mol CXSMILES before and after the given function call, capturing information when the CXSMILES has been changed.

Parameters
  • func – transformation function

  • mol – molecule to apply transformation to
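The audit pattern amounts to a wrapper that serializes the molecule before and after a step. In this sketch, a plain string stands in for the RDKit Mol, and the serialize callable stands in for CXSMILES generation. It also assumes that func returns the transformed molecule rather than modifying it in place. The names init_audit_log, audit_log and audit_changes_sketch are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

# Stand-in for the module's global audit log; None means auditing is off.
audit_log: Optional[List[Tuple[str, str, str]]] = None


def init_audit_log() -> None:
    """Turn auditing on by creating an empty log."""
    global audit_log
    audit_log = []


def audit_changes_sketch(func: Callable, mol: Any, *args,
                         serialize: Callable[[Any], str] = str) -> Any:
    """Call func(mol, *args); if auditing is on and the serialized form
    changed, record (step name, before, after)."""
    if audit_log is None:
        return func(mol, *args)
    before = serialize(mol)
    result = func(mol, *args)
    after = serialize(result)
    if before != after:
        audit_log.append((getattr(func, "__name__", repr(func)), before, after))
    return result


init_audit_log()
audit_changes_sketch(str.upper, "cco")  # changes the string: logged
audit_changes_sketch(str.strip, "CCO")  # no change: not logged
print(audit_log)  # [('upper', 'cco', 'CCO')]
```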

schrodinger.livedesign.preprocessor.is_wildcard(atom)

Is atom a wildcard?

schrodinger.livedesign.preprocessor.is_queryatom_exception(atom)

Normally we raise an exception if the molecule to be preprocessed contains query atoms, but not if the query atom is an attachment point.

Parameters

atom – the atom to check

Returns

whether or not the atom is allowed in the preprocessor

schrodinger.livedesign.preprocessor.coords_all_zero(conf)

Returns whether or not all atom positions in a conformer are zero

schrodinger.livedesign.preprocessor.get_limited_sanitized_mol(mol: rdkit.Chem.rdchem.Mol) → rdkit.Chem.rdchem.Mol

Sanitize the molecule in a limited way, so as to avoid changing the molecule or raising an error when valence errors are present. Specifically, we turn off:
  • SANITIZE_PROPERTIES, which would otherwise check valences

  • SANITIZE_CLEANUP, which can change the shape of the molecule

  • SANITIZE_CLEANUPCHIRALITY, which can remove chirality markers

  • SANITIZE_FINDRADICALS, which checks the valences of radicals

schrodinger.livedesign.preprocessor.setup_mol(mol)

Set up a molecule. This setup is always done, regardless of configuration.

Parameters

mol – An unsanitized RDKit Mol

Returns

A partially sanitized RDKit mol, ready for the standardizer.

schrodinger.livedesign.preprocessor.check_kekulization(mol, options)
schrodinger.livedesign.preprocessor.assign_zero_coords_chirality(mol)

Molecules with all-zero coordinates need to have the “chirality tags” primed from the atom parity properties. Once these are in place, we remove the conformer, and leave the mol in a state equivalent to one that came from a SMILES input.

schrodinger.livedesign.preprocessor.correct_sgroup_coordinates(mol)

If coordinates are generated, make sure the FIELDDISP property of each SGroup uses relative coordinates.

schrodinger.livedesign.preprocessor.preprocess_molblock(molblock: str, config: Optional[dict] = None, preserved_data_sgroups: Optional[List[str]] = None) → str

Standardizes an MDL molblock

Parameters
  • molblock – input molblock

  • config – dict specifying preprocessor options

  • preserved_data_sgroups – list of sgroup names to preserve

Returns

standardized molblock

schrodinger.livedesign.preprocessor.preprocess(mol: rdkit.Chem.rdchem.Mol, options: Optional[schrodinger.livedesign.preprocessor.PreprocessorOptions] = None, preserved_data_sgroups: Optional[List[str]] = None) → rdkit.Chem.rdchem.Mol

Standardizes an RDKit mol

Parameters
  • mol – input mol

  • options – preprocessor options

  • preserved_data_sgroups – list of sgroup names to preserve

Returns

standardized mol

exception schrodinger.livedesign.preprocessor.BlindedCompoundError

Bases: ValueError

schrodinger.livedesign.preprocessor.assert_not_blinded(mol: rdkit.Chem.rdchem.Mol, max_num_atoms: Optional[int] = None)

Checks the incoming mol to confirm it has real atoms. If it doesn’t, the structure may have been intentionally stripped by the caller. LiveDesign marks such structures as “blinded”: either a customer may have IP/legal restrictions, or registration of the structure is delayed even though assay data is already available. Currently, LiveDesign handles these structures by keeping a row in the LiveReport, but without an associated SDF or image. This differs from other registration errors, which are simply archived.

Parameters
  • mol – RDKit Mol to consider

  • max_num_atoms – maximum allowed number of atoms
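The sketch below shows the two checks implied by the signature. It operates on a list of element symbols instead of an RDKit Mol. It assumes that “real” atoms are anything other than the '*' dummy atom, and that exceeding max_num_atoms raises a plain ValueError. Neither detail is stated above, and assert_not_blinded_sketch is a hypothetical name.

```python
from typing import List, Optional


class BlindedCompoundError(ValueError):
    """Local copy of the documented exception (a ValueError subclass)."""


def assert_not_blinded_sketch(atom_symbols: List[str],
                              max_num_atoms: Optional[int] = None) -> None:
    # Assumption: '*' marks a dummy/wildcard atom, i.e. not a "real" atom.
    real_atoms = [symbol for symbol in atom_symbols if symbol != "*"]
    if not real_atoms:
        raise BlindedCompoundError("no real atoms; structure treated as blinded")
    # Assumption: the size limit raises an ordinary ValueError.
    if max_num_atoms is not None and len(atom_symbols) > max_num_atoms:
        raise ValueError(
            f"{len(atom_symbols)} atoms exceeds the limit of {max_num_atoms}")


assert_not_blinded_sketch(["C", "C", "O"])  # passes silently
```

BlindedCompoundError subclasses ValueError. A caller that keeps blinded rows in a LiveReport but archives other failures therefore needs to catch BlindedCompoundError before any generic ValueError handler.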

schrodinger.livedesign.preprocessor.assert_not_query(mol: rdkit.Chem.rdchem.Mol)

Checks incoming mol to confirm there are no query features present on atoms or bonds, which would otherwise make it not compatible with registration.

Parameters

mol – RDKit Mol to consider

schrodinger.livedesign.preprocessor.get_atoms_mapping(mol_atoms)
schrodinger.livedesign.preprocessor.get_bonds_mapping(mol, original_bond_mapping, atom_idx_mapping)
schrodinger.livedesign.preprocessor.tag_mol_indexes(mol)

Tag atoms with their initial indexes on the mol and create a bond mapping. Bond indexes are less stable than atom indexes, so we create an external mapping from each bond to the atoms it binds.

schrodinger.livedesign.preprocessor.check_attachment_points_changed(sg, atom_idx_mapping)
schrodinger.livedesign.preprocessor.check_cstate_changed(sg, bond_idx_mapping)
schrodinger.livedesign.preprocessor.update_sgroup_indexes(sg, sg_atoms, sg_parent_atoms, sg_bonds, atom_idx_mapping, bond_idx_mapping)
schrodinger.livedesign.preprocessor.update_mol_groups(mol, stereo_groups, substance_groups, original_bond_mapping)

Update atoms and bonds in stereo and substance groups, dropping any atoms/groups that are no longer valid for the current state of the mol.

schrodinger.livedesign.preprocessor.update_sgroups(mol, substance_groups, atom_idx_mapping, bond_idx_mapping)

Update SGroups to reflect the transformations done on mol, updating with new atom and bond indexes, as well as atoms that might have been added or removed.

schrodinger.livedesign.preprocessor.update_stereo_groups(mol, stereo_groups, atom_idx_mapping)
schrodinger.livedesign.preprocessor.add_explicit_hydrogens(mol, only_on_hetero=False)
schrodinger.livedesign.preprocessor.remove_explicit_hydrogens(mol, sgroups, keep_wedged=False, keep_hetero=False)
schrodinger.livedesign.preprocessor.convert_to_molblock(mol, options=None)

Convert processed mol into a molblock and make necessary updates.

schrodinger.livedesign.preprocessor.convert_heavy_hydrogens(molblock)

NOTE that this operates on a molblock, not a molecule

The RDKit does not currently (v2020.03) support writing D or T to mol blocks, so we need to post-process the text. Fortunately, this is an easy regex in V3000 mol blocks. It does not work with V2000 mol blocks, so we raise a ValueError there. This doesn’t seem like a big deal, since V2000 support is primarily kept around for debugging purposes. If we eventually need to support V2000 with HEAVY_HYDROGEN_DT, some not-completely-trivial code will need to be written.
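The regex post-processing described above might look roughly like the following. This sketch assumes that heavy hydrogens appear on V3000 atom lines as an H atom with a MASS=2 or MASS=3 property. It rewrites the element symbol to D or T and drops the now-redundant MASS property. The module's actual pattern, and how it handles other atom properties, may differ.

```python
import re

# V3000 atom line: "M  V30 <index> <symbol> <x> <y> <z> <aamap> [PROP=VAL ...]"
_HEAVY_H = re.compile(
    r"^(M  V30 \d+ )H( [^\n]*?) MASS=([23])(?=\s|$)", re.MULTILINE)
_SYMBOLS = {"2": "D", "3": "T"}


def convert_heavy_hydrogens_sketch(molblock: str) -> str:
    """Rewrite deuterium/tritium atoms in a V3000 mol block as D/T."""
    if "V2000" in molblock:
        raise ValueError("D/T symbols are only supported for V3000 mol blocks")
    # Keep the index and coordinates, swap the symbol, drop MASS=2/3;
    # any properties after MASS are left untouched.
    return _HEAVY_H.sub(
        lambda m: m.group(1) + _SYMBOLS[m.group(3)] + m.group(2), molblock)


line = "M  V30 3 H 0 0 0 0 MASS=3 CHG=1\n"
print(convert_heavy_hydrogens_sketch(line), end="")  # M  V30 3 T 0 0 0 0 CHG=1
```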

schrodinger.livedesign.preprocessor.neutralize(mol, checkForProblematicHs=False)
schrodinger.livedesign.preprocessor.unicode_to_str(unicode_str)

Takes a unicode object and converts it to a str (UTF-8). If the argument is already a str (i.e. when running under Python 3), returns unicode_str unchanged. Needed to support Python 2/3 with unicode_literals.

py2: type<unicode> -> type<str utf-8>
py3: type<str utf-8> (no unicode type exists)

Parameters

unicode_str (unicode (py2) or str (py3)) – the unicode that potentially needs converting (i.e. if run with python 2)

Returns

str

schrodinger.livedesign.preprocessor.transform(mol, transformation)

Apply the transformation to the molecule repeatedly until it no longer applies.

The maxTransformations argument is only there to prevent an infinite loop caused by a bogus transformation.

Please note that running transformations may alter the stereochemistry of mol, so a stereo recalculation from coordinates might be required.
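The repeat-until-stable loop with a safety cap can be sketched generically. Here a SMILES string stands in for the molecule. A plain text substitution stands in for a real reaction-based transformation; it converts a nitro group to its charge-separated form. transform_until_stable and max_transformations are hypothetical names, and this sketch simply stops at the cap rather than raising.

```python
from typing import Callable, Optional


def transform_until_stable(mol: str,
                           apply_once: Callable[[str], Optional[str]],
                           max_transformations: int = 100) -> str:
    """Apply apply_once repeatedly until it reports no match (None).

    max_transformations caps the loop so a bogus transformation that
    always matches cannot spin forever; this sketch just stops there.
    """
    for _ in range(max_transformations):
        result = apply_once(mol)
        if result is None:  # transformation no longer applies
            return mol
        mol = result
    return mol


def nitro_to_charge_separated(smiles: str) -> Optional[str]:
    """Toy transformation: rewrite one N(=O)=O as [N+](=O)[O-]."""
    if "N(=O)=O" not in smiles:
        return None
    return smiles.replace("N(=O)=O", "[N+](=O)[O-]", 1)


print(transform_until_stable("c1cc(N(=O)=O)ccc1N(=O)=O", nitro_to_charge_separated))
# c1cc([N+](=O)[O-])ccc1[N+](=O)[O-]
```

Each pass rewrites only one match. This is why the loop is needed: a molecule with two nitro groups takes two passes, plus a third pass to confirm that nothing is left to change.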

schrodinger.livedesign.preprocessor.in_xy_plane(mol)
schrodinger.livedesign.preprocessor.generate_coordinates(mol, align=False)
schrodinger.livedesign.preprocessor.is_polymer(s_group)
schrodinger.livedesign.preprocessor.clean_up_polymer_brackets(mol, revert_to_mol=None, keep_existing_brackets=False)

Add polymer brackets back to mol.

Parameters
  • mol – RDKit mol to add polymer brackets to

  • revert_to_mol – RDKit mol to revert to if polymer brackets cannot be added correctly to provided mol. This will occur when brackets cross more than one bond.

  • keep_existing_brackets – whether to recalculate the positions of brackets that are already present

Returns

RDKit mol with polymer brackets

schrodinger.livedesign.preprocessor.copy_lewis_structure_and_hydrogens(st, mol)

Applies bond orders and charges from st to mol, and updates the implicit hydrogen counts (#implicitH) to match.

Assumes st includes all hydrogens. Adds implicit and explicit hydrogens to the mol, but does not add any graph hydrogens. May remove graph hydrogens.

schrodinger.livedesign.preprocessor.generate_canonical_tautomer(mol)
schrodinger.livedesign.preprocessor.clear_wedge_bonds_from_achiral_centers(mol)
schrodinger.livedesign.preprocessor.calculate_enhanced_stereo(mol, enh_stereo_default_grouping, initial_chiral_flag)
schrodinger.livedesign.preprocessor.strip_stereo_and(input_mol)

Removes any stereo AND groups with only one center and flattens the bonds around that center.

Parameters

input_mol – The original molecule to consider

Returns

post-processed molecule, if the input molecule was modified

schrodinger.livedesign.preprocessor.frag_is_smaller(atoms, largest_atoms, weight, largest_weight, smiles, largest_smiles)

A fragment is considered larger if it has more atoms, then (on ties) a greater weight, then a longer SMILES string. When the SMILES strings have equal length, the lexicographically smaller one is considered larger, i.e. ‘AAA’ is larger than ‘AAB’. Hence the final smiles > largest_smiles check, which rejects the fragment as smaller.

schrodinger.livedesign.preprocessor.connect_variable_attachment_points(mol)

Forms zero-order bonds between the “main” molecule and one of the atoms of a bond with an ATTACH property, so that the molecule and its variable attachment point are treated as a single fragment.

Returns a 2-tuple with:
  1. the modified molecule

  2. whether or not the molecule was modified

schrodinger.livedesign.preprocessor.remove_fragments(mol, substance_groups)

Fragments are not removed if the molecule contains any SGroups associated with polymers.

Use the following criteria to remove unwanted fragments from mol:
  1. keep only the fragment with the most atoms

  2. break ties by keeping only fragments with the greatest molecular weight

  3. break remaining ties by keeping the fragment with the longest SMILES string

  4. break any additional ties by keeping the fragment with the alphabetically earliest SMILES string

If two or more identical fragments remain after 1-4, we will throw a fatal error.
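The four tie-breaking criteria map naturally onto a single sort key. In this sketch, each fragment is reduced to a hypothetical Fragment record (atom count, molecular weight, SMILES) rather than an RDKit fragment. Identical fragments that survive all four criteria raise a ValueError; the exception type the module actually uses for this fatal error is not specified above.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    num_atoms: int
    weight: float
    smiles: str


def _rank(frag: Fragment) -> Tuple[int, float, int, str]:
    # Sorting ascending on this key puts the "largest" fragment first:
    # most atoms, then highest weight, then longest SMILES, then the
    # alphabetically earliest SMILES.
    return (-frag.num_atoms, -frag.weight, -len(frag.smiles), frag.smiles)


def pick_largest_fragment(fragments: List[Fragment]) -> Fragment:
    ranked = sorted(fragments, key=_rank)
    if len(ranked) > 1 and _rank(ranked[0]) == _rank(ranked[1]):
        raise ValueError(
            f"cannot choose between identical fragments {ranked[0].smiles!r}")
    return ranked[0]


frags = [Fragment(3, 46.07, "CCO"), Fragment(1, 22.99, "[Na+]"),
         Fragment(3, 46.07, "COC")]
print(pick_largest_fragment(frags).smiles)  # CCO
```

In the example, ethanol and dimethyl ether tie on atom count, weight and SMILES length. The alphabetical rule then picks "CCO", because "CCO" sorts before "COC".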

schrodinger.livedesign.preprocessor.remove_properties(mol)
schrodinger.livedesign.preprocessor.strip_salts(mol, salt_list)
schrodinger.livedesign.preprocessor.apply_transformations(mol, transformations)

Apply the given list of transformations, and recalculate stereo if at least one transformation applies.

schrodinger.livedesign.preprocessor.add_chiral_hs(mol)
schrodinger.livedesign.preprocessor.wedge_clean(mol, wedge_2_bonds_if_possible)
schrodinger.livedesign.preprocessor.remove_wiggly_bonds_around_double_bonds(mol)