schrodinger.application.matsci.ml_prediction_utils module

Utilities for managing ML prediction models (model data definitions and paths)

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.ml_prediction_utils.ModelData(group, name, flag, prop, directory, skip_standardization)

Bases: tuple

directory

Alias for field number 4

flag

Alias for field number 2

group

Alias for field number 0

name

Alias for field number 1

prop

Alias for field number 3

skip_standardization

Alias for field number 5
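
Because ModelData derives from tuple, each field can be read either by name or by the index listed above. The sketch below rebuilds an equivalent named tuple locally instead of importing the Schrodinger module, and every field value in it is a made-up placeholder rather than a real shipped model.

```python
from collections import namedtuple

# Local stand-in with the same field order as the documented ModelData.
ModelData = namedtuple(
    "ModelData",
    ["group", "name", "flag", "prop", "directory", "skip_standardization"],
)

# Placeholder values for illustration only; not a real model definition.
example = ModelData(
    group="SINGLE_MOL",
    name="Density",
    flag="-density_predict",
    prop="r_example_density",
    directory="density",
    skip_standardization=False,
)

# Access by field name and by position is equivalent.
same_flag = example.flag == example[2]
same_directory = example.directory == example[4]
```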

schrodinger.application.matsci.ml_prediction_utils.get_model_dir(subdir_name, info_dir=False)

Get the directory path for a model

Parameters:
  • subdir_name (str) – The subdir name for the model

  • info_dir (bool) – Whether the directory is for information about the model

Return type:

str

Returns:

The full path to the model directory, default or custom

schrodinger.application.matsci.ml_prediction_utils.get_oled_models()

Return list of OLED ModelData objects

Return type:

list

Returns:

List of OLED model data

schrodinger.application.matsci.ml_prediction_utils.get_models_by_group(group)

Return list of ModelData objects for a given group

Parameters:

group (str) – Model group (SINGLE_MOL, POLYMER, OLED, etc.)

Return type:

list

Returns:

List of ModelData objects matching the group

schrodinger.application.matsci.ml_prediction_utils.get_model_by_flag(flag)

Return ModelData for a given flag

Parameters:

flag (str) – Model flag (e.g., ‘-density_predict’)

Return type:

ModelData or None

Returns:

ModelData object if found, None otherwise
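
The group and flag lookups described above can be pictured as simple filters over a list of ModelData entries. This is only a sketch of the documented behavior (a list for a group, a single entry or None for a flag). The registry contents, the helper names models_by_group and model_by_flag, and the grouping of the entries are all assumptions made for illustration.

```python
from collections import namedtuple

# Local stand-in with the documented field order.
ModelData = namedtuple(
    "ModelData",
    ["group", "name", "flag", "prop", "directory", "skip_standardization"],
)

# Hypothetical registry; every value here is a placeholder.
MODELS = [
    ModelData("SINGLE_MOL", "Density", "-density_predict",
              "r_example_density", "density", False),
    ModelData("OLED", "Example OLED property", "-example_oled_predict",
              "r_example_oled", "oled_example", False),
]


def models_by_group(group, models=MODELS):
    """Return every entry whose group matches (possibly an empty list)."""
    return [model for model in models if model.group == group]


def model_by_flag(flag, models=MODELS):
    """Return the first entry with a matching flag, or None if absent."""
    return next((model for model in models if model.flag == flag), None)


oled = models_by_group("OLED")
density = model_by_flag("-density_predict")
missing = model_by_flag("-no_such_flag")
```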

schrodinger.application.matsci.ml_prediction_utils.fetch_model_from_URL(model_dir, url)

Fetch the model from the given URL

Parameters:
  • model_dir (str) – The directory to save the model to

  • url (str) – The URL to download the model from

Return type:

(bool, str)

Returns:

True and an empty string if the model was downloaded successfully, otherwise False and an error message
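
The (bool, str) return convention lets callers report a failed download without handling exceptions. The following sketch shows one way such a function could be written with the standard library. The helper name fetch_to_dir, the choice of urllib, and the target file name are assumptions and do not describe how fetch_model_from_URL is implemented.

```python
import os
import pathlib
import tempfile
import urllib.request


def fetch_to_dir(model_dir, url):
    """Hypothetical helper: download url into model_dir.

    Returns (True, '') on success or (False, error message) on failure.
    """
    os.makedirs(model_dir, exist_ok=True)
    target = os.path.join(model_dir, os.path.basename(url) or "model")
    try:
        urllib.request.urlretrieve(url, target)
    except (OSError, ValueError) as err:
        # URLError is an OSError; a malformed URL raises ValueError.
        return False, str(err)
    return True, ""


# Demo with a local file:// URL so that no network access is needed.
source = pathlib.Path(tempfile.mkdtemp()) / "model.bin"
source.write_bytes(b"placeholder")
ok, message = fetch_to_dir(tempfile.mkdtemp(), source.as_uri())

# A malformed URL yields a failure tuple instead of raising.
bad_ok, bad_message = fetch_to_dir(tempfile.mkdtemp(), "not-a-url")
```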

schrodinger.application.matsci.ml_prediction_utils.read_solvent_file(solvent_file='opto_training_solvents.csv')

Read the solvent file and store its contents as dictionaries

Parameters:

solvent_file (str) – The name of the solvent file to read

Return type:

dict, dict

Returns:

Two dictionaries: one mapping solvent names to SMILES strings, and one mapping SMILES strings to solvent names
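
The pair of dictionaries returned here gives lookups in both directions. The sketch below builds such a pair from CSV text. The two-column name,SMILES layout without a header row and the helper name read_solvents are assumptions, since the actual format of opto_training_solvents.csv is not documented here.

```python
import csv
import io


def read_solvents(handle):
    """Hypothetical reader: build name->SMILES and SMILES->name dicts."""
    name_to_smiles = {}
    smiles_to_name = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            # Skip blank or incomplete rows.
            continue
        name, smiles = row[0].strip(), row[1].strip()
        name_to_smiles[name] = smiles
        smiles_to_name[smiles] = name
    return name_to_smiles, smiles_to_name


# Assumed layout: one solvent per row, name first and SMILES second.
sample = io.StringIO("water,O\nethanol,CCO\n")
by_name, by_smiles = read_solvents(sample)
```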

schrodinger.application.matsci.ml_prediction_utils.get_smiles(struct, mark_head_tail=True, logger=None)

Get the SMILES pattern of the structure

Parameters:
  • struct (Structure) – The structure to get SMILES for

  • mark_head_tail (bool) – Whether to replace polymer head and tail atoms if found

Return type:

str or None

Returns:

SMILES pattern, or None if there was an issue

schrodinger.application.matsci.ml_prediction_utils.get_smasher_data(flag='these')

Load the slow-to-import ligand_ml information if ligand_ml is available

Parameters:

flag (str) – The model flag for the warning message

schrodinger.application.matsci.ml_prediction_utils.get_molecular_weight(struct, is_polymer, range_max)

Get the molecular weight of a structure, removing dummy atoms if necessary

Parameters:
  • struct (Structure) – The structure to get molecular weight for

  • is_polymer (bool) – Whether the structure is a polymer

  • range_max (float) – The maximum molecular weight allowed

Return type:

float

Returns:

The molecular weight of the structure

exception schrodinger.application.matsci.ml_prediction_utils.MLPredictionError

Bases: RuntimeError
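
Since MLPredictionError derives from RuntimeError, a handler written for RuntimeError also catches it. The sketch below uses a local stand-in class and a hypothetical run_prediction function to show this. The error message is borrowed from the failure case documented for featurizeStruct.

```python
class MLPredictionError(RuntimeError):
    """Local stand-in for the documented exception class."""


def run_prediction(num_valid_structures):
    """Hypothetical workflow step that fails when nothing can be predicted."""
    if num_valid_structures == 0:
        raise MLPredictionError("No structures converted successfully")
    return num_valid_structures


try:
    run_prediction(0)
except RuntimeError as err:
    # The subclass is caught by the handler for its base class.
    caught = str(err)
```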

class schrodinger.application.matsci.ml_prediction_utils.MLPropPredictor(options, structs=None, input_file=None, force_cpu=False, warn_func=None, logger=None)

Bases: object

Class to handle ML property prediction

SMILES_JOINER = '.[1S].'
SMILES_HEADER = 'SMILES'
FEATURE_0_HEADER = 'additional_feature_0'
MODEL_FILE = 'pkgd_model_qzip.qzip'
DATA_INFO_FILE = 'data_info.json'
PANEL_INFO_FILE = 'info.json'
LOGGER = None
__init__(options, structs=None, input_file=None, force_cpu=False, warn_func=None, logger=None)

Create an ML prediction class instance

Parameters:
  • options (argparse.Namespace) – The command line options

  • structs (list of schrodinger.structure.Structure) – list of structures for prediction

  • input_file (str) – name of the input file

  • force_cpu (bool) – If True, force the model to run on the CPU; otherwise LigandML chooses the best device to run the model on. The default is False.

  • warn_func (callable) – The function to call to log/show warnings

Raises:

MLPredictionError – if the input structure is coarse-grained.

log(msg, **kwargs)

Add a message to the log file

Parameters:

msg (str) – The message to log

Additional keyword arguments are passed to the textlogger.log_msg function

run()

Run the workflow

Return type:

pandas.DataFrame

Returns:

Dataframe containing structures and predicted properties

validateAtomPresent()

Validate each input structure by checking whether its atoms fall within the trained chemical space. Structures that fall outside it are removed from prediction.

Raises:

MLPredictionError – If an atom in the input structure is not part of the training dataset.

featurizeStruct()

Create the RDKit mol objects for each structure and initialize the output dataframe

Raises:

MLPredictionError – If no structures convert successfully

convertToMol(struct)

Convert the structure to an RDKit mol that includes the solvent and necessary properties

Parameters:

struct (Structure) – The structure to convert

Return type:

(Chem.Mol, str) or (None, None)

Returns:

The RDKit mol and the SMILES string of the structure (without solvent), or None, None if conversion failed

getFormulationMol(smiles, struct)

Get the full SMILES for formulation or optoelectronics models. Note that optoelectronics models are a special case of formulation models.

Parameters:
  • smiles (str) – SMILES of the input structure

  • struct (Structure) – The structure corresponding to smiles. The structure title will be updated for the formulation.

Return type:

Chem.rdchem.Mol or None

Returns:

The RDKit mol object for the input structure and solvent/thin film if the input is valid, else None.
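
The class constant SMILES_JOINER = '.[1S].' suggests that the structure and solvent SMILES are combined into a single string for formulation models. How getFormulationMol composes that string is not documented here, so the sketch below is an assumption, and join_formulation_smiles is a hypothetical helper name.

```python
# Value taken from the documented MLPropPredictor.SMILES_JOINER constant.
SMILES_JOINER = ".[1S]."


def join_formulation_smiles(solute_smiles, solvent_smiles):
    """Hypothetical helper: combine solute and solvent into one string."""
    return SMILES_JOINER.join([solute_smiles, solvent_smiles])


full = join_formulation_smiles("c1ccccc1", "CCO")

# Splitting on the joiner recovers the two components.
solute, solvent = full.split(SMILES_JOINER)
```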

getRDKitMol(smiles)

Get an RDKit mol object from a SMILES string

Parameters:

smiles (str) – The SMILES string

Return type:

Chem.rdchem.Mol or None

Returns:

The RDKit mol object, or None if the SMILES-to-mol conversion failed

getAdditionFeaturesVectors(feature_flag)

Get the additional features for the model

Parameters:

feature_flag (str) – The flag for the model

Return type:

float

Returns:

The additional features for the model

getModel()

Get the ligand_ml model object

Raises:

MLPredictionError – If the user did not select a model to run

getFullModelPath(dirname)

Get the full path to the model file, using the custom version if found

Parameters:

dirname (str) – The path to built-in model

Return type:

str

Returns:

The actual path to the model directory (the custom path if one is found, otherwise the input path) and the full path to the model file itself.

Raises:

MLPredictionError – If the model file is not found

addInfoToModel(model_directory)

Add the molecular weight range as an attribute on the model object

Parameters:

model_directory (str) – The path to the model directory

addAdditonalFeaturesToModel(model_directory)

Add the additional features as an attribute on the model object

Parameters:

model_directory (str) – The path to the model directory

predict()

Run the predictions

cleanDataFrame()

Remove rows without numerical results

Raises:

MLPredictionError – if the dataframe is empty after removal

cleanModelFiles()

Clean temporary files created by the model

checkMolecularWeights()

Check the molecular weight of the structures and add a warning property if the molecular weight is outside the range
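
The range check described above amounts to comparing each molecular weight with the limits stored on the model. This sketch returns a warning string for out-of-range values and None otherwise. The helper name mw_warning, the inclusive limits, and the message text are assumptions, not the class's actual behavior.

```python
def mw_warning(mol_weight, range_min, range_max):
    """Hypothetical check: warn when a weight lies outside the range."""
    if range_min <= mol_weight <= range_max:
        return None
    return (f"Molecular weight {mol_weight:.1f} is outside the range "
            f"{range_min:.1f} to {range_max:.1f}")


# Illustrative numbers only; real limits would come from the model data.
inside = mw_warning(100.0, 50.0, 500.0)
outside = mw_warning(600.0, 50.0, 500.0)
```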