schrodinger.application.matsci.ml_prediction_utils module
Utilities for managing ML prediction models (model data definitions and paths)
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.application.matsci.ml_prediction_utils.ModelData(group, name, flag, prop, directory, skip_standardization)
Bases:
tuple
- directory
Alias for field number 4
- flag
Alias for field number 2
- group
Alias for field number 0
- name
Alias for field number 1
- prop
Alias for field number 3
- skip_standardization
Alias for field number 5
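The field layout above can be illustrated with a stand-in namedtuple. This is a minimal sketch that mirrors the documented field names and positions without importing the Schrodinger package. Every value below (the display name, property name, and directory) is a made-up placeholder, not a real model definition.

```python
from collections import namedtuple

# Stand-in with the same field names and order as the documented ModelData.
# This is NOT the real class; it only mirrors the documented layout.
ModelData = namedtuple(
    'ModelData',
    ['group', 'name', 'flag', 'prop', 'directory', 'skip_standardization'])

# All values below are hypothetical placeholders.
example = ModelData(
    group='SINGLE_MOL',
    name='Example density model',
    flag='-density_predict',
    prop='example_density_property',
    directory='example_density_dir',
    skip_standardization=False)

# Each attribute is an alias for a positional field.
print(example.group == example[0])
print(example.directory == example[4])
```

Because the class is a tuple, instances are immutable and can be unpacked positionally as well as accessed by attribute.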
- schrodinger.application.matsci.ml_prediction_utils.get_model_dir(subdir_name, info_dir=False)
Get the directory path for a model
- Parameters:
subdir_name (str) – The subdir name for the model
info_dir (bool) – Whether the directory is for information about the model
- Return type:
str
- Returns:
The full path to the model directory, default or custom
- schrodinger.application.matsci.ml_prediction_utils.get_oled_models()
Return list of OLED ModelData objects
- Return type:
list
- Returns:
List of OLED model data
- schrodinger.application.matsci.ml_prediction_utils.get_models_by_group(group)
Return list of ModelData objects for a given group
- Parameters:
group (str) – Model group (SINGLE_MOL, POLYMER, OLED, etc.)
- Return type:
list
- Returns:
List of ModelData objects matching the group
- schrodinger.application.matsci.ml_prediction_utils.get_model_by_flag(flag)
Return ModelData for a given flag
- Parameters:
flag (str) – Model flag (e.g., ‘-density_predict’)
- Return type:
ModelData or None
- Returns:
ModelData object if found, None otherwise
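The lookup behavior described for get_models_by_group and get_model_by_flag can be sketched with plain Python. The helper names find_models_by_group and find_model_by_flag, the MODELS list, and all entries in it are hypothetical stand-ins. How the real module stores its model definitions is not documented here.

```python
from collections import namedtuple

# Stand-in mirroring the documented ModelData field order.
ModelData = namedtuple(
    'ModelData',
    ['group', 'name', 'flag', 'prop', 'directory', 'skip_standardization'])

# Hypothetical registry; none of these entries are real model definitions.
MODELS = [
    ModelData('SINGLE_MOL', 'Model A', '-a_predict', 'prop_a', 'dir_a', False),
    ModelData('OLED', 'Model B', '-b_predict', 'prop_b', 'dir_b', False),
    ModelData('OLED', 'Model C', '-c_predict', 'prop_c', 'dir_c', True),
]


def find_models_by_group(models, group):
    """Return every entry whose group matches (an empty list if none do)."""
    return [model for model in models if model.group == group]


def find_model_by_flag(models, flag):
    """Return the first entry with a matching flag, or None if none match."""
    return next((model for model in models if model.flag == flag), None)


oled_models = find_models_by_group(MODELS, 'OLED')
missing = find_model_by_flag(MODELS, '-unknown_predict')
```

Note the documented None return for an unknown flag: callers should check the result before accessing fields such as directory.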
- schrodinger.application.matsci.ml_prediction_utils.fetch_model_from_URL(model_dir, url)
Fetch the model from the given URL
- Parameters:
model_dir (str) – The directory to save the model to
url (str) – The URL to download the model from
- Return type:
(bool, str)
- Returns:
True and an empty string if the model was downloaded successfully; False and an error message otherwise
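A caller has to unpack the (bool, str) result and decide what to do on failure. The sketch below shows one way to wrap such a function. The names ensure_model, fake_fetch_ok, and fake_fetch_fail, the directory, and the example.com URL are all hypothetical; the fake functions only imitate the documented return convention and perform no download.

```python
def ensure_model(fetch, model_dir, url):
    """Call a fetch function returning (ok, message); raise if it failed."""
    ok, message = fetch(model_dir, url)
    if not ok:
        raise RuntimeError(
            f'Could not download model to {model_dir}: {message}')
    return model_dir


def fake_fetch_ok(model_dir, url):
    # Imitates the documented success result.
    return True, ''


def fake_fetch_fail(model_dir, url):
    # Imitates the documented failure result with an error message.
    return False, 'connection refused'


path = ensure_model(fake_fetch_ok, 'models/example',
                    'https://example.com/model')
```

In real use the first argument would be fetch_model_from_URL itself, which takes the same (model_dir, url) arguments.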
- schrodinger.application.matsci.ml_prediction_utils.read_solvent_file(solvent_file='opto_training_solvents.csv')
Read the solvent file and store its contents as a pair of dictionaries
- Parameters:
solvent_file (str) – The name of the solvent file to read
- Return type:
dict, dict
- Returns:
A dictionary mapping solvent names to SMILES strings, and a dictionary mapping SMILES strings to solvent names
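The two-way mapping returned here can be sketched with the standard csv module. The column layout of opto_training_solvents.csv is not documented, so the header names name and smiles, the sample rows, and the function read_solvent_table are assumptions made for illustration only.

```python
import csv
import io


def read_solvent_table(handle):
    """Build name -> SMILES and SMILES -> name dictionaries from CSV rows."""
    name_to_smiles = {}
    smiles_to_name = {}
    # Assumed layout: a header row with 'name' and 'smiles' columns.
    for row in csv.DictReader(handle):
        name = row['name'].strip()
        smiles = row['smiles'].strip()
        name_to_smiles[name] = smiles
        smiles_to_name[smiles] = name
    return name_to_smiles, smiles_to_name


# Made-up sample data standing in for the real solvent file.
SAMPLE = 'name,smiles\nwater,O\nethanol,CCO\n'
name_to_smiles, smiles_to_name = read_solvent_table(io.StringIO(SAMPLE))
```

Keeping both directions lets a caller resolve a solvent given by name as well as label a SMILES string with a readable name.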
- schrodinger.application.matsci.ml_prediction_utils.get_smiles(struct, mark_head_tail=True, logger=None)
Get the SMILES pattern of a structure
- Parameters:
struct (Structure) – The structure to get SMILES for
mark_head_tail (bool) – Whether to replace polymer head and tail atoms if found
- Return type:
str or None
- Returns:
SMILES pattern, or None if there was an issue
- schrodinger.application.matsci.ml_prediction_utils.get_smasher_data(flag='these')
Load the slow-to-import ligand_ml information if ligand_ml is available
- Parameters:
flag (str) – The model flag for the warning message
- schrodinger.application.matsci.ml_prediction_utils.get_molecular_weight(struct, is_polymer, range_max)
Get the molecular weight of a structure, removing dummy atoms if necessary
- Parameters:
struct (Structure) – The structure to get molecular weight for
is_polymer (bool) – Whether the structure is a polymer
range_max (float) – The maximum molecular weight allowed
- Return type:
float
- Returns:
The molecular weight of the structure
- exception schrodinger.application.matsci.ml_prediction_utils.MLPredictionError
Bases:
RuntimeError
- class schrodinger.application.matsci.ml_prediction_utils.MLPropPredictor(options, structs=None, input_file=None, force_cpu=False, warn_func=None, logger=None)
Bases:
object
Class to handle ML property prediction
- SMILES_JOINER = '.[1S].'
- SMILES_HEADER = 'SMILES'
- FEATURE_0_HEADER = 'additional_feature_0'
- MODEL_FILE = 'pkgd_model_qzip.qzip'
- DATA_INFO_FILE = 'data_info.json'
- PANEL_INFO_FILE = 'info.json'
- LOGGER = None
- __init__(options, structs=None, input_file=None, force_cpu=False, warn_func=None, logger=None)
Create an ML prediction class instance
- Parameters:
options (argparse.Namespace) – The command line options
structs (list of schrodinger.structure.Structure) – list of structures for prediction
input_file (str) – name of the input file
force_cpu (bool) – If True, force the model to run on the CPU; otherwise LigandML chooses the best device to run the model on. The default is False.
warn_func (callable) – The function to call to log/show warnings
- Raises:
MLPredictionError – if the input structure is coarse-grained.
- log(msg, **kwargs)
Add a message to the log file
- Parameters:
msg (str) – The message to log
Additional keyword arguments are passed to the textlogger.log_msg function
- run()
Run the workflow
- Return type:
pandas.DataFrame
- Returns:
Dataframe containing structures and predicted properties
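A typical call sequence, based only on the signatures documented on this page, might look like the sketch below. The wrapper predict_properties is a hypothetical helper. Which attributes the options namespace must carry is not documented here, so constructing it is left to the caller. The import is guarded because the module is only available inside a Schrodinger Python environment.

```python
try:
    # Only importable inside a Schrodinger Python environment.
    from schrodinger.application.matsci import ml_prediction_utils
except ImportError:
    ml_prediction_utils = None


def predict_properties(options, structs):
    """
    Run an ML prediction and return the resulting DataFrame, or None.

    `options` is the argparse.Namespace the predictor expects; its required
    attributes are not documented on this page and are assumed to be set.
    """
    if ml_prediction_utils is None:
        # Schrodinger modules are unavailable; nothing can be predicted.
        return None
    try:
        # The constructor is documented to raise MLPredictionError for
        # coarse-grained input, so it sits inside the try block as well.
        predictor = ml_prediction_utils.MLPropPredictor(
            options, structs=structs)
        return predictor.run()
    except ml_prediction_utils.MLPredictionError as err:
        print(f'Prediction failed: {err}')
        return None
```

Catching MLPredictionError around run() is a precaution: several of the methods documented below raise it, and run() presumably calls them as part of the workflow.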
- validateAtomPresent()
Validate that each input structure lies within the trained chemical space. Structures that do not are removed from prediction.
- Raises:
MLPredictionError – If an atom in the input structure is not part of the trained dataset.
- featurizeStruct()
Create the rdkit mol objects for each structure and initialize the output dataframe
- Raises:
MLPredictionError – If no structures convert successfully
- convertToMol(struct)
Convert the structure to an RDKit mol that includes the solvent and necessary properties
- Parameters:
struct (Structure) – The structure to convert
- Return type:
(Chem.Mol, str) or (None, None)
- Returns:
The RDKit mol and the SMILES string of the structure (without solvent), or None, None if conversion failed
- getFormulationMol(smiles, struct)
Get the full SMILES for formulation or optoelectronics models. Note that optoelectronics models are a special case of formulation models.
- Parameters:
smiles (str) – smiles of the input structure
struct (Structure) – The structure corresponding to smiles. Structure title will be updated for the formulation.
- Return type:
Chem.rdchem.Mol or None
- Returns:
The RDKit mol object for the input structure and solvent/thin-film if the input is valid, else None.
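How the full SMILES is assembled is not spelled out here. One plausible reading is that the structure SMILES and the solvent SMILES are joined with the SMILES_JOINER class constant. The sketch below illustrates that assumption only. The function join_formulation_smiles, the component order, and the example molecules are hypothetical and may not match the actual implementation.

```python
# Value documented on MLPropPredictor.SMILES_JOINER.
SMILES_JOINER = '.[1S].'


def join_formulation_smiles(solute_smiles, solvent_smiles):
    """Join two SMILES strings with the joiner (assumed order: solute first)."""
    return SMILES_JOINER.join([solute_smiles, solvent_smiles])


# Benzene as the input structure and water as the solvent (made-up example).
full_smiles = join_formulation_smiles('c1ccccc1', 'O')
print(full_smiles)
```

Under this reading the joiner acts as a separator between components, since a dot in SMILES denotes disconnected fragments.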
- getRDKitMol(smiles)
Get the RDKit mol object from a SMILES string
- Parameters:
smiles (str) – The SMILES string
- Return type:
Chem.rdchem.Mol or None
- Returns:
The RDKit mol object, or None if the SMILES-to-mol conversion failed
- getAdditionFeaturesVectors(feature_flag)
Get the additional features for the model
- Parameters:
feature_flag (str) – The flag for the model
- Return type:
float
- Returns:
The additional features for the model
- getModel()
Get the ligand_ml model object
- Raises:
MLPredictionError – If the user did not select a model to run
- getFullModelPath(dirname)
Get the full path to the model file, using the custom version if found
- Parameters:
dirname (str) – The path to built-in model
- Return type:
str
- Returns:
The actual path to the model directory (the custom path if one is found, otherwise the input path), and the full path to the model file itself.
- Raises:
MLPredictionError – If the model file is not found
- addInfoToModel(model_directory)
Add the molecular weight range as an attribute on the model object
- Parameters:
model_directory (str) – The path to the model directory
- addAdditonalFeaturesToModel(model_directory)
Add the additional features as an attribute on the model object
- Parameters:
model_directory (str) – The path to the model directory
- predict()
Run the predictions
- cleanDataFrame()
Remove rows without numerical results
- Raises:
MLPredictionError – if the dataframe is empty after removal
- cleanModelFiles()
Clean temporary files created by the model
- checkMolecularWeights()
Check the molecular weight of the structures and add a warning property if the molecular weight is outside the range