schrodinger.application.matsci.ml_formulations_utils module¶
Classes and functions to help with ML formulation-based workflows.
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.application.matsci.ml_formulations_utils.validate_smiles(smiles)¶
A cached function that wraps the adapter.validate_smarts function to validate the SMILES string. NOTE: The ML formulations backend uses a dummy molecule ‘[1O]’ as a placeholder for empty components, so that SMILES must be rejected even though it is valid according to RDKit.
- Parameters:
smiles (str) – The SMILES string to validate
- Returns:
True if the SMILES is valid, False otherwise
- Return type:
bool
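The caching and placeholder rule described for validate_smiles can be sketched as follows. This is a minimal illustration, not the module's implementation: `_backend_is_valid` is a hypothetical stand-in for the real validator, and using `functools.lru_cache` is an assumption about how the caching might be done.

```python
from functools import lru_cache

# Placeholder the ML formulations backend uses for empty components.
PLACEHOLDER_SMILES = '[1O]'


def _backend_is_valid(smiles):
    """Hypothetical stand-in for the real validator: accepts any
    non-empty string that contains no whitespace."""
    return bool(smiles) and not any(ch.isspace() for ch in smiles)


@lru_cache(maxsize=None)
def validate_smiles_sketch(smiles):
    """Return True if the SMILES passes validation and is not the placeholder."""
    if smiles == PLACEHOLDER_SMILES:
        # Chemically valid, but reserved for empty components.
        return False
    return _backend_is_valid(smiles)
```

Because the result is cached per input string, repeated validation of the same component across many formulation rows costs only one call to the underlying validator.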
- schrodinger.application.matsci.ml_formulations_utils.add_tracking_index_to_csv(csv_file)¶
Add a tracking index to the CSV file. The index is the row number in the CSV file.
- Parameters:
csv_file (str) – The CSV file to add the tracking index to
- Returns:
The CSV file with the tracking index added
- Return type:
str
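One way to picture the tracking index is the sketch below, which uses only the standard `csv` module. The column name `tracking_index`, the 1-based numbering, and rewriting the file in place are all assumptions made for illustration; the real function may differ on each point.

```python
import csv


def add_tracking_index_sketch(csv_file, index_column='tracking_index'):
    """Rewrite csv_file with an extra first column holding the row number.

    The column name and 1-based numbering are assumptions.
    """
    with open(csv_file, newline='') as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    with open(csv_file, 'w', newline='') as handle:
        writer = csv.DictWriter(handle, fieldnames=[index_column] + fieldnames)
        writer.writeheader()
        for row_number, row in enumerate(rows, start=1):
            row[index_column] = row_number
            writer.writerow(row)
    return csv_file
```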
- schrodinger.application.matsci.ml_formulations_utils.merge_input_data_to_predicted(input_csv_file, predicted_csv_file)¶
Merge the input data into the predicted data based on the tracking index
- Parameters:
input_csv_file (str) – The input CSV file to merge
predicted_csv_file (str) – The predicted CSV file to merge
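A merge keyed on the tracking index might look like the following sketch. It works on CSV text rather than file paths to stay self-contained, and the column name `tracking_index`, the helper name, and the choice to let predicted values win on a name clash are assumptions, not documented behavior.

```python
import csv
import io


def merge_rows_by_index(input_text, predicted_text, index_column='tracking_index'):
    """Return the predicted rows extended with the matching input columns."""
    input_rows = {
        row[index_column]: row
        for row in csv.DictReader(io.StringIO(input_text))
    }
    merged = []
    for predicted in csv.DictReader(io.StringIO(predicted_text)):
        source = input_rows.get(predicted[index_column], {})
        # Predicted values win when both files share a column name.
        merged.append({**source, **predicted})
    return merged
```

Keying on the index rather than on row order means the merge still lines up if the prediction step reorders or drops rows.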
- schrodinger.application.matsci.ml_formulations_utils.check_model_version(model)¶
Get the release version stored in the release_version.txt file in the model and check if it matches the current release version.
- Parameters:
model (str) – Path to the model file
- Returns:
Whether the release versions match
- Return type:
bool
- class schrodinger.application.matsci.ml_formulations_utils.BaseCSVReader¶
Bases:
object
Base class for reading formulation CSV files
- __init__()¶
Create an instance
- setRequiredProps(required_props)¶
Set the required properties for reading the CSV.
- Parameters:
required_props (list) – The list of required properties that must be present in the CSV file. If None, no properties are required
- setGrpRequiredProps(required_props)¶
Set the required properties for reading the mixtures CSV.
- Parameters:
required_props (list) – The list of required properties that must be present in the CSV file. If None, no properties are required
- static validateComponent(component, is_mixture=False)¶
Validate that an input component is valid for the formulation. If the component is for a simple formulation, the component name must be a valid SMILES string. If the component is for a mixture, the component name must NOT be a valid SMILES string.
- Parameters:
component (str) – The component string
is_mixture (bool) – Whether the component is from a mixture (complex formulation)
- Return type:
str
- Returns:
The component string
- Raises:
ValueError – If the component is invalid
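The two-way rule for validateComponent (simple components must be valid SMILES, mixture components must not be) can be sketched as below. The `is_valid_smiles` argument is a hypothetical caller-supplied callable added so the example runs without a chemistry toolkit; the real method takes no such parameter.

```python
def validate_component_sketch(component, is_valid_smiles, is_mixture=False):
    """Return the component if it satisfies the rule, else raise ValueError.

    is_valid_smiles is a hypothetical callable returning True for valid SMILES.
    """
    valid = is_valid_smiles(component)
    if is_mixture and valid:
        raise ValueError(
            f'Mixture component {component!r} must not be a valid SMILES string')
    if not is_mixture and not valid:
        raise ValueError(
            f'Component {component!r} is not a valid SMILES string')
    return component
```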
- validateHeader(header)¶
Validate the header of the CSV file
- Parameters:
header (list) – The list of header values in the CSV file
- Raises:
ValueError – If any of the required headers are not found
- static validateGrpCSVHeader(header, required_props)¶
Validate the header of the groups CSV file
- Parameters:
header (list) – The list of header values in the CSV file
required_props (list) – The list of required properties
- Raises:
ValueError – If any of the required headers are not found
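Both header validators raise ValueError when a required header is missing. A minimal sketch of that check, assuming (as stated for setRequiredProps) that None means nothing is required; the function name and error message are illustrative only:

```python
def validate_header_sketch(header, required_props):
    """Raise ValueError naming every required property missing from header."""
    if not required_props:
        return  # None or an empty list means nothing is required
    missing = [prop for prop in required_props if prop not in header]
    if missing:
        raise ValueError('Missing required headers: ' + ', '.join(missing))
```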
- getFormulationsFromCSV(csv_reader, skip_props=None)¶
Get the formulations from the CSV reader
- Parameters:
csv_reader (csv.DictReader) – The CSV reader object
skip_props (list) – The list of properties to skip
- readCSVData(filename)¶
Read the data from the CSV file
- Parameters:
filename (str) – The filename of the CSV file
- Returns:
The list of formulations
- Return type:
list(FormulationData)
- readCSVIOData(csv_io, skip_props=None)¶
Read the data from the CSV StringIO object
- Parameters:
csv_io (io.StringIO) – The StringIO object of the CSV file
skip_props (list) – The list of properties to skip
- Returns:
The list of formulations
- Return type:
list(FormulationData)
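Reading from a StringIO object while skipping properties can be illustrated with the standard library alone. In this sketch plain dicts stand in for FormulationData, whose structure is not described here, and the function name is hypothetical.

```python
import csv
import io


def read_rows_sketch(csv_io, skip_props=None):
    """Return one dict per CSV row, leaving out any skipped columns."""
    skipped = set(skip_props or [])
    reader = csv.DictReader(csv_io)
    return [
        {key: value for key, value in row.items() if key not in skipped}
        for row in reader
    ]
```

Usage under those assumptions: `read_rows_sketch(io.StringIO(text), skip_props=['notes'])` returns the rows without the `notes` column.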
- schrodinger.application.matsci.ml_formulations_utils.read_file_from_model(model, filename, match_basename=True)¶
Get the contents of a file that is inside the model
- Parameters:
model (str) – The path to the model
filename (str) – The name of the file to get from the model
match_basename (bool) – If True, match the basename of the filename; this is useful when searching for a file in roots. If False, match the filename as part of the path; this is useful when searching for a file in a specific directory. In most testing the member name always has forward slashes, so do not use os.sep when passing a full path
- Returns:
The file contents as a StringIO object
- Return type:
StringIO
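The difference between the two match_basename modes can be sketched on a plain list of member names. The helper below is hypothetical and deliberately avoids any assumption about the model's container format; it uses `posixpath` because member names are described as always using forward slashes.

```python
import posixpath


def find_member_sketch(member_names, filename, match_basename=True):
    """Return the first member name matching filename, or None."""
    for name in member_names:
        if match_basename:
            # Compare only the last path component.
            if posixpath.basename(name) == filename:
                return name
        elif filename in name:
            # Match the filename as part of the full member path.
            return name
    return None
```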
- schrodinger.application.matsci.ml_formulations_utils.read_json_from_model(*args, **kwargs)¶
Get the Python objects from a JSON file that is inside the model
- Parameters:
model (str) – The path to the model
filename (str) – The name of the file to get from the model
match_basename (bool) – If True, match the basename of the filename; this is useful when searching for a file in roots. If False, match the filename as part of the path; this is useful when searching for a file in a specific directory. In most testing the member name always has forward slashes, so do not use os.sep when passing a full path
- Return type:
object or None
- Returns:
The contents of the JSON file converted to a Python object, or None if the file was empty or couldn’t be read
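The "object or None" return convention can be sketched as follows, starting from a text stream such as the StringIO returned by read_file_from_model. The helper name is hypothetical, and treating unparsable JSON as "couldn't be read" is an assumption.

```python
import io
import json


def load_json_or_none(text_io):
    """Parse JSON from a text stream, returning None if empty or unparsable."""
    text = text_io.read()
    if not text.strip():
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Note that a file containing the JSON literal `null` also yields None under this convention, so callers cannot distinguish it from an empty file.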
- schrodinger.application.matsci.ml_formulations_utils.read_csv_from_model(*args, **kwargs)¶
A generator for the contents of a CSV file that is inside the model
- Parameters:
model (str) – The path to the model
filename (str) – The name of the file to get from the model
match_basename (bool) – If True, match the basename of the filename; this is useful when searching for a file in roots. If False, match the filename as part of the path; this is useful when searching for a file in a specific directory. In most testing the member name always has forward slashes, so do not use os.sep when passing a full path
- Yield type:
dict
- Yields:
Keys are CSV headers, values are the CSV row value for that column. A dict is yielded for each row of the file.
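The row-by-row yielding described here matches what `csv.DictReader` provides. A minimal sketch over an in-memory stream, with a hypothetical function name:

```python
import csv
import io


def iter_csv_rows(csv_io):
    """Yield a dict for each CSV row, keyed by the header names."""
    yield from csv.DictReader(csv_io)
```

Because it is a generator, rows are parsed only as the caller iterates, which avoids holding a large file's rows in memory at once.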
- schrodinger.application.matsci.ml_formulations_utils.find_filename_in_model(model, file_ending)¶
Find the name of a file in the model based on the file ending
- Parameters:
model (str) – The path to the model
file_ending – The ending of the file name of interest
- Return type:
str or None
- Returns:
The name of the first file found with that ending, or None if no such filename was found
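The "first match or None" lookup can be sketched on a list of member names. The helper is hypothetical, and the order in which the real function visits members is not specified here.

```python
def find_name_with_ending(member_names, file_ending):
    """Return the first name that ends with file_ending, or None."""
    return next(
        (name for name in member_names if name.endswith(file_ending)), None)
```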