schrodinger.application.matsci.ml_formulations_utils module

Classes and functions to help with ML formulation-based workflows.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.application.matsci.ml_formulations_utils.validate_smiles(smiles)

A cached function that wraps the adapter.validate_smarts function to validate the SMILES string. NOTE: The ML formulations backend uses a dummy molecule ‘[1O]’ as a placeholder for empty components. That SMILES is rejected even though RDKit considers it valid.

Parameters:

smiles (str) – The SMILES string to validate

Returns:

True if the SMILES string is valid, False otherwise

Return type:

bool
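
The caching and placeholder rejection described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the names PLACEHOLDER_SMILES, looks_like_smiles and validate_smiles_sketch are hypothetical, and looks_like_smiles is only a crude stand-in for the real chemistry-aware validator.

```python
import functools

# Placeholder the documentation says the backend uses for empty components.
PLACEHOLDER_SMILES = '[1O]'


def looks_like_smiles(smiles):
    """Hypothetical stand-in for a real validator.

    Only checks that the string is non-empty, has no spaces, and has
    balanced brackets. It does NOT perform real chemical validation.
    """
    if not smiles or ' ' in smiles:
        return False
    return (smiles.count('(') == smiles.count(')') and
            smiles.count('[') == smiles.count(']'))


@functools.lru_cache(maxsize=None)
def validate_smiles_sketch(smiles):
    """Return True for an acceptable SMILES string, rejecting the placeholder."""
    if smiles == PLACEHOLDER_SMILES:
        # Syntactically fine, but reserved for empty components.
        return False
    return looks_like_smiles(smiles)
```

Because the result is cached per input string, repeated validation of the same component across many formulation rows costs a single check.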

schrodinger.application.matsci.ml_formulations_utils.add_tracking_index_to_csv(csv_file)

Add a tracking index to the CSV file. The tracking index is the row number in the CSV file.

Parameters:

csv_file (str) – The CSV file to add the tracking index to

Returns:

The CSV file with the tracking index added

Return type:

str
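
The idea of a row-number tracking index can be sketched with the standard csv module. This is an illustration under stated assumptions, not the module's code: it works on CSV text rather than a file path, the column name tracking_index is hypothetical, and numbering from 1 is an assumption.

```python
import csv
import io

TRACKING_COLUMN = 'tracking_index'  # hypothetical column name


def add_tracking_index_to_text(csv_text):
    """Return CSV text with a row-number column appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [TRACKING_COLUMN]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    # Number data rows starting at 1 (an assumption for this sketch).
    for row_number, row in enumerate(reader, start=1):
        row[TRACKING_COLUMN] = row_number
        writer.writerow(row)
    return out.getvalue()
```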

schrodinger.application.matsci.ml_formulations_utils.merge_input_data_to_predicted(input_csv_file, predicted_csv_file)

Merge the input data into the predicted data based on the tracking index

Parameters:
  • input_csv_file (str) – The input CSV file to merge

  • predicted_csv_file (str) – The predicted CSV file to merge
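
A merge keyed on a tracking index can be sketched as below. This is a hedged illustration only: it operates on CSV text instead of file paths, returns a list of dicts instead of writing a file, and assumes a hypothetical column named tracking_index in both inputs. Letting predicted values win on a column name clash is also an assumption.

```python
import csv
import io

TRACKING_COLUMN = 'tracking_index'  # hypothetical column name


def merge_on_tracking_index(input_text, predicted_text):
    """Return predicted rows extended with the matching input columns."""
    input_rows = {
        row[TRACKING_COLUMN]: row
        for row in csv.DictReader(io.StringIO(input_text))
    }
    merged = []
    for predicted in csv.DictReader(io.StringIO(predicted_text)):
        # Start from the input row (if any), then overlay predicted values.
        combined = dict(input_rows.get(predicted[TRACKING_COLUMN], {}))
        combined.update(predicted)
        merged.append(combined)
    return merged
```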

schrodinger.application.matsci.ml_formulations_utils.check_model_version(model)

Get the release version stored in the release_version.txt file in the model and check if it matches the current release version.

Parameters:

model (str) – Path to the model file

Returns:

Whether the release versions match

Return type:

bool

class schrodinger.application.matsci.ml_formulations_utils.BaseCSVReader

Bases: object

Base class for reading formulation CSV files

__init__()

Create an instance

setRequiredProps(required_props)

Set the required properties for reading the CSV.

Parameters:

required_props (list) – The list of required properties that must be present in the CSV file. If None, no properties are required

setGrpRequiredProps(required_props)

Set the required properties for reading the mixtures CSV.

Parameters:

required_props (list) – The list of required properties that must be present in the CSV file. If None, no properties are required

static validateComponent(component, is_mixture=False)

Validate that an input component is valid for the formulation. If the component is for a simple formulation, the component name must be a valid SMILES string. If the component is for a mixture, the component name must NOT be a valid SMILES string.

Parameters:
  • component (str) – The component string

  • is_mixture (bool) – Whether the component is from a mixture (complex formulation)

Return type:

str

Returns:

The component string

Raises:

ValueError – If the component is invalid
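
The rule stated above (simple formulations require a valid SMILES, mixtures require a name that is not a valid SMILES) can be sketched as follows. The function name and the is_valid_smiles argument are hypothetical; the real method is a static method that relies on the module's own validation.

```python
def validate_component_sketch(component, is_valid_smiles, is_mixture=False):
    """Return the component if it satisfies the rule, else raise ValueError.

    is_valid_smiles is a caller-supplied predicate (hypothetical here).
    """
    valid = is_valid_smiles(component)
    if is_mixture and valid:
        raise ValueError(
            f'Mixture component "{component}" must not be a valid SMILES')
    if not is_mixture and not valid:
        raise ValueError(
            f'Component "{component}" must be a valid SMILES')
    return component
```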

validateHeader(header)

Validate the header of the csv file

Parameters:

header (list) – The list of header values in the csv file

Raises:

ValueError – If any of the required headers are not found
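
A header check of this kind can be sketched as below. This is an illustrative stand-alone function with a hypothetical name; the real method reads the required properties from the instance rather than taking them as an argument.

```python
def validate_header_sketch(header, required_props):
    """Raise ValueError if any required property is missing from header."""
    if required_props is None:
        # No requirements were set, so any header is acceptable.
        return
    missing = [prop for prop in required_props if prop not in header]
    if missing:
        raise ValueError('Missing required headers: ' + ', '.join(missing))
```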

static validateGrpCSVHeader(header, required_props)

Validate the header of the groups CSV file

Parameters:
  • header (list) – The list of header values in the csv file

  • required_props (list) – The list of required properties

Raises:

ValueError – If any of the required headers are not found

getFormulationsFromCSV(csv_reader, skip_props=None)

Get the formulations from the CSV reader

Parameters:
  • csv_reader (csv.DictReader) – The csv reader object

  • skip_props (list) – The list of properties to skip

readCSVData(filename)

Read the data from the CSV file

Parameters:

filename (str) – The filename of the CSV file

Returns:

The list of formulations

Return type:

list(FormulationData)

readCSVIOData(csv_io, skip_props=None)

Read the data from the CSV StringIO object

Parameters:
  • csv_io (io.StringIO) – The StringIO object of the CSV file

  • skip_props (list) – The list of properties to skip

Returns:

The list of formulations

Return type:

list(FormulationData)

schrodinger.application.matsci.ml_formulations_utils.read_file_from_model(model, filename, match_basename=True)

Get the contents of a file that is inside the model

Parameters:
  • model (str) – The path to the model

  • filename (str) – The name of the file to get from the model

  • match_basename (bool) – If True, match the basename of the filename; this is useful when searching for a file in roots. If False, match the filename as part of the path; this is useful when searching for a file in a specific directory. In testing, member names have always used forward slashes, so do not use os.sep when passing a full path

Returns:

The file contents as a StringIO object

Return type:

StringIO
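
The two matching modes can be sketched as follows. This is only an illustration: it assumes the model is a zip archive, which this page does not state, and the function name is hypothetical. Returning None when nothing matches is also an assumption.

```python
import io
import posixpath
import zipfile


def read_member_text(archive_path, filename, match_basename=True):
    """Return the first matching member as a StringIO, or None."""
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if match_basename:
                # Member names use forward slashes, so use posixpath.
                matched = posixpath.basename(name) == filename
            else:
                matched = filename in name
            if matched:
                return io.StringIO(archive.read(name).decode('utf-8'))
    return None
```

Using posixpath instead of os.path keeps the comparison independent of the platform separator, in line with the note about forward slashes.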

schrodinger.application.matsci.ml_formulations_utils.read_json_from_model(*args, **kwargs)

Get the Python objects from a json file that is inside the model

Parameters:
  • model (str) – The path to the model

  • filename (str) – The name of the file to get from the model

  • match_basename (bool) – If True, match the basename of the filename; this is useful when searching for a file in roots. If False, match the filename as part of the path; this is useful when searching for a file in a specific directory. In testing, member names have always used forward slashes, so do not use os.sep when passing a full path

Return type:

object or None

Returns:

The contents of the json file converted to a Python object, or None if the file was empty or couldn’t be read

schrodinger.application.matsci.ml_formulations_utils.read_csv_from_model(*args, **kwargs)

A generator for the contents of a csv file that is inside the model

Parameters:
  • model (str) – The path to the model

  • filename (str) – The name of the file to get from the model

  • match_basename (bool) – If True, match the basename of the filename; this is useful when searching for a file in roots. If False, match the filename as part of the path; this is useful when searching for a file in a specific directory. In testing, member names have always used forward slashes, so do not use os.sep when passing a full path

Yield type:

dict

Yields:

Keys are CSV headers, values are the CSV row value for that column. A dict is yielded for each row of the file.
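
The row-by-row yielding described above can be sketched with csv.DictReader. This is a minimal illustration with a hypothetical function name; it takes an already opened text stream instead of locating the file inside a model.

```python
import csv
import io


def iter_csv_rows(csv_io):
    """Yield one dict per CSV row, keyed by the header names."""
    for row in csv.DictReader(csv_io):
        yield dict(row)
```

Since this is a generator, rows are parsed only as the caller iterates, so a large file does not need to be held in memory as a list.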

schrodinger.application.matsci.ml_formulations_utils.find_filename_in_model(model, file_ending)

Find the name of a file in the model based on the file ending

Parameters:
  • model (str) – The path to the model

  • file_ending – The ending of the file name of interest

Return type:

str or None

Returns:

The name of the first file found with that ending, or None if no such filename was found
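
The first-match lookup can be sketched as below. This is a hypothetical helper that works on a plain list of member names; how the real function enumerates the files in a model is not described here.

```python
def find_first_with_ending(member_names, file_ending):
    """Return the first name ending with file_ending, or None."""
    for name in member_names:
        if name.endswith(file_ending):
            return name
    return None
```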