schrodinger.active_learning.al_utils module¶

schrodinger.active_learning.al_utils.positive_int(s)¶

ArgumentParser type function that checks whether the input can be converted to a positive integer.

Parameters: s (str) – input string
Returns: integer value of input string
Return type: int
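
A minimal sketch of how such an argparse `type` callable is commonly written. It illustrates the idea under assumed behavior (raising `argparse.ArgumentTypeError` on invalid input) and is not the module's actual implementation; the name `positive_int_sketch` and the `--num_iter` flag are hypothetical.

```python
import argparse


def positive_int_sketch(s):
    """Hypothetical stand-in: convert s to an int and require it to be > 0."""
    try:
        value = int(s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{s!r} is not an integer") from None
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{s!r} is not a positive integer")
    return value


parser = argparse.ArgumentParser()
# "--num_iter" is an illustrative flag name, not one taken from the module.
parser.add_argument("--num_iter", type=positive_int_sketch, default=1)
args = parser.parse_args(["--num_iter", "3"])
```

Passing the callable as `type=` lets argparse report invalid values as a normal usage error instead of a traceback.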

schrodinger.active_learning.al_utils.split_smi_line(line)¶

Split a line from a .smi file into a SMILES pattern and a title. Return an empty list if the line is empty.

Parameters: line (str) – line from .smi file
Returns: SMILES pattern, title
Return type: [str, str] or []
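
The described behavior can be sketched as follows. This is an assumption-laden illustration, not the module's code: it assumes the SMILES and title are separated by whitespace, and that a line without a title gets an empty-string title. The name `split_smi_line_sketch` is hypothetical.

```python
def split_smi_line_sketch(line):
    """Hypothetical stand-in: split on the first run of whitespace."""
    stripped = line.strip()
    if not stripped:
        return []
    parts = stripped.split(None, 1)
    # Assumption: a line with no title gets an empty-string title.
    if len(parts) == 1:
        parts.append("")
    return parts


example = split_smi_line_sketch("CCO ethanol\n")
```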

schrodinger.active_learning.al_utils.get_smi_header()¶

Create the header for a .smi input file. We assume the SMILES is in the first column and the title is in the second column.

Returns: header list, header index for reordering SMILES and title
Return type: list(str), list(int)

schrodinger.active_learning.al_utils.get_csv_header(filename, smi_index, name_index, delimiter=',', with_header=True)¶

Create the header for a .csv input file. The reordered index puts SMILES in the first column and the title in the second column.

Parameters

filename (str) – .csv input file
smi_index (int) – column index of molecule SMILES
name_index (int) – column index of molecule name
delimiter (str) – delimiter of input csv files
with_header (bool) – Whether the file has header in its first line

Returns

header list, header index for reordering SMILES and title

Return type

list(str), list(int)

schrodinger.active_learning.al_utils.my_csv_reader(filename)¶

Yield a csv reader that skips the first line.

Parameters: filename (str) – .csv file name
Returns: csv.reader that skips first line of the file.
Return type: iterator
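
One way to write a reader with this behavior is a generator that discards the first row before yielding the rest. The sketch below uses only the standard `csv` module; the function name `csv_reader_skipping_first_line` and the file contents are hypothetical, and the real function may differ (for example in how it handles compressed files).

```python
import csv
import os
import tempfile


def csv_reader_skipping_first_line(filename):
    """Hypothetical stand-in: yield rows of a .csv file after its first line."""
    with open(filename, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # discard the first line; no error on an empty file
        yield from reader


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ligands.csv")
    with open(path, "w", newline="") as handle:
        handle.write("SMILES,title\nCCO,ethanol\nCC,ethane\n")
    # Consume the generator while the temporary file still exists.
    rows = list(csv_reader_skipping_first_line(path))
```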

schrodinger.active_learning.al_utils.read_score(score_file)¶

Read known scores of ligands from score_file.

Returns: a dictionary that maps ligand title to ligand score.
Return type: dict

schrodinger.active_learning.al_utils.random_filtering(file_list, output_name, probability, random_seed=None, with_header=True)¶

Randomly select lines from entries in the file_list based on probability. The aim is to have a light-weight and ultrafast function to generate a subset for pilot runs.

Parameters

file_list (list) – paths of input files.
output_name (str) – name of the output file.
probability (float) – probability of randomly selecting a line.
random_seed (int or None) – random seed number for shuffling the ligands
with_header (bool) – Whether input file(s) has header in its first line.
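
Probability-based filtering of this kind amounts to an independent coin flip per line, which needs only one pass and constant memory. The sketch below shows the idea on an in-memory list; it is an illustration under that assumption rather than the module's implementation, and `filter_lines_sketch` is a hypothetical name.

```python
import random


def filter_lines_sketch(lines, probability, random_seed=None):
    """Hypothetical stand-in: keep each line independently with the given probability."""
    rng = random.Random(random_seed)
    # rng.random() is uniform on [0, 1), so the test passes with chance `probability`.
    return [line for line in lines if rng.random() < probability]


lines = [f"ligand_{i}" for i in range(1000)]
subset = filter_lines_sketch(lines, probability=0.1, random_seed=42)
```

The size of the result is random (about `probability` times the number of lines), in contrast to sampling a fixed number of entries.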

schrodinger.active_learning.al_utils.get_smiles_from_al_csv_or_smi_line(line)¶

Get the SMILES from a line of a .smi file or of an active learning .csv file with SMILES in the first column.

schrodinger.active_learning.al_utils.reservoir_sampling(file_list, output_name, excluding_smiles=None, sample_size=100000, random_seed=None, with_header=True)¶

Randomly select sample_size ligands from the entries in the file_list. The aim is to have a light-weight and ultrafast function to generate a subset for pilot runs.

Parameters

file_list (list) – paths of input files.
output_name (str) – name of the output file.
excluding_smiles (container) – list of smiles to skip in the sampling.
sample_size (int) – number of ligands to sample.
random_seed (int or None) – random seed number for shuffling the ligands
with_header (bool) – Whether input file(s) has header in its first line.
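
The function name suggests the classic reservoir sampling technique, which draws a fixed-size uniform sample in one pass without knowing the total count in advance. The sketch below implements the standard "Algorithm R" on an in-memory iterable; whether the module uses exactly this variant is an assumption, and `reservoir_sample_sketch` and its arguments are hypothetical names.

```python
import random


def reservoir_sample_sketch(items, sample_size, excluded=None, random_seed=None):
    """Hypothetical stand-in: uniform sample of sample_size items (Algorithm R)."""
    rng = random.Random(random_seed)
    excluded = set(excluded or ())
    reservoir = []
    seen = 0  # number of eligible (non-excluded) items seen so far
    for item in items:
        if item in excluded:
            continue
        seen += 1
        if len(reservoir) < sample_size:
            reservoir.append(item)
        else:
            # Keep the new item with probability sample_size / seen.
            slot = rng.randrange(seen)
            if slot < sample_size:
                reservoir[slot] = item
    return reservoir


sample = reservoir_sample_sketch(range(1000), 10, excluded={0, 1, 2}, random_seed=1)
```

If fewer than `sample_size` eligible items exist, the sketch simply returns all of them.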

schrodinger.active_learning.al_utils.random_split(file_list, num_ligands, prefix='splited', block_size=100000, name_index=0, smi_index=1, random_seed=None, delimiter=',', with_header=True)¶

Combine the input files, shuffle the lines, and split them into files with block_size lines per file. Reorder the columns so that SMILES and name are in the first and second columns, respectively.

Parameters

file_list (list) – paths of input files.
num_ligands (int) – total number of ligands in all the input files.
prefix (str) – prefix of split files
block_size (int) – number of ligands in each sub .csv file.
name_index (int) – column index of molecule name
smi_index (int) – column index of molecule SMILES
random_seed (int or None) – random seed number for shuffling the ligands
delimiter (str) – delimiter of input csv files
with_header (bool) – Whether input file(s) has header in its first line.

Returns

list of split files, reordered csv header

Return type

list, list
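
The two steps described above (column reordering, then shuffle-and-split) can be sketched on in-memory rows as follows. This is only an illustration: the helper names are hypothetical, the real function works on files, and the assumption that the remaining columns keep their original order is not confirmed by the documentation.

```python
import random


def reorder_row_sketch(row, smi_index, name_index):
    """Hypothetical helper: put SMILES first and name second."""
    # Assumption: all other columns follow in their original order.
    rest = [v for i, v in enumerate(row) if i not in (smi_index, name_index)]
    return [row[smi_index], row[name_index]] + rest


def shuffle_and_split_sketch(rows, block_size, random_seed=None):
    """Hypothetical helper: shuffle rows, then cut them into blocks of block_size."""
    rng = random.Random(random_seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i:i + block_size] for i in range(0, len(shuffled), block_size)]


raw_rows = [[f"lig{i}", "C" * (i + 1), "extra"] for i in range(5)]
reordered = [reorder_row_sketch(r, smi_index=1, name_index=0) for r in raw_rows]
blocks = shuffle_and_split_sketch(reordered, block_size=2, random_seed=7)
```

With five rows and a block size of two, the last block holds the single leftover row.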

schrodinger.active_learning.al_utils.convert_tar_gz_to_qzip_model(tar_gz_model, qzip_model, job_args_json)¶

Convert .tar.gz ligand_ml model to .qzip model.

Parameters

tar_gz_model (str) – input .tar.gz ligand_ml model file.
qzip_model – output .qzip model file.
job_args_json (str) – file that includes the arguments needed for deepautoqsar.

schrodinger.active_learning.al_utils.convert_qzip_to_tar_gz_model(qzip_model)¶

Convert .qzip deepautoqsar model to .tar.gz ligand_ml model.

Parameters: qzip_model (str) – .qzip deepautoqsar model filename.
Returns: .tar.gz ligand_ml model filename.
Return type: str

schrodinger.active_learning.al_utils.get_file_ext(filename)¶

Get the extension of the file name. Skip ‘gz’ if it is a gz compressed file.

Parameters: filename (str) – name of the file.
Returns: extension of the file, excluding ‘gz’.
Return type: str
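
A rough sketch of the described behavior using `os.path.splitext`. Whether the returned extension includes the leading dot is not stated in the documentation; the sketch keeps the dot, which is an assumption, and `file_ext_sketch` is a hypothetical name.

```python
import os


def file_ext_sketch(filename):
    """Hypothetical stand-in: extension of filename, looking past a trailing .gz."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == ".gz":
        # Look past the compression suffix to the underlying format.
        ext = os.path.splitext(root)[1]
    return ext


ext_compressed = file_ext_sketch("ligands.csv.gz")
ext_plain = file_ext_sketch("ligands.smi")
```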

schrodinger.active_learning.al_utils.check_driver_disk_space(active_learning_job)¶

Estimate the driver disk usage of an active learning job with some assumed parameters. Print a warning if the available driver disk space is smaller than the estimated usage.

Parameters: active_learning_job (ActiveLearningJob instance.) – current AL driver.

schrodinger.active_learning.al_utils.node_run_timer(func)¶

Decorator for timing the running time of runNode method in ActiveLearningNode
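
A timing decorator of this kind is typically built with `functools.wraps` and `time.perf_counter`. The sketch below is a generic illustration, not the module's code: `run_timer_sketch` and `DemoNode` are hypothetical, and reporting via `print` is an assumption (the real decorator may log elsewhere).

```python
import functools
import time


def run_timer_sketch(func):
    """Hypothetical stand-in: report how long the wrapped call took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Report the elapsed time even if the wrapped call raises.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} finished in {elapsed:.3f} s")
    return wrapper


class DemoNode:
    """Hypothetical node class used only to exercise the decorator."""

    @run_timer_sketch
    def runNode(self):
        return "done"


result = DemoNode().runNode()
```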

schrodinger.active_learning.al_utils.add_output_file(*output_files, incorporate=False)¶

Add files to jobcontrol output files.

Parameters

output_files (str) – files to be transferred.
incorporate (bool) – whether to mark the files for incorporation by Maestro.

schrodinger.active_learning.al_utils.add_input_file(jsb, *input_files)¶

Check the existence of input file(s). Add it as jobcontrol input file if it exists, otherwise exit with error.

Parameters

jsb (launchapi.JobSpecificationArgsBuilder) – job specification builder
input_files (str) – input file(s) to be added.

schrodinger.active_learning.al_utils.concatenate_logs(combined_logfile, subjob_logfile_list, logger=None)¶

Combine subjob logfiles into single combined logfile.

Parameters

combined_logfile (str) – combined log file name
subjob_logfile_list (list(str)) – list of subjob logfile names to be combined.
logger (Logger or None) – logger for receiving the info and error message.

schrodinger.active_learning.al_utils.get_host_ncpu()¶

Return the host and the number of CPUs that should be used to submit subjobs. This function works whether or not it is running under job control.

Return type: tuple[str, int]

schrodinger.active_learning.al_utils.is_hostname_valid(hostname)¶

Check whether the hostname is defined in the host file.

Parameters: hostname (str) – the hostname to check against
Returns: Whether the hostname is defined in the host file.
Return type: bool

schrodinger.active_learning.al_utils.validate_input_files(input_files, remote_input_ligands=False, allowed_format=None)¶

Check the existence and format of input files. Return error message if validation failed, otherwise return None.

Parameters

input_files (list(str)) – paths of input files.
remote_input_ligands (bool) – Whether the input ligand files are located on a remote host.
allowed_format (list or None) – allowed input file formats.

Returns

error message if validation failed; None if it passed

Return type

str or None

schrodinger.active_learning.al_utils.validate_input_mae(input_files, max_check=10)¶

Validate structures in input .mae/.maegz file(s).

Parameters

input_files (list(str)) – list of path(s) of input .mae/.maegz file(s)
max_check (int) – maximum number of structures to validate.

Returns

error message if validation fails. None if validation passes.

Return type

str or None

schrodinger.active_learning.al_utils.validate_input_smiles(input_files, smi_index, name_index, with_header=True, max_check=10)¶

Validate SMILES in input files.

Parameters

input_files (list(str)) – paths of input files.
smi_index (int) – column index of molecule SMILES
name_index (int) – column index of molecule name
with_header (bool) – Whether the file has header in its first line
max_check (int) – maximum number of SMILES to validate.

Returns

error message if validation failed; None if it passed

Return type

str or None

schrodinger.active_learning.al_utils.store_mae_to_db(db_filename, mae_file_list)¶

Store the structures from .mae files in a sqlite3 database.

Parameters

db_filename (str) – path of the sqlite3 database
mae_file_list (list(str)) – list of .mae files that contain the structures to be stored to the database

schrodinger.active_learning.al_utils.write_st_from_db_by_smiles(db_filename, out_mae_file, smi_list, chunk_size=500)¶

Extract the ligands’ structures from the database. Write the structures to the output .mae file.

Parameters

db_filename (str) – path of the sqlite3 database containing ligands’ structure
out_mae_file (str) – path of the output .mae/.maegz file
smi_list (list(str)) – list of input ligands’ SMILES
chunk_size (int) – number of SMILES in each query
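
The chunk_size parameter implies that the SMILES list is queried in batches. One plausible reason is that SQLite caps the number of bound parameters per statement, though the documentation does not say so. The sketch below shows chunked `IN (...)` queries with the standard `sqlite3` module; the table name `structures`, its columns, and the function names are all hypothetical, since the real database schema is not documented here.

```python
import sqlite3


def chunked(items, chunk_size):
    """Yield consecutive slices of items with at most chunk_size elements."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def fetch_by_smiles_sketch(connection, smi_list, chunk_size=500):
    """Hypothetical stand-in: look up rows by SMILES, one chunk per query."""
    rows = []
    for chunk in chunked(smi_list, chunk_size):
        placeholders = ",".join("?" * len(chunk))
        # Hypothetical schema: table "structures" with columns (smiles, payload).
        query = f"SELECT smiles, payload FROM structures WHERE smiles IN ({placeholders})"
        rows.extend(connection.execute(query, chunk).fetchall())
    return rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE structures (smiles TEXT PRIMARY KEY, payload TEXT)")
connection.executemany(
    "INSERT INTO structures VALUES (?, ?)",
    [("CCO", "ethanol"), ("CC", "ethane"), ("C", "methane")],
)
found = fetch_by_smiles_sketch(connection, ["C", "CCO", "CCCC"], chunk_size=2)
```

SMILES that are absent from the database (here "CCCC") simply produce no row.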

schrodinger.active_learning.al_utils.add_file_to_aljob_restart_dict(active_learning_job, optional_restart_file, jobname)¶

Add a file to the optional_restart_files_dict of the current active learning job. Only register the file with jobcontrol if active_learning_job is None.

Parameters

active_learning_job (ActiveLearningJob instance or None.) – current AL driver.
optional_restart_file (str) – path of the file to be added
jobname (str) – key of the list that contains the optional_restart_file

schrodinger.active_learning.al_utils.read_scored_ligands(scored_csv_file_list)¶

Read the ligands that were already scored by ScoreProviderNode.

Parameters: scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.
Returns: set of SMILES of the scored ligands.
Return type: set(str)

schrodinger.active_learning.al_utils.count_ligands(file_list, with_header=True)¶

Count the number of ligands in all the files by counting the total number of lines. We assume each line contains a SMILES string.

Parameters

file_list (list(str)) – list of input file paths.
with_header (bool) – Whether the input files have header.

Returns

Number of ligands in all the input files.

Return type

int
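
Line-based counting as described can be sketched like this. The sketch assumes uncompressed text files and subtracts one line per file when with_header is true; `count_ligands_sketch` is a hypothetical name and the file contents are illustrative only.

```python
import os
import tempfile


def count_ligands_sketch(file_list, with_header=True):
    """Hypothetical stand-in: count data lines across all files."""
    total = 0
    for path in file_list:
        with open(path) as handle:
            n_lines = sum(1 for _ in handle)
        if with_header and n_lines > 0:
            n_lines -= 1  # do not count the header line
        total += n_lines
    return total


with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for name in ("a.csv", "b.csv"):
        path = os.path.join(tmp_dir, name)
        with open(path, "w") as handle:
            handle.write("SMILES,title\nCCO,ethanol\nCC,ethane\n")
        paths.append(path)
    n_with_header = count_ligands_sketch(paths, with_header=True)
    n_without_header = count_ligands_sketch(paths, with_header=False)
```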

schrodinger.active_learning.al_utils.convert_csv_to_smi(csv_file, smi_file)¶

Convert a .csv ligand file to a .smi file. The SMILES and Title should be in the first and second columns of the .csv file respectively.

Parameters

csv_file (str) – path of the .csv file to be converted
smi_file (str) – path of the output .smi file
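
A sketch of such a conversion with the standard `csv` module. Several details are assumptions not confirmed by the documentation: that the .csv has a header line to skip (exposed here as a hypothetical `with_header` argument), that the .smi line format is "SMILES, space, title", and that columns beyond the second are dropped. The name `csv_to_smi_sketch` is hypothetical.

```python
import csv
import os
import tempfile


def csv_to_smi_sketch(csv_file, smi_file, with_header=True):
    """Hypothetical stand-in: write 'SMILES title' lines from the first two columns."""
    with open(csv_file, newline="") as src, open(smi_file, "w") as dst:
        reader = csv.reader(src)
        if with_header:
            next(reader, None)  # skip the header line if present
        for row in reader:
            if len(row) < 2:
                continue  # assumption: skip rows lacking a SMILES or a title
            dst.write(f"{row[0]} {row[1]}\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "ligands.csv")
    smi_path = os.path.join(tmp_dir, "ligands.smi")
    with open(csv_path, "w", newline="") as handle:
        handle.write("SMILES,Title,source\nCCO,ethanol,demo\nCC,ethane,demo\n")
    csv_to_smi_sketch(csv_path, smi_path)
    with open(smi_path) as handle:
        smi_text = handle.read()
```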

schrodinger.active_learning.al_utils.split_lig(lig_filename, output_prefix, batch_size)¶

Split the structures in a .mae file into batches.

Parameters

lig_filename (str) – path of the .mae file to be split
output_prefix (str) – prefix of the split output ligand files
batch_size (int) – number of ligands in each split .mae file

Returns

list of batched .mae files

Return type

list(str)

schrodinger.active_learning.al_utils.get_allowed_ncpu(user_specified_ncpu)¶

Return the number of allowed CPUs for a job.

Parameters: user_specified_ncpu (int or None) – user specified maximum number of CPUs
Returns: number of allowed CPUs
Return type: int

schrodinger.active_learning.al_utils.generate_mae_file_with_unique_title(input_mae_file, output_mae_file)¶

Convert the input .mae file to output .mae file that contains unique titles.

Parameters

input_mae_file (str) – path of input .mae file
output_mae_file (str) – path of output .mae file containing ligands with unique title

schrodinger.active_learning.al_utils.read_all_st_from_file(st_file)¶

Return all the structures in a file as a list.

Parameters: st_file (str) – path of the input file
Returns: list of structures in the file
Return type: list(structure.Structure)

schrodinger.active_learning.al_utils.my_file_exists(filename)¶

A version of os.path.isfile. Returns None if input is None instead of raising an error.
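
The None-tolerant wrapper described here can be sketched in a few lines. `os.path.isfile(None)` raises a `TypeError`, which is the error the wrapper avoids; the name `file_exists_sketch` is hypothetical and the real function may differ in detail.

```python
import os
import tempfile


def file_exists_sketch(filename):
    """Hypothetical stand-in: os.path.isfile that passes None through."""
    if filename is None:
        return None
    return os.path.isfile(filename)


with tempfile.NamedTemporaryFile() as handle:
    exists_for_real_file = file_exists_sketch(handle.name)
exists_for_none = file_exists_sketch(None)
```

This is convenient for optional command-line arguments whose default is None.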

schrodinger.active_learning.al_utils.default_args(v: str, script_name: str) → dict¶

Return the default arguments for the given script. If None is passed as script_name, use AL-Glide default arguments.

schrodinger.active_learning.al_utils.configure_mq_run(args: argparse.Namespace) → argparse.Namespace¶

schrodinger.active_learning.al_utils.package_TGC_models(tar_model_file, make_tarball=True)¶

Select the TorchGraphConv models and save them to a new folder.

Parameters

tar_model_file – directory containing original model
make_tarball – if True also create a tarball outfile.tar.gz

schrodinger.active_learning.al_utils.check_models(active_learning_job, tar_model_file)¶

Check if the model contains a TorchGraphConv model. If not, package only the TorchGraphConv models as a new tarball.

Parameters

active_learning_job (ActiveLearningJob instance.) – current active learning job.
tar_model_file (str) – trained tar.gz model file

Returns

None if the model contains a TorchGraphConv model; the packaged “slim” model tarball if packaging was needed. Otherwise a ValueError is raised.