schrodinger.active_learning.al_utils module¶
- schrodinger.active_learning.al_utils.positive_int(s)¶
ArgumentParser type function that checks whether the input can be converted to a positive integer.
- Parameters
s (str) – input string
- Returns
integer value of input string
- Return type
int
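A function of this kind is usually passed as the `type` argument of `ArgumentParser.add_argument`. The sketch below is not the module's implementation: `positive_int_sketch` and the `--num_iter` option are hypothetical names, and raising `argparse.ArgumentTypeError` on invalid input is an assumption about the behavior.

```python
import argparse


def positive_int_sketch(s):
    """Hypothetical stand-in: convert s to int and require a value > 0."""
    try:
        value = int(s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{s!r} is not an integer")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{s!r} is not a positive integer")
    return value


parser = argparse.ArgumentParser()
# "--num_iter" is an illustrative option name, not one defined by the module.
parser.add_argument("--num_iter", type=positive_int_sketch, default=1)
args = parser.parse_args(["--num_iter", "5"])
```

With this wiring, argparse reports a usage error for values such as `0`, `-2`, or `abc` instead of passing them on to the script.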
- schrodinger.active_learning.al_utils.split_smi_line(line)¶
Split a line from a .smi file into SMILES pattern and title. Return an empty list if the line is empty.
- Parameters
line (str) – line from .smi file
- Returns
SMILES pattern, title
- Return type
[str, str] or []
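The documented return shape can be illustrated with a small stand-alone sketch. It assumes the SMILES and title are separated by whitespace and that a missing title becomes an empty string; neither detail is confirmed by the documentation, and `split_smi_line_sketch` is a hypothetical name.

```python
def split_smi_line_sketch(line):
    """Hypothetical stand-in: return [smiles, title], or [] for a blank line."""
    stripped = line.strip()
    if not stripped:
        return []
    # Split once on whitespace so that titles containing spaces stay intact.
    parts = stripped.split(None, 1)
    smiles = parts[0]
    title = parts[1] if len(parts) > 1 else ""  # assumption: missing title -> ""
    return [smiles, title]
```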
- schrodinger.active_learning.al_utils.get_smi_header()¶
Create header for .smi input file. We assume the SMILES is in the first column and the title is in the second column.
- Returns
header list, header index for reordering SMILES and title
- Return type
list(str), list(int)
- schrodinger.active_learning.al_utils.get_csv_header(filename, smi_index, name_index, delimiter=',', with_header=True)¶
Create header for .csv input file. The reordered index puts SMILES in the first column and title in the second column.
- Parameters
filename (str) – .csv input file
smi_index (int) – column index of molecule SMILES
name_index (int) – column index of molecule name
delimiter (str) – delimiter of input csv files
with_header (bool) – Whether the file has header in its first line
- Returns
header list, header index for reordering SMILES and title
- Return type
list(str), list(int)
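The reordering index can be pictured as a permutation of column positions. The following is a minimal sketch, assuming the remaining columns keep their original relative order (the documentation does not say how they are ordered); `reorder_index` and the example header are hypothetical.

```python
def reorder_index(num_columns, smi_index, name_index):
    """Return column positions with SMILES first and name second."""
    rest = [i for i in range(num_columns) if i not in (smi_index, name_index)]
    return [smi_index, name_index] + rest


header = ["name", "smiles", "score"]  # illustrative header, not from the module
order = reorder_index(len(header), smi_index=1, name_index=0)
reordered_header = [header[i] for i in order]
```

Applying the same `order` list to every data row keeps the rows aligned with the reordered header.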
- schrodinger.active_learning.al_utils.my_csv_reader(filename)¶
Yield a csv reader that skips the first line.
- Parameters
filename (str) – .csv file name
- Returns
csv.reader that skips first line of the file.
- Return type
iterator
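A header-skipping reader can be built from the standard `csv` module. This is a sketch of the idea, not the module's code; `csv_reader_skipping_header` is a hypothetical name, and the default comma delimiter is assumed.

```python
import csv


def csv_reader_skipping_header(filename):
    """Hypothetical stand-in: yield rows of a .csv file after the first line."""
    with open(filename, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # discard the header; no error on an empty file
        yield from reader
```

Because it is a generator, the file stays open only while rows are being consumed.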
- schrodinger.active_learning.al_utils.read_score(score_file)¶
Read known scores of ligands from score_file.
- Returns
a dictionary that maps ligand title to ligand score.
- Return type
dict
- schrodinger.active_learning.al_utils.random_filtering(file_list, output_name, probability, random_seed=None, with_header=True)¶
Randomly select lines from entries in the file_list based on probability. The aim is to have a light-weight and ultrafast function to generate a subset for pilot runs.
- Parameters
file_list (list) – paths of input files.
output_name (str) – name of the output file.
probability (float) – probability of randomly selecting a line.
random_seed (int or None) – random seed number for shuffling the ligands
with_header (bool) – Whether input file(s) has header in its first line.
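Selecting lines "based on probability" amounts to an independent keep-or-drop decision per line. The sketch below shows that idea on an in-memory list; it is an assumption about the approach, not the module's implementation, and `bernoulli_filter` is a hypothetical name.

```python
import random


def bernoulli_filter(lines, probability, random_seed=None):
    """Keep each line independently with the given probability."""
    rng = random.Random(random_seed)
    # rng.random() is uniform on [0, 1), so the test passes with the
    # requested probability for every line.
    return [line for line in lines if rng.random() < probability]


ligands = [f"ligand_{i}" for i in range(1000)]
subset = bernoulli_filter(ligands, probability=0.1, random_seed=42)
```

The size of the subset is only approximately `probability` times the number of input lines; when an exact sample size is needed, a fixed-size method such as reservoir sampling is the better fit.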
- schrodinger.active_learning.al_utils.get_smiles_from_al_csv_or_smi_line(line)¶
Get the SMILES from a line of a .smi file or of an active learning .csv file with SMILES in the first column.
- schrodinger.active_learning.al_utils.reservoir_sampling(file_list, output_name, excluding_smiles=None, sample_size=100000, random_seed=None, with_header=True)¶
Randomly select sample_size ligands from entries in the file_list. The aim is to have a light-weight and ultrafast function to generate a subset for pilot runs.
- Parameters
file_list (list) – paths of input files.
output_name (str) – name of the output file.
excluding_smiles (container) – list of smiles to skip in the sampling.
sample_size (int) – number of ligands to sample.
random_seed (int or None) – random seed number for shuffling the ligands
with_header (bool) – Whether input file(s) has header in its first line.
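Reservoir sampling draws a fixed-size uniform sample in a single pass without knowing the total count in advance. The sketch below uses the classic "Algorithm R" on an in-memory iterable; whether the module uses this exact variant is not stated, and `reservoir_sample` and its `excluding` argument are hypothetical names modeled on the documented parameters.

```python
import random


def reservoir_sample(items, sample_size, excluding=None, random_seed=None):
    """Uniformly sample up to sample_size items in one pass (Algorithm R)."""
    rng = random.Random(random_seed)
    excluding = set(excluding or ())
    reservoir = []
    seen = 0  # number of eligible items encountered so far
    for item in items:
        if item in excluding:
            continue
        seen += 1
        if len(reservoir) < sample_size:
            # Fill the reservoir with the first sample_size eligible items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability sample_size / seen.
            j = rng.randrange(seen)
            if j < sample_size:
                reservoir[j] = item
    return reservoir
```

Memory use is bounded by `sample_size`, which is what makes the approach attractive for very large ligand libraries.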
- schrodinger.active_learning.al_utils.random_split(file_list, num_ligands, prefix='splited', block_size=100000, name_index=0, smi_index=1, random_seed=None, delimiter=',', with_header=True)¶
Combine input files, shuffle lines, and split into files with block_size lines per file. Reorder the columns such that SMILES and name are in the first and second columns respectively.
- Parameters
file_list (list) – paths of input files.
num_ligands (int) – total number of ligands in all the input files.
prefix (str) – prefix of split files
block_size (int) – number of ligands in each sub .csv file.
name_index (int) – column index of molecule name
smi_index (int) – column index of molecule SMILES
random_seed (int or None) – random seed number for shuffling the ligands
delimiter (str) – delimiter of input csv files
with_header (bool) – Whether input file(s) has header in its first line.
- Returns
list of split files, reordered csv header
- Return type
list, list
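The shuffle-then-split step can be sketched on in-memory rows. This is a simplified illustration only: the documented function works on files and also reorders columns, neither of which is reproduced here, and `shuffle_and_split` is a hypothetical name.

```python
import random


def shuffle_and_split(rows, block_size, random_seed=None):
    """Shuffle rows, then cut them into blocks of at most block_size rows."""
    rng = random.Random(random_seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [
        shuffled[start:start + block_size]
        for start in range(0, len(shuffled), block_size)
    ]


blocks = shuffle_and_split(range(10), block_size=4, random_seed=0)
```

Only the last block can be shorter than `block_size`; every input row appears in exactly one block.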
- schrodinger.active_learning.al_utils.convert_tar_gz_to_qzip_model(tar_gz_model, qzip_model, job_args_json)¶
Convert .tar.gz ligand_ml model to .qzip model.
- Parameters
tar_gz_model (str) – input .tar.gz ligand_ml model file.
qzip_model – output .qzip model file.
job_args_json (str) – file containing the arguments needed for deepautoqsar.
- schrodinger.active_learning.al_utils.convert_qzip_to_tar_gz_model(qzip_model)¶
Convert .qzip deepautoqsar model to .tar.gz ligand_ml model.
- Parameters
qzip_model (str) – .qzip deepautoqsar model filename.
- Returns
.tar.gz ligand_ml model filename.
- Return type
str
- schrodinger.active_learning.al_utils.get_file_ext(filename)¶
Get the extension of the file name. Skip ‘gz’ if it is a gz compressed file.
- Parameters
filename (str) – name of the file.
- Returns
extension of the file, excluding ‘gz’.
- Return type
str
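The "skip gz" rule can be illustrated with `os.path.splitext`. This sketch assumes the returned extension keeps its leading dot, which the documentation does not specify; `file_ext_sketch` is a hypothetical name.

```python
import os


def file_ext_sketch(filename):
    """Hypothetical stand-in: extension of filename, ignoring a trailing .gz."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == ".gz":
        # Look one level deeper, e.g. "ligands.csv.gz" -> ".csv".
        root, ext = os.path.splitext(root)
    return ext
```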
- schrodinger.active_learning.al_utils.check_driver_disk_space(active_learning_job)¶
Estimate the driver disk usage of an active learning job with some assumed parameters. Print a warning if the available driver disk space is smaller than the estimated space.
- Parameters
active_learning_job (ActiveLearningJob instance.) – current AL driver.
- schrodinger.active_learning.al_utils.node_run_timer(func)¶
Decorator for timing the running time of runNode method in ActiveLearningNode
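A timing decorator for a method generally follows the pattern below. This is a generic sketch, not the module's code: `run_timer_sketch` and `DemoNode` are hypothetical, and reporting through `print` is an assumption (the real decorator may log elsewhere).

```python
import functools
import time


def run_timer_sketch(func):
    """Hypothetical stand-in: report how long the wrapped call took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed runs are timed as well.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} finished in {elapsed:.2f} s")
    return wrapper


class DemoNode:
    """Illustrative class standing in for an ActiveLearningNode subclass."""

    @run_timer_sketch
    def runNode(self):
        return "done"
```

`functools.wraps` keeps the original method name and docstring visible on the decorated method.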
- schrodinger.active_learning.al_utils.add_output_file(*output_files, incorporate=False)¶
Add files to jobcontrol output files.
- Parameters
output_files (str) – files to be transferred.
incorporate (bool) – mark files for incorporation by Maestro.
- schrodinger.active_learning.al_utils.add_input_file(jsb, *input_files)¶
Check the existence of input file(s). Add it as jobcontrol input file if it exists, otherwise exit with error.
- Parameters
jsb (launchapi.JobSpecificationArgsBuilder) – job specification builder
input_files (str) – input file(s) to be added.
- schrodinger.active_learning.al_utils.concatenate_logs(combined_logfile, subjob_logfile_list, logger=None)¶
Combine subjob logfiles into single combined logfile.
- Parameters
combined_logfile (str) – combined log file name
subjob_logfile_list (list(str)) – list of subjob logfile names to be combined.
logger (Logger or None) – logger for receiving the info and error message.
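Combining log files is a straightforward append loop. The sketch below assumes missing subjob logs are reported and skipped, and that each log is preceded by a separator line; both are illustrative choices, not documented behavior, and `concatenate_logs_sketch` is a hypothetical name.

```python
import os
import shutil


def concatenate_logs_sketch(combined_logfile, subjob_logfile_list, logger=None):
    """Hypothetical stand-in: append each existing subjob log to one file."""
    with open(combined_logfile, "w") as out:
        for logfile in subjob_logfile_list:
            if not os.path.isfile(logfile):
                if logger is not None:
                    logger.error("Log file not found: %s", logfile)
                continue
            out.write(f"==== {logfile} ====\n")  # assumed separator format
            with open(logfile) as src:
                shutil.copyfileobj(src, out)
```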
- schrodinger.active_learning.al_utils.get_host_ncpu()¶
Return the host and number of CPUs that should be used to submit subjobs. This function works both when running under job control and when not.
- Return type
tuple[str, int]
- schrodinger.active_learning.al_utils.is_hostname_valid(hostname)¶
Check whether the hostname is defined in the host file.
- Parameters
hostname (str) – the hostname to check against
- Returns
Whether the hostname is defined in the host file.
- Return type
bool
- schrodinger.active_learning.al_utils.validate_input_files(input_files, remote_input_ligands=False, allowed_format=None)¶
Check the existence and format of input files. Return error message if validation failed, otherwise return None.
- Parameters
input_files (list(str)) – paths of input files.
remote_input_ligands (bool) – Whether input ligand files are located on a remote host.
allowed_format (list or None) – allowed input file formats.
- Returns
error message if validation failed; None if it passed
- Return type
str or None
- schrodinger.active_learning.al_utils.validate_input_mae(input_files, max_check=10)¶
Validate structures in input .mae/.maegz file(s).
- Parameters
input_files (list(str)) – list of path(s) of input .mae/.maegz file(s)
max_check (int) – maximum number of structures to validate.
- Returns
error message if validation fails. None if validation passes.
- Return type
str or None
- schrodinger.active_learning.al_utils.validate_input_smiles(input_files, smi_index, name_index, with_header=True, max_check=10)¶
Validate SMILES in input files.
- Parameters
input_files (list(str)) – paths of input files.
smi_index (int) – column index of molecule SMILES
name_index (int) – column index of molecule name
with_header (bool) – Whether the file has header in its first line
max_check (int) – maximum number of SMILES to validate.
- Returns
error message if validation failed; None if it passed
- Return type
str or None
- schrodinger.active_learning.al_utils.store_mae_to_db(db_filename, mae_file_list)¶
Store structures in .mae files to a sqlite3 database.
- Parameters
db_filename (str) – path of the sqlite3 database
mae_file_list (list(str)) – list of .mae files that contain the structures to be stored to the database
- schrodinger.active_learning.al_utils.write_st_from_db_by_smiles(db_filename, out_mae_file, smi_list, chunk_size=500)¶
Extract the ligands’ structures from the database. Write the structures to the output .mae file.
- Parameters
db_filename (str) – path of the sqlite3 database containing ligands’ structure
out_mae_file (str) – path of the output .mae/.maegz file
smi_list (list(str)) – list of input ligands’ SMILES
chunk_size (int) – number of SMILES in each query
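Querying in chunks keeps the number of bound parameters per statement small, which is one plausible reason for the chunk_size parameter. The sketch below demonstrates the pattern with the standard `sqlite3` module; the `structures` table, its columns, and the function names are hypothetical and do not reflect the module's actual database schema.

```python
import sqlite3


def chunked(items, chunk_size):
    """Yield consecutive slices of items with at most chunk_size elements."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def fetch_by_smiles(conn, smi_list, chunk_size=500):
    """Fetch rows whose SMILES is in smi_list, one chunk per query."""
    rows = []
    for chunk in chunked(smi_list, chunk_size):
        placeholders = ",".join("?" * len(chunk))
        query = (
            "SELECT smiles, data FROM structures "
            f"WHERE smiles IN ({placeholders})"
        )
        rows.extend(conn.execute(query, chunk).fetchall())
    return rows


# Illustrative in-memory database with a made-up schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structures (smiles TEXT PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO structures VALUES (?, ?)",
    [("CCO", "ethanol"), ("C", "methane"), ("CC", "ethane")],
)
found = fetch_by_smiles(conn, ["CCO", "CC", "N"], chunk_size=2)
```

SMILES that are absent from the database, such as `N` here, simply produce no rows.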
- schrodinger.active_learning.al_utils.add_file_to_aljob_restart_dict(active_learning_job, optional_restart_file, jobname)¶
Add a file to the optional_restart_files_dict of the current active learning job. Only register the file with jobcontrol if active_learning_job is None.
- Parameters
active_learning_job (ActiveLearningJob instance or None.) – current AL driver.
optional_restart_file (str) – path of the file to be added
jobname (str) – key of the list that contains the optional_restart_file
- schrodinger.active_learning.al_utils.read_scored_ligands(scored_csv_file_list)¶
Read the ligands that were already scored by ScoreProviderNode.
- Parameters
scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.
- Returns
set of SMILES of the scored ligands.
- Return type
set(str)
- schrodinger.active_learning.al_utils.count_ligands(file_list, with_header=True)¶
Count the number of ligands in all the files by counting the total number of lines. We assume each line contains a SMILES string.
- Parameters
file_list (list(str)) – list of input file paths.
with_header (bool) – Whether the input files have header.
- Returns
Number of ligands in all the input files.
- Return type
int
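Counting ligands by counting lines can be sketched as follows. The example handles plain-text files only (compressed input is not covered) and subtracts one header line per file when with_header is true; `count_lines_sketch` is a hypothetical name.

```python
def count_lines_sketch(file_list, with_header=True):
    """Hypothetical stand-in: total number of data lines across files."""
    total = 0
    for path in file_list:
        with open(path) as handle:
            n_lines = sum(1 for _ in handle)
        if with_header and n_lines > 0:
            n_lines -= 1  # do not count the header line
        total += n_lines
    return total
```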
- schrodinger.active_learning.al_utils.convert_csv_to_smi(csv_file, smi_file)¶
Convert a .csv ligand file to a .smi file. The SMILES and Title should be in the first and second columns of the .csv file respectively.
- Parameters
csv_file (str) – path of the .csv file to be converted
smi_file (str) – path of the output .smi file
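The conversion amounts to keeping the first two columns and writing them as one `SMILES title` line per ligand. The sketch below assumes the .csv has a header line to skip and that the output fields are separated by a single space; the `with_header` parameter and the name `csv_to_smi_sketch` are hypothetical and not part of the documented signature.

```python
import csv


def csv_to_smi_sketch(csv_file, smi_file, with_header=True):
    """Hypothetical stand-in: write 'SMILES title' lines from a .csv file."""
    with open(csv_file, newline="") as src, open(smi_file, "w") as dst:
        reader = csv.reader(src)
        if with_header:
            next(reader, None)  # skip the header line if present
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or incomplete rows
            dst.write(f"{row[0]} {row[1]}\n")
```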
- schrodinger.active_learning.al_utils.split_lig(lig_filename, output_prefix, batch_size)¶
Split structures in a .mae file into batches.
- Parameters
lig_filename (str) – path of the .mae file to be split
output_prefix (str) – prefix of split output ligand files
batch_size (int) – number of ligands in each split .mae file
- Returns
list of batched .mae files
- Return type
list(str)
- schrodinger.active_learning.al_utils.get_allowed_ncpu(user_specified_ncpu)¶
Return the number of allowed CPUs for a job.
- Parameters
user_specified_ncpu (int or None) – user specified maximum number of CPUs
- Returns
number of allowed CPUs
- Return type
int
- schrodinger.active_learning.al_utils.generate_mae_file_with_unique_title(input_mae_file, output_mae_file)¶
Convert the input .mae file to output .mae file that contains unique titles.
- Parameters
input_mae_file (str) – path of input .mae file
output_mae_file (str) – path of output .mae file containing ligands with unique title
- schrodinger.active_learning.al_utils.read_all_st_from_file(st_file)¶
Return all the structures in a file as a list.
- Parameters
st_file (str) – path of the input file
- Returns
list of structures in the file
- Return type
list(structure.Structure)
- schrodinger.active_learning.al_utils.my_file_exists(filename)¶
A version of os.path.isfile that returns None if the input is None instead of raising an error.
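`os.path.isfile(None)` raises a `TypeError`, so a None-tolerant wrapper needs an explicit check. A minimal sketch of the described behavior follows; `file_exists_sketch` is a hypothetical name.

```python
import os


def file_exists_sketch(filename):
    """Hypothetical stand-in: os.path.isfile that passes None through."""
    if filename is None:
        return None
    return os.path.isfile(filename)
```

This is convenient for optional command-line arguments that default to None.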
- schrodinger.active_learning.al_utils.default_args(v: str, script_name: str) dict ¶
Return the default arguments for the given script. If None is passed as script_name, use AL-Glide default arguments
- schrodinger.active_learning.al_utils.configure_mq_run(args: argparse.Namespace) argparse.Namespace ¶
- schrodinger.active_learning.al_utils.package_TGC_models(tar_model_file, make_tarball=True)¶
Select TorchGraphConv models and save to new folder
- Parameters
tar_model_file – directory containing original model
make_tarball – if True also create a tarball outfile.tar.gz
- schrodinger.active_learning.al_utils.check_models(active_learning_job, tar_model_file)¶
Check if the model contains a TorchGraphConv model. If not, package only the TorchGraphConv models as a new tarball.
- Parameters
active_learning_job (ActiveLearningJob instance.) – current active learning job.
tar_model_file (str) – trained tar.gz model file
- Returns
None if the model contains a TorchGraphConv model; the packaged “slim” model tarball if repackaging was needed; otherwise ValueError is raised.