schrodinger.active_learning.al_utils module¶
- schrodinger.active_learning.al_utils.positive_int(s)¶
- ArgumentParser type function that checks whether the input can be converted to a positive integer. - Parameters
- s (str) – input string 
- Returns
- integer value of input string 
- Return type
- int 
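A minimal sketch of how a checker like this is usually passed to argparse through the `type=` argument. The function body, the name `positive_int_sketch`, and the `--num_iter` option are illustrative assumptions, not the module's actual implementation.

```python
import argparse


def positive_int_sketch(s):
    """Convert s to an int and reject values below 1 (illustrative stand-in)."""
    try:
        value = int(s)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{s!r} is not an integer")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{s!r} is not a positive integer")
    return value


parser = argparse.ArgumentParser()
# --num_iter is a hypothetical option used only for this example.
parser.add_argument("--num_iter", type=positive_int_sketch, default=1)
args = parser.parse_args(["--num_iter", "3"])
```

Raising `argparse.ArgumentTypeError` lets argparse report the problem as a normal usage error instead of a traceback.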
 
- schrodinger.active_learning.al_utils.split_smi_line(line)¶
- Split a line from a .smi file into a SMILES pattern and a title. Return an empty list if the line is empty. - Parameters
- line (str) – line from .smi file 
- Returns
- SMILES pattern, title 
- Return type
- [str, str] or [] 
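The documented behavior can be approximated as follows. This is a sketch under the assumption that SMILES and title are separated by whitespace and that a missing title becomes an empty string; `split_smi_line_sketch` is a hypothetical name and the real function may treat these cases differently.

```python
def split_smi_line_sketch(line):
    """Split a .smi line into [smiles, title]; return [] for a blank line."""
    stripped = line.strip()
    if not stripped:
        return []
    # Split only on the first run of whitespace so titles may contain spaces.
    parts = stripped.split(None, 1)
    smiles = parts[0]
    title = parts[1] if len(parts) > 1 else ""  # assumption: empty title
    return [smiles, title]
```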
 
- schrodinger.active_learning.al_utils.get_smi_header()¶
- Create header for .smi input file. We assume the SMILES is in the first column and the title is in the second column. - Returns
- header list, header index for reordering SMILES and title 
- Return type
- list(str), list(int) 
 
- schrodinger.active_learning.al_utils.get_csv_header(filename, smi_index, name_index, delimiter=',', with_header=True)¶
- Create header for .csv input file. The reordered index puts SMILES in the first column and title in the second column. - Parameters
- filename (str) – .csv input file 
- smi_index (int) – column index of molecule SMILES 
- name_index (int) – column index of molecule name 
- delimiter (str) – delimiter of input csv files 
- with_header (bool) – Whether the file has header in its first line 
 
- Returns
- header list, header index for reordering SMILES and title 
- Return type
- list(str), list(int) 
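To illustrate what a "header index for reordering" can look like, the sketch below builds a column order that moves the SMILES and name columns to the front. The helper `reorder_index` and the sample header are hypothetical; the exact list returned by the real function is not documented here.

```python
def reorder_index(num_columns, smi_index, name_index):
    """Column order with SMILES first, name second, other columns unchanged."""
    rest = [i for i in range(num_columns) if i not in (smi_index, name_index)]
    return [smi_index, name_index] + rest


header = ["name", "SMILES", "score"]  # made-up header for illustration
order = reorder_index(len(header), smi_index=1, name_index=0)
reordered_header = [header[i] for i in order]
```

The same `order` list can be applied to every data row so that rows and header stay aligned.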
 
- schrodinger.active_learning.al_utils.my_csv_reader(filename)¶
- Yield a csv reader that skips the first line. - Parameters
- filename (str) – .csv file name 
- Returns
- csv.reader that skips first line of the file. 
- Return type
- iterator 
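A sketch of the header-skipping pattern using only the standard library. It takes an open file-like object rather than a filename to keep the example self-contained; `skip_first_line_reader` is a hypothetical name.

```python
import csv
import io


def skip_first_line_reader(handle):
    """Return a csv.reader positioned after the first line of handle."""
    reader = csv.reader(handle)
    next(reader, None)  # discard the first row; no error on an empty file
    return reader


rows = list(skip_first_line_reader(io.StringIO("SMILES,title\nCCO,ethanol\n")))
```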
 
- schrodinger.active_learning.al_utils.read_score(score_file)¶
- Read known scores of ligands from score_file. - Returns
- a dictionary that maps ligand title to ligand score. 
- Return type
- dict 
 
- schrodinger.active_learning.al_utils.random_filtering(file_list, output_name, probability, random_seed=None, with_header=True)¶
- Randomly select lines from entries in the file_list based on probability. The aim is to have a light-weight and ultrafast function to generate a subset for pilot runs. - Parameters
- file_list (list) – paths of input files. 
- output_name (str) – name of the output file. 
- probability (float) – probability of randomly selecting a line. 
- random_seed (int or None) – random seed number for shuffling the ligands 
- with_header (bool) – Whether input file(s) has header in its first line. 
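Selecting each line independently with a fixed probability needs a single pass and no buffering, which is what makes this kind of filter cheap. The sketch below shows the idea on an in-memory list; `filter_lines` is a hypothetical helper and does not reproduce the file and header handling of the real function.

```python
import random


def filter_lines(lines, probability, random_seed=None):
    """Yield each line independently with the given probability."""
    rng = random.Random(random_seed)
    for line in lines:
        # rng.random() is uniform on [0, 1), so probability=1.0 keeps all lines.
        if rng.random() < probability:
            yield line


ligand_lines = [f"C{'C' * i} lig{i}" for i in range(20)]
subset = list(filter_lines(ligand_lines, probability=0.5, random_seed=1))
```

Note that the size of the subset is itself random; only its expected value equals `probability` times the number of input lines.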
 
 
- schrodinger.active_learning.al_utils.get_smiles_from_al_csv_or_smi_line(line)¶
- Get the SMILES from a line of a .smi file or of an active learning .csv file with SMILES in the first column. 
- schrodinger.active_learning.al_utils.reservoir_sampling(file_list, output_name, excluding_smiles=None, sample_size=100000, random_seed=None, with_header=True)¶
- Randomly select sample_size ligands from entries in the file_list. The aim is to have a light-weight and ultrafast function to generate a subset for pilot runs. - Parameters
- file_list (list) – paths of input files. 
- output_name (str) – name of the output file. 
- excluding_smiles (container) – collection of SMILES to skip in the sampling. 
- sample_size (int) – number of ligands to sample. 
- random_seed (int or None) – random seed number for shuffling the ligands 
- with_header (bool) – Whether input file(s) has header in its first line. 
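The function name suggests classic reservoir sampling, which draws a fixed-size uniform sample in one pass without knowing the total count in advance. The sketch below implements the textbook Algorithm R on an arbitrary iterable, with an exclusion set; it is an illustration of the technique, not the module's code, and `reservoir_sample` is a hypothetical name.

```python
import random


def reservoir_sample(items, sample_size, excluded=None, random_seed=None):
    """Uniformly sample up to sample_size items in a single pass."""
    rng = random.Random(random_seed)
    excluded = set(excluded or ())
    reservoir = []
    seen = 0  # number of non-excluded items encountered so far
    for item in items:
        if item in excluded:
            continue
        seen += 1
        if len(reservoir) < sample_size:
            reservoir.append(item)
        else:
            # Replace a random slot with probability sample_size / seen.
            j = rng.randrange(seen)
            if j < sample_size:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(100), 5, excluded={0, 1}, random_seed=0)
```

Memory use is bounded by `sample_size` regardless of how many items the input contains.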
 
 
- schrodinger.active_learning.al_utils.random_split(file_list, num_ligands, prefix='splited', block_size=100000, name_index=0, smi_index=1, random_seed=None, delimiter=',', with_header=True)¶
- Combine input files, shuffle lines, and split into files with block_size lines per file. Reorder the columns such that SMILES and name are in the first and second columns, respectively. - Parameters
- file_list (list) – paths of input files. 
- num_ligands (int) – total number of ligands in all the input files. 
- prefix (str) – prefix of split files 
- block_size (int) – number of ligands in each sub .csv file. 
- name_index (int) – column index of molecule name 
- smi_index (int) – column index of molecule SMILES 
- random_seed (int or None) – random seed number for shuffling the ligands 
- delimiter (str) – delimiter of input csv files 
- with_header (bool) – Whether input file(s) has header in its first line. 
 
- Returns
- list of split files, reordered csv header 
- Return type
- list, list 
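The shuffle-then-chunk step can be sketched in memory as below. `shuffle_and_split` is a hypothetical helper; the real function works on files, writes each block under the given prefix, and also reorders columns, none of which is reproduced here.

```python
import random


def shuffle_and_split(rows, block_size, random_seed=None):
    """Shuffle rows and cut them into consecutive blocks of block_size."""
    rng = random.Random(random_seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [
        shuffled[start:start + block_size]
        for start in range(0, len(shuffled), block_size)
    ]


blocks = shuffle_and_split(range(10), block_size=4, random_seed=0)
```

Only the last block can be shorter than `block_size`.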
 
- schrodinger.active_learning.al_utils.convert_tar_gz_to_qzip_model(tar_gz_model, qzip_model, job_args_json)¶
- Convert .tar.gz ligand_ml model to .qzip model. - Parameters
- tar_gz_model (str) – input .tar.gz ligand_ml model file. 
- qzip_model – output .qzip model file. 
- job_args_json (str) – file containing the arguments needed for deepautoqsar. 
- schrodinger.active_learning.al_utils.convert_qzip_to_tar_gz_model(qzip_model)¶
- Convert .qzip deepautoqsar model to .tar.gz ligand_ml model. - Parameters
- qzip_model (str) – .qzip deepautoqsar model filename. 
- Returns
- .tar.gz ligand_ml model filename. 
- Return type
- str 
 
- schrodinger.active_learning.al_utils.get_file_ext(filename)¶
- Get the extension of the file name. Skip ‘gz’ if it is a gz compressed file. - Parameters
- filename (str) – name of the file. 
- Returns
- extension of the file, excluding ‘gz’. 
- Return type
- str 
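One way to look past a compression suffix is shown below. This is a sketch assuming the extension is returned without its leading dot, which the documentation does not state; `file_ext_sketch` is a hypothetical name.

```python
import os


def file_ext_sketch(filename):
    """Return the extension of filename, ignoring a trailing .gz."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == ".gz":
        # Look past the compression suffix to the underlying extension.
        ext = os.path.splitext(root)[1]
    return ext.lstrip(".")  # assumption: no leading dot in the result
```

With this sketch, `ligands.csv.gz` and `ligands.csv` both map to `csv`, so callers can branch on the file format without caring about compression.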
 
- schrodinger.active_learning.al_utils.check_driver_disk_space(active_learning_job)¶
- Estimate the driver disk usage of an active learning job with some assumed parameters. Print a warning if the available driver disk space is smaller than the estimated space. - Parameters
- active_learning_job (ActiveLearningJob instance.) – current AL driver. 
 
- schrodinger.active_learning.al_utils.node_run_timer(func)¶
- Decorator for timing the running time of runNode method in ActiveLearningNode 
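A generic timing decorator of this kind can be sketched as follows. The reporting via `print` and the `DemoNode` class are assumptions made for illustration; how the real decorator records the elapsed time is not documented here.

```python
import functools
import time


def run_timer(func):
    """Report how long each call to func takes (illustrative stand-in)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed runs are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f} s")
    return wrapper


class DemoNode:
    """Hypothetical node class standing in for ActiveLearningNode."""

    @run_timer
    def runNode(self):
        return "done"


result = DemoNode().runNode()
```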
- schrodinger.active_learning.al_utils.add_output_file(*output_files, incorporate=False)¶
- Add files to jobcontrol output files. - Parameters
- output_files (str) – files to be transferred. 
- incorporate (bool) – mark files for incorporation by Maestro. 
 
 
- schrodinger.active_learning.al_utils.add_input_file(jsb, *input_files)¶
- Check the existence of input file(s). Add it as jobcontrol input file if it exists, otherwise exit with error. - Parameters
- jsb (launchapi.JobSpecificationArgsBuilder) – job specification builder 
- input_files (str) – input file(s) to be added. 
 
 
- schrodinger.active_learning.al_utils.concatenate_logs(combined_logfile, subjob_logfile_list, logger=None)¶
- Combine subjob logfiles into single combined logfile. - Parameters
- combined_logfile (str) – combined log file name 
- subjob_logfile_list (list(str)) – list of subjob logfile names to be combined. 
- logger (Logger or None) – logger for receiving the info and error message. 
 
 
- schrodinger.active_learning.al_utils.get_host_ncpu()¶
- Return the host and number of CPUs that should be used to submit subjobs. This function works whether or not it is running under job control. - Return type
- tuple[str, int] 
 
- schrodinger.active_learning.al_utils.is_hostname_valid(hostname)¶
- Check whether the hostname is defined in the host file. - Parameters
- hostname (str) – the hostname to check against 
- Returns
- Whether the hostname is defined in the host file. 
- Return type
- bool 
 
- schrodinger.active_learning.al_utils.validate_input_files(input_files, remote_input_ligands=False, allowed_format=None)¶
- Check the existence and format of input files. Return error message if validation failed, otherwise return None. - Parameters
- input_files (list(str)) – paths of input files. 
- remote_input_ligands (bool) – Whether input ligand files are located remotely. 
- allowed_format (list or None) – allowed input file formats. 
 
- Returns
- error message if validation failed; None if it passed 
- Return type
- str or None 
 
- schrodinger.active_learning.al_utils.validate_input_mae(input_files, max_check=10)¶
- Validate structures in input .mae/.maegz file(s). - Parameters
- input_files (list(str)) – list of path(s) of input .mae/.maegz file(s) 
- max_check (int) – maximum number of structures to validate. 
 
- Returns
- error message if validation fails. None if validation passes. 
- Return type
- str or None 
 
- schrodinger.active_learning.al_utils.validate_input_smiles(input_files, smi_index, name_index, with_header=True, max_check=10)¶
- Validate SMILES in input files. - Parameters
- input_files (list(str)) – paths of input files. 
- smi_index (int) – column index of molecule SMILES 
- name_index (int) – column index of molecule name 
- with_header (bool) – Whether the file has header in its first line 
- max_check (int) – maximum number of SMILES to validate. 
 
- Returns
- error message if validation failed; None if it passed 
- Return type
- str or None 
 
- schrodinger.active_learning.al_utils.store_mae_to_db(db_filename, mae_file_list)¶
- Store structures in .mae files to a sqlite3 database. - Parameters
- db_filename (str) – path of the sqlite3 database 
- mae_file_list (list(str)) – list of .mae files that contain the structures to be stored to the database 
 
 
- schrodinger.active_learning.al_utils.write_st_from_db_by_smiles(db_filename, out_mae_file, smi_list, chunk_size=500)¶
- Extract the ligands’ structures from the database. Write the structures to the output .mae file. - Parameters
- db_filename (str) – path of the sqlite3 database containing the ligands’ structures 
- out_mae_file (str) – path of the output .mae/.maegz file 
- smi_list (list(str)) – list of input ligands’ SMILES 
- chunk_size (int) – number of SMILES in each query 
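Querying in chunks keeps the number of bound parameters per statement small, which is one common reason for a chunk_size argument with sqlite3. The sketch below shows the pattern on an in-memory database. The table name `ligands`, its columns, and the helper names are all hypothetical, since the schema of the real database is not documented here.

```python
import sqlite3


def chunks(items, chunk_size):
    """Yield consecutive slices of items with at most chunk_size elements."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def fetch_by_smiles(conn, smi_list, chunk_size=500):
    """Fetch rows whose smiles value is in smi_list, one chunk per query."""
    rows = []
    for chunk in chunks(smi_list, chunk_size):
        placeholders = ",".join("?" * len(chunk))
        query = (
            "SELECT smiles, payload FROM ligands "
            f"WHERE smiles IN ({placeholders})"
        )
        rows.extend(conn.execute(query, chunk).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one text payload per SMILES.
conn.execute("CREATE TABLE ligands (smiles TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO ligands VALUES (?, ?)",
    [("CCO", "ethanol"), ("CC", "ethane"), ("C", "methane")],
)
found = fetch_by_smiles(conn, ["CCO", "C", "N"], chunk_size=2)
```

SMILES that are absent from the database, such as `N` in this example, simply produce no row.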
 
 
- schrodinger.active_learning.al_utils.add_file_to_aljob_restart_dict(active_learning_job, optional_restart_file, jobname)¶
- Add a file to the optional_restart_files_dict of current active learning job. Only register the file with jobcontrol if active_learning_job is None. - Parameters
- active_learning_job (ActiveLearningJob instance or None.) – current AL driver. 
- optional_restart_file (str) – path of the file to be added 
- jobname (str) – key of the list that contains the optional_restart_file 
 
 
- schrodinger.active_learning.al_utils.read_scored_ligands(scored_csv_file_list)¶
- Read the ligands that were already scored by ScoreProviderNode. - Parameters
- scored_csv_file_list (list(str)) – list of ligand_ml training .csv files. 
- Returns
- set of SMILES of the scored ligands. 
- Return type
- set(str) 
 
- schrodinger.active_learning.al_utils.count_ligands(file_list, with_header=True)¶
- Count the number of ligands in all the files by counting the total number of lines. We assume each line contains a SMILES string. - Parameters
- file_list (list(str)) – list of input file paths. 
- with_header (bool) – Whether the input files have header. 
 
- Returns
- Number of ligands in all the input files. 
- Return type
- int 
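Counting ligands by counting lines can be sketched as below. The helper `count_ligand_lines` is hypothetical and works on open file-like objects so the example runs without touching disk; it assumes one header line per file when with_header is true.

```python
import io


def count_ligand_lines(handles, with_header=True):
    """Sum the line counts of all handles, minus one header line per file."""
    total = 0
    for handle in handles:
        n_lines = sum(1 for _ in handle)
        if with_header and n_lines > 0:
            n_lines -= 1  # do not count the header line as a ligand
        total += n_lines
    return total


files = [
    io.StringIO("SMILES,title\nCCO,a\nCC,b\n"),
    io.StringIO("SMILES,title\nC,c\n"),
]
total = count_ligand_lines(files)
```

Because no line is parsed, the count is only as accurate as the assumption that every non-header line holds exactly one ligand.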
 
- schrodinger.active_learning.al_utils.convert_csv_to_smi(csv_file, smi_file)¶
- Convert a .csv ligand file to a .smi file. The SMILES and Title should be in the first and second columns of the .csv file respectively. - Parameters
- csv_file (str) – path of the .csv file to be converted 
- smi_file (str) – path of the output .smi file 
 
 
- schrodinger.active_learning.al_utils.split_lig(lig_filename, output_prefix, batch_size)¶
- Split structures in a .mae file into batches. - Parameters
- lig_filename (str) – path of the .mae file to be split 
- output_prefix (str) – prefix of the split output ligand files 
- batch_size (int) – number of ligands in each split .mae file 
 
- Returns
- list of batched .mae files 
- Return type
- list(str) 
 
- schrodinger.active_learning.al_utils.get_allowed_ncpu(user_specified_ncpu)¶
- Return the number of allowed CPUs for a job. - Parameters
- user_specified_ncpu (int or None) – user specified maximum number of CPUs 
- Returns
- number of allowed CPUs 
- Return type
- int 
 
- schrodinger.active_learning.al_utils.generate_mae_file_with_unique_title(input_mae_file, output_mae_file)¶
- Convert the input .mae file to output .mae file that contains unique titles. - Parameters
- input_mae_file (str) – path of input .mae file 
- output_mae_file (str) – path of output .mae file containing ligands with unique title 
 
 
- schrodinger.active_learning.al_utils.read_all_st_from_file(st_file)¶
- Return all the structures in a file as a list. - Parameters
- st_file (str) – path of the input file 
- Returns
- list of structures in the file 
- Return type
- list(structure.Structure) 
 
- schrodinger.active_learning.al_utils.my_file_exists(filename)¶
- A version of os.path.isfile that returns None if the input is None instead of raising an error. 
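The described behavior amounts to a small guard around `os.path.isfile`, which raises `TypeError` when given None. A minimal sketch, with `file_exists_sketch` as a hypothetical name:

```python
import os


def file_exists_sketch(filename):
    """Like os.path.isfile, but return None when filename is None."""
    if filename is None:
        return None
    return os.path.isfile(filename)
```

Since None is falsy, callers can still write `if file_exists_sketch(path):` for optional arguments that default to None.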
- schrodinger.active_learning.al_utils.default_args(v: str, script_name: str) → dict¶
- Return the default arguments for the given script. If None is passed as script_name, use AL-Glide default arguments. 
- schrodinger.active_learning.al_utils.configure_mq_run(args: argparse.Namespace) → argparse.Namespace¶
- schrodinger.active_learning.al_utils.package_TGC_models(tar_model_file, make_tarball=True)¶
- Select TorchGraphConv models and save them to a new folder. - Parameters
- tar_model_file – directory containing original model 
- make_tarball – if True also create a tarball outfile.tar.gz 
 
 
- schrodinger.active_learning.al_utils.check_models(active_learning_job, tar_model_file)¶
- Check if the model contains a TorchGraphConv model. If not, package only the TorchGraphConv models as a new tarball. - Parameters
- active_learning_job (ActiveLearningJob instance.) – current active learning job. 
- tar_model_file (str) – trained tar.gz model file 
 
- Returns
- None if the model contains a TorchGraphConv model; the packaged “slim” model tarball if needed; otherwise raises ValueError.