schrodinger.active_learning.al_node module¶
- schrodinger.active_learning.al_node.estimate_time_cost(num_ligands, num_iter, train_size, train_time, num_score_license, num_autoqsar_license, available_cpu=None, score_per_ligand_cost=20, autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0, multiplier=1.0, application='')[source]¶
Roughly estimate the time cost of an active learning job based on the inputs and the number of available licenses.
- Parameters
num_ligands (int) – total number of ligands in the library.
num_iter (int) – number of active learning iterations.
train_size (int) – Ligand_ML training size per iteration.
train_time (float) – Ligand_ML training time per iteration in hours.
num_score_license (int) – total number of the application licenses
num_autoqsar_license (int) – total number of AutoQSAR licenses
available_cpu (int) – number of available CPUs.
score_per_ligand_cost (float) – estimated time to score a single ligand, in seconds.
autoqsar_per_ligand_cost (float) – estimated Ligand_ML time for a single ligand, in seconds.
num_rescore_ligand (int) – number of ligands to be rescored.
multiplier (float) – estimated expansion number per ligand.
application (str) – name of the application that provides score
- Returns
estimated time cost in hours.
- Return type
float
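As a back-of-the-envelope illustration of how these inputs could combine, the sketch below sums per-iteration scoring, training, and whole-library prediction time, plus a final rescoring step. The formula and the name rough_time_cost_hours are assumptions made for illustration; this is not the formula estimate_time_cost uses, and it ignores available_cpu and application.

```python
def rough_time_cost_hours(num_ligands, num_iter, train_size, train_time,
                          num_score_license, num_autoqsar_license,
                          score_per_ligand_cost=20.0,
                          autoqsar_per_ligand_cost=0.02,
                          num_rescore_ligand=0, multiplier=1.0):
    """Hypothetical estimate in hours; NOT the library's formula."""
    seconds_per_hour = 3600.0
    # Scoring: train_size ligands per iteration, spread over score licenses.
    score_h = (num_iter * train_size * multiplier * score_per_ligand_cost
               / num_score_license / seconds_per_hour)
    # Training: train_time is already given in hours per iteration.
    train_h = num_iter * train_time
    # Prediction: assume the whole library is evaluated every iteration.
    eval_h = (num_iter * num_ligands * autoqsar_per_ligand_cost
              / num_autoqsar_license / seconds_per_hour)
    # Rescoring: assume it costs the same per ligand as regular scoring.
    rescore_h = (num_rescore_ligand * multiplier * score_per_ligand_cost
                 / num_score_license / seconds_per_hour)
    return score_h + train_h + eval_h + rescore_h


estimate = rough_time_cost_hours(
    num_ligands=1_000_000, num_iter=3, train_size=10_000, train_time=2.0,
    num_score_license=100, num_autoqsar_license=10)
print(f"{estimate:.2f} hours")
```

Under these assumptions, training time dominates when licenses are plentiful, which is why train_time is passed in hours while the per-ligand costs are in seconds.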
- schrodinger.active_learning.al_node.get_jobdj(host_list=None)[source]¶
Return a JobDJ with the specified host list.
- Parameters
host_list ([(str, int)] or None) – A list of (<hostname>, <maximum_concurrent_subjobs>)
- Returns
JobDJ with specific settings.
- Return type
queue.JobDJ object
- schrodinger.active_learning.al_node.get_top_ligands_from_csv_list(csv_list, output_csv, num_ligands)[source]¶
Get the top ligands from a list of .csv files. Write the selected ligands to the output .csv file.
- Parameters
csv_list (list(str)) – list of .csv files containing the ligands.
output_csv (str) – name of output .csv file.
num_ligands (int) – number of ligands to select.
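A minimal sketch of this kind of top-N selection, using only the standard library. The column names title and score, the "lower is better" ordering, and the helper name top_ligands_from_csvs are assumptions; the actual column layout and sort direction used by get_top_ligands_from_csv_list are not documented here.

```python
import csv
import heapq
import os
import tempfile


def top_ligands_from_csvs(csv_list, output_csv, num_ligands,
                          score_column="score"):
    """Hypothetical sketch: keep the num_ligands rows with the lowest score."""
    if not csv_list:
        raise ValueError("csv_list must contain at least one file")
    rows = []
    fieldnames = None
    for path in csv_list:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Assume every file shares the header of the first one.
            fieldnames = fieldnames or reader.fieldnames
            rows.extend(reader)
    best = heapq.nsmallest(num_ligands, rows,
                           key=lambda row: float(row[score_column]))
    with open(output_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(best)
    return best


# Small demonstration with two made-up input files.
demo_dir = tempfile.mkdtemp()
paths = []
demo_data = [[("lig_a", -7.1), ("lig_b", -5.0)], [("lig_c", -9.3)]]
for index, data in enumerate(demo_data):
    path = os.path.join(demo_dir, f"part_{index}.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title", "score"])
        writer.writerows(data)
    paths.append(path)

top = top_ligands_from_csvs(paths, os.path.join(demo_dir, "top.csv"), 2)
top_titles = [row["title"] for row in top]
```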
- class schrodinger.active_learning.al_node.ActiveLearningNode(iter_num=1, job_name='active_learning', job_dir='.')[source]¶
Bases:
object
- class schrodinger.active_learning.al_node.PrepareSmilesNode(args, iter_num, job_name, job_dir)[source]¶
Bases:
schrodinger.active_learning.al_node.ActiveLearningNode
- __init__(args, iter_num, job_name, job_dir)[source]¶
Initialize node for selecting ligands (SMILES) to be scored by ScoreProviderNode.
- static readScoredLigands(scored_csv_file_list)[source]¶
Read the ligands that were already scored by ScoreProviderNode.
- Parameters
scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.
- Returns
set of titles of the scored ligands.
- Return type
set(str)
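The idea of collecting already-scored titles into a set can be sketched as follows. The title column name and the helper read_scored_titles are hypothetical, and the sketch reads from in-memory text rather than file names so that it stays self-contained.

```python
import csv
import io


def read_scored_titles(csv_sources, title_column="title"):
    """Hypothetical sketch: union of ligand titles across several CSV sources."""
    titles = set()
    for source in csv_sources:
        for row in csv.DictReader(source):
            titles.add(row[title_column])
    return titles


# Two made-up training files; lig_b appears in both and is counted once.
first = io.StringIO("title,score\nlig_a,-7.1\nlig_b,-5.0\n")
second = io.StringIO("title,score\nlig_b,-5.0\nlig_c,-9.3\n")
scored = read_scored_titles([first, second])
```

Returning a set makes the later "skip ligands that were already scored" check a constant-time membership test.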
- checkOutcome(smi_file)[source]¶
Validate the generated SMILES file.
- Parameters
smi_file (str) – name of SMILES file to be validated.
- runNode(csv_list, active_learning_job, smi_file_name=None, **kwargs)[source]¶
Select ligands to be scored.
- Parameters
csv_list (list(str)) – list of csv files that contain candidate ligands.
active_learning_job (ActiveLearningJob instance.) – current active learning job.
smi_file_name (str) – SMILES file name that contains selected ligands.
- uncertaintySelect(smi_file_name, scored_csv_file_list, sample_size, y_index=None, **kwargs)[source]¶
Select random ligands from initial input csv or ligands with largest uncertainty from sorted ligand_ml .csv output.
- Parameters
smi_file_name (str) – SMILES file name that contains selected ligands.
scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.
sample_size (int) – number of ligands to be sampled.
y_index (int) – column index of values to be sorted.
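A minimal sketch of uncertainty-based selection, assuming each row holds the ligand title in column 0, the SMILES in column 1, and the uncertainty in the column given by y_index. This row layout, the "SMILES title" output format, and the helper name pick_most_uncertain are assumptions, not the behavior of uncertaintySelect itself.

```python
def pick_most_uncertain(rows, scored_titles, sample_size, y_index):
    """Hypothetical sketch: unscored rows with the largest value at y_index."""
    # Skip ligands that already have a score.
    candidates = [row for row in rows if row[0] not in scored_titles]
    # Largest uncertainty first.
    candidates.sort(key=lambda row: float(row[y_index]), reverse=True)
    return candidates[:sample_size]


# Made-up rows: title, SMILES, predicted score, uncertainty.
rows = [
    ["lig_a", "CCO", "-6.2", "0.10"],
    ["lig_b", "CCN", "-5.9", "0.85"],
    ["lig_c", "c1ccccc1", "-7.4", "0.40"],
    ["lig_d", "CC(=O)O", "-4.8", "0.95"],
]
selected = pick_most_uncertain(rows, scored_titles={"lig_d"},
                               sample_size=2, y_index=3)
# One "SMILES title" line per selected ligand.
smi_lines = [f"{row[1]} {row[0]}" for row in selected]
```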
- randomSelect(smi_file_name, scored_csv_file_list, sample_size, sort=True, **kwargs)[source]¶
Select sample_size random ligands from input csv file(s).
- Parameters
smi_file_name (str) – SMILES file name that contains selected ligands.
scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.
sample_size (int) – number of ligands to be sampled.
sort (bool) – Whether the csv files were sorted or initial inputs.
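Random selection that avoids already-scored ligands could look like the sketch below. The helper pick_random and its seed parameter are hypothetical; randomSelect itself works on .csv files and writes a SMILES file.

```python
import random


def pick_random(titles, scored_titles, sample_size, seed=None):
    """Hypothetical sketch: sample unscored titles without replacement."""
    # Sort the pool so that a fixed seed gives a reproducible sample.
    pool = sorted(set(titles) - set(scored_titles))
    rng = random.Random(seed)
    return rng.sample(pool, min(sample_size, len(pool)))


library = [f"lig_{index}" for index in range(10)]
already_scored = {"lig_0", "lig_1"}
picked = pick_random(library, already_scored, sample_size=3, seed=0)
```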
- diversitySelect(smi_file_name, scored_csv_file_list, sample_size, sort=True, **kwargs)[source]¶
Use combinatorial_diversity to select diverse ligands from input csv or sorted ligand_ml .csv output.
- Parameters
smi_file_name (str) – SMILES file name that contains selected ligands.
scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.
sample_size (int) – number of ligands to be sampled.
sort (bool) – Whether the csv files were sorted or initial inputs.
- addOptionalRestartFiles(active_learning_job)¶
Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.
- Parameters
active_learning_job (ActiveLearningJob instance) – current AL driver
- classmethod getName(iter_num)¶
- class schrodinger.active_learning.al_node.ScoreProviderNode(iter_num, job_name, job_dir)[source]¶
Bases:
schrodinger.active_learning.al_node.ActiveLearningNode
- __init__(iter_num, job_name, job_dir)[source]¶
Initialize node for obtaining the score of each ligand (SMILES).
- checkOutcome(score_csv_file)[source]¶
Validate the .csv score file.
- Parameters
score_csv_file (str) – name of generated .csv score file.
- writeScoreCsv(title_to_score, output_csv)[source]¶
Write score to .csv file that ligand_ml needs for training
- Parameters
title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to score
output_csv (str) – ligand_ml training .csv file.
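The defaultdict(lambda : BAD_SCORE) type means that a ligand with no recorded score falls back to a sentinel value when looked up. A minimal sketch of that pattern follows; the BAD_SCORE value, the column names, and the helper write_score_csv are assumptions, and the sketch writes to an in-memory buffer rather than a file name.

```python
import csv
import io
from collections import defaultdict

BAD_SCORE = 10000.0  # hypothetical placeholder; the real constant is not shown here


def write_score_csv(title_to_score, titles, handle):
    """Hypothetical sketch: one title,score row per requested ligand."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["title", "score"])
    for title in titles:
        # Missing titles silently receive BAD_SCORE via the defaultdict.
        writer.writerow([title, title_to_score[title]])


title_to_score = defaultdict(lambda: BAD_SCORE)
title_to_score["lig_a"] = -7.1
buffer = io.StringIO()
write_score_csv(title_to_score, ["lig_a", "lig_missing"], buffer)
csv_text = buffer.getvalue()
```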
- addOptionalRestartFiles(active_learning_job)¶
Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.
- Parameters
active_learning_job (ActiveLearningJob instance) – current AL driver
- classmethod getName(iter_num)¶
- class schrodinger.active_learning.al_node.KnownScoreProviderNode(args, iter_num, job_name, job_dir)[source]¶
Bases:
schrodinger.active_learning.al_node.ScoreProviderNode
Class for obtaining the scores from an external .csv file. This class is only used for testing the performance of the active learning workflow.
- __init__(args, iter_num, job_name, job_dir)[source]¶
Initialize node for obtaining the score of each ligand (SMILES).
- runNode(smi_file_name, active_learning_job, score_csv_file=None)[source]¶
Read scores from active_learning_job.known_title_to_score.
- Parameters
smi_file_name (str) – SMILES file that contains the ligands to be scored.
active_learning_job (ActiveLearningJob instance.) – current active learning job.
score_csv_file (str) – ligand_ml training .csv file.
- addOptionalRestartFiles(active_learning_job)¶
Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.
- Parameters
active_learning_job (ActiveLearningJob instance) – current AL driver
- checkOutcome(score_csv_file)¶
Validate the .csv score file.
- Parameters
score_csv_file (str) – name of generated .csv score file.
- classmethod getName(iter_num)¶
- writeScoreCsv(title_to_score, output_csv)¶
Write score to .csv file that ligand_ml needs for training
- Parameters
title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to score
output_csv (str) – ligand_ml training .csv file.
- class schrodinger.active_learning.al_node.LigandMLTrainNode(args, iter_num, job_name, job_dir)[source]¶
Bases:
schrodinger.active_learning.al_node.ActiveLearningNode
Class for ligand_ml model generation.
- __init__(args, iter_num, job_name, job_dir)[source]¶
Initialize node for active learning workflow.
- Parameters
iter_num (int) – current active learning iteration number.
job_name (str) – active learning job name.
job_dir (str) – directory of where the jobs in the node will run.
- checkOutcome(model_file)[source]¶
Check whether the ligand_ml model exists.
- Parameters
model_file (str) – name of ligand_ml .qzip model file
- createTrainingCsvFile(discard_cutoff, ascending=True)[source]¶
Generate the training .csv file for ligand_ml model generation.
- Parameters
discard_cutoff (float) – score cutoff for excluding ligands from the ML training set.
ascending (bool) – if True, a lower value means a better ligand.
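One plausible reading of discard_cutoff is sketched below: ligands whose score is worse than the cutoff are left out of the training set, where "worse" depends on ascending. The inclusive boundary, the direction of the comparison, and the helper name keep_for_training are assumptions rather than documented behavior.

```python
def keep_for_training(title_to_score, discard_cutoff, ascending=True):
    """Hypothetical sketch: drop ligands scoring worse than discard_cutoff."""
    if ascending:
        # Lower is better, so keep scores at or below the cutoff.
        return {title: score for title, score in title_to_score.items()
                if score <= discard_cutoff}
    # Higher is better, so keep scores at or above the cutoff.
    return {title: score for title, score in title_to_score.items()
            if score >= discard_cutoff}


scores = {"lig_a": -7.1, "lig_b": -2.0, "lig_c": 10000.0}
kept_low = keep_for_training(scores, discard_cutoff=0.0)
kept_high = keep_for_training(scores, discard_cutoff=0.0, ascending=False)
```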
- runNode(active_learning_job)[source]¶
Perform ligand_ml training with all the scored ligands.
- Parameters
active_learning_job (ActiveLearningJob instance.) – current active learning job.
- addOptionalRestartFiles(active_learning_job)¶
Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.
- Parameters
active_learning_job (ActiveLearningJob instance) – current AL driver
- classmethod getName(iter_num)¶
- class schrodinger.active_learning.al_node.LigandMLEvalNode(args, iter_num, job_name, job_dir)[source]¶
Bases:
schrodinger.active_learning.al_node.ActiveLearningNode
Class for performing ligand_ml prediction with generated model.
- __init__(args, iter_num, job_name, job_dir)[source]¶
Initialize node for active learning workflow.
- Parameters
iter_num (int) – current active learning iteration number.
job_name (str) – active learning job name.
job_dir (str) – directory of where the jobs in the node will run.
- getBestResults(file_list, outfile, ascending=True)[source]¶
Get the best ligands (with lowest score) predicted by ligand_ml.
- Parameters
file_list (list(str)) – list of ligand_ml .csv output files. Each file is sorted by ligand_ml prediction score.
outfile (str) – .csv file that contains best ligands.
ascending (bool) – lower value means better ligand if ascending is True
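Because each input is described as already sorted by prediction score, the best entries can be taken with a lazy k-way merge instead of loading and re-sorting everything. The sketch below uses in-memory lists of (title, score) pairs; the num_best parameter and the helper name best_from_sorted are hypothetical and do not appear in the getBestResults signature.

```python
import heapq
from itertools import islice


def best_from_sorted(sorted_lists, num_best, ascending=True):
    """Hypothetical sketch: first num_best items of a k-way merge.

    Each input must already be sorted by score, lowest first when
    ascending is True and highest first otherwise.
    """
    merged = heapq.merge(*sorted_lists, key=lambda item: item[1],
                         reverse=not ascending)
    return list(islice(merged, num_best))


# Made-up per-file predictions, each sorted with the lowest score first.
file_a = [("lig_c", -9.3), ("lig_a", -7.1)]
file_b = [("lig_d", -8.0), ("lig_b", -5.0)]
best = best_from_sorted([file_a, file_b], num_best=3)
```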
- checkOutcome(pred_csv_list, uncertain_csv_list)[source]¶
Check the existence of ligand_ml prediction files.
- Parameters
pred_csv_list (list(str)) – list of ligand_ml prediction .csv file(s).
uncertain_csv_list (list(str)) – list of ligand_ml prediction-with-uncertainty .csv file(s).
- runNode(model_file, active_learning_job)[source]¶
Use the trained model to evaluate all the ligands.
- Parameters
model_file (str) – ligand_ml .qzip model file.
active_learning_job (ActiveLearningJob instance.) – current active learning job.
- addOptionalRestartFiles(active_learning_job)¶
Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.
- Parameters
active_learning_job (ActiveLearningJob instance) – current AL driver
- classmethod getName(iter_num)¶
- class schrodinger.active_learning.al_node.ActiveLearningNodeSupplier(calculate_score_node, pilot_score_node, rescore_node, score_provider_node=<class 'schrodinger.active_learning.al_node.ScoreProviderNode'>, prepare_smi_node=<class 'schrodinger.active_learning.al_node.PrepareSmilesNode'>, known_score_provider_node=<class 'schrodinger.active_learning.al_node.KnownScoreProviderNode'>, ligand_ml_train_node=<class 'schrodinger.active_learning.al_node.LigandMLTrainNode'>, ligand_ml_eval_node=<class 'schrodinger.active_learning.al_node.LigandMLEvalNode'>)[source]¶
Bases:
object
- __init__(calculate_score_node, pilot_score_node, rescore_node, score_provider_node=<class 'schrodinger.active_learning.al_node.ScoreProviderNode'>, prepare_smi_node=<class 'schrodinger.active_learning.al_node.PrepareSmilesNode'>, known_score_provider_node=<class 'schrodinger.active_learning.al_node.KnownScoreProviderNode'>, ligand_ml_train_node=<class 'schrodinger.active_learning.al_node.LigandMLTrainNode'>, ligand_ml_eval_node=<class 'schrodinger.active_learning.al_node.LigandMLEvalNode'>)[source]¶