schrodinger.active_learning.al_node module

schrodinger.active_learning.al_node.estimate_time_cost(num_ligands, num_iter, train_size, train_time, num_score_license, num_autoqsar_license, available_cpu=None, score_per_ligand_cost=20, autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0, multiplier=1.0, application='')

Roughly estimate the time cost of an active learning job based on the inputs and the number of available licenses.

Parameters
  • num_ligands (int) – total number of ligands in the library.

  • num_iter (int) – number of active learning iterations.

  • train_size (int) – Ligand_ML training size per iteration.

  • train_time (float) – Ligand_ML training time per iteration in hours.

  • num_score_license (int) – total number of the application licenses

  • num_autoqsar_license (int) – total number of AutoQSAR licenses

  • available_cpu (int) – number of available CPUs.

  • score_per_ligand_cost (float) – estimated time cost of scoring a single ligand, in seconds.

  • autoqsar_per_ligand_cost (float) – estimated time cost of Ligand_ML for a single ligand, in seconds.

  • num_rescore_ligand (int) – number of ligands to be rescored.

  • multiplier (float) – estimated expansion number per ligand.

  • application (str) – name of the application that provides score

Returns

estimated time cost in hours

Return type

float
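
The arithmetic behind an estimate of this kind can be sketched as follows. This is a back-of-the-envelope illustration only, not the formula used by estimate_time_cost: the function name rough_time_cost_hours and the way the terms are combined are assumptions, and only the parameter names and units are taken from the list above.

```python
def rough_time_cost_hours(num_ligands, num_iter, train_size, train_time,
                          num_score_license, num_autoqsar_license,
                          available_cpu=None, score_per_ligand_cost=20,
                          autoqsar_per_ligand_cost=0.02,
                          num_rescore_ligand=0, multiplier=1.0):
    """Hypothetical estimate in hours; NOT the library's actual formula."""
    # Assumption: parallelism is capped by both licenses and CPUs.
    score_workers = num_score_license
    ml_workers = num_autoqsar_license
    if available_cpu is not None:
        score_workers = min(score_workers, available_cpu)
        ml_workers = min(ml_workers, available_cpu)
    score_workers = max(score_workers, 1)
    ml_workers = max(ml_workers, 1)

    # Assumption: each iteration scores train_size ligands (seconds),
    # trains one model (hours), then evaluates the whole library (seconds).
    score_sec = train_size * multiplier * score_per_ligand_cost / score_workers
    eval_sec = num_ligands * multiplier * autoqsar_per_ligand_cost / ml_workers
    per_iter_hours = (score_sec + eval_sec) / 3600.0 + train_time

    # Assumption: the final rescoring step runs once, after all iterations.
    rescore_sec = (num_rescore_ligand * multiplier * score_per_ligand_cost
                   / score_workers)
    return num_iter * per_iter_hours + rescore_sec / 3600.0


hours = rough_time_cost_hours(num_ligands=3_600_000, num_iter=1,
                              train_size=1800, train_time=1.0,
                              num_score_license=10, num_autoqsar_license=20)
```

Under these assumptions the example comes to about 3 hours: one hour each for scoring, training and library evaluation.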

schrodinger.active_learning.al_node.get_jobdj(host_list=None)

Return a JobDJ with the specified host list.

Parameters

host_list ([(str, int)] or None) – A list of (<hostname>, <maximum_concurrent_subjobs>)

Returns

JobDJ with specific settings.

Return type

queue.JobDJ object

schrodinger.active_learning.al_node.get_top_ligands_from_csv_list(csv_list, output_csv, num_ligands)

Get the top ligands from a list of .csv files and write the selected ligands to the output .csv file.

Parameters
  • csv_list (list(str)) – list of .csv files containing the ligands.

  • output_csv (str) – name of output .csv file.

  • num_ligands (int) – number of ligands to select.
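
The idea of pooling several .csv files and keeping the best rows can be sketched with the standard library. This is an illustration, not the implementation of get_top_ligands_from_csv_list: the helper name top_ligands_from_csvs, the "title" and "score" column names, and the rule that a lower score is better are all assumptions.

```python
import csv
import heapq
import os
import tempfile


def top_ligands_from_csvs(csv_list, output_csv, num_ligands, score_col="score"):
    """Keep the num_ligands rows with the lowest score across all files."""
    rows = []
    fieldnames = None
    for path in csv_list:
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh)
            # Assumption: every input file shares the same header.
            fieldnames = fieldnames or reader.fieldnames
            rows.extend(reader)
    best = heapq.nsmallest(num_ligands, rows,
                           key=lambda row: float(row[score_col]))
    with open(output_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(best)
    return best


# Small demonstration with two temporary input files.
workdir = tempfile.mkdtemp()
paths = []
for name, data in [("a.csv", [("lig1", -7.5), ("lig2", -9.1)]),
                   ("b.csv", [("lig3", -8.2), ("lig4", -6.0)])]:
    path = os.path.join(workdir, name)
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["title", "score"])
        writer.writerows(data)
    paths.append(path)

best = top_ligands_from_csvs(paths, os.path.join(workdir, "top.csv"), 2)
best_titles = [row["title"] for row in best]
```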

class schrodinger.active_learning.al_node.ActiveLearningNode(iter_num=1, job_name='active_learning', job_dir='.')

Bases: object

__init__(iter_num=1, job_name='active_learning', job_dir='.')

Initialize node for active learning workflow.

Parameters
  • iter_num (int) – current active learning iteration number.

  • job_name (str) – active learning job name.

  • job_dir (str) – directory where the jobs in the node will run.

classmethod getName(iter_num)

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.active_learning.al_node.PrepareSmilesNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

__init__(args, iter_num, job_name, job_dir)

Initialize node for selecting ligands (SMILES) to be scored by ScoreProviderNode.

checkOutcome(smi_file)

Validate the generated SMILES file.

Parameters

smi_file (str) – name of SMILES file to be validated.

runNode(csv_list, active_learning_job, smi_file_name=None, **kwargs)

Select ligands to be scored.

Parameters
  • csv_list (list(str)) – list of csv files that contain candidate ligands.

  • active_learning_job (ActiveLearningJob instance) – current active learning job.

  • smi_file_name (str) – SMILES file name that contains selected ligands.

uncertaintySelect(smi_file_name, scored_csv_file_list, sample_size, **kwargs)

Select random ligands from the initial input csv, or the ligands with the largest uncertainty from sorted ligand_ml .csv output.

Parameters
  • smi_file_name (str) – SMILES file name that contains selected ligands.

  • scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.

  • sample_size (int) – number of ligands to be sampled.
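
Uncertainty-based selection can be sketched as ranking ligands by their prediction uncertainty and keeping the largest values. The helper pick_most_uncertain and the (smiles, score, uncertainty) record layout are hypothetical and only illustrate the idea; they do not reflect how uncertaintySelect reads its .csv input.

```python
def pick_most_uncertain(records, sample_size):
    """Return the SMILES of the sample_size records with largest uncertainty.

    records is an iterable of (smiles, predicted_score, uncertainty) tuples;
    this layout is an assumption made for the sketch.
    """
    ranked = sorted(records, key=lambda rec: rec[2], reverse=True)
    return [smiles for smiles, _score, _unc in ranked[:sample_size]]


records = [("CCO", -6.1, 0.2), ("c1ccccc1", -7.3, 0.9), ("CCN", -5.0, 0.5)]
most_uncertain = pick_most_uncertain(records, 2)
```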

greedySelect(smi_file_name, scored_csv_file_list, sample_size, ascending=True, **kwargs)

Select the top ligands from sorted ligand_ml .csv output. Ligands in self.csv_list should already be sorted from best to worst.

Parameters
  • smi_file_name (str) – SMILES file name that contains selected ligands.

  • scored_csv_file_list (list(str)) – list of .csv files containing scored ligands

  • sample_size (int) – number of ligands to be sampled.

  • ascending (bool) – if True, ligands with lower scores are better.
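
Because the inputs are expected to be sorted from best to worst, a greedy pick over several files amounts to a k-way merge followed by taking the first sample_size entries. The sketch below uses in-memory lists instead of .csv files; greedy_pick and the (smiles, score) layout are hypothetical names for illustration.

```python
import heapq
from itertools import islice


def greedy_pick(sorted_lists, sample_size, ascending=True):
    """Merge lists that are already sorted best-to-worst and keep the best.

    Each list holds (smiles, score) tuples. With ascending=True the lists
    must be sorted from low to high score, otherwise from high to low.
    """
    merged = heapq.merge(*sorted_lists, key=lambda rec: rec[1],
                         reverse=not ascending)
    return [smiles for smiles, _score in islice(merged, sample_size)]


low_is_better = [[("A", -9.0), ("B", -7.0)], [("C", -8.5), ("D", -6.0)]]
high_is_better = [[("A", 0.9), ("B", 0.4)], [("C", 0.7)]]
picked_low = greedy_pick(low_is_better, 3)
picked_high = greedy_pick(high_is_better, 2, ascending=False)
```

Using a lazy merge avoids re-sorting the pooled data, which matters when each file is large and only a small sample is needed.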

randomSelect(smi_file_name, scored_csv_file_list, sample_size, sort=True, **kwargs)

Select sample_size random ligands from input csv file(s).

Parameters
  • smi_file_name (str) – SMILES file name that contains selected ligands.

  • scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.

  • sample_size (int) – number of ligands to be sampled.

  • sort (bool) – whether the csv files are sorted outputs or the initial inputs.

diversitySelect(smi_file_name, scored_csv_file_list, sample_size, sort=True, **kwargs)

Use combinatorial_diversity to select diverse ligands from input csv or sorted ligand_ml .csv output.

Parameters
  • smi_file_name (str) – SMILES file name that contains selected ligands.

  • scored_csv_file_list (list(str)) – list of ligand_ml training .csv files.

  • sample_size (int) – number of ligands to be sampled.

  • sort (bool) – whether the csv files are sorted outputs or the initial inputs.
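
To give a feel for what diversity-driven selection does, here is a greedy max-min (farthest-point) sketch using Tanimoto distance on toy fingerprints. This is a swapped-in, generic technique and not necessarily what combinatorial_diversity does; tanimoto_distance, maxmin_pick and the set-based fingerprints are all hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    similarity = len(a & b) / union if union else 1.0
    return 1.0 - similarity


def maxmin_pick(fingerprints, sample_size):
    """Greedily pick names so each new pick is far from all earlier picks.

    fingerprints maps a ligand name to a set of on-bits. The first ligand
    in the mapping seeds the selection (an arbitrary choice for the sketch).
    """
    names = list(fingerprints)
    if not names or sample_size <= 0:
        return []
    chosen = [names[0]]
    while len(chosen) < min(sample_size, len(names)):
        remaining = [n for n in names if n not in chosen]
        # Score each candidate by its distance to the closest chosen ligand.
        next_name = max(
            remaining,
            key=lambda n: min(
                tanimoto_distance(fingerprints[n], fingerprints[c])
                for c in chosen),
        )
        chosen.append(next_name)
    return chosen


toy_fps = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}}
diverse = maxmin_pick(toy_fps, 2)
```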

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.active_learning.al_node.ScoreProviderNode(iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

__init__(iter_num, job_name, job_dir)

Initialize node for obtaining the score of each ligand (SMILES).

checkOutcome(score_csv_file)

Validate the .csv score file.

Parameters

score_csv_file (str) – name of generated .csv score file.

writeScoreCsv(title_to_score, output_csv)

Write the scores to the .csv file that ligand_ml needs for training.

Parameters
  • title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to score

  • output_csv (str) – ligand_ml training .csv file.
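
The defaultdict(lambda : BAD_SCORE) type means that looking up a ligand without a recorded score yields a placeholder instead of raising KeyError. The sketch below shows that behaviour when writing a training table. The value of BAD_SCORE, the "title" and "score" column names, and the explicit titles argument are assumptions made for the example; the real writeScoreCsv takes only title_to_score and output_csv.

```python
import csv
import io
from collections import defaultdict

BAD_SCORE = 10000.0  # hypothetical placeholder; the real constant is not shown


def write_score_csv(titles, title_to_score, out):
    """Write one row per title; unscored titles fall back to BAD_SCORE."""
    writer = csv.writer(out)
    writer.writerow(["title", "score"])  # assumed column names
    for title in titles:
        writer.writerow([title, title_to_score[title]])


title_to_score = defaultdict(lambda: BAD_SCORE)
title_to_score["lig1"] = -8.4

buffer = io.StringIO()
write_score_csv(["lig1", "lig2"], title_to_score, buffer)
lines = buffer.getvalue().splitlines()
```

Here "lig2" was never scored, so its row carries the placeholder value rather than aborting the write.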

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.active_learning.al_node.KnownScoreProviderNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ScoreProviderNode

Class for obtaining the scores from an external .csv file. This class is only used for testing the performance of the active learning workflow.

__init__(args, iter_num, job_name, job_dir)

Initialize node for obtaining the score of each ligand (SMILES).

runNode(smi_file_name, active_learning_job, score_csv_file=None)

Read scores from active_learning_job.known_title_to_score.

Parameters
  • smi_file_name (str) – SMILES file that contains the ligands to be scored.

  • active_learning_job (ActiveLearningJob instance) – current active learning job.

  • score_csv_file (str) – ligand_ml training .csv file.

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

checkOutcome(score_csv_file)

Validate the .csv score file.

Parameters

score_csv_file (str) – name of generated .csv score file.

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

writeScoreCsv(title_to_score, output_csv)

Write the scores to the .csv file that ligand_ml needs for training.

Parameters
  • title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to score

  • output_csv (str) – ligand_ml training .csv file.

class schrodinger.active_learning.al_node.LigandMLTrainNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

Class for ligand_ml model generation.

__init__(args, iter_num, job_name, job_dir)

Initialize node for active learning workflow.

Parameters
  • iter_num (int) – current active learning iteration number.

  • job_name (str) – active learning job name.

  • job_dir (str) – directory where the jobs in the node will run.

checkOutcome(model_file)

Check whether the ligand_ml model exists.

Parameters

model_file (str) – name of ligand_ml .qzip model file

createTrainingCsvFile(discard_cutoff, ascending=True)

Generate the training .csv file for ligand_ml model generation.

Parameters
  • discard_cutoff (float) – score cutoff for excluding ligands from the ML training set.

  • ascending (bool) – lower value means better ligand if ascending is True.
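
How discard_cutoff and ascending might interact can be sketched with a simple filter. The rule used here (drop ligands that score worse than the cutoff) is an assumption, as are the helper name filter_training_rows and the (title, score) layout; the source states only that the cutoff excludes ligands from the training set.

```python
def filter_training_rows(rows, discard_cutoff, ascending=True):
    """Keep (title, score) rows that are not worse than discard_cutoff.

    Assumption: with ascending=True a lower score is better, so rows above
    the cutoff are dropped; with ascending=False rows below it are dropped.
    """
    if ascending:
        return [(title, score) for title, score in rows
                if score <= discard_cutoff]
    return [(title, score) for title, score in rows
            if score >= discard_cutoff]


rows = [("a", -9.0), ("b", -3.0), ("c", 50.0)]
kept_low = filter_training_rows(rows, 0.0)
kept_high = filter_training_rows(rows, 0.0, ascending=False)
```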

runNode(active_learning_job)

Perform ligand_ml training with all the scored ligands. The model file includes the job_args.json file.

Parameters

active_learning_job (ActiveLearningJob instance) – current active learning job.

runLigandMLMerge(merged_model_name, sub_model_list, jobdj=None)

Merge a list of .tar.gz ligand_ml models into a single .tar.gz ligand_ml model.

Parameters
  • merged_model_name (str) – path of the final merged .tar.gz model.

  • sub_model_list (list(str)) – list of ligand_ml models to be merged.

  • jobdj (queue.JobDJ object or None) – JobDJ where the merging job runs.

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.active_learning.al_node.LigandMLEvalNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

Class for performing ligand_ml prediction with generated model.

__init__(args, iter_num, job_name, job_dir)

Initialize node for active learning workflow.

Parameters
  • iter_num (int) – current active learning iteration number.

  • job_name (str) – active learning job name.

  • job_dir (str) – directory where the jobs in the node will run.

getBestResults(file_list, outfile, ascending=True)

Get the best ligands (with the lowest score) predicted by ligand_ml.

Parameters
  • file_list (list(str)) – list of ligand_ml .csv output files. Each file is sorted by ligand_ml prediction score.

  • outfile (str) – .csv file that contains the best ligands.

  • ascending (bool) – lower value means better ligand if ascending is True

checkOutcome(pred_csv_list, uncertain_csv_list)

Check the existence of ligand_ml prediction files.

Parameters
  • pred_csv_list (list(str)) – list of ligand_ml prediction csv file(s).

  • uncertain_csv_list (list(str)) – list of ligand_ml prediction csv file(s) with uncertainty.

runNode(model_file, active_learning_job)

Use the trained model to evaluate all the ligands.

Parameters
  • model_file (str) – ligand_ml .qzip model file.

  • active_learning_job (ActiveLearningJob instance) – current active learning job.

evalMQ(tar_model_file, output_csv, active_learning_job)

Evaluate ligands with ligand_ml model using ZMQ. Distributes evaluation jobs over a set of workers. Reverts to jobdj if ZMQ fails.

Parameters
  • tar_model_file (str) – trained tar.gz model file

  • output_csv (str) – output prediction file

  • active_learning_job (ActiveLearningJob instance) – current active learning job.

Returns

True if ZMQ job is successful

Return type

bool

evalDJ(tar_model_file, output_csv, active_learning_job)

Evaluate ligands with ligand_ml model using jobdj.

Parameters
  • tar_model_file (str) – tar.gz trained model file

  • output_csv (str) – output prediction file

  • active_learning_job (ActiveLearningJob instance) – current active learning job.

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.active_learning.al_node.ActiveLearningNodeSupplier(calculate_score_node, pilot_score_node, rescore_node, score_provider_node=<class 'schrodinger.active_learning.al_node.ScoreProviderNode'>, prepare_smi_node=<class 'schrodinger.active_learning.al_node.PrepareSmilesNode'>, known_score_provider_node=<class 'schrodinger.active_learning.al_node.KnownScoreProviderNode'>, ligand_ml_train_node=<class 'schrodinger.active_learning.al_node.LigandMLTrainNode'>, ligand_ml_eval_node=<class 'schrodinger.active_learning.al_node.LigandMLEvalNode'>)

Bases: object

__init__(calculate_score_node, pilot_score_node, rescore_node, score_provider_node=<class 'schrodinger.active_learning.al_node.ScoreProviderNode'>, prepare_smi_node=<class 'schrodinger.active_learning.al_node.PrepareSmilesNode'>, known_score_provider_node=<class 'schrodinger.active_learning.al_node.KnownScoreProviderNode'>, ligand_ml_train_node=<class 'schrodinger.active_learning.al_node.LigandMLTrainNode'>, ligand_ml_eval_node=<class 'schrodinger.active_learning.al_node.LigandMLEvalNode'>)