schrodinger.application.phase.packages.shape_screen_al_node module

Node classes for the Shape Screen active learning workflow. Each node is a stage of the workflow. One iteration of the workflow is:

  • Stage 1: PrepareSmilesNode selects, from the whole library, the ligands to be scored for training.

  • Stage 2: ScoreProviderNode and CalculateScoreNode calculate the scores for the selected ligands.

  • Stage 3: LigandMLTrainNode trains a ligand_ml model with the scored ligands.

  • Stage 4: LigandMlEvalNode evaluates the whole library using the ligand_ml model generated in stage 3. Go to stage 1 for the next iteration if necessary.

  • Stage 5 (optional): Rescore stage. Run Shape Screen to rescore the top ligands picked by the LigandMlEvalNode in the last iteration.
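The iteration described above can be sketched as a plain Python loop. This is only an illustration of the control flow, not the module's implementation: run_active_learning, score_fn and train_fn are hypothetical names, the first batch is assumed to be drawn at random, and later batches are assumed to favor the highest predicted scores.

```python
import random


def run_active_learning(library, score_fn, train_fn, num_iter, train_size, seed=0):
    """Toy active learning loop; every name here is hypothetical."""
    rng = random.Random(seed)
    scored = {}        # ligand -> expensive score (stands in for Shape Screen)
    predictions = {}   # ligand -> cheap model prediction for the whole library
    model = None
    for _ in range(num_iter):
        # Stage 1: choose the ligands to score in this iteration.
        unscored = [lig for lig in library if lig not in scored]
        if not unscored:
            break
        if model is None:
            batch = rng.sample(unscored, min(train_size, len(unscored)))
        else:
            # Assumption: a higher predicted score is better.
            batch = sorted(unscored, key=predictions.get, reverse=True)[:train_size]
        # Stage 2: compute the expensive score for the selected ligands.
        for lig in batch:
            scored[lig] = score_fn(lig)
        # Stage 3: train a model on everything scored so far.
        model = train_fn(scored)
        # Stage 4: evaluate the whole library with the new model.
        predictions = {lig: model(lig) for lig in library}
    return scored, predictions
```

Here train_fn receives the dict of scored ligands and returns a callable that predicts a score for one ligand. An optional rescore stage would then run the expensive scorer once more on the best entries of predictions.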

schrodinger.application.phase.packages.shape_screen_al_node.estimate_time_cost(num_ligands, num_iter, train_size, train_time, num_shape_license, num_autoqsar_license, available_cpu=None, shape_per_ligand_cost=20, autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0)

Roughly estimate the time cost of an active learning Shape Screen job based on the inputs and the number of available licenses.

Parameters
  • num_ligands (int) – total number of ligands in the library.

  • num_iter (int) – number of active learning iterations.

  • train_size (int) – Ligand_ML training size per iteration.

  • train_time (float) – Ligand_ML training time per iteration in hours.

  • num_shape_license (int) – total number of Shape Screen licenses

  • num_autoqsar_license (int) – total number of AutoQSAR licenses

  • available_cpu (int) – number of available CPUs.

  • shape_per_ligand_cost (float) – estimated Shape Screen time cost for a single ligand, in seconds.

  • autoqsar_per_ligand_cost (float) – estimated Ligand_ML time cost for a single ligand, in seconds.

  • num_rescore_ligand (int) – number of ligands to be rescored by Shape Screen.

Returns

estimated time cost in hours

Return type

float
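The formula behind the estimate is not given here. As a rough illustration of how the parameters could combine, the sketch below assumes that each iteration costs one Shape Screen pass over train_size ligands, one training run, and one Ligand_ML pass over the whole library, with work spread evenly across licenses (capped by available_cpu). The function name rough_time_estimate_hours and the formula itself are assumptions, not the behavior of estimate_time_cost.

```python
def rough_time_estimate_hours(num_ligands, num_iter, train_size, train_time,
                              num_shape_license, num_autoqsar_license,
                              available_cpu=None, shape_per_ligand_cost=20,
                              autoqsar_per_ligand_cost=0.02,
                              num_rescore_ligand=0):
    """Hypothetical back-of-the-envelope estimate, in hours."""
    shape_workers = num_shape_license
    ml_workers = num_autoqsar_license
    if available_cpu is not None:
        # Assumption: parallelism cannot exceed the available CPUs.
        shape_workers = min(shape_workers, available_cpu)
        ml_workers = min(ml_workers, available_cpu)
    if shape_workers < 1 or ml_workers < 1:
        raise ValueError("at least one license and one CPU are required")

    seconds_per_hour = 3600
    # Per-iteration cost of scoring the training ligands with Shape Screen.
    score_h = train_size * shape_per_ligand_cost / shape_workers / seconds_per_hour
    # Per-iteration cost of evaluating the whole library with the ML model.
    eval_h = num_ligands * autoqsar_per_ligand_cost / ml_workers / seconds_per_hour
    # One-off cost of the optional rescore stage.
    rescore_h = (num_rescore_ligand * shape_per_ligand_cost
                 / shape_workers / seconds_per_hour)
    return num_iter * (score_h + train_time + eval_h) + rescore_h
```

Under these assumptions the library-wide evaluation term grows with num_ligands while the Shape Screen term grows only with train_size, which is the point of the active learning approach.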

class schrodinger.application.phase.packages.shape_screen_al_node.ShapeScreenJob(ligands_file, query_mae_file, jobname, batch_size, job_dir='.', max_confs=50)

Bases: object

__init__(ligands_file, query_mae_file, jobname, batch_size, job_dir='.', max_confs=50)

Create a Phase Shape CPU job for screening a .maegz file.

Parameters
  • ligands_file (str) – ligands .smi or .csv file

  • query_mae_file (str) – query .mae file

  • jobname (str) – Shape screen job name.

  • batch_size (int) – Number of ligands in each batch.

  • job_dir (str) – Shape screen job directory.

  • max_confs (int) – Maximum number of conformations.
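To illustrate what batch_size means in practice, the sketch below splits a sequence of ligands into consecutive batches. It is a generic helper written for this example (iter_batches is a hypothetical name) and does not show how ShapeScreenJob itself partitions its input.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

The last batch may be smaller than batch_size when the number of ligands is not an exact multiple of it.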

getSmiAndShapeScores()

Get the SMILES string and shape score for each structure in the .maegz file.

Returns

a dict that maps ligand name to its shape score.

Return type

{str:float}

assertJobFinished(*expected_files)

Assert we have all the necessary files from Shape screen.
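A check of this kind can be sketched with the standard library alone. The helper below, assert_files_exist, is a hypothetical stand-in: it assumes that a job counts as finished when every expected file exists, which may differ from what assertJobFinished actually verifies.

```python
import os


def assert_files_exist(*expected_files):
    """Raise FileNotFoundError listing every expected file that is missing."""
    missing = [path for path in expected_files if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(
            "Missing expected output file(s): " + ", ".join(missing))
```

Collecting all missing paths before raising reports every problem in a single error message rather than only the first one.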

convertSmiToMae(file_name, csv_title_col=2)

Shape Screen requires MAE files as input, so convert the SMI or CSV file to an MAE file.

Parameters

file_name (str) – SMI or CSV file name that contains selected ligands.

Returns

Filename of the generated MAE file

Return type

str

runShapeScreen()

Use Phase Shape CPU to screen the ligands in the MAE file using pharmacophore features.

Returns

shape result maegz file.

Return type

str, (str or None)

class schrodinger.application.phase.packages.shape_screen_al_node.CalculateScoreNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ScoreProviderNode

Class for obtaining the scores from Shape Screen.

__init__(args, iter_num, job_name, job_dir)

Parameters
  • args (argparse.Namespace) – argument namespace with command line options

  • iter_num (int) – Iteration number

  • job_name (str) – Shape screen job name

  • job_dir (str) – Shape screen job directory

writeScoreCsv(title_to_score, output_csv)

Write the scores to the .csv file that ligand_ml needs for training.

Parameters
  • title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to Shape screen score

  • output_csv (str) – ligand_ml training .csv file.
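The defaultdict(lambda : BAD_SCORE) type means that any ligand title missing from the Shape Screen results falls back to a default score. The sketch below shows that pattern with the standard csv module. The value of BAD_SCORE, the column names "title" and "score", and the helper name write_score_csv are all assumptions made for illustration; the real constant and CSV layout are not given here.

```python
import csv
from collections import defaultdict

# Hypothetical placeholder; the module's actual BAD_SCORE value is not shown here.
BAD_SCORE = 0.0


def write_score_csv(titles, title_to_score, output_csv):
    """Write one row per title; unknown titles get the defaultdict's default."""
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title", "score"])  # assumed column names
        for title in titles:
            writer.writerow([title, title_to_score[title]])


# Usage: "lig2" has no Shape Screen result, so it is written with BAD_SCORE.
example_scores = defaultdict(lambda: BAD_SCORE, {"lig1": 0.8})
```

Because the lookup goes through the defaultdict, ligands that failed or were filtered out still appear in the training file with a uniform penalty score instead of being dropped.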

runNode(smi_file_name, active_learning_job, score_csv_file=None)

Split the .mae file, use ligprep to generate 3D conformations, run Shape screening, and collect the results for ligand_ml training.

Parameters
  • smi_file_name (str) – file that contains the ligands to be scored.

  • active_learning_job (ActiveLearningShapeJob instance.) – current active learning job.

  • score_csv_file (str) – ligand_ml training .csv file.

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver
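The restart mechanism described above amounts to updating a dict and pickling it. A minimal sketch with the standard pickle module follows. The dict layout (node name mapped to a list of files) and the names add_restart_files and load_restart are assumptions for illustration, not the driver's actual data structure.

```python
import pickle


def add_restart_files(restart_dict, node_name, files, restart_pkl):
    """Record a node's files in the restart dict and dump it to a .pkl file."""
    restart_dict.setdefault(node_name, []).extend(files)
    with open(restart_pkl, "wb") as handle:
        pickle.dump(restart_dict, handle)
    return restart_dict


def load_restart(restart_pkl):
    """Read the restart dict back, for example when resuming a job."""
    with open(restart_pkl, "rb") as handle:
        return pickle.load(handle)
```

Dumping after every update keeps the .pkl file in step with the in-memory dict, so an interrupted job can resume from the last completed node.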

checkOutcome(score_csv_file)

Validate the .csv score file.

Parameters

score_csv_file (str) – name of generated .csv score file.
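What "validate" covers is not specified here. One plausible set of checks, sketched below, is that the file has a score column, contains at least one row, and that every score parses as a number. The column name "score" and the helper name validate_score_csv are assumptions.

```python
import csv


def validate_score_csv(score_csv_file, score_column="score"):
    """Return the number of data rows, raising ValueError on a bad file."""
    with open(score_csv_file, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or score_column not in reader.fieldnames:
            raise ValueError(f"{score_csv_file}: no '{score_column}' column")
        count = 0
        for row in reader:
            float(row[score_column])  # raises ValueError for non-numeric scores
            count += 1
    if count == 0:
        raise ValueError(f"{score_csv_file}: no scored ligands")
    return count
```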

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.application.phase.packages.shape_screen_al_node.RescoreNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

Class for using Shape Screen to rescore the best ligands predicted by ligand_ml.

__init__(args, iter_num, job_name, job_dir)

Initialize node for active learning workflow.

Parameters
  • iter_num (int) – current active learning iteration number.

  • job_name (str) – active learning job name.

  • job_dir (str) – directory of where the jobs in the node will run.

runNode(ligands_csv, active_learning_job, **kwargs)

Select ligands to be scored.

Parameters
  • ligands_csv (str) – csv file that contains candidate ligands.

  • active_learning_job (ActiveLearningShapeScreenJob instance.) – current active learning job.

checkOutcome(screen_mae_file)

Check the existence of Shape result files.

Parameters

screen_mae_file (str) – Shape .mae screen result file.

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool

class schrodinger.application.phase.packages.shape_screen_al_node.PilotScoreNode(args, iter_num, job_name, job_dir)

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

Class for selecting and docking the ligands for the pilot study.

__init__(args, iter_num, job_name, job_dir)

Initialize node for active learning workflow.

Parameters
  • iter_num (int) – current active learning iteration number.

  • job_name (str) – active learning job name.

  • job_dir (str) – directory of where the jobs in the node will run.

runNode(csv_list, active_learning_job, **kwargs)

Select the top pilot-size ligands from the randomized input ligands, then dock all the selected ligands.

Parameters
  • csv_list (list(str)) – list of csv files that contain randomized input ligands.

  • active_learning_job (ActiveLearningShapeScreenJob instance.) – current active learning job.
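Because the input ligands are already randomized, taking the top pilot-size ligands can be read as taking the first rows across the CSV files until the pilot size is reached. The sketch below follows that reading. The helper name take_pilot_rows, the pilot_size argument, and the assumption that each file starts with a header row are all hypothetical.

```python
import csv
from itertools import islice


def take_pilot_rows(csv_list, pilot_size):
    """Collect the first pilot_size data rows across the given CSV files."""
    rows = []
    for path in csv_list:
        if len(rows) >= pilot_size:
            break
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # assumption: each file has one header row
            rows.extend(islice(reader, pilot_size - len(rows)))
    return rows
```

Reading stops as soon as enough rows have been collected, so large later files are never opened when the first files already cover the pilot size.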

checkOutcome()

Check the existence of ligand .csv file and Shape .csv result file.

addOptionalRestartFiles(active_learning_job)

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters

active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)

needsHistogram()

Whether we can generate a histogram plot of calculated target scores.

Returns

whether the histogram of score can be plotted

Return type

bool