schrodinger.application.phase.packages.shape_screen_al_node module¶

Node classes for the Shape Screen active learning workflow. Each node is a stage of the workflow. One iteration of the workflow is:

Stage 1: PrepareSmilesNode selects, from the whole library, the ligands to be scored for training.
Stage 2: ScoreProviderNode and CalculateScoreNode calculate the scores for the selected ligands.
Stage 3: LigandMLTrainNode trains a ligand_ml model with the scored ligands.
Stage 4: LigandMlEvalNode evaluates the whole library using the ligand_ml model generated in stage 3. Go to stage 1 for the next iteration if necessary.
Stage 5: Rescore stage (optional). Run Shape Screen to rescore the top ligands picked by the LigandMlEvalNode of the last iteration.
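The staged loop described above can be illustrated with a toy example. This is a minimal sketch under loud assumptions: ligands are plain integers, `expensive_score` stands in for Shape Screen, `train_model` is a hypothetical 1-nearest-neighbour surrogate in place of ligand_ml, and a higher score is assumed to be better. None of these function names belong to the Schrödinger API.

```python
import random


def train_model(scored):
    """Stage 3 stand-in: a 1-nearest-neighbour predictor (hypothetical)."""
    points = list(scored.items())

    def predict(ligand):
        # Return the score of the closest already-scored ligand.
        return min(points, key=lambda p: abs(p[0] - ligand))[1]

    return predict


def active_learning(library, expensive_score, num_iter, train_size, seed=0):
    """Run stages 1-4 up to num_iter times; return (scored, predictions)."""
    rng = random.Random(seed)
    scored = {}
    predictions = None
    for _ in range(num_iter):
        unscored = [lig for lig in library if lig not in scored]
        if not unscored:
            break
        if predictions is None:
            # Stage 1, first pass: no model yet, so pick at random.
            selected = rng.sample(unscored, min(train_size, len(unscored)))
        else:
            # Stage 1, later passes: pick the best-predicted unscored ligands.
            ranked = sorted(unscored, key=predictions.get, reverse=True)
            selected = ranked[:train_size]
        # Stage 2: compute the expensive score for the selection.
        for lig in selected:
            scored[lig] = expensive_score(lig)
        # Stage 3: train the surrogate on everything scored so far.
        model = train_model(scored)
        # Stage 4: evaluate the whole library with the surrogate.
        predictions = {lig: model(lig) for lig in library}
    return scored, predictions


def rescore_top(predictions, expensive_score, num_rescore):
    """Stage 5 (optional): rescore the top predicted ligands."""
    top = sorted(predictions, key=predictions.get, reverse=True)[:num_rescore]
    return {lig: expensive_score(lig) for lig in top}
```

The point of the design is that the expensive scorer only ever sees `num_iter * train_size` ligands plus the rescored ones, while the cheap surrogate is applied to the whole library.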

schrodinger.application.phase.packages.shape_screen_al_node.estimate_time_cost(num_ligands, num_iter, train_size, train_time, num_shape_license, num_autoqsar_license, available_cpu=None, shape_per_ligand_cost=20, autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0)¶

Roughly estimate the time cost of an active learning Shape Screen job based on the inputs and the number of available licenses.

Parameters

num_ligands (int) – total number of ligands in the library.
num_iter (int) – number of active learning iterations.
train_size (int) – Ligand_ML training size per iteration.
train_time (float) – Ligand_ML training time per iteration in hours.
num_shape_license (int) – total number of Shape Screen licenses.
num_autoqsar_license (int) – total number of AutoQSAR licenses.
available_cpu (int) – number of available CPUs.
shape_per_ligand_cost (float) – estimated Shape Screen time cost for a single ligand, in seconds.
autoqsar_per_ligand_cost (float) – estimated Ligand_ML time cost for a single ligand, in seconds.
num_rescore_ligand (int) – number of ligands to be rescored by Shape Screen.

Returns

estimated time cost in hours

Return type

float
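The exact formula behind estimate_time_cost is not documented here. The sketch below shows one plausible way such an estimate could be assembled from the same parameters. It is an assumption for illustration only, not the module's implementation: `rough_time_cost_hours` is a hypothetical name, and the rule that parallelism is capped by both the license count and the CPU count is a guess.

```python
def rough_time_cost_hours(num_ligands, num_iter, train_size, train_time,
                          num_shape_license, num_autoqsar_license,
                          available_cpu=None, shape_per_ligand_cost=20,
                          autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0):
    """Hypothetical estimate in hours; NOT the formula used by the module."""
    # Assumption: parallelism is capped by both licenses and CPUs.
    shape_workers = num_shape_license
    ml_workers = num_autoqsar_license
    if available_cpu is not None:
        shape_workers = min(shape_workers, available_cpu)
        ml_workers = min(ml_workers, available_cpu)
    if shape_workers < 1 or ml_workers < 1:
        raise ValueError("at least one license and one CPU are required")

    # Per iteration: score the training set, train, evaluate the library.
    score_sec = train_size * shape_per_ligand_cost / shape_workers
    eval_sec = num_ligands * autoqsar_per_ligand_cost / ml_workers
    per_iter_hours = (score_sec + eval_sec) / 3600 + train_time

    # Optional final rescoring of the top ligands with Shape Screen.
    rescore_sec = num_rescore_ligand * shape_per_ligand_cost / shape_workers
    return num_iter * per_iter_hours + rescore_sec / 3600
```

Per-ligand costs are given in seconds and train_time in hours, so the sketch converts everything to hours before summing.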

class schrodinger.application.phase.packages.shape_screen_al_node.ShapeScreenJob(ligands_file, query_mae_file, jobname, batch_size, job_dir='.', max_confs=50)¶

Bases: object

__init__(ligands_file, query_mae_file, jobname, batch_size, job_dir='.', max_confs=50)¶

Create a Phase Shape CPU job for screening a .maegz file.

Parameters

ligands_file (str) – ligands .smi or .csv file
query_mae_file (str) – query .mae file
jobname (str) – Shape screen job name.
batch_size (int) – Number of ligands in each batch.
job_dir (str) – Shape screen job directory.
max_confs (int) – Maximum number of conformations.
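How ShapeScreenJob uses batch_size internally is not described here. As a minimal sketch, assuming the ligands are simply split into consecutive chunks of at most batch_size entries, the batching could look like this (`split_into_batches` is a hypothetical helper, not part of the module):

```python
def split_into_batches(ligands, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [ligands[i:i + batch_size]
            for i in range(0, len(ligands), batch_size)]
```

The last batch is shorter than the others whenever the number of ligands is not a multiple of batch_size.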

getSmiAndShapeScores()¶

Get the SMILES string and shape score for each structure in the .maegz file.

Returns: a dict that maps ligand name to its shape score.
Return type: {str:float}

assertJobFinished(*expected_files)¶

Assert we have all the necessary files from Shape screen.

convertSmiToMae(file_name, csv_title_col=2)¶

Shape Screen requires MAE files as input, so convert the SMI or CSV file to an MAE file.

Parameters: file_name (str) – SMI or CSV file name that contains selected ligands.
Returns: Filename of the generated MAE file
Return type: str

runShapeScreen()¶

Use Phase Shape CPU to screen ligands in MAE file using pharmacophore features.

Returns: shape result maegz file.
Return type: str, (str or None)

class schrodinger.application.phase.packages.shape_screen_al_node.CalculateScoreNode(args, iter_num, job_name, job_dir)¶

Bases: schrodinger.active_learning.al_node.ScoreProviderNode

Class for obtaining the scores from Shape Screen.

__init__(args, iter_num, job_name, job_dir)¶

Parameters

args (argparse.Namespace) – argument namespace with command line options
iter_num (int) – Iteration number
job_name (str) – Shape screen job name
job_dir (str) – Shape screen job directory

writeScoreCsv(title_to_score, output_csv)¶

Write the scores to the .csv file that ligand_ml needs for training.

Parameters

title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to Shape screen score
output_csv (str) – ligand_ml training .csv file.
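The defaultdict type of title_to_score suggests that a ligand without a recorded score falls back to BAD_SCORE when it is looked up. The sketch below illustrates that idea with the standard csv module. Everything specific in it is an assumption: the value of `BAD_SCORE`, the column names, the extra `titles` argument, and writing to a stream rather than to a file name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical placeholder; the module's real BAD_SCORE value is not shown here.
BAD_SCORE = 0.0


def write_score_csv(title_to_score, titles, stream):
    """Write one (title, score) row per requested ligand title."""
    writer = csv.writer(stream)
    writer.writerow(["title", "score"])  # column names are assumptions
    for title in titles:
        # Indexing the defaultdict yields BAD_SCORE for unscored ligands.
        writer.writerow([title, title_to_score[title]])


scores = defaultdict(lambda: BAD_SCORE, {"lig_1": 0.82, "lig_2": 0.47})
buffer = io.StringIO()
write_score_csv(scores, ["lig_1", "lig_2", "lig_3"], buffer)
```

Here `lig_3` was never scored, so its row carries the fallback value instead of raising a KeyError.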

runNode(smi_file_name, active_learning_job, score_csv_file=None)¶

Split the .mae file, use LigPrep to generate 3D conformations, run Shape screening, and collect the results for ligand_ml training.

Parameters

smi_file_name (str) – file that contains the ligands to be scored.
active_learning_job (ActiveLearningShapeJob instance.) – current active learning job.
score_csv_file (str) – ligand_ml training .csv file.

addOptionalRestartFiles(active_learning_job)¶

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters: active_learning_job (ActiveLearningJob instance) – current AL driver
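The restart mechanism described above (register files in a dict, then dump the dict to a .pkl file) can be sketched with the standard pickle module. This is an illustration under assumptions: `add_restart_files`, the dict layout, and the file names are hypothetical, and the sketch writes to a binary stream rather than to the driver's actual restart file.

```python
import io
import pickle


def add_restart_files(restart, node_name, files, stream):
    """Record a node's files in the restart dict, then pickle the dict."""
    restart.setdefault(node_name, []).extend(files)
    pickle.dump(restart, stream)


restart = {"iteration": 2}
buffer = io.BytesIO()
add_restart_files(restart, "score_2", ["score_2.csv"], buffer)
```

Reloading the pickled bytes gives back the same dict, which is what lets a restarted driver find the files each node already produced.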

checkOutcome(score_csv_file)¶

Validate the .csv score file.

Parameters: score_csv_file (str) – name of generated .csv score file.
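What checkOutcome actually verifies is not specified beyond "validate". A minimal sketch of one reasonable check, assuming the file has a header with a score column and that every score must parse as a number, might look like this (`validate_score_csv` and the column name are hypothetical):

```python
import csv
import io


def validate_score_csv(stream, score_column="score"):
    """Return the number of scored rows; raise ValueError if malformed."""
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or score_column not in reader.fieldnames:
        raise ValueError(f"missing column: {score_column}")
    count = 0
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            float(row[score_column])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: score is not a number") from None
        count += 1
    if count == 0:
        raise ValueError("no scored ligands found")
    return count
```

Failing early on an empty or malformed score file keeps a bad Shape Screen run from silently feeding the training stage.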

classmethod getName(iter_num)¶

needsHistogram()¶

Whether we can generate a histogram plot of calculated target scores.

Returns: whether the histogram of score can be plotted
Return type: bool

class schrodinger.application.phase.packages.shape_screen_al_node.RescoreNode(args, iter_num, job_name, job_dir)¶

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

Class for using Shape Screen to rescore the best ligands predicted by ligand_ml.

__init__(args, iter_num, job_name, job_dir)¶

Initialize node for active learning workflow.

Parameters

args (argparse.Namespace) – argument namespace with command line options.
iter_num (int) – current active learning iteration number.
job_name (str) – active learning job name.
job_dir (str) – directory of where the jobs in the node will run.

runNode(ligands_csv, active_learning_job, **kwargs)¶

Select ligands to be scored.

Parameters

ligands_csv (str) – csv file that contains candidate ligands.
active_learning_job (ActiveLearningShapeScreenJob instance.) – current active learning job.

checkOutcome(screen_mae_file)¶

Check the existence of Shape result files.

Parameters: screen_mae_file (str) – Shape .mae screen result file.

addOptionalRestartFiles(active_learning_job)¶

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters: active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)¶

needsHistogram()¶

Whether we can generate a histogram plot of calculated target scores.

Returns: whether the histogram of score can be plotted
Return type: bool

class schrodinger.application.phase.packages.shape_screen_al_node.PilotScoreNode(args, iter_num, job_name, job_dir)¶

Bases: schrodinger.active_learning.al_node.ActiveLearningNode

Class for selecting and docking the ligands for the pilot study.

__init__(args, iter_num, job_name, job_dir)¶

Initialize node for active learning workflow.

Parameters

args (argparse.Namespace) – argument namespace with command line options.
iter_num (int) – current active learning iteration number.
job_name (str) – active learning job name.
job_dir (str) – directory of where the jobs in the node will run.

runNode(csv_list, active_learning_job, **kwargs)¶

Select top pilot size ligands from the randomized input ligands. Dock all the selected ligands.

Parameters

csv_list (list(str)) – list of csv files that contain randomized input ligands.
active_learning_job (ActiveLearningShapeScreenJob instance.) – current active learning job.
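Because the input ligands are already randomized, taking the "top" pilot-size ligands can be read as taking the first N entries across the input files. The sketch below shows that reading. It is an assumption about the selection rule, `select_pilot_ligands` is a hypothetical helper, and it works on in-memory lists rather than on the csv files themselves.

```python
from itertools import chain, islice


def select_pilot_ligands(randomized_chunks, pilot_size):
    """Take the first pilot_size ligands across already-shuffled chunks."""
    return list(islice(chain.from_iterable(randomized_chunks), pilot_size))
```

Since the order is random to begin with, the first N ligands form an unbiased sample without any further shuffling.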

checkOutcome()¶

Check the existence of ligand .csv file and Shape .csv result file.

addOptionalRestartFiles(active_learning_job)¶

Add node’s optional restart file(s) to driver’s restart dict. Dump the restart dict to the restart .pkl file.

Parameters: active_learning_job (ActiveLearningJob instance) – current AL driver

classmethod getName(iter_num)¶

needsHistogram()¶

Whether we can generate a histogram plot of calculated target scores.

Returns: whether the histogram of score can be plotted
Return type: bool