schrodinger.application.phase.packages.shape_screen_al_node module
Node classes for the Shape Screen active learning workflow. Each node is one stage of the workflow. One iteration of the workflow is:
- Stage 1: PrepareSmilesNode selects the ligands to be scored for training from the whole library.
- Stage 2: ScoreProviderNode and CalculateScoreNode calculate the scores for the selected ligands.
- Stage 3: LigandMLTrainNode trains a ligand_ml model on the scored ligands.
- Stage 4: LigandMlEvalNode evaluates the whole library with the ligand_ml model trained in Stage 3.
Go back to Stage 1 for the next iteration if necessary. After the last iteration:
- Stage 5 (optional): Rescore. Run Shape Screen to rescore the top ligands picked by LigandMlEvalNode in the last iteration.
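The iteration described above can be sketched as a plain Python loop. This is a minimal sketch, not the module's implementation: run_active_learning, score_fn and train_fn are hypothetical stand-ins for the node classes, and the Stage 1 selection rule (random picks on the first pass, then the best-predicted unscored ligands, treating higher scores as better) is an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], float]


def run_active_learning(
    library: List[str],
    score_fn: Callable[[str], float],               # stand-in for Stage 2 (Shape Screen)
    train_fn: Callable[[Dict[str, float]], Model],  # stand-in for Stage 3 (ligand_ml)
    num_iter: int,
    train_size: int,
    num_rescore: int = 0,
    seed: int = 0,
) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float]]:
    """Toy active-learning loop; every callable here is hypothetical."""
    rng = random.Random(seed)
    scored: Dict[str, float] = {}       # ligand -> computed score (Stage 2)
    predictions: Dict[str, float] = {}  # ligand -> model prediction (Stage 4)
    for _ in range(num_iter):
        # Stage 1: pick unscored ligands. Random on the first pass, then the
        # best-predicted ones (assumed rule; higher score = better).
        unscored = [lig for lig in library if lig not in scored]
        n = min(train_size, len(unscored))
        if predictions:
            picks = sorted(unscored, key=predictions.get, reverse=True)[:n]
        else:
            picks = rng.sample(unscored, n)
        # Stage 2: score the picks.
        for lig in picks:
            scored[lig] = score_fn(lig)
        # Stage 3: train on a snapshot of every ligand scored so far.
        model = train_fn(dict(scored))
        # Stage 4: predict a score for the whole library.
        predictions = {lig: model(lig) for lig in library}
    # Stage 5 (optional): rescore the top predicted ligands.
    top = sorted(predictions, key=predictions.get, reverse=True)[:num_rescore]
    rescored = {lig: score_fn(lig) for lig in top}
    return scored, predictions, rescored


if __name__ == "__main__":
    library = ["C" * i + "c" * (i % 5) for i in range(1, 51)]

    def toy_train(scored):
        # Memorize scored ligands; predict their mean for everything else.
        mean = sum(scored.values()) / len(scored)
        return lambda lig: scored.get(lig, mean)

    scored, preds, rescored = run_active_learning(
        library, lambda s: s.count("c") / len(s), toy_train,
        num_iter=3, train_size=5, num_rescore=4)
    print(len(scored), len(rescored))
```

The sketch retrains on every ligand scored so far. The documentation above does not say whether the real workflow accumulates training data this way or trains only on the latest batch.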
- schrodinger.application.phase.packages.shape_screen_al_node.estimate_time_cost(num_ligands, num_iter, train_size, train_time, num_shape_license, num_autoqsar_license, available_cpu=None, shape_per_ligand_cost=20, autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0)
Roughly estimate the time cost of an active learning Shape Screen job based on the inputs and the number of available licenses.
- Parameters
num_ligands (int) – total number of ligands in the library.
num_iter (int) – number of active learning iterations.
train_size (int) – Ligand_ML training size per iteration.
train_time (float) – Ligand_ML training time per iteration in hours.
num_shape_license (int) – total number of Shape Screen licenses.
num_autoqsar_license (int) – total number of AutoQSAR licenses.
available_cpu (int) – number of available CPUs.
shape_per_ligand_cost (float) – estimated Shape Screen time cost for a single ligand, in seconds.
autoqsar_per_ligand_cost (float) – estimated Ligand_ML time cost for a single ligand, in seconds.
num_rescore_ligand (int) – number of ligands to be rescored by Shape Screen.
- Returns
estimated time cost in hours
- Return type
float
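The docstring does not give the formula that estimate_time_cost uses. As a purely hypothetical cost model with the same parameters, one can assume that each iteration scores train_size ligands with Shape Screen, trains once for train_time hours, and predicts the whole library with Ligand_ML, with each step parallelized over the smaller of the license count and available_cpu. The real function may weight these terms differently.

```python
def rough_al_time_hours(num_ligands, num_iter, train_size, train_time,
                        num_shape_license, num_autoqsar_license,
                        available_cpu=None, shape_per_ligand_cost=20,
                        autoqsar_per_ligand_cost=0.02, num_rescore_ligand=0):
    """Back-of-envelope wall time in hours (hypothetical cost model)."""

    def workers(licenses):
        # Assumption: parallelism is capped by licenses and, if known, by CPUs.
        n = licenses if available_cpu is None else min(licenses, available_cpu)
        return max(1, n)

    shape_workers = workers(num_shape_license)
    ml_workers = workers(num_autoqsar_license)
    # Per iteration: score train_size ligands, train once, predict the library.
    score_s = train_size * shape_per_ligand_cost / shape_workers
    predict_s = num_ligands * autoqsar_per_ligand_cost / ml_workers
    per_iter_s = score_s + train_time * 3600 + predict_s
    # Optional final rescoring with Shape Screen.
    rescore_s = num_rescore_ligand * shape_per_ligand_cost / shape_workers
    return (num_iter * per_iter_s + rescore_s) / 3600
```

For example, rough_al_time_hours(1_000_000, 3, 10_000, 1.0, 100, 10) gives about 6.3 hours under these assumptions: 2,000 s of scoring, 3,600 s of training and 2,000 s of prediction per iteration.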
- class schrodinger.application.phase.packages.shape_screen_al_node.ShapeScreenJob(ligands_file, query_mae_file, jobname, batch_size, job_dir='.', max_confs=50)
Bases:
object
- __init__(ligands_file, query_mae_file, jobname, batch_size, job_dir='.', max_confs=50)
Create a Phase Shape CPU job for screening a .maegz file.
- Parameters
ligands_file (str) – ligands .smi or .csv file
query_mae_file (str) – query .mae file
jobname (str) – Shape screen job name.
batch_size (int) – Number of ligands in each batch.
job_dir (str) – Shape screen job directory.
max_confs (int) – Maximum number of conformations.
- getSmiAndShapeScores()
Get the SMILES string and shape score for each structure in the .maegz file.
- Returns
a dict that maps ligand name to its shape score.
- Return type
{str:float}
- assertJobFinished(*expected_files)
Assert that all the expected output files from Shape Screen exist.
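A check in the spirit of assertJobFinished might look like the following. assert_files_exist is a hypothetical helper that only tests whether each path is a regular file; it does not validate file contents, and the real method may check more.

```python
import os


def assert_files_exist(*expected_files: str) -> None:
    """Raise AssertionError naming every expected file that is missing."""
    missing = [path for path in expected_files if not os.path.isfile(path)]
    if missing:
        # Raise explicitly instead of using `assert`, which `python -O` strips.
        raise AssertionError("Missing Shape Screen output: " + ", ".join(missing))
```

Collecting all missing paths before raising reports every absent file at once, which is easier to debug than failing on the first one.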
- convertSmiToMae(file_name, csv_title_col=2)
Shape Screen requires MAE files as input, so convert the SMI or CSV file to an MAE file.
- Parameters
file_name (str) – SMI or CSV file name that contains selected ligands.
- Returns
Filename of the generated MAE file
- Return type
str
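Before any 3D conversion, a SMI or CSV input has to be read into (SMILES, title) pairs. The sketch below covers only that parsing step. read_smiles_records is hypothetical, its smiles_col and title_col are 0-based indices that apply only to CSV input, and it assumes CSV files have a header row. How csv_title_col in convertSmiToMae is indexed is not stated here, and the actual conversion to MAE, which needs Schrodinger structure tools, is left out.

```python
import csv
from typing import Iterable, List, Tuple


def read_smiles_records(lines: Iterable[str], is_csv: bool,
                        smiles_col: int = 0,
                        title_col: int = 1) -> List[Tuple[str, str]]:
    """Return (SMILES, title) pairs from SMI or CSV text (hypothetical helper)."""
    records: List[Tuple[str, str]] = []
    if is_csv:
        reader = csv.reader(lines)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if row:  # skip blank rows
                records.append((row[smiles_col], row[title_col]))
    else:
        for line in lines:
            fields = line.split()
            if fields:
                # SMI convention: SMILES first, optional title after whitespace.
                records.append((fields[0], " ".join(fields[1:])))
    return records
```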
- runShapeScreen()
Use Phase Shape CPU to screen the ligands in the MAE file using pharmacophore features.
- Returns
shape result maegz file.
- Return type
str, (str or None)
- class schrodinger.application.phase.packages.shape_screen_al_node.CalculateScoreNode(args, iter_num, job_name, job_dir)
Bases:
schrodinger.active_learning.al_node.ScoreProviderNode
Class for obtaining the scores from Shape Screen.
- __init__(args, iter_num, job_name, job_dir)
- Parameters
args (argparse.Namespace) – argument namespace with command line options
iter_num (int) – Iteration number
job_name (str) – Shape screen job name
job_dir (str) – Shape screen job directory
- writeScoreCsv(title_to_score, output_csv)
Write the scores to the .csv file that ligand_ml needs for training.
- Parameters
title_to_score (defaultdict(lambda : BAD_SCORE)) – dict that maps ligand title to Shape screen score
output_csv (str) – ligand_ml training .csv file.
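Because title_to_score is a defaultdict(lambda: BAD_SCORE), looking up a ligand that Shape Screen failed to score yields BAD_SCORE instead of raising KeyError. The sketch below shows that pattern when writing a training CSV. write_score_csv, the Title and Score column headers, and the BAD_SCORE value of 0.0 are all placeholders; the module's actual constant and CSV layout are not given in this documentation.

```python
import csv
from typing import DefaultDict, Iterable

BAD_SCORE = 0.0  # hypothetical placeholder, not the module's actual constant


def write_score_csv(titles: Iterable[str],
                    title_to_score: DefaultDict[str, float],
                    output_csv: str) -> None:
    """Write a two-column training CSV (column names are assumptions)."""
    with open(output_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Title", "Score"])
        for title in titles:
            # Missing titles fall back to BAD_SCORE via the defaultdict factory.
            writer.writerow([title, title_to_score[title]])
```

Iterating over the full list of selected titles, rather than over the dict's keys, is what lets failed ligands appear in the CSV with the fallback score.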
- runNode(smi_file_name, active_learning_job, score_csv_file=None)
Split the .mae file, use LigPrep to generate 3D conformations, run Shape screening, and collect the results for ligand_ml training.
- Parameters
smi_file_name (str) – SMILES file that contains the ligands to be scored.
active_learning_job (ActiveLearningShapeJob instance.) – current active learning job.
score_csv_file (str) – ligand_ml training .csv file.
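ShapeScreenJob takes a batch_size, and runNode splits its input before screening. One simple way to split a ligand list into fixed-size batches is shown below. batched is a hypothetical helper, and the module may split its input differently (for example, by writing one file per batch).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(it, batch_size)):
        yield batch
```

On Python 3.12 and later, itertools.batched offers the same behavior but yields tuples.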
- class schrodinger.application.phase.packages.shape_screen_al_node.RescoreNode(args, iter_num, job_name, job_dir)
Bases:
schrodinger.active_learning.al_node.ActiveLearningNode
Class for using Shape Screen to rescore the best ligands predicted by ligand_ml.
- __init__(args, iter_num, job_name, job_dir)
Initialize the node for the active learning workflow.
- Parameters
args (argparse.Namespace) – argument namespace with command line options
iter_num (int) – current active learning iteration number.
job_name (str) – active learning job name.
job_dir (str) – directory where the jobs in the node will run.
- runNode(ligands_csv, active_learning_job, **kwargs)
Select ligands to be scored.
- Parameters
ligands_csv (str) – csv file that contains candidate ligands.
active_learning_job (ActiveLearningShapeScreenJob instance.) – current active learning job.
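Rescoring targets the best ligands predicted by ligand_ml. Choosing the top n rows of a predictions CSV can be sketched with heapq.nlargest, which keeps only n rows in memory while streaming through a large file. top_predicted, the Title and Predicted column names, and the assumption that a higher predicted shape score is better are all hypothetical.

```python
import csv
import heapq
from typing import List, TextIO


def top_predicted(fh: TextIO, n: int, title_col: str = "Title",
                  score_col: str = "Predicted") -> List[str]:
    """Return titles of the n rows with the highest predicted score."""
    rows = csv.DictReader(fh)
    # nlargest streams the rows and returns them best-first.
    best = heapq.nlargest(n, rows, key=lambda row: float(row[score_col]))
    return [row[title_col] for row in best]
```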
- checkOutcome(screen_mae_file)
Check the existence of Shape result files.
- Parameters
screen_mae_file (str) – Shape .mae screen result file.
- class schrodinger.application.phase.packages.shape_screen_al_node.PilotScoreNode(args, iter_num, job_name, job_dir)
Bases:
schrodinger.active_learning.al_node.ActiveLearningNode
Class for selecting the ligands for the pilot study and scoring them with Shape Screen.
- __init__(args, iter_num, job_name, job_dir)
Initialize the node for the active learning workflow.
- Parameters
args (argparse.Namespace) – argument namespace with command line options
iter_num (int) – current active learning iteration number.
job_name (str) – active learning job name.
job_dir (str) – directory where the jobs in the node will run.
- runNode(csv_list, active_learning_job, **kwargs)
Select the top pilot-size ligands from the randomized input ligands, then score all the selected ligands with Shape Screen.
- Parameters
csv_list (list(str)) – list of csv files that contain randomized input ligands.
active_learning_job (ActiveLearningShapeScreenJob instance.) – current active learning job.
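Because the input CSV files are already randomized, taking the first pilot-size rows across them amounts to a random sample. A minimal sketch with itertools.chain and islice follows. first_rows is hypothetical, and it assumes each CSV has a header row.

```python
import csv
from itertools import chain, islice
from typing import Dict, Iterable, List, TextIO


def first_rows(csv_handles: Iterable[TextIO], pilot_size: int) -> List[Dict[str, str]]:
    """Take the first pilot_size rows across already-shuffled CSV files, in order."""
    # Readers are created lazily, so files past the cutoff are never read.
    readers = (csv.DictReader(fh) for fh in csv_handles)
    return list(islice(chain.from_iterable(readers), pilot_size))
```

Reading stops once pilot_size rows have been collected, so the cost depends on the pilot size rather than on the size of the full library.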
- checkOutcome()
Check that the ligand .csv file and the Shape .csv result file exist.