schrodinger.active_learning.al_report module

schrodinger.active_learning.al_report.get_ligand_ml_metric(ligand_ml_model_file: str) → Tuple

Extract the test set metrics, test set labels and test set predictions from a ligand_ml model file.

Parameters

ligand_ml_model_file – ligand_ml .qzip model file.

Returns

r2, mae, rmse, labels and predictions of the test set
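The three metrics named above can be illustrated with a minimal sketch that uses their textbook definitions. The helper name regression_metrics is hypothetical, and whether ligand_ml computes r2 as the coefficient of determination (as here) or as a squared correlation is an assumption not confirmed by this page.

```python
import math


def regression_metrics(labels, predictions):
    """Return (r2, mae, rmse) for paired labels and predictions.

    Hypothetical helper: r2 is taken to be the coefficient of
    determination, which is an assumption about ligand_ml.
    """
    n = len(labels)
    mean_label = sum(labels) / n
    residuals = [y - p for y, p in zip(labels, predictions)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    ss_tot = sum((y - mean_label) ** 2 for y in labels)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


r2, mae, rmse = regression_metrics([-9.0, -7.0, -5.0], [-8.0, -7.0, -6.0])
```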

schrodinger.active_learning.al_report.make_train_report(ligand_ml_model_file: str, report_path: str, iter_num: int)

Generate a pdf file that records the test set metrics of the ligand_ml model.

Parameters
  • ligand_ml_model_file – ligand_ml .qzip model file.

  • report_path – path of the pdf report

  • iter_num – current iteration number

schrodinger.active_learning.al_report.make_score_hist_plot(name_to_scores: Dict, x_label: str, plot_path: str)

Generate a bar-type histogram plot that contains multiple data sets. The names of the data sets are used as the legend entries.

Parameters
  • name_to_scores – a dictionary where keys are used as the legends and values are the numbers to be plotted.

  • x_label – label of x-axis

  • plot_path – path of the saved plot

schrodinger.active_learning.al_report.perform_ttest(scores_by_node: Dict, alpha: float = 0.05, equal_var=False) → Dict

Perform an unpaired t-test on the scores of two consecutive iterations. If we fail to reject the null hypothesis, the averages of the two iterations are not statistically distinguishable. If we reject the null hypothesis, the averages of the two iterations are statistically different. By default, the sets of scores are assumed to have unequal variances and unequal sample sizes.

Parameters
  • scores_by_node – map of node name to scores

  • alpha – significance level

  • equal_var – if True, perform a standard independent two-sample test that assumes equal population variances; if False (the default), the variances are not assumed to be equal

Returns

node name to t-test results
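The unequal-variance case described above corresponds to Welch's t-test. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; the helper name welch_t and the sample scores are hypothetical. Obtaining a p-value requires the t distribution, for which scipy.stats.ttest_ind(a, b, equal_var=False) is one option; whether perform_ttest is implemented that way is an assumption.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical scores from two consecutive iterations of unequal size.
iter_1 = [-6.1, -6.8, -7.2, -6.5]
iter_2 = [-7.9, -8.4, -8.1, -7.6, -8.8]
t_stat, dof = welch_t(iter_1, iter_2)
```

A positive t statistic here means the first iteration has the higher (less negative) mean score.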

schrodinger.active_learning.al_report.calculate_kstest_results(scores_by_node: Dict, target: str) → Dict

Calculate the one-sided KS test results for scores from two iterations.

Parameters
  • scores_by_node – map of node name to scores

  • target – optimization target

Returns

node name to KS test results

schrodinger.active_learning.al_report.perform_ks_test(reference_scores: List, scores: List, target: str, alpha=0.05) → Tuple

Perform one-sided KS (Kolmogorov-Smirnov) test on two subsequent iterations of scores.

In a one-sided two-sample KS test, we compare the cumulative distribution functions (CDFs) of two sets of scores to determine whether the scores of the previous iteration are stochastically greater or less than those of the next iteration in a specific direction. For more details see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html

Interpreting the outcome of the KS test:
  • dG FEP or Glide scores: ‘reject’ means that scores from the next iteration are statistically favored to be lower given a p-value less than the significance level alpha.

  • All other targets: ‘reject’ means that scores from the next iteration are statistically favored to be higher given a p-value less than the significance level alpha.

Parameters
  • reference_scores – list of scores from an iteration to use as reference

  • scores – list of scores from the iteration that follows reference_scores

  • target – optimization target

  • alpha – significance level

Returns

KS statistic, p-value, significance level, outcome of the test
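To make the one-sided comparison of CDFs concrete, the sketch below computes the two directional KS statistics from empirical CDFs using the standard library only. The helper names ecdf and one_sided_ks and the example scores are hypothetical, and no p-value is computed; which direction perform_ks_test tests for a given target is described above, but how it maps onto a particular library call is not stated on this page.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def one_sided_ks(reference_scores, scores):
    """Return (d_plus, d_minus) for the two samples.

    d_plus  = max over x of F_scores(x) - F_reference(x)
    d_minus = max over x of F_reference(x) - F_scores(x)
    """
    ref = sorted(reference_scores)
    new = sorted(scores)
    # The extrema of the ECDF difference occur at observed data points.
    diffs = [ecdf(new, x) - ecdf(ref, x) for x in ref + new]
    return max(0.0, max(diffs)), max(0.0, -min(diffs))


# Hypothetical docking-like scores: the next iteration is uniformly lower.
d_plus, d_minus = one_sided_ks([-6.0, -7.0, -8.0], [-9.0, -10.0, -11.0])
```

A large d_plus with d_minus near zero indicates that the CDF of the next iteration lies above that of the reference, that is, its scores tend to be lower.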

schrodinger.active_learning.al_report.get_image(path: str, width: float = 72.0) → reportlab.platypus.flowables.Image

Convert an image file to a reportlab image object that has the specified width and the aspect ratio of the original image.

Parameters
  • path – path of the image file.

  • width (float) – width of the reportlab image.

Returns

reportlab image
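Preserving the aspect ratio at a fixed width comes down to scaling the height by the same factor. A minimal sketch of that arithmetic follows; the helper name scaled_size is hypothetical, and the pixel dimensions are illustrative.

```python
def scaled_size(orig_width, orig_height, width=72.0):
    """Return (width, height) that preserves the original aspect ratio."""
    aspect = orig_height / orig_width
    return width, width * aspect


# An 800 x 600 image scaled to the default width of 72.0.
new_width, new_height = scaled_size(800, 600)
```

The resulting pair could then be passed as the width and height of a reportlab Image; that this is how get_image builds its return value is an assumption.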

schrodinger.active_learning.al_report.get_report_maker(active_learning_job)

Get the report maker that corresponds to the active learning job. Returns None for the evaluate task, since no report is available for it yet.

Parameters

active_learning_job (ActiveLearningJob) – active learning job to be processed.

Returns

corresponding report maker

Return type

ALPilotReportMaker

schrodinger.active_learning.al_report.get_time_cost(nodes: Dict, node_name: str) → str

Return the time cost of a node. It returns ‘Unavailable’ if the time cost is not available.

Parameters
  • nodes – dict that maps node name to node object.

  • node_name – name of the active learning node of interest.

Returns

time cost in h/m/s format.
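As an illustration of an h/m/s string with an ‘Unavailable’ fallback, consider the sketch below. The helper name format_time_cost, the input in seconds, and the exact output layout are assumptions; the page does not specify the precise format that get_time_cost produces.

```python
def format_time_cost(seconds):
    """Format a duration in seconds as 'Xh Ym Zs' (hypothetical layout)."""
    if seconds is None:
        # Mirrors the documented fallback when no time cost is available.
        return 'Unavailable'
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f'{hours}h {minutes}m {secs}s'


formatted = format_time_cost(3725)
missing = format_time_cost(None)
```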

schrodinger.active_learning.al_report.get_score_pred_as_array(title_to_score: Dict, pred_score_file: str, discard_cutoff: float, ascending: bool = True) → numpy.ndarray

Return the score, predicted score, and prediction uncertainty of the ligands as an N x 3 numpy array.

Parameters
  • title_to_score – dict that maps ligand title to score.

  • pred_score_file – path of the ligand ml prediction .csv file.

  • discard_cutoff – score cutoff for excluding the ligands in ML training set.

  • ascending – lower value means better ligand if ascending is True

Returns

numpy array of shape (num_of_ligands, 3) with columns (score, pred, uncertainty)

schrodinger.active_learning.al_report.calculate_recovery_ratio(label_pred: numpy.ndarray, top_ratio: float) → numpy.ndarray

Calculate the recovery ratio of the best ligands (ranked by label) as a function of the number of top ligands selected by ligand_ml prediction. A more negative value means a better ligand.

Parameters
  • label_pred – array of shape (number of ligands, 2) that contains the (label, prediction) pairs

  • top_ratio – top ratio of the ligands by label.

Returns

(screen ratio, recovery ratio of top ligands defined by top_ratio) of all the ligands.
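The recovery calculation can be sketched as follows, assuming the curve is built by walking down the prediction ranking and counting how many of the top ligands by label have been found so far. The helper name recovery_curve, the rounding of the top set size, and the use of plain lists instead of numpy arrays are all assumptions made for illustration.

```python
def recovery_curve(label_pred, top_ratio):
    """Return a list of (screen ratio, recovery ratio) pairs.

    label_pred is a sequence of (label, prediction) pairs; more negative
    values are better for both.
    """
    n = len(label_pred)
    n_top = max(1, int(round(top_ratio * n)))
    # Ligands that count as "best" according to the true label.
    by_label = sorted(range(n), key=lambda i: label_pred[i][0])
    top_set = set(by_label[:n_top])
    # Screen ligands in order of predicted score, best first.
    by_pred = sorted(range(n), key=lambda i: label_pred[i][1])
    curve = []
    found = 0
    for k, idx in enumerate(by_pred, start=1):
        if idx in top_set:
            found += 1
        curve.append((k / n, found / n_top))
    return curve


pairs = [(-10.0, -9.0), (-8.0, -5.0), (-6.0, -8.0), (-4.0, -3.0)]
curve = recovery_curve(pairs, top_ratio=0.5)
```

In this example, screening the top half of the predictions recovers one of the two best ligands, and the second is recovered only after screening three quarters of the set.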

schrodinger.active_learning.al_report.plot_correlation(x: List, y: List, fname: str, x_label: str, y_label: str, title: str) → None

Generate a simple correlation scatter plot.

Parameters
  • x – list of x values

  • y – list of y values

  • fname – path of the saved plot

  • x_label – label of x-axis

  • y_label – label of y-axis

  • title – title of the plot

schrodinger.active_learning.al_report.plot_regression(y_true: numpy.ndarray, y_pred: numpy.ndarray, fname: str)

Generate a regression plot. This function is slightly modified from ligand_ml/plotting.py to change the axis labels.

Parameters
  • y_true – 1D array of test set labels.

  • y_pred – 2D array of ligand_ml prediction and uncertainty

  • fname – filename to save the image

schrodinger.active_learning.al_report.plot_recovery(recovery_results: Dict, fname: str)

Generate and save recovery plot image.

Parameters
  • recovery_results – dict that maps top ratio to the recovery ratio numpy array.

  • fname – path of the saved image.

schrodinger.active_learning.al_report.make_regress_recovery_plots(y_true: numpy.ndarray, y_pred_uncertain: numpy.ndarray, top_ratio_samples: List, regress_text: str, recovery_text: str)

Generate regression plot and recovery plot and include both in a table. Also return the recovery results for the sampled top ratios as a dict.

schrodinger.active_learning.al_report.make_recovery_table(recovery_results: Dict, screen_ratio_samples: List) → List

Generate a list of lists that contains the recovery ratio for each sampled top ratio and screen ratio.

Parameters
  • recovery_results – dict that maps top ratio to the recovery ratio numpy array.

  • screen_ratio_samples – list of screen ratios

Returns

table as a list of lists, table caption, largest enrichment in the table.
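A minimal sketch of such a table is shown below. It assumes that each entry of recovery_results is a sequence of (screen ratio, recovery ratio) pairs and that enrichment is defined as the recovery ratio divided by the screen ratio; neither detail is confirmed by this page. The helper name recovery_table is hypothetical, and the table caption is omitted.

```python
def recovery_table(recovery_results, screen_ratio_samples):
    """Return (rows, best_enrichment) for the sampled screen ratios."""
    header = ['top ratio'] + [f'screen {s:.0%}' for s in screen_ratio_samples]
    rows = [header]
    best_enrichment = 0.0
    for top_ratio, curve in sorted(recovery_results.items()):
        row = [f'{top_ratio:.0%}']
        for sample in screen_ratio_samples:
            # Recovery reached once `sample` of the ligands is screened.
            reached = [rec for screen, rec in curve if screen <= sample]
            recovery = reached[-1] if reached else 0.0
            row.append(f'{recovery:.2f}')
            # Assumed definition: enrichment = recovery / screen ratio.
            best_enrichment = max(best_enrichment, recovery / sample)
        rows.append(row)
    return rows, best_enrichment


example_curve = [(0.25, 0.5), (0.5, 0.5), (0.75, 1.0), (1.0, 1.0)]
rows, best = recovery_table({0.5: example_curve}, [0.25, 0.5])
```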

schrodinger.active_learning.al_report.get_conclusion_string(best_enrichment: float, job_type: str, high_enrich: int = 10, low_enrich: int = 2) → str

Return the conclusion string based on the job type and the highest enrichment in the recovery ratio table.

schrodinger.active_learning.al_report.make_scaffold_report(smiles: List, counts: List, report_file: str, max_occuring_count: int = 5)

Make a report containing the counts and structures of the five most frequently occurring scaffolds in the input lists of smiles and counts. The report is saved to the path specified by report_file. Change the parameter ‘max_occuring_count’ to report the n most frequently occurring scaffolds instead.

Parameters
  • smiles – list of smiles.

  • counts – list of counts corresponding to the smiles.

  • report_file – path of the output report file.

  • max_occuring_count – number of top scaffolds to keep in the report.
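Selecting the most frequent scaffolds from the two parallel lists can be sketched as below. The helper name top_scaffolds and the example SMILES are hypothetical, and the rendering of structures into the report is not covered.

```python
def top_scaffolds(smiles, counts, max_occuring_count=5):
    """Return the (smiles, count) pairs with the highest counts, best first."""
    ranked = sorted(zip(smiles, counts), key=lambda pair: pair[1], reverse=True)
    return ranked[:max_occuring_count]


smiles = ['c1ccccc1', 'C1CCCCC1', 'c1ccncc1']
counts = [12, 30, 7]
top_two = top_scaffolds(smiles, counts, max_occuring_count=2)
```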

class schrodinger.active_learning.al_report.ALReportMaker(active_learning_job)

Bases: object

Base class for the different types of AL report makers.

__init__(active_learning_job)

Initialize the report maker for an active learning job

initReport(header: str)

Initialize the report and add header information

class schrodinger.active_learning.al_report.ALPilotReportMaker(active_learning_job)

Bases: schrodinger.active_learning.al_report.ALReportMaker

__init__(active_learning_job)

Initialize the report maker for an active learning job

report()

Function for building the report

addRunDetail()

Add job specifications and running time cost information to the report

addRecoveryResults()

Add the regression plot, recovery plot, recovery table and conclusion to the report.

class schrodinger.active_learning.al_report.ALScreenReportMaker(active_learning_job)

Bases: schrodinger.active_learning.al_report.ALReportMaker

__init__(active_learning_job)

Initialize the report maker for an active learning job

report()

Function for building the report

addRunDetail()

Add job specifications and running time cost information to the report

addRecoveryResults()

Add the regression plot, recovery plot, recovery table and conclusion to the report.