schrodinger.active_learning.al_report module
- schrodinger.active_learning.al_report.get_ligand_ml_metric(ligand_ml_model_file: str) -> Tuple
Extract the test set metrics, test set labels, and predictions from a ligand_ml model file.
- Parameters
ligand_ml_model_file – ligand_ml .qzip model file.
- Returns
r2, mae, rmse, labels and prediction of the test set
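As a rough illustration of the three metrics named above, the following sketch computes them from plain lists of labels and predictions. It assumes r2 is the coefficient of determination; the definition used inside ligand_ml is not confirmed here, and `regression_metrics` is a hypothetical helper, not part of this module.

```python
import math


def regression_metrics(labels, preds):
    """Return (r2, mae, rmse) for paired labels and predictions.

    Assumption: r2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    n = len(labels)
    errors = [p - y for y, p in zip(labels, preds)]
    ss_res = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    mean_y = sum(labels) / n
    ss_tot = sum((y - mean_y) ** 2 for y in labels)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


r2, mae, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```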
- schrodinger.active_learning.al_report.make_train_report(ligand_ml_model_file: str, report_path: str, iter_num: int)
Generate a PDF file that records the test set metrics of the ligand_ml model.
- Parameters
ligand_ml_model_file – ligand_ml .qzip model file.
report_path – path of the pdf report
iter_num – current iteration number
- schrodinger.active_learning.al_report.make_score_hist_plot(name_to_scores: Dict, x_label: str, plot_path: str)
Generate a bar-type histogram plot that shows multiple data sets together. The name of each data set is used as its legend entry.
- Parameters
name_to_scores – a dictionary where keys are used as the legends and values are the numbers to be plotted.
x_label – label of x-axis
plot_path – path of the saved plot
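A minimal sketch of such a plot, assuming matplotlib is the plotting backend (the wording "bar-type" matches matplotlib's `histtype="bar"`, but the source does not confirm which library is used). The function name, the `bins` argument, and the y-axis label are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def score_hist_sketch(name_to_scores, x_label, plot_path, bins=20):
    """Save a side-by-side bar histogram with one data set per dict entry."""
    names = list(name_to_scores)
    data = [list(name_to_scores[name]) for name in names]
    fig, ax = plt.subplots()
    # With histtype="bar", multiple data sets are drawn side by side per bin.
    ax.hist(data, bins=bins, histtype="bar", label=names)
    ax.set_xlabel(x_label)
    ax.set_ylabel("Count")  # hypothetical label
    ax.legend()
    fig.savefig(plot_path)
    plt.close(fig)
```

The data sets do not need to have the same length, which suits iterations that score different numbers of ligands.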
- schrodinger.active_learning.al_report.perform_ttest(scores_by_node: Dict, alpha: float = 0.05, equal_var=False) -> Dict
Perform an unpaired t-test on the scores of two consecutive iterations. If the null hypothesis is not rejected, the averages of the two iterations are not statistically distinguishable. If the null hypothesis is rejected, the averages of the two iterations are statistically different. By default, the two sets of scores are assumed to have unequal variances and unequal sample sizes.
- Parameters
scores_by_node – map of node name to scores
alpha – significance level
equal_var – if True, perform a standard independent two-sample t-test that assumes equal population variances; if False (the default), the variances are not assumed to be equal
- Returns
node name to t-test results
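The comparison described above can be sketched with `scipy.stats.ttest_ind`, which performs Welch's t-test when `equal_var=False`. This assumes SciPy is the underlying implementation, which the source does not state. The function name, the result keys, and the choice to key each result by the later node are all hypothetical.

```python
from scipy import stats


def ttest_consecutive(scores_by_node, alpha=0.05, equal_var=False):
    """Compare each node's scores with those of the node that precedes it."""
    names = list(scores_by_node)  # relies on dict insertion order
    results = {}
    for prev_name, curr_name in zip(names, names[1:]):
        res = stats.ttest_ind(
            scores_by_node[prev_name],
            scores_by_node[curr_name],
            equal_var=equal_var,  # False selects Welch's t-test
        )
        results[curr_name] = {
            "statistic": float(res.statistic),
            "pvalue": float(res.pvalue),
            "outcome": "reject" if res.pvalue < alpha else "fail to reject",
        }
    return results
```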
- schrodinger.active_learning.al_report.calculate_kstest_results(scores_by_node: Dict, target: str) -> Dict
Calculate one-sided KS test results for the scores of two consecutive iterations.
- Parameters
scores_by_node – map of node name to scores
target – optimization target
- Returns
node name to KS test results
- schrodinger.active_learning.al_report.perform_ks_test(reference_scores: List, scores: List, target: str, alpha=0.05) -> Tuple
Perform a one-sided KS (Kolmogorov-Smirnov) test on the scores of two consecutive iterations.
In a one-sided two-sample KS test, the empirical cumulative distribution functions (CDFs) of the two sets of scores are compared to determine whether the scores of the previous iteration are stochastically greater or less than those of the next iteration, in the direction set by the target. For more details see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html
Interpreting the outcome of the KS test:
dG FEP or Glide scores – ‘reject’ means that the p-value is below the significance level alpha, so scores from the next iteration are statistically favored to be lower.
All other targets – ‘reject’ means that the p-value is below the significance level alpha, so scores from the next iteration are statistically favored to be higher.
- Parameters
reference_scores – list of scores from an iteration to use as reference
scores – list of scores from the iteration that follows reference_scores
target – optimization target
alpha – significance level
- Returns
KS statistic, p-value, significance level, outcome of the test
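A minimal sketch of the one-sided test, using `scipy.stats.ks_2samp` rather than the `kstest` entry point linked above. The `lower_is_better` flag stands in for the `target` argument and is hypothetical, as is the mapping from target to the `alternative` keyword; the source does not say which alternative this module passes.

```python
from scipy import stats


def one_sided_ks(reference_scores, scores, lower_is_better, alpha=0.05):
    """Return (statistic, p-value, alpha, outcome) for a one-sided KS test."""
    # alternative="less": the reference CDF lies below the CDF of `scores`
    # somewhere, which happens when `scores` is shifted toward lower values.
    alternative = "less" if lower_is_better else "greater"
    res = stats.ks_2samp(reference_scores, scores, alternative=alternative)
    outcome = "reject" if res.pvalue < alpha else "fail to reject"
    return float(res.statistic), float(res.pvalue), alpha, outcome
```

With this convention, 'reject' for a lower-is-better target indicates that the later iteration is favored to have lower scores, which matches the interpretation given for dG FEP and Glide scores.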
- schrodinger.active_learning.al_report.get_image(path: str, width: float = 72.0) -> reportlab.platypus.flowables.Image
Convert an image file to a reportlab image object that has the specified width and keeps the aspect ratio of the original image.
- Parameters
path – path of the image file.
width (float) – width of the reportlab image.
- Returns
reportlab image
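The aspect-ratio arithmetic can be sketched without reportlab. `scaled_size` is a hypothetical helper that takes the pixel dimensions of the source image; in practice those could be read with `reportlab.lib.utils.ImageReader(path).getSize()`, though whether this module does so is an assumption.

```python
def scaled_size(pixel_width, pixel_height, width=72.0):
    """Return (width, height) that preserves the original aspect ratio."""
    if pixel_width <= 0 or pixel_height <= 0:
        raise ValueError("image dimensions must be positive")
    height = width * pixel_height / pixel_width
    return width, height


# A 400 x 200 image scaled to the default width of 72 points.
size = scaled_size(400, 200)
```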
- schrodinger.active_learning.al_report.get_report_maker(active_learning_job)
Get the report maker that corresponds to the active learning job. Returns None for the evaluate task, which does not have a report yet.
- Parameters
active_learning_job (ActiveLearningJob) – active learning job to be processed.
- Returns
corresponding report maker
- Return type
- schrodinger.active_learning.al_report.get_time_cost(nodes: Dict, node_name: str) -> str
Return the time cost of a node, or ‘Unavailable’ if the time cost is not available.
- Parameters
nodes – dict that maps node name to node object.
node_name – name of the active learning node of interest.
- Returns
time cost in h/m/s format.
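For illustration, one way to render a duration in an h/m/s style is shown below. The exact string layout produced by this module is not documented, so the `"1h 2m 5s"` form and the `format_duration` helper are assumptions.

```python
def format_duration(seconds):
    """Format a duration in seconds as 'Xh Ym Zs' (hypothetical layout)."""
    if seconds is None:
        return "Unavailable"
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"
```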
- schrodinger.active_learning.al_report.get_score_pred_as_array(title_to_score: Dict, pred_score_file: str, discard_cutoff: float, ascending: bool = True) -> numpy.ndarray
Return the score, predicted score, and prediction uncertainty of the ligands as an N x 3 numpy array.
- Parameters
title_to_score – dict that maps ligand title to score.
pred_score_file – path of the ligand ml prediction .csv file.
discard_cutoff – score cutoff for excluding the ligands in ML training set.
ascending – lower value means better ligand if ascending is True
- Returns
numpy array of shape (num_of_ligands, 3) with columns (score, pred, uncertain)
- schrodinger.active_learning.al_report.calculate_recovery_ratio(label_pred: numpy.ndarray, top_ratio: float) -> numpy.ndarray
Calculate the recovery ratio of the best ligands, as ranked by label, within increasing numbers of the top ligands predicted by ligand_ml. A more negative value means a better ligand.
- Parameters
label_pred – array of shape (number of ligands, 2) that contains the (label, prediction) pairs
top_ratio – top ratio of the ligands by label.
- Returns
(screen ratio, recovery ratio of top ligands defined by top_ratio) of all the ligands.
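The calculation described above can be sketched as follows. This is an illustrative reading of the docstring, not the module's code: the rounding used to pick the number of top ligands and the tie handling are assumptions, and `recovery_curve` is a hypothetical name.

```python
import numpy as np


def recovery_curve(label_pred, top_ratio):
    """Return an (N, 2) array of (screen ratio, recovery ratio)."""
    n = label_pred.shape[0]
    n_top = max(1, int(round(n * top_ratio)))  # assumed rounding rule
    labels, preds = label_pred[:, 0], label_pred[:, 1]
    # Mark the true best ligands: the n_top most negative labels.
    is_top = np.zeros(n, dtype=bool)
    is_top[np.argsort(labels, kind="stable")[:n_top]] = True
    # Screen ligands in order of predicted score, best (lowest) first.
    order = np.argsort(preds, kind="stable")
    found = np.cumsum(is_top[order])
    screen_ratio = np.arange(1, n + 1) / n
    return np.column_stack([screen_ratio, found / n_top])


label_pred = np.array(
    [[-10.0, -8.0], [-9.0, -3.0], [-5.0, -9.0], [-4.0, -2.0], [-1.0, -1.0]]
)
curve = recovery_curve(label_pred, top_ratio=0.4)
```

In this example the two best ligands by label are found after screening the top three predictions, so the recovery ratio reaches 1.0 at a screen ratio of 0.6.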
- schrodinger.active_learning.al_report.plot_correlation(x: List, y: List, fname: str, x_label: str, y_label: str, title: str) -> None
Generate a simple correlation scatter plot.
- Parameters
x – list of x values
y – list of y values
fname – path of the saved plot
x_label – label of x-axis
y_label – label of y-axis
title – title of the plot
- schrodinger.active_learning.al_report.plot_regression(y_true: numpy.ndarray, y_pred: numpy.ndarray, fname: str)
Generate a regression plot. This function is slightly modified from ligand_ml/plotting.py to change the axis labels.
- Parameters
y_true – 1D array of test set labels.
y_pred – 2D array of ligand_ml prediction and uncertainty
fname – filename to save the image
- schrodinger.active_learning.al_report.plot_recovery(recovery_results: Dict, fname: str)
Generate and save the recovery plot image.
- Parameters
recovery_results – dict that maps top ratio to the recovery ratio numpy array.
fname – path of the saved image.
- schrodinger.active_learning.al_report.make_regress_recovery_plots(y_true: numpy.ndarray, y_pred_uncertain: numpy.ndarray, top_ratio_samples: List, regress_text: str, recovery_text: str)
Generate a regression plot and a recovery plot and include both in a table. Also return the recovery results for the sampled top ratios as a dict.
- schrodinger.active_learning.al_report.make_recovery_table(recovery_results: Dict, screen_ratio_samples: List) -> List
Generate a list of lists that contains the recovery ratio for each sampled top ratio and screen ratio.
- Parameters
recovery_results – dict that maps top ratio to the recovery ratio numpy array.
screen_ratio_samples – list of screen ratios
- Returns
table as a list of lists, table caption, largest enrichment in the table.
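A sketch of how such a table might be assembled from the recovery curves. It assumes each value in `recovery_results` is an (N, 2) array of (screen ratio, recovery ratio) and that enrichment means recovery ratio divided by screen ratio; neither the enrichment definition nor the table layout is stated in the source, and the caption is omitted here.

```python
import numpy as np


def recovery_table(recovery_results, screen_ratio_samples):
    """Return (table, best_enrichment) for the sampled screen ratios."""
    header = ["top ratio"] + [f"screen {s:.0%}" for s in screen_ratio_samples]
    table = [header]
    best_enrichment = 0.0
    for top_ratio, curve in sorted(recovery_results.items()):
        row = [f"{top_ratio:.0%}"]
        for s in screen_ratio_samples:
            # First row whose screen ratio reaches the sampled value.
            i = min(int(np.searchsorted(curve[:, 0], s)), len(curve) - 1)
            recovery = float(curve[i, 1])
            row.append(f"{recovery:.2f}")
            # Assumed definition: enrichment = recovery ratio / screen ratio.
            best_enrichment = max(best_enrichment, recovery / s)
        table.append(row)
    return table, best_enrichment
```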
- schrodinger.active_learning.al_report.get_conclusion_string(best_enrichment: float, job_type: str, high_enrich: int = 10, low_enrich: int = 2) -> str
Return the conclusion string based on the job type and the highest enrichment in the recovery ratio table.
- schrodinger.active_learning.al_report.make_scaffold_report(smiles: List, counts: List, report_file: str, max_occuring_count: int = 5)
Make a report containing the counts and structures of the most frequently occurring scaffolds in the input lists of SMILES and counts, and save it to the path specified by report_file. By default the five most frequent scaffolds are reported; change ‘max_occuring_count’ to report the n most frequent scaffolds instead.
- Parameters
smiles – list of smiles.
counts – list of counts corresponding to the smiles.
report_file – path of the output report file.
max_occuring_count – number of top scaffolds to keep in the report.
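The selection step can be sketched in plain Python. `top_scaffolds` is a hypothetical helper that only ranks the inputs; rendering the structures and writing the report file are outside this sketch, and how ties are broken in the real function is not documented.

```python
def top_scaffolds(smiles, counts, max_occuring_count=5):
    """Return the (smiles, count) pairs with the highest counts."""
    if len(smiles) != len(counts):
        raise ValueError("smiles and counts must have the same length")
    # sorted() is stable, so scaffolds with equal counts keep input order.
    ranked = sorted(zip(smiles, counts), key=lambda pair: pair[1], reverse=True)
    return ranked[:max_occuring_count]


top = top_scaffolds(["c1ccccc1", "c1ccncc1", "C1CCCCC1"], [12, 30, 7], 2)
```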
- class schrodinger.active_learning.al_report.ALReportMaker(active_learning_job)
Bases: object
Base class for the different types of AL report makers.
- __init__(active_learning_job)
Initialize the report maker for an active learning job.
- initReport(header: str)
Initialize the report and add header information.
- class schrodinger.active_learning.al_report.ALPilotReportMaker(active_learning_job)
Bases: schrodinger.active_learning.al_report.ALReportMaker
- __init__(active_learning_job)
Initialize the report maker for an active learning job.
- report()
Build the report.
- addRunDetail()
Add job specifications and running time cost information to the report.
- addRecoveryResults()
Add the regression plot, recovery plot, recovery table and conclusion to the report.
- class schrodinger.active_learning.al_report.ALScreenReportMaker(active_learning_job)
Bases: schrodinger.active_learning.al_report.ALReportMaker
- __init__(active_learning_job)
Initialize the report maker for an active learning job.
- report()
Build the report.
- addRunDetail()
Add job specifications and running time cost information to the report.
- addRecoveryResults()
Add the regression plot, recovery plot, recovery table and conclusion to the report.