schrodinger.active_learning.al_report module¶
- schrodinger.active_learning.al_report.get_ligand_ml_metric(ligand_ml_model_file: str, with_model_info: bool = False) Tuple¶
- Extract the test set metrics, test set labels, and test set predictions from a ligand_ml model file. - Parameters:
- ligand_ml_model_file – ligand_ml .qzip model file. 
- with_model_info – if True, return the model info as well. 
 
- Returns:
- r2, mae, rmse, labels and prediction of the test set, (trained_sub_models, final_sub_models) 
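
For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how they can be computed from test set labels and predictions. It is not the ligand_ml implementation: the helper name `regression_metrics` is hypothetical, and it assumes r2 means the coefficient of determination, which the documentation does not state.

```python
import math


def regression_metrics(labels, predictions):
    """Return (r2, mae, rmse) for paired labels and predictions.

    Hypothetical helper for illustration only; r2 is assumed to be
    the coefficient of determination.
    """
    n = len(labels)
    if n == 0 or n != len(predictions):
        raise ValueError("labels and predictions must be non-empty and of equal length")
    mean_label = sum(labels) / n
    residuals = [y - p for y, p in zip(labels, predictions)]
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((y - mean_label) ** 2 for y in labels)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    mae = sum(abs(r) for r in residuals) / n              # mean absolute error
    rmse = math.sqrt(ss_res / n)                          # root mean squared error
    return r2, mae, rmse


r2, mae, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```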
 
- schrodinger.active_learning.al_report.make_train_report(ligand_ml_model_file: str, report_path: str, iter_num: int)¶
- Generate a pdf file that records the test set metrics of the ligand_ml model. - Parameters:
- ligand_ml_model_file – ligand_ml .qzip model file. 
- report_path – path of the pdf report 
- iter_num – current iteration number 
 
 
- schrodinger.active_learning.al_report.make_score_hist_plot(name_to_scores: Dict, x_label: str, plot_path: str)¶
- Generate a bar-type histogram plot that contains multiple data sets. The names of the data sets are used as the legend entries. - Parameters:
- name_to_scores – a dictionary where keys are used as the legend entries and values are the numbers to be plotted. 
- x_label – label of x-axis 
- plot_path – path of the saved plot 
 
 
- schrodinger.active_learning.al_report.perform_ttest(scores_by_node: Dict, alpha: float = 0.05, equal_var=False) Dict¶
- Perform an unpaired t-test on the scores of two consecutive iterations. If we fail to reject the null hypothesis, the averages of the two iterations are not statistically distinguishable. If we reject the null hypothesis, the averages of the two iterations are statistically different. By default, the sets of scores are assumed to have unequal variances and unequal sample sizes. - Parameters:
- scores_by_node – map of node name to scores 
- alpha – significance level 
- equal_var – if True, perform a standard independent 2 sample test that assumes equal population variances and sample sizes 
 
- Returns:
- node name to t-test results 
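
The unequal-variance case described above corresponds to Welch's t-test. The sketch below computes only the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library. The helper name `welch_t` and the sample scores are hypothetical, and this is not the module's own code. A p-value to compare against alpha would come from the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper; each sample needs at least two values.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up scores for two consecutive iterations with unequal sizes.
iter_1 = [-6.0, -7.0, -8.0]
iter_2 = [-8.0, -9.0, -10.0, -11.0]
t_stat, dof = welch_t(iter_1, iter_2)
```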
 
- schrodinger.active_learning.al_report.calculate_kstest_results(scores_by_node: Dict, target: str) Dict¶
- Calculate the one-sided KS test for scores between two iterations. - Parameters:
- scores_by_node – map of node name to scores 
- target – optimization target 
 
- Returns:
- node name to KS test results 
 
- schrodinger.active_learning.al_report.perform_ks_test(reference_scores: List, scores: List, target: str, alpha=0.05) Tuple¶
- Perform a one-sided KS (Kolmogorov-Smirnov) test on the scores of two consecutive iterations.
- In a one-sided two-sample KS test, we compare the cumulative distribution functions (CDFs) of two sets of scores to determine whether the previous iteration’s distribution is stochastically greater or less than the next iteration’s in a specific direction. For more details see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html
- Interpreting the outcome of the KS test:
- dG FEP or Glide scores: ‘reject’ means that scores from the next iteration are statistically favored to be lower, given a p-value less than the significance level alpha.
- All other targets: ‘reject’ means that scores from the next iteration are statistically favored to be higher, given a p-value less than the significance level alpha.
- Parameters:
- reference_scores – list of scores from an iteration to use as reference 
- scores – list of scores from the iteration that follows reference_scores 
- target – optimization target 
- alpha – significance level 
 
- Returns:
- KS statistic, p-value, significance level, outcome of the test 
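
To make the direction of the one-sided comparison concrete, here is a small standard-library sketch that computes the two one-sided KS statistics from empirical CDFs. The helper names and the example scores are hypothetical. It does not compute a p-value and is not the module's implementation. If the following iteration's scores are shifted lower, its empirical CDF lies above the reference CDF, so the first statistic is large.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def one_sided_ks_statistics(reference_scores, scores):
    """Return (d_lower, d_higher) for two samples (hypothetical helper).

    d_lower  = max over x of F_scores(x) - F_reference(x); large when the
               following iteration tends toward lower scores.
    d_higher = max over x of F_reference(x) - F_scores(x); large when the
               following iteration tends toward higher scores.
    """
    ref = sorted(reference_scores)
    new = sorted(scores)
    points = ref + new  # the ECDFs only change at observed values
    d_lower = max(ecdf(new, x) - ecdf(ref, x) for x in points)
    d_higher = max(ecdf(ref, x) - ecdf(new, x) for x in points)
    return d_lower, d_higher


# Made-up scores where the following iteration is mostly lower.
reference = [-6.0, -6.5, -7.0, -7.5]
following = [-7.0, -8.0, -8.5, -9.0]
d_lower, d_higher = one_sided_ks_statistics(reference, following)
```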
 
- schrodinger.active_learning.al_report.get_image(path: str, width: float = 72.0) Image¶
- Convert image file to reportlab image object that has the same aspect ratio and specified width. - Parameters:
- path – path of the image file. 
- width (float) – width of the reportlab image. 
 
- Returns:
- reportlab image 
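
Keeping the aspect ratio at a fixed width comes down to scaling the height by the same factor as the width. A minimal sketch of that arithmetic follows. The function name `scaled_size` and the pixel dimensions are assumptions, and no reportlab calls are made here.

```python
def scaled_size(pixel_width, pixel_height, width=72.0):
    """Return (width, height) that preserves the source aspect ratio.

    Hypothetical helper: the height is scaled by the same factor as
    the width.
    """
    if pixel_width <= 0 or pixel_height <= 0:
        raise ValueError("image dimensions must be positive")
    return width, width * pixel_height / pixel_width


# An 800 x 600 source image rendered at the default width of 72.0.
size = scaled_size(800, 600)
```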
 
- schrodinger.active_learning.al_report.get_report_maker(active_learning_job)¶
- Get the corresponding report maker for the active learning job. It returns None for the evaluate task, since no report is available for that task yet. - Parameters:
- active_learning_job (ActiveLearningJob) – active learning job to be processed. 
- Returns:
- corresponding report maker 
- Return type:
 
- schrodinger.active_learning.al_report.get_time_cost(nodes: Dict, node_name: str) str¶
- Return the time cost of a node. It returns ‘Unavailable’ if the time cost is not available. - Parameters:
- nodes – dict that maps node name to node object. 
- node_name – name of the active learning node of interest. 
 
- Returns:
- time cost in h/m/s format. 
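
The exact output format is not documented beyond "h/m/s", so the following is only a sketch of one plausible formatting under that assumption. The function name `format_time_cost`, the `"1h 2m 5s"` layout, and the use of `None` to signal a missing time cost are all hypothetical.

```python
def format_time_cost(seconds=None):
    """Format a duration in seconds as 'Xh Ym Zs' (assumed layout).

    Returns 'Unavailable' when no duration is known, mirroring the
    fallback described in the documentation.
    """
    if seconds is None:
        return "Unavailable"
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


example = format_time_cost(3725)
```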
 
- schrodinger.active_learning.al_report.get_score_pred_as_array(title_to_score: Dict, pred_score_file: str, discard_cutoff: float, ascending: bool = True) ndarray¶
- Return the score, predicted score, and prediction uncertainty of the ligands as an N x 3 numpy array. - Parameters:
- title_to_score – dict that maps ligand title to score. 
- pred_score_file – path of the ligand ml prediction .csv file. 
- discard_cutoff – score cutoff for excluding the ligands in ML training set. 
- ascending – lower value means better ligand if ascending is True 
 
- Returns:
- numpy array of (num_of_ligands X (score, pred, uncertain)) 
 
- schrodinger.active_learning.al_report.calculate_recovery_ratio(label_pred: ndarray, top_ratio: float) ndarray¶
- Calculate the recovery ratio of the best ligands (ranked by label) among increasing numbers of the top ligands predicted by ligand_ml. A more negative value means a better ligand. - Parameters:
- label_pred – array of shape (number of ligands, 2) containing the (label, prediction) pairs 
- top_ratio – top ratio of the ligands by label. 
 
- Returns:
- (screen ratio, recovery ratio of top ligands defined by top_ratio) of all the ligands. 
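
The idea of a recovery curve can be sketched with plain Python lists as follows. This is an illustration under stated assumptions, not the module's code: the name `recovery_curve` is hypothetical, the input is a list of (label, prediction) tuples rather than an ndarray, and the rounding of the top-set size is an assumption.

```python
def recovery_curve(label_pred, top_ratio):
    """Return a list of (screen ratio, recovery ratio) pairs.

    Hypothetical helper. Lower (more negative) values are treated as
    better for both labels and predictions.
    """
    n = len(label_pred)
    n_top = max(1, round(n * top_ratio))  # assumed rounding of the top-set size
    # Indices of the truly best ligands according to the label.
    by_label = sorted(range(n), key=lambda i: label_pred[i][0])
    true_top = set(by_label[:n_top])
    # Walk through ligands from best to worst prediction.
    by_pred = sorted(range(n), key=lambda i: label_pred[i][1])
    curve = []
    found = 0
    for k, idx in enumerate(by_pred, start=1):
        if idx in true_top:
            found += 1
        curve.append((k / n, found / n_top))
    return curve


# Made-up (label, prediction) pairs for five ligands.
pairs = [(-10.0, -9.0), (-9.0, -6.0), (-8.0, -8.5), (-7.0, -7.0), (-6.0, -9.5)]
curve = recovery_curve(pairs, top_ratio=0.4)
```

In this example, screening the top 40% of ligands by prediction recovers one of the two best ligands by label, and the second is only recovered once every ligand has been screened.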
 
- schrodinger.active_learning.al_report.plot_correlation(x: List, y: List, fname: str, x_label: str, y_label: str, title: str) None¶
- Generate a simple correlation scatter plot. - Parameters:
- x – list of x values 
- y – list of y values 
- fname – path of the saved plot 
- x_label – label of x-axis 
- y_label – label of y-axis 
- title – title of the plot 
 
- schrodinger.active_learning.al_report.plot_regression(y_true: ndarray, y_pred: ndarray, fname: str)¶
- Generate a regression plot. This function is slightly modified from ligand_ml/plotting.py to change the axis labels. - Parameters:
- y_true – 1D array test set label. 
- y_pred – 2D array of ligand_ml prediction and uncertainty 
- fname – filename to save the image 
 
 
- schrodinger.active_learning.al_report.plot_recovery(recovery_results: Dict, fname: str)¶
- Generate and save recovery plot image. - Parameters:
- recovery_results – dict that maps top ratio to the recovery ratio numpy array. 
- fname – path of the saved image. 
 
 
- schrodinger.active_learning.al_report.make_regress_recovery_plots(y_true: ndarray, y_pred_uncertain: ndarray, top_ratio_samples: List, regress_text: str, recovery_text: str)¶
- Generate regression plot and recovery plot and include both in a table. Also return the recovery results for the sampled top ratios as a dict. 
- schrodinger.active_learning.al_report.make_recovery_table(recovery_results: Dict, screen_ratio_samples: List) List¶
- Generate a list of lists that contains the recovery ratio for given top ratios and screen ratios. - Parameters:
- recovery_results – dict that maps top ratio to the recovery ratio numpy array. 
- screen_ratio_samples – list of screen ratios 
 
- Returns:
- table as a list of lists, table caption, largest enrichment in the table. 
 
- schrodinger.active_learning.al_report.get_conclusion_string(best_enrichment: float, job_type: str, high_enrich: int = 10, low_enrich: int = 2) str¶
- Return the conclusion string based on the job type and the highest enrichment in the recovery ratio table. 
- schrodinger.active_learning.al_report.make_scaffold_report(smiles: List, counts: List, report_file: str, max_occuring_count: int = 5, scaffold_type: str = 'generic_bemis_murcko')¶
- Makes a report containing the counts and structures of the five most frequently occurring scaffolds in the input lists of SMILES and counts. Saves the report to the path specified by report_file. The number of scaffolds reported can be changed with the parameter ‘max_occuring_count’. - Parameters:
- smiles – list of smiles. 
- counts – list of counts corresponding to the smiles. 
- report_file – path of the output report file. 
- max_occuring_count – number of top scaffolds to keep in the report. 
 
 
- schrodinger.active_learning.al_report.smiles_to_img(smiles)¶
- Converts a SMILES string to a base64 encoded PNG image. - Parameters:
- smiles (str) – The SMILES string of the molecule. 
- Returns:
- Base64 encoded PNG image string of the molecule. 
 
- class schrodinger.active_learning.al_report.ALReportMaker(active_learning_job)¶
- Bases: object
- Base class for different types of AL report maker.
- __init__(active_learning_job)¶
- Initialize the report maker for an active learning job 
 - initReport(header: str)¶
- Initialize the report and add header information 
 
- class schrodinger.active_learning.al_report.ALPilotReportMaker(active_learning_job)¶
- Bases: ALReportMaker
- __init__(active_learning_job)¶
- Initialize the report maker for an active learning job 
 - report()¶
- Function for building the report 
 - addRunDetail()¶
- Add job specifications and running time cost information to the report 
 - addRecoveryResults()¶
- Add the regression plot, recovery plot, recovery table and conclusion to the report. 
 
- class schrodinger.active_learning.al_report.ALScreenReportMaker(active_learning_job)¶
- Bases: ALReportMaker
- __init__(active_learning_job)¶
- Initialize the report maker for an active learning job 
 - report()¶
- Function for building the report 
 - addRunDetail()¶
- Add job specifications and running time cost information to the report 
 - addRecoveryResults()¶
- Add the regression plot, recovery plot, recovery table and conclusion to the report. 
 
- class schrodinger.active_learning.al_report.ActiveLearningHtmlReport(al_job)¶
- Bases: object
- __init__(al_job)¶
 - property workflow_name¶
 - property scorer_name¶
 - get_fingerprint(smiles)¶
 - property learning_target¶
 - clean_values(df_score)¶
- Remove the outliers from the dataframe for plotting purposes. - Parameters:
- df_score – DataFrame containing the scores. 
- Returns:
- DataFrame with outliers removed. 
 
 - generateReport(running_node=None)¶
 - updateNodeData(running_node=None)¶
 - getFlowChartData(running_node=None)¶
 - PrepareSmilesNodeText(node, status)¶
 - PrepareSmilesNodeFigs(node)¶
 - CalculateScoreNodeFigs(node)¶
 - CalculateScoreNodeMolDisplay(node)¶
 - generate_score_hist(df_score, title='Distribution Ligand Scores')¶
 - getRandomLigandsPCA(random_smiles)¶
 - generatePCAFig(random_smiles, df_score, title='PCA of Fingerprints of Ligands')¶
 - KnownScoreProviderNodeText(node, status)¶
 - CalculateScoreNodeText(node, status)¶
 - LigandMLTrainNodeResults(node)¶
 - LigandMLTrainNodeFigs(node)¶
 - LigandMLTrainNodeText(node, status)¶
 - LigandMLEvalNodeMolDisplay(node)¶
 - LigandMLEvalNodeFigs(node)¶
 - LigandMLEvalNodeText(node, status)¶
 - RescoreNodeFigs(node)¶
 - RescoreNodeMolDisplay(node)¶
 - RescoreNodeText(node, status)¶
 - static get_property(property_func, smi)¶
 - generate_property_hist(ligands_list, ligands_origin, property_name)¶