schrodinger.analysis.enrichment.enrichment_input module
Input file parser for enrichment module.
For most virtual screen result input formats, titles are used to identify the ligands. The input is expected to be already ordered by the virtual screen scoring metric. If it is not, set the optional sort_header parameter of the parser functions to the appropriate score header or property. If the file contains duplicate titles, only the first occurrence of each title is ranked.
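The ordering and duplicate-title rules above can be illustrated with a small plain-Python sketch. It is not the Schrodinger parser: the function name `rank_first_occurrences`, the `(title, score)` record layout, and the convention that a lower score ranks first are all assumptions made for illustration, as is counting ranks over unique titles only.

```python
# Illustrative stand-in only; not part of the schrodinger package.
def rank_first_occurrences(records, sort_key=None):
    """Return {title: rank}, ranking only the first occurrence of each title.

    records:  list of (title, score) tuples (hypothetical layout).
    sort_key: optional callable used to order the records first;
              here a lower score is assumed to be better.
    """
    if sort_key is not None:
        records = sorted(records, key=sort_key)
    ranks = {}
    for title, _score in records:
        # A repeated title is skipped, so it never receives a second rank.
        if title not in ranks:
            ranks[title] = len(ranks) + 1
    return ranks


records = [("ligA", -7.2), ("ligB", -9.1), ("ligA", -8.0), ("ligC", -6.5)]
ranks = rank_first_occurrences(records, sort_key=lambda rec: rec[1])
print(ranks)
```

Passing no `sort_key` corresponds to trusting the input order as given.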
Input file formats:
<actives_file>
Text file – Raw text, one title per line.
Structure file – A file containing structures with a meaningful title.
CSV file – A comma-separated values file.
List(str) – A list of active title strings.
<results_file>
Structure file, e.g. 'foo_pv.mae' – A file containing ordered structures.
CSV file – A comma-separated values file containing ranked titles ordered by virtual screen scoring metric.
List(str) or List(structure) – A list of ranked titles ordered by virtual screen scoring metric.
API examples:
# Assumed import paths, inferred from this module's package name.
from schrodinger.analysis.enrichment import metrics, reporter
from schrodinger.analysis.enrichment.enrichment_input import (
    extract_active_titles_from_txt, extract_ranks_from_mae)

# Ex. 1) Calculate BEDROC
actives_file = "my_actives.txt"
active_titles = extract_active_titles_from_txt(actives_file)
(total_actives, total_ligands, active_ranks, adjusted_active_ranks,
 total_ranked, title_ranks) = extract_ranks_from_mae(
    mae_file_name="screen_results.maegz",
    active_titles=active_titles,
    num_decoy=1000)
bedroc, bedroc_ra = metrics.calcBEDROC(total_actives, total_ligands,
                                       active_ranks, 20.0)

# Ex. 2) Using the reporter class to calculate the default set of metrics.
# Note that this is not a good practice.
r = reporter.EnrichmentReporter(
    actives_file="my_actives.txt",
    results_file="screen_results.maegz",
    num_decoy=1000)
r.report()
Copyright Schrodinger, LLC. All rights reserved.
- class schrodinger.analysis.enrichment.enrichment_input.FingerprintComponent(fp_gen, fp_sim, active_fingerprint, min_Tc_total_actives)
Bases: object
Data class that contains critical objects that all fingerprint-related metrics functions (calc_DEF, calc_DEFStar and calc_DEFP) need.
- Variables
fp_gen (CanvasFingerprintGenerator) – Object needed to generate fingerprint for each active title.
fp_sim (CanvasFingerprintSimilarity) – Object needed to compare fingerprint similarity for each active pair.
active_fingerprint (dict) – Dictionary mapping each active title to its fingerprint. Not available for screen results that don’t include title and structure information.
min_Tc_total_actives (float) – The lowest Tanimoto coefficient (Tc) among all pairs of actives.
- __init__(fp_gen, fp_sim, active_fingerprint, min_Tc_total_actives)
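To make the min_Tc_total_actives variable concrete, here is a minimal sketch of finding the lowest Tanimoto coefficient over all pairs of actives. It assumes each fingerprint is represented as a Python set of on-bit indices, which is a simplification and not how Canvas fingerprint objects work; the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def min_pairwise_tanimoto(active_fingerprint):
    """Lowest Tc over all pairs; needs at least two fingerprints."""
    pairs = combinations(active_fingerprint.values(), 2)
    return min(tanimoto(a, b) for a, b in pairs)


# Hypothetical title -> on-bit set mapping, standing in for active_fingerprint.
active_fingerprint = {
    "a1": {1, 2, 3},
    "a2": {2, 3, 4},
    "a3": {1, 2, 3, 4},
}
min_tc = min_pairwise_tanimoto(active_fingerprint)
print(min_tc)
```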
- schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_csv(actives_file)
Parse actives_file as a CSV file and return the distinct active titles. Repeated active titles are ignored.
- Parameters
actives_file (str) – A csv file containing all active titles.
- Returns
Distinct active titles from the actives file.
- Return type
set(str)
- schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_mae(actives_file)
Parse actives_file as a Maestro file and return the distinct active titles. Repeated active titles are ignored.
- Parameters
actives_file (str) – A maestro file containing all active titles.
- Returns
Distinct active titles from the actives file.
- Return type
set(str)
- schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_txt(actives_file)
Parse actives_file as a raw text file with one title per line and return the distinct active titles. Repeated active titles are ignored.
- Parameters
actives_file (str) – Raw text file containing one title per line.
- Returns
Distinct active titles from the actives file.
- Return type
set(str)
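The behaviour described for the text-file parser (one title per line, duplicates collapsed into a set) can be sketched in plain Python as follows. This is a stand-in, not the library function: the name `read_active_titles` is hypothetical, and stripping whitespace and skipping blank lines are assumptions.

```python
import os
import tempfile


def read_active_titles(path):
    """Return the distinct, non-empty titles found one per line in a file."""
    with open(path) as handle:
        # Assumption: surrounding whitespace is dropped, blank lines ignored.
        return {line.strip() for line in handle if line.strip()}


# Write a small throwaway actives file, then parse it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ligA\nligB\n\nligA\n  ligC  \n")
    path = tmp.name
try:
    titles = read_active_titles(path)
finally:
    os.remove(path)
print(sorted(titles))
```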
- schrodinger.analysis.enrichment.enrichment_input.extract_active_titles_from_list(actives)
Parse actives from a list of strings and return the distinct active titles. Repeated active titles are ignored.
- Parameters
actives (list(str)) – A list of strings containing all active titles.
- Returns
Distinct active titles from the list.
- Return type
set(str)
- schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_list(titles_iter, active_titles, num_decoy=0)
Compute and return rank and count related terms from a list of ligand titles pre-sorted by virtual screen scoring metric.
- Parameters
titles_iter (list(str)) – A list of title strings, pre-sorted by virtual screen scoring metric.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is taken to be the number of distinct active titles in the actives file plus num_decoy. This enables calculation of the correction term in calc_AUAC when the total number of ligands does not equal the total number of ranked titles in results_file.
- Returns
A tuple containing total number of active titles, total number of ligand titles, active ranks, adjusted active ranks, total number of ranked titles, and a dictionary storing active titles as keys and their ranks as value.
- Return type
int, int, list(int), list(int), int, dict(str, int)
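As a rough illustration of the quantities in the return tuple, the sketch below derives them from a pre-sorted title list in plain Python. It is not the library implementation: the function name is hypothetical, adjusted active ranks are left out because their definition is not given here, and treating total ligands as the ranked count when num_decoy is 0 is an assumption.

```python
def ranks_from_titles(titles, active_titles, num_decoy=0):
    """Sketch of rank bookkeeping for a pre-sorted list of titles."""
    title_ranks = {}
    seen = set()
    rank = 0
    for title in titles:
        if title in seen:
            continue  # only the first occurrence of a title is ranked
        seen.add(title)
        rank += 1
        if title in active_titles:
            title_ranks[title] = rank
    total_ranked = rank
    total_actives = len(active_titles)
    # Assumption: without num_decoy, fall back to the number of ranked titles.
    total_ligands = total_actives + num_decoy if num_decoy else total_ranked
    active_ranks = sorted(title_ranks.values())
    return total_actives, total_ligands, active_ranks, total_ranked, title_ranks


result = ranks_from_titles(
    ["a1", "d1", "a2", "d2", "d3"], {"a1", "a2", "a3"}, num_decoy=10)
print(result)
```

In this toy input the active "a3" never appears in the results, so it counts toward the total number of actives but has no rank.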
- schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_csv(csv_file_name, active_titles, num_decoy=0, id_header='Title')
Compute and return rank and count related terms from a csv file.
- Parameters
csv_file_name (str) – File name of the csv file that contains the virtual screening result.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is taken to be the number of distinct active titles in the actives file plus num_decoy. This enables calculation of the correction term in calc_AUAC when the total number of ligands does not equal the total number of ranked titles in results_file.
id_header (str) – Name of compound-identifying header.
- Returns
A tuple containing total number of active titles, total number of ligand titles, active ranks, adjusted active ranks, total number of ranked titles, and a dictionary storing active titles as keys and their ranks as value.
- Return type
int, int, list(int), list(int), int, dict(str, int)
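The role of id_header can be sketched with the standard csv module: the named column supplies the compound identifiers, read in file order. The helper below is a hypothetical stand-in rather than the library's parser, and the column names in the sample data are invented for illustration.

```python
import csv
import io


def ranked_titles_from_csv(handle, id_header="Title"):
    """Return the identifier column of a CSV, preserving row order."""
    reader = csv.DictReader(handle)
    return [row[id_header] for row in reader]


# In-memory CSV text standing in for a results file (made-up values).
default_csv = io.StringIO("Title,docking_score\nligB,-9.1\nligA,-8.0\n")
custom_csv = io.StringIO("compound_id,score\nmol7,0.91\nmol2,0.55\n")

default_titles = ranked_titles_from_csv(default_csv)
custom_titles = ranked_titles_from_csv(custom_csv, id_header="compound_id")
print(default_titles, custom_titles)
```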
- schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_structures(structure_iter, active_titles, num_decoy=0, id_property='s_m_title')
Compute and return rank and count related terms from a list of structures.
- Parameters
structure_iter (list(structure.Structure)) – A list of structure.Structure objects.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is taken to be the number of distinct active titles in the actives file plus num_decoy. This enables calculation of the correction term in calc_AUAC when the total number of ligands does not equal the total number of ranked titles in results_file.
id_property (str) – Name of compound-identifying property.
- Returns
A tuple containing total number of active titles, total number of ligand titles, active ranks, adjusted active ranks, total number of ranked titles, and a dictionary storing active titles as keys and their ranks as value.
- Return type
int, int, list(int), list(int), int, dict(str, int)
- schrodinger.analysis.enrichment.enrichment_input.extract_ranks_from_mae(mae_file_name, active_titles, num_decoy=0, id_property='s_m_title')
Compute and return rank and count related terms from a structure file.
- Parameters
mae_file_name (str) – A structure file that contains the virtual screening result.
active_titles (set(str)) – Distinct active titles from the actives file.
num_decoy (int) – The total number of decoys. If specified, the total number of ligands is taken to be the number of distinct active titles in the actives file plus num_decoy. This enables calculation of the correction term in calc_AUAC when the total number of ligands does not equal the total number of ranked titles in results_file.
id_property (str) – Name of compound-identifying property.
- Returns
A tuple containing total number of active titles, total number of ligand titles, active ranks, adjusted active ranks, total number of ranked titles, and a dictionary storing active titles as keys and their ranks as value.
- Return type
int, int, list(int), list(int), int, dict(str, int)
- schrodinger.analysis.enrichment.enrichment_input.get_fingerprint_components(structure_file, active_titles, id_property='s_m_title')
Initialize and return a data class object needed for fingerprint-related calculations.
- Parameters
structure_file (str or list(str)) – Structure file or a list of structures.
active_titles (set(str)) – Distinct active titles from the actives file
id_property (str) – Name of compound-identifying property.
- Returns
The initialized enrichment_input.FingerprintComponent object.
- Return type