schrodinger.analysis.enrichment.metrics module
Stand-alone functions for calculating metrics. The metrics include terms such as Receiver Operator Characteristic area under the curve (ROC), Enrichment Factors (EF), and Robust Initial Enhancement (RIE).
Copyright Schrodinger, LLC. All rights reserved.
- schrodinger.analysis.enrichment.metrics.get_active_sample_size_star(total_actives, total_ligands, adjusted_active_ranks, fraction_of_actives)
The size of the decoy sample set required to recover the specified fraction of actives. If there are fewer ranked actives than the requested fraction of all actives then the number of total_ligands is returned.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.
fraction_of_actives (float) – Decimal notation for the fraction of sampled actives, used to determine the sample set size.
- Returns
The size of the decoy sample set required to recover the specified fraction of actives.
- Return type
int
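The adjusted ranks described above can be illustrated with a minimal sketch. The helper name `adjust_active_ranks` is hypothetical and not part of the module; it assumes the input is the list of unadjusted active ranks and simply subtracts, from each rank, the number of actives ranked ahead of it.

```python
def adjust_active_ranks(active_ranks):
    """Hypothetical helper: improve each rank by the number of preceding actives."""
    ordered = sorted(active_ranks)
    # The i-th active (0-based) in sorted order has i actives ranked before it.
    return [rank - n_preceding for n_preceding, rank in enumerate(ordered)]


print(adjust_active_ranks([1, 2, 3]))  # [1, 1, 1]
print(adjust_active_ranks([2, 5, 9]))  # [2, 4, 7]
```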
- schrodinger.analysis.enrichment.metrics.get_active_sample_size(total_actives, total_ligands, active_ranks, fraction_of_actives)
The size of the sample set required to recover the specified fraction of actives. If there are fewer ranked actives than the requested fraction of all actives then the number of total_ligands is returned.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
fraction_of_actives (float) – Decimal notation for the fraction of sampled actives, used to determine the sample set size.
- Returns
the size of the sample set required to recover the specified fraction of actives.
- Return type
int
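As a rough illustration of the behaviour described above, the sketch below returns the rank at which the requested fraction of actives has been recovered, falling back to total_ligands when too few actives are ranked. The function name `active_sample_size` is hypothetical, and rounding the required number of actives up with `math.ceil` is an assumption; the module may round differently.

```python
import math


def active_sample_size(total_actives, total_ligands, active_ranks,
                       fraction_of_actives):
    """Hypothetical sketch of the sample size needed to recover a fraction of actives."""
    # Assumption: the number of actives to recover is rounded up.
    n_needed = math.ceil(fraction_of_actives * total_actives)
    ordered = sorted(active_ranks)
    if n_needed > len(ordered):
        # Fewer ranked actives than requested: the whole screen is required.
        return total_ligands
    if n_needed == 0:
        return 0
    # The rank of the n_needed-th active is the sample size that contains it.
    return ordered[n_needed - 1]
```

For example, with four actives ranked at [1, 2, 3, 50], recovering half of them requires a sample of 2, while recovering all of them requires a sample of 50.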
- schrodinger.analysis.enrichment.metrics.get_decoy_sample_size(total_actives, total_ligands, active_ranks, fraction_of_decoys)
Returns the size of the sample set required to recover the specified fraction of decoys.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
fraction_of_decoys (float) – Decimal notation for the fraction of sampled decoys, used to determine the sample set size.
- Returns
Size of the sample set required to recover the specified fraction of decoys.
- Return type
int
- schrodinger.analysis.enrichment.metrics.calc_ActivesInN(active_ranks, n_sampled_set)
Return the number of the known active ligands found in a given sample size.
- Parameters
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
n_sampled_set (int) – The number of rank results for which to calculate the metric. Every active with a rank less than or equal to this value will be counted as found in the set.
- Returns
the number of the known active ligands found in a given sample size.
- Return type
int
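The counting rule stated for n_sampled_set (every active with a rank less than or equal to the value is counted) can be sketched in one line. The name `actives_in_n` is hypothetical and used only for illustration.

```python
def actives_in_n(active_ranks, n_sampled_set):
    """Hypothetical sketch: count actives whose rank falls within the sampled set."""
    return sum(1 for rank in active_ranks if rank <= n_sampled_set)
```

With active_ranks of [1, 2, 3, 50, 200] and n_sampled_set of 100, four actives are counted as found.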
- schrodinger.analysis.enrichment.metrics.calc_ActivesInNStar(adjusted_active_ranks, n_sampled_set)
Return the number of the known active ligands found in a given sample size.
- Parameters
adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.
n_sampled_set (int) – The number of rank results for which to calculate the metric. Every active with a rank less than or equal to this value will be counted as found in the set.
- Returns
the number of the known active ligands found in a given sample size.
- Return type
int
- schrodinger.analysis.enrichment.metrics.calc_AveNumberOutrankingDecoys(active_ranks)
The rank of each active is adjusted by the number of outranking actives. The number of outranking decoys is then defined as the adjusted rank of that active minus one. The number of outranking decoys is calculated for each docked active and averaged.
- Parameters
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
- Returns
the average number of decoys that outranked the actives.
- Return type
float
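The three steps described above (adjust each rank by the number of outranking actives, subtract one, average) can be sketched as follows. The name `average_outranking_decoys` is hypothetical, and the sketch assumes at least one ranked active; an empty list would raise a ZeroDivisionError here.

```python
def average_outranking_decoys(active_ranks):
    """Hypothetical sketch: mean number of decoys ranked ahead of each active."""
    ordered = sorted(active_ranks)
    # Adjusted rank = rank - (actives ahead); decoys ahead = adjusted rank - 1.
    decoys_ahead = [rank - n_preceding - 1
                    for n_preceding, rank in enumerate(ordered)]
    return sum(decoys_ahead) / len(decoys_ahead)
```

For active_ranks of [2, 5, 9] the adjusted ranks are [2, 4, 7], so the actives are outranked by 1, 3, and 6 decoys, an average of 10/3.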
- schrodinger.analysis.enrichment.metrics.calc_DEF(total_actives, total_ligands, active_ranks, title_ranks, fingerprint_comp, n_sampled_set, min_actives=None)
Diverse Enrichment Factor, calculated with respect to the number of total ligands.
DEF is defined as:
$$\mathrm{DEF} = \mathrm{EF} \times \frac{1 - \text{min\_similarity\_among\_actives\_in\_sampled\_set}}{1 - \text{min\_similarity\_among\_all\_actives}}$$
where ‘n_sampled_set’ is the number of all ranks in which to search for actives.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
title_ranks (dict(str, int)) – Unadjusted integer rank keys for title. Not available for table inputs, or other screen results that don’t list the title.
fingerprint_comp (enrichment_input.FingerprintComponent) – Fingerprint component data class object
n_sampled_set (int) – The number of ranked results for which to calculate the enrichment factor.
min_actives (int) – The number of actives that must be within the n_sampled_set, otherwise the returned DEF value is None.
- Returns
Diverse Enrichment Factor (DEF) for the given sample size of the screen results. If fewer than min_actives are found in the set, or the calculation raises a ZeroDivisionError, the returned value is None.
- Return type
float
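The diversity scaling in the DEF definition can be sketched independently of how the similarities are obtained. The function `diverse_ef` below is hypothetical: it takes an already-computed EF and the two minimum similarity values as plain floats, whereas the module derives them from the fingerprint component. Returning None on a ZeroDivisionError follows the documented behaviour.

```python
def diverse_ef(ef, min_sim_sampled_actives, min_sim_all_actives):
    """Hypothetical sketch: scale an enrichment factor by relative active diversity."""
    try:
        return ef * (1.0 - min_sim_sampled_actives) / (1.0 - min_sim_all_actives)
    except ZeroDivisionError:
        # All actives are identical (minimum similarity of 1.0).
        return None
```

When the sampled actives are as diverse as the full active set the scaling factor is 1 and DEF equals EF; a less diverse sampled set (higher minimum similarity) lowers the value.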
- schrodinger.analysis.enrichment.metrics.calc_DEFStar(total_actives, total_ligands, active_ranks, title_ranks, fingerprint_comp, n_sampled_decoy_set, min_actives=None)
Here, Diverse EF* (DEF*) is defined as:
$$\mathrm{DEF}^{*} = \mathrm{EF}^{*} \times \frac{1 - \text{min\_similarity\_among\_actives\_in\_sampled\_set}}{1 - \text{min\_similarity\_among\_all\_actives}}$$
where ‘n_sampled_decoy_set’ is the number of decoy ranks in which to search for actives.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
title_ranks (dict(str, int)) – Unadjusted integer rank keys for title. Not available for table inputs, or other screen results that don’t list the title.
fingerprint_comp (enrichment_input.FingerprintComponent) – Fingerprint component data class object
n_sampled_decoy_set (int) – The number of ranked decoys for which to calculate the enrichment factor.
min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned DEF* value is None.
- Returns
Diverse Enrichment Factor (DEF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands. If fewer than min_actives are found in the set the returned value is None.
- Return type
float
- schrodinger.analysis.enrichment.metrics.calc_DEFP(total_actives, total_ligands, active_ranks, title_ranks, fingerprint_comp, n_sampled_decoy_set, min_actives=None)
Diverse EF’ (DEF’) is defined as:
$$\mathrm{DEF}' = \mathrm{EF}' \times \frac{1 - \text{min\_similarity\_among\_actives\_in\_sampled\_set}}{1 - \text{min\_similarity\_among\_all\_actives}}$$
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
title_ranks (dict(str, int)) – Unadjusted integer rank keys for title. Not available for table inputs, or other screen results that don’t list the title.
fingerprint_comp (enrichment_input.FingerprintComponent) – Fingerprint component data class object
n_sampled_decoy_set (int) – The number of ranked decoy results for which to calculate the enrichment factor.
min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned DEF’ value is None.
- Returns
Diverse Enrichment Factor prime (DEF’) for a given sample size. If fewer than min_actives are found in the set the returned value is None.
- Return type
float
- schrodinger.analysis.enrichment.metrics.calc_EF(total_actives, total_ligands, active_ranks, n_sampled_set, min_actives=None)
Calculates the Enrichment factor (EF) for the given sample size of the screen results. If fewer than min_actives are found in the set, or the calculation raises a ZeroDivisionError, the returned value is None.
EF is defined as:
$$\mathrm{EF} = \frac{\text{n\_actives\_in\_sampled\_set} / \text{n\_sampled\_set}}{\text{total\_actives} / \text{total\_ligands}}$$
where ‘n_sampled_set’ is the number of all ranks in which to search for actives.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
n_sampled_set (int) – The number of ranked results for which to calculate the enrichment factor.
min_actives (int) – The number of actives that must be within the n_sampled_set, otherwise the returned EF value is None.
- Returns
enrichment factor
- Return type
float
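The EF definition above translates almost directly into code. The sketch below is a stand-alone reimplementation for illustration, not the module's source; the name `enrichment_factor` is hypothetical, and the None returns mirror the documented handling of min_actives and ZeroDivisionError.

```python
def enrichment_factor(total_actives, total_ligands, active_ranks,
                      n_sampled_set, min_actives=None):
    """Hypothetical sketch of EF: active rate in the sample over the overall active rate."""
    n_found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    if min_actives is not None and n_found < min_actives:
        return None
    try:
        return (n_found / n_sampled_set) / (total_actives / total_ligands)
    except ZeroDivisionError:
        # Empty sample, no actives, or no ligands.
        return None
```

For a screen of 1000 ligands with 10 actives, four of which fall in the top 100 ranks, the sketch gives (4/100) / (10/1000), an enrichment factor of about 4.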
- schrodinger.analysis.enrichment.metrics.calc_EFStar(total_actives, total_ligands, active_ranks, n_sampled_decoy_set, min_actives=None)
Calculate the Enrichment factor* (EF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands. If fewer than min_actives are found in the set the returned value is None.
Here, EF* is defined as:
$$\mathrm{EF}^{*} = \frac{\text{n\_actives\_in\_sampled\_set} / \text{n\_sampled\_decoy\_set}}{\text{total\_actives} / \text{total\_decoys}}$$
where ‘n_sampled_decoy_set’ is the number of decoy ranks in which to search for actives.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
n_sampled_decoy_set (int) – The number of ranked decoys for which to calculate the enrichment factor.
min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned EF* value is None.
- Returns
enrichment factor*
- Return type
float
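A minimal sketch of the EF* definition follows. It is not the module's implementation: the name `enrichment_factor_star` is hypothetical, total_decoys is assumed to be total_ligands minus total_actives, and an active is assumed to fall inside the sampled decoy set when its adjusted rank (rank minus the number of preceding actives) is at most n_sampled_decoy_set.

```python
def enrichment_factor_star(total_actives, total_ligands, active_ranks,
                           n_sampled_decoy_set, min_actives=None):
    """Hypothetical sketch of EF*, measured against decoys rather than all ligands."""
    total_decoys = total_ligands - total_actives  # assumption
    # Adjusted rank: each active's rank improved by the number of preceding actives.
    adjusted = [rank - i for i, rank in enumerate(sorted(active_ranks))]
    n_found = sum(1 for rank in adjusted if rank <= n_sampled_decoy_set)
    if min_actives is not None and n_found < min_actives:
        return None
    try:
        return (n_found / n_sampled_decoy_set) / (total_actives / total_decoys)
    except ZeroDivisionError:
        return None
```

With 10 actives among 1010 ligands (1000 decoys) and active ranks of [1, 2, 3, 50, 200], the adjusted ranks are [1, 1, 1, 47, 196]; four fall within the first 100 decoy ranks, giving a value of about 4.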
- schrodinger.analysis.enrichment.metrics.calc_EFP(total_actives, total_ligands, active_ranks, n_sampled_decoy_set, min_actives=None)
Calculates Enrichment Factor prime (EF’), a modified enrichment factor defined using the average of the reciprocals of the EF* enrichment factors for recovering the first aa% of the known actives.
EF’(x) will be larger than EF*(x) if the actives in question come relatively early in the sequence, and smaller if they come relatively late. If fewer than min_actives are found in the set the returned value is None.
EF’ is defined as:
$$\mathrm{EF}' = \frac{\text{n\_actives\_sampled\_set}}{\text{cumulative\_sum}(\text{frac. decoys} / \text{frac. actives})}$$
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
n_sampled_decoy_set (int) – The number of ranked decoys for which to calculate the enrichment factor.
min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned EF’ value is None.
- Returns
enrichment factor prime
- Return type
float
- schrodinger.analysis.enrichment.metrics.calc_FOD(total_actives, total_ligands, active_ranks, fraction_of_actives)
Calculates the average fraction of decoys outranking the given fraction, provided as a float, of known active ligands. The returned value is None if a) the calculation raises a ZeroDivisionError, b) fraction_of_actives generates more actives than are ranked, or c) fraction_of_actives is greater than 1.0.
FOD is defined as:
$$\mathrm{FOD} = \frac{1}{\text{num\_actives}} \sum \frac{\text{number\_outranking\_decoys\_in\_sampled\_set}}{\text{total\_decoys}}$$
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
fraction_of_actives (float) – Decimal notation of the fraction of sampled actives, used to set the sampled set size.
- Returns
Average fraction of outranking decoys.
- Return type
float
- schrodinger.analysis.enrichment.metrics.calc_EFF(total_actives, total_ligands, adjusted_active_ranks, fraction_of_decoys)
Calculate efficiency in distinguishing actives from decoys (EFF) on an absolute scale of 1 (perfect; all actives come before any decoys) to -1 (all decoys come before any actives); a value of 0 means that actives and decoys were recovered at equal proportionate rates. The returned value is None if the calculation raises a ZeroDivisionError.
EFF is defined as:
$$\mathrm{EFF} = 2 \times \frac{\text{frac. actives in sample}}{\text{frac. actives in sample} + \text{frac. decoys in sample}} - 1$$
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.
fraction_of_decoys (float) – The size of the set is in terms of the number of decoys in the screen. For example, given 1000 decoys and fraction_of_decoys = 0.20, actives that appear within the first 200 ranks are counted.
- Returns
Active recovery efficiency at a particular sample set size
- Return type
float
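The EFF formula and its stated limits (1, 0, and -1) can be checked with a short sketch. The function `efficiency` is hypothetical and takes the two sampled fractions directly; the module instead derives them from the adjusted ranks and fraction_of_decoys.

```python
def efficiency(frac_actives_in_sample, frac_decoys_in_sample):
    """Hypothetical sketch of EFF on a scale from -1 (worst) to 1 (perfect)."""
    try:
        total = frac_actives_in_sample + frac_decoys_in_sample
        return 2.0 * frac_actives_in_sample / total - 1.0
    except ZeroDivisionError:
        # Neither actives nor decoys were sampled.
        return None
```

Recovering all actives before any decoy gives 1, recovering only decoys gives -1, and equal proportionate recovery gives 0.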
- schrodinger.analysis.enrichment.metrics.calc_BEDROC(total_actives, total_ligands, active_ranks, alpha=20.0)
Boltzmann-enhanced Discrimination Receiver Operator Characteristic area under the curve. The value is bounded between 0 and 1, with 1 being ideal screen performance. The default alpha=20 weights the first ~8% of screen results. When alpha*Ra << 1, where Ra is the ratio of total actives to total ligands and alpha is the exponential prefactor, the BEDROC metric takes on a probabilistic meaning. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq 36.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
alpha (float) – Exponential prefactor for adjusting early enrichment emphasis. Larger values more heavily weight the early ranks. alpha = 20 weights the first ~8% of the screen, alpha = 10 weights the first ~10% of the screen, alpha = 50 weights the first ~3% of the screen results.
- Returns
a tuple of two floats, the first represents the area under the curve for the Boltzmann-enhanced discrimination of ROC (BEDROC) analysis, the second is the alpha*Ra term.
- Return type
(float, float)
- schrodinger.analysis.enrichment.metrics.calc_RIE(total_actives, total_ligands, active_ranks, alpha=20.0)
Robust Initial Enhancement (RIE). Active ranks are weighted with a continuously decreasing exponential term. Large positive RIE values indicate better screen performance. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq 18.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
alpha (float) – Exponential prefactor for adjusting early enrichment emphasis. Larger values more heavily weight the early ranks. alpha = 20 weights the first ~8% of the screen, alpha = 10 weights the first ~10% of the screen, alpha = 50 weights the first ~3% of the screen results.
- Returns
a float for the Robust Initial Enhancement (RIE).
- Return type
float
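As an illustration of the exponential weighting, the sketch below follows the published RIE and BEDROC expressions from Truchon and Bayly. It is a stand-alone reimplementation, not the module's code: the names `rie` and `bedroc` are hypothetical, and treating unranked actives as contributing zero weight is an assumption. The BEDROC sketch returns the alpha*Ra term alongside the score, mirroring the documented return shape.

```python
import math


def rie(total_actives, total_ligands, active_ranks, alpha=20.0):
    """Hypothetical sketch of RIE: observed exponential weight over its random expectation."""
    # Average exponential weight of the actives (unranked actives add nothing).
    observed = sum(math.exp(-alpha * rank / total_ligands)
                   for rank in active_ranks) / total_actives
    # Expected average weight for a uniformly random ranking.
    expected = (1.0 - math.exp(-alpha)) / (
        total_ligands * (math.exp(alpha / total_ligands) - 1.0))
    return observed / expected


def bedroc(total_actives, total_ligands, active_ranks, alpha=20.0):
    """Hypothetical sketch of BEDROC: RIE rescaled onto the interval from 0 to 1."""
    ra = total_actives / total_ligands
    scale = ra * math.sinh(alpha / 2.0) / (
        math.cosh(alpha / 2.0) - math.cosh(alpha / 2.0 - alpha * ra))
    offset = 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
    score = rie(total_actives, total_ligands, active_ranks, alpha) * scale + offset
    return score, alpha * ra
```

With 5 actives among 100 ligands, placing the actives at ranks 1 to 5 gives a BEDROC of approximately 1, and placing them at ranks 96 to 100 gives approximately 0.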
- schrodinger.analysis.enrichment.metrics.calc_AUAC(total_actives, total_ligands, total_ranked, active_ranks)
Area Under the Accumulation Curve (AUAC). The value is bounded between 0 and 1, with 1 being ideal screen performance. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq 8, except adjusted to a trapezoidal integration to decrease errors for small data sets.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
total_ranked (int) – The total number of ligands ranked by the virtual screen scoring metric.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
- Returns
A float representation of the Area Under the Accumulation Curve.
- Return type
float
- schrodinger.analysis.enrichment.metrics.calc_ROC(total_actives, total_ligands, adjusted_active_ranks)
Calculates a representation of the Receiver Operator Characteristic area underneath the curve. Typically interpreted as the probability an active will appear before an inactive. A value of 1.0 reflects ideal performance; a value of 0.5 reflects performance on par with random selection. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq A.8.
Classically, ROC area is defined as:
$$\mathrm{ROC} = \frac{\mathrm{AUAC}}{R_i} - \frac{R_a}{2 R_i}$$
where AUAC is the area under the accumulation curve, $R_i$ is the ratio of inactives to total ligands, and $R_a$ is the ratio of actives to total ligands.
A different method is used here in order to account for unranked actives - see PYTHON-3055 & PYTHON-3106
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
total_ligands (int) – The total number of ligands (actives and unknowns/ decoys) used in the screen.
adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.
- Returns
receiver operator characteristic area underneath the curve
- Return type
float
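The probabilistic reading of ROC area (the chance that an active appears before an inactive) can be sketched from adjusted ranks, since an adjusted rank minus one is the number of decoys ahead of that active. This is an illustrative pairwise count, not the method the module uses: the name `roc_area` is hypothetical, and the sketch assumes every active is ranked, whereas the module is documented to account for unranked actives.

```python
def roc_area(total_actives, total_ligands, adjusted_active_ranks):
    """Hypothetical sketch: fraction of active/decoy pairs with the active ranked first."""
    total_decoys = total_ligands - total_actives
    # Each adjusted rank minus one is the number of decoys ahead of that active.
    misordered_pairs = sum(rank - 1 for rank in adjusted_active_ranks)
    return 1.0 - misordered_pairs / (total_actives * total_decoys)
```

With 3 actives among 10 ligands, adjusted ranks of [1, 1, 1] give 1.0 and adjusted ranks of [8, 8, 8] (all seven decoys ahead of every active) give 0.0.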
- schrodinger.analysis.enrichment.metrics.calc_HR(total_actives, active_ranks, n_sampled_set=50)
Calculates hit rate (HRn) – percentage of actives found in top n-ranked ligands.
- Parameters
total_actives (int) – The total number of active ligands in the screen, ranked and unranked.
active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has an active_ranks list of [1, 2, 3].
n_sampled_set (int) – The number of ranked results for which to calculate the hit rate.
- Returns
a tuple of two floats; the first is the hit rate value, and the second is the highest possible hit rate value (<100 when total_actives < n_sampled_set).
- Return type
(float, float)
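One way to read the hit rate and its ceiling is sketched below. The name `hit_rate` is hypothetical, and the sketch assumes the percentage is taken relative to n_sampled_set, which is consistent with the documented maximum falling below 100 when total_actives is smaller than n_sampled_set.

```python
def hit_rate(total_actives, active_ranks, n_sampled_set=50):
    """Hypothetical sketch: (hit rate, highest possible hit rate), both in percent."""
    n_found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    observed = 100.0 * n_found / n_sampled_set
    # At best, every active (up to the sample size) lands in the top n ranks.
    best = 100.0 * min(total_actives, n_sampled_set) / n_sampled_set
    return observed, best
```

For 10 total actives with ranks [1, 2, 3, 60] and the default sample of 50, three actives are found, giving a hit rate of 6% against a ceiling of 20%.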