schrodinger.analysis.enrichment.metrics module

Stand-alone functions for calculating metrics. The metrics include terms such as Receiver Operator Characteristic area under the curve (ROC), Enrichment Factors (EF), and Robust Initial Enhancement (RIE).

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.analysis.enrichment.metrics.get_active_sample_size_star(total_actives, total_ligands, adjusted_active_ranks, fraction_of_actives)

The size of the decoy sample set required to recover the specified fraction of actives. If there are fewer ranked actives than the requested fraction of all actives then the number of total_ligands is returned.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.

  • fraction_of_actives (float) – Decimal notation for the fraction of sampled actives, used to determine the sample set size.

Returns

The size of the decoy sample set required to recover the specified fraction of actives.

Return type

int
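
The adjusted ranks described for adjusted_active_ranks can be derived from ordinary ranks by subtracting, from each active's rank, the number of actives ranked ahead of it. The following is a minimal sketch; adjust_active_ranks is a hypothetical helper used only for illustration and is not part of this module.

```python
def adjust_active_ranks(active_ranks):
    """Return adjusted ranks: each rank minus the number of preceding actives.

    Hypothetical helper for illustration; it is not part of this module.
    """
    ordered = sorted(active_ranks)
    # The i-th active (0-based) has exactly i actives ranked ahead of it.
    return [rank - i for i, rank in enumerate(ordered)]


print(adjust_active_ranks([1, 2, 3]))  # [1, 1, 1]
print(adjust_active_ranks([2, 5, 9]))  # [2, 4, 7]
```

In the second call, the active at rank 9 has two actives ahead of it, so six decoys outrank it and its adjusted rank is 7.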

schrodinger.analysis.enrichment.metrics.get_active_sample_size(total_actives, total_ligands, active_ranks, fraction_of_actives)

The size of the sample set required to recover the specified fraction of actives. If there are fewer ranked actives than the requested fraction of all actives then the number of total_ligands is returned.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • fraction_of_actives (float) – Decimal notation for the fraction of sampled actives, used to determine the sample set size.

Returns

the size of the sample set required to recover the specified fraction of actives.

Return type

int

schrodinger.analysis.enrichment.metrics.get_decoy_sample_size(total_actives, total_ligands, active_ranks, fraction_of_decoys)

Returns the size of the sample set required to recover the specified fraction of decoys.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • fraction_of_decoys (float) – Decimal notation for the fraction of sampled decoys, used to determine the sample set size.

Returns

Size of the sample set required to recover the specified fraction of decoys.

Return type

int

schrodinger.analysis.enrichment.metrics.calc_ActivesInN(active_ranks, n_sampled_set)

Return the number of the known active ligands found in a given sample size.

Parameters
  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • n_sampled_set (int) – The number of rank results for which to calculate the metric. Every active with a rank less than or equal to this value will be counted as found in the set.

Returns

the number of the known active ligands found in a given sample size.

Return type

int
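
The counting rule stated for n_sampled_set (every active with a rank less than or equal to the value is counted) can be sketched as follows. The function name count_actives_in_n is hypothetical and stands in for the behaviour described above; it is not the module's implementation.

```python
def count_actives_in_n(active_ranks, n_sampled_set):
    """Count actives whose rank falls within the first n_sampled_set ranks.

    Hypothetical stand-in for illustration only.
    """
    return sum(1 for rank in active_ranks if rank <= n_sampled_set)


print(count_actives_in_n([1, 4, 12, 30], 10))  # 2
```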

schrodinger.analysis.enrichment.metrics.calc_ActivesInNStar(adjusted_active_ranks, n_sampled_set)

Return the number of the known active ligands found in a given sample size.

Parameters
  • adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.

  • n_sampled_set (int) – The number of rank results for which to calculate the metric. Every active with a rank less than or equal to this value will be counted as found in the set.

Returns

the number of the known active ligands found in a given sample size.

Return type

int

schrodinger.analysis.enrichment.metrics.calc_AveNumberOutrankingDecoys(active_ranks)

The rank of each active is adjusted by the number of outranking actives. The number of outranking decoys is then defined as the adjusted rank of that active minus one. The number of outranking decoys is calculated for each docked active and averaged.

Parameters

active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

Returns

the average number of decoys that outranked the actives.

Return type

float
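
The averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the name average_outranking_decoys is hypothetical, and returning None for an empty list is an assumption about edge-case handling.

```python
def average_outranking_decoys(active_ranks):
    """Average, over ranked actives, of (adjusted rank - 1).

    Hypothetical sketch; returning None for an empty list is an assumption.
    """
    ordered = sorted(active_ranks)
    if not ordered:
        return None
    # rank - i is the adjusted rank of the i-th active (0-based);
    # subtracting one more gives the number of decoys ranked ahead of it.
    outranking = [rank - i - 1 for i, rank in enumerate(ordered)]
    return sum(outranking) / len(ordered)


print(average_outranking_decoys([1, 2, 3]))  # 0.0
```

For active_ranks = [2, 5, 9] the adjusted ranks are [2, 4, 7], so 1, 3, and 6 decoys outrank the three actives and the average is 10/3.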

schrodinger.analysis.enrichment.metrics.calc_DEF(total_actives, total_ligands, active_ranks, title_ranks, fingerprint_comp, n_sampled_set, min_actives=None)

Diverse Enrichment Factor, calculated with respect to the number of total ligands.

DEF is defined as:

            1 - (min_similarity_among_actives_in_sampled_set)
DEF = EF * --------------------------------------------------
            1 - (min_similarity_among_all_actives)

where ‘n_sampled_set’ is the number of all ranks in which to search for actives.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • title_ranks (dict(str, int)) – Unadjusted integer rank keys for title. Not available for table inputs, or other screen results that don’t list the title.

  • fingerprint_comp (enrichment_input.FingerprintComponent) – Fingerprint component data class object

  • n_sampled_set (int) – The number of ranked results for which to calculate the enrichment factor.

  • min_actives (int) – The number of actives that must be within the n_sampled_set, otherwise the returned DEF value is None.

Returns

Diverse Enrichment Factor (DEF) for the given sample size of the screen results. If fewer than min_actives are found in the set, or the calculation raises a ZeroDivisionError, the returned value is None.

Return type

float

schrodinger.analysis.enrichment.metrics.calc_DEFStar(total_actives, total_ligands, active_ranks, title_ranks, fingerprint_comp, n_sampled_decoy_set, min_actives=None)

Here, Diverse EF* (DEF*) is defined as:

                 1 - (min_similarity_among_actives_in_sampled_set)
DEF* = EF_star * --------------------------------------------------
                      1 - (min_similarity_among_all_actives)

where ‘n_sampled_decoy_set’ is the number of decoy ranks in which to search for actives.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • title_ranks (dict(str, int)) – Unadjusted integer rank keys for title. Not available for table inputs, or other screen results that don’t list the title.

  • fingerprint_comp (enrichment_input.FingerprintComponent) – Fingerprint component data class object

  • n_sampled_decoy_set (int) – The number of ranked decoys for which to calculate the enrichment factor.

  • min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned DEF* value is None.

Returns

Diverse Enrichment Factor (DEF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands. If fewer than min_actives are found in the set the returned value is None.

Return type

float

schrodinger.analysis.enrichment.metrics.calc_DEFP(total_actives, total_ligands, active_ranks, title_ranks, fingerprint_comp, n_sampled_decoy_set, min_actives=None)

Diverse EF’ (DEF’) is defined as:

             1 - (min_similarity_among_actives_in_sampled_set)
DEF' = EF' * --------------------------------------------------
                  1 - (min_similarity_among_all_actives)

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • title_ranks (dict(str, int)) – Unadjusted integer rank keys for title. Not available for table inputs, or other screen results that don’t list the title.

  • fingerprint_comp (enrichment_input.FingerprintComponent) – Fingerprint component data class object

  • n_sampled_decoy_set (int) – The number of ranked decoy results for which to calculate the enrichment factor.

  • min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned DEF’ value is None.

Returns

Diverse Enrichment Factor prime (DEF’) for a given sample size. If fewer than min_actives are found in the set the returned value is None.

Return type

float

schrodinger.analysis.enrichment.metrics.calc_EF(total_actives, total_ligands, active_ranks, n_sampled_set, min_actives=None)

Calculates the Enrichment factor (EF) for the given sample size of the screen results. If fewer than min_actives are found in the set, or the calculation raises a ZeroDivisionError, the returned value is None.

EF is defined as:

      n_actives_in_sampled_set / n_sampled_set
EF =  ----------------------------------------
           total_actives / total_ligands

where ‘n_sampled_set’ is the number of all ranks in which to search for actives.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • n_sampled_set (int) – The number of ranked results for which to calculate the enrichment factor.

  • min_actives (int) – The number of actives that must be within the n_sampled_set, otherwise the returned EF value is None.

Returns

enrichment factor

Return type

float
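
The EF definition above, including the two documented None cases, can be sketched directly from the formula. This is an illustrative re-statement, not the module's source; the function name enrichment_factor is hypothetical.

```python
def enrichment_factor(total_actives, total_ligands, active_ranks,
                      n_sampled_set, min_actives=None):
    """EF = (actives in sample / sample size) / (total actives / total ligands).

    Hypothetical sketch of the documented formula.
    """
    n_found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    if min_actives is not None and n_found < min_actives:
        return None
    try:
        return (n_found / n_sampled_set) / (total_actives / total_ligands)
    except ZeroDivisionError:
        return None


# 3 of 10 actives in the top 10 of 1000 ligands: EF is approximately 30.
ef = enrichment_factor(10, 1000, [1, 3, 7, 40, 200], n_sampled_set=10)
print(round(ef, 6))  # 30.0
```

An EF of about 30 here means the top 10 ranks contain actives at roughly 30 times the rate expected from random selection.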

schrodinger.analysis.enrichment.metrics.calc_EFStar(total_actives, total_ligands, active_ranks, n_sampled_decoy_set, min_actives=None)

Calculate the Enrichment factor* (EF*) for the given sample size of the screen results, calculated with respect to the total decoys instead of the more traditional total ligands. If fewer than min_actives are found in the set the returned value is None.

Here, EF* is defined as:

       n_actives_in_sampled_set / n_sampled_decoy_set
EF* =  ----------------------------------------------
            total_actives / total_decoys

where ‘n_sampled_decoy_set’ is the number of decoy ranks in which to search for actives.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • n_sampled_decoy_set (int) – The number of ranked decoys for which to calculate the enrichment factor.

  • min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned EF* value is None.

Returns

enrichment factor*

Return type

float
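
A sketch of the EF* formula above follows. It rests on two assumptions that the text does not confirm: total_decoys is taken as total_ligands - total_actives, and an active counts as inside the sampled set when its adjusted rank (rank minus the number of preceding actives) is at most n_sampled_decoy_set. The function name enrichment_factor_star is hypothetical.

```python
def enrichment_factor_star(total_actives, total_ligands, active_ranks,
                           n_sampled_decoy_set, min_actives=None):
    """EF* = (actives in sample / decoy sample size) / (actives / decoys).

    Hypothetical sketch; the counting rule below is an assumption.
    """
    total_decoys = total_ligands - total_actives  # assumption
    # Assumption: compare adjusted ranks against the decoy sample size.
    adjusted = [rank - i for i, rank in enumerate(sorted(active_ranks))]
    n_found = sum(1 for rank in adjusted if rank <= n_sampled_decoy_set)
    if min_actives is not None and n_found < min_actives:
        return None
    try:
        return (n_found / n_sampled_decoy_set) / (total_actives / total_decoys)
    except ZeroDivisionError:
        return None


# 10 actives among 1010 ligands (1000 decoys), decoy sample size of 10.
ef_star = enrichment_factor_star(10, 1010, [1, 3, 7, 40, 200], 10)
print(round(ef_star, 6))  # 30.0
```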

schrodinger.analysis.enrichment.metrics.calc_EFP(total_actives, total_ligands, active_ranks, n_sampled_decoy_set, min_actives=None)

Calculates the Enrichment Factor prime (EF’), a modified enrichment factor defined using the average of the reciprocals of the EF* enrichment factors for recovering the first x% of the known actives.

EF’(x) will be larger than EF*(x) if the actives in question come relatively early in the sequence, and smaller if they come relatively late. If fewer than min_actives are found in the set the returned value is None.

EF’ is defined as:

                n_actives_sampled_set
EF' = --------------------------------------------
       cumulative_sum(frac. decoys/frac. actives)

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • n_sampled_decoy_set (int) – The number of ranked decoys for which to calculate the enrichment factor.

  • min_actives (int) – The number of actives that must be within the n_sampled_decoy_set, otherwise the returned EF’ value is None.

Returns

enrichment factor prime

Return type

float

schrodinger.analysis.enrichment.metrics.calc_FOD(total_actives, total_ligands, active_ranks, fraction_of_actives)

Calculates the average fraction of decoys outranking the given fraction, provided as a float, of known active ligands. The returned value is None if a) the calculation raises a ZeroDivisionError, or b) fraction_of_actives generates more actives than are ranked, or c) the fraction_of_actives is greater than 1.0.

FOD is defined as:

                     __
           1         \    number_outranking_decoys_in_sampled_set
FOD = -------------  /   ---------------------------------------
       num_actives   --         total_decoys

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • fraction_of_actives (float) – Decimal notation of the fraction of sampled actives, used to set the sampled set size.

Returns

Average fraction of outranking decoys.

Return type

float
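
The FOD sum above can be sketched as follows. Several details are assumptions rather than documented behaviour: the number of sampled actives is obtained by rounding fraction_of_actives * total_actives up, total_decoys is total_ligands - total_actives, and the decoys outranking an active are its adjusted rank minus one. The function name fraction_outranking_decoys is hypothetical.

```python
import math


def fraction_outranking_decoys(total_actives, total_ligands, active_ranks,
                               fraction_of_actives):
    """Average fraction of decoys ranked ahead of the sampled actives.

    Hypothetical sketch; the rounding rule is an assumption.
    """
    if fraction_of_actives > 1.0:
        return None
    n_sampled = math.ceil(fraction_of_actives * total_actives)  # assumption
    ordered = sorted(active_ranks)
    if n_sampled > len(ordered):
        return None  # more actives requested than were ranked
    total_decoys = total_ligands - total_actives
    try:
        fractions = [(rank - i - 1) / total_decoys
                     for i, rank in enumerate(ordered[:n_sampled])]
        return sum(fractions) / n_sampled
    except ZeroDivisionError:
        return None


# First 2 of 4 actives have 1 and 3 decoys ahead of them, out of 100 decoys.
fod = fraction_outranking_decoys(4, 104, [2, 5, 9, 50], 0.5)
print(round(fod, 6))  # 0.02
```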

schrodinger.analysis.enrichment.metrics.calc_EFF(total_actives, total_ligands, adjusted_active_ranks, fraction_of_decoys)

Calculate efficiency in distinguishing actives from decoys (EFF) on an absolute scale of 1 (perfect; all actives come before any decoys) to -1 (all decoys come before any actives); a value of 0 means that actives and decoys were recovered at equal proportionate rates. The returned value is None if the calculation raises a ZeroDivisionError.

EFF is defined as:

                   frac. actives in sample
EFF = (2* -----------------------------------------------) - 1
          frac actives in sample + frac. decoys in sample

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.

  • fraction_of_decoys (float) – The size of the set is in terms of the number of decoys in the screen. For example, given 1000 decoys and fraction_of_decoys = 0.20, actives that appear within the first 200 ranks are counted.

Returns

Active recovery efficiency at a particular sample set size

Return type

float
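
The EFF formula above can be sketched as follows. The sketch assumes total_decoys is total_ligands - total_actives and counts an active as sampled when its adjusted rank is at most fraction_of_decoys * total_decoys, following the example given for fraction_of_decoys. The function name efficiency is hypothetical.

```python
def efficiency(total_actives, total_ligands, adjusted_active_ranks,
               fraction_of_decoys):
    """EFF = 2 * fa / (fa + fd) - 1, with fa and fd the sampled fractions.

    Hypothetical sketch of the documented formula.
    """
    total_decoys = total_ligands - total_actives  # assumption
    n_sampled = fraction_of_decoys * total_decoys
    n_found = sum(1 for rank in adjusted_active_ranks if rank <= n_sampled)
    try:
        frac_actives = n_found / total_actives
        return 2 * frac_actives / (frac_actives + fraction_of_decoys) - 1
    except ZeroDivisionError:
        return None


# 4 of 10 actives recovered within the first 10% of 1000 decoys.
eff = efficiency(10, 1010, [1, 1, 3, 20, 150], 0.1)
print(round(eff, 6))  # 0.6
```

When the sampled fraction of actives equals the sampled fraction of decoys, the ratio is 1/2 and EFF evaluates to 0, matching the interpretation given above.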

schrodinger.analysis.enrichment.metrics.calc_BEDROC(total_actives, total_ligands, active_ranks, alpha=20.0)

Boltzmann-enhanced Discrimination Receiver Operator Characteristic area under the curve (BEDROC). The value is bounded between 0 and 1, with 1 being ideal screen performance. The default alpha=20 weights the first ~8% of screen results. When alpha*Ra << 1, where Ra is the ratio of total actives to total ligands and alpha is the exponential prefactor, the BEDROC metric takes on a probabilistic meaning. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq 36.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • alpha (float) – Exponential prefactor for adjusting early enrichment emphasis. Larger values more heavily weight the early ranks. alpha = 20 weights the first ~8% of the screen, alpha = 10 weights the first ~10% of the screen, alpha = 50 weights the first ~3% of the screen results.

Returns

a tuple of two floats, the first represents the area under the curve for the Boltzmann-enhanced discrimination of ROC (BEDROC) analysis, the second is the alpha*Ra term.

Return type

(float, float)
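
As an illustration of how BEDROC relates to RIE, the sketch below rescales RIE between its minimum (all actives ranked last) and maximum (all actives ranked first) values. It assumes every active is ranked, and it is intended to mirror the published definition rather than reproduce this module's code. The names rie and bedroc are hypothetical.

```python
import math


def rie(total_actives, total_ligands, active_ranks, alpha=20.0):
    """Exponentially weighted rank sum divided by its random expectation."""
    n, big_n = total_actives, total_ligands
    observed = sum(math.exp(-alpha * r / big_n) for r in active_ranks) / n
    expected = (1 - math.exp(-alpha)) / (big_n * (math.exp(alpha / big_n) - 1))
    return observed / expected


def bedroc(total_actives, total_ligands, active_ranks, alpha=20.0):
    """Return (BEDROC, alpha * Ra); assumes all actives are ranked."""
    ra = total_actives / total_ligands
    rie_max = (1 - math.exp(-alpha * ra)) / (ra * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ra)) / (ra * (1 - math.exp(alpha)))
    value = rie(total_actives, total_ligands, active_ranks, alpha)
    return (value - rie_min) / (rie_max - rie_min), alpha * ra


best, alpha_ra = bedroc(5, 100, [1, 2, 3, 4, 5])
worst, _ = bedroc(5, 100, [96, 97, 98, 99, 100])
print(round(best, 6), round(worst, 6))
```

With this rescaling, the perfect ranking evaluates to approximately 1 and the worst ranking to approximately 0, which is the bounded behaviour described above.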

schrodinger.analysis.enrichment.metrics.calc_RIE(total_actives, total_ligands, active_ranks, alpha=20.0)

Robust Initial Enhancement (RIE). Active ranks are weighted with a continuously decreasing exponential term. Large positive RIE values indicate better screen performance. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq 18.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • alpha (float) – Exponential prefactor for adjusting early enrichment emphasis. Larger values more heavily weight the early ranks. alpha = 20 weights the first ~8% of the screen, alpha = 10 weights the first ~10% of the screen, alpha = 50 weights the first ~3% of the screen results.

Returns

a float for the Robust Initial Enhancement (RIE).

Return type

float
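
The exponential weighting can be sketched as the mean of exp(-alpha * rank / total_ligands) over the actives, divided by the value that mean would take if actives were spread uniformly over all ranks. The sketch assumes every active is ranked; the function name robust_initial_enhancement is hypothetical and this is not the module's source.

```python
import math


def robust_initial_enhancement(total_actives, total_ligands, active_ranks,
                               alpha=20.0):
    """Hypothetical RIE sketch; assumes all actives are ranked."""
    n, big_n = total_actives, total_ligands
    # Mean exponential weight of the observed active ranks.
    observed = sum(math.exp(-alpha * r / big_n) for r in active_ranks) / n
    # Mean exponential weight when every rank is equally likely.
    expected = (1 - math.exp(-alpha)) / (big_n * (math.exp(alpha / big_n) - 1))
    return observed / expected


early = robust_initial_enhancement(5, 100, [1, 2, 3, 4, 5])
late = robust_initial_enhancement(5, 100, [96, 97, 98, 99, 100])
print(early > 1.0 > late)  # True
```

Values above 1 indicate actives concentrated early in the ranking relative to a uniform spread; values below 1 indicate the opposite.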

schrodinger.analysis.enrichment.metrics.calc_AUAC(total_actives, total_ligands, total_ranked, active_ranks)

Area Under the Accumulation Curve (AUAC). The value is bounded between 0 and 1, with 1 being ideal screen performance. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq 8, except adjusted to a trapezoidal integration to decrease errors for small data sets.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • total_ranked (int) – The total number of ligands ranked by the virtual screen scoring metric.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

Returns

A float representation of the Area Under the Accumulation Curve.

Return type

float
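
A trapezoidal integration of the accumulation curve can be sketched as below. The sketch assumes every ligand is ranked (so total_ranked equals total_ligands and is omitted), which may differ from how this module treats unranked ligands. The function name auac_trapezoid is hypothetical.

```python
def auac_trapezoid(total_actives, total_ligands, active_ranks):
    """Trapezoidal area under the accumulation curve, normalized to [0, 1].

    Hypothetical sketch; assumes every ligand is ranked.
    """
    rank_set = set(active_ranks)
    area = 0.0
    found_prev = 0
    for rank in range(1, total_ligands + 1):
        found = found_prev + (1 if rank in rank_set else 0)
        # Trapezoid between consecutive ranks on the accumulation curve.
        area += (found_prev + found) / 2
        found_prev = found
    return area / (total_actives * total_ligands)


print(auac_trapezoid(2, 10, [1, 2]))   # 0.9
print(auac_trapezoid(2, 10, [9, 10]))  # 0.1
```

Under this sketch the best attainable value is 1 - Ra/2, where Ra is the ratio of actives to total ligands, so it approaches 1 only when actives are a small share of the screen.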

schrodinger.analysis.enrichment.metrics.calc_ROC(total_actives, total_ligands, adjusted_active_ranks)

Calculates a representation of the Receiver Operator Characteristic area underneath the curve. Typically interpreted as the probability an active will appear before an inactive. A value of 1.0 reflects ideal performance, a value of 0.5 reflects a performance on par with random selection. Calculated as described by Truchon and Bayly, J. Chem. Inf. Model. 2007, 47, 488-508, Eq A.8.

Classically ROC area is defined as:

       AUAC     Ra
ROC = ------ - -----
        Ri      2Ri

Where AUAC is the area under the accumulation curve, Ri is the ratio of inactives to total ligands, and Ra is the ratio of actives to total ligands.

A different method is used here in order to account for unranked actives - see PYTHON-3055 & PYTHON-3106

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • total_ligands (int) – The total number of ligands (actives and unknowns/decoys) used in the screen.

  • adjusted_active_ranks (list(int)) – Modified active ranks; each rank is improved by the number of preceding actives. For example, a screen result that placed three actives as the first three ranks, [1, 2, 3], has adjusted ranks of [1, 1, 1]. In this way, actives are not penalized by being outranked by other actives.

Returns

receiver operator characteristic area underneath the curve

Return type

float
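
The classical formula above and the probabilistic interpretation can be compared in a short sketch. It covers only the classical case in which every active is ranked, not the alternative method this function uses for unranked actives. Both function names are hypothetical, and the AUAC used here is the trapezoidal form.

```python
def roc_classic(total_actives, total_ligands, active_ranks):
    """ROC = AUAC / Ri - Ra / (2 * Ri), using a trapezoidal AUAC."""
    n, big_n = total_actives, total_ligands
    auac = 1 - sum(rank - 0.5 for rank in active_ranks) / (n * big_n)
    ra = n / big_n
    ri = 1 - ra
    return auac / ri - ra / (2 * ri)


def roc_pairwise(total_actives, total_ligands, active_ranks):
    """Fraction of (active, decoy) pairs in which the active ranks first."""
    total_decoys = total_ligands - total_actives
    decoys_ahead = sum(rank - i - 1
                       for i, rank in enumerate(sorted(active_ranks)))
    return 1 - decoys_ahead / (total_actives * total_decoys)


print(round(roc_classic(2, 10, [2, 5]), 6))   # 0.75
print(round(roc_pairwise(2, 10, [2, 5]), 6))  # 0.75
```

The two routes agree: with actives at ranks 2 and 5 among 10 ligands, 4 of the 16 active-decoy pairs have the decoy first, giving 0.75.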

schrodinger.analysis.enrichment.metrics.calc_HR(total_actives, active_ranks, n_sampled_set=50)

Calculates hit rate (HRn) – percentage of actives found in top n-ranked ligands.

Parameters
  • total_actives (int) – The total number of active ligands in the screen, ranked and unranked.

  • active_ranks (list(int)) – List of unadjusted integer ranks for the actives found in the screen. For example, a screen result that placed three actives as the first three ranks has active_ranks = [1, 2, 3].

  • n_sampled_set (int) – The number of ranked results for which to calculate the hit rate.

Returns

a tuple of two floats, the first represents the hit rate value, the second is the highest possible hit rate value (<100 when total_actives < n_sampled_set).

Return type

(float, float)
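
One reading of the two returned values is sketched below. It assumes the hit rate is the percentage of the top n_sampled_set ranks occupied by actives, which is inferred from the note that the maximum falls below 100 when total_actives < n_sampled_set; the module's exact definition may differ. The function name hit_rate is hypothetical.

```python
def hit_rate(total_actives, active_ranks, n_sampled_set=50):
    """Return (hit rate, highest possible hit rate), both in percent.

    Hypothetical sketch; the denominator choice is an assumption.
    """
    n_found = sum(1 for rank in active_ranks if rank <= n_sampled_set)
    rate = 100.0 * n_found / n_sampled_set
    # At most min(total_actives, n_sampled_set) actives fit in the sample.
    best = 100.0 * min(total_actives, n_sampled_set) / n_sampled_set
    return rate, best


print(hit_rate(20, [1, 3, 7, 40, 200], n_sampled_set=50))  # (8.0, 40.0)
```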