schrodinger.application.bioluminate.pose_filtering.hdx_io module

Utilities for reading and validating HDX-MS CSV files from different vendors.

# Code Vocabulary

Column definition: The column names for a specific HDX-MS CSV file format. Different vendors may use different column names for the same data.

# Scientific Vocabulary

HDX-MS: Hydrogen-Deuterium eXchange Mass Spectrometry. A technique that measures the exchange of hydrogen atoms with deuterium atoms in proteins.

Protein State: A unique state of the protein, e.g. bound or unbound. In practice, these can take arbitrary names that users designate as bound or unbound.

Time Slice: The amount of time that the proteins in the HDX-MS experiment are exposed to deuterium.

Uptake: The amount of deuterium that has been incorporated into the protein.

class schrodinger.application.bioluminate.pose_filtering.hdx_io.DynamXColumns(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.StrEnum

Expected column names for HDX-MS CSV files produced by the DynamX software. The actual CSV file may contain additional columns.

PROTEIN_STATE = 'State'
TIME_SLICE = 'Exposure'
START = 'Start'
END = 'End'
SEQUENCE = 'Sequence'
UPTAKE = 'Uptake'
MAX_UPTAKE = 'MaxUptake'
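
Because the members are `enum.StrEnum` values, each one compares equal to its header string and can be used wherever a plain column name is expected. The sketch below re-declares a stand-in copy of `DynamXColumns` so that it runs without the Schrödinger package; the header row is hypothetical, and the `StrEnum` fallback is only there for interpreters older than Python 3.11.

```python
try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:  # fallback for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal stand-in: members are also str instances."""


class DynamXColumns(StrEnum):
    # Stand-in copy of the documented members, not the real class.
    PROTEIN_STATE = 'State'
    TIME_SLICE = 'Exposure'
    START = 'Start'
    END = 'End'
    SEQUENCE = 'Sequence'
    UPTAKE = 'Uptake'
    MAX_UPTAKE = 'MaxUptake'


# Hypothetical header row; extra columns such as 'Protein' are allowed.
header = ['Protein', 'Start', 'End', 'Sequence', 'State', 'Exposure',
          'Uptake', 'MaxUptake']

# Members compare equal to plain strings, so `in` works on a list of str.
has_all_columns = all(col in header for col in DynamXColumns)

# .value gives the plain header string for each member.
expected_headers = [col.value for col in DynamXColumns]
```

Iterating over the enum yields the members in definition order, which makes it convenient to use the class itself as the list of required headers.
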
class schrodinger.application.bioluminate.pose_filtering.hdx_io.HDExaminerColumns(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.StrEnum

Expected column names for HDX-MS CSV files produced by the HDExaminer software. The actual CSV file may contain additional columns.

PROTEIN_STATE = 'Protein State'
TIME_SLICE = 'Deut Time (sec)'
START = 'Start'
END = 'End'
SEQUENCE = 'Sequence'
PERCENT_D = '%D'
class schrodinger.application.bioluminate.pose_filtering.hdx_io.StandardColumns(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.StrEnum

Column names that must be present in all HDX-MS datasets in order to perform analysis. Actual data may contain more columns.

  • START – The start index of the fragment.

  • END – The end index of the fragment.

  • SEQUENCE – The 1-letter residue code sequence of the fragment.

  • PERCENT_D – The percent deuterium uptake of the fragment.

  • CHAIN – The chain name(s) of the fragment in a given pose.

START = 'Start'
END = 'End'
SEQUENCE = 'Sequence'
PERCENT_D = '%D'
CHAIN = 'Chain'
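
As an illustration of how vendor data might be brought into this standard layout, the sketch below converts one DynamX-style row into the standard `Start`, `End`, `Sequence` and `%D` fields. It assumes that percent deuterium uptake is computed as 100 × Uptake / MaxUptake; this formula is an assumption made for the example, not something this documentation states. The helper `to_percent_d` and the row values are hypothetical, and the `Chain` column is left out because it depends on a pose.

```python
def to_percent_d(uptake: float, max_uptake: float) -> float:
    """Percent deuterium uptake, ASSUMING %D = 100 * Uptake / MaxUptake."""
    if max_uptake <= 0:
        raise ValueError("max_uptake must be positive")
    return 100.0 * uptake / max_uptake


# Hypothetical DynamX-style row, keyed by the DynamX header names.
dynamx_row = {
    'State': 'apo',
    'Exposure': 30.0,
    'Start': 5,
    'End': 14,
    'Sequence': 'AKLMNPQRST',
    'Uptake': 2.0,
    'MaxUptake': 8.0,
}

# Keep the shared columns and derive '%D' from the two uptake columns.
standard_row = {
    'Start': dynamx_row['Start'],
    'End': dynamx_row['End'],
    'Sequence': dynamx_row['Sequence'],
    '%D': to_percent_d(dynamx_row['Uptake'], dynamx_row['MaxUptake']),
}
```
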
class schrodinger.application.bioluminate.pose_filtering.hdx_io.SubstructureType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: enum.Enum

RECEPTOR = 'Receptor'
LIGAND = 'Ligand'
exception schrodinger.application.bioluminate.pose_filtering.hdx_io.InvalidSubstructureTypeError(substructure_type: schrodinger.application.bioluminate.pose_filtering.hdx_io.SubstructureType)

Bases: Exception

__init__(substructure_type: schrodinger.application.bioluminate.pose_filtering.hdx_io.SubstructureType)
exception schrodinger.application.bioluminate.pose_filtering.hdx_io.InvalidPropertyError(name: str, value: schrodinger.application.bioluminate.pose_filtering.hdx_io.T, valid_properties: set[T])

Bases: Exception

Base class for errors raised when an HDX-MS property is invalid.

__init__(name: str, value: schrodinger.application.bioluminate.pose_filtering.hdx_io.T, valid_properties: set[T])
exception schrodinger.application.bioluminate.pose_filtering.hdx_io.InvalidProteinStateError(protein_state: str, valid_states: set[str])

Bases: schrodinger.application.bioluminate.pose_filtering.hdx_io.InvalidPropertyError

__init__(protein_state: str, valid_states: set[str])
exception schrodinger.application.bioluminate.pose_filtering.hdx_io.InvalidTimeSliceError(time_slice: float, valid_slices: set[float])

Bases: schrodinger.application.bioluminate.pose_filtering.hdx_io.InvalidPropertyError

__init__(time_slice: float, valid_slices: set[float])
exception schrodinger.application.bioluminate.pose_filtering.hdx_io.MissingColumnsError(missing_columns: set[str])

Bases: Exception

Raised when a DataFrame is missing required columns.

__init__(missing_columns: set[str])
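
The exception signatures above suggest a small hierarchy in which the specific errors forward a property name, the offending value, and the set of valid values to `InvalidPropertyError`. The following is a minimal sketch of that pattern using stand-in classes. The message format, the stored attributes, and the helper `check_time_slice` are assumptions made for illustration and are not taken from the real module.

```python
class InvalidPropertyError(Exception):
    """Stand-in base class for invalid HDX-MS properties."""

    def __init__(self, name, value, valid_properties):
        # Assumed message format; the real wording is not documented.
        super().__init__(
            f"Invalid {name} {value!r}. "
            f"Valid values: {sorted(valid_properties)}")
        self.value = value
        self.valid_properties = valid_properties


class InvalidProteinStateError(InvalidPropertyError):
    def __init__(self, protein_state, valid_states):
        super().__init__('protein state', protein_state, valid_states)


class InvalidTimeSliceError(InvalidPropertyError):
    def __init__(self, time_slice, valid_slices):
        super().__init__('time slice', time_slice, valid_slices)


def check_time_slice(time_slice, valid_slices):
    """Hypothetical helper: raise if the time slice is not in the data."""
    if time_slice not in valid_slices:
        raise InvalidTimeSliceError(time_slice, valid_slices)


# Catching the base class handles both specific errors.
caught = None
try:
    check_time_slice(45.0, {30.0, 60.0})
except InvalidPropertyError as err:
    caught = err
```

Catching `InvalidPropertyError` lets calling code handle an unknown protein state and an unknown time slice in one place, while the subclasses remain available for more specific handling.
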
class schrodinger.application.bioluminate.pose_filtering.hdx_io.HdxMsProperties(protein_states_recep: set[str], protein_states_lig: set[str], time_slices: set[float])

Bases: NamedTuple

Properties of an HDX-MS dataset.

Variables
  • protein_states_recep – All unique protein states for the receptor.

  • protein_states_lig – All unique protein states for the ligand. May be empty if no ligand data was supplied.

  • time_slices – Unique time slices present in both the receptor and ligand datasets.

protein_states_recep: set[str]

Alias for field number 0

protein_states_lig: set[str]

Alias for field number 1

time_slices: set[float]

Alias for field number 2
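
Since `HdxMsProperties` is a `NamedTuple`, its fields can be read by name, by index, or by unpacking. The sketch below uses a stand-in class with the same three fields and hypothetical state names and exposure times; taking the set intersection mirrors the description of `time_slices` as the values present in both datasets.

```python
from typing import NamedTuple


class HdxMsProperties(NamedTuple):
    # Stand-in with the documented fields, not the real class.
    protein_states_recep: set[str]
    protein_states_lig: set[str]
    time_slices: set[float]


# Hypothetical exposure times found in each CSV file.
recep_slices = {30.0, 60.0, 600.0}
lig_slices = {60.0, 600.0, 3600.0}

props = HdxMsProperties(
    protein_states_recep={'apo', 'complex'},
    protein_states_lig={'apo', 'complex'},
    # Only time slices present in both datasets are kept.
    time_slices=recep_slices & lig_slices,
)

# Fields are available by name, by position, or by unpacking.
states_recep, states_lig, slices = props
```
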

schrodinger.application.bioluminate.pose_filtering.hdx_io.get_hdx_ms_properties(fname_recep: str, fname_lig: str | None = None) → schrodinger.application.bioluminate.pose_filtering.hdx_io.HdxMsProperties

Convenience function for getting the properties of an HDX-MS dataset as an HdxMsProperties tuple.

schrodinger.application.bioluminate.pose_filtering.hdx_io.get_hdx_adapter(fname_recep: str, fname_lig: str | None = None) → schrodinger.application.bioluminate.pose_filtering.hdx_io._HdxAdapter

Return the appropriate HDX adapter for the given input files.

Parameters
  • fname_recep – The filename of the receptor HDX-MS CSV file.

  • fname_lig – The filename of the ligand HDX-MS CSV file. May be None.

schrodinger.application.bioluminate.pose_filtering.hdx_io.get_missing_columns(df: pandas.core.frame.DataFrame, column_def: enum.StrEnum) → set[enum.StrEnum]

Return any column headers from the supplied column definition that are missing from the DataFrame.

Parameters

df – A DataFrame containing unprocessed HDX-MS CSV data.
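
The check described above amounts to a set difference between the required headers and the headers actually present. The sketch below shows the idea with the standard library only: it reads the header row with `csv` rather than building a pandas DataFrame, so `missing_columns` is a hypothetical stand-in that takes a list of header strings and is not the real `get_missing_columns`. The `HDExaminerColumns` class is a stand-in copy of the documented members, and the CSV text is invented for the example.

```python
import csv
import io

try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:  # fallback for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal stand-in: members are also str instances."""


class HDExaminerColumns(StrEnum):
    # Stand-in copy of the documented members, not the real class.
    PROTEIN_STATE = 'Protein State'
    TIME_SLICE = 'Deut Time (sec)'
    START = 'Start'
    END = 'End'
    SEQUENCE = 'Sequence'
    PERCENT_D = '%D'


def missing_columns(headers, column_def):
    """Hypothetical stand-in: members of column_def absent from headers."""
    present = set(headers)
    return {col for col in column_def if col.value not in present}


# Invented CSV content that lacks the '%D' column.
csv_text = (
    "Protein State,Deut Time (sec),Start,End,Sequence\n"
    "apo,30,1,8,MKTAYIAK\n"
)
headers = next(csv.reader(io.StringIO(csv_text)))

# PERCENT_D is the only member missing from this header row.
missing = missing_columns(headers, HDExaminerColumns)
```

An empty result means every required column is present; a non-empty result is the kind of set that `MissingColumnsError` accepts as `missing_columns`.
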