1 Introduction

Peptide amide hydrogen/deuterium exchange mass spectrometry (DXMS) is an increasingly important tool for the study of protein dynamics and structure [1–4]. Amide hydrogens in proteins exchange with hydrogen atoms from solvent water molecules surrounding the protein with exchange rates that depend on the details of the protein’s structure and thermodynamic stability. In general, amide hydrogens that only exchange as a result of large-scale unfolding of the protein exchange the slowest, while amide hydrogens in loop regions or within thermodynamically less stable regions of the protein tend to exchange faster. Beyond such generalizations, interpretation of exchange data in terms of protein structure remains a great challenge for the field. DXMS is an exceptionally powerful approach for examining the exchange behavior of proteins, and data can be obtained on otherwise problematic proteins that are too large, limited in quantity, or of limited solubility to be purified or crystallized in structurally competent form. Thus hydrogen exchange data are frequently used to support models proposed for protein structure in the absence of high-resolution crystallographic or NMR determinations.

The approaches taken for representing and interpreting experimental exchange data in terms of structural models have been largely qualitative. Results of such analyses are often represented as two-dimensional (2-D) “butterfly,” “mirror,” or colored “heat maps,” where the deuteration level of peptides is represented by varying colors and arranged in register with the intact protein’s primary amino acid sequence. If a high-resolution 3-D structure for a protein or a protein homolog is available, the exchange data can be superimposed on such structures, often by differential coloring. Recent improvements in the comprehensiveness and resolution of DXMS experimentation [5, 6] provide further impetus for the development of improved methods for applying experimental data to the testing of protein structure models [7–12].

We have developed a computational method (DXCOREX) by which amide hydrogen exchange rates for a crystallographically or NMR-solved protein structure, or a hypothesized protein structure, can be predicted and then transformed into a data set that precisely corresponds with the probe peptides and on-exchange times of the available experimental data. Correlation analysis of the experimental data for a protein and DXCOREX calculated datasets for a 3-D structure proposed for the protein allows a quantitative evaluation of the degree to which the experimental data supports the proposed structural model.

In this work we have found that virtual DXMS data sets predicted by DXCOREX analysis of the known 3-D structures of 13 proteins are highly correlated with the results of experimental DXMS analysis of these same proteins, despite the data acquisition having been performed by multiple investigators employing varying exchange data acquisition methodologies. We then demonstrate the use of this method in the quantitative assessment of the accuracy of a recently proposed 3-D structure of a protein for which exchange data was available.

2 Methods

2.1 DXCOREX - Introduction

The COREX algorithm [13–15] allows efficient calculation of the in-solution dynamic behavior of proteins whose structure has been previously determined as a static representation, either crystallographically or by NMR experimentation. COREX calculates residue-specific stability values for each amino acid in a protein from knowledge of the structural coordinates of the atoms in the protein’s deduced 3-D structure, and a partition function by which the relative energetic cost of exposing the various atoms in the structured protein to solvent water can be efficiently calculated. At its inception, a provisional peptide amide hydrogen exchange rate calculating capability, based on the residue-specific stability values, was incorporated into COREX, to allow experimental estimation of the accuracy with which the algorithm calculates the residue-specific thermodynamic stability parameters. Algorithm-calculated residue-specific exchange rates were calculated for several structurally solved proteins with available NMR-obtained residue-specific exchange rate data. It was found that the COREX-calculated rates were in general agreement (approximately 75% agreement) with the available NMR data for the studied proteins [13, 16], though it was recognized that the NMR exchange rate data sets available were incomplete, most particularly in the measurements of the more rapidly exchanging amides in proteins. With this validation in place, the COREX algorithm was subsequently applied to successfully describe protein features, such as cold denaturation [17, 18], allosteric binding effects [19, 20], pH-linked structural transitions [21], and energetic profiling of protein folds [22].

These COREX calculations were based on the readily available coordinates of the heavy atoms typically represented in published structures (that exclude representation of hydrogen) and the competency for exchange of an amide hydrogen was estimated by measuring the solvent exposure of the entire amide nitrogen component of each peptide amide in the structure. This approximation was accurate enough to allow validation of the COREX algorithm against multiple proteins [13, 16]. We have found that the accuracy of exchange rate calculation is improved if amide hydrogens are taken into account in the calculation. In the present investigation, COREX was modified to allow calculation of the exchange-competent solvent exposure of explicitly represented amide hydrogens versus amide nitrogen (termed H-COREX). This modest change resulted in higher accuracy in calculating exchange rates than those obtained with unmodified COREX when evaluated against NMR-derived rate measurements (see Supplemental Material, and Supplemental Material Figures S1 and S2).

2.2 COREX Microstate Ensemble Generation

The native states of proteins are conformationally heterogeneous, owing to both small and large scale conformational fluctuations [23–26]. COREX treats a protein as a statistical thermodynamic ensemble of myriad conformational microstates. By using a crystal, NMR, or hypothetically-derived 3-D structure as a template, an ensemble of partially unfolded microstates is generated by treating contiguous groups of residues (folding units) as being either folded and native-like or as being unfolded with the same thermodynamic properties as isolated and unstructured peptides [13–15, 27, 28]. By combinatorial unfolding of a set of predefined folding units and by systematically varying the boundaries of the folding units throughout the protein structure simultaneously, a large number of conformational states are generated. In the present analysis, a Monte Carlo sampling method was used to preferentially select low energy states at the expense of high-energy states in order to decrease computational demands. The probability of each state can be represented by

$$ P_i = \frac{e^{-\Delta G_i / RT}}{\sum\limits_{j = 0}^{N} e^{-\Delta G_j / RT}} $$
(1)

where the numerator is the statistical weight of microstate i, the denominator is the partition function for the system, and ΔG_i is the Gibbs energy of a microstate relative to the native conformation.
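As a minimal sketch of Equation 1, the probabilities can be obtained by normalizing Boltzmann weights over all sampled microstates. The function name, the example energies, and the choice of cal/mol as the energy unit (with R = 1.987 cal mol⁻¹ K⁻¹) are assumptions made for illustration; they are not taken from the COREX source code.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K); assumes energies are given in cal/mol


def microstate_probabilities(delta_g, temperature=298.15):
    """Return the Boltzmann probability of each microstate (Equation 1).

    delta_g: relative Gibbs energies of the microstates, in cal/mol.
    """
    rt = R_CAL * temperature
    # Shifting by the minimum energy avoids overflow; the shift cancels in the ratio.
    g_min = min(delta_g)
    weights = [math.exp(-(g - g_min) / rt) for g in delta_g]
    partition = sum(weights)  # partition function of the (shifted) ensemble
    return [w / partition for w in weights]


# Hypothetical three-state ensemble: native state plus two partially unfolded states.
probs = microstate_probabilities([0.0, 1000.0, 3000.0])
```

The lowest-energy state receives the largest probability, and the probabilities sum to one by construction.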

2.3 Calculation of Microstate Energy

In COREX, the Gibbs free energy for each microstate, ΔG_i, relative to the native conformation is calculated from the contributions of apolar and polar solvation and the conformational entropy:

$$ \Delta {G_i} = \Delta {G_{{apol,i}}} + \Delta {G_{{pol,i}}} - W \times T\Delta {S_{{conf,i}}} $$
(2)

COREX uses accessible surface area based parameterizations that have been previously validated to calculate the relative apolar and polar free energies of each microstate [13, 14, 27–32].

$$ \Delta {G_{{apol,i}}}(T) = - 8.44 \times \Delta AS{A_{{apol,i}}} + 0.45 \times \Delta AS{A_{{apol,i}}} \times (T - 333) - T \times (0.45 \times \Delta AS{A_{{apol,i}}} \times \ln (\frac{T}{{385}})) $$
(3)
$$ \Delta {G_{{pol,i}}}(T) = 31.44 \times \Delta AS{A_{{pol,i}}} - 0.26 \times \Delta AS{A_{{pol,i}}} \times (T - 333) - T \times ( - 0.26 \times \Delta AS{A_{{pol,i}}} \times \ln (\frac{T}{{335}})) $$
(4)

The conformational entropy of each state, ΔS_conf,i, is evaluated by explicitly considering backbone and side-chain contributions. W is the entropy weighting factor [13, 14, 31]. For each study protein, the parameters used for calculations are indicated in Supplemental Material, Table S1, including window size, number of microstates, entropy weighting factor, pH, and temperature.
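Equations 2 through 4 can be sketched directly as functions of the surface-area changes and temperature. The coefficients below are copied from the equations above; the function names and the unit conventions (ΔASA in Å², T in kelvin, energies in cal/mol) are assumptions for illustration, since the text does not state them explicitly.

```python
import math


def delta_g_apolar(dasa_apol, t):
    """Apolar solvation free energy of a microstate (Equation 3)."""
    return (-8.44 * dasa_apol
            + 0.45 * dasa_apol * (t - 333.0)
            - t * (0.45 * dasa_apol * math.log(t / 385.0)))


def delta_g_polar(dasa_pol, t):
    """Polar solvation free energy of a microstate (Equation 4)."""
    return (31.44 * dasa_pol
            - 0.26 * dasa_pol * (t - 333.0)
            - t * (-0.26 * dasa_pol * math.log(t / 335.0)))


def delta_g_microstate(dasa_apol, dasa_pol, ds_conf, t, w=1.0):
    """Total relative Gibbs energy of a microstate (Equation 2).

    ds_conf is the conformational entropy change; w is the entropy
    weighting factor W (the default of 1.0 is a placeholder, not a
    value from the paper).
    """
    return (delta_g_apolar(dasa_apol, t)
            + delta_g_polar(dasa_pol, t)
            - w * t * ds_conf)
```

Both solvation terms are linear in the corresponding ΔASA, so a microstate that exposes no additional surface and gains no conformational entropy has a relative energy of zero, as expected for the native reference state.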

2.4 Calculation of Accessible Surface Area (ASA)

ASA is calculated as the locus of the center of the probe sphere, a water molecule, as it rolls over the van der Waals surface of the protein [33]. In COREX, the apolar and polar ASA of the native conformation are first determined and then used as a reference. For the other microstates in the ensemble, the apolar and polar ASA from folded parts of the protein are treated in the same way as for the native conformation, and ASA from unfolded parts of the protein is calculated by treating unfolded peptides as totally unstructured. Thus, the changes of accessible surface area of a microstate, ΔASA_apol,i and ΔASA_pol,i, caused by unfolding can be determined.

2.5 Calculation of Exchange Rates from Structures Decorated with Explicitly-Placed Hydrogens (H-COREX)

In the present study, amide hydrogens are added with their appropriate coordinates using WHATIF, implemented at http://swift.cmbi.ru.nl/servers/html/index.html [34]. Assuming that only solvent-exposed amide hydrogens can exchange with hydrogens in solvent, the protection factor of any given residue j can be defined as the ratio of the sum of the probabilities of the microstates in which residue j is buried, to the sum of the probabilities of the microstates in which residue j is exposed to solvent. Usually, a residue is exposed to solvent when it is in the unfolded condition, while it is solvent inaccessible when it is in the folded condition. However, there are two exceptions to this: when the amide hydrogen of residue j is solvent accessible in the native state, and when residue j is located in the portions of the protein that remain folded, but is exposed to solvent due to the partial unfolding. If we use P_f,ex,j to indicate the sum of the probabilities of the microstates covered by these two exceptions, we can calculate the protection factor using

$$ PF_j = \frac{\left( \sum\limits_{i = 1}^{N_{j,folded}} P_i \right) - P_{f,ex,j}}{\left( \sum\limits_{i = 1}^{N_{j,unfolded}} P_i \right) + P_{f,ex,j}} $$
(5)

And then the hydrogen exchange rate can be estimated by

$$ {k_{{ex,j}}} = \frac{{{k_{{{\rm int}, j}}}}}{{P{F_j}}} $$
(6)

where k_int,j is the exchange rate of residue j when it is in a fully unfolded state. These values are calculated based on studies of model peptides [23, 35, 36], performed with the use of a spreadsheet available at http://hx2.med.upenn.edu/download.html.
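The bookkeeping behind Equations 5 and 6 can be illustrated with a small sketch. The data layout is hypothetical: each microstate is represented as a tuple of its probability, the set of residues that are folded in that state, and the subset of those folded residues whose amide hydrogen is nevertheless solvent exposed. This is not the representation used inside H-COREX.

```python
def protection_factor(states, j):
    """Protection factor of residue j (Equation 5).

    states: iterable of (probability, folded_residues, exposed_while_folded).
    """
    p_folded = sum(p for p, folded, _ in states if j in folded)
    p_unfolded = sum(p for p, folded, _ in states if j not in folded)
    # States in which residue j is folded but still solvent exposed (P_f,ex,j).
    p_f_ex = sum(p for p, folded, exposed in states
                 if j in folded and j in exposed)
    # Raises ZeroDivisionError if residue j is never exposed in the sampled ensemble.
    return (p_folded - p_f_ex) / (p_unfolded + p_f_ex)


def exchange_rate(k_int, pf):
    """Predicted exchange rate from the intrinsic rate (Equation 6)."""
    return k_int / pf


# Hypothetical three-state ensemble over residues 1-3.
states = [
    (0.90, {1, 2, 3}, set()),  # native state, all amides buried
    (0.09, {1, 2}, {2}),       # residue 3 unfolded; residue 2 folded but exposed
    (0.01, set(), set()),      # fully unfolded
]
pf_residue_1 = protection_factor(states, 1)
```

In this toy ensemble residue 1 is exposed only in the fully unfolded state, so it is far more protected than residues 2 and 3.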

2.6 Calculation of Predicted DXMS Data Sets

H-COREX-calculated hydrogen exchange rates are transformed to predicted DXMS data sets (deuterons incorporated per peptide) that match the resolution (specific peptide probes, specific on-exchange durations) of available experimental data sets. In each published experiment examined in the present study, the methods of data acquisition and analysis vary, as does the extent of peptide probe coverage. Deuterium losses of the probe peptides after exchange-quench are corrected in the publications by use of either an average loss factor or by peptide-specific loss factors measured in the studies, and therefore the corresponding DXCOREX-calculated deuteron incorporation per peptide requires no correction for losses in the present study. The precise manner in which experimental DXMS data were extracted from these publications for comparison with DXCOREX-predicted data sets, and the manner in which corrections for deuterium losses were performed in the publications, are described for each protein in Supplemental Material, Proteins Studied. For several proteins, published DXMS analysis had been performed on complexed forms of the proteins, but with data reported for only one member of the binding pair. For these studies, H-COREX calculations were performed on the 3-D structure of the entire protein-binding partner complex, and then predicted data for the target protein probe peptides were compared with the experimental data available for it. As the first two residues of each peptide are rapidly exchanging and do not substantially retain deuterons during LC-MS [23, 35, 36], the quantity of deuterium on any peptide probe fragment f, which covers residues from m to n, at time point t can be calculated using

$$ {D_{{f,t}}} = \sum\limits_{{i = m + 2}}^n {(1 - {e^{{ - {k_{{ex,i}}}t}}})} . $$
(7)

where k_ex,i is the exchange rate estimated by H-COREX for residue i in the structured protein.
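Equation 7 reduces to a short summation over the residues of a peptide, skipping its first two positions. The sketch below assumes the per-residue rates are held in a dictionary keyed by residue number, with rates in s⁻¹ and time in seconds; the function name and the example values are illustrative only.

```python
import math


def peptide_deuteration(k_ex, m, n, t):
    """Predicted deuterons on a peptide covering residues m..n (Equation 7).

    k_ex: dict mapping residue number to exchange rate (1/s).
    t: on-exchange time (s).
    The first two residues (m and m + 1) are excluded, as in the text.
    """
    return sum(1.0 - math.exp(-k_ex[i] * t) for i in range(m + 2, n + 1))


# Hypothetical peptide spanning residues 10-15 with a uniform rate of 1 per second.
rates = {i: 1.0 for i in range(10, 16)}
d_half = peptide_deuteration(rates, 10, 15, math.log(2.0))
```

With a uniform rate, each of the four counted amides is half exchanged at t = ln 2 / k, so the peptide carries two deuterons at that time and approaches four at long times.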

2.7 Data Processing

COREX and H-COREX were implemented in C and run on a personal computer using the Linux operating system [13, 15, 27, 28]. Because the computational demands of the algorithm increase exponentially with the size of the protein, a Monte Carlo sampling strategy was used [20]. For a given protein, the COREX/H-COREX window size and number of microstates used were adjusted to allow completion of computations within 6 days, and are presented in Supplemental Material, Table S1. For each protein, the experimentally-measured number of deuterons incorporated into each probe peptide at each on-exchange time was compared with the DXCOREX-predicted deuteron incorporation into the same peptide at the same on-exchange time. Aggregate results for all peptide probes at all on-exchange times were then plotted as DXCOREX-calculated deuteron incorporation per peptide (y axis) versus experimentally-determined deuteron incorporation per peptide (x axis), and the Pearson correlation coefficient for the ensemble was calculated employing the appropriate functionality in Microsoft Excel.
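The study computed the correlation in Microsoft Excel; purely as an illustrative equivalent, the Pearson coefficient between paired experimental and calculated values can be written in a few lines. The function name and the example numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical deuterons per peptide: experimental (x) versus calculated (y).
experimental = [1.2, 3.4, 5.1, 2.0, 4.4]
calculated = [1.0, 3.9, 4.8, 2.5, 4.0]
r = pearson_r(experimental, calculated)
```

Each pair corresponds to one probe peptide at one on-exchange time, so a single coefficient summarizes the whole data set for a protein.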

2.8 Target Protein Selection

Proteins were selected on the basis of the availability of both published DXMS data and high-resolution X-ray crystallographically-determined structures as described in each of the publications cited below and summarized in Table 1: IκBα; dimerization/docking domain of type II isoform of protein kinase A; α-actinin calponin homology (CH2) domain; catalytic subunit of PKA with an E230Q mutation; a high-affinity human growth hormone variant, either free or bound to the extracellular domain of its receptor; mitogen-activated protein kinase p38; DNA/RNA repair enzyme AlkB; free peroxisome proliferator-activated receptor gamma ligand binding domain (PPARγ LBD); rosiglitazone-bound PPARγ LBD; cerezyme; superoxide dismutase A4V mutant and wild-type superoxide dismutase. DXCOREX analysis was also performed on a recently proposed 3-D homology model for the catalytic domain of group VIA-2 Ca2+-independent phospholipase A2 that is without an experimentally determined high-resolution structure, and results were compared with the available DXMS data for this protein. Coordinates for this hypothesized structure were provided by Dr. Yuan-Hao Hsu, Department of Chemistry and Biochemistry, University of California, San Diego, and are available upon request.

Table 1 Study Proteins

3 Results

3.1 Hydrogen-Exchange Behavior Predicted by DXCOREX Analysis of High-Resolution 3-D Protein Structures Is Highly Correlated with Experimental DXMS Data for the Proteins

DXCOREX was used to generate predicted DXMS data sets from the structural coordinates of 13 proteins for which published DXMS data were available. Exchange rates were calculated for each peptide amide in each protein by application of H-COREX to the appropriate 3-D structure, and predicted data sets were then constructed by calculating the deuteron incorporation per peptide for all peptide probes, at all on-exchange times interrogated in each of the experimental studies (see Supplemental Material, Proteins Studied, for detailed descriptions of calculations performed with each study protein). For several proteins, exchange analysis had been performed on complexed forms of the proteins, but with data reported for only one member of the binding pair. For these studies, H-COREX calculations were performed on the 3-D structure of the entire protein-binding partner complex, and the predicted deuteron incorporation for the target protein probe peptides was then compared with the experimental DXMS data available for the target. For each protein, the experimentally-measured number of deuterons incorporated into each probe peptide, at each on-exchange time, was compared with the DXCOREX-predicted deuteron incorporation into the same peptide at the same on-exchange time. Aggregate results for all peptide probes at all on-exchange times were then plotted as DXCOREX-calculated deuteron incorporation per peptide (y axis) versus experimentally-determined deuteron incorporation per peptide (x axis), and the Pearson correlation coefficient was calculated. We also prepared figures that allowed visual comparison of DXMS experimental and DXCOREX-calculated peptide exchange behavior, emulating the formats employed in the publications that had originally presented the experimental results.

3.2 IκBα

IκBα, an inhibitor of nuclear factor κB (NFκB) belonging to the ankyrin repeat family, was studied by MALDI-based DXMS both free and bound to NFκB [37]. H-COREX calculations were performed on the IκBα-NFκB complex, as the structure of free IκBα has not been reported [37, 38]. Figure 1A is adapted from the reference publication [37], where the IκBα structure is decorated with DXMS experimental data for nine peptides followed in the study. Figure 1B shows the structure decorated with DXCOREX-predicted data calculated for the same nine peptides, color-coded as in the publication. The DXCOREX and experimental data are seen to be remarkably similar in pattern and magnitude in this 3-D representation. The percent of maximal deuteron incorporation of 17 peptide fragments of IκBα following 2 min of on-exchange was presented in that publication, and the DXCOREX-calculated percent of maximal deuteron incorporation for the same peptides was compared with the experimental values (Figure 1C). DXCOREX and experimental data for the 17 peptide fragments were highly correlated, with a correlation coefficient of 0.92.

Figure 1

IκBα. Experimentally-determined (A) and DXCOREX-calculated (B) percent deuteration incorporation for peptide probes from IκBα following 2 min on-exchange. The deuteration levels are shown by different colors from dark blue (<10% deuteration) to dark red (>90% deuteration). (C) Pearson correlation between the experimental DXMS data set and DXCOREX-calculated data set. (A) Adapted from Figure 5b, pg. 18954 of the reference publication [37], where the IκBα structure is decorated with experimental data for nine peptides. Copyright © 2011 National Academy of Sciences. (B) Shows the 3-D structure of IκBα, decorated with DXCOREX-predicted data for the same nine peptides color-coded as in the publication. A tenth N-terminal peptide (66–80) is not shown, as its residues were missing from the crystallographic structure, and not amenable to DXCOREX determination. N = N-terminal; C = C-terminal

3.3 Dimerization/Docking Domain of Type II Isoform of Protein Kinase A (D/D RIIα PKA)

D/D RIIα PKA was investigated with a liquid chromatography-electrospray ionization mass spectrometry-based experimental approach [39]. As the D/D domain is known to dimerize at high affinity under the conditions employed in this DXMS study, H-COREX calculations were performed on the D/D domain dimer structure [40]. In the published experiments, results from 11 proteolytic fragments, covering all 48 residues in the protein were reported. The exchange behavior for these 11 fragments at 10 on-exchange durations was shown as a colored ribbon representation in that publication. Due to missing residues in the structure, only nine peptides were subjected to DXCOREX calculation. The DXCOREX-predicted percent of maximal deuteron incorporation (Figure 2A) for these peptides at these on-exchange times was remarkably similar to the experimental data (Figure 2B) in the ribbon diagram representation, and the two datasets were highly correlated (Figure 2C), with a correlation coefficient of 0.96.

Figure 2

D/D domain of type II isoform of PKA. DXCOREX-calculated (A) and DXMS-experimental (B) percent deuteration incorporation for D/D RIIα PKA peptide probes. The deuteration levels at each time-point (10 s–300,000 s) are shown by colors from blue (<10% deuteration) to red (>90% deuteration). (C) Pearson correlation between the DXMS experimental data set and DXCOREX-predicted data set. (B) Adapted from Figure 3 of reference [39]

3.4 Mitogen-Activated Protein Kinase p38

For the mitogen-activated protein kinase p38, the exchange behavior of 36 peptide probes followed in a DXMS study [41] was highly correlated (R² = 0.86) with the hydrogen exchange behavior calculated for these peptides by DXCOREX analysis of the protein’s crystal structure [42] (see Figure 3A).

Figure 3

(A) Mitogen-activated protein kinase p38 (p38). Pearson correlation between the DXMS measured deuteron incorporation of 36 peptide probes and the deuteron incorporation calculated for these peptides by DXCOREX analysis of the protein’s crystal structure (PDB ID: 1P38) [42]. (B) DNA/RNA repair enzyme AlkB (AlkB). Pearson correlation between the DXMS measured deuteron incorporation of 14 peptide probes and the deuteron incorporation calculated for these peptides by DXCOREX analysis of the protein’s crystal structure (PDB ID: 2FD8) [43]

3.5 AlkB

For the DNA/RNA repair enzyme AlkB, the exchange behavior of 17 peptide probes followed in a DXMS study [43] was highly correlated (R² = 0.81) with the behavior calculated for these peptides by DXCOREX analysis of the protein’s crystal structure [43] (see Figure 3B).

3.6 High-affinity Human Growth Hormone Variant (hGHv)

DXCOREX analysis was performed on two high-resolution, but internally incomplete, 3-D structures for hGHv, either free or bound to the extracellular domain of its receptor, for which experimental data had been published [44–46]. H-COREX analysis proceeded as if the unassigned internal gaps did not exist in the protein’s primary sequence, and DXMS probe peptides spanning structure-determination gaps were excluded from subsequent correlation analysis. The correlation coefficients of DXMS experimental data versus the corresponding DXCOREX-predicted data sets were 0.92 (free) and 0.76 (bound) (Figure 4A and B). The difficulty that H-COREX had with internal structural gaps may have contributed to the substantial decrement in correlation between DXCOREX-calculated and DXMS experimental data in bound versus free hGHv. For this reason, all other analyses were performed only on proteins with complete internal 3-D structure determinations (see Supplemental Material, Proteins Studied).

Figure 4

Free and hGHbp bound hGHv. Pearson correlation analysis between the DXMS experimental deuteron incorporation of the peptide probes (14 peptide probes in free hGHv; 13 peptide probes in hGHbp bound hGHv) and the deuteron incorporation calculated for these peptides by DXCOREX analysis of the proteins’ crystal structures. (A) Free hGHv and (B) hGHbp bound hGHv

3.7 Catalytic Subunit of PKA E230Q Mutation

In studies of the catalytic subunit of PKA with an E230Q mutation [47], in which the deuteron incorporation of 76 proteolytic fragments at 6 on-exchange durations was reported [48], the correlation coefficient was 0.75 (Supplemental Material, Figure S3).

3.8 α-Actinin CH2 Domain

For an α-actinin CH2 domain study [49] reporting the exchange behavior of 16 proteolytic fragments of this protein [50], the correlation coefficient versus DXCOREX-predicted data was 0.82 (Supplemental Material, Figure S4).

3.9 Bound PPARγ LBD

For the rosiglitazone-PPARγ LBD complex, the exchange behavior of 23 peptide probes followed in a DXMS study [51] was highly correlated (R² = 0.80) with the behavior calculated for these peptides by DXCOREX analysis of the protein’s crystal structure [52] (see Supplemental Material, Figure S5).

3.10 Free-PPARγ LBD

For the peroxisome proliferator-activated receptor γ ligand binding domain (PPARγ LBD), the exchange behavior of 23 peptide probes followed in a DXMS study [51] was correlated (R² = 0.53) with the behavior calculated for these peptides by DXCOREX analysis of the protein’s crystal structure [52].

3.11 Cerezyme

For cerezyme, the exchange behavior of 31 peptide probes followed in the DXMS study at multiple on-exchange time points (30 s–3000 s), and at two different pH conditions [53], was correlated (R² = 0.54 at pH 6.1, and R² = 0.56 at pH 8.8) with the results of DXCOREX analysis of the protein’s crystal structure [54] (see Supplemental Material, Figure S6).

3.12 A4V Mutant of Superoxide Dismutase (A4V SOD), Wild-Type Superoxide Dismutase (WT SOD)

For A4V SOD and WT SOD, the exchange behavior of eleven peptide probes followed in the proteins after on-exchange for four differing durations in the DXMS study [55] was highly correlated (A4V SOD, R² = 0.83; WT SOD, R² = 0.71) with the behavior calculated for these peptides by DXCOREX analysis of the proteins’ crystal structures [56] (see Figure 5A and B, respectively).

Figure 5

SOD A4V and wild type SOD. Pearson correlation analysis between the DXMS experimental deuteron incorporation of 11 peptide fragments and the deuteron incorporation calculated for these peptides by DXCOREX analysis of the proteins’ crystal structures. (A) A4V SOD using conventional loss rates, (B) WT SOD using conventional loss rates, (C) A4V SOD using A4V-optimized loss rates, and (D) WT SOD using A4V-optimized loss rates

3.13 Correlation Is Improved Between DXMS Data and DXCOREX Calculations When Optimized Amide-Specific Deuterium Loss Factors Are Employed

The method used in DXCOREX for calculating deuteron losses occurring after exchange-quench is based on an algorithm established several years ago employing NMR measurements of model dipeptides [23, 35, 36]. We reasoned that some of the variance between the experimental and DXCOREX results might be due to inaccuracies arising from the application of these NMR-based rates to the rather different conditions of sample processing, and gradient LC-MS analysis that are peculiar to DXMS analysis. The experimental data for the two SOD proteins noted above, which had been acquired on the same apparatus, employing identical sample processing and analytical conditions, offered an opportunity to assess this possibility. The number of deuterons on any peptide probe fragment f, which covers residues from m to n at time point t can be calculated using

$$ {D_{{f,t\_theo}}} = \sum\limits_{{i = m + 2}}^n {\frac{{(1 - {e^{{ - {k_{{on\_ex,i}}}t}}}){e^{{ - {K_{{back\_ex,i}}}T}}}}}{{{F_{{avg,f}}}}}} $$
(8)

where

$$ {F_{{avg,f}}} = \frac{{\sum\limits_{{i = m + 2}}^n {{e^{{ - {K_{{back\_ex,i}}}T}}}} }}{{n - m + 1}} $$
(9)

k_on_ex,i is the on-exchange rate calculated by DXCOREX. K_back_ex,i is the amide-specific back-exchange rate.

Nonlinear optimizations were performed using MATLAB to obtain an optimized set of amide-specific back-exchange rates, K_back_ex,i, that minimized the difference between D_f,t_theo and D_f,t_exp for A4V SOD. The correlation graph of A4V SOD experimental deuteron incorporation versus DXCOREX-calculated incorporation using the A4V SOD-optimized loss rates is shown in Figure 5C, where the correlation had improved to R² = 0.92.

These A4V SOD-optimized loss rates were then used in the DXCOREX calculations for the other protein, WT SOD, and the result was that the correlation improved to R² = 0.86 (Figure 5D). These results suggest that preliminary analysis of a reference protein can be used to identify amide-specific loss rates that more accurately reflect the actual losses occurring in a particular DXMS experimental configuration, in effect “calibrating” DXCOREX calculations for other proteins analyzed with the same experimental system and method.

3.14 Quantitative Assessment of the Accuracy of a Structural Model Proposed for the Catalytic Domain of Group VIA-2 Ca2+-Independent Phospholipase A2 (GVIA-2 iPLA2)

As an example of the use of this form of analysis to quantitatively assess protein structural models, we performed H-COREX calculations on a recently proposed 3-D homology model for GVIA-2 iPLA2 that is presently without an experimentally determined high-resolution structure [57]. This model was produced with the program PredictProtein (PP), and was primarily based on the 40% primary sequence homology of GVIA-2 iPLA2 with residues 475–806 of patatin, a potato lipase (PDB ID: 1OXW) [57]. Results of DXCOREX analysis were compared with the experimental data reported for GVIA-2 iPLA2 in the publication (Figure 6), including comparative 3-D coloring representations of DXCOREX-calculated (Figure 6A) and experimentally determined (Figure 6B) DXMS data; ribbon diagram representations of DXCOREX-calculated (Figure 6C) and experimentally determined (Figure 6D) data; and comparison by Pearson correlation analysis of the experimental versus DXCOREX-calculated deuteration levels of the 18 peptide probes analyzed at seven on-exchange intervals in the study (Figure 6E). The correlation was high (R² = 0.86), comparing favorably with the preceding validation studies of crystallographically-determined proteins.

Figure 6

iPLA2. Comparison of DXMS data versus DXCOREX data sets calculated for a 3-D homology model of the catalytic domain of GVIA-2 iPLA2 [57]. In (A) and (B), the 3-D homology model of GVIA-2 iPLA2 shown in Figure 4 of the reference publication [72] has peptide probes colored with (A) DXCOREX-calculated data for 3000 s on-exchange duration, or (B) DXMS experimental data for matched probe peptides at 3000 s on-exchange, as shown in Figure 4 of the reference publication [57]. Ribbon diagram representations of exchange behavior of peptide probes from protein on-exchanged for seven differing on-exchange durations (10 s–10,000 s) are shown for DXCOREX-calculated datasets (C) and DXMS experimental data (D). (D) Adapted from the reference publication [57], displaying residues 482–799 of the catalytic domain of GVIA-2 iPLA2. Pearson correlation analysis of the experimental versus DXCOREX-calculated deuteron incorporation of the 18 peptide probes reported in [57] is shown in (E)

4 Discussion

We have demonstrated that DXCOREX can accurately calculate the hydrogen exchange behavior of proteins in a manner that allows direct comparison of structural models with available DXMS experimental data. To assess the general applicability of the method, we employed published hydrogen exchange data that had been previously produced by ourselves and others employing a variety of experimental approaches, ranging from MALDI to LC-MS. For 13 study proteins with known structures, we found a high correlation between DXCOREX-calculated exchange behavior and exchange behavior measured by hydrogen exchange experiment. Furthermore, we provided an example of how the accuracy of a hypothesized 3-D structure for a protein can be quantitatively assessed by comparison of the DXCOREX-calculated exchange behavior predicted for the structure with the protein’s DXMS experimental data. Taken together, these studies indicate that DXCOREX analysis can significantly increase the precision of structural inferences drawn from DXMS data, and provide further validation of the ability of COREX (and H-COREX) to accurately capture a protein’s thermodynamic stability profile at single residue resolution.

DXCOREX is likely to substantially benefit most applications of hydrogen exchange analysis. The use of hydrogen exchange data in the assessment of protein structure has been, for the most part, qualitative and approximate. It is typical in the field for exchange information derived from a protein of unknown structure to be visualized and presented as a differentially colored representation that is mapped onto a (presumably) homologous known structure. The availability of DXCOREX now invites investigators to construct exact models for the unknown protein structure and then quantitatively assess the accuracy of the models with exchange data. In the present study, DXCOREX was used to validate the accuracy of a specific protein structural model proposed on the basis of homology modeling and molecular dynamics. While DXMS is frequently used to assess protein–protein and protein–small molecule interactions, inferences as to the nature of docking surfaces and induced conformational changes are typically qualitative and of low resolution, even when the high-resolution structures of the apo-forms are known. DXCOREX allows quantitative assessment of the accuracy of specific high-resolution models for such binding complexes with exchange data and, if the models prove accurate, yields a substantially more precise understanding of the complex’s structure. Thus DXCOREX can be employed to quantitatively assess the accuracy of binding interactions predicted by computational docking algorithms, and to facilitate identification of specific binding modes of small molecules to potential therapeutic target proteins.

The availability of DXCOREX-validated structural models may provide a route to entirely new uses for hydrogen exchange experimentation, for example the phasing of X-ray crystallographic data. Molecular replacement-facilitated phasing of X-ray diffraction data for structure determination requires the availability of an approximate structural homolog for the structurally uncharacterized protein under investigation. If a structural homolog cannot be identified (the search for one is usually based on protein primary sequence homology), phasing often requires extensive additional protein production, crystallization, and data acquisition on heavy-metal derivatives of the unknown protein. This is the situation encountered with what are often the most interesting structures to be solved: proteins with novel folds. DXCOREX might enable structural models proposed for the protein to be refined sufficiently that they allow diffraction data phasing to proceed, with the models in effect serving as virtual 3-D homologs. DXCOREX may also allow hydrogen exchange data to be used to search the universe of known structures contained within the Protein Data Bank (PDB) for those with matching calculated exchange profiles. In this manner, the recognition of thermodynamic homology between the unknown protein and structurally characterized proteins in the PDB would allow the identification of potential phasing-facilitating structures that go unrecognized on the basis of primary sequence homology alone. High-throughput platforms will be required to perform the numerous DXCOREX calculations needed for such applications.

This quantitative method of analysis is in its infancy. While there was high correlation between experimental and DXCOREX-calculated exchange behavior for each of the 13 proteins studied, the degree of correlation varied widely, with R² values ranging from 0.54 to 0.96. Efforts to identify the source(s) of this variation and to implement corrective measures are in progress. Truhlar et al. have previously noted that the exchange behavior of non-globular proteins (such as IκBα) may be more accurately assessed than that of globular proteins when employing surface-accessibility-based metrics [58]. Improvements in experimental methods are now producing DXMS data that approach single-amide resolution, and improvements in DXCOREX that allow increasingly accurate residue-specific exchange rate calculation are forthcoming. These combined improvements offer the possibility of quantitative evaluation of structural models by DXCOREX at the single-amide level.

Reliable DXCOREX analysis requires complete, continuous specification of the coordinates of those parts of a protein’s structure that contribute to its thermodynamic stability. Disordered or unstructured regions of proteins typically remain unresolved in otherwise well-determined crystal structures. Their disorder makes it likely that such residues contribute little to the stability of other regions of the protein. The present embodiment of the COREX algorithm ignores any residues without determined coordinates. When C- and/or N-terminal residues are missing assigned coordinates, we computationally truncate the protein’s sequence, eliminating these disordered regions, before performing COREX analysis on the remainder of the determined structure. However, if a protein lacks determined coordinates for internal residues, ignoring those residues leads to an inaccurate partition function and, subsequently, to incorrectly calculated residue exchange rates. Furthermore, the present form of COREX cannot operate on proteins containing post-translational modifications. In this study, with the exception of the high-affinity human growth hormone variant (hGHv), we restricted H-COREX analysis to proteins that have high-resolution 3-D structures without significant internal gaps. Further investigation is needed to assess the limits and reliability of H-COREX analysis, and to implement the ability to operate on proteins with internal gaps in determined coordinates and with common post-translational modifications.
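The distinction drawn above between missing terminal residues (which can be truncated) and missing internal residues (which compromise the calculation) can be sketched as a simple pre-check on a structure. The function name `truncate_and_find_gaps` and the residue numbering are hypothetical; this is an illustration of the bookkeeping under those assumptions, not part of COREX or DXCOREX.

```python
def truncate_and_find_gaps(resolved_residues):
    """Return (first_kept, last_kept, internal_gaps) for a structure.

    resolved_residues: residue numbers that have determined coordinates.
    Residues before first_kept or after last_kept are treated as disordered
    termini and truncated; internal_gaps lists unresolved residues that lie
    between resolved ones and therefore cannot simply be truncated.
    """
    resolved = sorted(set(resolved_residues))
    if not resolved:
        raise ValueError("no residues have determined coordinates")
    first_kept, last_kept = resolved[0], resolved[-1]
    resolved_set = set(resolved)
    internal_gaps = [
        number
        for number in range(first_kept, last_kept + 1)
        if number not in resolved_set
    ]
    return first_kept, last_kept, internal_gaps

# Hypothetical structure: residues 5-40 and 46-95 resolved, 41-45 unresolved.
resolved = list(range(5, 41)) + list(range(46, 96))
first_kept, last_kept, gaps = truncate_and_find_gaps(resolved)
if gaps:
    print(f"internal gap at residues {gaps[0]}-{gaps[-1]}: analysis may be unreliable")
```

A structure that returns an empty gap list would fall within the class of proteins to which the analysis was restricted in this study; a non-empty list marks a case where the calculated exchange rates should be treated with caution.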

There is no standard method for performing DXMS data acquisition, with some, but not all, investigators reporting corrections for deuterium loss (back-exchange) during analysis. Losses after institution of quench occur during the denaturation, proteolysis, LC, and electrospray/desolvation steps of LC-MS mediated analysis. MALDI experiments similarly have losses at the various steps of the procedure. While losses can be approximated by computational methods [23, 35, 36], the varying temperatures, salt concentrations, pH (employing pH buffers with differing temperature dependence of chemical activity), and LC eluent (acetonitrile) concentrations employed by investigators render these estimates inexact. Some investigators experimentally measure losses on a peptide-specific basis, utilizing an equilibrium-deuterated protein sample, in which the deuteration level of each amide at the time of quench can be assumed, for calculation of losses after quench. In these studies, average loss factors are calculated for each peptide under the experimental conditions employed, and these values are used to correct for losses on a peptide-specific basis [59, 60]. Other investigators do not specifically quantify losses for each peptide, but assume they are the same for a given peptide in comparative experiments performed under identical experimental conditions. We have found that inaccuracies in the estimation of peptide-specific deuteron losses after quench in experimental data may account for some of the residual disagreement between experimental data and DXCOREX calculations. We have also found that suitably designed preliminary exchange experiments and DXCOREX analysis of proteins with known structure can be used to accurately “calibrate” the peptide-specific deuterium losses for a particular experimental apparatus and method, which can then be applied to other proteins analyzed on the same system. The result is a more accurate correction for deuterium losses and, consequently, a higher correlation between experimental data and DXCOREX calculation.
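The peptide-specific correction described above can be sketched as follows, assuming a single average loss factor per peptide derived from an equilibrium-deuterated control in which every exchangeable amide is taken to be deuterated at the time of quench. The function names and numerical values are hypothetical, and this simple ratio is one common form of the correction rather than necessarily the procedure of references [59, 60].

```python
def loss_fraction(observed_in_control, max_exchangeable):
    """Average fraction of deuterons lost after quench for one peptide.

    observed_in_control: deuterons measured on the peptide from the
        equilibrium-deuterated control sample.
    max_exchangeable: deuterons assumed present at the time of quench.
    """
    if max_exchangeable <= 0:
        raise ValueError("max_exchangeable must be positive")
    if not 0 <= observed_in_control <= max_exchangeable:
        raise ValueError("control value must lie between 0 and max_exchangeable")
    return 1.0 - observed_in_control / max_exchangeable

def correct_for_loss(measured, loss):
    """Scale a measured deuteron count up by the peptide's average loss."""
    if not 0 <= loss < 1:
        raise ValueError("loss must be in the range [0, 1)")
    return measured / (1.0 - loss)

# Hypothetical peptide with 10 exchangeable amides; the control retains 8.0.
loss = loss_fraction(observed_in_control=8.0, max_exchangeable=10)
corrected = correct_for_loss(measured=4.0, loss=loss)
print(f"loss = {loss:.2f}, corrected deuterons = {corrected:.2f}")
```

Because the same loss factor is applied to every amide in the peptide, this form of correction cannot capture differences in loss between individual amides, which is one reason such estimates remain approximate.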

Improvements in the methods used for comparison of DXCOREX-calculated versus experimental data are also needed. Alternative statistical methods that can assess, from a 3-D perspective, how well experimental and structure-calculated values agree will allow determination of which portions of a hypothesized structure are in agreement with the experimental data and which are not. Alternative approaches to the calculation of peptide amide hydrogen exchange rates from 3-D structures or amino acid sequence have been described, and may prove to be complementary to DXCOREX in the use of DXMS data to assess the accuracy of proposed structural models [61–64].

We envision that comparative DXCOREX analysis can be applied in an iterative manner, in which an initially posed structural model for a protein is first assessed, identifying regions of the DXCOREX-analyzed model that variously agree or disagree with the experimental hydrogen exchange data. The structural model can then be refined in regions where the DXMS data indicate disagreement, and the refined models reassessed by repeat comparative DXCOREX analysis. With modest improvements in computational resources, it should be possible to integrate large-scale structural hypothesis generation and refinement with DXCOREX calculation, and thereby employ high-resolution exchange data to efficiently guide the refinement of structural hypotheses for challenging proteins, when DXMS data are available for them. Recent advances in the ability to calculate accurate structural models, as with Foldit, may be enhanced by the availability of DXCOREX [65, 66]. Perhaps one of the most fruitful applications of this approach to 3-D structure assessment will be in the validation of hypothesized 3-D structures for integral membrane proteins, both free and complexed with intracellular and/or extracellular ligands. DXMS experimental methods have recently been successfully applied to such challenging proteins [9, 10, 67–71].
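The assessment step of the iterative scheme described above can be sketched as a per-peptide comparison that flags regions of a model for refinement. The function name, the tuple layout, the fractional deuteration scale (0 to 1), and the tolerance are all hypothetical choices made for illustration; the appropriate measure of disagreement remains an open question.

```python
def flag_disagreeing_peptides(peptides, tolerance):
    """Return (start, end) ranges where model and experiment disagree.

    peptides: iterable of (start, end, calculated, experimental) tuples,
        with deuteration expressed as a fraction between 0 and 1.
    tolerance: largest absolute difference still counted as agreement.
    """
    flagged = []
    for start, end, calculated, experimental in peptides:
        if abs(calculated - experimental) > tolerance:
            flagged.append((start, end))
    return flagged

# Hypothetical peptide probes at a single on-exchange duration.
probes = [
    (482, 495, 0.20, 0.25),
    (510, 523, 0.80, 0.35),
    (600, 612, 0.50, 0.55),
]
flagged = flag_disagreeing_peptides(probes, tolerance=0.2)
print("regions to refine:", flagged)
```

In an iterative workflow, the flagged ranges would direct refinement of the structural model, after which the calculated values would be regenerated and the comparison repeated.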

5 Conclusion

DXCOREX analysis allows quantitative assessment of the degree to which experimental hydrogen exchange data support proposed models for protein structure. While the data employed can be acquired with a wide variety of experimental platforms and methodologies, the stringency of the assessment depends on the quality (comprehensiveness, resolution, and accuracy) of the hydrogen exchange experimental data. High-quality DXMS data, combined with DXCOREX analysis, may allow rigorous, single-amide resolution testing of specific structural models proposed for study proteins. This contrasts with the conventional approach of representing data variously as heat maps, ribbon diagrams, or mirror diagrams, which are then used to argue, qualitatively and often subjectively, in support of low-resolution models for protein structure. DXCOREX will be available as a supported web-based application at http://DXCOREX.ucsd.edu.