1 Introduction

Solution-phase hydrogen/deuterium exchange (HDX) monitored by mass spectrometry (MS) has emerged as a powerful tool to study protein interactions and dynamics, based on determination of secondary structure stability and solvent accessibility [1, 2]. Moreover, HDX monitored by MS can be performed for protein complexes for which X-ray crystallography and/or nuclear magnetic resonance may be unavailable (e.g., because of aggregation, absence of crystals, disordered crystals, insufficient concentration, or too high a molecular weight). Typically, an HDX MS experiment starts by incubating the protein in deuterated buffer for each of several periods, after which the exchange is quenched by lowering the pH, and the protein is then digested by an acid protease. The resulting proteolytic peptide fragments are desalted, separated by fast liquid chromatography, and introduced into a mass spectrometer. The increase in average mass of a given peptide fragment as a function of incubation period allows the calculation of exchange rate constants and therefore reveals protein stability and/or solvent accessibility as defined by protein–protein interactions or secondary structure hydrogen bonding.

Over the past decade, the need to increase throughput and manage the huge amount of data that results from an HDX experiment has stimulated the development of automated data reduction. The simplest algorithm provides a fast and automated way to calculate the average mass of each proteolytic peptide [3]. The most common tools employ deconvolution of the isotopic distributions to extract deuterium incorporation based on natural isotopic abundance [4–7]. A different approach based on a peak width method allows the characterization of HDX kinetics [8]. Our own analysis package makes use of the ultrahigh resolving power of Fourier transform ion cyclotron resonance (FT-ICR) MS to calculate deuterium content without the need to deconvolve isotopic distributions [9]. Other software tools such as DXMS [10], Hydra [11], TOF2H [12], The Deuterator [13], and HD Desktop [14] provide a user-friendly interface and go beyond the deuterium uptake calculation to data visualization and comparison.

The next step in automated HDX data reduction is to increase sequence resolution by exploiting the peptide fragment overlap that results from the proteolytic digestion step following HDX. The enzymes used for HDX (pepsin, protease XIII or XVIII, etc.) exhibit less cleavage specificity than commonly used proteases (trypsin, chymotrypsin, V8 protease, etc.) [15]. Consequently, proteolytic fragments of various lengths may overlap in three ways: (1) the segments share a common N- or C-terminus but overhang at the other (e.g., XXX and XXXY or XXX and YYXXX); (2) one short segment is completely embedded in another, so that the longer segment overhangs at both termini (e.g., XXYYYZ and YYY); and (3) the overhangs extend at both the N- and C-termini of each segment (e.g., XXYYY and YYYZ). Fragment overlap increases sequence resolution, potentially localizing deuterium incorporation at the single amino acid level (e.g., XXXY and XXX).
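The three overlap cases, and the case that isolates a single residue, can be expressed compactly in code. The sketch below is illustrative only: the function names `overlap_type`, `counted_amides`, and `isolated_amide` are hypothetical, fragments are written as inclusive (start, end) residue numbers, and `counted_amides` applies the rule from Section 2.2 that the first two residues of a fragment contribute no measurable amide deuterium.

```python
def overlap_type(a, b):
    """Classify the overlap of two fragments given as inclusive (start, end)."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2 or e2 < s1:
        return "none"
    if (s1, e1) == (s2, e2):
        return "identical"
    if s1 == s2 or e1 == e2:
        return "shared terminus"      # case 1: common N- or C-terminus
    if (s1 < s2 and e2 < e1) or (s2 < s1 and e1 < e2):
        return "embedded"             # case 2: one fragment inside the other
    return "staggered"                # case 3: overhangs at both ends


def counted_amides(frag):
    """Residues whose amide is counted (first two residues back-exchange)."""
    start, end = frag
    return set(range(start + 2, end + 1))


def isolated_amide(a, b):
    """Residue whose amide is isolated by a nested pair of fragments, else None."""
    big, small = counted_amides(a), counted_amides(b)
    if small > big:                   # make `big` the larger set
        big, small = small, big
    diff = big - small
    return diff.pop() if small <= big and len(diff) == 1 else None


print(overlap_type((1, 3), (1, 4)))    # shared terminus (XXX vs XXXY)
print(overlap_type((3, 5), (1, 6)))    # embedded (YYY inside XXYYYZ)
print(overlap_type((1, 5), (3, 6)))    # staggered (XXYYY vs YYYZ)
print(isolated_amide((1, 3), (1, 4)))  # 4
print(isolated_amide((3, 5), (2, 5)))  # 4
```

Note the last call: when two fragments share a C-terminus, the isolated amide is not the extra N-terminal residue itself but the one two positions further along, because the N-terminal residues of each fragment are not counted.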

Prior efforts to deal with peptide fragment overlap have been limited. Abzalimov and Kaltashov [7] designed an algorithm based on the maximum entropy method (MEM) but treated only two overlapped segments at a time. In Pascal et al.’s HD Desktop [14], if two overlapping fragments share an N- or C-terminus, the deuterium uptake value for the shorter fragment is simply subtracted from that of the longer one. The drawback of these two algorithms [7, 14] is that they treat the segments pairwise: the segments must be aligned at one end, and because the analysis is sequential, errors propagate every time a pair of peptide fragments is analyzed.
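To see why pairwise, sequential analysis accumulates error, consider subtracting uptake values whose independent uncertainties add in quadrature. The numbers below are hypothetical, and `subtract_uptake` is a generic sketch rather than code from HD Desktop or from [7].

```python
import math


def subtract_uptake(long_du, long_sd, short_du, short_sd):
    """Uptake of the overhang = long - short; independent errors add in quadrature."""
    return long_du - short_du, math.sqrt(long_sd ** 2 + short_sd ** 2)


# Step 1: an overhang from a long/short fragment pair, each measured +/- 0.2 D.
first, first_sd = subtract_uptake(3.1, 0.2, 2.5, 0.2)
# Step 2: a second pair whose "short" value is the derived one from step 1,
# so its uncertainty is inherited and the error grows again.
second, second_sd = subtract_uptake(2.0, 0.2, first, first_sd)

print(f"step 1: {first:.2f} +/- {first_sd:.2f} D")    # step 1: 0.60 +/- 0.28 D
print(f"step 2: {second:.2f} +/- {second_sd:.2f} D")  # step 2: 1.40 +/- 0.35 D
```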

An elegant method to assign the rate constants to the overlapped peptide fragments was recently proposed by Althaus et al. [16]. That routine begins from input rate constants determined by some other tool (e.g., maximum entropy method or integer linear programming) and assigns those to given residues by use of a combinatorial approach.

Our approach is similar but conceptually simpler: it combines rate constant determination and assignment into a single step. The algorithm uses raw, unprocessed deuterium uptake data and fits all kinetic curves simultaneously to a sum of exponentials, one for each residue in a peptide sequence. To validate the algorithm, we created hypothetical overlapped peptide fragments with simulated rate constants for the amide hydrogens. As shown below, our procedure returns satisfactory results. As a practical example, we used this tool to evaluate the experimental rate constants for the first 29 N-terminal amino acids of C22A FK506-binding protein (FKBP) digested with either pepsin or protease XIII. We were able to assign unique rate constants wherever the digestion pattern yielded sequence resolution at the single-amide level.

2 Materials and Methods

Deuterium oxide (D2O, 99.9 %), formic acid, pepsin, and protease type XIII from Aspergillus saitoi were obtained from Sigma Aldrich (St. Louis, MO, USA). All other chemicals and ultrahigh-purity HPLC solvents (water and acetonitrile) for mass spectrometric analysis were obtained from VWR International (Suwanee, GA, USA).

2.1 Hydrogen/Deuterium Exchange

Preparation of FKBP, the HDX procedure, and the subsequent analysis are described in [17]. Briefly, the protein (100 μL at 4 μM) was digested for 2 min with either pepsin or protease XIII, each prepared in 2 % formic acid at 1.5 mg/mL (pH meter reading 2.3). The digest was injected onto a ProZap C18 column and eluted at 300 μL/min with a short (1.5 min) gradient from 2 % to 95 % acetonitrile to minimize back-exchange [18]. The eluent was split post-column (1:100) and introduced into a hybrid 14.5 T LTQ FT-ICR mass spectrometer by micro-electrospray ionization [19]. HDX was performed with a LEAP Technologies (Carrboro, NC, USA) robot that controlled incubation in deuterated buffer for periods between 30 s and 8 h, low-pH quench, and proteolytic digestion. Triplicate data were collected with Xcalibur software (Thermo Fisher, San Jose, CA, USA) and analyzed by a custom analysis procedure [9].

Back-exchange was estimated by the method of Zhang and Smith [20]: FKBP was fully deuterated under denaturing conditions (4 M urea, 40 °C, 48 h) before proteolytic digestion and peptide mass measurement.
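The Zhang and Smith correction is commonly written as D = (m − m0%)/(m100% − m0%) × N, where m, m0%, and m100% are the centroid masses of the partially exchanged, undeuterated, and fully deuterated peptide, and N is the number of exchangeable amide hydrogens. A minimal sketch follows. The function names and masses are hypothetical, and expressing back-exchange as the fraction of the maximum possible mass gain (using the 2H–1H mass difference of about 1.00628 Da) is our assumption about how a percentage such as those in Table 1 might be derived.

```python
D_MINUS_H = 1.00628   # mass difference between 2H and 1H, Da


def corrected_deuterium(m, m0, m100, n_amides):
    """Zhang-Smith style correction: D = (m - m0) / (m100 - m0) * N."""
    return (m - m0) / (m100 - m0) * n_amides


def back_exchange_fraction(m0, m100, n_amides):
    """Fraction of label lost by the fully deuterated control (assumed definition)."""
    return 1.0 - (m100 - m0) / (n_amides * D_MINUS_H)


# Hypothetical centroid masses (Da) for a peptide with 10 counted amides.
m0 = 1000.000
m100 = m0 + 0.88 * 10 * D_MINUS_H   # control retains 88 % of its label
m = m0 + 0.5 * (m100 - m0)          # partially exchanged sample

print(f"back-exchange: {back_exchange_fraction(m0, m100, 10):.0%}")     # back-exchange: 12%
print(f"corrected uptake: {corrected_deuterium(m, m0, m100, 10):.2f} D")  # corrected uptake: 5.00 D
```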

2.2 Computational Analysis

Here, we describe an algorithm to determine exchange rate constants of amide hydrogens from overlapped peptide fragments. Each backbone amide hydrogen is assumed to exchange at a single rate. The experimental deuterium uptake (DU) curves for all proteolytic segments were fitted simultaneously (see Section 3) to a sum of exponential terms, one for each residue (Equation 1):

$$ DU_{j} = \sum_{i=1}^{n} a_{ij}\left(1 - e^{-t/\tau_{i}}\right) \tag{1} $$

in which \( DU_{j} \) is the number of deuteriums incorporated into fragment j after exchange period t, n is the final residue of the considered stretch of the backbone, and \( 1/\tau_{i} \) is the exchange rate constant for the backbone amide of residue i. The amine hydrogens of the first residue of a proteolytic peptide and the amide hydrogen of the second residue back-exchange very quickly and are ignored, so the amide hydrogen of the third residue is the first one considered in the analysis. If residue i is present in fragment j, \( a_{ij} = 1 \); otherwise, \( a_{ij} = 0 \). The exchange time courses of all overlapped fragments are fitted simultaneously in a “global analysis” fashion [21], and the quality of the fit is assessed by the nonlinear least-squares criterion (χ2). Conformational heterogeneity of the protein might produce a distribution of exchange rates rather than a single value, leading to non-exponential behavior. This is easily modeled by a stretched exponential that modifies the exponent of Equation 1, \( DU_{j} = \sum_{i=1}^{n} a_{ij}\left(1 - e^{-(t/\tau_{i})^{\beta}}\right) \), but the experimental data are rarely of sufficient quality to warrant doubling the number of optimized variables (a rate and a stretching factor β for each residue). Moreover, β has previously been found to vary between 0.8 and 1, which does not introduce a large deviation from exponential behavior [22].
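Equation 1, together with the rules for the coefficients a_ij, translates directly into a forward model. The sketch below uses hypothetical names (`incidence`, `uptake`) and plain Python rather than the authors' program; it builds the a_ij matrix from fragment boundaries, skipping the first two residues of each fragment and any proline, and evaluates DU_j with an optional stretching factor β.

```python
import math


def incidence(fragments, sequence):
    """a[j][i] = 1 if the amide of residue i + 1 is counted in fragment j.

    Fragments are (start, end) residue numbers, inclusive and 1-based.
    The first two residues of each fragment are skipped (fast back-exchange),
    and prolines are skipped because they have no amide hydrogen.
    """
    a = [[0] * len(sequence) for _ in fragments]
    for j, (start, end) in enumerate(fragments):
        for res in range(start + 2, end + 1):      # third residue onward
            if sequence[res - 1] != "P":
                a[j][res - 1] = 1
    return a


def uptake(a_row, taus, t, beta=1.0):
    """Equation 1 for one fragment; beta < 1 gives the stretched exponential."""
    return sum(a_i * (1.0 - math.exp(-((t / tau) ** beta)))
               for a_i, tau in zip(a_row, taus))


# Hypothetical 8-residue stretch with a proline at position 4.
seq = "AAAPAAAA"
a = incidence([(1, 5), (2, 8)], seq)
print(a)                                   # [[0, 0, 1, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
taus = [1.0] * len(seq)                    # time constants, e.g. in minutes
print(uptake(a[1], taus, 1.0))             # equals 4 * (1 - e**-1)
print(uptake(a[1], taus, 0.5, beta=0.8))   # stretched-exponential variant
```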

For overlapped peptide segments, sequence resolution at the individual amino acid level can potentially be achieved whenever proteolytic digestion generates peptide fragments that align at the N- or C-terminus but differ in length by a single amino acid (Figure 1b). Similarly, an internal residue can be isolated if two internal fragments overlap by all but one residue. Whenever the digestion pattern does not isolate an individual amino acid in this way, stretches of two or more residues present in some fragments remain “unresolved.” We refer to them as “linked” residues (e.g., residues 6 and 7 in Figure 1b). For linked residues, the algorithm still determines distinct exchange rate constants but cannot assign them to specific amide hydrogens. Additionally, the number of distinct rate constants for linked residues can optionally be limited to reduce the fitted parameter space and computational time.
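One way to flag linked residues automatically is to group residues whose amides occur in exactly the same set of fragments: such residues enter every sum in Equation 1 together, so exchanging their rates leaves every fitted curve unchanged. This grouping criterion is an illustrative assumption (the text does not say how the program detects linked residues), and `linked_groups` is a hypothetical name. Applied to the fragments of Figure 1b, it recovers residues 6 and 7 as the only linked pair.

```python
from collections import defaultdict


def linked_groups(fragments):
    """Group residues whose counted amides occur in the same set of fragments."""
    membership = defaultdict(set)              # residue -> fragment indices
    for j, (start, end) in enumerate(fragments):
        for res in range(start + 2, end + 1):  # first two residues not counted
            membership[res].add(j)
    groups = defaultdict(list)
    for res in sorted(membership):
        groups[frozenset(membership[res])].append(res)
    return sorted(groups.values())


# Fragments of the simulated nonapeptide in Figure 1b.
fig1 = [(1, 4), (1, 5), (2, 7), (2, 9), (3, 7), (3, 8), (4, 9)]
print(linked_groups(fig1))   # [[3], [4], [5], [6, 7], [8], [9]]
```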

Figure 1

(a) Simulated deuterium uptake as a function of HDX incubation period (top) for each of seven simulated overlapped peptide fragments of a polyAla nonapeptide (b). Data points correspond to deuterium uptake calculated from the time constants in the table in panel (d); the smooth curves represent the best fit generated by the algorithm, corresponding to the values in (d). The contour plots in (c) depict the fit error (quality of fit, χ2) for the non-linked residues 4 and 5 and for the linked residues 6 and 7, denoted by the gray area in (b). The axes are in units of log(1/rate). The simulated input value is denoted by a cyan rectangle and the best value found by a magenta star. The two areas for the linked residues 6 and 7 reflect the two unassigned solutions for linked residues.

The fit of the exchange kinetics for all fragments is based on Nelder–Mead Simplex optimization [23], although Marquardt and Newton–Raphson optimizers are also available. The most significant modification of these algorithms is the implementation of user-defined lower and upper limits for the parameters to be optimized. For example, only rate constants that produce a change in deuterium uptake over the range of experimental incubation periods need to be considered. Hydrogens that exchange before the first time point can be assigned rate constants \( > 1/t_{\mathrm{first}} \), whereas hydrogens that do not exchange are assigned rate constants \( < 1/t_{\mathrm{last}} \); only rate constants between those two limits are assigned a numerical value. Similarly, prolines (which lack an exchangeable amide hydrogen) are automatically removed from the fits according to the user-supplied primary sequence.
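The window of resolvable rate constants follows directly from the first and last incubation periods. This sketch computes that window and clamps a trial log10(τ) into it; the function names and the intermediate time points are hypothetical, and only the 30 s and 8 h end points come from Section 2.1.

```python
import math


def rate_window(times):
    """(k_min, k_max): slower looks unexchanged, faster looks instantaneous."""
    return 1.0 / max(times), 1.0 / min(times)


def clamp_log_tau(log_tau, times):
    """Keep a trial log10(tau) between log10(t_first) and log10(t_last)."""
    lo, hi = math.log10(min(times)), math.log10(max(times))
    return min(max(log_tau, lo), hi)


# Incubation periods in hours: 30 s to 8 h (intermediate points hypothetical).
times_h = [30 / 3600, 1 / 60, 5 / 60, 0.25, 0.5, 1, 2, 4, 8]
k_min, k_max = rate_window(times_h)
print(f"resolvable k: {k_min:.3f} to {k_max:.0f} per hour")  # resolvable k: 0.125 to 120 per hour
```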

To ensure exhaustive sampling of the parameter space, which in turn allows us to check the uniqueness of the solution and to identify local minima, we repeat the optimization with randomized starting parameters (a Monte Carlo approach). This “brute force” approach is surprisingly computationally efficient, as we have found in a number of other applications [24, 25]. The number of Monte Carlo attempts, the number of Simplex iterations, and the lower and upper boundaries are all defined by the user. The search is performed in log(time) space, which linearizes the parameter space. As expected, analysis time grows with the number of amide hydrogens in the overlap, but it can be reduced by limiting the number of distinct rate constants for linked residues, coarse-graining the parameter space, narrowing the upper and lower limits, or lowering the number of iterations or Monte Carlo repeats. For the examples presented here, a few hundred Monte Carlo Simplex optimizations with 5000–10,000 iterations per optimization sufficed for convergence; these parameters can easily be fine-tuned by the user. The analysis time ranged from a few seconds to 5 min per 1000 Monte Carlo trials on a 2.0 GHz PC, depending on the number of fragments and the number of amide hydrogens in the overlap. After a first round of optimization, the algorithm offers the option of refining the results for amide hydrogens present in fewer overlapped peptide fragments (e.g., residues 3 and 9 in Figure 1 are present in two fragments, whereas all other residues are present in three to four segments). This option exists because the initial estimate is weighted toward residues represented multiple times in different fragments. In the refinement phase, the algorithm fixes the rate constants calculated in the first step and searches only for the rate constants of the residues defined by the user.
Because the number of variables is greatly reduced and typically only one or two residues are refined, only 10–20 Monte Carlo simulations are required for the refinement.
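The Monte Carlo/Simplex strategy can be sketched with a hand-written Nelder–Mead routine restarted from random points in log10(τ) space, keeping the best result. This is a simplified stand-in (inside contraction only, bounds enforced by clamping), not the authors' optimizer; the toy two-fragment problem, its time points, and all names are hypothetical.

```python
import math
import random


def nelder_mead(f, x0, lo, hi, step=0.5, iters=1000):
    """Minimal Nelder-Mead simplex search; box bounds enforced by clamping."""
    def clamp(x):
        return [min(max(v, lo), hi) for v in x]

    n = len(x0)
    simplex = [clamp(x0)]
    for i in range(n):                      # initial simplex: one step per axis
        x = list(simplex[0])
        x[i] += step if x[i] + step <= hi else -step
        simplex.append(x)
    values = [f(x) for x in simplex]

    for _ in range(iters):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(c):                       # centroid + c * (centroid - worst)
            return clamp([m + c * (m - w) for m, w in zip(centroid, worst)])

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:                  # try expansion
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:               # accept reflection
            simplex[-1], values[-1] = xr, fr
        else:                               # inside contraction
            xc = along(-0.5)
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:                           # shrink toward the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    k = min(range(n + 1), key=lambda j: values[j])
    return simplex[k], values[k]


def multistart(f, n_params, lo, hi, trials=20, seed=0):
    """Repeat the simplex search from random starts; keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, math.inf
    for _ in range(trials):
        x0 = [rng.uniform(lo, hi) for _ in range(n_params)]
        x, fx = nelder_mead(f, x0, lo, hi)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Toy problem: fragment A counts amide 1; fragment B counts amides 1 and 2.
times = [0.1, 0.3, 1, 3, 10, 30, 100]      # minutes (hypothetical)
a = [[1, 0], [1, 1]]
true_log_tau = [0.0, 1.0]                  # tau = 1 and 10 min


def model(log_tau, row, t):
    return sum(aij * (1 - math.exp(-t / 10 ** lt)) for aij, lt in zip(row, log_tau))


data = [[model(true_log_tau, row, t) for t in times] for row in a]


def chi2(log_tau):
    return sum((model(log_tau, row, t) - d) ** 2
               for row, drow in zip(a, data) for t, d in zip(times, drow))


best_log_tau, best_chi2 = multistart(chi2, 2, lo=-2.0, hi=3.0)
print([round(v, 2) for v in best_log_tau])
```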

3 Results and Discussion

3.1 Back-Exchange

In general, back-exchange may depend on peptide length: longer peptides are retained on the column for longer and thus can undergo more back-exchange than shorter peptides, although this effect disappears as peptide length increases further. For the peptides studied here (>10 aa), back-exchange showed no length dependence and averaged 12 % ± 3 % (Table 1). This observation probably results from our efforts to minimize back-exchange by lowering the temperature to 0 °C and decreasing the peptide retention time to 0.7–1.5 min before mass measurement [18].

Table 1 Extent of Back-exchange

Five of the 20 peptide fragments presented here were detected in both partially and fully deuterated FKBP (Table 1). For the remaining 15 segments, fully deuterated counterparts were not detected, perhaps because proteolytic digestion produces different peptides under denaturing conditions. For those segments, the average back-exchange value determined above was used as a correction factor, although this approach ignores any sequence dependence of back-exchange. The resulting error is likely to be less than 6 %, as estimated from the largest difference between the “average” and “exact” back-exchange values (see Table 1). Thus, correcting with the average rather than the “exact” back-exchange would change the result by at most about 0.6 deuterium in a 10-amino-acid-long peptide, which is taken into account as an additional error in the analysis algorithm (see below).

3.2 Testing the Algorithm: Proof of Concept

To test the global analysis algorithm, we generated simulated deuterium uptake versus exchange period data based on user-specified time constants for each of the amide hydrogens in the overlapped peptide fragments. Figure 1a shows a semi-log plot of deuterium uptake as a function of HDX incubation period for seven simulated overlapped fragments spanning amino acid residues 1–4, 1–5, 2–7, 2–9, 3–7, 3–8, and 4–9 (Figure 1b). Amide hydrogens 6 and 7 are linked residues, whereas 3, 4, 5, 8, and 9 lie at “frayed” ends and are therefore resolved to the individual amino acid level. We assigned the following rate constants to amide hydrogens 3–9: 10, 1, 0.1, 1, 0.1, 10, and 1 min⁻¹ (Figure 1d). Residues 1 and 2 (the latter carrying the first amide) are not assigned any values and are excluded from the analysis because their back-exchange is assumed to be very fast (Footnote 1). The plotted data points represent the simulated deuterium uptake computed for each incubation period from the assigned time constants. Each smooth curve corresponds to the best fit generated by the algorithm (100 Monte Carlo/Simplex optimizations with 1000 iterations per optimization). The computed time constants are in excellent agreement with the input values listed in Figure 1d. For the linked residues (6, 7), we obtain two different rates that are within 5 % of the input values; however, as mentioned above, we cannot assign those values to a particular residue. Residue 4 is present in seven fragments, and its computed value is in perfect agreement with the input value. Residues 3 and 9 are present in only two fragments and hence contribute less to the total exchange summed over all of the overlapped segments than residues present in more fragments.
There are many ways of removing this bias (e.g., normalizing the χ2 values to the number of fragments in which a given residue is present), but it is easier to implement a second “refinement” stage optimization, in which residues present in more than 50 % of the fragments are allowed to “float” within a limited range around the values found in the first stage (e.g., within ±20 % of the first-stage τi), whereas residues present in less than 50 % of the fragments float across the full four-order-of-magnitude range. This strategy removes the bias from the well-represented residues 4–7 and fits the less-represented residues 3, 8, and 9. The fit improves 20-fold after the refinement stage, and the resulting values are identical to the input values.
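The refinement-stage bounds can be derived from how often each residue's amide is represented. The sketch below applies the 50 % threshold and ±20 % window from the text to the Figure 1b fragments; `refinement_bounds` is a hypothetical name, the first-stage time constants are simply set to the input values for illustration, and the four-decade full window (0.01–100 min) is an assumed choice.

```python
def refinement_bounds(a, first_stage_tau, full_window, tolerance=0.20):
    """Per-residue (lower, upper) tau bounds for the refinement stage."""
    n_frag = len(a)
    bounds = []
    for i, tau in enumerate(first_stage_tau):
        count = sum(row[i] for row in a)
        if count == 0:
            bounds.append(None)                    # amide never observed
        elif count / n_frag > 0.5:                 # well represented: narrow
            bounds.append((tau * (1 - tolerance), tau * (1 + tolerance)))
        else:                                      # poorly represented: full
            bounds.append(full_window)
    return bounds


# Figure 1b fragments; a[j][i] = 1 if residue i + 1 is counted in fragment j.
fig1 = [(1, 4), (1, 5), (2, 7), (2, 9), (3, 7), (3, 8), (4, 9)]
a = [[1 if s + 2 <= res <= e else 0 for res in range(1, 10)] for s, e in fig1]
# First-stage taus (min) set to the inputs for illustration; residues 1-2 unused.
tau1 = [1.0, 1.0, 0.1, 1.0, 10.0, 1.0, 10.0, 0.1, 1.0]
for res, b in zip(range(1, 10), refinement_bounds(a, tau1, (0.01, 100.0))):
    print(res, b)
```

With these fragments, residues 4–7 receive the narrow ±20 % window and residues 3, 8, and 9 the full window, matching the split described above.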

Agreement between the fits and the observed data is only one aspect of fitting. When the problem is ill-defined or the data are sparse, the uniqueness of the optimized solution is equally important. To assess uniqueness, we compute a χ2 hypersurface by varying two parameters on a grid while holding the others at their optimized values (Figure 1c). These two-dimensional contours facilitate error evaluation. Errors are usually reported as χ2 + 1 contours, which correspond to a confidence limit of approximately 68 %. For simulated data, the contours are narrow, with minima at the input values (i.e., the error associated with the fitting procedure is very small). For “linked” residues, there are two equivalent minima, because the two rates can be assigned to residues i and j in either order. The contours can broaden appreciably for experimentally observed exchange rates: Figure 2c shows broad contours for the linked residues 3 and 4, whereas the similarly linked residues 12 and 13 have a small error.
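A two-parameter χ2 scan like the one in Figure 1c can be sketched by simulating the Figure 1 data from the input rates and scanning two residues on a grid while the others are held fixed (here at their true values rather than at fitted minima). The time grid, the uniform per-point error of 0.1 deuterium, and all names are assumptions. The last lines illustrate the origin of the two minima for linked residues: swapping the rates of residues 6 and 7 leaves χ2 unchanged.

```python
import math

fragments = [(1, 4), (1, 5), (2, 7), (2, 9), (3, 7), (3, 8), (4, 9)]
rates = {3: 10, 4: 1, 5: 0.1, 6: 1, 7: 0.1, 8: 10, 9: 1}   # per min, Figure 1d
times = [10 ** (k / 4) for k in range(-8, 9)]              # 0.01 to 100 min
SIGMA = 0.1                                                # assumed error, D


def uptake(frag, log_tau, t):
    start, end = frag
    return sum(1 - math.exp(-t / 10 ** log_tau[r]) for r in range(start + 2, end + 1))


true = {r: -math.log10(k) for r, k in rates.items()}       # log10(tau / min)
data = [[uptake(f, true, t) for t in times] for f in fragments]


def chi2(log_tau):
    return sum(((uptake(f, log_tau, t) - d) / SIGMA) ** 2
               for f, row in zip(fragments, data) for t, d in zip(times, row))


# Scan residues 4 and 5 on a grid; all other residues stay at their true values.
grid = [g / 10 for g in range(-20, 21)]                    # log10(tau) in [-2, 2]
surface = {(x, y): chi2({**true, 4: x, 5: y}) for x in grid for y in grid}
best = min(surface, key=surface.get)
region = [p for p, v in surface.items() if v <= surface[best] + 1]
print("grid minimum (log10 tau4, log10 tau5):", best)
print("grid points inside chi2_min + 1:", len(region))

# Linked residues 6 and 7: swapping their rates leaves chi2 unchanged.
swapped = {**true, 6: true[7], 7: true[6]}
print("chi2 after swapping residues 6 and 7:", chi2(swapped))
```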

Figure 2

(a) Deuterium uptake as a function of HDX incubation period for overlapped peptide fragments (bottom) obtained by protease XIII and pepsin digestion of the FKBP N-terminal fragment (GVQVETISPGDGRTFPKRGQTAVVHYTGM). The smooth curves represent the best fit generated by the algorithm. (b) Proteolytic fragments and their assignment; black bars at the top of the graph denote linked residues. (c) Typical χ2 hypersurface contours for three pairs of residues. Rates for residues 12, 13, 14, and 15 are determined with higher fidelity than those for residues 3 and 4 (broad contours).

3.3 Experimental FKBP HDX Monitored by FT-ICR MS

Having validated the algorithm on simulated data, we next tested it on experimental data for FKBP. FKBP was used previously to compare HDX rate constants obtained by FT-ICR MS and by nuclear magnetic resonance [17]. Figure 2 shows the deuterium uptake kinetics for the overlapped peptide fragments obtained by digestion of FKBP with protease XIII and pepsin, and the results of the fits are listed in Table 2. Fragments from both digestions were combined for global analysis because the only explicit assumption, a constant exchange rate per residue, is independent of the protease used.

Table 2 Comparison of HDX Rates Derived from MS and NMR

In HDX experiments, the range of determinable exchange rate constants is dictated by the shortest and longest incubation periods. For example, for fragment 1 in Figure 2, at least six amide hydrogens have exchanged before the shortest incubation period (30 s) and therefore have a “fast” exchange rate constant (k > 120 h⁻¹). Similarly, the highest deuterium uptake for the same fragment is 11; because that fragment has 19 amide hydrogens, at least 19 − 11 = 8 amide hydrogens have not exchanged by the end of the longest incubation period (8 h) and are therefore labeled “slow” (k < 0.13 h⁻¹). The program can optionally estimate the rates of these “fast” and “slow” residues from the first and final incubation points in the initial search and then allow them to vary by ±25 %; in practice, however, it is fast enough to let even those residues float freely. The black bars at the top of Figure 2b denote five groups of linked residues, (3, 4), (6, 7), (9–15), (16, 17), and (27–29), for which we can determine the individual rate constants but cannot assign them to specific residues.
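The counting argument in this paragraph can be written out as follows. The uptake of 6.0 deuteriums at 30 s is a hypothetical value chosen to match the text's “at least six”, rounding uptake to the nearest integer is an assumption, and `fast_slow_counts` is a hypothetical name.

```python
def fast_slow_counts(n_amides, uptake_first, uptake_max):
    """Minimum numbers of 'fast' and 'slow' amides in one fragment."""
    n_fast = round(uptake_first)              # exchanged before t_first
    n_slow = n_amides - round(uptake_max)     # still unexchanged at t_last
    return n_fast, n_slow


t_first_h, t_last_h = 30 / 3600, 8.0          # 30 s and 8 h, in hours
n_fast, n_slow = fast_slow_counts(19, 6.0, 11.0)
print(f"{n_fast} fast (k > {1 / t_first_h:.0f} 1/h), "
      f"{n_slow} slow (k < {1 / t_last_h:.3f} 1/h)")
# 6 fast (k > 120 1/h), 8 slow (k < 0.125 1/h)
```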

3.4 Comparison of HDX Rate Constants from Mass Spectrometry and NMR

We applied the method to the MS and NMR HDX comparison data published previously by Zhang and collaborators [17]. For linked residues, the results in Table 2 are grouped because the sequence resolution does not allow us to assign rate constants uniquely to specific amide hydrogens. We found that 74 % of the amide hydrogen rates from MS matched those determined by NMR. The agreement is imperfect because of inadequate back-exchange correction in those earlier comparison data.

In conclusion, our tests validate the algorithm’s ability to recover correct rate constants from simulated deuterium uptake data. Our software successfully extracted rate constants from deuterium incorporation data for several overlapped peptide segments produced by digestion of FKBP with either pepsin or protease XIII. A major advantage of our algorithm is that it considers the overlaps among all proteolytic fragments spanning a long stretch of sequence, thereby increasing sequence resolution. In other words, our procedure can achieve sequence resolution finer than that of any individual segment. Consequently, it relaxes the need to obtain the same peptide fragment for two states of the protein (e.g., wild-type versus mutant, or apo versus ligand-bound), because it allows rate constants to be compared across smaller, identical protein segments that do not necessarily belong to the same original proteolytic fragment. Note that the rate constant for any residue is calculated from all of the fragments simultaneously, which limits the error propagation that occurs when fragments are analyzed separately. It must be emphasized that these conclusions are valid only in the context of the error plots (Figures 1c and 2c), which combine the limits of fidelity imposed by experimental error, uniqueness of fits, and sparsity of data. As in any other experimental data analysis, a result is meaningful only if it is larger than the propagated errors. For example, when comparing two chemical states of the same protein (or protein complex), we must ensure that the difference in exchange rate exceeds the width of the contours at the desired confidence level.

Further improvements to the algorithm will include the development of a user-friendly interface that allows function management and cross-comparison between two states of the protein.