1 Introduction

In the rapidly expanding field of proteomics, tandem mass spectrometry (MS/MS) has become an invaluable tool for protein identification. Developing methods to correctly assign the MS/MS spectra is pivotal, so understanding and predicting gas-phase fragmentation of peptide ions is of growing interest [1]. Fragmentation is thought to proceed by both charge-remote and charge-directed mechanisms. Charge-remote fragmentation is invoked to explain the characteristic cleavage near certain residues, such as aspartic acid and histidine, when the number of ionizing protons does not exceed the number of strongly basic residues [2–5]. The ionizing protons are held tightly by the basic side chains, and the proton necessary for directing the cleavage reaction is supplied by the Brønsted-Lowry acidic side chain adjacent to the peptide bond.

Charge-directed fragmentation is rationalized using the mobile proton model [6, 7], in which the ionizing proton is transferred from a basic site (such as an arginine side chain or the N-terminus) to the point of fragmentation. This intramolecular proton transfer is endothermic. It occurs only upon activation, as during collision-induced dissociation (CID). The added energy is said to “mobilize” the proton, allowing it to move to less-stable sites along the peptide backbone and promote cleavage. Fragmentation spectra are usually rich in sequence information [8]. In tryptic peptides, which bear a C-terminal lysine or arginine residue, the process can be facilitated by formation of a “salt bridge” between the basic C-terminal side chain and a carboxylic acid moiety, either at the C-terminus or on a side chain [8–11].

When the number of ionizing protons exceeds the number of strongly basic residues, less energy is needed to mobilize a proton onto the backbone. For example, singly protonated Ala5 can be dissociated more easily than singly protonated ArgAla4 because it lacks the basic Arg side-chain, which sequesters the proton strongly [7]. Wysocki and coworkers noted that although the carbonyl oxygen of a peptide linkage is more basic than the nitrogen site, protonation at oxygen strengthens the peptide bond, while protonation at nitrogen weakens it, thus activating it for cleavage [7]. Thus, backbone cleavage is believed to occur only where a proton resides on an amide nitrogen atom.

Popular software for extracting sequence information from peptide MS/MS spectra, such as SEQUEST [12] and Mascot [13], assumes that peptides fragment with equal probability along the peptide backbone. This is a crude approximation, as demonstrated by Harrison and co-workers for polyalanines [14, 15]. For example, the \( b_4^{+} \) ion (see Figure 1 for nomenclature) is observed as the predominant peptide fragment from Ala5H+. Paizs and Suhai investigated this system theoretically, computing both thermochemistry and kinetics for the competing fragmentation reactions [16]. They found that both basicities and activation barriers favor cleavage of the peptide bond near the C-terminus, supporting the experimental results [14, 15]. Proton migration from the N-terminus (N1, the most basic site in the molecule) to the nth peptide moiety leads either to the \( b_{n-1}^{+} \) or the \( y_{6-n}^{+} \) ion. No chemical explanation was presented for the favorable energetics of protonation near the C-terminus.
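The site-to-fragment bookkeeping stated above for Ala5H+ can be sketched in a few lines. This is only an illustration: the function name `fragments_for_site` is hypothetical, and the generalization from five residues to an arbitrary chain length (replacing 6 with the number of residues plus one) is an assumption that reduces to the stated rule for Ala5H+.

```python
from typing import Tuple

def fragments_for_site(n_residues: int, site: int) -> Tuple[str, str]:
    """Return the complementary (b, y) ion labels for cleavage at amide nitrogen N<site>.

    N1 is the N-terminal amine, so amide nitrogens run from N2 to N<n_residues>.
    """
    if not 2 <= site <= n_residues:
        raise ValueError("amide nitrogen sites run from N2 to N<n_residues>")
    b_index = site - 1                # residues kept on the N-terminal side
    y_index = n_residues + 1 - site   # residues kept on the C-terminal side
    return f"b{b_index}+", f"y{y_index}+"

# Ala5H+: each amide nitrogen and the pair of ions its cleavage can produce.
for site in range(2, 6):
    print(f"N{site}:", fragments_for_site(5, site))
```

Which member of each pair actually carries the charge is a separate question that this mapping does not address.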

Figure 1. Singly protonated Ala5H+ and fragment-ion nomenclature. The ionizing proton occupies the most basic site (N-terminal amine). Upon mobilization it moves onto the peptide backbone, facilitating fragmentation.

Ample computational studies, particularly by Paizs and coworkers, have demonstrated successfully that the computational theory of (pseudo-thermal) peptide fragmentation is the same as for any other thermal reaction in organic chemistry [1]. Because of the larger size of peptides, computations are more difficult and expensive than for small organic compounds, but conventional thermochemical kinetics still applies [17]. Indeed, thermochemistry and kinetics have been used as the basis of a heavily parameterized method for predicting peptide ion fragmentation [18]. Thus, a predictive, ab initio theory for peptide ion fragmentation is available: quantum chemistry provides the parameters needed in detailed kinetics theories, which then yield values for the rate constants that dictate product branching fractions. Unfortunately, this theory is available only in a formal sense. Actual computations are far too slow to be included in a proteomics workflow. Shortcuts are needed.

A similar challenge is found in other areas of chemistry. For example, comprehensive modeling of engineered combustion systems is desired so that computational design might increase the speed, and decrease the cost, of building new systems and accommodating new fuels. Progress is largely through improved shortcuts and approximations to more rigorous methods [19]. The most relevant for the present study are those involving the estimation of rate constants for elementary reactions [20, 21]. Linear free-energy relationships are often effective [22]. In particular, trends in rate constants are often the same as trends in thermochemistry.

In the present study, we make the simplifying assumption that all amide bonds break with equal ease, provided that they are first protonated on the nitrogen atom. This approximation avoids computations of transition structures, which are often difficult and time-consuming. In effect, we assume that the trend in cleavage rates along the backbone is the same as the trend in proton affinities (at nitrogen). This is similar to a hypothesis presented by Savitski et al. that cleavage propensity is proportional to the basicity of backbone oxygen atoms as inferred from a crystallographic database [23]. However, in our computations, the basicity depends upon sequence and conformation, and we attempt to predict fragmentation of individual peptides, not average behavior.

Attention here is limited to several non-tryptic peptides: singly protonated polyalanines ranging in size from Ala3H+ to Ala11H+, and protonated Leu-enkephalin (Tyr-Gly-Gly-Phe-Leu), one of the most studied peptides in mass spectrometry [24]. Relative proton affinities of backbone nitrogen sites are determined, and are discussed in light of previous experimental and theoretical reports [14–16]. We also suggest a structural reason for the trends in proton affinities.

2 Computational Details

Certain commercial materials and equipment are identified in this paper in order to specify procedures completely. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the material or equipment identified is necessarily the best available for the purpose.

Locating low-energy minima of large and flexible molecules, such as peptides, is a challenging task and a topic of active research. To identify low-energy conformations of singly protonated peptides, we developed a procedure using a molecular mechanics-based Monte Carlo method with subsequent quantum-mechanical structure refinement and energy calculations of selected structures. Substantially similar procedures have been developed and used by others [8, 25–27]. Singly protonated structures for \( \mathrm{Ala}_n\mathrm{H}^{+} \) (n = 3–11) and protonated Leu-enkephalin were constructed by using an in-house Perl script. For each peptide, n constitutional isomers were generated by placing the proton on each nitrogen atom in turn. Each isomer was subjected to a conformational search using the Monte-Carlo Multiple Minima (MCMM) method as implemented in the MacroModel program [28, 29]. The OPLS2005 force field [30] was used for these calculations.

To determine a reasonable number of MCMM steps, an initial round of calculations was carried out on Ala5H+ (protonated at N3; see Figure 1 for labeling) with the number of MCMM steps at several values between 1000 and 500,000. No new low-energy minima were obtained when the number of steps exceeded 20,000. Based on this result, the number of steps was chosen to be 50,000. Conformers were discarded if they were at least 200 kJ/mol above the most stable conformer. All other parameters controlling the calculations were set to their default values. This procedure resulted in 600 to 12,000 conformers for each isomer of each chain length.

Since our model is based upon thermodynamics, it assumes that all degrees of freedom are equilibrated. In particular, the mobile proton samples all possible sites in proportion to their Boltzmann weights [31]. It is well known that the cis-trans isomerization barrier in amides is greatly reduced by N-protonation [32]. More quantitatively, for formamide we find from frozen-core MP2/6-311+G(d,p) calculations that the barrier is 68 kJ/mol when neutral, 172 kJ/mol when O-protonated, and 2 kJ/mol when N-protonated. Thus, during the conformational searches, amide C–N bonds and carboxylic hydroxyl groups were allowed to adopt both cis- and trans-conformations. The ramifications of this assumption are discussed in the following section.

OPLS2005 is a well established force field for generating geometries and energies of peptides and proteins. However, it was not parameterized for gas-phase ions, reducing our confidence in its predictions. Therefore, further structural and energetic refinement was done using ab initio methods. For each constitutional isomer, the large set of conformations from the MCMM/OPLS2005 calculations was pruned in stages. Each set was sorted by OPLS2005 energy. All structures below 41.8 kJ/mol, plus 100 structures uniformly distributed between 41.8 and 200 kJ/mol, were subjected to single-point energy evaluations using the Hartree-Fock method and the 3-21G basis set (HF/3-21G). We found the HF/3-21G energies to be well correlated with the OPLS2005 energies (0.98 > linear correlation coefficients > 0.86; see Supporting Information), increasing our confidence in the OPLS2005 energies.
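The first pruning stage can be sketched as follows. This is a minimal illustration, not the authors' script: the function name and arguments are hypothetical, and "uniformly distributed" is interpreted here as evenly spaced ranks within the energy-sorted band, which is an assumption (an even spacing in energy would be another reading).

```python
def select_for_single_points(rel_energies, low_cut=41.8, high_cut=200.0, n_sample=100):
    """Return indices of conformers chosen for single-point energy evaluation.

    rel_energies: force-field energies in kJ/mol, relative to the lowest conformer.
    Everything below low_cut is kept; the band from low_cut to high_cut is thinned
    to at most n_sample members; anything above high_cut is dropped.
    """
    if n_sample < 2:
        raise ValueError("n_sample must be at least 2")
    order = sorted(range(len(rel_energies)), key=lambda i: rel_energies[i])
    low = [i for i in order if rel_energies[i] < low_cut]
    band = [i for i in order if low_cut <= rel_energies[i] <= high_cut]
    if len(band) <= n_sample:
        sampled = band
    else:
        # Assumed meaning of "uniformly": evenly spaced ranks, end points included.
        step = (len(band) - 1) / (n_sample - 1)
        sampled = [band[round(k * step)] for k in range(n_sample)]
    return low + sampled
```

Sampling the higher-energy band, rather than discarding it, is what allows the correlation between force-field and HF/3-21G energies to be checked over the full 200 kJ/mol window.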

The pruned set was sorted by HF/3-21G energy and pruned further to include only conformers with energies below 50 kJ/mol, discarding conformations that were structurally similar. Two conformations were considered similar if none of their independent dihedral angles (including heavy atoms and polar hydrogens) differed by more than 50°. The geometries of the selected conformers were fully optimized at the HF/3-21G level. To increase confidence in the energies (e.g., by considering electron correlation, including van der Waals interactions), each resulting structure was finally subjected to a single-point RI-LMP2/cc-pVDZ energy calculation (localized-orbital, second-order perturbation theory with the resolution-of-the-identity approximation). Relative proton affinities were computed as the difference in RI-LMP2/cc-pVDZ energies. No thermal or zero-point corrections were made in estimating the relative proton affinities, to avoid the expensive computation of vibrational frequencies. Since we are comparing similar isomers, neglecting such effects is a reasonable approximation [33, 34]. For the same reason, we neglect entropic effects and discuss proton affinities (enthalpies) instead of gas-phase basicities (Gibbs energies). The Gaussian software package [35] was used for the Hartree-Fock calculations and the QChem package [36, 37] for the RI-LMP2 calculations.
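The similarity criterion used in this pruning step can be written down directly. The sketch below assumes each conformer is represented by a list of its independent dihedral angles in degrees, in a fixed order; the function names and the choice to keep the lowest-energy member of each similar group are illustrative assumptions.

```python
def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def are_similar(dihedrals_a, dihedrals_b, tol_deg=50.0):
    """True if no corresponding pair of dihedral angles differs by more than tol_deg."""
    if len(dihedrals_a) != len(dihedrals_b):
        raise ValueError("conformers must list the same dihedral angles")
    return all(angle_difference(a, b) <= tol_deg
               for a, b in zip(dihedrals_a, dihedrals_b))

def prune_similar(conformers, tol_deg=50.0):
    """Keep one representative per group of similar conformers.

    conformers: list of (energy, dihedrals) pairs. Conformers are visited in
    order of increasing energy, so the most stable member of each group is kept
    (an assumed tie-breaking rule).
    """
    kept = []
    for energy, dihedrals in sorted(conformers, key=lambda c: c[0]):
        if not any(are_similar(dihedrals, k[1], tol_deg) for k in kept):
            kept.append((energy, dihedrals))
    return kept
```

The periodic wrap in `angle_difference` matters: dihedrals of 170° and -170° differ by 20°, not 340°.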

As a test of the conformational predictions obtained using OPLS2005, a systematic (but not exhaustive) search was done for Ala3H+ (protonated on N2) on the HF/3-21G potential energy surface. There were 5,832 initial structures generated on a coarse grid in (independent) dihedral angle space (120° or 180° increments, as appropriate). Each initial structure was energy-minimized while constrained to its grid point. A subsequent, full optimization was done without any constraints. Among the resulting structures, two were considered identical if their electronic energies, nuclear repulsion energies, geometric-mean rotational constants, and dipole moments were the same within certain tolerances (\( 10^{-5} \) hartree, 0.15 hartree, 0.001 GHz, and 0.03 Debye, respectively). Finally, structures were discarded if any bond had stretched more than 50%, indicating isomerization. There were 1,065 distinct conformers found. The most stable conformer was the same as that found using the force-field-based protocol described above, sustaining our confidence in the OPLS2005-based protocol.
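The grid size and the duplicate test can both be illustrated briefly. Since 5,832 = 3^6 × 2^3, the only combination of 120° steps (three values) and 180° steps (two values) that reproduces the count is six dihedrals of the first kind and three of the second. The starting angles, dictionary keys, and function names below are hypothetical; only the four tolerances come from the text.

```python
from itertools import product

THREEFOLD = (60.0, 180.0, 300.0)  # hypothetical starting values, 120 degree steps
TWOFOLD = (0.0, 180.0)            # hypothetical starting values, 180 degree steps

def grid_points(n_threefold, n_twofold):
    """Iterate over every combination of starting dihedral angles on the coarse grid."""
    axes = [THREEFOLD] * n_threefold + [TWOFOLD] * n_twofold
    return product(*axes)

# Tolerances quoted in the text; the key names are illustrative.
TOLERANCES = {
    "electronic_energy": 1e-5,     # hartree
    "nuclear_repulsion": 0.15,     # hartree
    "rotational_constant": 0.001,  # GHz, geometric mean of the three constants
    "dipole_moment": 0.03,         # debye
}

def same_structure(a, b, tolerances=TOLERANCES):
    """True if all four descriptors of structures a and b agree within tolerance."""
    return all(abs(a[key] - b[key]) <= tol for key, tol in tolerances.items())

def unique_structures(structures):
    """Drop structures that match an earlier one on all four descriptors."""
    unique = []
    for s in structures:
        if not any(same_structure(s, u) for u in unique):
            unique.append(s)
    return unique

print(sum(1 for _ in grid_points(6, 3)))  # 3**6 * 2**3 = 5832
```

Comparing scalar descriptors avoids atom-by-atom structural alignment, at the cost of a small risk that two distinct conformers coincide on all four values.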

3 Results and Discussion

The purpose of this study is to test the hypothesis that the fragment ion distribution reflects the relative stabilities of the corresponding N-protonated isomers. The hypothesis is illustrated by Figure 2. Collisional or radiative activation raises the effective temperature of the ion ensemble, creating small populations of high-energy isomers, including some that are protonated on backbone N atoms. The N-protonated isomers have weakened amide bonds and are poised for fragmentation. As stated above, the energetic barrier to dissociation, denoted E* in Figure 2, is presumed equal for all such isomers. This is the simplifying approximation that permits us to avoid explicit computation of transition structures and barrier heights. Thus, the rate of dissociation at each peptide bond is predicted from the stability of the corresponding N-protonated isomer. There are probably intermediates involved in proton migration, such as O-protonated isomers, but they are not of concern here; final dissociation is presumed to be the rate-limiting step that dictates the product branching fractions.
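The hypothesis can be stated compactly. As a sketch, suppose the N-protonated isomers are populated according to Boltzmann statistics at an effective temperature \( T_{\mathrm{eff}} \), and that each dissociates over the same barrier \( E^{*} \) with the same pre-exponential factor \( A \). The common prefactor and the simple Arrhenius form are assumptions of this sketch; the model as described specifies only the common barrier. Writing \( \Delta E_i \) for the energy of the isomer protonated at site i relative to the N-terminal isomer, the rate and branching fraction at site i would be:

```latex
\[
k_i \;\propto\; A \exp\!\left(-\frac{\Delta E_i + E^{*}}{R\,T_{\mathrm{eff}}}\right),
\qquad
f_i \;=\; \frac{k_i}{\sum_j k_j}
      \;=\; \frac{\exp\!\left(-\Delta E_i / R\,T_{\mathrm{eff}}\right)}
                 {\sum_j \exp\!\left(-\Delta E_j / R\,T_{\mathrm{eff}}\right)} .
\]
```

Because \( E^{*} \) and \( A \) are common to all sites, they cancel in the ratio, so only the relative energies of the N-protonated isomers enter the predicted branching.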

Figure 2. The fragmentation hypothesis examined in this study derives from the well-known mobile proton model, with additional, simplifying assumptions.

For each peptide, \( \mathrm{Ala}_n\mathrm{H}^{+} \) (n = 3–11), the energy was computed for the most stable conformer of each constitutional (N-protonated) isomer using the RI-LMP2/cc-pVDZ//HF/3-21G computational model. Energies are shown in Table 1, relative to the isomer protonated at the terminal amine. There is a clear trend: the most favorable site of amide nitrogen protonation is that neighboring the C-terminus. The one exception is Ala7H+, for which protonation at N5 is more favorable than at N7. There is no obvious reason for n = 7 to be exceptional, and we made extra, unsuccessful efforts to find an N7-protonated isomer of greater stability. We believe this exception does not indicate a real difference, but instead reveals a weakness in our computational protocol.

Table 1. Relative RI-LMP2/cc-pVDZ//HF/3-21G energies (in kJ/mol) of backbone N-protonated isomers of \( \mathrm{Ala}_n\mathrm{H}^{+} \) (n = 3–11) and of Leu-enkephalin; N1 is the terminal amino group. The most favorable peptide site is in boldface.

Except for the N-terminus, all the backbone nitrogen sites are the same functional group and would be expected to have equal proton affinities. Thus, the surprisingly large differences in proton affinity must arise from conformational effects. All HF/3-21G-optimized structures were inspected visually. Interestingly, most structures with the C-terminal amide nitrogen protonated adopt an α-helix (a right-handed helix, where the N–H group of one amino acid forms a hydrogen bond with the C=O group of the amino acid positioned four residues nearer the N-terminus, so-called i+4 → i hydrogen bonding). In contrast, all isomers protonated at the N-terminus (that is, the amino group) favor globular conformations, with the protonated amine solvated by carbonyl oxygens. Thus, a conformational change is required to mobilize the proton from the N-terminus to the C-terminal amide nitrogen. It is known that globule-helix conformational transitions of peptides are rapid [38, 39]. Seemingly at odds with our conclusion, experimental and molecular dynamics (MD) studies have suggested that no helical structures should be expected from protonated Ala15H+ [40]. However, those studies were restricted to the predominant, low-energy isomers. The experimental study was carried out at room temperature (undetectable populations of high-energy isomers) and the MD calculations only considered N-terminal protonation (protons are immobile in a classical MD simulation).

Helix formation is not surprising in these systems. It is known that alanine polypeptides form unusually stable α-helices in the gas phase when they contain a positively charged lysine at the C-terminus [41–44]. Most dramatically, Jarrold and co-workers discovered that Ala15LysH+ and Ala20LysH+ maintain helical conformations up to the fragmentation temperature of 725 K [44]. Quantum chemical computations by Dannenberg and coworkers showed that a positive charge placed at the C-terminus stabilizes polyalanine helices [45]. The stability of the α-helix and its interactions with ions has been rationalized in terms of charge-dipole interactions and polarization, sometimes termed collectively the “macrodipole” [46–50]. Thus, protonation near the C-terminus and helix formation are mutually stabilizing. However, the influence of peptide conformation on the fragmentation pattern has been discussed in the literature only to a minor extent [51, 52].

It should be emphasized that most of the experimental peptide population has the proton located at the most stable position, the N-terminus, at all internal energies (i.e., at all effective temperatures), even at the threshold of dissociation (barring entropic factors). Other low-energy forms, such as many O-protonated isomers, will also be substantially populated. The higher-energy, N-protonated isomers represent only a small fraction of the ion population. For example, using relative energies from Table 1, at an effective temperature of 800 K the Boltzmann weight for the N5-protonated isomer of Ala5H+ is only \( 10^{-5} \). These rare isomers are unlikely to be detected in experiments such as ion mobility spectrometry. Nevertheless, they are of primary importance in fragmentation because they are the gateway isomers for backbone cleavage, according to the mobile proton model.
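The scale of these populations is easy to check numerically. The sketch below assumes a simple Boltzmann factor relative to the N-terminal isomer, with degeneracies and entropic terms neglected as in the text. Function names are illustrative, and the energies passed to `normalized_weights` are hypothetical placeholders, not values from Table 1.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol K)

def boltzmann_weight(rel_energy_kj_mol, temperature_k=800.0):
    """Boltzmann factor of an isomer relative to the reference isomer at zero energy."""
    return math.exp(-rel_energy_kj_mol / (R_KJ * temperature_k))

def energy_for_weight(weight, temperature_k=800.0):
    """Relative energy (kJ/mol) that corresponds to a given Boltzmann factor."""
    return -R_KJ * temperature_k * math.log(weight)

def normalized_weights(rel_energies, temperature_k=800.0):
    """Boltzmann weights normalized over a set of competing isomers.

    rel_energies: dict mapping site label to relative energy in kJ/mol.
    """
    e_min = min(rel_energies.values())  # shift by the minimum for numerical stability
    raw = {site: boltzmann_weight(e - e_min, temperature_k)
           for site, e in rel_energies.items()}
    total = sum(raw.values())
    return {site: w / total for site, w in raw.items()}

# A weight of 1e-5 at 800 K corresponds to roughly 77 kJ/mol above the reference.
print(round(energy_for_weight(1e-5), 1))

# Hypothetical energies (kJ/mol), for illustration only.
print(normalized_weights({"N2": 90.0, "N3": 85.0, "N4": 80.0, "N5": 77.0}))
```

The back-calculated energy is only a consistency check on the quoted weight. At 800 K the product RT is about 6.65 kJ/mol, so a difference of a few kJ/mol between sites already shifts the normalized weights noticeably.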

To explore the effect of conformation upon proton affinity, we investigated linear, extended conformations of Ala11H+ isomers. These structures were generated by constraining the peptide backbone to remain planar during the geometry optimizations. Surprisingly, the trend in backbone proton affinities is the opposite of that described above. The energy increases as the proton is moved from the amide nitrogen near the N-terminus (N2) towards the C-terminus (N11) (see Figure 3, “linear”). The relative energies of fully optimized structures, from Table 1, are included in Figure 3 (“helix”) for comparison. The amide C(O)–N dipoles are aligned in the extended conformation. This is similar to the helical situation but with the dipoles pointed in the opposite direction. Protonation is favored toward the N-terminus of the molecule. Fixed-geometry calculations were done with the neutral, spectator amide NH groups mutated into CH2 groups (see Scheme 1), thus eliminating the C–N dipoles but retaining the perpendicular C=O dipoles. This change erased the stability trend (Figure 3, “linear polyketone”), confirming the effect of the macrodipole in the linear polypeptide.

Figure 3. Relative RI-LMP2/cc-pVDZ//HF/3-21G energies (kJ/mol) of N-protonated isomers of optimized (mostly helical) and extended Ala11H+ and the linear polyketone analogue.

Scheme 1. Removing the effects of spectator amide C–N dipoles.

The electrostatic models are summarized pictorially in Figure 4 (dipolar arrows point from positive to negative). As a numerical test, we computed the dipole moment of neutral Gly11 in both helical and extended conformations, at the MP2/6-31+G(d)//HF/3-21G level. For the helix, the moment is 47 D (1 D ≈ 3.336 × \( 10^{-30} \) C m), with the negative end toward the C-terminus, while for the extended conformation the moment is 25 D, with the negative end toward the N-terminus. For comparison, the extended polyketone has a moment of only 1.2 D. Conformational stability depends upon the interaction between amide dipoles and the monopole of the proton, and upon hydrogen bonding. Dipole cooperativity (polarizability) is important in α-helices [53–56] and may also play a role in stabilizing other conformations as has been suggested, for example, in β-amyloid peptide aggregation [57].

Figure 4. Macrodipole effects in helical peptides, extended peptides, and extended polyketone amides. The alignment of dipoles from individual functional groups determines the overall direction and strength of the electrostatic field.

To proceed beyond the artificial polyalanine systems, we investigated protonated Leu-enkephalin, (Tyr-Gly-Gly-Phe-Leu)H+. Our calculations (included in Table 1) indicate that protonation at N5 is favored by 23 to 26 kJ/mol relative to the other amide nitrogens. The helical structures of protonated polyalanines, as discussed above, should not be surprising because alanine is a known α-helix former [58]. Leu-enkephalin does not contain alanine, but its N5-protonated form shows a helical structure (Figure 5). (For Leu-enkephalin, a large set of 100 low-energy N5-protonated conformers was considered, instead of only 10; the conformer with the lowest energy was absent from the smaller set of 10.) The isomers protonated at N2, N3, and N4 display cation-π interactions between the protonated amide and the aromatic ring of the Tyr residue [22], but these interactions evidently are not strong enough to compete with protonation at N5.

Figure 5. The most stable amide-N-protonated isomer of Leu-enkephalin-H+.

As described above, in our computations we assumed that the mobile proton will catalyze trans-cis isomerization of peptide bonds, so we included both possibilities in our Monte Carlo conformational searches. However, we also considered a modified model in which all peptide bonds are restricted to the trans conformation. The results (see Supporting Information) differ only slightly from those of our more liberal model. In particular, the most stable conformations of the most stable isomers are all trans. Thus, our results reveal nothing about the facility of trans-cis isomerization in peptide ions.

4 Comparison with Experiments

Before comparing our model with experimental observations, it is necessary to point out three difficulties in making the comparison. (1) Our computations are approximate. Even if the underlying model is excellent, our numerical predictions may not be. (2) The theoretical model presented here is intended to identify the sites of fragmentation, but is mute about the ultimate location of the ionizing proton. That is, we say nothing about whether cleavage will result in a b ion, a y ion, or some other ion. The fragment with the higher proton affinity will retain the proton [1, 59], analogously to “Stevenson’s Rule” in electron-ionization mass spectrometry [60], but fragment proton affinities are not part of the model presented here. (3) Experimental spectra include not only the primary product ions that result from fragmentation of the parent peptide ion, but also secondary products that result from subsequent dissociation of fragment ions. For example, b ions fragment sequentially to smaller b ions in the sequence \( b_n^{+} \to b_{n-1}^{+} \to b_{n-2}^{+} \to \ldots \) [14]. Thus, spectra depend upon the amount of energy deposited in the parent ion and the timescale of the experiment, with higher energy (and, therefore, certain instrumentation [61]) favoring smaller fragment ions. To identify the initial fragment ions, double resonance [62] is necessary to establish ion parentage. For example, a CID spectrum of Leu-enkephalin suggested that only half the primary product ions were \( b_4 \), but the correct fraction is 93%, as determined by double-resonance experiments [63]. A reasonable alternative is to measure spectra over a range of excitation energies, that is, to measure the breakdown curve. The branching fractions in the low-energy limit are likely to correspond to primary fragmentation. Unfortunately, data are seldom available from either of these experimental techniques.

The best experiment for comparison with our model is threshold CID, which yields the zero-temperature energy threshold for each primary product ion (and sometimes also for secondary ions) [64–66]. However, data analysis is complex for such experiments; tripeptides are the largest peptides studied so far.

Our model, as presented above, predicts that the intensities of primary fragment ions will correspond to the stabilities of the corresponding N-protonated isomers. For equilibrium abundances or kinetic branching fractions, Boltzmann weights are more appropriate than raw energies. These are listed in Table 2, as derived from the energies in Table 1 assuming an effective temperature of 800 K. For comparison, available experimental data are summarized in Table 3. In Table 3 the fractions indicate the prevalence of fragmentation observed at each peptide bond, irrespective of the identity of the resulting ion. For example, for Ala5H+ the value for N2 includes \( b_2^{+} \), \( y_3^{+} \), and \( a_2^{+} \) together. In the case of Leu-enkephalin, some ions are internal, resulting from the cleavage of two peptide bonds. For such ions, half the intensity is assigned to each peptide site. Unassigned ions are ignored. CID breakdown data are summarized in Table 3 by a value at low energy, with an inequality symbol to indicate the slope of the curve. For example, a value listed as “<15” means that the cleavage site accounts for 15% of ions observed, but with a positive slope, so that a smaller percentage is expected in the low-energy limit. When the slope is close to zero, no trend is specified in the table. No experimental data are available for Ala9H+, Ala10H+, or Ala11H+.
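The bookkeeping used to reduce an experimental spectrum to per-site fractions can be sketched as below. The input format is an assumption: each ion is taken to be already assigned to zero, one, or two cleavage sites. Normalizing to the total assigned intensity is also assumed, and the intensities in the example are made up for illustration.

```python
from collections import defaultdict

def site_fractions(assigned_ions):
    """Percentage of assigned ion intensity attributed to each cleavage site.

    assigned_ions: iterable of (intensity, sites) pairs, where sites is a tuple of
    site labels. An internal ion lists two sites and contributes half its intensity
    to each; an unassigned ion has an empty tuple and is ignored.
    """
    totals = defaultdict(float)
    for intensity, sites in assigned_ions:
        if not sites:
            continue  # unassigned ions are ignored
        share = intensity / len(sites)
        for site in sites:
            totals[site] += share
    grand_total = sum(totals.values())
    if grand_total == 0.0:
        return {}
    return {site: 100.0 * value / grand_total
            for site, value in sorted(totals.items())}

# Made-up intensities: two single-site ions, one internal ion, one unassigned ion.
example = [(60.0, ("N5",)), (20.0, ("N4",)), (20.0, ("N3", "N4")), (50.0, ())]
print(site_fractions(example))
```

Fractions obtained this way are directly comparable with normalized Boltzmann weights, since both sum to unity (or 100%) over the backbone sites.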

Table 2. Normalized Boltzmann weights (\( T_{\mathrm{eff}} \) = 800 K) for competing, amide N-protonated isomers of \( \mathrm{Ala}_n\mathrm{H}^{+} \) (n = 3–11) and of Leu-enkephalin; energies are from Table 1.

Table 3. Experimental branching among backbone cleavage sites.

Comparing Tables 2 and 3, we consider each peptide in turn. For Ala3H+, the computations predict nearly equal fragmentation at N2 and N3, while the observations are that N3 predominates heavily. For Ala4H+, the computations predict N4 to dominate, while experiments show N3 > N4 >> N2. For Ala5H+, the computations predict N5 to dominate, in agreement with four of the five experimental reports. For Ala6H+, the computations predict N6 to dominate; experiments do show N6 predominant, but not as strongly. For Ala7H+, the computational results (which are suspect, as noted above) predict N5 to dominate, but the experiment shows most fragmentation at N7. For Ala8H+, the computations predict N8 to dominate; the experiment shows N8 ≈ N7. For Leu-enkephalin, the computations predict N5 to predominate, in agreement with the experiments. Overall, the computational predictions agree with experiments well enough to support the underlying model.

For all the peptides studied here, which contain only non-polar or weakly polar residues, the most favorable site for backbone N-protonation is that closest to the C-terminus. This isomer adopts a helical conformation and is stabilized by the corresponding macrodipole. It is tempting to speculate that this is a general phenomenon. Additional work is needed to explore the prevalence of this conformational, tautomeric pattern.

In proteomics, it is tryptic peptides that are most interesting, that is, peptides bearing either arginine or lysine at the C-terminus. However, the present study includes no tryptic peptides. Further work is necessary to determine whether the fragmentation of tryptic peptides is correlated with the energetics of N-protonated isomers, as it is for the non-tryptic peptides studied here.

5 Conclusions

The mobile proton model suggests that backbone fragmentation is preceded immediately by the migration of a proton to an amide nitrogen position. This suggests, in turn, that the thermodynamic stability of competing N-protonated isomers will dictate the sites of backbone cleavage. The evidence for polyalanines and Leu-enkephalin supports this hypothesis. For these systems, a helical macrodipole favors protonation adjacent to the C-terminus and predominance of \( b_{n-1}^{+} \) ions.