Large-Scale Examination of Factors Influencing Phosphopeptide Neutral Loss during Collision Induced Dissociation
Collision-induced dissociation (CID) remains the predominant mass spectrometry-based method for identifying phosphorylation sites in complex mixtures. Unfortunately, the gas-phase reactivity of phosphoester bonds results in MS/MS spectra dominated by phosphoric acid (H3PO4) neutral loss events, suppressing informative peptide backbone cleavages. To understand the major drivers of H3PO4 neutral loss, we performed robust nonparametric statistical analysis of local and distal sequence effects on the magnitude and variability of neutral loss, using a collection of over 35,000 unique phosphopeptide MS/MS spectra. In contrast to peptide amide dissociation pathways, which are strongly influenced by adjacent amino acid side chains, we find that neutral loss of H3PO4 is affected by both proximal and distal sites, most notably basic residues and the peptide N-terminal primary amine. Previous studies have suggested that protonated basic residues catalyze neutral loss through direct interactions with the phosphate. In contrast, we find that nearby basic groups decrease neutral loss regardless of mobility class, an effect only seen by stratifying spectra by charge-mobility. The most inhibitory bases are those immediately N-terminal to the phosphate, presumably because of steric hindrances in catalyzing neutral loss. Further evidence of steric effects is shown by the presence of proline, which can dramatically reduce the presence of neutral loss when between the phosphate and a possible charge donor. In mobile proton spectra, the N-terminus is the strongest predictor of high neutral loss, with proximity to the N-terminus essential for peptides to exhibit the highest levels of neutral loss.
Keywords: Phosphorylation, Neutral loss, Collision induced dissociation, Phosphopeptide
Reversible protein phosphorylation is involved in the regulation of virtually all aspects of cellular function, with more than 10,000 human proteins known to be phosphorylated. Phosphorylation networks, comprised of complex pathways of kinases, phosphatases, and interacting regulatory proteins, enable the cell to adapt rapidly to diverse environmental stimuli by transduction of extracellular signals to the nucleus. Mass spectrometry has become the primary technology for large-scale phosphoproteomics, allowing thousands of phosphorylation sites to be monitored simultaneously and providing a global snapshot of complex signaling network responses to diverse biological stimuli. Unfortunately, the unusual chemistry of the phosphoester bond ensures that only a small percentage of these phosphopeptides can be identified by the predominant ion trap dissociation method, collision-induced dissociation (CID). Furthermore, the exact amino acid position of the phosphate can be localized in only a minority of identified peptides. Phosphoester bonds in phosphoserine (pSer) and phosphothreonine (pThr) are highly labile relative to other bonds in peptides. As a result, phosphopeptide CID spectra are frequently dominated by a single peak corresponding to the neutral loss of phosphoric acid, H3PO4. Neutral losses from fragment ions further complicate the spectra by adding peaks, which confound commonly used search engines for phosphopeptide identification.
CID MS3 sequencing of the dominant H3PO4 precursor neutral loss ion yields a higher proportion of sequence-specific fragments. Several groups have found, however, that performing MS3 may lower identification rates, likely due to reduced sampling rate. Consequently, most identifications come from MS2 scans or multistage activation scans that combine information from both levels of fragmentation. Electron transfer dissociation (ETD) is a complementary fragmentation method that tends to preserve the phosphoester bond and results in greater fragmentation across peptide backbone sites, but works well only for a subset of cationic phosphopeptides with high charge density. Incidentally, CID performs well for localizing these same types of peptides, limiting the complementarity of ETD.
Loss of 98 Da in an MS2 spectrum is a positive indicator of the presence of phosphorylation, barring a few caveats. In practice, however, the degree of neutral loss is highly variable and in some cases not present at all, depending on the peptide sequence. The inability to predict neutral loss levels in CID MS2 from sequence alone reduces the effectiveness of phosphopeptide identification by common search engines by at least 40%. Accurate models to predict the fragmentation of phosphopeptides would enable more sophisticated and discriminative methods for phosphopeptide identification. However, the gas-phase chemistry of phosphopeptides is poorly understood. Peptide fragmentation during CID is generally explained according to the mobile proton model, which assumes that fragmentation occurs either by charge directed mechanisms that require a proton at the site of cleavage, or charge remote mechanisms that do not require protonation but may involve participation of neighboring side chains. Thus, the pattern of fragmentation is strongly influenced by whether the peptide contains “mobile” protons that are not sequestered by basic residues and are, consequently, free to migrate around the peptide.
The inverse correlation between neutral loss of H3PO4 and charge mobility in phosphopeptide MS2 has been well known for over a decade. This led Tholey et al. to propose a charge remote beta-elimination reaction of phosphate from pSer or pThr to form dehydroalanine or dehydrobutyric acid, respectively. More recent work has shown strong experimental and computational evidence that neutral loss proceeds by a charge directed mechanism involving nucleophilic attack by a backbone carbonyl oxygen to form an oxazoline ring [13, 14, 15, 16], which would require protonation of the phosphate. Intuitively, this would predict higher neutral loss in mobile proton phosphopeptides, contrary to previously observed trends. To address this discrepancy, formation of a stable hydrogen bonded structure between the phosphate and a protonated basic residue, usually arginine, in place of a mobile proton, has been proposed for conferring electropositive character to the phosphate when mobile protons are not present. These groups have also found that, although loss of HPO3 alone is uncommon, there is a significant competing pathway in which HPO3 is lost from the phosphosite with concurrent or sequential loss of water from elsewhere in the peptide [15, 17, 18].
Further evidence for charge directed, ring-forming mechanisms was found by several groups that independently identified neutral-loss-dependent cleavage of the backbone [19, 20, 21]. This diagnostic cleavage occurs between the α-carbon and amide carbon, one bond N-terminal to the ring that was formed by neutral loss. The resulting ion has been variously described as an “x-type ion”, a “y + 10 ion”, and a neutral loss. The diagnostic ion is seen both in neutral loss from pSer and pThr and during neutral loss of water from the non-phosphorylated cognates. Harrison also observed an additional ion attributed to a neutral loss mechanism analogous to oxazoline formation, but leading to a larger cyclized product incorporating one or more residues. Although the influence of surrounding amino acid side chains on dissociation events was not systematically explored, these studies point to previously unappreciated mechanisms underlying neutral loss in phosphopeptides that are presently not accounted for by current peptide identification algorithms.
Previous studies of phosphopeptide fragmentation mechanisms are largely based on small sets of phosphopeptide MS2, limiting their applicability for developing a general model of phosphopeptide neutral loss. In this study, we comprehensively examine neighboring residue effects that determine rates of neutral loss in phosphopeptide CID MS2, using 34,057 spectra from public spectral databases supplemented by spectra obtained in our laboratory from biological and synthetic sources. Using a robust nonparametric statistic based on changes in the quantile distribution of neutral loss levels conditioned on local sequence features, we show that immediately adjacent residues and those up to seven residues distal to the phosphosite influence the total amount of observed H3PO4 neutral loss. Distal basic sites, most notably the N-terminus, show strong effects on neutral loss and suggest mechanisms contrary to the mobile proton model, in which immobile protons participate in charge directed mechanisms by forming secondary structures.
Phosphopeptide libraries were commercially synthesized by solid phase synthesis on Wang resin (Genscript, Piscataway, NJ, USA for libraries 1–10 or Anaspec, Fremont, CA, USA for libraries 11–13) (Supplementary Figure 1). Degenerate sites in library peptides were generated by adding a mixture of amino acids in certain coupling steps (Genscript) or by splitting the pool of peptides, performing parallel couplings, and recombining (Anaspec). The libraries contain either 60 or 720 expected unique sequences with between one and four phosphorylation sites. The library sequences were designed with the following principles: no peptides within the library should be isobaric, sequence correlation should be minimized, and peptides should bear similar characteristics to those observed in biological datasets. Libraries were received as lyophilized powders. Peptides were solubilized with 5% formic acid, 95% water solution, agitated for approximately 2 min, then diluted to 0.1% formic acid prior to LC/MS/MS analysis.
WM239A Dataset: Samples from Cellular Extract
Phosphopeptide samples were obtained from the human melanoma cell line, WM239A. Cells were lysed in boiling SDT buffer (100 mM Tris pH 7.6, 4% SDS, 100 mM DTT) and lysate was sonicated for 15 s. Buffer exchange, iodoacetamide cysteine alkylation, and trypsin digestion were performed by the filter-aided sample preparation (FASP) method. Peptides were desalted on Oasis HLB columns (Waters, Milford, MA, USA) and dried by vacuum centrifugation. Peptides were fractionated prior to phosphopeptide enrichment, using electrostatic repulsion hydrophilic interaction chromatography (ERLIC). Briefly, peptides were solubilized in 70% acetonitrile, 20 mM ammonium formate (pH 2.2), and loaded onto a 4.6 mm × 150 mm PolyWAX LP column (PolyLC) at 1 mL/min using an Agilent 1100 HPLC (Agilent Technologies, Santa Clara, CA, USA). Peptides were eluted as follows, collecting 1 mL fractions: 0–5 min with buffer A (70% acetonitrile (MeCN), 20 mM ammonium formate pH 2.2), 5–15 min with a linear gradient to 100% buffer B (10% MeCN, 20 mM ammonium formate pH 2.2), 15–20 min linear gradient to 100% buffer C (10% MeCN, 1 M ammonium formate, pH 2.2), 15–20 min linear gradient to 100% buffer D (10% MeCN, 1% TFA), 20–24 min wash with buffer D, followed by re-equilibration of the column with buffer A. Fractions were concentrated by vacuum centrifugation to 12.5 μL.
Phosphopeptides were enriched using a batch method with titanium dioxide beads (Titansphere, GL Sciences, Tokyo, Japan) . ERLIC fractions were diluted to a final volume of 400 μL with loading buffer (65% acetonitrile, 2% TFA, 140 mM glutamic acid). Titanium dioxide beads were washed in 65% acetonitrile, 2% TFA, followed by a wash with loading buffer, and added at a peptide-to-bead ratio (w/w) of 1:20 and rotated 15 min at room temperature. Beads were washed once with loading buffer, once with 65% acetonitrile, 0.5% TFA, and twice with 65% acetonitrile 0.1% TFA. Beads were resuspended in 0.1 mL of 65% acetonitrile 0.1% TFA and packed onto the top of a 200 μL C8 Stagetip (ThermoFisher, Waltham, MA, USA). Phosphopeptides were eluted with 100 μL of 20% acetonitrile, 1% NH4OH into a receiving tube with 20 μL of 25% acetonitrile, 1% TFA to neutralize the pH. Remaining phosphopeptides bound to the C8 resin were eluted with two 100 μL volumes of 65% acetonitrile, 1% NH4OH into the same tube. Samples were dried by vacuum centrifugation.
Dried phosphopeptides were solubilized in 0.1% formic acid and directly injected onto a BEH C18 column (25 cm × 75 μm i.d., 1.7 μm bead, 100 Å pore size, Waters, part no. 186003545) on a 2D nanoAcquity system (Waters) in direct injection mode. Peptides were eluted with a linear gradient from 95% buffer A (0.1% formic acid) to 30% buffer B (0.1% formic acid in acetonitrile) in 120 min at a flow rate of 300 nL/min. Mass spectrometry analysis was performed on an LTQ-Orbitrap (ThermoFisher). Survey scans were collected in the Orbitrap at 60,000 resolution (at m/z 300), and MS/MS sequencing was performed by CID in the LTQ in data-dependent mode, using monoisotopic precursor selection and rejecting singly charged and unassigned precursors for sequencing. The 10 most intense ions were targeted. After two observations of a peptide, dynamic exclusion of ±10 ppm mass lasting 180 s was applied. The maximum injection time for MS survey scans was 500 ms with one microscan and AGC = 1 × 10⁶. For LTQ MS/MS scans, maximum injection time for survey scans was 250 ms with one microscan and AGC = 1 × 10⁴. Peptides were fragmented by CID for 30 ms in 1 mTorr of N2 with a normalized collision energy of 35% and activation q = 0.25.
MS/MS spectra were extracted with readw (ver. 4.3.1) and searched with Mascot (ver. 2.2, Matrix Science) against a human IPI 3.27 protein database allowing up to two missed tryptic cleavages, with carbamidomethyl-cysteine as a fixed modification and methionine oxidation, N-terminal pyroglutamic acid (Gln), N-terminal acetylation, and phosphorylation on Ser, Thr, and Tyr as variable modifications. Precursor m/z error was 20 ppm and fragment m/z error was 0.4 Da. A custom software pipeline was used to extract identifications from Mascot DAT files. Phosphopeptide identifications were accepted at 1% FDR at the peptide level determined by separate search of a database with reversed protein sequences. To ensure correct identifications, we further filtered for spectra that had a Mascot delta score of greater than 5, including relative to other phosphate localizations. This approximately equated to filtering on an Ascore of 20 with 94% of spectra having an Ascore of greater than 22. From these samples, we derived a dataset of 5749 unique precursor ions containing a single pSer or pThr. All accepted ions were doubly or triply protonated.
PhosphoPep Data Set: Spectra from Public Spectral Database
The PhosphoPep project at the Institute for Systems Biology (ISB) provides a public database currently containing more than 30,000 phosphopeptide spectra obtained from several experiments. The database currently contains data derived from yeast, C. elegans, fly, and human samples. The exact experimental details vary; however, in general, tryptic phosphopeptides were enriched from cellular extract and then analyzed by LC/MS/MS on either an LTQ-Orbitrap or LTQ-FTICR. Spectra are denoised. Replicates matched to the same peptide sequence and charge state are combined to create consensus spectra. Fragment ions are annotated using the SpectraST toolset. In this study, all of the included spectra are assumed to be correct identifications. We obtained a database of 34,057 unique doubly or triply protonated peptide ions containing a single pSer or pThr. Similar results are obtained when using only the most confident 50% of spectra (xCorr score greater than 3.0, data not shown); however, the consequent reduction in the number of spectra reduces the statistical significance of the results, though all discussed trends remain significant. We found that 84% of the included spectra were well localized (Ascore greater than or equal to 22), and the major results remained the same after filtering. We determined, however, that since identification and localization correlated with the amount of neutral loss, it was better to use the unfiltered data as misidentifications are expected to be unbiased.
Two important caveats for the use of consensus spectra became apparent in the analysis of the PhosphoPep data. Consensus spectra in this case were obtained through the averaging of multiple observed spectra of the same peptide. First, the spectra varied widely in the amount of noise and the quality of annotation. When analyzing spectra that deviated greatly from observed trends in neutral loss, many proved to be the result of questionable annotation or especially noisy spectra. Removal of the lowest signal-to-noise spectra from the dataset, however, significantly lowered the number of spectra available. Second, the consensus spectra in the PhosphoPep data showed 12% lower neutral loss than the unprocessed spectra collected in our lab, an effect that we speculate is the result of peak voting procedures that emphasize low-abundance ions and dampen high-abundance ions. This effect appears to be general to consensus spectra and is observed in other libraries that do not contain phosphopeptides (Supplementary Figure 5). Creating data mining metrics that are robust to these artifacts is essential to the effective use of these rich repositories of data.
Peptide spectra were stored in a custom-built ion-centric relational database (Supplementary Figure 2). For analysis, the sequence of identified peptides was encoded in a phosphosite-centric manner with the phospho-residue at position 0. The distances of a phosphosite to each of the termini, the identity of the phosphorylated residue, and the identity of each residue from one to 13 residues N-terminal to the phosphosite (positions −1 to −13) and one to 13 residues C-terminal to the phosphosite (positions 1 to 13) were recorded. The termini were annotated one residue position distal to the terminal residues. Since peptides are of variable length, many peptides do not extend to all 27 residue positions in this encoding. For instance, the peptide XXpSXX has the N-terminus at the −3 position and amino acids at the −2, −1, 1, and 2 positions, whereas all other positions are outside of the sequence. Any position that is not within the peptide sequence is annotated as missing (‘-’), thus placing all peptide sequences regardless of length on the same scale. In addition to the sequence information, the charge of the precursor ion was recorded.
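The phosphosite-centric encoding described above can be sketched in a few lines of Python. This is only an illustration, not the database code used in the study: the function name `encode_phosphosite`, the 0-based `site` index, and the terminus markers `[` and `]` are assumptions introduced here; only the ±13 window, the placement of the termini one position beyond the terminal residues, and the ‘-’ missing symbol come from the text.

```python
def encode_phosphosite(sequence, site, flank=13,
                       n_term="[", c_term="]", missing="-"):
    """Map positions -flank..+flank (phosphosite at 0) to residue symbols.

    sequence : peptide as one-letter codes, without modification marks
    site     : 0-based index of the phosphorylated residue (assumed input)
    n_term / c_term : hypothetical marker symbols for the two termini
    """
    encoding = {}
    for pos in range(-flank, flank + 1):
        idx = site + pos
        if 0 <= idx < len(sequence):
            encoding[pos] = sequence[idx]      # residue inside the peptide
        elif idx == -1:
            encoding[pos] = n_term             # one position past the first residue
        elif idx == len(sequence):
            encoding[pos] = c_term             # one position past the last residue
        else:
            encoding[pos] = missing            # outside the peptide
    return encoding


# The XXpSXX example from the text: phosphoserine at index 2.
example = encode_phosphosite("XXSXX", 2)
```

For the XXpSXX example, `example[-3]` holds the N-terminus marker, positions −2 to 2 hold residues, and every position beyond ±3 holds the missing symbol, so peptides of any length share the same 27-position scale.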
Neutral Loss Analysis
Peaks were annotated with fragment ion assignments using a priority system similar to that used in Sun et al. Peaks were initially identified with a stringent tolerance of ±0.125 Thomson (Th) plus 200 parts per million (ppm). This mass window was determined from analysis of a few dozen problematic spectra to sensitively match most expected peaks; however, it often missed peaks that had been merged by the centroiding algorithms, most notably ammonia and water losses from multiply charged ions. Identifications matching within this window were then assigned to the ion with the following priority: neutral loss from the precursor, singly charged b- or y-ions, singly charged a-ions provided the corresponding b-ion was present, singly charged b- or y-ions with a single neutral loss provided the intact b- or y-ion was present, singly charged a-ions with a single neutral loss provided the corresponding a-ion was present, singly charged b- or y-ions with multiple neutral losses provided the intermediate neutral loss ions were present. This list was then repeated for doubly and triply charged ions. If no matches were found, the mass tolerance was increased to 0.250 Th plus 400 ppm to account for poorly centroided peaks, and the process was repeated. If multiple identifications of the same priority were possible, multiple identifications were assigned. The total neutral loss in a spectrum is the total intensity annotated as having lost phosphoric acid divided by the total identified intensity. Peaks that have ambiguous neutral loss state had their intensities divided between the identifications.
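As a rough illustration of the two quantities defined above, the following Python sketch computes the m/z-dependent matching window and the total neutral loss fraction of a spectrum. The function names and the peak representation (an intensity plus one boolean per assigned identification) are assumptions made for this example, and splitting an ambiguous peak equally among its identifications is one plausible reading of "divided between the identifications", not a confirmed detail of the original pipeline.

```python
def match_tolerance(mz, relaxed=False):
    """Half-width of the matching window in Th at a given m/z.

    Stringent pass: 0.125 Th + 200 ppm; relaxed pass: 0.250 Th + 400 ppm.
    """
    base, ppm = (0.250, 400.0) if relaxed else (0.125, 200.0)
    return base + mz * ppm * 1e-6


def neutral_loss_fraction(peaks):
    """Intensity annotated as having lost H3PO4 over total identified intensity.

    peaks : iterable of (intensity, assignments), where assignments is a list
            of booleans, one per identification, True if that identification
            has lost phosphoric acid. Unidentified peaks have an empty list.
    """
    lost = 0.0
    total = 0.0
    for intensity, assignments in peaks:
        if not assignments:
            continue                            # unidentified peaks are ignored
        share = intensity / len(assignments)    # assumed equal split
        for has_loss in assignments:
            total += share
            if has_loss:
                lost += share
    return lost / total if total > 0 else 0.0


# Hypothetical spectrum: one loss peak, one intact peak, one ambiguous peak,
# and one unidentified peak.
toy_peaks = [(100.0, [True]), (50.0, [False]), (50.0, [True, False]), (30.0, [])]
toy_fraction = neutral_loss_fraction(toy_peaks)
```

In the toy spectrum, 125 of the 200 identified intensity units carry a phosphoric acid loss, giving a fraction of 0.625; the unidentified 30-unit peak does not enter either sum.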
Visualizing Neutral Loss Patterns
A P value for the significance of the observations can be obtained from a two-tailed binomial distribution. This P value corresponds to the likelihood that a result at least as extreme as the observed would be obtained by chance, if the criterion had no effect on neutral loss. Since we are simultaneously testing 22 amino acids at each of 14 positions around the phosphate, the P value was adjusted for 308 multiple hypotheses using the Sidak method .
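The multiple-testing adjustment above is compact enough to write out. The sketch below, in Python, pairs a two-tailed exact binomial P value with the Šidák adjustment for 308 hypotheses. The function names are hypothetical, the null probability `p0` is left as a parameter because the text does not state it, and the two-tailed rule used here (summing all outcomes no more probable than the observed one) is a common convention that may differ from the authors' exact procedure.

```python
from math import comb, expm1, log1p


def binomial_two_tailed(k, n, p0):
    """Two-tailed exact binomial P value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null success probability p0 (assumed convention).
    """
    pmf = [comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1.0 + 1e-7)   # small slack for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= threshold))


def sidak_adjust(p, m=308):
    """Sidak-adjusted P value for m independent tests: 1 - (1 - p)^m."""
    if p >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p is very small
    return -expm1(m * log1p(-p))


# 22 amino acids at each of 14 positions gives the 308 hypotheses in the text.
n_hypotheses = 22 * 14
adjusted = sidak_adjust(1e-4, n_hypotheses)
```

With 308 tests, a raw P value of 1e-4 adjusts to roughly 0.03, slightly below the Bonferroni bound of 308 × 1e-4 = 0.0308, which is the expected relationship between the two corrections.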
The spectra were further divided by proton mobility. Mobile proton spectra were defined as having a charge greater than the sum of the number of arginines, lysines, and histidines in the sequence. Immobile proton spectra were defined as having a charge less than or equal to the number of arginines in the sequence. All other spectra were classified as partially mobile proton. Once these subsets of spectra were generated, they were binned by tercile (likely with different cutoffs from before) and tested for odds ratio, as above.
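The three mobility classes follow directly from residue counts and precursor charge, as in this minimal Python sketch. The function name and the returned labels are illustrative choices; the thresholds are the ones stated in the paragraph above.

```python
def classify_mobility(sequence, charge):
    """Assign a proton mobility class from sequence and precursor charge.

    mobile   : charge >  (#Arg + #Lys + #His)
    immobile : charge <= #Arg
    partial  : everything else
    """
    n_arg = sequence.count("R")
    n_basic = n_arg + sequence.count("K") + sequence.count("H")
    if charge > n_basic:
        return "mobile"
    if charge <= n_arg:
        return "immobile"
    return "partial"


# Doubly protonated AQISpSPNLR (one Arg, no Lys or His) falls in the mobile class.
example_class = classify_mobility("AQISSPNLR", 2)
```

Because the arginine count can never exceed the total basic residue count, the two tests cannot both be true, so the order of the checks does not affect the result.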
Evaluation of the Types and Amounts of Neutral Loss Observed
When neutral loss from precursor ions and b-ion and y-ion series were treated separately, trends strongly deviated from the overall distribution of neutral loss (Figure 1b–d). Remaining intact peptide was not observed (Figure 1b), as would be expected because of resonant activation of the precursor ion. Interestingly, neutral loss of H3PO4 was more than 3-fold higher for b-ions compared with y-ions. Only 22% of y-ion intensity shows neutral loss, compared with 72% for b-ions, confirming trends suggested by a previous statistical learning study that examined over 3000 spectra. This indicates that ions that have undergone neutral loss form a product that is more susceptible to b-ion formation, or that b-ions are more likely to undergo neutral loss. One potential explanation is the difference in composition of b- and y-ions in tryptic peptides, namely the absence of a C-terminal basic residue in b-ions. This difference in basicity would lower mobility of the remaining proton, or result in gas-phase interaction with the resulting H3PO4. However, the dramatic difference in b- and y-ion neutral loss persisted when ions were partitioned by the presence of basic residues. Importantly, this effect is not a result of the bias toward basic residues at the C-terminus, as it persists in peptides with non-tryptic C-termini and those constrained to have N-terminal arginine or lysine (Supplementary Figure 3). One implication of the greater observed stability of phosphate on y-ions in CID spectra is that precise localization of phosphorylation sites relies more heavily on y-ion series, since observation of intact phosphoresidues is required for unambiguous localization.
Variation in Neutral Loss Between Peptides
Models that estimate neutral loss based on global averages have been useful for increasing the number of phosphopeptide identifications when incorporated into search algorithms. However, if neutral loss rates vary widely between peptide MS2 in a sequence-dependent manner, such methods are likely to introduce biases against outliers to the main trend. To assess global variability of H3PO4 loss in phosphopeptide CID, we examined the distribution of neutral loss over all spectra in the WM239A dataset (Figure 1e). For peptides containing either pSer or pThr, the observed neutral loss of H3PO4 varies from undetectable to accounting for the entire signal. pSer shows slightly more neutral loss than pThr, as noted previously. Interestingly, this trend is opposite to that of the analogous neutral loss of water from unmodified serine and threonine, suggesting that the loss of water and loss of H3PO4 have different rate-limiting factors. That pSer, although less basic than pThr, shows higher neutral loss suggests that proton availability is not rate-limiting in the neutral loss of H3PO4, as would be predicted for charge directed mechanisms involving direct protonation of the phosphate. Although the difference between neutral losses from pSer and pThr is useful for inferring mechanism, the magnitude of the difference is very small compared with the overall variation in neutral loss. Thus, other factors must contribute to the variability of H3PO4 loss.
Previous studies have noted an inverse correlation between proton mobility and the amount of H3PO4 neutral loss in small sets of phosphopeptide MS2, with lower charge mobility correlating with higher neutral loss. To quantify this effect over a significantly larger data set, we examined the distribution of neutral loss in spectra stratified by proton mobility (Figure 1f). The differences between mobility groups accounted for 24% of the observed variance in neutral loss, a much greater effect than was observed between pSer and pThr. However, the variance within each group is still large, suggesting other factors are important in loss of H3PO4.
The majority of the difference in neutral loss between charge mobility classes is the result of differences in neutral losses from the parent, the ‘M-98’ peak and its derivatives (Figure 1g). All classes of charge mobility demonstrate similar levels of sequence ions displaying neutral loss (Figure 1h). Sequence ions displaying neutral loss are especially problematic for peptide identification, since the presence of multiple correlated ion series poses a significant risk of spurious matching for identification algorithms. Anecdotally, the spectra that suffered the worst depression of identification scores were those that displayed intermediate levels of neutral loss, meaning no one series was dominant.
Effects of Local Sequence on Neutral Loss
We next examined whether local sequence influences neutral loss. Although the WM239A dataset was nearly twice as large as the largest previous study on phosphopeptide neutral loss, we found that statistically meaningful estimates of sequence effects required much larger sets of phosphopeptide MS2. The Institute for Systems Biology maintains the PhosphoPep project as a repository of MS2 spectra obtained from large-scale phosphoproteomics studies. By extracting spectra containing a single pSer or pThr from this data, we developed a dataset containing 34,057 unique phosphopeptide spectra, more than 10 times larger than in previous studies.
Residues immediately adjacent to the phosphosite show the strongest effect on neutral loss of H3PO4. We examined all mobility classes in aggregate, observing that when proximal to pSer or pThr, glycine, basic amino acids, and acidic amino acids appear to increase neutral loss (Figure 2, left panel). Proline reduces neutral loss, but other aliphatic side chains increase neutral loss when immediately adjacent. Threonine, serine, carbamidomethyl-cysteine, glutamine, and asparagine all reduce neutral loss when N-terminal to the phosphosite. The effects of distal residues were surprisingly important. Twelve of the 20 amino acids significantly affect neutral loss even when only considering positions more than five residues away from the phosphosite.
Simpson’s Paradox: Controlling for Charge Mobility, Proximal Basic Residues Suppress H3PO4 Neutral Loss
The number of basic residues in a protonated peptide negatively correlates with charge mobility and, as shown previously in Figure 1, neutral loss increases with decreasing charge mobility. Thus, the positive effect of nearby basic residues on neutral loss could simply reflect a correlation with previously shown effects of charge mobility. To assess this possibility, we repeated the flanking residue analysis with the data stratified by charge mobility (Figure 2, center and right panels, and Supplementary Figure 4). The lack of available data from immobile proton spectra reduces the number of results that attain significance after multiple testing corrections (Supplementary Figure 4). The only significant results for immobile proton cases show that arginine at the −1 position strongly inhibits neutral loss and that proximity to the C-terminus enhances neutral loss. Mobile and partially mobile cases are observed at high enough frequency that many significant effects can be observed.
Surprisingly, in contrast to the positive effect of proximal basic residues on neutral loss in the global analysis, mobile and partially mobile MS2 show decreased neutral loss when basic residues are adjacent to the phosphoresidue. Separating the effects by charge mobility reveals that the basic residue effect observed in the global analysis is reversed within each mobility subgroup, driven by an underlying correlation between the presence of basic residues and the charge mobility of the peptides. This counterintuitive trend is an example of Simpson’s paradox, which arises when a correlation present in the aggregate population disappears or reverses in each and every subgroup upon stratification. We propose that basic residues directly slow neutral loss rates when proximal to the phosphoresidue. However, in non-mobile proton MS2, the lack of available protons slows the aggregate of other backbone fragmentation pathways to a greater degree, increasing apparent neutral loss of H3PO4 due to reduced fragmentation pathway competition. Thus, the previously published claims that basic residues increase the probability of H3PO4 neutral loss may be based on a statistical artifact of aggregating data across mobility classes.
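The reversal described above is easy to reproduce with a toy contingency table. The Python sketch below uses counts that are entirely invented for illustration (they are not taken from the PhosphoPep or WM239A data) and only two strata for brevity. It shows how a feature can lower the rate of high neutral loss within every mobility class yet appear to raise it in the pooled data, simply because the feature is concentrated in the class with the higher baseline.

```python
# (spectra with high neutral loss, total spectra); ALL COUNTS ARE HYPOTHETICAL
toy_strata = {
    "mobile":   {"basic": (10, 100),  "no_basic": (180, 900)},
    "immobile": {"basic": (540, 900), "no_basic": (70, 100)},
}


def stratum_rate(strata, stratum, group):
    """Fraction of high-neutral-loss spectra within one stratum and group."""
    high, total = strata[stratum][group]
    return high / total


def pooled_rate(strata, group):
    """Fraction of high-neutral-loss spectra after pooling all strata."""
    high = sum(strata[s][group][0] for s in strata)
    total = sum(strata[s][group][1] for s in strata)
    return high / total


# Within each stratum, a nearby basic residue lowers the rate...
within = {s: (stratum_rate(toy_strata, s, "basic"),
              stratum_rate(toy_strata, s, "no_basic")) for s in toy_strata}
# ...but pooling reverses the direction of the apparent effect.
pooled = (pooled_rate(toy_strata, "basic"), pooled_rate(toy_strata, "no_basic"))
```

In this toy table the rates are 10% versus 20% in the mobile stratum and 60% versus 70% in the immobile stratum, while the pooled rates are 55% versus 25%, because most spectra with a nearby basic residue sit in the low-mobility, high-loss stratum.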
By controlling for the shifts in overall backbone fragmentation caused by changes in proton mobility, we can more closely examine the direct variation in neutral loss caused by basic residues. The only observed trend with basic residues that was universal across all charge mobility classes is that basic residues at the −1 position strongly inhibit neutral loss. We hypothesize that this effect occurs because the protonated basic residue will form a very stable hydrogen-bonded structure with the phosphate, producing steric hindrance that inhibits formation of an oxazoline and subsequent neutral loss. In mobile proton spectra, basic residues from the −2 to the +5 position inhibit neutral loss, although some of these are not significant after multiple-testing correction. We suspect that proximity of basic residues to the phosphoester facilitates hydrogen bond formation, leading to stable hydrogen-bonded phosphates. These bonded complexes prevent the phosphate from interacting with mobile protons or less stable charge donors, such as the N-terminus, which could better catalyze the neutral loss reaction. As proton mobility decreases, the presence of nearby basic residues is less detrimental to neutral loss, suggesting that absent other reactive acidic sites, the hydrogen bonding of protonated bases to the phosphate is less inhibitory. At the lowest levels of proton availability, the base-catalyzed mechanisms may be the prevalent method of neutral loss.
The N-Terminus is the Primary Driver of Neutral Loss in Mobile Proton Peptides
Enhancement of neutral loss by N-terminal proximity suggests involvement of the N-terminal amine group in the mechanism of H3PO4 loss. However, peptide length, proximity to the C-terminus, or other correlated factors could also explain the observed effects. To assess the importance of the N-terminal amine, we examined 37 mobile proton phosphopeptides from the WM239A dataset with acetylated N-termini. Acetylation of the N-terminus reduces the basicity of the N-terminus. Consistent with the hypothesis of direct action, the acetylated peptides displayed on average 42% less neutral loss than the rest of the data. Interestingly, the distribution of neutral losses from the acetylated b-ions is more similar to that of amino-N-terminated y-ions than amino-N-terminated b-ions (Figure 3b). Thus, the bias of high neutral loss from b-ions is partially explained by the effects of proximity to the N-terminus. The effects of N-terminal acetylation imply that direct action by the N-terminal amine is responsible for the activation of neutral loss.
Interaction of a charged N-terminus and the phosphoester group might bring the neighboring N-terminal amide carbonyl in proximity to the α-carbon of the phosphorylated residue, facilitating nucleophilic attack and formation of large cyclic structures after elimination, similar to those that Harrison proposed in neutral loss of water from unmodified serine and threonine. To assess this possibility, we looked for the x-type ion, here denoted (x + 2), derived from cleavage at the first alpha carbon–carbonyl carbon bond, reported previously as a marker for this macrocyclic cleavage. Although less common than the x-type ion immediately N-terminal to the site of phosphorylation, peaks consistent with macrocycle x-type ions were found in MS/MS of several peptides. The spectrum of doubly protonated AQISpSPNLR shows an especially expressive example (Figure 3c).
Proline Provides a Competitive Pathway and Reduces Backbone Flexibility Required for Distal Basic Interactions
Proline’s five-membered ring reduces the flexibility of peptide backbones, creating steric hindrance and preventing the formation of cyclic conformations, or “loops,” which facilitate intramolecular charge solvation. Given the evidence that neutral loss is enhanced by structures that bring the phosphate into proximity with immobilized protons, we hypothesized that proline’s direct reduction of neutral loss was caused by inhibiting loop formation. To examine this possibility, we plotted the magnitude of inhibition as a function of position relative to the phosphosite (Figure 4c). In peptides with mobile protons, proline shows strong inhibition when N-terminal to the phosphosite, consistent with interference with formation of loops required for interaction between the phosphate and the N-terminus. Proline slightly inhibits neutral loss when C-terminal to the phosphosite or when it is more than five residues N-terminal. In peptides with partially mobile protons, the presence of proline reduces neutral loss regardless of whether the proline is C-terminal or N-terminal to the phosphate. This suggests that, in addition to the N-terminal enhancement observed with mobile protons, loops that allow the phosphate to interact with the basic residue at the C-terminus encourage neutral loss in partially mobile cases. In immobile cases, proline shows the most significant inhibition of neutral loss when C-terminal to the phosphosite. This can be rationalized under the loop-forming hypothesis, since the N-terminus is expected to be uncharged in these peptides, leaving the tryptic C-terminus as the only site that is generally protonated.
Aspartic and Glutamic Acid
Aspartic and glutamic acid both increase the amount of neutral loss from position −6 to +5. It is possible for a carboxylic acid to initiate neutral loss through nucleophilic attack, yielding an acid anhydride, which would rapidly eliminate phosphoric acid under CID conditions. This has been proposed as the mechanism for neutral loss of HPO3 in pTyr [36, 37]. Reactions of this kind would be compatible with previous experiments which indicate that a competing neutral loss reaction pathway exists in which HPO3 is lost from the phosphoresidue concomitantly with the loss of water from elsewhere in the peptide. This reaction would be isobaric with, and indistinguishable from, neutral loss of phosphoric acid. Acid-mediated direct attack would elegantly explain both the experimental prevalence of the combined pathway and the rarity of observing the loss of only HPO3. Alternatively, a carboxylate group could directly donate a proton to catalyze the reaction or simply compete with the phosphate for hydrogen bonding with basic sites, thus disrupting conformations that stabilize the phosphate rather than catalyze neutral loss.
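The isobaric relationship follows directly from elemental composition: HPO3 plus H2O contains exactly the atoms of H3PO4. As a minimal arithmetic check, the sketch below sums standard monoisotopic atomic masses (rounded to five decimals). The helper name is hypothetical and the snippet is not part of the study's analysis.

```python
# Monoisotopic atomic masses in Da (standard values, rounded to 5 decimals).
MONOISOTOPIC = {"H": 1.00783, "O": 15.99491, "P": 30.97376}


def neutral_mass(composition):
    """Sum monoisotopic masses for a composition such as {'H': 3, 'P': 1, 'O': 4}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


h3po4 = neutral_mass({"H": 3, "P": 1, "O": 4})
hpo3 = neutral_mass({"H": 1, "P": 1, "O": 3})
h2o = neutral_mass({"H": 2, "O": 1})

print(f"H3PO4      : {h3po4:.4f} Da")
print(f"HPO3 + H2O : {hpo3 + h2o:.4f} Da")
```

Both lines give roughly 97.98 Da, so the combined pathway and direct loss of phosphoric acid produce the same mass shift and cannot be separated by mass alone.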
To test the plausibility of direct action by acidic residues, we examined the phosphopeptide libraries for cases where a single acidic residue replaced another residue. Contrary to the strong enhancement suggested by the analysis of the PhosphoPep database, substitution incorporating single acidic residues has minimal effect on the observed amount of neutral loss. Indeed, single replacements tend to slightly lower the observed neutral loss rather than increase it. This is in agreement with the observations of Cui et al., who found that mutating acidic residues did not significantly affect neutral loss. It should be noted that the presence of one acidic residue greatly increases the chances of finding more acidic residues because of the presence of acidophilic kinase motifs in biological data sets. Thus, the increased neutral loss observed for nearby acidic residues in the global analysis shown in Figure 2 may be an aggregate effect of multiple acidic residues rather than an effect particular to that position. We attribute the rise in the rate of neutral loss to the creation of an extensive hydrogen bond network within the peptide that is able to modulate or stabilize the transfer of protons to the phosphate.
Neutral Loss of Water and Ammonia from Threonine and Other Amino Acids Interferes with the Loss of H3PO4
While aspartic and glutamic acid showed a tendency to increase neutral loss, other amino acids with propensity for loss of water or ammonia from their side chains reduced the observed neutral loss of H3PO4, especially when these amino acids are N-terminal to the phosphate and in peptides with mobile protons (Figure 2). Because this N-terminal bias is similar to the N-terminal enhancement of neutral loss of H3PO4, we hypothesized that a similar mechanism leads to analogous loss of water from modified and unmodified Ser and Thr, along with the loss of water and ammonia from asparagine, glutamine, and carbamidomethyl-cysteine.
To determine what characteristics might be unique to these ions showing very low neutral loss of H3PO4, we examined the sequences of these peptides. When there is a mobile proton, peptides that exhibit high water neutral loss (fraction of signal showing neutral loss of water greater than 0.4) and low phosphoric acid neutral loss (less than 0.2) contain 17% more serines and 85% more threonines than the average peptide in the dataset. Threonines that are N-terminal to the site of the phosphate are especially enriched, being more than twice as prevalent as in other peptides. The number of threonines N-terminal to the phosphate explains 3.6% of the variance in the neutral loss of H3PO4 from mobile proton phosphopeptides. This competition explains the general decrease of neutral loss of phosphoric acid when there is a threonine N-terminal to the phosphate (Figure 2). As an example of this effect, the spectra of AGGPTpTPLSPTR and AGGPApTPLSPTR, which differ only by the substitution of threonine for alanine at position five, are shown in Figure 5b. The addition of the threonine adds a highly active pathway for the neutral loss of water, as evidenced by the appearance of very strong M-18 and M-116 ions. This shows that the mechanism for the reduction of neutral loss of H3PO4 is competition with the loss of water.
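The selection criteria in this paragraph can be sketched in a few lines. The Python below is a hedged illustration, not the code used for the analysis: it assumes a hypothetical annotation in which a lowercase `p` precedes the single phosphorylated residue, and the function names are invented for this example. Only the thresholds (0.4 and 0.2) and the two peptide sequences come from the text.

```python
def n_terminal_threonines(annotated):
    """Count unmodified Thr residues N-terminal to the phosphosite.

    Assumes a hypothetical notation where a lowercase 'p' precedes the single
    phosphorylated residue, e.g. 'AGGPTpTPLSPTR'.
    """
    return annotated[:annotated.index("p")].count("T")


def is_water_dominated(water_fraction, h3po4_fraction,
                       water_min=0.4, h3po4_max=0.2):
    """Apply the thresholds quoted in the text: high water loss, low H3PO4 loss."""
    return water_fraction > water_min and h3po4_fraction < h3po4_max


for peptide in ("AGGPTpTPLSPTR", "AGGPApTPLSPTR"):
    print(peptide, n_terminal_threonines(peptide))
```

The two example peptides differ by one N-terminal threonine (1 versus 0), which is the quantity related above to the variance in H3PO4 neutral loss.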
In this work, we present a large statistical study of the effects of peptide sequence on the neutral loss of phosphoric acid from phosphopeptides under CID. Although stereotypical images of phosphopeptide spectra involve dominant peaks representing loss of phosphoric acid from the precursor, it is exceedingly rare for more than two-thirds of ion current to be sequestered in that peak. In practice, the degree of neutral loss is highly variable, with spectra presenting as many as six competing neutral loss series. Here we show that much of the variability can be explained by the peptide sequence near the phosphate. The data suggest that, as has been shown previously, neutral loss is primarily catalyzed by charge pairing between the phosphate and nearby bases. However, small fluctuations in the nature of this pairing can have dramatic effects on the resulting neutral loss. Consequently, nearby bases may counterintuitively reduce neutral loss through the formation of stable charge solvation on the phosphate. Neutral loss is preferential if less stable pairings, most notably with the N-terminus, are possible. These effects may be further modulated by reduced peptide flexibility because of proline and direct competition with the neutral loss of other small molecules.
Previous studies have struggled to reconcile the charge-directed model of H3PO4 neutral loss with the observation that neutral loss pathways decrease when protons become more available [9, 12, 13]. The paradox has been explained by direct catalysis by protonated basic residues via noncovalent interactions between the phosphate and nearby arginines. At first glance, this view appears to be supported by our data: when examined over all spectra, the presence of basic residues increases the amount of neutral loss. However, in an example of Simpson’s paradox, when we stratify the spectra by charge-mobility, we find evidence for the converse, that nearby bases inhibit neutral loss regardless of mobility class. Furthermore, these effects were not limited to adjacent residues but extended across the entire peptide. Additionally, the N-terminus exerts a strong positive effect on H3PO4 neutral loss when both proximal and distal to the phosphorylation site, making it the primary predictor of neutral loss in mobile proton peptides.
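The reversal described here is easy to reproduce with a toy calculation. In the sketch below, every count and mean is synthetic and chosen only to illustrate the effect. None of the numbers are taken from the study, and the class labels are simplified to two hypothetical strata.

```python
# Synthetic (count, mean neutral-loss fraction) values chosen only to illustrate
# the reversal; they are NOT taken from the study.
groups = {
    ("mobile", "no_base"): (90, 0.30),
    ("mobile", "base"): (10, 0.20),
    ("nonmobile", "no_base"): (10, 0.70),
    ("nonmobile", "base"): (90, 0.60),
}


def pooled_mean(base_status):
    """Count-weighted mean neutral loss across all mobility classes."""
    selected = [(n, m) for (_, status), (n, m) in groups.items() if status == base_status]
    total = sum(n for n, _ in selected)
    return sum(n * m for n, m in selected) / total


def stratum_effect(mobility):
    """Mean neutral loss with a nearby base minus without, within one class."""
    return groups[(mobility, "base")][1] - groups[(mobility, "no_base")][1]


pooled_effect = pooled_mean("base") - pooled_mean("no_base")
print(f"pooled effect of base: {pooled_effect:+.2f}")
for mobility in ("mobile", "nonmobile"):
    print(f"{mobility} effect of base: {stratum_effect(mobility):+.2f}")
```

Because base-containing peptides in this toy data fall mostly in the low-mobility class, where neutral loss is higher overall, the pooled comparison shows an apparent increase even though the base lowers neutral loss within each class.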
These observations suggest a model in which the phosphate coordinates with protons that are immobilized at basic sites. Because these complexes are constitutively present, the basicity of the phosphate itself becomes unimportant in determining the rate of neutral loss. By this model, the rate is dominated by the basicity of the hydrogen bonded basic group. This gas-phase basicity then determines the partial charge that resides on the phosphate and, consequently, the susceptibility of the phosphorylated side chain to nucleophilic attack. The N-terminus, therefore, becomes the ideal driver of neutral loss because it is the least basic site that is protonated with high occupancy, making it the ideal catalyst. This equilibrium may be further modified by the participation of other side chains, such as by proton donation from the carboxylic acids on Glu and Asp. Because these base-mediated complexes would be stable in the presence of mobile protons, the reaction rate of H3PO4 neutral loss increases more slowly with charge availability than the rate of backbone fragmentation.
The neutral loss of water from Ser and Thr is inversely correlated with the loss of H3PO4 from pSer and pThr, suggesting that the formation of stable complexes between protonated bases and the alcohol side chains is less favorable. Consequently, the formation of the complex becomes rate-limiting, while even small partial charges on the Ser or Thr side chain are enough for the neutral loss of water. Interaction with the N-terminus still appears to be an important determinant for these neutral losses and the competition for charge pairing with the terminus defines the neutral loss pattern. It further appears that other amino acids with side chains capable of neutral loss show effects when near the N-terminus, suggesting that interaction of side chains with immobilized protons may be the general mechanism of neutral loss in peptides.
It is worth noting that neutral loss reactions facilitated by intramolecular interactions with charge donors cannot be usefully described by the mobile proton model. The mobile proton model posits three reaction mechanisms: charge remote mechanisms, charge directed mechanisms involving a mobile proton, and charge directed mechanisms involving a proton immobilized at the site of the reaction. We suggest that neutral loss reactions are catalyzed by a proton that is immobilized distal to the reaction site; however, by the adoption of secondary structures, it interacts with the reaction site. Under this model, the peptide acts more like an irregular solvation shell around the proton than a series of discrete protonation sites. While the conjugation of charge between multiple bases is important in all CID reactions, the evidence here suggests that it may be especially important for neutral losses. This may explain some of the difficulties in the prediction of neutral loss using kinetic models based on the mobile proton model [40, 41, 42].
This study demonstrates the importance of using large datasets to achieve the statistical significance necessary to detect sequence-based factors that influence neutral loss propensity. Large datasets permit testing of multiple hypotheses at the scale necessary to guard against inflation of type I errors without excessively lowering sensitivity. While obtaining large libraries of spectra locally is prohibitively difficult, we exploited the recent availability of large curated MS2 phosphopeptide libraries, which enabled large-scale analysis of local sequence effects on H3PO4 neutral loss. Robust, nonparametric statistics were required to overcome the reduction in control of both data acquisition and data processing inherent in publicly available curated data.
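The trade-off between type I error control and sensitivity can be illustrated with a per-test significance threshold. This section does not state which correction was applied, so the sketch below uses the Šidák adjustment purely as an assumed example (Šidák's paper appears in the reference list), with Bonferroni shown for comparison. The function names and the family-wise level of 0.05 are illustrative choices.

```python
def sidak_alpha(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at
    family_alpha, assuming the tests are independent."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)


def bonferroni_alpha(family_alpha, n_tests):
    """Simpler, slightly more conservative per-test level."""
    return family_alpha / n_tests


for m in (1, 10, 100):
    print(m, sidak_alpha(0.05, m), bonferroni_alpha(0.05, m))
```

The per-test threshold shrinks quickly as the number of hypotheses grows, which is why large spectral libraries are needed to retain sensitivity when many positions and residues are tested at once.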
Peptide identification algorithms can be improved with more accurate models of MS/MS intensities predicted with empirical kinetic models of peptide fragmentation [28, 43]. Accurate prediction of neutral loss levels of H3PO4 in phosphopeptide CID is problematic, even with sophisticated kinetic models, likely due to the distal sequence effects documented here. The additional ions and poorly understood effects on product ion abundances attributable to these neutral loss pathways lower the accuracy of conventional peptide identification algorithms, since the neutral loss provides information that is at best uninformative and usually confounding. This proves especially true with algorithms that attempt to use the intensity patterns in fragment ions to infer sequence information. The effects of neighboring residues could be incorporated easily into existing statistical or kinetic models to improve the prediction of the amount of neutral loss. Furthermore, knowledge of site-specific variations in expected neutral loss could be incorporated into phosphorylation site localization methods to increase confidence levels. By these methods, the variability of neutral loss becomes informative in phosphopeptide identification rather than a detriment.
This work was supported by NIH grant R01 CA155453 (W.M.O.). The authors acknowledge the assistance of Veronica Bierbaum in preparing and editing this manuscript.
- 3. Old, W.M., Shabb, J.B., Houel, S., Wang, H., Couts, K.L., Yen, C.Y., Litman, E.S., Croy, C.H., Meyer-Arendt, K., Miranda, J.G., Brown, R.A., Witze, E.S., Schweppe, R.E., Resing, K.A., Ahn, N.G.: Functional proteomics identifies targets of phosphorylation by B-Raf signaling in melanoma. Mol. Cell. 34, 115–131 (2009)
- 29. Sidak, Z.: Rectangular confidence regions for means of multivariate normal distributions. J. Am. Stat. Assoc. 62, 626–633 (1967)