1 Introduction

The modification of proteins on serine and threonine residues with β-N-acetylglucosamine (O-GlcNAc) is an emerging and dynamic post-translational modification (PTM) ubiquitously found on metazoan proteins. It was first discovered by Torres and Hart in 1984 [1], and is found on a wide range of cytoplasmic and nuclear proteins [2]. Further, it is known to be associated with several human diseases [3, 4], including neurodegenerative pathologies [3], type II diabetes [3], as well as cancer [4]. Recent technological progress in O-GlcNAc analytics has, by and large, focussed on biochemical enrichment approaches. Notably, this may be achieved by lectin affinity chromatography [5, 6] or using a chemoenzymatic method in which a β-1,4-galactosyltransferase is used to attach a biotinylated galactose to the endogenous O-GlcNAc moiety [7, 8].

Following some form of enrichment, the discovery of O-GlcNAc modified peptides and proteins is greatly aided by tandem mass spectrometry [6, 9], and reports on the discovery of this modification on individual proteins are increasing at a rapid rate. However, despite recent advances in instrumentation, the mass spectrometric analysis of O-GlcNAc peptides is still difficult and mainly hampered by the substoichiometric occupancy of O-GlcNAc sites [10–12] and by the chemical lability of the O-glycosidic bond in the gas phase [13–16]. Under typical collision-induced dissociation (CID) conditions, O-GlcNAc modified peptides readily lose the GlcNAc moiety, and spectra are typically dominated by intense neutral loss species as well as the GlcNAc oxonium ion (m/z 204.0866) and further fragments thereof [14]. The GlcNAc oxonium ion is isobaric with those of other N-acetylhexosamine epimers (e.g., GalNAc) and is therefore commonly referred to as the HexNAc oxonium ion. The intense HexNAc oxonium ion has long been known as a diagnostically useful reporter ion [10, 13, 17] and has been used, e.g., in precursor ion scanning experiments on triple quadrupole [10] and quadrupole-time-of-flight mass spectrometers [18]. Unfortunately, the reporter ion is only occasionally observed in ion trap CID spectra because of the poor recovery of fragment ions in the low m/z range [19]. This can be overcome by pulsed Q dissociation (PQD) in the ion trap [20] or so-called higher energy collisional dissociation (HCD) in a conventional multipole collision cell [21] on an LTQ Orbitrap XL mass spectrometer. Still, the dominant cleavage of the O-glycosidic bond strongly reduces the occurrence of sequence-informative peptide fragment ions, which in turn impedes peptide identification and O-GlcNAc site localization.
Alternative activation methods that enable sequencing the underlying peptide are neutral loss-triggered MS3 (NL-MS3) [5], multistage activation (MSA) [22], electron-capture dissociation (ECD) [23], and electron-transfer dissociation (ETD) [24]. The latter two preserve labile post-translational modifications, thereby facilitating both the identification of O-GlcNAc modified peptides and localization of the PTM site [5–8, 25]. One published report used a combination of fragmentation methods in which a CID step is followed by ETD fragmentation of the same precursor if the neutral loss of the HexNAc moiety is present in the CID spectrum (NL-ETD) [9]. Similarly, it would be possible to combine CID and HCD (NL-HCD), but this has not yet been published for O-GlcNAc peptides.

In light of the wide range of fragmentation techniques available on a single mass spectrometric platform (i.e., the LTQ Orbitrap XL ETD), it is timely to revisit which fragmentation technique or combination thereof offers particular advantages for the identification and site localization of O-GlcNAc modified peptides. In fact, no systematic study on O-GlcNAc peptides has yet been published using fragmentation techniques available on a hybrid ion trap-Orbitrap instrument. To this end, we evaluated nine different tandem MS acquisition schemes for their ability to identify O-GlcNAc peptides and to localize their PTM sites using a library containing 72 synthetic glycopeptides. As a result of this comparison, we developed a two-stage approach for the analysis of O-GlcNAc peptides, facilitating the detection of such peptides by PQD at low collision energy, and the identification and site localization by ETD. Based on a set of O-GlcNAc-specific fragment ions, we further developed a scoring scheme that is able to discriminate O-GlcNAc peptide spectra from unmodified ones with 95% sensitivity and >99% specificity. The two-stage approach allows detection and identification of O-GlcNAc peptides at the low fmol level in a complex proteomic background and is 10-fold more sensitive than a typical data-dependent ETD experiment.

2 Material and Methods

2.1 O-GlcNAc Standard Peptides and Proteins

Peptides were synthesized on a MultiPep peptide synthesizer (Intavis, Cologne, Germany) using standard Nα-Fmoc solid-phase peptide chemistry. A cytosolic protein extract from exponentially growing E. coli was spiked with different amounts (1:10, 1:100, 1:500, 1:1000 wt/wt) of bovine α-crystallin, followed by trypsin digestion and C18 purification prior to LC-MS/MS analysis. (For details, see supplemental methods.)

2.2 Nano-Liquid Chromatography-Tandem Mass Spectrometry

Mass spectrometry was performed on an LTQ Orbitrap XL ETD mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) connected to a nanoLC Ultra 1D+ liquid chromatography system (Eksigent, Dublin, CA) using an in-house packed precolumn (20 mm × 75 μm ReproSil-Pur C18; Dr. Maisch, Germany) and nanocolumn (200 mm × 50 μm ReproSil-Pur C18; Dr. Maisch). The mass spectrometer was equipped with a nano-electrospray ion source (Proxeon Biosystems, DK) and the electrospray voltage was applied via a liquid junction. (For details, see supplemental methods.) All measurements were performed in positive ion mode. Intact peptide mass spectra were acquired at a resolution of 30,000 (at m/z 400) and an automatic gain control (AGC) target value of 10⁶, followed by fragmentation of the most intense ions, a dynamic exclusion of fragmented precursor ions for 20 s, exclusion of singly charged ions and ions without assigned charge state from fragmentation (unless otherwise stated), and internal on-the-fly recalibration using the “lock mass” option. Full scans were acquired in profile mode, whereas all tandem mass spectra were acquired in centroid mode. A complete description of all tandem MS experiments employed in this study can be found in Table S1 in the supplemental methods.

2.3 Peaklist Generation and Database Search

Peak processing and peak picking of MS data were performed using Mascot Distiller ver. 2.2.1 (Matrix Science, London, UK). Briefly, (1) un-centroiding of tandem MS spectra and (2) precursor charge state re-calculation were enabled, (3) tandem MS spectra of singly charged precursors were discarded, (4) the minimum number of peaks per tandem MS spectrum was set to three, and (5) isotope fitting was disabled for the mass range below m/z 205. A brief description of the data processing for NL-MS3, NL-HCD, and NL-ETD experiments is available in the supplemental methods. Resulting peaklists were searched using the Mascot search engine ver. 2.2.04 (Matrix Science) against the complete NCBInr database (02/16/2007, 4,626,804 entries) with sequences of synthetic peptides appended. Search parameters included a precursor tolerance of 10 ppm and a fragment tolerance of 0.5 Da for linear ion trap spectra. HCD spectra were searched with a fragment tolerance of 0.05 Da. Enzyme specificity was set to trypsin, and up to two missed cleavage sites were allowed. Further parameters accounted for the misassignment of the monoisotopic peak (up to the second isotopic peak), for variable modifications by O-GlcNAc (203.0794 Da at serine or threonine), methylation (14.0157 Da at the C-terminus), and, in the case of the α-crystallin spiked E. coli samples, carbamidomethylation (57.0215 Da at cysteine). Except for ETD experiments, the O-GlcNAc modification definition is crucial for the successful Mascot database search of O-GlcNAc peptide spectra. The neutral losses of 203.0794 and 221.0899 Da were defined as both fragment and precursor neutral loss. Moreover, the HexNAc oxonium ion and its fragments (m/z 126.0550, 138.0550, 144.0655, 168.0655, 186.0761, 204.0866) were ignored for Mascot scoring. The database search results were imported into Scaffold ver. 2.6.02 (Proteome Software, Portland, OR).

2.4 Scoring MS Spectra for the Selective Extraction of Candidate O-GlcNAc Precursors

The raw mass spectrometric data of PQD and HCD experiments were processed using the Mascot Distiller software and parameters exactly as described above. For the extraction of potential O-GlcNAc precursors from the resulting mgf file, an in-house written Perl script was utilized. Briefly, the Perl script parses the mgf file and annotates every peak in a spectrum with its intensity rank and normalized intensity. Based on the precursor m/z value and the precursor charge state, the script further calculates the expected m/z values for the neutral loss of the HexNAc moiety (∆m 203.0794 Da) and the loss of the HexNAc oxonium ion (∆m 204.0866 Da, and charge z-1). For the computation of the OScore according to (1), the normalized intensities of the reporter ions (m/z 126.0550, 138.0550, 144.0655, 168.0655, 186.0761, 204.0866) and sugar loss ions are first divided by their intensity rank and then summed up if they are within a user-specified m/z tolerance (e.g., 10 ppm for HCD, 0.3 Da for PQD). The Perl script exports the list of candidate O-GlcNAc precursors along with the OScore, including the accurate precursor mass, charge state, as well as retention time, which was further used to assemble an inclusion list for targeted experiments. Precursors with an OScore below 2.0 (lower OScores indicate stronger evidence for O-GlcNAc) were included in the inclusion list. Probability computations were performed separately using Microsoft Office Excel 2007 (Microsoft Corporation, Redmond, WA).
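The expected m/z values of the two sugar-loss ions follow directly from the precursor m/z and charge state. The following Python sketch illustrates the arithmetic only; it is not the in-house Perl script, and the function name and the example precursor (m/z 594.83, 2+) are assumptions chosen for illustration.

```python
HEXNAC_NEUTRAL_LOSS = 203.0794  # Da, mass lost as neutral HexNAc
HEXNAC_OXONIUM = 204.0866       # m/z of the singly charged HexNAc oxonium ion


def sugar_loss_mz(precursor_mz, charge):
    """Return (neutral-loss m/z, oxonium-loss m/z) for a precursor.

    The oxonium-loss value is None for singly charged precursors, because
    losing the charged oxonium ion would leave no charge on the peptide.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Neutral loss: the mass drops by 203.0794 Da, the charge is unchanged.
    neutral_loss = precursor_mz - HEXNAC_NEUTRAL_LOSS / charge
    # Oxonium loss: the ion loses 204.0866 Da and one charge (z - 1).
    oxonium_loss = None
    if charge > 1:
        oxonium_loss = (precursor_mz * charge - HEXNAC_OXONIUM) / (charge - 1)
    return neutral_loss, oxonium_loss


# Hypothetical doubly charged precursor, for illustration only.
nl, ox = sugar_loss_mz(594.83, 2)
print(f"neutral loss: {nl:.4f}  oxonium loss: {ox:.4f}")
```

For a doubly charged precursor, the oxonium-loss ion is singly charged and therefore appears above the precursor m/z, whereas the neutral-loss ion appears 101.54 m/z units below it.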

3 Results

3.1 Systematic Investigation of Tandem MS Methods

The two-stage LC-MS/MS strategy for the analysis of O-GlcNAc peptides developed in this work is shown in Figure 1. It consists of (1) a discovery LC-MS/MS run for the detection of potential O-GlcNAc peptides using low collision energy PQD, (2) the selective extraction of O-GlcNAc candidates from tandem MS spectra based on a novel spectrum scoring scheme, and (3) a targeted ETD experiment for O-GlcNAc peptide identification and site localization. The strategy was inspired by results from a systematic evaluation of nine different tandem MS methods available on an LTQ-Orbitrap XL ETD instrument. For this investigation, we synthesized a library of glycopeptides with precisely known O-GlcNAc sites using a simple randomization approach (Figure S1). The O-GlcNAc peptide library covers a mass range from 1115 to 1286 Da, and represents a heterogeneous set of doubly-, triply-, and quadruply charged peptides ranging from very hydrophilic to hydrophobic and from highly basic to acidic peptides (Figure S2). According to the extracted ion chromatograms of all 72 O-GlcNAc peptides, the dynamic intensity range of the O-GlcNAc library spans almost three orders of magnitude (Figure S3). Of the 72 possible permutations, 65 glycopeptides could be identified using PQD and ETD tandem MS methods followed by Mascot database search and manual inspection of spectra (see supplemental spectra).

Figure 1

Analytical strategy for detecting and identifying O-GlcNAc-modified peptides

Example tandem mass spectra of the peptide LSGgTYFKAK are depicted in Figure 2 and illustrate the merits of each technique. Owing to the chemical lability of the O-glycosidic bond in the gas phase, CID spectra (Figure 2a) are dominated by three signals: the HexNAc oxonium ion at m/z 203.98, the neutral loss of the HexNAc moiety from the precursor (m/z 493.29), and the charge-reduced precursor ion that lost the HexNAc oxonium ion. Although the relative intensities of these three signals may vary substantially between different peptides, they usually represent the most intense fragment ions and typically constitute more than 70% of the entire signal in the tandem mass spectrum. Consequently, little, if any, of the available signal corresponds to sequence-specific peptide fragment ions, which in turn renders peptide identification from these spectra very difficult. Furthermore, these peptide fragments do not generally retain the O-GlcNAc moiety, thus eliminating direct evidence for the O-GlcNAc modification and site localization from CID spectra. Further fragmentation of the HexNAc neutral loss product by NL-MS3 (Figure 2b) or MSA (Figure 2c) significantly increases the yield of peptide fragment ions. However, because the neutral loss of the HexNAc group leaves a plain serine or threonine residue at the previously modified O-GlcNAc site, it is impossible to deduce the modification site from NL-MS3 or MSA fragments should more than one possible site exist within the sequence.

Figure 2

Example tandem MS spectra of the O-GlcNAc peptide LSGgTYFKAK. Fragments of the HexNAc oxonium ion are marked with an asterisk

PQD and HCD provide access to the full fragment mass range on an LTQ Orbitrap instrument, and hence enable detection of the HexNAc oxonium ion (Figure 2d and e, respectively). Overall, PQD and HCD spectra of O-GlcNAc peptides are quite comparable to CID spectra and hence suffer from similar shortcomings with respect to peptide identification and O-GlcNAc site localization. However, HCD and, occasionally, PQD fragmentation give rise to further intense peaks below m/z 204, which are fragments of the HexNAc oxonium ion [26] (Figure S4, Table S2). In contrast to the aforementioned activation types, ETD preserves the O-GlcNAc-modification on every peptide fragment ion, thus allowing direct O-GlcNAc site localization (Figure 2f). However, ETD spectra often exhibit intense non-dissociated electron-transfer products. This can be overcome by supplemental activation of the charge-reduced species [27] resulting in richer spectra than ETD alone. Since the additional radiofrequency pulse does not adversely affect the O-GlcNAc modification, but significantly increases the intensity of peptide fragment ions supporting peptide identification and site localization, supplemental activation was used for all further experiments involving ETD (Figure S5).

Searching triplicate LC-MS/MS data from all nine activation types by Mascot identified 48 O-GlcNAc peptides from the library with Mascot ion scores greater than 25 (Table 1, Tables S4, S5 and S6). However, the success of identification between individual approaches varied significantly. As shown in Table 1, our results indicate superior performance of PQD with 39 O-GlcNAc peptide identifications followed by ETD with 33 identifications, whereas the least successful approaches were those involving Orbitrap detection of fragment ions (HCD and ETD [FT]), which only identified 19 and 10 O-GlcNAc peptides, respectively. In total, 1371 tandem mass spectra were matched to the O-GlcNAc peptide library (Table S6). Of these, 304 spectra point to peptides with multiple serine or threonine residues, which carry the risk of false O-GlcNAc-site assignments. Striking differences of the O-GlcNAc site localization accuracy exist between techniques involving collisional fragmentation and ETD. Both ETD and ETD (FT) achieve the highest accuracy in O-GlcNAc site assignments (90%–100% correct site localization), while the non-ETD approaches lead to a fairly random assignment of O-GlcNAc sites to serine or threonine residues using Mascot (20%–50% correct site localization).

Table 1 Comparison of Nine Different Acquisition Modes for O-GlcNAc Peptide Identification and Site Localization

Recently, the Mascot Delta Score (MD-score) has been introduced as a simple method for confident phosphorylation site assignment [28]. Likewise, the MD-score can be applied to O-GlcNAc spectra to increase confidence in O-GlcNAc site assignments (i.e., high MD-score) and to identify O-GlcNAc site assignments for which evidence from tandem mass spectra is lacking (i.e., low MD-score). While the average MD-score for ETD is 21.0, it is only 0.8 for non-ETD approaches. In addition, the MD-score for 44% of all non-ETD spectra is 0, indicating that no decision at all about site localization could be made in these cases. It became clear from this systematic investigation that none of the compared tandem MS approaches was particularly successful in O-GlcNAc peptide identification as well as site localization. We concluded that decoupling O-GlcNAc peptide detection from identification and site localization might improve the analysis of O-GlcNAc peptides because it would allow combining the best features of each acquisition mode.

3.2 Detecting O-GlcNAc Peptides with an Optimized Discovery Experiment

Even though CID-type experiments on an LTQ Orbitrap instrument were inappropriate for site localization, the fragment ions involving the sugar moiety can be highly diagnostic. In particular, PQD and HCD provide access to the full fragment mass range, enabling the detection of the HexNAc oxonium ion and its fragments. However, the selectivity of the HexNAc oxonium ion may be compromised by numerous possible interfering peptide fragment ions of very similar mass (Table S3). We synthesized the peptide QCPSYFQAK with and without O-GlcNAc on the serine residue. In addition to the oxonium ion (m/z 204.0866), this peptide can give rise to an a2(QC) fragment ion (m/z 204.0801). This allowed us to investigate how resolution, mass accuracy, and collision energy affect the specificity of detection of the HexNAc oxonium ion in the presence of potential interfering ions. Although sulfhydryl groups are typically blocked by alkylation in proteomics experiments, we chose the a2(QC) fragment for investigation because it is (along with the isobaric a3 fragment of the amino acid combination AGC) the only regular peptide fragment within 50 ppm of the mass of the oxonium ion.

Owing to the low resolution of ion trap tandem MS spectra, the a2(QC) and HexNAc oxonium ions cannot be distinguished by PQD (mass difference 32 ppm). In contrast, this is possible by HCD provided that resolution and/or mass accuracy are sufficiently high. It turns out that a 15,000 resolution (FWHM) is insufficient, but a 30,000 resolution separates both ions to near baseline (Figure S6). Because HCD spectra acquired at the lowest possible resolution of an Orbitrap (7500 FWHM) still feature mass accuracy of <10 ppm, unambiguous identification of the oxonium ion is possible by mass accuracy alone provided that one of the two fragment ions has a significantly higher intensity than the other.
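The 32 ppm figure can be reproduced from the two m/z values given above. The short Python sketch below computes the relative mass difference and checks whether the a2(QC) ion would fall inside a matching window around the oxonium ion; the helper names are illustrative, and the 10 ppm and 0.3 Da windows are the example tolerances quoted in the Methods for HCD and PQD.

```python
OXONIUM_MZ = 204.0866  # HexNAc oxonium ion
A2_QC_MZ = 204.0801    # a2(QC) fragment ion


def ppm_difference(reference_mz, other_mz):
    """Relative mass difference in parts per million."""
    return abs(reference_mz - other_mz) / reference_mz * 1e6


def within_tolerance(observed_mz, target_mz, tol, unit="ppm"):
    """True if observed_mz matches target_mz within tol (ppm or Da)."""
    delta = abs(observed_mz - target_mz)
    if unit == "ppm":
        return delta / target_mz * 1e6 <= tol
    if unit == "Da":
        return delta <= tol
    raise ValueError("unit must be 'ppm' or 'Da'")


diff = ppm_difference(OXONIUM_MZ, A2_QC_MZ)
print(f"mass difference: {diff:.1f} ppm")
# A 10 ppm window excludes the interfering ion; a 0.3 Da window does not.
print(within_tolerance(A2_QC_MZ, OXONIUM_MZ, 10, "ppm"))
print(within_tolerance(A2_QC_MZ, OXONIUM_MZ, 0.3, "Da"))
```

This only addresses mass matching of centroided peaks; whether the two ions are resolved as separate peaks in the first place depends on the resolution at m/z 204, as discussed in the text.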

Yet another alternative for the selective detection of O-GlcNAc peptides is to control the generation of the HexNAc fragment ions by tuning the collision energy. As depicted in Figure 3a, the HexNAc oxonium ion is already generated at low normalized collision energy (NCE) with maxima at 23% NCE (PQD) and 18% NCE (HCD). Both values are considerably lower than the typical NCE values (35%–40%) used for CID-type experiments. Concomitantly, peptide backbone fragmentation is much reduced at low collision energy, generating spectra that are almost completely devoid of peptide fragments (Figure 3b).

Figure 3

(a) Collision energy characteristic of the HexNAc oxonium ion (solid line) and the interfering a2(QC) fragment ion (dashed line). For PQD, unmodified QCPSYFQAK and modified AEgSFGANAEK were used. HCD was optimized using QCPgSYFQAK. (b) Low collision energy PQD spectrum of the peptide LDGgTYFAAK. Note that almost all signals in this spectrum point to an O-GlcNAc modification of the precursor

HCD detection is inherently less sensitive than PQD for two reasons. First, precursor ions for HCD are isolated in the LTQ and accumulated in the C-trap before they are injected into the collision octopole, and the fragments are then transferred to the Orbitrap for detection. This process is inevitably accompanied by ion losses, which do not occur in PQD. Second, while the electron multipliers of the LTQ are capable of detecting a single ion, the Orbitrap detector requires a minimum of ~20 charges to detect a signal [29, 30]. This necessitates comparatively high AGC target value settings and, consequently, longer accumulation times. HCD is also significantly slower than PQD (sequential versus parallel MS and MS/MS). As expected, a side-by-side comparison of PQD and HCD (in triplicate) using the O-GlcNAc peptide library showed considerable differences in scan speed. While the discovery PQD experiment generated tandem mass spectra at 2.2 Hz, the speed of HCD data acquisition was only 1.1 Hz. Concomitantly, PQD and HCD generated 320 and 183 O-GlcNAc spectra, respectively, within 45 min LC-MS/MS time (Figure S7). All things considered, low energy PQD turned out to be the superior method. It provided sufficient selectivity and was more efficient than HCD for the detection of O-GlcNAc peptides despite its lower mass accuracy and resolution.

3.3 Scoring Tandem MS Spectra for the Presence of O-GlcNAc Modified Precursors

With an efficient tandem MS method for the generation of diagnostic fragment ions at hand, we went on to develop a simple scoring scheme that differentiates O-GlcNAc from non-O-GlcNAc tandem spectra. We term this scoring scheme OScore because it utilizes and accounts for all spectral features pointing to the O-GlcNAc modification. The OScore S is calculated according to Eq. 1,

$$ S = - {\log_{{10}}}\sum {\frac{{{I_{{norm}}}}}{n}} $$
(1)

where I_norm is the normalized intensity (i.e., divided by the sum of all intensities) of up to eight O-GlcNAc-specific spectral features (see Section 2.4 and Figure 3b for details) and n is the intensity rank within the tandem mass spectrum. For calculation of the OScore, the fragment intensity is first normalized by the sum of all peak intensities in the spectrum to render the score independent of precursor intensity and, hence, robust against spectra from highly abundant precursors. Second, the normalized intensity is further divided by the rank in order to favour spectra in which the O-GlcNAc diagnostic fragments are among the most intense peaks. This step concomitantly penalizes spectra that exhibit intense unspecific signals. The logarithmic transformation is used for convenience to rescale the score. The OScore is computed using a Perl script, which parses the peaklist contained in a Mascot generic format (mgf) file and calculates the rank and normalized intensities of all peaks in a spectrum before calculating the OScore. It requires at least one of the O-GlcNAc features to be present in the peaklist within a user-specified mass tolerance. The OScore script creates a tab-delimited output file containing (among other information) precursor m/z, precursor charge state, as well as retention time, which can be used to build inclusion lists for follow-up targeted experiments.
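A minimal Python sketch of Eq. 1 may help to make the scoring steps concrete. It is not the Perl script used in this work: the function name, the absolute (Da) tolerance, the rule of taking the most intense peak within the tolerance window, and the toy spectrum are all assumptions made for illustration.

```python
import math

# Reporter ions listed in the Methods; the two precursor-dependent
# sugar-loss ions would be appended per spectrum.
REPORTER_IONS = (126.0550, 138.0550, 144.0655, 168.0655, 186.0761, 204.0866)


def oscore(peaks, feature_mzs, tol_da):
    """Compute S = -log10(sum(I_norm / n)) over matched O-GlcNAc features.

    peaks: list of (mz, intensity) tuples of one tandem mass spectrum.
    Returns None if no feature is found within tol_da.
    """
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        return None
    # Intensity rank n: 1 for the most intense peak of the spectrum.
    order = sorted(range(len(peaks)), key=lambda k: peaks[k][1], reverse=True)
    rank = {k: r for r, k in enumerate(order, start=1)}
    used = set()  # make sure no peak is counted for two features
    weighted_sum = 0.0
    for target in feature_mzs:
        hits = [k for k, (mz, _) in enumerate(peaks)
                if abs(mz - target) <= tol_da and k not in used]
        if not hits:
            continue
        best = max(hits, key=lambda k: peaks[k][1])
        used.add(best)
        weighted_sum += (peaks[best][1] / total) / rank[best]
    if weighted_sum <= 0:
        return None
    return -math.log10(weighted_sum)


# Toy spectrum (invented values) dominated by two reporter ions.
toy = [(204.0866, 60.0), (138.0550, 30.0), (500.0, 10.0)]
print(oscore(toy, REPORTER_IONS, tol_da=0.3))
```

Because the rank-weighted sum cannot exceed 1, the score is never negative, and spectra whose most intense peaks are O-GlcNAc features score close to 0.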

In order to assess the discriminating power of the scoring scheme, OScores were computed for a test set of low collision energy PQD spectra (approximately 750 O-GlcNAc spectra from the O-GlcNAc peptide library and 11,300 non-O-GlcNAc spectra from a tryptic digest of cytosolic E. coli proteins; Table S7). As shown in Figure 4a, the bimodal OScore distribution clearly separates O-GlcNAc peptides (low OScores) from unmodified peptides (high OScores). We also compared the OScore with other features of O-GlcNAc spectra that could similarly be used as classifiers to group O-GlcNAc and non-O-GlcNAc tandem MS spectra, e.g., the approach employed by Vosseller et al. [5], which utilizes the combination of the HexNAc oxonium ion and the neutral loss, or the HexNAc oxonium ion intensity, its normalized intensity, or the sum of normalized intensities of the oxonium ion and the HexNAc neutral loss. As revealed by a receiver operating characteristic (ROC) analysis, the OScore outperforms alternative classifiers and discriminates O-GlcNAc peptide spectra from spectra of unmodified peptides with 95% sensitivity at 99% specificity (Figure 4b). Furthermore, the area under the ROC curve (AUC) of the OScore is 0.997, indicating very high cumulative accuracy of the classifier.
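For readers who wish to reproduce this type of evaluation on their own data, the following Python sketch shows how sensitivity, specificity, and the AUC can be obtained from two lists of scores when low scores indicate the positive class. The function names and the toy scores are invented for illustration and are not taken from the data set described here.

```python
def sensitivity_specificity(pos_scores, neg_scores, cutoff):
    """Classify a spectrum as O-GlcNAc if its score is below the cutoff."""
    true_pos = sum(s < cutoff for s in pos_scores)
    true_neg = sum(s >= cutoff for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def auc_low_is_positive(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive spectrum has a
    lower score than a randomly chosen negative one (ties count one half).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented toy scores, not measured values.
glyco = [0.2, 0.5, 1.1, 2.4]
unmodified = [1.9, 2.8, 3.5, 4.0]
print(sensitivity_specificity(glyco, unmodified, cutoff=2.0))
print(auc_low_is_positive(glyco, unmodified))
```

The pairwise formulation is quadratic in the number of spectra and is used here only because it is easy to verify; for test sets of the size described above a rank-based implementation would be preferable.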

Figure 4

OScore-enabled detection of O-GlcNAc spectra. (a) OScore distribution of O-GlcNAc spectra (solid line), non O-GlcNAc spectra (dashed line) and positive predictive value (PPV, dotted line). (b) ROC plots of several O-GlcNAc spectrum classifiers

The bimodal distribution of OScores allowed the straightforward calculation of the probability that O-GlcNAc spectrum assignments with a given OScore are correct. Using Bayes’ Law and denoting correct and incorrect assignments as “+” and “–”, respectively, the positive predictive value (PPV) p(+|S) for an OScore S can be calculated according to Eq. 2,

$$ p\left( { + |S} \right) = \frac{{p\left( {S| + } \right)p\left( + \right)}}{{p\left( {S| + } \right)p\left( + \right) + p\left( {S| - } \right)p\left( - \right)}} $$
(2)

with p(S|+) and p(S|–) being the probabilities of OScores among O-GlcNAc and non-O-GlcNAc peptides, respectively, and p(+) and p(−) being prior probabilities representing the overall proportion of O-GlcNAc and non-O-GlcNAc spectra in the data set. The calculation of a PPV for a given OScore from (2) requires accurate models for the OScore distributions. The symmetrical distribution of O-GlcNAc spectra was approximated using a Gaussian distribution and the asymmetrically distributed non-O-GlcNAc spectra were modeled by an offset-corrected γ distribution. Both distributions were fitted to the data using the method of least squares. Thus, with calculated mean μ and standard deviation σ, the probability of observing an OScore S for a correct O-GlcNAc spectrum assignment can be calculated according to Eq. 3,

$$ p\left( {S| + } \right) = \frac{1}{{\sigma \sqrt {2\pi } }}{e^{ - \frac{{{{\left( {S - \mu } \right)}^2}}}{{2{\sigma ^2}}}}} $$
(3)

while the probability for an incorrect assignment can be calculated according to Eq. 4,

$$ p\left( {S| - } \right) = \frac{1}{{{\beta^{\alpha }}\Gamma \left( \alpha \right)}}\left( {{{\left( {{S_m} - S} \right)}^{{\alpha - 1}}} \cdot {e^{{\frac{{S - {S_m}}}{\beta }}}} - S_m^{{\alpha - 1}} \cdot {e^{{\frac{{{ - S_m}}}{\beta }}}}} \right) $$
(4)

with S_m being the highest observed OScore and computed parameters α and β. Substitution of p(S|+) and p(S|–) in (2) by the modeled Gaussian and γ distribution along with computed prior probabilities p(+) and p(−) allowed calculation of PPVs (Figure 4a). It should be noted that low OScore values correspond to high probabilities and vice versa. As depicted in Figure S8, the computed PPVs are an accurate estimation of the observed probabilities (i.e., the fraction of correct O-GlcNAc spectrum assignments).
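The Bayesian calculation of Eq. 2 can be sketched in a few lines of Python. The sketch below uses the standard normal density for p(S|+) and, as a simplification, a plain γ density mirrored at S_m for p(S|–); it omits the offset term of Eq. 4. All parameter values (μ, σ, S_m, α, β, and the prior) are hypothetical placeholders, not the fitted values of this study.

```python
import math


def gaussian_pdf(s, mu, sigma):
    """Normal density, used here as the model for p(S|+)."""
    return math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def mirrored_gamma_pdf(s, s_max, alpha, beta):
    """Gamma density of (s_max - s), a simplified model for p(S|-).

    Assumption: the offset correction of Eq. 4 is left out.
    """
    x = s_max - s
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def ppv(p_s_pos, p_s_neg, prior_pos):
    """Eq. 2: posterior probability that a spectrum with score S is O-GlcNAc."""
    numerator = p_s_pos * prior_pos
    denominator = numerator + p_s_neg * (1.0 - prior_pos)
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical parameters for illustration only.
MU, SIGMA = 1.0, 0.5
S_MAX, ALPHA, BETA = 6.0, 2.0, 0.8
PRIOR_POS = 0.06

for s in (0.5, 2.0, 3.5):
    value = ppv(gaussian_pdf(s, MU, SIGMA),
                mirrored_gamma_pdf(s, S_MAX, ALPHA, BETA),
                PRIOR_POS)
    print(f"OScore {s:.1f}: PPV {value:.3f}")
```

The prior p(+) has a strong influence on the result: when O-GlcNAc spectra are rare, even a score with a favourable likelihood ratio can correspond to a modest PPV.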

3.4 Identification of O-GlcNAc Peptides in a Complex Proteome

To demonstrate the practical utility of the two-stage LC-MS/MS approach, we compared it side-by-side to a conventional data-dependent ETD experiment using a highly complex tryptic digest of cytosolic E. coli proteins spiked with decreasing amounts of O-GlcNAc modified bovine α-crystallin (1:10, 1:100, 1:500, 1:1,000 wt/wt). Bovine α-crystallin is O-GlcNAc-modified at two sites (serine 162 of chain A, threonine 170 of chain B, see supplemental spectra). Serine 162 of chain A is modified at a stoichiometry of 10% [14] and the modification of threonine 170 is barely detectable. Considering that chains A and B are present in 1:1 stoichiometry, the molar content of O-GlcNAc of bovine α-crystallin is in the range of 5%. Hence, the actual spiking ratios in our experiment are on the order of 1:200 to 1:20,000 (wt/wt). The results are summarized in Table 2 and Figure 5. The data-dependent ETD experiment identified the O-GlcNAc peptide AIPVgSREEKPSSAPSS (615.6461 m/z, 3+; 922.9654 m/z, 2+) at a spiking ratio of 1:200. The two-stage approach still detects and identifies the O-GlcNAc peptide in the 1:2,000 sample, suggesting an approximately 10-fold increased sensitivity over the conventional data-dependent approach. Both the limit of detection and the limit of identification are reached at the spiking ratio of 1:2,000, corresponding to 70 fmol O-GlcNAc peptide on column. Notably, the increase in sensitivity is accompanied by a significant increase in Mascot ion score (55 versus 24) at the same spiking ratio, thus providing higher confidence for the O-GlcNAc peptide identification.
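The conversion from nominal to effective spiking ratios is a simple scaling by the estimated O-GlcNAc content. The Python sketch below reproduces that arithmetic under the assumptions stated in the text (10% occupancy on chain A, chains A and B in equal amounts, and a negligible contribution from the second site); the variable names are illustrative.

```python
SITE_OCCUPANCY_CHAIN_A = 0.10   # fraction of chain A carrying O-GlcNAc [14]
CHAIN_A_FRACTION = 0.5          # chains A and B present at 1:1
NOMINAL_RATIOS = [10, 100, 500, 1000]  # protein spiked at 1:x (wt/wt)

# Fraction of the spiked protein that is actually O-GlcNAc modified (~5%).
glyco_fraction = SITE_OCCUPANCY_CHAIN_A * CHAIN_A_FRACTION

# Effective ratio of modified protein to background, expressed as 1:x.
effective_ratios = [round(x / glyco_fraction) for x in NOMINAL_RATIOS]

for nominal, effective in zip(NOMINAL_RATIOS, effective_ratios):
    print(f"nominal 1:{nominal} -> effective 1:{effective}")
```

On this basis, the nominal 1:10 and 1:100 samples correspond to the effective 1:200 and 1:2,000 ratios at which the two approaches are compared.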

Table 2 Detection Limits for the Identification of the O-GlcNAc Modified Peptide AIPVgSREEKPSSAPSS in a Complex Proteomic Background
Figure 5

Limit of O-GlcNAc peptide identification in a complex proteome. Displayed are (a) the total ion current (TIC) of the 1:2,000 (wt/wt) sample, (b) PQD spectrum of the spiked O-GlcNAc peptide (solid arrows indicate O-GlcNAc ions; dashed arrows point to unexplained signals), and (c) the targeted ETD spectrum leading to the identification of AIPVgSREEKPSSAPSS

4 Discussion

4.1 Systematic Evaluation of Tandem MS Techniques

In light of the poor CID fragmentation of O-GlcNAc peptides, we systematically revisited the numerous fragmentation approaches available on an LTQ Orbitrap XL ETD instrument for their merits in O-GlcNAc peptide identification and site-localization.

Surprisingly, the highest number of O-GlcNAc peptides was identified by PQD (Table 1). ETD, NL-ETD, and NL-HCD also led to a reasonable number of O-GlcNAc peptide identifications. For the latter two, this is expected because both scan types are triggered by the detection of a diagnostic neutral loss in the preceding CID spectrum. However, as with other CID-type fragmentation techniques, PQD spectra could not be utilized to localize O-GlcNAc sites reliably. Here, ETD fragmentation offers the distinct advantage that it preserves the O-GlcNAc modification and thus enables the direct inference of the accurate site of modification (Table 1). But ETD has its limitations too: increasing mass and decreasing charge density of the precursor diminish the fragmentation efficiency of ETD [39] and may render the O-GlcNAc site determination with ETD impossible for large peptides. Among the CID-like fragmentation approaches, HCD was the most accurate fragmentation technique with 50% correctly identified O-GlcNAc sites by Mascot. This can be attributed to the high mass accuracy and high dynamic range of HCD spectra [21], which also allow the O-GlcNAc localization to be deduced from very low intensity signals.

4.2 Scoring Tandem Mass Spectra for Presence of the O-GlcNAc Modification

The OScore is a conceptually new and straightforward approach to evaluating the presence of an O-GlcNAc modification of potentially modified peptides based on tandem mass spectra. The OScore does not require the detection of sequence-informative peptide fragments but, instead, relies exclusively on the presence and intensity of up to eight different fragments originating from the breakage of the O-glycosidic bond (Figure 3b). Unlike other PTM scores [31–35], the OScore does not contribute any information about the localization of O-GlcNAc modification or the underlying peptide sequence. Instead, it assesses tandem MS spectra of complex peptide mixtures for the presence of the modification (Figure 4a and b). Using Bayesian statistics, the OScore can be further transformed into a positive predictive value for a given OScore (Figure 4a and S7). While these probability computations require a sufficient number of O-GlcNAc spectra to model score distributions of O-GlcNAc and non-O-GlcNAc spectra, the OScore itself is calculated on a single spectrum basis and, as such, indicates the presence of an O-GlcNAc modification by a low OScore irrespective of whether a single or hundreds of O-GlcNAc peptides are present in a sample. By design, the OScore is independent of the precursor signal intensity and, as such, independent of the amount of peptide on column. However, the quality of the OScore will decrease with decreasing signal-to-noise ratio of the precursor ion because the contribution of the chemical noise inevitably increases until the resulting tandem MS spectrum no longer primarily reflects the isolated O-GlcNAc peptide (Figure S9).

Nevertheless, the score is quite robust as exemplified in Figure 5. Apart from diagnostic fragments for O-GlcNAc, this PQD spectrum contains three intense signals (554.53, 729.44, and 831.16 m/z), which cannot be explained by typical fragment ions for the peptide AIPVgSREEKPSSAPSS, but very likely result from co-isolation and co-fragmentation of another peptide. With an OScore of 1.8, the precursor ion generating this mixed tandem mass spectrum is a reasonable candidate for inclusion in a targeted ETD experiment, which confirmed the sequence and modification.

Peptides modified by N- or O-linked glycans will probably also result in fairly low OScores (i.e., high O-GlcNAc probability), as they may lose HexNAc groups from their non-reducing ends. On the other hand, fragmentation of complex glycans will also result in numerous signals that do not indicate the O-GlcNAc modification and, hence, increase the OScore and decrease the probability of a false-positive O-GlcNAc spectrum assignment. The recently reported intracellular single N-linked HexNAc modification, presumably resulting from the breakdown of glycoproteins [6, 8], will probably not interfere, since the N-glycosidic linkage is more stable than the O-glycosidic bond under CID conditions [36], and the targeted ETD experiment would resolve this particular issue. Future experiments will address whether and to what extent this modification may influence the discovery of O-GlcNAc peptides.

In addition to its utility in identifying candidate O-GlcNAc species in complex mixtures, the OScore may also support O-GlcNAc peptide identification by database searching. Particularly in large-scale studies, the poor CID fragmentation of O-GlcNAc peptides, along with low search engine scores, is likely to result in an unnecessarily high proportion of both overlooked (i.e., false-negative) and false-positive O-GlcNAc peptide-spectrum matches (PSMs). The OScore provides information complementary to that used by search engines. It may therefore be used to 'rescue' genuine O-GlcNAc identifications that have low search engine scores and to discriminate correct from incorrect O-GlcNAc PSMs. In both cases, data quality would increase. Another application of the OScore may be the retrospective analysis of existing data sets. This, however, would only work for data sets in which tandem mass spectra were acquired over the full mass range, to ensure the detection of the HexNAc oxonium ion and its fragments.
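How the OScore might complement a search engine score can be sketched as a simple triage of PSMs. This is an illustrative assumption rather than a procedure evaluated in this study: the class `Psm`, the function `triage`, the four category labels, and both cutoffs are hypothetical and would have to be calibrated on real data.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    peptide: str
    ion_score: float  # search engine score, higher is better
    oscore: float     # OScore of the spectrum, lower suggests O-GlcNAc


def triage(psm, ion_score_cutoff, oscore_cutoff):
    """Sort an O-GlcNAc PSM into one of four hypothetical categories."""
    confident_match = psm.ion_score >= ion_score_cutoff
    glcnac_like = psm.oscore <= oscore_cutoff
    if confident_match and glcnac_like:
        return "accept"   # both lines of evidence agree
    if glcnac_like:
        return "rescue"   # weak search score, but diagnostic fragments present
    if confident_match:
        return "review"   # good search score without diagnostic fragment support
    return "reject"


# Hypothetical example values; the peptide is the one discussed for Figure 5.
example = Psm(peptide="AIPVgSREEKPSSAPSS", ion_score=12.0, oscore=1.8)
category = triage(example, ion_score_cutoff=25.0, oscore_cutoff=2.0)
```

A "rescue" candidate in this scheme would not be reported directly but could, for instance, be placed on an inclusion list for a targeted ETD experiment.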

4.3 Sensitivity of Detection and Identification

Compared with a conventional data-dependent ETD experiment, the two-stage approach resulted in a 10-fold increased sensitivity and significantly improved Mascot ion scores for the analyzed O-GlcNAc peptide of bovine α-crystallin spiked into a tryptic digest of E. coli proteins (Table 2, Figure 5). These improvements were achieved by decoupling the detection of potential O-GlcNAc precursors (by PQD) from their actual identification and site localization (by ETD), as well as by scoring tandem mass spectra for the presence of the O-GlcNAc moiety. This 10-fold gain in sensitivity comes, however, at the expense of requiring twice the amount of sample and measurement time. The limit of detection and identification determined here for a single peptide spiked into a highly complex background (low fmol range) probably represents a very conservative estimate. For less complex mixtures such as O-GlcNAc enriched proteomes or single O-GlcNAc proteins, one might expect limits of detection and identification in the mid amol range.

The PQD discovery experiment and the classical data-dependent ETD experiment acquired a similar number of tandem MS spectra at each spiking level (Table 2). They, therefore, had the same chance of detecting and identifying an O-GlcNAc modified peptide. However, at the spiking level of 1:2000 (wt/wt), only the PQD discovery experiment, in conjunction with the OScore, allowed the identification of the precursor 615.6461 m/z (3+) as a potentially modified peptide. In contrast, the conventional data-dependent ETD experiment did not lead to a successful O-GlcNAc peptide identification at spiking levels more dilute than 1:200 (wt/wt). Although the respective precursor ion could still be detected in the full scan spectra at even higher dilutions, it was no longer among the species selected for fragmentation, either by PQD in a discovery experiment or in a conventional data-dependent ETD experiment (Table 2).

While the PQD discovery experiment and the OScore were developed to maximize selectivity and sensitivity, the second-stage experiment focused on targeted peptide identification and O-GlcNAc site localization. ETD was selected for this purpose because of its outstanding accuracy in O-GlcNAc site identification and its sound peptide identification performance (Table 1). Employing ETD in a targeted fashion enabled the acquisition of multiple ETD spectra across the chromatographic peak, which was accomplished by disabling the monoisotopic precursor selection as well as the rejection of unassigned charge states in the MS acquisition software. Owing to an enhanced signal-to-noise ratio and reliable ion statistics in tandem MS spectra, the acquisition of multiple ETD spectra per chromatographic peak resulted in an increased chance of identifying the O-GlcNAc peptide as well as a higher confidence that the O-GlcNAc peptide-spectrum match is correct (Table 2).

4.4 Translation of the Two-Stage Approach to other MS Instruments

The two-stage approach has been developed using an LTQ Orbitrap XL ETD mass spectrometer. However, the approach can be easily translated to any other type of mass spectrometer that is capable of detecting low-mass ions in tandem mass spectra and is ETD-enabled. When doing so, three aspects have to be considered. First, for the discovery experiment, it is important to adjust the fragmentation amplitude to generate O-GlcNAc spectra showing O-GlcNAc diagnostic fragments while suppressing (interfering) peptide fragment ions. Second, the peak list generated as input for the OScore script has to be converted into the Mascot generic format, e.g., using one of the freely available peak list conversion tools. Third, the resulting OScore distribution of O-GlcNAc and non-O-GlcNAc spectra will likely differ from instrument to instrument. Figure S10 shows the OScore distribution of O-GlcNAc and non-O-GlcNAc spectra acquired on an amaZon ETD ion trap mass spectrometer. As expected, the OScore distributions of PQD (Figure 4a) and the corresponding PAN experiment on the amaZon instrument (Figure S10) are alike, but span a different scale. Consequently, the OScore threshold for O-GlcNAc candidates has to be adjusted. Our approach lends itself to real-time decision-making akin to what has been proposed for the analysis of phosphopeptides [37], and further improvements should arise from increasing scan speed and sensitivity of ion trap-Orbitrap [38] and quadrupole TOF mass spectrometers [39, 40].
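To illustrate the second aspect, the following minimal sketch reads spectra from a peak list in Mascot generic format and checks whether the low-mass region containing the HexNAc oxonium ion (m/z 204.0866) was recorded. It is not the OScore script itself. The function names and the default tolerance of 0.5 m/z, chosen loosely for ion trap data, are assumptions made for this example, and only the basic structure of the format (BEGIN IONS, KEY=value lines, peak lines, END IONS) is handled.

```python
HEXNAC_OXONIUM_MZ = 204.0866  # m/z of the HexNAc oxonium ion


def read_mgf(lines):
    """Yield one dict per spectrum from an iterable of MGF text lines."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None or not line:
            continue  # skip blank lines and anything outside an ion block
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)  # e.g. TITLE=, PEPMASS=, CHARGE=
            spectrum["params"][key.upper()] = value
        else:
            fields = line.split()
            if len(fields) >= 2:  # peak lines lacking an intensity are skipped
                spectrum["peaks"].append((float(fields[0]), float(fields[1])))


def has_hexnac_oxonium(spectrum, tol_mz=0.5):
    """Return True if any peak lies within tol_mz of the oxonium ion m/z."""
    return any(abs(mz - HEXNAC_OXONIUM_MZ) <= tol_mz for mz, _ in spectrum["peaks"])


# Toy peak list; the m/z and intensity values are made up for illustration.
toy_mgf = """BEGIN IONS
TITLE=toy spectrum
PEPMASS=615.6461
CHARGE=3+
138.05 120.0
204.09 950.0
END IONS
""".splitlines()

spectra = list(read_mgf(toy_mgf))
flags = [has_hexnac_oxonium(s) for s in spectra]
```

For an existing file, the same functions could be applied by passing an open file handle to `read_mgf`. A data set in which no spectrum shows a signal near m/z 204 may simply have been acquired without the low-mass range and would then not be suitable for scoring.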

5 Conclusion

We believe that the developed analytical strategy has great potential for the broad-scale discovery of O-GlcNAc-containing proteins, particularly if combined with O-GlcNAc-specific enrichment tools. We further expect the OScore to become a valuable tool for improving the quality of O-GlcNAc peptide spectrum assignments. Finally, we anticipate that the increase in the number of documented O-GlcNAc proteins discovered in this or similar ways will shed further light on the functional significance of this emerging intracellular protein modification.