1 Introduction

Peptide cation dissociation generally induces backbone fragmentation to generate either b- and y-type or c- and z-type fragment ions (for collision- [1, 2] or electron-based [3–6] methods, respectively). Losses of amino acid side chains, however, are also observed [1–3, 7–12]. These processes have been well characterized for collisional activation and, more recently, for electron-capture dissociation [8–12]. ECD work has characterized not only general amino acid side chain loss, but also side chain loss specific to aspartic/isoaspartic acid [13–15], glutamic/γ-glutamic acid [16], O-fucosylation [17], SO2 [18], isoleucine/leucine [19], and alkylated cysteine residues [11]. Combined, these studies reveal about 20 common ECD-specific losses [8–12, 20]. Beyond extending our fundamental knowledge of how dissociation occurs, such research holds promise to help in the correlation of spectrum to sequence [21].

Over the last few years, electron-transfer dissociation (ETD), the ion/ion analog of electron-capture dissociation (ECD), has become increasingly widespread for a variety of proteomic applications [6, 22–30]. In general, ETD and ECD spectra share many similarities; however, differences have been documented [31]. Direct comparison of ECD and ETD is confounded by extreme differences in the pressure at which dissociation occurs [13]. Unlike ECD, which occurs under high vacuum conditions (~10⁻⁹ Torr) after capture of a free electron by a multiply protonated peptide cation, ETD follows from transfer of an electron from an anion to the peptide cation at much higher pressures (~10⁻³ Torr). To date, ETD spectra have typically been acquired on low-resolution trapping instruments, while ECD spectra are acquired primarily on high-resolution Fourier transform instruments. Despite these differences, those who have compared ETD and ECD have seen value in doing so [13]. To our knowledge, there has been no reported systematic study of ETD-induced side chain losses. With the recent coupling of ETD to the high-resolution, high-mass-accuracy Orbitrap mass analyzer [26, 32], such an analysis is possible. Using an extensive dataset, we catalog 20 amino acid side chain losses following ETD. Five of these have not been reported in either ETD experiments or in ECD spectra contained within the SwedECD database. Further, we examine the frequency and specificity of each loss. With this information, we investigate possible diagnostic value for targeted and large-scale protein sequence analysis.

2 Experimental

2.1 LC-MS/MS Sample Preparation, Data Collection, and Database Searching

Human embryonic stem cells were lysed and digested with LysC (Wako Chemicals, Osaka, Japan). The resulting peptides were desalted, and 1.5 μg was loaded onto a capillary column, separated over a 90 min gradient, and analyzed on a Thermo Scientific (San Jose, CA, USA) ETD-enabled LTQ Orbitrap XL. Both MS1 and MS2 scans were acquired at a resolving power of 60,000. Fluoranthene reagent cations were introduced into both MS1 and MS2 scans, generating an internal calibrant peak having an m/z value of 202.07770 [33]. Raw data files were cleaned by DTA Generator [34, 35] and subsequently searched against a concatenated target-decoy human database (IPI v3.63) using the Open Mass Spectrometry Search Algorithm (OMSSA ver. 2.1.4) [36], with cysteine carbamidomethylation as a fixed modification, methionine oxidation as a variable modification, and three missed cleavages allowed. The peptide–spectrum matches (PSMs) were then filtered to a false discovery rate of 1%. Further details of this software can be found in Wenger et al. [35].

2.2 Synthetic Peptide Samples

Synthetic peptides were purchased from Thermo Fisher Scientific. Peptides with a C-terminal lysine were heavy labeled (13C6, 15N2), while internal cysteine residues were carbamidomethylated. Synthetic peptides were dissolved in 50% methanol, 2% acetic acid. The peptide solutions were directly infused via PicoTip emitter (New Objective, Woburn, MA, USA). The ETD spectra were collected on a Thermo Scientific ETD-enabled LTQ Orbitrap Velos. MS2 and MS3 scans were acquired at a resolving power of either 15,000 or 30,000.

2.3 Neutral Loss Extraction

Information about precursor neutral loss peaks was extracted from the raw data with software developed in-house (“PepGate”) using Visual C# with the Microsoft .NET Framework 3.5 and Thermo XRawfile COM library. The extracted peak information contains the mass of the loss, the precursor’s m/z, charge state, and signal-to-noise ratio. The neutral loss masses were then converted to elemental composition. Next, the theoretical masses of these elemental compositions were matched back to the neutral losses with a mass tolerance of ±5 ppm. A FileMaker Pro Advanced 9 database was then generated, containing the PSMs, neutral losses, and summary of the frequency of the amino acids.
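The composition-matching step can be illustrated with a minimal Python sketch. This is not the PepGate implementation (which was written in C#); the function names `formula_mass` and `match_loss` are hypothetical, and the sketch assumes that the ±5 ppm tolerance is evaluated relative to the theoretical mass of the loss itself.

```python
import re

# Monoisotopic masses (Da) of the elements occurring in the losses discussed here.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117}

def formula_mass(formula):
    """Monoisotopic mass of an elemental composition such as 'C2H5NO'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

def match_loss(observed, formulas, tol_ppm=5.0):
    """Return the candidate formulas whose theoretical mass lies within
    tol_ppm of the observed neutral loss mass (assumption: ppm is taken
    relative to the theoretical loss mass)."""
    hits = []
    for formula in formulas:
        theoretical = formula_mass(formula)
        if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
            hits.append(formula)
    return hits
```

For example, C2H5NO and CH5N3 share the nominal mass 59 Da, so `match_loss(59.03711, ["C2H5NO", "CH5N3"])` can only separate them because the mass measurement is accurate to a few ppm.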

2.4 MySQL Peptide Database Construction

We created a peptide database from human IPI ver. 3.63 by digesting proteins in silico with LysC enzymatic specificity, allowing up to three missed cleavages. The resulting peptide database was imported into MySQL (ver. 5.1.19-community). The final MySQL database included one “peptides” table, which consisted of three fields: “proID” (unique protein identifier), “sequence” (peptide sequence), and “mass” (peptide mass). The “mass” field was indexed using a B-tree structure.
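The in silico digestion can be sketched as follows. This is an illustrative Python version under stated assumptions, not the procedure actually used to build the database: LysC is taken to cleave C-terminal to every lysine without exceptions, cysteine is treated as carbamidomethylated when computing masses, and the names `lysc_digest` and `peptide_mass` are hypothetical.

```python
# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBAMIDOMETHYL = 57.02146  # assumption: fixed modification on every Cys

def lysc_digest(protein, max_missed=3):
    """Peptides from cleavage after each K, with up to max_missed missed cleavages."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        if aa == "K":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without a lysine
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

def peptide_mass(sequence):
    """Neutral monoisotopic peptide mass with carbamidomethylated cysteine."""
    mass = WATER + sum(RESIDUE[aa] for aa in sequence)
    return mass + CARBAMIDOMETHYL * sequence.count("C")
```

Each (protein identifier, peptide, mass) triple produced this way would correspond to one row of the “peptides” table.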

2.5 Perl Programs for MySQL Database Query

From the FileMaker database file, a tab-delimited text file was exported in which each line contained the information for one scan, such as the scan number, the precursor peptide mass tolerance (±3 ppm), and the amino acid content derived from neutral loss peaks. We wrote programs in Perl (ActivePerl ver. 5.10.0.1003) to build two SQL query statements for each scan, one with and one without the amino acid content information. The program then connected to and queried the MySQL database using the DBI Perl package through a DBD MySQL driver installed from the ActiveState Perl Package Manager (ver. 4.02). The query results were written to a tab-delimited text file, which was then imported into FileMaker Pro Advanced 9 for further analysis. R (ver. 2.8.1) was used to plot the histograms of the SQL query results.
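The two queries built per scan might look like the following sketch. It is written in Python with an in-memory SQLite database standing in for MySQL, so it is not the Perl/DBI code described above. The function names, the use of `LIKE` patterns to express residue content, and the toy rows (whose masses are placeholders, not computed values) are all assumptions made for illustration.

```python
import sqlite3

def build_queries(mass, required_groups, tol_ppm=3.0):
    """Return two (sql, params) pairs: mass only, and mass plus residue content.
    Each entry of required_groups lists alternatives, e.g. "QN" = Q or N."""
    delta = mass * tol_ppm / 1e6
    mass_only = "SELECT DISTINCT sequence FROM peptides WHERE mass BETWEEN ? AND ?"
    params = [mass - delta, mass + delta]
    clauses, patterns = [], []
    for group in required_groups:
        clauses.append("(" + " OR ".join("sequence LIKE ?" for _ in group) + ")")
        patterns.extend("%" + aa + "%" for aa in group)
    with_content = mass_only + "".join(" AND " + c for c in clauses)
    order = " ORDER BY sequence"
    return (mass_only + order, params), (with_content + order, params + patterns)

def demo():
    """Run both queries against a toy table; returns (mass_only, with_content)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE peptides (proID TEXT, sequence TEXT, mass REAL)")
    conn.execute("CREATE INDEX idx_mass ON peptides (mass)")
    toy_rows = [  # hypothetical rows; masses are placeholders
        ("toy1", "SVSGTDVQEECREK", 1622.7206),
        ("toy2", "LLEEAAGGSSTTVK", 1622.7190),  # lacks D, C, Q and N
        ("toy3", "DLLEEAAGGSSTVK", 1622.7230),  # has D but no C
        ("toy4", "SVSGTDVQEECREK", 1622.7206),  # same peptide, second protein
        ("toy5", "GGGGGGGGGGGGGK", 1622.7400),  # outside the mass window
    ]
    conn.executemany("INSERT INTO peptides VALUES (?, ?, ?)", toy_rows)
    q_mass, q_content = build_queries(1622.71992, ["D", "C", "QN"])
    mass_only = [row[0] for row in conn.execute(*q_mass)]
    with_content = [row[0] for row in conn.execute(*q_content)]
    conn.close()
    return mass_only, with_content
```

Passing the window limits and residue patterns as bound parameters, rather than formatting them into the SQL string, keeps the query text identical across scans.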

3 Results and Discussion

To generate a large dataset of high-resolution, high-mass-accuracy ETD tandem mass spectra, we obtained a complex mixture of proteins from lysed human cells. The mixture was digested with LysC, and the resulting peptides were loaded onto a nanoflow reversed-phase capillary LC column and gradient eluted into an ETD-enabled LTQ Orbitrap XL mass spectrometer. Fluoranthene anions were used to induce ETD; fluoranthene cations, a byproduct of the chemical ionization process, were used as an internal standard to provide high mass accuracy [33]. Following the 2 h separation, database searching confirmed a total of 2699 PSMs (1% FDR).

We wrote custom software, PepGate, to identify en masse any generated neutral losses from these data. Neutral loss masses were then compiled and mapped to the corresponding chemical formulas. From these formulas, theoretical masses were generated and mapped back to all PSMs with a mass tolerance of ±5 ppm. Table 1 presents a comprehensive report containing all identified amino acid side chain losses along with a summary of the sensitivity and specificity of each. Fifteen of the 20 amino acid side chain losses presented in Table 1 also occur in ECD tandem mass spectra found within the SwedECD database [3, 8–12, 20, 21, 37]. The SwedECD database comprises a collection of ECD tandem mass spectra collected under typical ECD reaction conditions. Under alternate ECD conditions, we found evidence for three more of the ETD-observed neutral losses that were missing from SwedECD [10, 15, 19, 38]. Falth et al. used the SwedECD database to calculate the sensitivity and specificity of ECD neutral losses. We note an overall excellent agreement with the calculations of Falth and co-workers for the sensitivity and specificity of losses overlapping between ECD and ETD [12]. This high degree of overlap confirms that ECD and ETD are similar. However, distinct differences exist; for example, the loss from the lysine side chain that has been reported in ECD was not observed in our ETD data [3, 8–12, 20, 21, 37]. Below we detail the five side chain neutral losses observed in ETD spectra that are not observed in the SwedECD database: CH2O2, C2H5NO, C2H7N2O, C4H7N2, and C4H8N2.

Table 1 Summary table of the mass, chemical formula, amino acid composition, sensitivity, and specificity of each detected neutral loss. A column indicating whether or not the loss has been observed in the SwedECD database is also included [12]

3.1 Loss of CH2O2 (46.00548)

The loss of CH2O2 (46.00548) has been widely accepted as originating from the side chain of either aspartic acid or glutamic acid in collision-activated dissociation [39]. The loss of a 46 Da side chain observed in ECD studies was attributed to the loss of a side chain carboxyl group plus one hydrogen (HCOO) in an early study [9], but was not associated with any specific amino acids at the time. Recent work of Li and co-workers suggests that partial side chain loss via α-hydrogen abstraction could explain this result [31]. To rule out the possibility that this loss arises from low-level, unintended collisional activation during the ion preparation and storage process for ETD, we analyzed a synthetic peptide containing an aspartic acid residue, YYNDVILHK. Triply protonated cations of the peptide were subjected to an ETD reaction with and without the fluoranthene reagent anions present. The neutral loss of CH2O2 was observed in the ETD MS/MS spectrum, but was not present when reagent ions were prevented from entering the trap (data not shown). These data suggest that the loss of CH2O2 arises from an ETD reaction.

3.2 Loss of C2H5NO (59.03711)

Zubarev et al. proposed that the loss of C2H5NO (59.03711) originates from glutamine or asparagine side chains [9]; however, they did not observe this from their large-scale analysis of the SwedECD database [12]. The loss of 58 Da has been reported in ECD spectra of Gln-containing peptides, which was explained by a C2H4NO loss from Gln [15]. Here we observe that the neutral loss detected in ETD is associated with the loss of an additional 1 Da (either H-atom or H+). We have found that 32% of peptides containing a Gln or Asn residue exhibit this loss, and 80% of spectra containing this loss were matched to peptides having a Gln or Asn residue.
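The two percentages quoted above follow the usage of sensitivity and specificity in this text: sensitivity is the fraction of peptides containing the residue that show the loss, and specificity is the fraction of spectra showing the loss that were matched to a peptide containing the residue. A minimal Python sketch of that bookkeeping is given below; the function name and the input layout (pairs of matched sequence and a flag for whether the loss was detected) are assumptions.

```python
def loss_sensitivity_specificity(psms, residues):
    """psms: iterable of (sequence, loss_observed) pairs.
    residues: residues linked to the loss, e.g. "QN" for Gln or Asn.
    Returns (sensitivity, specificity) as defined in the text."""
    psms = list(psms)
    # PSMs whose matched peptide contains at least one linked residue
    has_residue = [(s, seen) for s, seen in psms if any(r in s for r in residues)]
    # PSMs whose spectrum shows the loss
    has_loss = [(s, seen) for s, seen in psms if seen]
    # PSMs satisfying both conditions
    both = [p for p in has_residue if p[1]]
    sensitivity = len(both) / len(has_residue) if has_residue else float("nan")
    specificity = len(both) / len(has_loss) if has_loss else float("nan")
    return sensitivity, specificity
```

With this definition, a loss can be highly specific while still being insensitive, which is the situation discussed for most of the losses in Table 1.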

3.3 Loss of C2H7N2O (75.05584 Da)

Presented in Figure 1 is the ETD spectrum of a triply protonated peptide having the sequence RQVLIRPCSK. By comparing the neutral loss regions of the singly and doubly charged ETnoD products, we notice that there is a neutral loss of 75.05542 Da from the singly charged ETnoD product ([M + 3H]+••), but not from the doubly charged ETnoD product ([M + 3H]++•). At a 10 ppm mass tolerance, a mass of 75.05542 Da unambiguously matches the chemical formula C2H7N2O. We postulate that this loss follows from two sequential electron transfer events (Figure S-1, which can be found in the electronic version of this article). The first transfer to the primary amine at the N-terminus cleaves the N–Cα bond and results in loss of NH3, a well-established process in ECD [40]. This doubly charged product ion can then capture a second electron, which, in the presence of a carbamidomethylated cysteine side chain, can induce loss of 58.02929 Da (C2H4NO). The loss of 58.02929 Da is also observed in ECD spectra of peptides that were carbamidomethylated [17, 38, 41]. This two-step process also explains why the loss of C2H7N2O is present only in the singly charged ETnoD product region (Figure 1b). To confirm this sequence of events, the [M + 3H − NH3]++• ETnoD product was isolated after an initial ETD reaction was performed on the [M + 3H]3+ precursor of a synthetic peptide containing two carbamidomethylated cysteine residues, LFVHNVVCHACK. The [M + 3H − NH3]++• ETnoD product was then subjected to another ETD reaction, which produced the [M + 3H − C2H7N2O]+•• product (data not shown). This is further evidence that the loss of C2H4NO only occurs after the loss of NH3 and subsequent capture of an electron. While each event in this two-step process has been reported separately in ECD spectra, to our knowledge this is the first explanation of the mechanism of a 75.05584 Da neutral loss in ETD. Evidence for loss of 75.05584 Da was present in 67% of all Cys-containing peptides (n = 196).
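The arithmetic behind the two-step assignment can be checked with a short Python sketch using only the masses quoted in the text. The helper name `ppm_error` is hypothetical, and the sketch assumes that the ppm deviation is expressed relative to the theoretical loss mass.

```python
# Masses (Da) quoted in the text
NH3 = 17.02655        # first step: loss of ammonia
C2H4NO = 58.02929     # second step: loss from carbamidomethylated Cys
C2H7N2O = 75.05584    # theoretical mass of the combined loss
OBSERVED = 75.05542   # loss measured from the singly charged ETnoD product

def ppm_error(observed, theoretical):
    """Signed deviation of observed from theoretical, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# The two sequential losses add up to the composite formula
combined = NH3 + C2H4NO

# Deviation of the measured loss from the composite formula
deviation = ppm_error(OBSERVED, C2H7N2O)
```

Here `combined` reproduces 75.05584 Da, and `deviation` is about −5.6 ppm, inside the 10 ppm tolerance stated above.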

Figure 1

ETD spectrum of the triply protonated peptide RQVLIRPCSK with the assignment of product ion peaks and neutral losses in Da; (a) m/z range from lock mass to the doubly charged precursor ion; (b) m/z range from the doubly charged precursor ion to the singly charged precursor ion

3.4 Loss of C4H7N2 and C4H8N2 (83.05969 and 84.06875 Da)

In our dataset, we observed the loss of C4H6N2 (82.05310 Da) from His residues with a sensitivity of 11% and a specificity of 100%, compared with 38% and 99%, respectively, previously reported in ECD [8, 9, 12]. We found, however, that the loss of an additional 1 Da also occurred. The sum of the histidine side chain loss and the additional 1 Da (either H-atom or H+) maps to either C4H7N2 or C4H8N2. As shown in Table 1, these two types of losses are specific to histidine-containing peptides. Figure 2 shows an example of such losses from the peptide YELHLK. In Figure 2b, an abundant peak shows a neutral loss of 83.05969 Da from the singly charged ETnoD product. The His82 loss is rather weak; the loss of 83.05969 Da appears only after two ion/ion reactions (Figure 2b), and not after the initial ion/ion reaction (Figure 2a). We speculate that the loss of the His side chain in ETD occurs as a result of sequential ion/ion reactions: one transfer event initiates an H-atom loss (or proton abstraction), and another induces cleavage of the side chain, for a combined loss of 83 Da (C4H7N2) (Figure 2b) rather than 82 Da.

Figure 2

ETD spectrum of triply protonated peptide YELHLK with the assignment of product ion peaks and neutral losses in Da; (a) m/z range from lock mass to the doubly charged precursor ion; (b) m/z range from the doubly charged precursor ion to the charge-reduced precursor ion

To offer a more direct comparison with ECD, we synthesized the peptide ALANGFARSHALL and collected high-resolution ETD spectra of it. Cooper et al. reported a loss of 82 Da from [M + 2H]2+ following ECD [8]. Interestingly, we did not observe the loss of 82 Da from either the [M + 2H]2+ or [M + 3H]3+ precursor after a single ion/ion reaction. A loss of 83 Da, however, was clearly present following two ion/ion reactions of the triply protonated precursor (Figure 3). To further test our supposition, we collected ETD spectra of nine other His-containing peptides. Table 2 lists the His losses from various precursors of all ten peptides. Nine of the 10 (90%) triply charged precursors showed a loss of 83 Da from [M + 3H]+••. In contrast, only two of the 10 (20%) doubly charged precursors showed a loss of 82 Da from [M + 2H]+•. These observations support the interpretation that the loss of 83 Da is the sum of an initial proton or hydrogen atom loss and a subsequent His side chain loss.

Figure 3

ETD spectra of ALANGFARSHALL with assignment of: (a) losses from [M + 2H]+• in the spectrum of [M + 2H]++, (b) losses from [M + 3H]++• in the spectrum of [M + 3H]+++, and (c) losses from [M + 3H]+•• in the spectrum of [M + 3H]+++. Each spectrum is the average of 500 scans. Note that although this peptide does not contain Asp/Glu, the loss of CO (27.995 Da) is clearly present. Loss of CO from a peptide not containing Asp or Glu has been observed by Pitteri et al. [52]

Table 2 Summary table of losses around 82 Da from synthetic histidine-containing peptides. “Loss” refers to the measured mass difference between the charge-reduced precursor and the most intense isotopic peak

3.5 Loss of N2H6, CH4N2, CH5N3, and C4H11N3 (34.05310, 44.03745, 59.04835, and 101.09530 Da)

Among the neutral losses observed in all ETD spectra examined in this experiment, losses from the arginine side chain are among the most prevalent and abundant. We observed the four major types of neutral losses from arginine residues previously reported in ECD spectra [8–12, 20, 21], namely Arg34 (N2H6, 34.05310 Da), Arg44 (CH4N2, 44.03745 Da), Arg59 (CH5N3, 59.04835 Da), and Arg101 (C4H11N3, 101.09530 Da). Figure 4 summarizes the detected frequencies of these four neutral loss types. The 16 rows in Figure 4 exhaust all combinations of these four arginine neutral loss types. The figure shows that Arg34 and Arg59 are the two most frequent losses from the side chain of arginine following ETD. In addition, the combination of these two losses has the highest detected frequency among the 15 combinations in which at least one of the four neutral losses is present in a given spectrum. Arg44 or Arg101 losses occurring alone are rather rare (with detection frequencies of 19 and 0, respectively).
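The combination counting described above can be sketched in Python as follows. The four labels come from the text; the function names and the input layout (one set of detected arginine loss labels per PSM) are assumptions, and the example makes no claim about the actual counts in Figure 4.

```python
from collections import Counter
from itertools import combinations

ARG_LOSSES = ("Arg34", "Arg44", "Arg59", "Arg101")

def all_combinations(labels=ARG_LOSSES):
    """Every subset of the four losses, including the empty one (2**4 = 16)."""
    combos = []
    for size in range(len(labels) + 1):
        combos.extend(frozenset(c) for c in combinations(labels, size))
    return combos

def tally(psm_losses):
    """Count PSMs per combination; psm_losses holds one set of labels per PSM."""
    counts = Counter(frozenset(losses) for losses in psm_losses)
    return {combo: counts.get(combo, 0) for combo in all_combinations()}

def column_sums(table):
    """Total number of PSMs in which each individual loss was detected."""
    return {label: sum(n for combo, n in table.items() if label in combo)
            for label in ARG_LOSSES}
```

Calling `tally` gives the row values (one per combination), and `column_sums` gives the per-loss totals described in the caption of Figure 4.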

Figure 4

The detected frequency of arginine side chain neutral losses. Each row sum represents the number of PSMs with the corresponding combination of arginine side chain neutral losses. Each column sum represents the frequency of each specific neutral loss

3.6 Loss of NH3 (17.02655 Da)

The loss of 17.02655 Da appears in the vast majority of the ETD spectra. Within a ±10 ppm mass tolerance, the value of 17.02655 matches exclusively to NH3. It is debatable whether this loss of NH3 is from the N-terminal amine or from a residue side chain [8, 9]. In a number of cases where the peptides do not terminate with a basic residue, a strong neutral loss of NH3 is still detected, suggesting that the NH3 is not from the C-terminal lysine. Moreover, ETD of the N-terminally acetylated peptide (ac)SDKPDMAEIEKFDK (Figure S-2) does not produce a detectable NH3 loss from the charge-reduced precursor, [M + 3H]+••. Instead, an observed loss of 17.006 Da corresponds to the cleavage of a hydroxyl (–OH) group from the serine residue. This evidence suggests that the loss of NH3 is from the N-terminal amine; therefore, the presence of the 17.02655 Da loss has no diagnostic value for the amino acid content of the peptide. However, the absence of NH3 loss may indicate the existence of an N-terminal modification.

3.7 Agreement of Amino Acid Content from Neutral Losses with PSMs

Note that while the specificity of many losses is fairly high (Table 1), the highest sensitivity of the amino acid-specific losses is only 67%. Therefore, the diagnostic value of these losses lies in confirmation, rather than exclusion, of the presence of certain amino acids in peptide precursors. There has been a previous attempt to take advantage of neutral loss phenomena that occur in ECD experiments to improve database searching through post-acquisition data filtering [42]. That report, however, concluded that use of such losses was not advantageous for database searching. In our dataset, 1285 PSMs contained amino acid-specific neutral losses. Among them, 1189 (92.5%) agree with the database search result. The 7.5% disagreement could have a number of causes, including modified peptides, false positive peptide identifications, and false positive neutral loss peak detection. Nonetheless, this high percentage of agreement suggests the potential use of these diagnostic losses to increase confidence in peptide identification from database searching.

We reasoned that the specificity of neutral losses in ETD spectra may be of use to control false discovery rate (FDR) following database searching. Incorporation of the loss information into our FDR-based filtering procedures, however, only produced marginal results, partially because of the absence or low abundance of losses in many spectra (data not shown). Further work in programming the ETD loss information into search algorithms, thereby improving scoring functions, may prove more fruitful.

3.8 Peptide Identification Using Accurate Mass and Neutral Losses

It has been demonstrated that peptides can be directly identified by accurate precursor mass measurement [43–51]. To further explore the use of the amino acid-specific neutral losses listed in Table 1, we asked whether combining high mass accuracy information with the amino acid content indicated by these specific neutral losses could be used to uniquely identify peptide sequences. Towards this goal, we first performed an in silico LysC digestion of the human protein database with up to three missed cleavages. We then transformed this peptide list into a MySQL database. For example, Figure 5 shows the candidate peptide list returned by a SQL query using only mass accuracy criteria (±3 ppm) for a precursor having a mass of 1622.71992 Da. From the MS/MS data we concluded that the peptide must contain aspartic acid (D), cysteine (C), and either glutamine (Q) or asparagine (N). Applying these additional amino acid content criteria resulted in the unique identification of the peptide SVSGTDVQEECREK. This example demonstrates the potential use of high mass accuracy of peptide precursors and the diagnostic value of neutral losses either to draw conclusions about the peptide sequence or to help filter out false positive identifications.

Figure 5

Amino acid content information derived from neutral losses in the ETD spectrum resolves the ambiguity of peptide identification from accurate mass-based protein database searching [43–51]. There are eight candidate sequences (left panel) from the human LysC peptide database that pass the accurate mass filtering (1622.71992 ± 3 ppm). From the ETD tandem mass spectrum (scan number 1286), we conclude that the precursor peptide contains aspartic acid, cysteine, and also must contain either glutamine or asparagine. Applying this information filters out seven peptides and leaves only one candidate, SVSGTDVQEECREK (right panel), which is the same peptide identified by database searching

To gain a more complete understanding of the value of neutral loss peaks for identifying peptides, we performed a large-scale analysis in which a SQL query was generated from each of 1474 MS/MS spectra. A Perl script was written to connect to and query the human LysC peptide database. Approximately 7% of these SQL queries led to a unique peptide identification. Figure 6 displays the frequency histograms of the number of distinct peptides returned by SQL queries against the human LysC peptide database. Compared with SQL queries using only mass accuracy with the same tolerance (±3 ppm), we observe a significant shift towards a much narrower list of candidate sequences as a result of the additional amino acid content information derived from the neutral losses. By adding side chain loss information, the median number of sequence candidates from the accurate mass query is reduced from 21 to 8.
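The summary statistics reported here (the median number of candidates per query and the fraction of queries returning a single peptide) can be computed as in the Python sketch below. The function name is hypothetical and the two lists are toy numbers chosen only to exercise the code; they are not the values behind Figure 6.

```python
from statistics import median

def summarize(candidate_counts):
    """Median candidate count and fraction of queries with exactly one hit."""
    counts = list(candidate_counts)
    unique = sum(1 for n in counts if n == 1)
    return {"median": median(counts), "unique_fraction": unique / len(counts)}

# Toy candidate counts for five scans (illustrative only)
mass_only_counts = [10, 4, 7, 2, 25]   # accurate mass alone
with_loss_counts = [3, 1, 2, 1, 6]     # accurate mass plus residue content

before = summarize(mass_only_counts)
after = summarize(with_loss_counts)
```

Because the residue-content filter can only remove candidates, each per-scan count in the second list is at most the corresponding count in the first.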

Figure 6

Histograms showing the frequency of sequence candidates from SQL query against the human LysC in silico digested peptide database using information either only from accurate mass (red) or from accurate mass and amino acid composition automatically derived from ETD neutral losses (blue)

4 Conclusion

The neutral losses that occur following ETD were characterized on a large scale using an Orbitrap mass analyzer with high resolving power. Most of the observed neutral losses match those previously reported within the SwedECD database. With the high mass accuracy achieved with the Orbitrap mass analyzer, we identified a unique chemical formula for many neutral losses in ETD spectra and linked them to the side chains of specific amino acids. We also explored the efficiency of combining the accurate mass and amino acid content derived from neutral losses in ETD spectra to query the human protein database. We showed that unique peptides could be identified by using accurate mass and side chain loss information; the specificity of accurate mass queries was improved significantly by adding side chain loss information (the median number of sequence candidates was reduced from 21 to 8). This may provide a useful method for either filtering out false positive identifications or identifying candidate sequences directly from information about their mass and amino acid contents.