Sequence Scrambling in Shotgun Proteomics is Negligible
Cite this article as: Goloborodko, A.A., Gorshkov, M.V., Good, D.M. et al. J. Am. Soc. Mass Spectrom. (2011) 22: 1121. doi:10.1007/s13361-011-0130-z
Analysis of 15,897 low-energy (CAD) and 10,878 higher-energy (HCD) collisional dissociation mass spectra of doubly protonated tryptic peptides, acquired at high resolution, revealed that the rate of sequence scrambling due to b-ion cyclization is negligible (<1%) and can be safely ignored as a possible source of erroneous sequence assignment in shotgun proteomics. On the other hand, normal (non-scrambled) internal fragments are present in significant numbers in HCD and should be taken into account by MS/MS search engines.
Key words: Peptide fragmentation, CAD, HCD, Proteomics, De novo sequencing, Database search
The shotgun, or bottom-up, approach is a dominant way to study proteomes with mass spectrometry (MS). In this approach, proteins are digested, typically by trypsin, and the resultant mixture of tryptic peptides is separated by liquid chromatography coupled on-line with tandem mass spectrometry (MS/MS). Peptide sequences are identified by matching the MS/MS datasets against theoretical spectra of in silico digested proteins from a sequence database. The success of such identification depends upon the quality of the MS/MS data, in particular the presence of abundant sequence-specific ions and the absence of interferences. One such interference can arise from internal fragments, which result from cleavage of more than one peptide bond in the precursor ion. Moreover, N-terminal peptide fragments (b ions) have been reported to undergo cyclization followed by secondary backbone cleavage that can open the cycle at a position different from the cyclized peptide bond. Such an opening produces internal backbone fragments of b-type whose amino acid sequences are scrambled compared with the original peptide. For example, peptide ABCDE can give an ABCD b-ion that could cyclize and, upon ring opening between B and C, produce a fragment CDAB. The latter fragment can dissociate further, yielding CDA, CD, DAB, and AB, of which CDA and DAB have unexpected masses, while CD is a “normal” internal fragment. Polfer et al. found cyclization to be the dominant process for larger polyglycine b ions containing more than eight residues. Such a process, if frequent, could jeopardize the accuracy of peptide sequence identification in shotgun proteomics.
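The ABCDE example can be reproduced with a minimal Python sketch. The function names are hypothetical and the letters are placeholder residues, not real amino acid codes. A piece of the reopened ring is treated as "normal" if it also occurs as a contiguous stretch of the original peptide, and as scrambled otherwise; this classifies by sequence only and does not compute masses.

```python
def substrings(seq, min_len=2):
    """Contiguous pieces of seq with at least min_len residues, excluding seq itself."""
    n = len(seq)
    return {
        seq[i:j]
        for i in range(n)
        for j in range(i + min_len, n + 1)
        if j - i < n
    }


def reopen_ring(b_ion, open_before):
    """Linear sequence after the cyclized b ion reopens in front of index open_before."""
    return b_ion[open_before:] + b_ion[:open_before]


def classify_fragments(peptide, b_ion, open_before):
    """Split pieces of the reopened ring into normal and scrambled sets."""
    reopened = reopen_ring(b_ion, open_before)
    pieces = substrings(reopened)
    # A piece is "normal" if it is a contiguous stretch of the original peptide.
    normal = {p for p in pieces if p in peptide}
    scrambled = pieces - normal
    return reopened, normal, scrambled


# Ring opening between B and C of the ABCD b ion (index 2 puts C first).
reopened, normal, scrambled = classify_fragments("ABCDE", "ABCD", 2)
print(reopened, sorted(normal), sorted(scrambled))
```

For this input the sketch reports CD and AB as normal and CDA and DAB as scrambled, in line with the example above. It additionally reports DA, a shorter piece of the reopened ring that the text does not list.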
The effect of b-ion scrambling on sequence assignment in large-scale proteomic experiments remains uncertain [5–7]. Early attempts to evaluate its significance were limited to small peptide datasets. The purpose of this work is to determine how frequently scrambled fragments appear through statistical evaluation of extensive, high-resolution MS/MS datasets. One of these datasets is SwedCAD, a library of 15,897 annotated high-resolution low-energy CAD MS/MS spectra of doubly charged peptides. The second, SwedHCD, is introduced in this study. It contains 10,878 annotated high-resolution higher-energy (HCD) MS/MS spectra of unique doubly protonated peptides from Arabidopsis thaliana, rat, and human samples.
Proteomics samples were prepared according to a commonly used protocol. Cells were lysed for 10 min at 40 °C in a 0.1% solution of Rapigest (Waters). The extracted proteins were reduced for 10 min at 95 °C in 5 mM DTT (Sigma) and alkylated with 10 mM iodoacetamide (Sigma) for 30 min at room temperature. Subsequently, proteins were digested overnight using sequencing-grade trypsin at a 1:50 trypsin:protein ratio. The tryptic peptides were filtered using a 10 kDa cut-off filter (Pall Life Science) and dried in a speed vac.
Peptide mixtures were loaded onto either an HP1100 (Agilent) or an EASY-nLC (Proxeon) system equipped with a home-packed, sprayer-integrated 15 cm nano-column (Proxeon). Mass spectrometric analysis was carried out using a hybrid LTQ-Orbitrap [9, 10] for low-energy CAD and an Orbitrap Velos for higher-energy dissociation (both from ThermoFisher Scientific). Both MS and MS/MS ions were detected in an FT analyzer at >10,000 resolving power.
The resultant .raw files were converted into Mascot Generic Format (.mgf) files using DTA Generator [11, 12]. Data were searched against a concatenated forward and reversed database using the Mascot search engine (Matrix Science). All resulting sequences were trimmed to a <1% false discovery rate before inclusion in the SwedCAD or SwedHCD database. Redundant sequences were removed, retaining the highest-scoring peptide for each.
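As an illustration of trimming identifications to a target false discovery rate with a concatenated forward and reversed database, the following Python sketch picks a score cutoff. It is not the procedure used to build SwedCAD or SwedHCD: the function name, the input layout, and the FDR estimate (decoy hits divided by forward hits above the cutoff) are all assumptions, and the scores are invented.

```python
def score_cutoff_for_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    matches: iterable of (score, is_decoy) pairs, one per peptide-spectrum match.
    The FDR estimate used here, decoys / targets, is an assumed convention.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest point in the ranking that still meets the limit.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Invented example: 250 forward hits scoring 101..350 and a few low-scoring hits.
example = [(float(s), False) for s in range(101, 351)]
example += [(100.0, True), (99.0, True), (98.0, True), (97.0, False)]
print(score_cutoff_for_fdr(example))
```

In this invented example, accepting everything down to a score of 99 keeps two decoy hits against 250 forward hits (0.8%), while going one match further would exceed 1%.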
For each fragmentation spectrum in the SwedCAD and SwedHCD databases, theoretical m/z values of fragments were generated and matched with 10 ppm tolerance against the experimental m/z values, which had first been de-charged and de-isotoped. Each experimental m/z value could be attributed to only a single fragment type, tested in the following order: conventional backbone fragments (a, b, c, x, y, and z, with possible loss of H2O and NH3 from a and b); internal fragments from the original sequence; fragments from the scrambled sequences. To monitor the number of spurious matches, y-ion sequences were scrambled. Cyclization of y-ions has not been reported and is highly unlikely from a mechanistic point of view, while y-ions have the same amino acid composition as b-ions, so scrambled y-ions provide an excellent decoy dataset for scrambled b-ions. As an additional control, theoretical m/z values of scrambled b-ions were shifted by +3 Th.
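The prioritized matching can be sketched in Python as follows. This is a minimal illustration, assuming that each fragment class is supplied as a list of precomputed theoretical m/z values; the function names, the class labels, and all m/z values are hypothetical, and generating the theoretical values is outside the scope of the sketch.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def assign_peaks(observed_mz, classes, tol_ppm=10.0):
    """Attribute each observed m/z to the first fragment class that matches.

    classes: list of (label, theoretical_mz_list) in priority order, e.g.
    backbone fragments first, then internal, then scrambled fragments.
    Peaks that match no class are left out of the result.
    """
    assignments = {}
    for mz in observed_mz:
        for label, theoretical in classes:
            if any(abs(ppm_error(mz, t)) <= tol_ppm for t in theoretical):
                assignments[mz] = label
                break  # a peak is attributed to a single fragment type only
    return assignments


# Invented values: the first peak fits both backbone and internal candidates.
classes = [
    ("backbone", [500.002]),
    ("internal", [500.001, 600.000]),
    ("scrambled", [700.000]),
]
print(assign_peaks([500.000, 600.003, 700.100], classes))
```

Because the classes are tested in order, a peak that fits both a backbone and an internal candidate is counted as backbone, so the scrambled class only receives peaks that nothing earlier in the list explains.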
Matches to scrambled y-ions and to mass-shifted scrambled b-ions are spurious by construction. Because their number is similar to the number of matches to scrambled b-ions, the formation rate of the latter lies within the statistical uncertainty caused by false positive discovery. To estimate the number of definitely wrong assignments among the detected hits, we assumed that spurious assignments are distributed uniformly over the mass window. By subtracting the areas of the pedestals in the mass error histograms in Figures 1 and 2 from the total hit number, we estimated the upper limit for true positive hits. For CAD MS/MS spectra, internal fragments gave a maximum of 1938 hits (1.3% of the normal b and y hits), scrambled b-ion fragments 1381 hits (<1%), scrambled y-ion fragments 1458 hits, and +3 Th shifted scrambled b-ion fragments 1001 hits. For HCD MS/MS spectra, a maximum of 11,204 internal fragments were detected, compared with 1037 hits for scrambled b-ions, 1008 for scrambled y-ions, and 338 for +3 Th shifted scrambled b-ion fragments. Assuming Poisson statistics, the standard deviation of a hit number is equal to the square root of that number. Thus, the numbers of matches for the scrambled b- and y-ions agree within less than two standard deviations in both CAD and HCD, rendering them statistically indistinguishable. If we consider the second control, +3 Th shifted scrambled b-ions, the enhanced occurrence of scrambled b-ions becomes statistically significant, but it is still much less than 1% of the total number of matched fragments.
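The Poisson comparison can be checked numerically with the hit counts quoted above. The sketch below makes one assumption that the text does not spell out: the uncertainty of the difference between two counts is taken as the square root of their sum, that is, the two Poisson variances added. The function name is hypothetical.

```python
import math


def sigma_separation(hits_a, hits_b):
    """Difference between two Poisson counts in units of its standard deviation.

    Assumes independent counts, so the variance of the difference is
    hits_a + hits_b.
    """
    return abs(hits_a - hits_b) / math.sqrt(hits_a + hits_b)


# Hit counts quoted in the text for each dissociation mode.
counts = {
    "CAD": {"scrambled_b": 1381, "scrambled_y": 1458, "shifted_b": 1001},
    "HCD": {"scrambled_b": 1037, "scrambled_y": 1008, "shifted_b": 338},
}

for mode, c in counts.items():
    vs_y = sigma_separation(c["scrambled_b"], c["scrambled_y"])
    vs_shift = sigma_separation(c["scrambled_b"], c["shifted_b"])
    print(f"{mode}: b vs y = {vs_y:.1f} sigma, b vs +3 Th = {vs_shift:.1f} sigma")
```

Under this assumption the scrambled b-ion and y-ion counts differ by less than two standard deviations in both modes, whereas the difference from the +3 Th shifted control is several standard deviations.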
Note that the majority of previous studies demonstrating scrambling were performed on singly charged, non-tryptic, and rather small peptides [4, 5, 7, 14]. We believe that the outcomes of these studies differ from the current one because of differences in the nature of the peptides, especially the charge state and the sequence length.
The rate of b-ion cyclization in real-life shotgun proteomics is negligible (<1%) and can be safely ignored as a possible source of error in sequence assignment. On the other hand, normal internal fragments are present in significant numbers in HCD and should be taken into account by MS/MS search engines.
The authors acknowledge support for this work by the Swedish Research Council (grant 2007–4410 to R.A.Z.). D.M.G. received a post-doctoral fellowship from the Wenner-Gren Foundation. M.V.G. thanks the Russian Foundation for Basic Research, grant no. 11-04-00515-a.