The Ups and Downs of Repeated Cleavage and Internal Fragment Production in Top-Down Proteomics
Analysis of whole proteins by mass spectrometry, or top-down proteomics, has several advantages over methods relying on proteolysis. For example, proteoforms can be unambiguously identified and examined. However, from a gas-phase ion-chemistry perspective, proteins are enormous molecules that present novel challenges relative to peptide analysis. Herein, the statistics of cleaving the peptide backbone multiple times are examined to evaluate the inherent propensity for generating internal versus terminal ions. The raw statistics reveal an inherent bias favoring production of terminal ions, which holds true regardless of protein size. Importantly, even if the full suite of internal ions is generated by statistical dissociation, terminal ions are predicted to account for at least 50% of the total ion current, regardless of protein size, if there are three backbone dissociations or fewer. Top-down analysis should therefore be a viable approach for examining proteins of significant size. Comparison of the purely statistical analysis with actual top-down data derived from ultraviolet photodissociation (UVPD) and higher-energy collisional dissociation (HCD) reveals that terminal ions account for much of the total ion current in both experiments. Terminal ion production is more favored in UVPD relative to HCD, which is likely due to differences in the mechanisms controlling fragmentation. Importantly, internal ions are not found to dominate from either the theoretical or experimental point of view.
Keywords: UVPD, HCD, Statistical analysis, Internal ion
Top-down analysis of whole proteins is a rapidly expanding area in mass spectrometry, owing to several unique capabilities [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Proteins synthesized from a single gene can exist in several unique molecular forms because of events such as genetic mutation, alternative splicing, and post-translational modifications. Top-down mass spectrometry makes it possible to distinguish these proteoforms and their biological roles. Additionally, information about three-dimensional structure can be obtained through fragmentation of whole proteins and protein complexes [14, 15]. The suite of widely available methods for top-down fragmentation includes electron capture dissociation (ECD) and electron transfer dissociation (ETD) [16, 17], as well as higher-energy collisional dissociation (HCD), with ultraviolet photodissociation (UVPD) also rising in popularity. Relative to peptides, whole proteins are much larger molecules with vastly more atoms and vibrational degrees of freedom. Fragmenting such large species requires either (1) significantly higher total energy input or (2) weakening of specific bonds. HCD operates under the first principle, whereas ECD/ETD rely primarily on the second, and UVPD likely functions via a combination of both. Regardless of which fragmentation paradigm is relevant for a given experiment, the potential for fragmenting the same protein multiple times must be considered. In other words, multiple photons or electrons can initiate independent fragmentation events, or, for HCD, the energy can be sufficiently high that product ions subsequently undergo secondary fragmentation.
Cleaving the peptide backbone in multiple places creates internal fragment ions, which have received little attention in the literature and are typically ignored during analysis in most experiments. Furthermore, the statistics of internal ions in top-down analysis are generally misunderstood within the mass spectrometry community. Because the number of possible internal ions grows significantly with protein size, it is often assumed that internal ions will consume most of the ion intensity when large proteins are fragmented. Only a few reports have examined internal ion generation within the context of peptide dissociation [21, 22]. From a top-down perspective, Kelleher and co-workers previously reported that inclusion of internal fragment ions can increase sequence coverage for smaller proteins.
Herein we report the statistical analysis of internal fragment ions to reveal the inherent propensities for generating internal versus terminal fragment ions as a function of both protein size and the number of dissociation events. It is confirmed that as molecular size increases, the total number of unique internal ions that can be generated increases dramatically. However, the statistical probability of generating any given internal fragment ion decreases at the same rate, while the propensity for generating terminal ions remains constant. Terminal ions are therefore predicted to be generated frequently, regardless of protein size. Although multiple dissociation events (i.e., more than two) can increase the proportion of internal ions generated from each precursor ion, terminal ions are still statistically predicted to dominate spectra, even with increasing protein size. Indeed, the fraction of precursor ion charge that ends up on terminal ions does not depend on protein size and can be easily estimated if the number of dissociation events is known. The statistical results were compared with actual HCD and UVPD data for proteins of varying sizes, revealing that a significant fraction of the ion intensity is attributed to terminal ions regardless of protein size. In general, UVPD results in a greater fraction of ion intensity being apportioned to terminal ions and more closely resembles statistical fragmentation.
A program was written in VB.net for statistical analysis of internal and terminal ions. With a given input sequence, the program cleaves the protein amide bond from one to four times at random locations to produce b and y ions and internal fragments. The masses of each fragment produced by a single run are recorded and the frequency of each product is then summed after multiple runs have been carried out. For each analysis, the number of runs was sufficient to sample all possible product ions repeatedly (typically hundreds of times). All cleavage points are treated equally, i.e., there is no provision for preferred cleavage at certain sites. It is assumed that disulfide bonds have all been reduced. All product ions are recorded, including single amino acids and short sequences without basic residues. Many of these products would be unlikely to be detected in real experiments, as is discussed in greater detail below.
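The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' VB.net program: the function names (`cleave`, `simulate`) are hypothetical, the cleavage sites within one run are assumed to be distinct and uniformly distributed, and fragments are tallied by type and position rather than by mass.

```python
import random
from collections import Counter


def cleave(sequence, n_cuts, rng):
    """Cut `sequence` at `n_cuts` distinct, uniformly chosen backbone bonds.

    Returns a list of (kind, start, end) tuples, where the fragment is
    sequence[start:end] and kind is "b", "y", or "internal".
    Assumption: sites within one run are distinct and equally likely.
    """
    n = len(sequence)
    if not 1 <= n_cuts <= n - 1:
        raise ValueError("n_cuts must be between 1 and len(sequence) - 1")
    # Bond i lies between residue i-1 and residue i (0-based), 1 <= i <= n-1.
    cuts = sorted(rng.sample(range(1, n), n_cuts))
    bounds = [0] + cuts + [n]
    fragments = []
    for start, end in zip(bounds, bounds[1:]):
        if start == 0:
            kind = "b"         # retains the N-terminus
        elif end == n:
            kind = "y"         # retains the C-terminus
        else:
            kind = "internal"  # both termini lost
        fragments.append((kind, start, end))
    return fragments


def simulate(sequence, n_cuts, n_runs, seed=0):
    """Repeat the random cleavage `n_runs` times and sum fragment frequencies."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(cleave(sequence, n_cuts, rng))
    return counts
```

Calling `simulate(sequence, 2, n_runs)` tallies one b, one y, and one internal fragment per run. Converting each `(start, end)` pair to a mass, as the original program does, would merge mass-equivalent sequences into a single entry.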
Human whole cell lysate was prepared as previously described. Briefly, primary IMR90 human fibroblasts were lysed using a buffer composed of 4% SDS (w/v), 10 mM Tris-HCl (pH 7.8), 1 mM dithiothreitol, and 10 mM sodium butyrate, in the presence of a protease inhibitor cocktail (Thermo Scientific). The protein content was cleaned by acetone precipitation and subsequently fractionated using a GELFrEE 8100 Fractionation System (Expedeon, Harston, Cambridgeshire, UK) equipped with a 10% and 8% GELFrEE 8100 cartridge. Selected eluted fractions in the mass range from 0 to 30 kDa were cleaned from SDS and other contaminants via MeOH/H2O/CHCl3 extraction. Dried protein pellets were resuspended in 25 μL of Solution A (composed of 5% acetonitrile and 0.2% formic acid in water). Protein reversed-phase chromatographic separation was performed using a Dionex Ultimate 3000 chromatographic system (Thermo Scientific, Sunnyvale, CA, USA), with the nanobore column packed with ~30 cm of PLRP-S stationary phase (75 μm i.d., 5 μm particle size, Agilent, Santa Clara, CA, USA) coupled on-line to a nanoelectrospray ionization source. The analytical gradient was set as follows: Solution B (4.8% water in acetonitrile with 0.2% formic acid) was raised from 5% to 15% in 2 min, then from 15% to 50% in 50 min (followed by a wash at 95% B and re-equilibration at 5% B). The column outlet was linked to a 15 μm i.d. electrospray emitter (New Objective, Woburn, MA, USA), packed with ~0.5 mm of PLRP-S resin to avoid outgassing, through a high voltage union to which a ~2 kV potential was applied.
All mass spectrometry measurements were carried out on an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific, San Jose, CA, USA) operating in “protein mode,” under reduced (2 mTorr) N2 pressure in the HCD cell combined with “extended trapping” of ions in the HCD cell. The transfer capillary temperature was set at 320 °C, the source rf was set at 30% amplitude, and a 15 V offset was applied to favor protein ion desolvation and adduct removal. Acquisition parameters were set as follows: broadband MS scans used a resolving power (r.p.) of 120,000 (at 200 m/z) and were acquired within a 500–2000 m/z window averaging four microscans, with an AGC target of 1e6 and a maximum injection time of 200 ms. Tandem MS (MS2) was performed in a data-dependent fashion by quadrupole-isolating the two most abundant species in the MS1 spectrum using a 3 m/z-wide isolation window. Ion activation/fragmentation used either HCD or UVPD at 213 nm. For HCD, a normalized collision energy of 23% was used. UVPD was performed in the high pressure chamber of the linear ion trap using the fifth harmonic (corresponding to 213 nm photons, ~50 μJ/pulse) of a Nd:YAG solid-state laser (CryLas, Berlin, Germany). The number of laser pulses was varied depending on the average protein size of the analyzed GELFrEE fraction (typically 20–40 pulses were used). All MS2 scans were recorded with an r.p. of 60,000 (at 200 m/z) over a 400–2000 m/z window and four microscans, using an AGC target of 1e6 charges and a maximum injection time of 800 ms. Recorded .RAW files were analyzed using TDPortal (https://portal.nrtdp.northwestern.edu/) for protein identification, using a previously described workflow for human database search. Selected proteoforms, initially identified via TDPortal, were manually annotated, with spectral deconvolution performed using Xtract (Thermo Scientific).
Results and Discussion
Double cleavage yields two terminal fragments and one internal fragment per precursor ion, but this difference alone is insufficient to account for the observed disparity in abundance. A more compelling explanation is that any given internal fragment is generated (on average) less frequently due to dilution, because the total number of internal fragments greatly exceeds the total number of terminal ions. For terminal fragments of ubiquitin, there are only 150 possibilities that can be populated, but there are 2401 internal fragments of unique mass. The likelihood of generating a particular internal fragment is therefore significantly lower, although there is no inherent length bias for internal fragments in double cleavage, as illustrated by the flatness of the distribution. However, the density of internal fragments within a given mass window decreases with increasing mass (see Supporting Information for a histogram). There are also privileged internal fragments that occur with significantly higher frequency than most. Those at low mass correspond to single amino acids, with the highest frequency peak matching leucine/isoleucine, the most abundant amino acids in ubiquitin. The secondary line of internal fragments just above the primary distribution (indicated by the green arrow) corresponds to frame-shifting fragments, i.e., fragments that are flanked by the same amino acid on either side, creating two mass-equivalent sequences. For example, an internal sequence flanked by glycine, GXXXXG, where X could be any amino acid, would create mass-equivalent GXXXX and XXXXG sequences, increasing the probability for this mass relative to non-frame-shifting sequences. Double and higher order frame-shifting sequences can also be observed.
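The dilution argument can be made concrete by counting fragments by position. The sketch below assumes a linear chain with one cleavable bond between each pair of adjacent residues; the function name `fragment_counts` is hypothetical. Because it counts positions rather than masses, the internal count is an upper bound on the number of unique-mass internal fragments, since mass-equivalent sequences (such as the frame-shifting fragments described above) collapse onto a single mass.

```python
from math import comb


def fragment_counts(n_residues):
    """Count positionally distinct fragments of a linear chain.

    Terminal: one b-type and one y-type fragment per backbone bond.
    Internal: one fragment per unordered pair of distinct bonds.
    """
    bonds = n_residues - 1
    terminal = 2 * bonds
    internal = comb(bonds, 2)
    return terminal, internal


# Ubiquitin has 76 residues, hence 75 backbone bonds.
terminal, internal = fragment_counts(76)
print(terminal, internal)  # 150 2775
```

For ubiquitin this reproduces the 150 terminal possibilities quoted above, while the 2775 positional internal fragments reduce to the reported 2401 once mass-equivalent sequences are merged.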
Cleaving the backbone in three locations yields the results shown in Figure 1c. The trends are largely similar to those observed for double cleavage, with the exception of greater bias against long fragments, either terminal or internal. Terminal fragments are still generated with greater frequency than internal fragments for most of the mass range, although the difference is less than the margin observed in Figure 1b, and the point at which long terminal fragments are observed with equal probability to internal fragments occurs at lower mass. Triple cleavage creates two internal fragments per precursor ion, which accounts for the narrowing between distributions. Quadruple cleavage is shown in Figure 1d. When the backbone is cleaved repeatedly, bias against longer terminal fragments becomes more extreme, with the longest fragments being generated orders of magnitude less often than shorter fragments. The gap between internal and terminal fragments also decreases relative to triple cleavage, as does the crossover point where long terminal ions are equivalent to shorter internal fragments. It is expected that these same trends would continue for additional backbone fragmentation, with the gap between terminal and internal fragments decreasing incrementally with each additional cleavage.
Table columns: fraction of charge on terminal ions; fraction of charge on internal ions; average intensity per internal ion; number of internal ions.
These complex statistical results can also be easily predicted by simply considering the experiment from the molecular point of view. A single cleavage of the peptide backbone yields two terminal ions, and consequently 100% of the ion current goes to terminal ions. Cleavage of the backbone twice yields two terminal ions and one internal ion. When averaged out over all potential cleavage points, this will lead to 1/3 of the charge being apportioned to each fragment, leaving 2/3 of the charge on terminal ions. The number of potential internal ions is actually irrelevant in this single molecule picture. By extension, the charge will be split 50/50 after triple cleavage, which will yield two internal and two terminal ions. Importantly, these considerations imply that top-down analysis should not be overly hindered by internal ion production as protein size increases, as long as the number of multiple backbone fragmentations is not excessive. Fortunately, given that large molecules are typically difficult to fragment, excessive sequential fragmentation is not likely to be an issue.
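The single-molecule argument generalizes to any number of cleavages. The sketch below assumes, as in the reasoning above, that after averaging over all cleavage positions each of the fragments carries an equal share of the charge; n cleavages give n + 1 fragments, of which exactly two are terminal. The function name `terminal_charge_fraction` is hypothetical.

```python
from fractions import Fraction


def terminal_charge_fraction(n_cleavages):
    """Expected fraction of precursor charge carried by terminal ions.

    Assumption: charge is shared equally, on average, among the
    n_cleavages + 1 fragments, two of which are terminal.
    """
    if n_cleavages < 1:
        raise ValueError("at least one backbone cleavage is required")
    return Fraction(2, n_cleavages + 1)


for n in range(1, 5):
    print(n, terminal_charge_fraction(n))
# 1 1
# 2 2/3
# 3 1/2
# 4 2/5
```

Protein length does not appear anywhere in the expression, which is why the predicted terminal fraction is independent of protein size and stays at or above 50% for up to three cleavages.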
Comparison with Experiments
Statistical fragmentation of proteins significantly favors production of terminal ions over internal ions even if the protein backbone is fragmented repeatedly. This effect becomes more pronounced for larger proteins, with predicted abundances of terminal ions exceeding internals by orders of magnitude. Although the number of unique mass internal ions increases greatly with protein size, the statistical probability of observing any given internal ion decreases proportionately due to dilution. It is therefore incorrect to assume that the staggering number of internal ions that can be generated for large proteins will make detection of terminal ions unlikely. On the contrary, examination of the ion current in actual experiments reveals that terminal ions represent a significant fraction of the total, even for large proteins. The results further suggest that HCD is more prone to favored dissociation pathways, while UVPD more closely reflects statistical dissociation.
The authors thank Neil Kelleher for helpful discussions and Connor Julian for assistance with the coding. Funding from the National Science Foundation (CHE-1401737) and NIH (R01 GM107099) supported this research, which was carried out in collaboration with the National Resource for Translational and Developmental Proteomics under grant P41 GM108569 from the National Institute of General Medical Sciences, National Institutes of Health.
- 5. Cleland, T.P., Dehart, C.J., Fellers, R.T., Vannispen, A.J., Greer, J.B., LeDuc, R.D., Parker, W.R., Thomas, P.M., Kelleher, N.L., Brodbelt, J.S.: High-throughput analysis of intact human proteins using UVPD and HCD on an Orbitrap mass spectrometer. J. Proteome Res. 16, 2072–2079 (2017)
- 12. Smith, L.M., Kelleher, N.L., Linial, M., Goodlett, D., Langridge-Smith, P., Ah Goo, Y., Safford, G., Bonilla, L., Kruppa, G., Zubarev, R., Rontree, J., Chamot-Rooke, J., Garavelli, J., Heck, A., Loo, J., Penque, D., Hornshaw, M., Hendrickson, C., Pasa-Tolic, L., Borchers, C., Chan, D., Young, N., Agar, J., Masselon, C., Gross, M., McLafferty, F., Tsybin, Y., Ge, Y., Sanders, I., Langridge, J., Whitelegge, J., Marshall, A.: Proteoform: a single term describing protein complexity. Nat. Methods. 10, 186–187 (2013)
- 13. Tran, J.C., Zamdborg, L., Ahlf, D.R., Lee, J.E., Catherman, A.D., Durbin, K.R., Tipton, J.D., Vellaichamy, A., Kellie, J.F., Li, M., Wu, C., Sweet, S.M.M., Early, B.P., Siuti, N., LeDuc, R.D., Compton, P.D., Thomas, P.M., Kelleher, N.L.: Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature. 480, 254–258 (2011)
- 19. Shaw, J.B., Li, W., Holden, D.D., Zhang, Y., Griep-Raming, J., Fellers, R.T., Early, B.P., Thomas, P.M., Kelleher, N.L., Brodbelt, J.S.: Complete protein characterization using top-down mass spectrometry and ultraviolet photodissociation. J. Am. Chem. Soc. 135, 12646–12651 (2013)
- 20. Marzluff, E.M., Beauchamp, J.L.: Collisional activation studies of large molecules. In: Baer, T. (ed.) Large Ions: Their Vaporization, Detection, and Structural Analysis. Wiley, New York (1996)
- 25. Sham, D.P., Early, B.P., Fellers, R.T., Greer, J.B., Thomas, P.T., Fornelli, L., LeDuc, R.D., Shwab, D.J., Kelleher, N.L.: Accurate estimation of false discovery rates for protein and proteoform identification in top down proteomics. Submitted