Introduction

Top-down analysis of whole proteins is a rapidly expanding area of mass spectrometry, owing to several unique capabilities [1,2,3,4,5,6,7,8,9,10,11]. Proteins synthesized from a single gene can exist in several unique molecular forms because of events such as genetic mutation, alternative splicing, and post-translational modifications [12]. Top-down mass spectrometry makes it possible to distinguish these proteoforms and their biological roles [13]. Additionally, information about three-dimensional structure can be obtained through fragmentation of whole proteins and protein complexes [14, 15]. The suite of widely available methods for top-down fragmentation includes electron capture dissociation (ECD), electron transfer dissociation (ETD) [16, 17], and higher-energy collisional dissociation (HCD) [18], with ultraviolet photodissociation (UVPD) also rising in popularity [19]. Relative to peptides, whole proteins are much larger molecules with vastly more atoms and vibrational degrees of freedom. Fragmenting such large species requires either (1) significantly higher total energy input or (2) weakening of specific bonds [20]. HCD operates under the first principle whereas ECD/ETD utilize primarily the second, and UVPD likely functions via a combination of both. Regardless of which fragmentation paradigm is relevant for a given experiment, the potential for fragmenting the same protein multiple times must be considered. In other words, multiple photons or electrons can initiate independent fragmentation events, or for HCD the energy can be sufficiently high that product ions subsequently undergo secondary fragmentation.

Cleaving the peptide backbone in multiple places creates internal fragment ions, which have not received significant attention in the literature and are typically ignored during analysis in most experiments. Furthermore, there is a general misunderstanding within the mass spectrometry community of the statistics of internal ions in top-down analysis. Because the number of possible internal ions grows significantly with protein size, it is often assumed that internal ions will consume most of the ion intensity when large proteins are fragmented. Only a few reports have examined internal ion generation within the context of peptide dissociation [21, 22]. From a top-down perspective, Kelleher and co-workers previously reported that inclusion of internal fragment ions can increase sequence coverage for smaller proteins [23].

Herein we report the statistical analysis of internal fragment ions to reveal the inherent propensities for generating internal versus terminal fragment ions as a function of both protein size and the number of dissociation events. It is confirmed that as molecular size increases, the total number of unique internal ions that can be generated increases dramatically. However, the statistical probability for making any given internal fragment ion also decreases simultaneously at the same rate while the propensity for making terminal ions remains constant. Terminal ions are therefore predicted to be generated frequently, regardless of protein size. Although multiple dissociation events (i.e., more than two) can increase the proportion of internal ions generated from each precursor ion, terminal ions are still statistically predicted to dominate spectra, even with increasing protein size. Indeed, the fraction of precursor ion charge that ends up on terminal ions does not depend on protein size and can be easily estimated if the number of dissociation events is known. The statistical results were compared with actual HCD and UVPD data for proteins of varying sizes, revealing that a significant fraction of the ion intensity is attributed to terminal ions regardless of protein size. In general, UVPD results in a greater fraction of ion intensity being apportioned to terminal ions and more closely resembles statistical fragmentation.

Experimental

Simulations

A program was written in VB.net for statistical analysis of internal and terminal ions. For a given input sequence, the program cleaves the protein backbone at one to four randomly chosen amide bonds to produce b and y ions and internal fragments. The mass of each fragment produced in a single run is recorded, and the frequency of each product is summed after multiple runs have been carried out. For each analysis, the number of runs was sufficient to sample all possible product ions repeatedly (typically hundreds of times). All cleavage points are treated equally, i.e., there is no provision for preferred cleavage at certain sites. It is assumed that disulfide bonds have all been reduced. All product ions are recorded, including single amino acids and short sequences without basic residues. Many of these products would be unlikely to be detected in real experiments, as is discussed in greater detail below.
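The procedure described above can be illustrated with a short Python sketch. This is not the original VB.net program. It assumes that the cleavage sites within one run are distinct, and it tallies fragments by sequence position rather than by mass. The function name `simulate_cleavage` and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter

def simulate_cleavage(sequence, n_cleavages, n_runs, seed=0):
    """Cleave `sequence` at `n_cleavages` distinct random amide bonds per run
    and tally how often each terminal or internal fragment is produced."""
    rng = random.Random(seed)
    n_sites = len(sequence) - 1  # one cleavable bond between each pair of residues
    counts = Counter()
    for _ in range(n_runs):
        # Every site is equally likely; there are no preferred cleavage positions.
        sites = sorted(rng.sample(range(1, n_sites + 1), n_cleavages))
        bounds = [0] + sites + [len(sequence)]
        for start, end in zip(bounds, bounds[1:]):
            if start == 0:
                kind = "N-terminal"
            elif end == len(sequence):
                kind = "C-terminal"
            else:
                kind = "internal"
            counts[(kind, start, end)] += 1
    return counts

# Example: double cleavage of an arbitrary 20-residue test sequence.
counts = simulate_cleavage("ACDEFGHIKLMNPQRSTVWY", n_cleavages=2, n_runs=10000)
```

Keying the tally by position keeps the sketch short. Grouping by fragment mass, as in the analysis reported here, would additionally merge fragments that happen to share a mass.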

Mass Spectrometry

Human whole cell lysate was prepared as previously described [24]. Briefly, primary IMR90 human fibroblasts were lysed using a buffer composed of 4% SDS (w/v), 10 mM Tris-HCl (pH 7.8), 1 mM dithiothreitol, 10 mM sodium butyrate, in the presence of a protease inhibitor cocktail (Thermo Scientific), and the protein content was cleaned by acetone precipitation and subsequently fractionated using a GELFrEE 8100 Fractionation System (Expedeon, Harston, Cambridgeshire, UK) equipped with a 10% and 8% GELFrEE 8100 cartridge. Selected eluted fractions in the mass range from 0 to 30 kDa were cleaned of SDS and other contaminants via MeOH/H2O/CHCl3 extraction. Dried protein pellets were resuspended in 25 μL of Solution A (composed of 5% acetonitrile and 0.2% formic acid in water). Protein reversed-phase chromatographic separation was performed using a Dionex Ultimate 3000 chromatographic system (Thermo Scientific, Sunnyvale, CA, USA), with the nanobore column packed with ~30 cm of PLRP-S stationary phase (75 μm i.d., 5 μm particle size, Agilent, Santa Clara, CA, USA) on-line coupled to a nanoelectrospray ionization source. The analytical gradient was set as follows: Solution B (4.8% water in acetonitrile with 0.2% formic acid) was raised from 5% to 15% in 2 min, then from 15% to 50% in 50 min (followed by a wash at 95% B and re-equilibration at 5% B). The column outlet was linked to a 15 μm i.d. electrospray emitter (New Objective, Woburn, MA, USA), packed with ~0.5 mm of PLRP-S resin to avoid outgassing, through a high voltage union to which a ~2 kV potential was applied.

All mass spectrometry measurements were carried out on an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific, San Jose, CA, USA) operating in “protein mode,” under reduced (2 mTorr) N2 pressure in the HCD cell combined with “extended trapping” of ions in the HCD cell. The transfer capillary temperature was set at 320 °C, the source rf was set at 30% amplitude, and a 15 V offset was applied to favor protein ion desolvation and adduct removal. Acquisition parameters were set as follows: broadband MS scans used a resolving power (r.p.) of 120,000 (at 200 m/z) and were acquired within a 500–2000 m/z window averaging 4 microscans, with an AGC target of 1e6 and a maximum injection time of 200 ms. Tandem MS (MS2) was performed in a data-dependent fashion by quadrupole-isolating the two most abundant species in the MS1 spectrum using a 3 m/z-wide isolation window. Ion activation/fragmentation used either HCD or UVPD at 213 nm. For HCD, a normalized collision energy of 23% was used. UVPD was performed in the high pressure chamber of the linear ion trap using the fifth harmonic (corresponding to 213 nm photons, ~50 μJ/pulse) of a Nd:YAG solid-state laser (CryLas, Berlin, Germany). The number of laser pulses was varied depending on the average protein size of the analyzed GELFrEE fraction (typically 20–40 pulses were used). All MS2 scans were recorded with an r.p. of 60,000 (at 200 m/z) over a 400–2000 m/z window and four microscans, using an AGC target of 1e6 charges and a maximum injection time of 800 ms. Recorded .RAW files were analyzed using TDPortal (https://portal.nrtdp.northwestern.edu/) for protein identification, using a previously described workflow for human database search [25]. Selected proteoforms, initially identified via TDPortal, were manually annotated, with spectral deconvolution performed using Xtract (Thermo Scientific).

Results and Discussion

Statistical Dissociation

To understand the statistical propensities for internal and terminal ion production in top-down analysis, stochastic fragmentation of various proteins was carried out in silico as detailed in the Experimental section. Results for ubiquitin are shown in Figure 1, where the number of times each fragment was observed after repeated trials is plotted as a function of mass. Fragmentation was simulated in a completely random fashion, with no preference for any cleavage site. The results for breaking the backbone a single time are shown in Figure 1a, which illustrates a completely flat abundance profile. Single cleavage of the backbone yields complementary N-terminal and C-terminal fragments, and close inspection of the results in Figure 1a reveals symmetry that results from this complementarity (see zoomed box in Figure 1a). As expected, all fragments are generated with essentially equal probability. Figure 1b shows results for the same approach where the backbone was cleaved twice, revealing distinct populations with significantly different abundances. The terminal fragments (orange) occur with much greater frequency than internal fragments (blue) for most of the mass range. The abundance of terminal fragments drops as mass increases because the peptide backbone is cleaved in two locations. In order to observe a large mass terminal ion, both fragmentation points must be located near the opposite terminus, which is statistically unlikely. Owing to this requirement, the longest terminal fragments are generated with frequency similar to the most abundant internal fragments.

Figure 1

Statistical fragmentation of ubiquitin for (a) single, (b) double, (c) triple, and (d) quadruple cleavages of the peptide backbone. Terminal fragments are displayed in orange, internal fragments, blue. Green arrow indicates single-frame-shifting internal fragments.

Double cleavage yields two terminal fragments and one internal fragment per precursor ion, but this difference alone is insufficient to account for the observed disparity in abundance. A more compelling explanation is that any given internal fragment is generated (on average) less frequently due to dilution because the total number of internal fragments greatly exceeds the total number of terminal ions. For terminal fragments of ubiquitin, there are only 150 possibilities that can be populated, but there are 2401 unique mass internal fragments. The likelihood for generating a particular internal fragment is therefore significantly lower, although there is no inherent length bias for internal fragments in double cleavage, as illustrated by the flatness of the distribution. However, the density of internal fragments within a given mass window decreases with increasing mass (see Supporting Information for a histogram). There are also privileged internal fragments that occur with significantly higher frequency than most. Those at low mass correspond to single amino acids, with the highest frequency peak matching leucine/isoleucine, the most abundant amino acids in ubiquitin. The secondary line of internal fragments just above the primary distribution (indicated by the green arrow) corresponds to frame-shifting fragments, i.e., fragments that are flanked by the same amino acid on either side, creating two mass equivalent sequences. For example, an internal sequence flanked by glycine, GXXXXG (where X could be any amino acid), would create mass equivalent GXXXX and XXXXG sequences, increasing the probability for this mass relative to non-frame-shifting sequences. Double and higher order frame-shifting sequences can also be observed.
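The frame-shifting effect can be made concrete with a small Python sketch that groups internal fragments by residue composition. Composition is used here only as a stand-in for mass (identical composition implies identical mass, but not every mass coincidence is captured), and the function name and the 7-residue test sequence are hypothetical.

```python
from collections import defaultdict

def internal_fragments_by_composition(sequence):
    """Group all internal fragments (containing neither terminus) by composition.

    Keys are the sorted residues of the fragment; values are lists of
    (start, end) slice positions that share that composition.
    """
    groups = defaultdict(list)
    n = len(sequence)
    for start in range(1, n - 1):          # skip the N-terminal residue
        for end in range(start + 1, n):    # stop before the C-terminal residue
            fragment = sequence[start:end]
            groups["".join(sorted(fragment))].append((start, end))
    return groups

# Hypothetical sequence with a glycine-flanked stretch: G-CDE-G
groups = internal_fragments_by_composition("AGCDEGK")
frame_shifted = {k: v for k, v in groups.items() if len(v) > 1}
```

For this test sequence, the fragments GCDE and CDEG fall into the same group, so that composition is populated by two different cleavage pairs instead of one.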

Cleaving the backbone in three locations yields the results shown in Figure 1c. The trends are largely similar to those observed for double cleavage with the exception of greater bias against long fragments, either terminal or internal. Terminal fragments are still generated with greater frequency than internal fragments for most of the mass range, although the difference is less than the margin observed in Figure 1b, and the point at which long terminal fragments are observed with equal probability to internal fragments occurs at lower mass. Triple cleavage creates two internal fragments per precursor ion, which accounts for the narrowing between distributions. Quadruple cleavage is shown in Figure 1d. When the backbone is cleaved repeatedly, bias against longer terminal fragments becomes more extreme, with the longest fragments being generated orders of magnitude less often than shorter fragments. The gap between internal and terminal fragments also decreases relative to triple cleavage as does the crossover point where long terminal ions are equivalent to shorter internal fragments. It is expected that these same trends would continue for additional backbone fragmentation, with the gap between terminal and internal fragments decreasing incrementally with each additional cleavage.
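The length dependence described for Figure 1 can also be written in closed form. The sketch below assumes k distinct cleavage sites drawn uniformly from the n - 1 amide bonds of an n-residue protein and gives the probability of producing one specific fragment of a given length in residues. The function names are hypothetical, and mass degeneracy (including frame-shifting) is ignored.

```python
from math import comb

def p_terminal(n_residues, length, k):
    """Probability that k random cleavages give the N-terminal fragment of `length`.

    The lowest cleavage site must sit exactly at `length`; the other k - 1
    sites must all lie beyond it.
    """
    n_sites = n_residues - 1
    return comb(n_sites - length, k - 1) / comb(n_sites, k)

def p_internal(n_residues, length, k):
    """Probability of one specific internal fragment of `length` (requires k >= 2).

    Two sites bound the fragment, no site falls inside it, and the remaining
    k - 2 sites are placed anywhere else.
    """
    n_sites = n_residues - 1
    return comb(n_sites - length - 1, k - 2) / comb(n_sites, k)

# 76 residues (the length of ubiquitin), fragments of 10 residues
ratio_double = p_terminal(76, 10, 2) / p_internal(76, 10, 2)
ratio_triple = p_terminal(76, 10, 3) / p_internal(76, 10, 3)
```

Under these assumptions, `p_internal` does not depend on length for k = 2, while `p_terminal` falls off linearly, and for k = 3 both decline with length. The C-terminal case is identical by symmetry.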

The results in Figure 1 allow examination of the statistics as a function of dissociation events, but true stochastic fragmentation would also lead to a random number of backbone cleavages. In other words, a fraction of molecules would be cleaved once, others twice, etc. Illustrative results for this type of process are shown for molecules of various sizes in Figure 2, where equivalent amounts of single, double, and triple cleavages were summed together. Results for bradykinin are shown in Figure 2a. For peptides of this length, the distinction between internal and terminal fragments is small. Short peptides, such as those routinely produced in proteolytic digests for proteomics experiments, have similar numbers of total internal and terminal fragments (16 terminal, 22 internal for bradykinin), mitigating the statistical dilution effect. In contrast, results for helicase illustrate clear differences between the propensity for creating internal and terminal fragments (see Figure 2b). Helicase is a ~32 kDa protein with 550 terminal and 34,356 internal unique mass fragments. The distinction between terminal and internal fragments becomes even larger for human serum albumin (HSA), as shown in Figure 2c. In fact, the ratio of the average abundance of terminal fragments to internal fragments of similar mass follows a linear trend as shown in Figure 2d. With larger molecular size, the probability for generating any particular internal fragment becomes less favorable because of the dilution effect.
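The dilution effect follows directly from counting. The sketch below counts every positional terminal and internal fragment for a protein of a given length. It does not merge fragments of equal mass, so the internal counts are upper bounds on the unique-mass values quoted above (22 for bradykinin, 2401 for ubiquitin, 34,356 for helicase). The function name is hypothetical.

```python
from math import comb

def fragment_counts(n_residues):
    """Return (terminal, internal) counts of positional fragments.

    Each of the n - 1 amide bonds gives one N-terminal and one C-terminal
    fragment; each pair of bonds bounds one internal fragment.
    """
    n_sites = n_residues - 1
    return 2 * n_sites, comb(n_sites, 2)

for name, n in [("bradykinin", 9), ("ubiquitin", 76), ("helicase", 276)]:
    terminal, internal = fragment_counts(n)
    print(name, terminal, internal, internal / terminal)
```

The ratio of internal to terminal counts equals (n - 2) / 4 for n residues, so it grows linearly with sequence length, which is in line with the linear trend noted for Figure 2d.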

Figure 2

Linear sum of single, double, and triple statistical cleavages of (a) bradykinin, (b) helicase, and (c) human serum albumin. (d) Illustration of trend for ratio of internal/terminal ions as a function of protein size. Terminal fragments are displayed in orange, internal fragments, blue.

In order to relate the information in Figures 1 and 2 to mass spectra, the charge must also be taken into consideration. If a fixed amount of charge is assigned to the total population of fragments according to mass, i.e., the amount of charge ending up on a fragment is equal to its proportion of mass, the amount of charge attributed to terminal and internal ions can be approximated. The results are summarized in Table 1 for double and triple cleavage of the peptide backbone for proteins of various sizes. Interestingly, the statistics predict that terminal ions account for the same fraction of total ion current regardless of protein size for a given number of backbone fragmentations. For double cleavage, terminal ions account for ~2/3 of the available charge. For triple cleavage, terminal ions retain ~1/2 of the charge. These results may seem counterintuitive given that the number of unique internal ions surges with increasing protein size. However, inspection of the results reveals that although the number of internal ions increases dramatically for larger proteins, the propensity for generating each internal ion decreases at essentially the same rate. These offsetting factors cause roughly the same amount of ion current to end up on internal ions, regardless of protein size.
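The size independence of the terminal charge fraction can be checked by exhaustive enumeration. The sketch below is a simplified stand-in for the mass-weighted calculation: it uses fragment length in residues as a proxy for mass, assumes distinct and equally likely cleavage sites, and averages over every possible set of sites. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def terminal_charge_fraction(n_residues, k):
    """Average fraction of charge on the two terminal fragments after k cleavages,
    with charge assigned in proportion to fragment length (a proxy for mass)."""
    total = Fraction(0)
    n_patterns = 0
    for sites in combinations(range(1, n_residues), k):
        n_term_len = sites[0]                 # residues in the N-terminal fragment
        c_term_len = n_residues - sites[-1]   # residues in the C-terminal fragment
        total += Fraction(n_term_len + c_term_len, n_residues)
        n_patterns += 1
    return total / n_patterns

for n in (10, 20, 40):
    print(n, terminal_charge_fraction(n, 2), terminal_charge_fraction(n, 3))
```

With length as the mass proxy, the enumeration gives exactly 2/3 for double cleavage and 1/2 for triple cleavage at every chain length tried. Real residue masses differ from one another, so the mass-weighted values are only approximately equal to these fractions.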

Table 1 Charge Distribution

These complex statistical results can also be easily predicted by simply considering the experiment from the molecular point of view. A single cleavage of the peptide backbone yields two terminal ions, and consequently 100% of the ion current goes to terminal ions. Cleavage of the backbone twice yields two terminal ions and one internal ion. When averaged out over all potential cleavage points, this will lead to 1/3 of the charge being apportioned to each fragment, leaving 2/3 of the charge on terminal ions. The number of potential internal ions is actually irrelevant in this single molecule picture. By extension, the charge will be split 50/50 after triple cleavage, which will yield two internal and two terminal ions. Importantly, these considerations imply that top-down analysis should not be overly hindered by internal ion production as protein size increases, as long as the number of multiple backbone fragmentations is not excessive. Fortunately, given that large molecules are typically difficult to fragment, excessive sequential fragmentation is not likely to be an issue.

Comparison with Experiments

To gain additional insight, we analyzed representative results extracted from a high-throughput top-down experiment utilizing HCD and UVPD at 213 nm. Detailed results are shown in Figure 3 for the +9 charge state of SH3 domain-binding glutamic acid-rich-like protein 3 (SH3BP), a ~10 kDa protein, and the +28 charge state of heat shock protein beta-1 (HSPB1), a ~23 kDa protein. Assignable fragments from the HCD spectrum for SH3BP are shown in Figure 3a using the same presentation scheme established in Figures 1 and 2. For experimental data, terminal and internal fragments do not separate into clearly distinct groupings as was observed in the statistical model. The primary reasons for this difference are: (1) not all fragments are detectable, and (2) dissociation is favored at some positions. More meaningful comparison between statistics and experiment can be extracted from the percentage of total ion current attributed to terminal and internal fragments, as shown in the bar graphs in Figure 3. For SH3BP, the majority of the ion current is attributed to terminal ions for both HCD and UVPD, although terminal ions are more dominant in UVPD. For the larger HSPB1 protein, internal ions account for more than half of the ion intensity in HCD and roughly half in UVPD. Based on the results in Table 1, to achieve the balance of terminal/internal ions observed in Figure 3c by statistical dissociation, the backbone would have to be fragmented five times on average (yielding two terminal and four internal ions, giving a 1:2 ratio). This degree of dissociation seems unlikely, suggesting that another factor influences the balance of ions detected. Indeed, inspection of the results reveals that many of the intense internal ions (e.g., residues 151-203, 154-203, 179-198, 2-42, 2-43, 3-42, 3-30) are generated by fragmentation near the termini. It is unlikely that the handful of residues comprising the terminal counterparts for these internal ions would be detected. HCD experiments are prone to favored fragmentation pathways [20, 26, 27]. If dissociation is favored near a terminus, the balance of charge attributed to terminal ions will be negatively impacted. In contrast, analysis of the UVPD data reveals that only two backbone fragmentations are needed to account for completely statistical generation of the ion intensity taken up by internal ions in Figure 3d. Double cleavage of the backbone is more experimentally plausible, but it is also possible that some sequential dissociation may contribute to the overall intensity of internal ions in UVPD because recent reports have suggested direct dissociation of the backbone in UVPD is rare [28,29,30]. Inspection of the internal ions generated by UVPD does not reveal abundant dissociation near the termini. Overall, the results hint that backbone dissociation in UVPD is more statistical in nature relative to HCD.
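The cleavage-number estimates used in this comparison follow from the single-molecule picture, in which k cleavages give k + 1 fragments of which two are terminal. The sketch below expresses that relationship and its inverse. It assumes purely statistical dissociation with charge shared equally among fragments on average, and the function names are hypothetical.

```python
def terminal_fraction(n_cleavages):
    """Expected fraction of charge on terminal ions: 2 of the k + 1 fragments."""
    return 2 / (n_cleavages + 1)

def cleavages_for_terminal_fraction(fraction):
    """Invert 2 / (k + 1) to estimate the average number of cleavages k."""
    return 2 / fraction - 1

# A 1:2 terminal-to-internal balance means one third of the charge is terminal.
k_one_third = cleavages_for_terminal_fraction(1 / 3)
# Two thirds of the charge on terminal ions corresponds to double cleavage.
k_two_thirds = cleavages_for_terminal_fraction(2 / 3)
```

A terminal fraction of one third maps to about five cleavages and a fraction of two thirds to about two. Measured fractions also reflect undetected fragments and favored cleavage sites, so such estimates are indicative only.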

Figure 3

Fragment intensity versus mass for terminal (red triangles) and internal (blue squares) ions from SH3BP-1, (a) HCD and (b) UVPD, and from HSPB1, (c) HCD and (d) UVPD. The percent of the total ion count for each type of fragment is shown in the bar graphs.

The distributions of total ion current for twelve additional proteins, for both HCD and UVPD, are shown sorted by precursor mass in Figure 4. In general, UVPD leads to more ion current in terminal ions and less variability (min ~50%, max ~95%). In contrast, HCD yields more sporadic results, which may be connected to the locations of facile fragmentation points within any given protein sequence. However, for both UVPD and HCD, a significant portion of the total ion count is detected in terminal ions, and no obvious connection is noted between molecular weight and the distribution of total ion current.

Figure 4

Total ion current distributions for 12 additional proteins sorted by mass.

Conclusions

Statistical fragmentation of proteins significantly favors production of terminal ions over internal ions even if the protein backbone is fragmented repeatedly. This effect becomes more pronounced for larger proteins, with predicted abundances of terminal ions exceeding those of internal ions by orders of magnitude. Although the number of unique mass internal ions increases greatly with protein size, the statistical probability for observing any given internal ion decreases proportionately due to dilution. It is therefore incorrect to assume that the staggering number of internal ions that can be generated for large proteins will make detection of terminal ions unlikely. On the contrary, examination of the ion current in actual experiments reveals that terminal ions represent a significant fraction of the total, even for large proteins. The results further suggest that HCD is more prone to favored dissociation pathways while UVPD more closely reflects statistical dissociation.