Role of peptide processing predictions in T cell epitope identification: contribution of different prediction programs
Proteolysis is the general term to describe the process of protein degradation into peptides. Proteasomes are the main actors in cellular proteolysis, and their activity can be measured in in vitro digestion experiments. However, in vivo proteolysis can be different than what is measured in these experiments if other proteases participate or if proteasomal activity is different in vivo. The in vivo proteolysis can be measured only indirectly, by the analysis of peptides presented on MHC-I molecules. MHC-I presented peptides are protected from further degradation, thus enabling an indirect view on the underlying in vivo proteolysis. The ligands presented on different MHC-I molecules enable different views on this process; in combination, they might give a complete picture. Based on in vitro proteasome-only digestions and MHC-I ligand data, different proteolysis predictors have been developed. With new in vitro digestion and MHC-I ligand data sets, we benchmarked how well these predictors capture in vitro proteasome-only activity and in vivo whole-cell proteolysis, respectively. Even though the in vitro proteasome digestion patterns were best captured by methods trained on such data (ProteaSMM and NetChop 20S), the in vivo whole-cell proteolysis was best predicted by a method trained on MHC-I ligand data (NetChop Cterm). Follow-up analysis showed that the likely source of this difference is the activity from proteases other than the proteasome, such as TPPII. This non-proteasomal in vivo activity is captured by NetChop Cterm and should be taken into account in MHC-I ligand predictions.
Keywords: Proteasomal cleavage, Proteolysis, MHC-I presentation, Peptide processing
The proteasome degrades intracellular proteins that are marked for degradation by the ubiquitination pathway (Hershko and Ciechanover 1992). Protein degradation, i.e., proteolysis, is important to remove misfolded proteins, to regulate cellular processes such as the cell cycle, and to produce MHC-I ligands (Goldberg 2003; Seifert et al. 2010; van Leuken et al. 2008; Clijsters et al. 2013; Kloetzel 2001). Peptide fragments that result from proteolysis are rapidly degraded by cytosolic aminopeptidases (Reits et al. 2003). However, a few peptides escape this degradation and are transported to the endoplasmic reticulum (ER) by the transporter associated with antigen processing (TAP), where they can form peptide-MHC-I complexes (pMHCs) (Neefjes et al. 2011). pMHCs are presented on the cell surface to enable immune surveillance by T cells.
Most cells express the constitutive proteasome, which is a barrel-shaped multi-subunit protein complex, composed of two α- and two β-rings, where each ring contains seven subunits. In the β-ring of the constitutive proteasome, three subunits have proteolytic capacity: β1, β2, and β5 (Kloetzel and Ossendorp 2004). Under the influence of interferon-γ (IFN-γ), these subunits can be substituted by β1i, β2i, and β5i, respectively, to form the so-called immunoproteasome (Aki et al. 1994). Whereas the constitutive proteasome preferentially cleaves after hydrophobic, acidic, and basic amino acids, the immunoproteasome cleaves more efficiently after hydrophobic and basic amino acids (Gaczynska et al. 1993; Toes et al. 2001; Kesmir et al. 2003). Other proteasome types can be formed by a combination of constitutive and immunoproteasomal subunits (Guillaume et al. 2010), or with the β5t subunit that is only expressed in cortical thymic epithelial cells (Murata et al. 2007). These different proteasome types largely overlap in their cleavage preferences (Guillaume et al. 2010; Murata et al. 2007; Florea et al. 2010), though the efficiency can differ between cleavage sites, which influences the repertoire of MHC-I presented peptides (Kincaid et al. 2012).
Two main approaches have been taken to study proteolytic activity: in vitro digestion experiments and in vivo MHC-I-ligand elutions. In an in vitro digestion experiment, a protein is incubated with proteasomes. The peptide fragments that are formed during the digestion can be detected by mass spectrometry, and cleavage sites can be inferred from the fragments (Emmerich et al. 2000; Tenzer et al. 2004; Toes et al. 2001). So far, the cleavage sites of only three proteins, i.e., β-casein, enolase, and prion protein, have been determined in such in vitro assays (Emmerich et al. 2000; Tenzer et al. 2004; Toes et al. 2001). Alternatively, in vivo proteolytic activity can be measured by the analysis of digestion fragments that form pMHCs; these fragments can be eluted from a cell and identified by mass spectrometry. The C-terminus of the MHC-I presented peptide is generated by proteolytic activity and reflects an in vivo cleavage site in the protein from which the MHC-I ligand was derived (Kloetzel 2001). However, as many cleavage sites will result in fragments that do not become MHC-I ligands, only a small subset of all cleavage sites can be detected via this approach. In addition, other peptidases such as ACE, TPPII, and Nardilysin (Geier et al. 1999; Shen et al. 2008; Kessler et al. 2011) can influence the C-terminus of MHC-I ligands. Therefore, the MHC-I-ligand data is more likely to reflect the proteolytic activity of all cellular proteases, rather than the activity of just the proteasomes or one proteasome-type.
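The inference of cleavage sites from digestion fragments can be illustrated with a minimal sketch. It assumes that each detected fragment maps to exactly one location in the substrate and that every internal fragment end marks a cleavage; the function name and the toy sequences are hypothetical and are not taken from the digestion data.

```python
def cleavage_sites_from_fragments(substrate, fragments):
    """Return sorted 1-based positions p such that a cleavage between
    residue p and residue p + 1 of the substrate is implied by a fragment.

    Assumption: each fragment occurs once in the substrate; only the
    first match is used, and fragments that do not match are ignored.
    """
    sites = set()
    for fragment in fragments:
        start = substrate.find(fragment)      # 0-based index of first residue
        if start == -1:
            continue
        end = start + len(fragment)           # 0-based index just past the fragment
        if start > 0:
            sites.add(start)                  # cleavage in front of the fragment
        if end < len(substrate):
            sites.add(end)                    # cleavage behind the fragment
    return sorted(sites)


# Toy substrate and fragments (hypothetical, for illustration only).
toy_sites = cleavage_sites_from_fragments("ACDEFGHIKL", ["ACDE", "FGHIKL", "DEFG"])
```

Fragment ends that coincide with the substrate termini are not counted, because they do not reflect a proteolytic event.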
To study proteolysis and to aid MHC-I ligand predictions, different proteolysis predictors have been developed (Holzhutter et al. 1999; Holzhutter and Kloetzel 2000; Kesmir et al. 2002; Nielsen et al. 2005; Tenzer et al. 2005; Ginodi et al. 2008; Kuttler et al. 2000; Nussbaum et al. 2001). Most predictors, e.g., FragPredict (Holzhutter et al. 1999; Holzhutter and Kloetzel 2000), ProteaSMM (Tenzer et al. 2005), PAProC (Kuttler et al. 2000; Nussbaum et al. 2001), and PepCleave (Ginodi et al. 2008), have been trained on the in vitro proteasome digestion data from β-casein and enolase (Emmerich et al. 2000; Toes et al. 2001). NetChop 20S (Kesmir et al. 2002; Nielsen et al. 2005) and the so-called enhanced versions of ProteaSMM are trained on the in vitro proteasome digestion data from β-casein, enolase, and the prion protein (Emmerich et al. 2000; Toes et al. 2001; Tenzer et al. 2004). Unlike the other predictors, NetChop Cterm is trained on in vivo MHC-I ligand data (Kesmir et al. 2002; Nielsen et al. 2005). Besides the different data sets that were used for training the methods, different computational techniques were used to construct the predictors. For instance, ProteaSMM models the cleavage pattern with a stabilized matrix method (SMM) using six amino acids N-terminal and four amino acids C-terminal of a potential cleavage site, and NetChop is based on a neural network that uses nine amino acids N-terminal and eight amino acids C-terminal of a potential cleavage site.
Tenzer et al. (2005) benchmarked FragPredict, PAProC, NetChop-2.0, and ProteaSMM on several data sets, and showed that ProteaSMM best predicted in vitro proteasome digestion cleavage patterns, whereas NetChop-2.0 Cterm best predicted the cleavage patterns based on MHC-I ligands. Tenzer et al. (2005) argued that the increased performance of NetChop-2.0 Cterm on the MHC-I ligand data was due to a recognition of TAP-transportable peptides. After this study, NetChop was updated to version 3.0 (Nielsen et al. 2005) and a new method, PepCleave, was developed (Ginodi et al. 2008). Unfortunately, PepCleave cannot be compared to the other predictors as it predicts fragments and not cleavages (Ginodi et al. 2008). Therefore, we have chosen to compare ProteaSMM and the newest version of NetChop on new in vitro proteasome digestion data sets, and a new benchmark set of MHC-I ligands. In addition to benchmarking, our analysis sheds light on the nature of the difference between in vitro proteasome-only and in vivo whole-cell proteolytic activities, suggesting an important role for proteases other than the proteasome.
Predicting in vitro cleavage patterns
To compare proteasome predictors, we generated a new independent data set. This data set was based on in vitro digestions of HIV-1 peptides of 17–30 amino acids in length; the products of these digestions were analyzed using mass spectrometry to determine cleavage and non-cleavage sites (see “Methods” section). Digestions were performed with either constitutive or immunoproteasomes. Of 368 possible cleavage sites, 150 (41 %) were used by the constitutive proteasomes and 148 by the immunoproteasomes; 103 sites (of the 148 cleavage sites) were cleaved by both proteasome types (Supplementary Table S1). Thus, even though the different proteasomes can target the different sites with varying efficiencies, the sets of cleavage sites identified in this assay largely overlap.
Table 1 Predictor performances on in vitro proteasomal cleavage pattern predictions. Columns: constitutive cleavage prediction (AUC) and immunoproteasomal cleavage prediction (AUC). Rows include ProteaSMM immuno enhanced and ProteaSMM constitutive enhanced.
In summary, the methods that have been trained on in vitro proteasome digestion data (the ProteaSMM variants and NetChop-3.0 20S) outperformed the method that has been trained on in vivo MHC-I ligand data (NetChop-3.0 Cterm), which agrees with previous observations (Tenzer et al. 2005; Saxova et al. 2003) and with the expectation that methods trained on in vitro data can best predict proteasome-only cleavage patterns.
Predicting in vivo cleavage patterns
In an AUC analysis, one can test the predictive performance of a single set of scores. However, we wanted to test the performance of a combination of two scores, i.e., proteasome cleavage and TAP transport scores, as an alternative to the additive model proposed by Tenzer et al. (2005). Therefore, we developed a new method to measure the performance of these two scores simultaneously. In this method, for every TAP binding threshold, the performance of the cleavage predictor was measured on cleavage and non-cleavage sites exceeding the threshold. Next, the performance scores were integrated over all thresholds into a single score called volume under the plane (VUP; see “Methods” section). For both non-cleavage definitions, NetChop-3.0 Cterm outperformed the other proteasome predictors based on VUP scores (ROC-comparison test: p < 0.001; Fig. 3), again indicating that its higher performance is not due to a biased recognition of TAP ligands. Taken together, NetChop Cterm seems to predict in vivo proteolysis better than the other predictors that are trained on proteasome-only in vitro proteolysis data. This suggests that the proteolytic activity in vivo that underlies MHC-I ligand production is markedly different from in vitro proteasome-only proteolysis.
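The idea of scoring a cleavage predictor at every TAP threshold can be sketched as follows. This is only one possible reading of the description above: the threshold grid (every observed TAP score), the use of "greater than or equal to" as the cut-off, and the normalisation (a plain average of the per-threshold AUCs) are assumptions made for illustration and are not the VUP definition from the Methods. All function names are hypothetical.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the probability that a cleavage site outscores a
    non-cleavage site; ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def vup_sketch(sites):
    """sites: list of (cleavage_score, tap_score, is_cleaved) tuples.

    For every TAP threshold, the cleavage AUC is computed on the sites
    whose TAP score reaches the threshold; thresholds that leave only
    one class are skipped. The result is the mean of these AUCs
    (assumed normalisation).
    """
    thresholds = sorted({tap for _, tap, _ in sites})
    aucs = []
    for t in thresholds:
        pos = [c for c, tap, cleaved in sites if tap >= t and cleaved]
        neg = [c for c, tap, cleaved in sites if tap >= t and not cleaved]
        if pos and neg:
            aucs.append(pairwise_auc(pos, neg))
    return sum(aucs) / len(aucs) if aucs else float("nan")


# Toy data (hypothetical): the cleavage score separates the classes
# perfectly at every TAP threshold.
toy = [(0.9, 0.1, True), (0.8, 0.5, True), (0.2, 0.2, False), (0.1, 0.6, False)]
toy_vup = vup_sketch(toy)
```

Because the cleavage predictor is always evaluated within a TAP-filtered subset, a predictor that merely recognises TAP-transportable peptides gains nothing at high thresholds, which is the property the analysis relies on.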
Comparing in vitro and in vivo proteolysis activity
In this study, we analyzed how well different methods can predict the cleavage patterns in proteolysis. In vitro cleavage patterns were shown to be best captured by methods trained on in vitro proteasome digestion data, i.e., ProteaSMM and NetChop-3.0 20S (Table 1). Similarly, in vivo proteolysis was best predicted by the method that is trained on MHC-I ligand data, NetChop-3.0 Cterm (Fig. 3). Furthermore, we showed that the better prediction of in vivo proteolysis was not due to an embedded recognition of TAP transportable peptides (Figs. 3 and 4).
There can be two explanations for the difference between in vitro and in vivo proteolysis. First, the proteolytic activity of proteasomes in vitro might be different from their in vivo activity. This difference might result from the interactions with other molecules such as PA28 or the 19S cap regulatory particle (de Graaf et al. 2011; Emmerich et al. 2000). Second, other proteases such as TPPII, ACE, or Nardilysin might make a substantial contribution to the in vivo proteolysis (Geier et al. 1999; Shen et al. 2008, 2011; Kessler et al. 2011). The best described example of in vivo proteolytic activity that is not observed in vitro is the cleavage after lysine residues. This activity is required to generate ligands for HLA-A*03 and HLA-A*11, which bind peptides with a lysine at the C-terminus (Seifert et al. 2003; Kloetzel 2004; Kloetzel and Ossendorp 2004). A well-described example of such peptides is the HIV Nef-derived epitope at positions 73 to 82 with a lysine at its C-terminus; the generation of this peptide was shown to depend on TPPII activity (Seifert et al. 2003). However, it is not yet known how dominant this endopeptidase activity is within the TPPII enzyme complex (Geier et al. 1999), and therefore it is not yet clear whether TPPII is responsible for all the activities creating peptides with a lysine at their C-terminus. More recently, a more detailed analysis of the substrate specificity of TPPII has been published (Peters et al. 2011), which suggests that the endopeptidase activity of TPPII depends strongly on the length of the substrate and thus is not likely to be a very general enzymatic activity of TPPII. We show that only NetChop-3.0 Cterm captures this hallmark of in vivo proteolysis (Fig. 5). As this activity has not been attributed to the proteasome, we conclude that NetChop Cterm has learned to incorporate non-proteasomal proteolytic activity.
A biased recognition of TAP-transportable peptides does not explain the increased performance of NetChop Cterm on the prediction of in vivo cleavage sites derived from MHC-I ligand data (Figs. 4 and 5). Similarly, one could argue that a bias to recognize MHC-I presented ligands should be controlled for. NetChop Cterm was trained on in vivo cleavage sites derived from a set of pMHCs with a homogeneous distribution of MHC-I molecules with various binding preferences (Nielsen et al. 2005), to minimize a bias that would be due to the recognition of MHC-I binding peptides. In addition, the in vivo cleavage/non-cleavage site data sets in this study are derived from peptides that were not used to train NetChop Cterm and that were eluted from many different MHC-I molecules.
The evaluation of different proteasome cleavage predictors depends on the construction of a set of non-cleavage sites, as the performance on these and on the true cleavage sites needs to be compared. Unfortunately, a substantial set of true non-cleavage sites is not available, and therefore we have to rely on assumptions when compiling a set of non-cleavage sites. To prevent a bias as a result of such assumptions, we have followed two different sets of assumptions when constructing the non-cleavage sites. First, non-cleavage sites were made by shuffling the sequence around a cleavage site to destroy any motif that is used by the proteasome while keeping the same distribution of amino acids. Second, we considered other positions in the source protein as non-cleavage sites. Although identical conclusions were drawn from the analyses with the different sets of non-cleavage sites, identification of true in vivo non-cleavage sites is required to permanently settle this issue or to describe sequence motifs that truly inhibit proteasomal cleavage.
The development of proteasome predictors serves two goals. The first is to understand the specificity and biochemical processes that underlie proteolysis. The second is to predict and understand how this process influences the MHC-I ligandome. With respect to the first goal, we show that profound differences exist between proteasome activity in vitro and cellular proteolysis in vivo, suggesting a non-negligible role of non-proteasomal proteases. Evidently, the specificity of these additional proteases should be taken into account for optimal MHC-I ligand predictions. Therefore, we conclude that NetChop Cterm or future proteolysis predictors trained on in vivo data should be used in MHC-I ligandome predictions.
Proteasomal in vitro cleavage patterns were derived from a digestion of HIV-1 peptides with constitutive or immunoproteasomes, as described by Peters et al. (2002). Sixteen peptides from the HIV-1 proteins GAG and TAT, with a length of 17 to 30 amino acids, were degraded. After 0, 1, 2, 4, 8, and 24 h of degradation, peptide fragments were analyzed using mass spectrometry (as in Peters et al. (2002)). To avoid analyzing secondary cleavage products, peptide fragments found after 4 h of degradation were used to infer cleavage sites. Of 368 possible cleavage sites, 150 were efficiently cleaved by the immunoproteasome after 4 h and 148 were efficiently cleaved by the constitutive proteasome; 103 sites (69 %) were shown to be cleaved by both proteasome subtypes (Supplementary Table S1). The ProteaSMM proteasome cleavage predictors require six amino acids N-terminal and four amino acids C-terminal of a possible cleavage site. Therefore, cleavage predictions cannot be made at the beginning and end of a peptide sequence. As a result of this limitation, only 240 (of the 368) sites could be used to compare the different proteasome predictions. Of these 240 sites, 99 were efficiently cleaved by the immunoproteasome and 99 were efficiently cleaved by the constitutive proteasome; 68 sites were present in both sets.
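The drop from 368 to 240 usable sites follows directly from the flanking requirement and can be checked with a short calculation. A peptide of length n has n − 1 internal sites; requiring six residues on the N-terminal side and four on the C-terminal side leaves n − 9 sites, so each peptide loses eight sites, and 16 peptides lose 128 in total (368 − 128 = 240). The sketch below uses hypothetical peptide lengths, since the individual lengths are not listed here; only their number (16) and the resulting totals come from the text.

```python
def possible_sites(length):
    """Number of internal peptide bonds in a peptide of the given length."""
    return max(0, length - 1)


def predictable_sites(length, n_flank=6, c_flank=4):
    """Number of sites with at least n_flank residues on the N-terminal
    side and c_flank residues on the C-terminal side."""
    return max(0, length - n_flank - c_flank + 1)


# Hypothetical lengths: 16 peptides of 24 residues each. Any 16 lengths
# between 17 and 30 that sum to 384 give the same two totals.
lengths = [24] * 16
total_possible = sum(possible_sites(n) for n in lengths)
total_predictable = sum(predictable_sites(n) for n in lengths)
```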
In vivo cleavage sites were inferred from MHC-I ligand data. Ligands that were identified in MHC-I elution studies were downloaded from the SYFPEITHI database (Rammensee et al. 1999) and the IEDB database (Vita et al. 2010). Source proteins of the MHC-I ligands were downloaded from the NCBI via links that were provided by the SYFPEITHI and IEDB databases. The C-terminal residue of an MHC-I ligand was regarded as position P1’ of a cleavage site (Fig. 2). In total, 3076 MHC-I ligands with their source protein were derived from the SYFPEITHI database and 457 MHC-I ligands with their source protein were derived from the IEDB database. Identical peptides, or peptides that were either a C- or N-terminal extension of each other, were regarded as redundant. In addition, the ligands and their corresponding source proteins that were published before 2005, or which were redundant/identical to an MHC-I ligand published before 2005, were excluded because they could have been used for training of NetChop-3.0 Cterm. This filtering resulted in 832 MHC-I ligands and their source proteins, of which every MHC-I ligand corresponds to a peptide fragment that is generated by in vivo proteolytic activity (Fig. 1).
Detecting in vivo non-cleavage sites based on the absence of a peptide in the MHC-I ligand databases is not possible, as many other reasons might underlie the absence of an MHC-I ligand, e.g., further degradation of the fragment or low affinity to MHC-I molecules. Therefore, non-cleavage sites were generated in two ways: (1) By shuffling an area of 19 amino acids around the cleavage site (the longest flanking region used by a proteasome predictor method plus one extra amino acid on each side, as indicated in Fig. 2). After shuffling, the middle position, previously corresponding to the cleavage site, was assigned as a non-cleavage site. For every cleavage site, 100 non-cleavage sites were generated, i.e., in total 83,200 non-cleavage sites (Fig. 1). The advantage of this method is that the amino acid frequencies of cleavage and non-cleavage sites remain identical. (2) All sites in the source proteins of the MHC-I ligands that were not assigned as a cleavage site were assumed to be non-cleavage sites (N = 507,538, Fig. 1).
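The mapping of a ligand onto its source protein and the redundancy rule can be sketched in a few lines. The sketch assumes that a ligand occurs once in its source protein and that, among redundant peptides, the first one encountered is kept; the text does not state which member of a redundant group was retained, so that choice is an assumption. Function names and sequences are hypothetical.

```python
def cterm_position(protein, ligand):
    """1-based position in the protein of the ligand's C-terminal
    residue, or None if the ligand is not found."""
    start = protein.find(ligand)
    if start == -1:
        return None
    return start + len(ligand)


def is_redundant(a, b):
    """True if the peptides are identical or if one is an N- or
    C-terminal extension of the other."""
    shorter, longer = sorted((a, b), key=len)
    return longer.startswith(shorter) or longer.endswith(shorter)


def drop_redundant(ligands):
    """Greedy filter that keeps the first peptide of each redundant
    group (assumed tie-breaking rule)."""
    kept = []
    for ligand in ligands:
        if not any(is_redundant(ligand, k) for k in kept):
            kept.append(ligand)
    return kept


# Toy protein and ligands (hypothetical sequences).
protein = "MKTAYIAKQRQISFVK"
position = cterm_position(protein, "YIAKQRQIS")
unique = drop_redundant(["YIAKQRQIS", "IAKQRQIS", "AKQRQISFV"])
```

Here the second ligand is an N-terminally shortened version of the first and is dropped, whereas the third overlaps the first without being an extension of it and is kept.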
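The first approach, shuffling a 19-residue window around each cleavage site, can be sketched as below. The treatment of sites that lie too close to a protein terminus to fill the window is not described in the text; skipping them is an assumption of this sketch, as are the function name, the `seed` parameter, and the toy sequence.

```python
import random


def shuffled_windows(protein, centre, n_shuffles=100, window=19, seed=0):
    """Return n_shuffles shuffled copies of the window centred on the
    0-based index `centre`. Each copy has the same amino acid
    composition as the original window. Sites without a complete
    window are skipped (assumption) and yield an empty list.
    """
    half = window // 2
    if centre - half < 0 or centre + half >= len(protein):
        return []
    residues = list(protein[centre - half:centre + half + 1])
    rng = random.Random(seed)            # fixed seed for reproducibility
    shuffled = []
    for _ in range(n_shuffles):
        copy = residues[:]
        rng.shuffle(copy)                # in-place permutation of the window
        shuffled.append("".join(copy))
    return shuffled


# Toy protein (hypothetical sequence of 30 residues).
toy_protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
decoys = shuffled_windows(toy_protein, centre=15)
```

With 100 shuffles for each of the 832 cleavage sites, this procedure gives the 83,200 non-cleavage sites reported above.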
Prediction performance measures
Proteasome cleavage and TAP transport predictions were performed as suggested by the developers of the different prediction methods (Peters et al. 2003; Tenzer et al. 2005; Nielsen et al. 2005). The different proteasome predictors were assessed for their performance in discriminating cleavage from non-cleavage sites. First, the performance of the proteasome predictors was tested using receiver operating characteristic (ROC) curves (Swets 1988). In a ROC curve, true positive proportions (TPP) and false positive proportions (FPP) are plotted on the y- and x-axis, respectively, for every threshold. The area under the ROC curve (AUC) is a measure of the predictor’s performance. If a predictor performs well, the TPP increases faster than the FPP, and the AUC becomes larger than 0.5; the maximal AUC is 1.0.
Statistical tests were performed using the stats package of the SciPy module in Python. The difference between AUC/VUP performance measures was determined by deriving AUCs/VUPs on 50 new data sets that were generated by bootstrapping the original data set. The derived AUCs/VUPs were compared using a paired two-tailed t test; p values less than 0.001 were considered significant (as in Tenzer et al. (2005)). We refer to this test as the ROC-comparison test.
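As an illustration of the ROC construction described above, the following minimal sketch computes the curve points for every threshold and integrates them with the trapezoidal rule. It is a generic textbook implementation with hypothetical function names and toy scores, not the code used in this study.

```python
def roc_points(scores, labels):
    """Return (FPP, TPP) points for every score threshold, from the
    strictest threshold to the most permissive one.
    labels: 1 for cleavage sites, 0 for non-cleavage sites."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy examples (hypothetical scores): a perfect and an uninformative ranking.
perfect = auc_trapezoid(roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))
uninformative = auc_trapezoid(roc_points([0.9, 0.4, 0.7, 0.6], [1, 1, 0, 0]))
```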
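The bootstrap comparison can be sketched with the Python standard library alone. The sketch resamples sites with replacement, computes the AUC of two predictors on each resample, and derives the paired t statistic from the 50 AUC differences. Rejecting resamples that contain only one class and the handling of zero variance are assumptions of this sketch; all names and toy scores are hypothetical. Where SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with the two-tailed p value.

```python
import math
import random
import statistics


def auc_on(scores, labels, idx):
    """Pairwise AUC on the resampled indices; ties count as one half."""
    pos = [scores[i] for i in idx if labels[i]]
    neg = [scores[i] for i in idx if not labels[i]]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_pairs(scores_a, scores_b, labels, n_boot=50, seed=0):
    """AUC pairs of two predictors on n_boot bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    pairs = []
    while len(pairs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({labels[i] for i in idx}) < 2:
            continue                      # resample lacks one class; draw again
        pairs.append((auc_on(scores_a, labels, idx),
                      auc_on(scores_b, labels, idx)))
    return pairs


def paired_t_statistic(pairs):
    """t statistic of the paired differences (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in pairs]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return math.copysign(math.inf, mean_diff) if mean_diff else 0.0
    return mean_diff / (sd / math.sqrt(len(diffs)))


# Toy data (hypothetical): predictor A ranks perfectly, predictor B inversely.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
scores_b = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
pairs = bootstrap_auc_pairs(scores_a, scores_b, labels)
```

Pairing the two AUCs on the same resample removes the variation between resamples from the comparison, which is why a paired rather than an unpaired test is appropriate here.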
We thank Morten Nielsen, Becca Asquith, Bjoern Peters, Berend Snel, Ilka Hoof, Hanneke van Deutekom, and Xiangyu Rao for discussion on this research project and technical support. This study was financially supported by the Netherlands Organization for Scientific Research (www.nwo.nl, Computational Life Sciences Program, grant number 635.100.025), and by Utrecht University (www.uu.nl). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Emmerich NP, Nussbaum AK, Stevanovic S, Priemer M, Toes RE, Rammensee HG, Schild H (2000) The human 26S and 20S proteasomes generate overlapping but different sets of peptide fragments from a model protein substrate. J Biol Chem 275(28):21140–21148. doi:10.1074/jbc.M000740200
- Florea BI, Verdoes M, Li N, van der Linden WA, Geurink PP, van den Elst H, Hofmann T, de Ru A, van Veelen PA, Tanaka K, Sasaki K, Murata S, den Dulk H, Brouwer J, Ossendorp FA, Kisselev AF, Overkleeft HS (2010) Activity-based profiling reveals reactivity of the murine thymoproteasome-specific subunit beta5t. Chem Biol 17(8):795–801. doi:10.1016/j.chembiol.2010.05.027
- de Graaf N, van Helden MJG, Textoris-Taube K, Chiba T, Topham DJ, Kloetzel PM, Zaiss DMW, Sijts AJAM (2011) PA28 and the proteasome immunosubunits play a central and independent role in the production of MHC class I-binding peptides in vivo. Eur J Immunol 41(4):926–935. doi:10.1002/eji.201041040
- Guillaume B, Chapiro J, Stroobant V, Colau D, Holle BV, Parvizi G, Bousquet-Dubouch MP, Theate I, Parmentier N, den Eynde BJV (2010) Two abundant proteasome subtypes that uniquely process some antigens presented by HLA class I molecules. Proc Natl Acad Sci USA 107(43):18599–18604. doi:10.1073/pnas.1009778107
- Kessler JH, Khan S, Seifert U, Gall SL, Chow KM, Paschen A, Bres-Vloemans SA, de Ru A, van Montfoort N, Franken KLMC, Benckhuijsen WE, Brooks JM, van Hall T, Ray K, Mulder A, Doxiadis IIN, van Swieten PF, Overkleeft HS, Prat A, Tomkinson B, Neefjes J, Kloetzel PM, Rodgers DW, Hersh LB, Drijfhout JW, van Veelen PA, Ossendorp F, Melief CJM (2011) Antigen processing by nardilysin and thimet oligopeptidase generates cytotoxic T cell epitopes. Nat Immunol 12(1):45–53. doi:10.1038/ni.1974
- Peters J, Schönegge AM, Rockel B, Baumeister W (2011) Molecular ruler of tripeptidylpeptidase II: mechanistic principle of exopeptidase selectivity. Biochem Biophys Res Commun 414(1):209–214. doi:10.1016/j.bbrc.2011.09.058
- Reits E, Griekspoor A, Neijssen J, Groothuis T, Jalink K, van Veelen P, Janssen H, Calafat J, Drijfhout JW, Neefjes J (2003) Peptide diffusion, protection, and degradation in nuclear and cytoplasmic compartments before antigen presentation by MHC class I. Immunity 18(1):97–108
- Seifert U, Maranon C, Shmueli A, Desoutter JF, Wesoloski L, Janek K, Henklein P, Diescher S, Andrieu M, de la Salle H, Weinschenk T, Schild H, Laderach D, Galy A, Haas G, Kloetzel PM, Reiss Y, Hosmalin A (2003) An essential role for tripeptidyl peptidase in the generation of an MHC class I epitope. Nat Immunol 4(4):375–379. doi:10.1038/ni905
- Seifert U, Bialy LP, Ebstein F, Bech-Otschir D, Voigt A, Schroter F, Prozorovski T, Lange N, Steffen J, Rieger M, Kuckelkorn U, Aktas O, Kloetzel PM, Kruger E (2010) Immunoproteasomes preserve protein homeostasis upon interferon-induced oxidative stress. Cell 142(4):613–624. doi:10.1016/j.cell.2010.07.036
- Shen XZ, Lukacher AE, Billet S, Williams IR, Bernstein KE (2008) Expression of angiotensin-converting enzyme changes major histocompatibility complex class I peptide presentation by modifying C termini of peptide precursors. J Biol Chem 283(15):9957–9965. doi:10.1074/jbc.M709574200
- Tenzer S, Stoltze L, Schonfisch B, Dengjel J, Muller M, Stevanovic S, Rammensee HG, Schild H (2004) Quantitative analysis of prion-protein degradation by constitutive and immuno-20S proteasomes indicates differences correlated with disease susceptibility. J Immunol 172(2):1083–1091
- Tenzer S, Peters B, Bulik S, Schoor O, Lemmel C, Schatz MM, Kloetzel PM, Rammensee HG, Schild H, Holzhutter HG (2005) Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding. Cell Mol Life Sci 62(9):1025–1037
- Toes RE, Nussbaum AK, Degermann S, Schirle M, Emmerich NP, Kraft M, Laplace C, Zwinderman A, Dick TP, Muller J, Schonfisch B, Schmid C, Fehling HJ, Stevanovic S, Rammensee HG, Schild H (2001) Discrete cleavage motifs of constitutive and immunoproteasomes revealed by quantitative analysis of cleavage products. J Exp Med 194(1):1–12