Introduction

The proteasome degrades intracellular proteins that have been marked for degradation by the ubiquitination pathway (Hershko and Ciechanover 1992). Protein degradation, i.e., proteolysis, is important for removing misfolded proteins, for regulating cellular processes such as the cell cycle, and for producing MHC-I ligands (Goldberg 2003; Seifert et al. 2010; van Leuken et al. 2008; Clijsters et al. 2013; Kloetzel 2001). Peptide fragments that result from proteolysis are rapidly degraded by cytosolic aminopeptidases (Reits et al. 2003). However, a few peptides escape this degradation and are transported to the endoplasmic reticulum (ER) by the transporter associated with antigen processing (TAP), where they can form peptide-MHC-I complexes (pMHCs) (Neefjes et al. 2011). pMHCs are presented on the cell surface to enable immune surveillance by T cells.

Most cells express the constitutive proteasome, which is a barrel-shaped multi-subunit protein complex composed of two α- and two β-rings, where each ring contains seven subunits. In the β-ring of the constitutive proteasome, three subunits have proteolytic capacity: β1, β2, and β5 (Kloetzel and Ossendorp 2004). Under the influence of interferon-γ (IFNγ), these subunits can be substituted by β1i, β2i, and β5i, respectively, to form the so-called immunoproteasome (Aki et al. 1994). Whereas the constitutive proteasome preferentially cleaves after hydrophobic, acidic, and basic amino acids, the immunoproteasome cleaves more efficiently after hydrophobic and basic amino acids (Gaczynska et al. 1993; Toes et al. 2001; Kesmir et al. 2003). Other proteasome types can be formed by a combination of constitutive and immunoproteasomal subunits (Guillaume et al. 2010), or with the β5t subunit that is only expressed in cortical thymic epithelial cells (Murata et al. 2007). These different proteasome types largely overlap in their cleavage preferences (Guillaume et al. 2010; Murata et al. 2007; Florea et al. 2010), though the efficiency can differ between cleavage sites, which influences the repertoire of MHC-I presented peptides (Kincaid et al. 2012).

Two main approaches have been taken to study proteolytic activity: in vitro digestion experiments and in vivo MHC-I-ligand elutions. In an in vitro digestion experiment, a protein is incubated with proteasomes. The peptide fragments that are formed during the digestion can be detected by mass spectrometry, and cleavage sites can be inferred from the fragments (Emmerich et al. 2000; Tenzer et al. 2004; Toes et al. 2001). So far, the cleavage sites of only three proteins, i.e., β-casein, enolase, and prion protein, have been determined in such in vitro assays (Emmerich et al. 2000; Tenzer et al. 2004; Toes et al. 2001). Alternatively, in vivo proteolytic activity can be measured by the analysis of digestion fragments that form pMHCs; these fragments can be eluted from a cell and identified by mass spectrometry. The C-terminus of the MHC-I presented peptide is generated by proteolytic activity and reflects an in vivo cleavage site in the protein from which the MHC-I ligand was derived (Kloetzel 2001). However, as many cleavage sites will result in fragments that do not become MHC-I ligands, only a small subset of all cleavage sites can be detected via this approach. In addition, other peptidases such as ACE, TPPII, and Nardilysin (Geier et al. 1999; Shen et al. 2008; Kessler et al. 2011) can influence the C-terminus of MHC-I ligands. Therefore, the MHC-I-ligand data is more likely to reflect the proteolytic activity of all cellular proteases, rather than the activity of just the proteasomes or one proteasome-type.

To study proteolysis and to aid MHC-I ligand predictions, different proteolysis predictors have been developed (Holzhutter et al. 1999; Holzhutter and Kloetzel 2000; Kesmir et al. 2002; Nielsen et al. 2005; Tenzer et al. 2005; Ginodi et al. 2008; Kuttler et al. 2000; Nussbaum et al. 2001). Most predictors, e.g., FragPredict (Holzhutter et al. 1999; Holzhutter and Kloetzel 2000), ProteaSMM (Tenzer et al. 2005), PAProC (Kuttler et al. 2000; Nussbaum et al. 2001), and PepCleave (Ginodi et al. 2008), have been trained on the in vitro proteasome digestion data from β-casein and enolase (Emmerich et al. 2000; Toes et al. 2001). NetChop 20S (Kesmir et al. 2002; Nielsen et al. 2005) and the so-called enhanced versions of ProteaSMM are trained on the in vitro proteasome digestion data from β-casein, enolase, and the prion protein (Emmerich et al. 2000; Toes et al. 2001; Tenzer et al. 2004). Unlike the other predictors, NetChop Cterm is trained on in vivo MHC-I ligand data (Kesmir et al. 2002; Nielsen et al. 2005). Besides the different data sets that were used for training, different computational techniques were used to construct the predictors. For instance, ProteaSMM models the cleavage pattern with a stabilized matrix method (SMM) using six amino acids N-terminal and four amino acids C-terminal of a potential cleavage site, and NetChop is based on a neural network that uses nine amino acids N-terminal and eight amino acids C-terminal of a potential cleavage site (see Fig. 2 and “Methods” section).

In 2005, Tenzer et al. (2005) benchmarked FragPredict, PAProC, NetChop-2.0, and ProteaSMM on several data sets, and showed that ProteaSMM best predicted in vitro proteasome digestion cleavage patterns, whereas NetChop-2.0 Cterm best predicted the cleavage patterns based on MHC-I ligands. Tenzer et al. (2005) argued that the increased performance of NetChop-2.0 Cterm on the MHC-I ligand data was due to a recognition of TAP-transportable peptides. After this study, NetChop was updated to version 3.0 (Nielsen et al. 2005) and a new method, PepCleave, was developed (Ginodi et al. 2008). Unfortunately, PepCleave cannot be compared to the other predictors as it predicts fragments and not cleavages (Ginodi et al. 2008). Therefore, we have chosen to compare ProteaSMM and the newest version of NetChop on new in vitro proteasome digestion data sets and on a new benchmark set of MHC-I ligands. In addition to the benchmark, our analysis sheds light on the nature of the difference between in vitro proteasome-only and in vivo whole-cell proteolytic activities, suggesting an important role for proteases other than the proteasome.

Results

Predicting in vitro cleavage patterns

To compare proteasome predictors, we generated a new independent data set. This data set was based on in vitro digestions of HIV-1 peptides of 17–30 amino acids; the products of these digestions were analyzed using mass spectrometry to determine cleavage and non-cleavage sites (see “Methods” section). Digestions were performed with either constitutive or immunoproteasomes. Of 368 possible cleavage sites, 150 (41 %) were used by the constitutive proteasomes and 148 by the immunoproteasomes; 103 sites (of the 148 cleavage sites) were cleaved by both proteasome types (Supplementary Table S1). Thus, even though the different proteasomes can target the different sites with varying efficiencies, the sets of cleavage sites identified in this assay largely overlap.

The prediction performance of ProteaSMM and NetChop-3.0 was analyzed using receiver operator characteristic (ROC) curves, where the number of correct and false predictions is plotted for every prediction threshold (Swets 1988). The area under a ROC curve (AUC) is a performance measure of the predictor, and is widely used because it is threshold independent (Swets 1988). For each predictor (and the different versions of the predictors), the AUCs were determined on both constitutive and immunoproteasomal cleavage patterns obtained from the in vitro digestions (Table 1). In general, the methods performed better in predicting the immunoproteasomal cleavage pattern. This could be explained by the more biased cleavage preference of immunoproteasomes, which cleave after hydrophobic and basic amino acids with greater, and after acidic amino acids with lesser, efficiency (Gaczynska et al. 1993; Toes et al. 2001; Kesmir et al. 2003). Such a more biased cleavage pattern might be easier to predict. The immunoproteasomal cleavage pattern was best predicted by ProteaSMM-immuno and ProteaSMM-constitutive (ROC-comparison test: p < 0.001; Table 1), and the constitutive cleavage pattern was best captured by ProteaSMM-constitutive and NetChop-3.0 20S (ROC-comparison test: p < 0.001; Table 1). Surprisingly, the enhanced ProteaSMM versions did not perform better, even though they are trained on extra data from proteasomally digested prion protein (Tenzer et al. 2004). NetChop-3.0 20S is also trained on prion data, but no version of this method trained without prion data is available, so we could not test whether prion data negatively affects the performance of NetChop-3.0 20S.

Table 1 Predictor performances on in vitro proteasomal cleavage pattern predictions

In summary, the methods that have been trained on in vitro proteasome digestion data (the ProteaSMM versions and NetChop-3.0 20S) outperformed the method that has been trained on in vivo MHC-I ligand data (NetChop-3.0 Cterm), which agrees with previous observations (Tenzer et al. 2005; Saxova et al. 2003) and with the expectation that methods trained on in vitro data can best predict proteasome-only cleavage patterns.

Predicting in vivo cleavage patterns

In vivo proteolytic activity can differ considerably from pure proteasomal activity if other peptidases, e.g., ACE, TPPII, or Nardilysin (Geier et al. 1999; Shen et al. 2008, 2011; Kessler et al. 2011), contribute to in vivo proteolysis. As a result, the ability of different proteasome predictors to predict in vivo proteolysis might differ from their ability to predict in vitro proteasome-only cleavages. To test and compare the in vivo proteolysis prediction performances, we inferred in vivo cleavage sites from non-redundant MHC-I ligands that have been identified from 2005 onwards, after NetChop Cterm was last updated (n = 832; see Fig. 1 and “Methods” section). A data set of in vivo non-cleavage sites was derived in two ways: (1) by shuffling the 19 amino acids flanking a cleavage site (the area used by NetChop for predictions plus one N-terminal and one C-terminal extension, see Fig. 2), which yielded 100 non-cleavage sites per cleavage site; and (2) by assuming that all sites in the source protein of the MHC-I ligand that are not identified as cleavage sites are non-cleavage sites (“Methods” section). The predictors were assessed for their capacity to discriminate cleavage sites from non-cleavage sites by comparing AUC values. Not surprisingly, NetChop-3.0 Cterm most accurately captured the in vivo cleavage pattern irrespective of the non-cleavage data set (ROC-comparison test: p < 0.001; Fig. 3). This is expected, as NetChop Cterm has been trained on in vivo cleavage patterns inferred from MHC-I ligands.

Fig. 1

Constructing the MHC-I ligand data set. MHC-I ligands and source proteins that were discovered in elution studies were derived from the SYFPEITHI database (Rammensee et al. 1999) and the IEDB database (Vita et al. 2010). The data sets were combined, and non-redundant ligands that were not published before 2005 were selected. Every MHC-I ligand in its source protein represents a cleavage site; non-cleavage sites were derived by either shuffling an area of 19 amino acids around the cleavage site (Fig. 2 and “Methods” section), or by defining all other sites in the source proteins of MHC-I ligands as non-cleavage sites (“Methods” section)

Fig. 2

Constructing non-cleavage sites by shuffling. The C-terminus of an MHC-I ligand (between P1’ and P1) is defined as a cleavage site. An area of 19 amino acids (from P10’ to P9) around a cleavage site was shuffled and the middle position was assigned as a non-cleavage site. For every cleavage site, 100 non-cleavage sites were constructed. The positions that are used by NetChop-3.0 (P9’ to P8) and ProteaSMM (P6’ to P4) for predicting cleavage probabilities are indicated

Fig. 3

Predicting in vivo proteolysis. Proteasome cleavage predictors were tested as stand-alone predictors, or in combination with a TAP predictor, and performance was assessed using AUC and VUP, respectively (see “Methods” section). The performance was tested using either non-cleavage data sets derived by the shuffling method (a) or by taking other sites from the source protein as non-cleavage sites (b) (see Fig. 1 and “Methods” section). Examples of the AUC and the VUP analyses are shown in the upper part; AUC and VUP scores are given in the lower part. In all analyses, NetChop-3.0 Cterm showed the highest performance (ROC-comparison test: p < 0.001). The performance of NetChop-3.0 Cterm is shown in red lines, NetChop-3.0 20S in black, ProteaSMM Immuno in yellow, ProteaSMM Immuno enhanced in green, ProteaSMM constitutive in blue, and ProteaSMM constitutive enhanced in magenta lines

As in vivo proteolysis is inferred from MHC-I ligand data and NetChop Cterm is trained on such data, Tenzer et al. (2005) noted in an earlier benchmark study that the superior performance of NetChop might be due to a biased recognition of peptides with a high TAP affinity. To exclude this effect, the performance of the different proteasome predictors was tested in combination with a TAP transport predictor (Peters et al. 2003). To this end, we first followed the approach of Tenzer et al. (2005) by summing TAP-transport and proteasome cleavage scores into a single score. For both non-cleavage site definitions, NetChop-3.0 Cterm outperformed the other predictors, even when the TAP transport scores were weighted differently prior to summation (Fig. 4 and S1).

Fig. 4

Predicting in vivo proteolysis by combining proteolysis and TAP transport predictor scores. For the different proteasome cleavage predictors, the proteasome cleavage prediction score was added to the TAP transport prediction score (as proposed by Tenzer et al. (2005)). Prediction performance was measured as the AUC of an ROC curve (Y-axis), using the shuffled sequences as non-cleavage sites (see Fig. 2 and “Methods” section). When combining the scores, the weight of the TAP transport score was changed by the factor W (on the X-axis). The combined score (C), based on the TAP transport (T) and proteolysis (P) scores, is C = W × T + P. As a result, the proteasome cleavage predictor has a larger influence on the combined score if W is smaller, and the TAP transport predictor has a larger influence if W is larger. See Fig. 3 for color coding
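The weighted summation described in this caption can be illustrated with a minimal sketch. The function names and the toy scores are hypothetical and are not taken from any of the predictors; the sketch only shows how the factor W shifts the ranking of candidate sites between the proteolysis score and the TAP transport score.

```python
def combined_score(tap_score, cleavage_score, weight=1.0):
    """Combined score C = W * T + P (T: TAP transport score, P: proteolysis score)."""
    return weight * tap_score + cleavage_score


def rank_sites(sites, weight):
    """Return site names ordered from highest to lowest combined score.

    `sites` is a list of (name, tap_score, cleavage_score) tuples; the layout
    is a hypothetical choice made for this illustration.
    """
    ordered = sorted(
        sites,
        key=lambda site: combined_score(site[1], site[2], weight),
        reverse=True,
    )
    return [name for name, _, _ in ordered]


# Toy example: site "a" has a high TAP score, site "b" a high cleavage score.
toy_sites = [("a", 1.0, 0.2), ("b", 0.0, 0.9)]
print(rank_sites(toy_sites, weight=0.1))  # cleavage score dominates
print(rank_sites(toy_sites, weight=2.0))  # TAP score dominates
```

With a small W the ranking follows the cleavage score, and with a large W it follows the TAP transport score, which is the behaviour scanned along the X-axis of Fig. 4.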

In an AUC analysis, one can test the predictive performance of a single set of scores. However, we wanted to test the performance of a combination of two scores, i.e., proteasome cleavage and TAP transport scores, as an alternative to the additive model proposed by Tenzer et al. (2005). Therefore, we developed a new method to measure the performance of these two scores simultaneously. In this method, for every TAP binding threshold, the performance of the cleavage predictor was measured on the cleavage and non-cleavage sites exceeding the threshold. Next, the performance scores were integrated over all thresholds into a single score called volume under the plane (VUP; see “Methods” section). For both non-cleavage definitions, NetChop-3.0 Cterm outperformed the other proteasome predictors based on VUP scores (ROC-comparison test: p < 0.001; Fig. 3), again indicating that its higher performance is not due to a biased recognition of TAP ligands. Taken together, NetChop Cterm seems to predict in vivo proteolysis better than the other predictors, which are trained on proteasome-only in vitro proteolysis data. This suggests that the proteolytic activity in vivo that underlies MHC-I ligand production is markedly different from in vitro proteasome-only proteolysis.

Comparing in vitro and in vivo proteolysis activity

To better understand why NetChop-3.0 Cterm predicts in vivo proteolysis better than the other predictors, even though these predictors better predict proteasome-only in vitro proteolysis, we examined for each predictor which cleavage sites were given a low prediction score. The cleavage sites with a bottom 5 % prediction score were selected for further analysis. A striking difference between NetChop-3.0 Cterm and the other predictors was observed at position P1’ of these poorly predicted cleavage sites (i.e., the C-terminus of the MHC-I ligand; Fig. 2). Whereas the amino acids at position P1’ were equally distributed for NetChop-3.0 Cterm, a Lysine was found in at least 50 % of the cases for the other predictors (Fig. 5). In other words, the predictors based on in vitro proteasomal cleavage data fail to capture the in vivo cleavage after Lysine residues. This fits with the described proteolytic preferences of TPPII and Nardilysin (Geier et al. 1999; Kessler et al. 2011), and the suggested role of these proteases in the generation of MHC-I ligands, for instance for HLA-A*03 and HLA-A*11 (Seifert et al. 2003; Kloetzel 2004; Kloetzel and Ossendorp 2004). In addition, other proteases such as ACE have been shown to influence the generation of MHC-I ligands (Shen et al. 2008, 2011), and their proteolytic activity could be captured by NetChop-3.0 Cterm. Taken together, these results suggest that NetChop-3.0 Cterm incorporates the activity of all the different proteases that make a substantial contribution to in vivo proteolysis, and can thereby predict in vivo proteolysis better.
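The selection of poorly predicted cleavage sites and the resulting P1’ profile can be sketched as follows. This is a minimal illustration under stated assumptions: the input layout (ligand sequence paired with a prediction score), the function name, and the rounding of the 5 % cut-off are hypothetical choices, not details given in the text.

```python
from collections import Counter


def p1_prime_profile(scored_ligands, fraction=0.05):
    """Amino acid frequencies at P1' among the lowest-scoring true cleavage sites.

    `scored_ligands` is a list of (ligand_sequence, prediction_score) pairs.
    P1' is taken as the C-terminal residue of the ligand, following Fig. 2.
    Rounding the cut-off down (at least one site) is an assumption.
    """
    n_low = max(1, int(len(scored_ligands) * fraction))
    lowest = sorted(scored_ligands, key=lambda item: item[1])[:n_low]
    counts = Counter(sequence[-1] for sequence, _ in lowest)
    return {residue: count / n_low for residue, count in counts.items()}


# Toy data: the two lowest-scoring ligands both end in Lysine (K).
toy = [("AAAK", 0.1), ("CCCK", 0.2), ("DDDL", 0.8), ("EEEF", 0.9)]
print(p1_prime_profile(toy, fraction=0.5))
```

A profile dominated by a single residue, as in the toy data, corresponds to the Lysine enrichment described above for the predictors trained on in vitro data.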

Fig. 5

Proteolytic activity after Lysine residues is only predicted by NetChop-3.0 Cterm. For every proteasome cleavage predictor, 5 % of the true cleavage sites with the lowest prediction scores were determined. The amino acid profile at P1’ (i.e., the C-terminus of the presented MHC-I ligand) of these cleavage sites with a low prediction score was analyzed. The height of the letters represents their frequency in the amino acid profile

Discussion

In this study, we analyzed how well different methods can predict the cleavage patterns in proteolysis. In vitro cleavage patterns were shown to be best captured by methods trained on in vitro proteasome digestion data, i.e., ProteaSMM and NetChop-3.0 20S (Table 1). Similarly, in vivo proteolysis was best predicted by the method that is trained on MHC-I ligand data, NetChop-3.0 Cterm (Fig. 3). Furthermore, we showed that the better prediction of in vivo proteolysis was not due to an embedded recognition of TAP transportable peptides (Figs. 3 and 4).

There can be two explanations for the difference between in vitro and in vivo proteolysis. First, the proteolytic activity of proteasomes in vitro might be different from their in vivo activity. This difference might result from interactions with other molecules such as PA28 or the 19S cap regulatory particle (de Graaf et al. 2011; Emmerich et al. 2000). Second, other proteases such as TPPII, ACE, or Nardilysin might make a substantial contribution to in vivo proteolysis (Geier et al. 1999; Shen et al. 2008, 2011; Kessler et al. 2011). The best described example of in vivo proteolytic activity that is not observed in vitro is the cleavage after Lysine residues. This activity is required to generate ligands for HLA-A*03 and HLA-A*11, which bind peptides with a Lysine at the C-terminus (Seifert et al. 2003; Kloetzel 2004; Kloetzel and Ossendorp 2004). A well-described example of such peptides is the HIV Nef-derived epitope at positions 73 to 82 with a Lysine at its C-terminus, and it was shown that the generation of this peptide depends on TPPII activity (Seifert et al. 2003). However, it is not yet known how dominant this endopeptidase activity is within the TPPII enzyme complex (Geier et al. 1999), and therefore it is not yet clear whether TPPII is responsible for all the activities that create peptides with a C-terminal Lysine. More recently, a more detailed analysis of the substrate specificity of TPPII has been published, which suggests that the endopeptidase activity of TPPII depends strongly on the length of the substrate and thus is not likely to be a very general enzymatic activity of TPPII. We show that only NetChop-3.0 Cterm captures this hallmark of in vivo proteolysis (Fig. 5). As this activity has not been attributed to the proteasome, we conclude that NetChop Cterm has learned to incorporate non-proteasomal proteolytic activity.

A biased recognition of TAP-transportable peptides does not explain the increased performance of NetChop Cterm on the prediction of in vivo cleavage sites derived from MHC-I ligand data (Figs. 3 and 4). Similarly, one could argue that a bias toward recognizing MHC-I presented ligands should be controlled for. NetChop Cterm was trained on in vivo cleavage sites derived from a set of pMHCs with a homogeneous distribution of MHC-I molecules with various binding preferences (Nielsen et al. 2005), which minimizes any bias due to the recognition of MHC-I binding peptides. In addition, the in vivo cleavage/non-cleavage site data sets in this study are derived from peptides that were not used to train NetChop Cterm and that were eluted from many different MHC-I molecules.

The evaluation of different proteasome cleavage predictors depends on the construction of a set of non-cleavage sites, as the performance on these and on the true cleavage sites needs to be compared. Unfortunately, a substantial set of true non-cleavage sites is not available, and therefore we have to rely on assumptions when compiling a set of non-cleavage sites. To prevent a bias as a result of such assumptions, we have followed two different sets of assumptions when constructing the non-cleavage sites. First, non-cleavage sites were made by shuffling the sequence around a cleavage site to destroy any motif that is used by the proteasome while keeping the same distribution of amino acids. Second, we considered other positions in the source protein as non-cleavage sites. Although identical conclusions were drawn from the analyses with the different sets of non-cleavage sites, identification of true in vivo non-cleavage sites is required to permanently settle this issue or to describe sequence motifs that truly inhibit proteasomal cleavage.

The development of proteasome predictors serves two goals. First, to understand the specificity and biochemical processes that underlie proteolysis. Second, to predict and understand how this process influences the MHC-I ligandome. With respect to the first goal, we show that profound differences exist between proteasome activity in vitro and cellular proteolysis in vivo, suggesting a non-negligible role of non-proteasomal proteases. Evidently, the specificity of these additional proteases should be taken into account for optimal MHC-I ligand predictions. Therefore, we conclude that NetChop Cterm or future proteolysis predictors trained on in vivo data should be used in MHC-I ligandome predictions.

Methods

Data collection

Proteasomal in vitro cleavage patterns were derived from a digestion of HIV-1 peptides with constitutive or immunoproteasomes, as described in Peters et al. (2002). Sixteen peptides from the HIV-1 proteins GAG and TAT, with a length of 17 to 30 amino acids, were degraded. After 0, 1, 2, 4, 8, and 24 h of degradation, peptide fragments were analyzed using mass spectrometry (as in Peters et al. (2002)). To avoid analyzing secondary cleavage products, peptide fragments found after 4 h of degradation were used to infer cleavage sites. Of 368 possible cleavage sites, 150 were efficiently cleaved by the immunoproteasome after 4 h and 148 were efficiently cleaved by the constitutive proteasome; 103 sites (69 %) were shown to be cleaved by both proteasome subtypes (Supplementary Table S1). The ProteaSMM proteasome cleavage predictors require six amino acids N-terminal and four amino acids C-terminal of a possible cleavage site. Therefore, cleavage predictions cannot be made at the beginning and end of a peptide sequence. As a result of this limitation, only 240 (of the 368) sites could be used to compare the different proteasome predictions. Of these 240 sites, 99 were efficiently cleaved by the immunoproteasome and 99 were efficiently cleaved by the constitutive proteasome; 68 sites were present in both sets.
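The reduction from 368 to 240 sites follows directly from the flanking requirement: a peptide of length L has L − 1 possible cleavage sites, of which L − 9 have six residues on the N-terminal side and four on the C-terminal side, so each of the 16 peptides loses 8 sites (368 − 16 × 8 = 240). The sketch below reproduces this arithmetic. The individual peptide lengths are not listed in this section, so the uniform length of 24 used here is a hypothetical stand-in chosen only because it yields the reported total of 368 possible sites.

```python
def possible_sites(length):
    """Number of peptide bonds (possible cleavage sites) in a peptide."""
    return length - 1


def predictable_sites(length, n_terminal=6, c_terminal=4):
    """Sites that have enough flanking residues on both sides for a prediction."""
    return max(0, length - n_terminal - c_terminal + 1)


# Hypothetical lengths: 16 peptides of 24 residues give 368 possible sites.
peptide_lengths = [24] * 16
print(sum(possible_sites(n) for n in peptide_lengths))     # 368
print(sum(predictable_sites(n) for n in peptide_lengths))  # 240
```

Because every peptide is longer than ten residues, the number of lost sites per peptide is constant, so the total of 240 does not depend on how the lengths are distributed.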

In vivo cleavage sites were inferred from MHC-I ligand data. Ligands that were identified in MHC-I elution studies were downloaded from the SYFPEITHI database (Rammensee et al. 1999) and the IEDB database (Vita et al. 2010). Source proteins of the MHC-I ligands were downloaded from the NCBI via links that were provided by the SYFPEITHI and IEDB databases. The C-terminal residue of an MHC-I ligand was regarded as position P1’ of a cleavage site (Fig. 2). In total, 3076 MHC-I ligands with their source protein were derived from the SYFPEITHI database and 457 MHC-I ligands with their source protein were derived from the IEDB database. Identical peptides, or peptides that were either a C- or N-terminal extension of each other, were regarded as redundant. In addition, the ligands and their corresponding source proteins that were published before 2005, or that were redundant with or identical to an MHC-I ligand published before 2005, were excluded because they could have been used for training NetChop-3.0 Cterm. This filtering resulted in 832 MHC-I ligands and their source proteins, in which every MHC-I ligand corresponds to a peptide fragment that is generated by in vivo proteolytic activity (Fig. 1).

Detecting in vivo non-cleavage sites based on the absence of a peptide in the MHC-I ligand databases is not possible, as many other reasons might underlie the absence of an MHC-I ligand, e.g., further degradation of the fragment or low affinity to MHC-I molecules. Therefore, non-cleavage sites were generated in two ways. (1) An area of 19 amino acids around the cleavage site was shuffled (the longest flanking region used by a proteasome predictor method plus one extra amino acid on each side, as indicated in Fig. 2). After shuffling, the middle position, previously corresponding to the cleavage site, was assigned as a non-cleavage site. For every cleavage site, 100 non-cleavage sites were generated, i.e., 83,200 non-cleavage sites in total (Fig. 1). The advantage of this method is that the amino acid frequencies of cleavage and non-cleavage sites remain identical. (2) All sites in the source proteins of the MHC-I ligands that were not assigned as a cleavage site were assumed to be non-cleavage sites (N = 507,538; Fig. 1).
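The shuffling procedure can be sketched as below. This is a minimal illustration and not the original implementation: the function and parameter names are hypothetical, the 19-residue window is taken as ten residues ending at P1’ plus the nine residues that follow (our reading of Fig. 2), and shuffles that happen to reproduce the original window are not filtered out.

```python
import random


def shuffled_non_cleavage_windows(protein, p1_prime_index, n_shuffles=100,
                                  n_before=10, n_after=9, rng=None):
    """Generate shuffled 19-residue windows around one cleavage site.

    `p1_prime_index` is the 0-based index of the ligand's C-terminal residue
    (P1') in the source protein. The window covers `n_before` residues ending
    at P1' and `n_after` residues after it. An empty list is returned when the
    window does not fit in the protein (an assumption; the text does not say
    how such sites were handled).
    """
    rng = rng or random.Random()
    start = p1_prime_index - n_before + 1
    end = p1_prime_index + n_after + 1
    if start < 0 or end > len(protein):
        return []
    window = list(protein[start:end])
    # Each shuffle keeps the amino acid composition but destroys the motif.
    return ["".join(rng.sample(window, len(window))) for _ in range(n_shuffles)]


# Toy source protein of 30 residues; P1' at index 12.
toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
decoys = shuffled_non_cleavage_windows(toy_protein, 12, rng=random.Random(0))
print(len(decoys), len(decoys[0]))
```

Each returned window has the same composition as the original 19 residues, which is why the amino acid frequencies of cleavage and non-cleavage sites stay identical under this scheme.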

Prediction performance measures

Proteasome cleavage and TAP transport predictions were performed as suggested by the developers of the different prediction methods (Peters et al. 2003; Tenzer et al. 2005; Nielsen et al. 2005). The different proteasome predictors were assessed for their performance in discriminating cleavage from non-cleavage sites. First, the performance of the proteasome predictors was tested using receiver operator characteristic (ROC) curves (Swets 1988). In a ROC curve, true positive proportions (TPP) and false positive proportions (FPP) are plotted on the y- and x-axis, respectively, for every threshold. The area under the ROC curve (AUC) is a measure of the predictor’s performance. If a predictor performs well, the TPP increases faster than the FPP, and the AUC becomes larger than 0.5; the maximal AUC is 1.0.
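As an illustration of this measure, the AUC can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen cleavage site receives a higher score than a randomly chosen non-cleavage site (ties counted as one half). The sketch below uses this equivalence; the function name is hypothetical and this is not necessarily how the AUCs in this study were computed.

```python
from bisect import bisect_left, bisect_right


def roc_auc(cleavage_scores, non_cleavage_scores):
    """Area under the ROC curve from two lists of prediction scores."""
    negatives = sorted(non_cleavage_scores)
    total = 0.0
    for score in cleavage_scores:
        lower = bisect_left(negatives, score)           # non-cleavage sites scored lower
        ties = bisect_right(negatives, score) - lower   # non-cleavage sites with equal score
        total += lower + 0.5 * ties
    return total / (len(cleavage_scores) * len(negatives))


print(roc_auc([0.9, 0.8], [0.1, 0.2]))  # perfect separation
print(roc_auc([0.8, 0.3], [0.5, 0.1]))  # partial separation
```

A predictor that scores every cleavage site above every non-cleavage site reaches 1.0, whereas identical score distributions give 0.5.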

The AUC can only be determined on a single set of prediction scores. However, we aimed to compare the prediction performance of the proteasome predictors in combination with the TAP transport predictor. Therefore, we developed an alternative performance measure: for every TAP transport prediction value (T), the AUC was determined on the cleavage and non-cleavage sites that exceeded this value (AUC_T). If fewer than 25 cleavage sites or non-cleavage sites exceeded the TAP threshold, the threshold was discarded. A score was derived by integrating over all the AUCs with respect to the TAP threshold values and subsequently normalizing by the range of TAP thresholds (Eq. 1). The resulting score ranges between 0 and 1; a random predictor would score 0.5 and a perfect predictor would score 1, similar to the scores obtained in an AUC analysis. This score reflects the predictive performance of the proteolysis predictor on different data sets that have been selected over a range of possible TAP values. We call this performance measure volume under the plane (VUP):

$$ VUP = \frac{\sum_{i=1}^{n} \left[ \left(T_{i-1}-T_{i}\right)\times AUC_{T_{i-1}} + \frac{\left(T_{i-1}-T_{i}\right)\times \left(AUC_{T_{i-1}}-AUC_{T_{i}}\right)}{2} \right]}{\max(T)-\min(T)} $$
(1)
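The VUP calculation can be sketched as follows. Several details are assumptions made for this illustration and are labeled as such in the code: the thresholds are taken to be the distinct TAP scores observed in the data, "exceeding" is read as strictly greater, the AUC values of neighbouring thresholds are integrated with the ordinary trapezoidal rule (one reading of how Eq. 1 is meant to be evaluated), and the normalization uses the range of the retained thresholds. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    negatives = sorted(neg_scores)
    total = 0.0
    for score in pos_scores:
        lower = bisect_left(negatives, score)
        total += lower + 0.5 * (bisect_right(negatives, score) - lower)
    return total / (len(pos_scores) * len(negatives))


def vup(cleavage_sites, non_cleavage_sites, min_sites=25):
    """Volume under the plane for lists of (tap_score, cleavage_score) pairs."""
    # Assumption: every distinct observed TAP score serves as a threshold.
    thresholds = sorted({t for t, _ in cleavage_sites} |
                        {t for t, _ in non_cleavage_sites})
    points = []
    for threshold in thresholds:
        pos = [c for t, c in cleavage_sites if t > threshold]
        neg = [c for t, c in non_cleavage_sites if t > threshold]
        # Thresholds that leave too few sites are discarded, as in the text.
        if len(pos) >= min_sites and len(neg) >= min_sites:
            points.append((threshold, roc_auc(pos, neg)))
    if len(points) < 2:
        raise ValueError("need at least two usable TAP thresholds")
    # Assumption: trapezoidal integration of AUC_T over the thresholds.
    area = sum((t1 - t0) * (a0 + a1) / 2.0
               for (t0, a0), (t1, a1) in zip(points, points[1:]))
    return area / (points[-1][0] - points[0][0])


# Toy data: cleavage scores separate the classes perfectly at every threshold.
toy_pos = [(float(t), 1.0) for t in range(10)]
toy_neg = [(float(t), 0.0) for t in range(10)]
print(vup(toy_pos, toy_neg, min_sites=2))
```

Under these assumptions a predictor whose AUC is 1 at every retained threshold obtains a VUP of 1, and a predictor that assigns the same score to all sites obtains 0.5, in line with the range described for the measure.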

Statistics

Statistical tests were performed using the stats package of the SciPy module in Python. The difference between AUC/VUP performance measures was determined by deriving AUCs/VUPs on 50 new data sets that were generated by bootstrapping the original data set. The derived AUCs/VUPs were compared using a paired two-tailed t test; p values less than 0.001 were considered significant (as in Tenzer et al. (2005)). We refer to this test as the ROC-comparison test.
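A minimal sketch of such a ROC-comparison test is given below, written with the Python standard library only. The function names, the input layout, and the choice to resample all sites together (rather than cleavage and non-cleavage sites separately) are assumptions of this illustration. The sketch returns the paired t statistic and its degrees of freedom; the two-tailed p value would then be obtained from the t distribution, for example with `scipy.stats.ttest_rel`, which performs the same paired test.

```python
import math
import random
import statistics


def bootstrap_paired(sites, score_fn_a, score_fn_b, n_boot=50, seed=0):
    """Evaluate two performance measures on the same bootstrap resamples.

    `sites` is any list of records; `score_fn_a` and `score_fn_b` map a
    resampled list to a performance value (e.g. an AUC or VUP for two predictors).
    """
    rng = random.Random(seed)
    scores_a, scores_b = [], []
    for _ in range(n_boot):
        sample = rng.choices(sites, k=len(sites))  # resample with replacement
        scores_a.append(score_fn_a(sample))
        scores_b.append(score_fn_b(sample))
    return scores_a, scores_b


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two matched score lists.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error, len(diffs) - 1


t_value, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.5, 3.0])
print(t_value, dof)
```

Applying both score functions to the same resample keeps the two lists matched, which is what makes a paired test appropriate.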