Introduction

In the vertebrate immune system, T cell receptors (TCRs) recognize peptides presented by major histocompatibility complexes (MHCs) on the surface of antigen-presenting cells. Peptides that bind to the MHC molecule and trigger a cellular immune response are referred to as T cell epitopes, which are of great importance in the development of epitope-based vaccines and immunotherapies against viral infections, tumors and autoimmune diseases [1–4]. The binding of a peptide to a MHC molecule is a prerequisite to eliciting a cellular immune response. In the past few years, immunologists have made great efforts to identify peptides that bind to MHC molecules. The accumulation of these peptide data enables the development of highly accurate computational predictors of MHC-binding peptides. The prediction of MHC-binding peptides, especially peptides that bind to MHC class I molecules, is a well-studied problem in immunoinformatics [5–7].

Immunogenicity is the ability of a MHC-presented peptide to induce an immune response, which is an important factor that influences whether a MHC-presented peptide can be a T cell epitope. In the past, studies of immunogenicity focused on immunodominance, which describes the dominant recognition of a relatively small number of potential epitopes by the immune system [8, 9]. Immunodominance is determined by many factors, such as the binding affinity of peptide–MHCs (pMHCs) [10], the stability of pMHCs [11, 12], the efficiency of MHC ligand processing [13–15], immunoregulatory phenomena [16] and competitive exclusion of the T cell repertoire [17].

Peptide-immunization experiments have shown that approximately half of MHC class I-binding peptides cannot elicit a T cell response [18, 19], indicating the importance of analyzing the variables that affect the immunogenicity of MHC-binding peptides. Harndahl et al. [20] suggested that peptide-MHC class I stability is more important than peptide affinity in the prediction of CTL immunogenicity. Calis et al. [21] performed a detailed study on the properties of MHC class I-presented peptides that enhance immunogenicity and found that several important amino acids at specific positions are associated with immunogenicity. In these studies, the term immunogenicity was based on the definitions established by Sercarz et al. [22], where dominant, subdominant, and even cryptic epitopes were classified as immunogenic peptides. Therefore, the immunogenicity of MHC-presented peptides could be defined as the recognition of pMHCs by TCRs.

The immunogenicity of the MHC-presented peptide is determined by the interaction between the pMHC and TCR. The available quantitative data describing the interactions of this ternary complex include the binding affinity and stability of pMHCs. To date, numerous studies have explored the contributions of the binding stability and affinity of pMHCs to the immunogenicity in response to a host-pathogen interaction [11, 12, 20]. However, experiments performed on limited numbers of peptides binding to a specific MHC molecule might yield biased results; the relative importance of these two factors needs further investigation with a large dataset, which should considerably improve the prediction of T cell epitopes. The building and continual updating of the epitope databank IEDB (http://www.iedb.org/) [23–25], which is the largest database of immune epitopes and contains large amounts of experimental data on the binding abilities and T cell activities of pMHCs, make it possible to perform a systematic analysis of the variables affecting the immunogenicity of MHC-presented peptides based on a large dataset.

In this study, we extracted the T cell activity data and quantitative binding data of peptides presented by HLA molecules from IEDB and investigated the effect of peptide–HLA binding ability on the immunogenicity of potential CD8+ and CD4+ T cell epitopes by analyzing the interrelations among the binding affinity, stability and immunogenicity of the peptides presented by HLA class I and class II molecules.

Methods and materials

Data source

The datasets used here were mainly obtained from IEDB [23] (http://www.iedb.org/) in February 2015. The binding affinity and stability data of the peptide–MHC complexes were directly extracted from the “MHC ligand Assays” in the IEDB, while the data for the T cell activities of MHC-presented peptides were acquired from the published dataset downloaded from “T cell assays,” where each entry contained a specific PMID number. The human and mouse proteomes were downloaded from the UniProt [26] database (http://www.uniprot.org/) in February 2015.

Immunogenicity of pMHCs

The immunogenicity of each HLA-presented peptide was assigned a specific value based on the reported T cell activity in the IEDB as follows: 1 for positive and −1 for negative. If there was more than one published entry for a specific pMHC, the immunogenicity of the pMHC was determined by the sum of these values. pMHCs with a positive sum were collected as immunogenic pMHCs, and pMHCs with a negative sum were defined as non-immunogenic pMHCs. pMHCs with a sum of zero were removed from the dataset.
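The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `score_pmhcs`, the tuple layout and the toy peptide strings are assumptions made for the example, and real IEDB exports use different column names.

```python
from collections import defaultdict

def score_pmhcs(assays):
    """Sum +1/-1 votes per (peptide, allele) pair and split by sign.

    `assays` is an iterable of (peptide, allele, outcome) tuples, where
    outcome is the string "positive" or "negative" (hypothetical layout).
    Pairs whose votes sum to zero are dropped, as described in the text.
    """
    totals = defaultdict(int)
    for peptide, allele, outcome in assays:
        if outcome == "positive":
            totals[(peptide, allele)] += 1
        elif outcome == "negative":
            totals[(peptide, allele)] -= 1
        else:
            raise ValueError(f"unexpected outcome: {outcome!r}")
    immunogenic = {key for key, total in totals.items() if total > 0}
    non_immunogenic = {key for key, total in totals.items() if total < 0}
    return immunogenic, non_immunogenic

# Toy assay records (not real data).
toy_assays = [
    ("AAAAAAAAA", "HLA-A*02:01", "positive"),
    ("AAAAAAAAA", "HLA-A*02:01", "positive"),
    ("AAAAAAAAA", "HLA-A*02:01", "negative"),
    ("CCCCCCCCC", "HLA-A*02:01", "negative"),
    ("DDDDDDDDD", "HLA-A*02:01", "positive"),
    ("DDDDDDDDD", "HLA-A*02:01", "negative"),
]
immunogenic, non_immunogenic = score_pmhcs(toy_assays)
print(sorted(immunogenic))
print(sorted(non_immunogenic))
```

In the toy input, the third pMHC has one positive and one negative entry, so its sum is zero and it appears in neither set.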

Immunogenicity of non-self pMHCs and self pMHCs

Because some peptide-immunization experiments were performed using transgenic mouse models, all peptides contained in the immunogenic and non-immunogenic datasets were checked using the human and mouse proteomes. Peptides that could be found as a subsequence of any protein in these two proteomes were defined as self pMHCs, and the other peptides contained in the immunogenic and non-immunogenic datasets were defined as non-self pMHCs.
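The self/non-self assignment amounts to an exact substring search against the proteomes. The sketch below assumes the proteome has already been loaded as a list of protein sequence strings; the function name `split_self_non_self` and the toy sequences are hypothetical, and a full proteome would likely need an indexed search rather than this linear scan.

```python
def split_self_non_self(peptides, proteome_sequences):
    """Label a peptide as self if it occurs as an exact subsequence
    (contiguous substring) of any protein in the supplied proteomes."""
    self_peptides, non_self_peptides = set(), set()
    for peptide in peptides:
        if any(peptide in protein for protein in proteome_sequences):
            self_peptides.add(peptide)
        else:
            non_self_peptides.add(peptide)
    return self_peptides, non_self_peptides

# Toy proteome and peptides (not real sequences).
toy_proteome = ["MKTAYIAKQRQISFVKSHFSRQ", "MGSSHHHHHHSSGLVPRGSH"]
toy_peptides = ["AYIAKQRQI", "WWWWWWWWW", "HHHHHHSSG"]
self_set, non_self_set = split_self_non_self(toy_peptides, toy_proteome)
print(sorted(self_set))
print(sorted(non_self_set))
```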

The detailed data of the immunogenicity of the self and non-self pMHCs can be found at http://www.immunoinformatics.net/supplementary/immunogenicity.xlsx.

Datasets used to analyze the variables affecting the immunogenicity of HLA-presented peptides

The immunogenic/non-immunogenic pMHCs with quantitative binding affinities (IC50 or EC50) or stabilities in the MHC ligand assays were collected to analyze the relationship between affinity and immunogenicity or the relationship between stability and immunogenicity. HLA–peptide complexes with identified binding affinity and stability were collected to detect the correlation between affinity and stability. The stability dataset submitted by Weinhold et al. (2009) (Reference ID in IEDB: 1014192 and 1014194), which had an obvious error in the quantitative unit, was not used in this study.

An overview of the quantitative data used here is shown in Table S1 (see supplementary data). HLA-A*02:01 and HLA-DRB1*01:01 are the best studied HLA class I and II alleles, respectively, and the datasets of these two alleles contained the largest number of peptides with verified binding ability and immunogenicity in the respective HLA class. The analysis of the relationship between the binding ability and immunogenicity was mainly performed based on the peptides presented by these two well-studied alleles. The detailed data used here are available at http://www.immunoinformatics.net/supplementary/Quantitative.zip.

Datasets used to evaluate the performance of the affinity-based prediction of the immunogenicity of the HLA-presented peptides

HLA class I

HLA-A*02:01: The binding affinities of the HLA-A*02:01-restricted 9-mers in the immunogenicity dataset of non-self pMHCs were predicted using NetMHC-3.4, the updated version of the best-performing predictor according to the benchmark study [27]. Peptides with a predicted affinity stronger than 500 nM were collected and defined as Dataset_pred. The HLA-A*02:01-restricted 9-mer peptides with an experimental binding affinity stronger than 500 nM were categorized as Dataset_exp. Redundant peptides were removed from the two datasets to ensure that all remaining peptides differed from each other by at least two amino acids and that the subsequences of P3–P8 differed from each other by at least one amino acid. The 9-mer peptides in Assarsson’s dataset [18] (vaccinia-derived peptides presented in an HLA-A*02 transgenic mouse model) were also used in this study. The immunogenicity of the peptides from Assarsson’s dataset was defined by Calis et al. [21] as follows: the dominant, subdominant, and cryptic epitopes were classified as immunogenic, and the peptides with a negative response were classified as non-immunogenic.
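One way to enforce the redundancy criterion for 9-mers is a greedy filter, sketched below. The text does not state how the retained representative was chosen, so the first-come ordering here is an assumption, as are the function names and the toy peptides. P3–P8 is taken as the 1-based positions 3 to 8, that is, the Python slice `[2:8]`.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))

def remove_redundant(peptides):
    """Greedy filter: keep a 9-mer only if, against every peptide kept so far,
    it differs by at least two residues overall AND its P3-P8 subsequence
    differs by at least one residue. Input order decides which member of a
    redundant group survives (an assumption, not stated in the text)."""
    kept = []
    for pep in peptides:
        is_redundant = any(
            hamming(pep, other) < 2 or pep[2:8] == other[2:8]
            for other in kept
        )
        if not is_redundant:
            kept.append(pep)
    return kept

# Toy 9-mers (not real epitopes).
toy_9mers = [
    "ACDEFGHIK",  # kept (first seen)
    "ACDEFGHIL",  # one mismatch to the first peptide: removed
    "LCDEFGHIV",  # two mismatches but identical P3-P8: removed
    "ACDEYGHWK",  # two mismatches, both inside P3-P8: kept
]
print(remove_redundant(toy_9mers))
```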

HLA-A*24:02, B*07:02 and B*35:01: These were the other three alleles with more than 100 peptides in the immunogenicity dataset of non-self pMHCs. Due to the lack of quantitative binding data for these allele-restricted peptides with experimental immunogenicity, only peptides with a predicted affinity stronger than 500 nM were used to evaluate the performance of the affinity-based prediction of immunogenicity.

HLA class II

HLA-DRB1*01:01: The HLA-DRB1*01:01-restricted peptides in the immunogenicity dataset of the non-self pMHCs were predicted using NetMHCII2.2 [28, 29] (http://www.cbs.dtu.dk/services/NetMHCII/). Peptides with a predicted affinity stronger than 500 nM were included and defined as HLA-DRB1*01:01_pred. HLA-DRB1*01:01_exp was the set of HLA-DRB1*01:01-restricted peptides with an experimental affinity stronger than 500 nM in the immunogenicity dataset of the non-self pMHCs.

We also evaluated the performance of the affinity-based prediction of immunogenicity for the other HLA class II alleles with enough peptide data. After consideration of the ratio of immunogenic and non-immunogenic peptides and the total number of peptides with an experimental or predicted affinity stronger than 500 nM, only HLA-DRB1*03:01 and DRB1*07:01-restricted peptides with a predicted affinity stronger than 500 nM were available to perform the evaluation.

Evaluation

The overall performance of the prediction model was assessed by receiver operating characteristic (ROC) curve analysis [30], which is a widely used nonparametric performance measure. A ROC analysis tests the ability of a model to separate positive data from negative data without the need for selecting a threshold. The area under the ROC curve (AUC) provides a useful measure of the prediction quality: a value of 0.5 indicates random prediction, and a value of 1.0 indicates perfect prediction. The efficacy of the binding affinity for distinguishing the immunogenic peptides from the non-immunogenic peptides was assessed using the ROC curves generated using the experimental affinity or predicted affinity. The performance of the affinity-based prediction of the immunogenicity of HLA-A*02:01-restricted peptides was also compared to that of the prediction tool developed by Calis et al. [21] (http://tools.immuneepitope.org/immunogenicity/).
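For readers who want to reproduce an affinity-based AUC, the following sketch uses the rank interpretation of the AUC: the probability that a randomly chosen immunogenic peptide receives a higher score than a randomly chosen non-immunogenic one, with ties counted as one half. The function names are hypothetical, and using the negated IC50 as the score (so that stronger binders rank higher) is an assumption about how affinity was oriented.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as 0.5."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one example")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

def affinity_auc(immunogenic_ic50, non_immunogenic_ic50):
    """Use -IC50 as the score: a lower IC50 (stronger binding) ranks higher."""
    return auc_from_scores(
        [-x for x in immunogenic_ic50],
        [-x for x in non_immunogenic_ic50],
    )

# Toy IC50 values in nM (not real measurements).
toy_immunogenic = [5.0, 20.0, 300.0]
toy_non_immunogenic = [10.0, 400.0, 450.0]
print(affinity_auc(toy_immunogenic, toy_non_immunogenic))
```

Because the AUC depends only on the ranking of the scores, any monotonic transform of the IC50 values (for example, a logarithm) gives the same result.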

Results

Interrelations among the affinity, stability, and immunogenicity of HLA-I–peptide complexes

The binding between a peptide and a MHC molecule is required to trigger a cellular immune response. Binding affinity and stability are two general factors used to evaluate the peptide-MHC-binding ability. We investigated the effects of these two factors on the immunogenicity of HLA class I-presented peptides. As shown in Fig. 1, the percentage of immunogenic peptides in the non-self dataset increased as the binding affinity increased; this trend was observed in both the entire HLA-I-restricted dataset and the well-studied allele HLA-A*02:01-restricted dataset (Fig. 1a). The relationship between the stability and immunogenicity of HLA-I-peptide complexes showed a similar result to that observed between affinity and immunogenicity (Fig. 1b). According to the conventional standard for the definition of MHC binders, the stability of peptide–HLA-I complexes was more suitable than the affinity for distinguishing immunogenic peptides from non-immunogenic peptides. For example, 72.0 % of the HLA-I-presented non-self peptides with a high binding stability (T1/2 ≥ 1 h) were immunogenic, whereas only 62.7 % of peptides with a binding affinity stronger than 500 nM (IC50 < 500 nM) were immunogenic. For HLA-A*02:01-restricted non-self peptides, 67.0 % of the peptides with high binding stability (T1/2 ≥ 1 h) were immunogenic, while only 57.2 % of the peptides with a binding affinity stronger than 500 nM were immunogenic.
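The percentages quoted above are simple conditional fractions: among peptides that pass a binding cutoff, the share labeled immunogenic. A minimal sketch for the affinity case is given below; the function name, the record layout and the toy values are assumptions for illustration and do not reproduce the reported figures.

```python
def percent_immunogenic(records, ic50_cutoff_nm):
    """Percentage of immunogenic peptides among those with IC50 below the cutoff.

    `records` is an iterable of (ic50_nm, is_immunogenic) pairs (hypothetical
    layout). Returns None when no peptide passes the cutoff.
    """
    selected = [is_imm for ic50, is_imm in records if ic50 < ic50_cutoff_nm]
    if not selected:
        return None
    return 100.0 * sum(selected) / len(selected)

# Toy records (not real measurements): (IC50 in nM, immunogenic?).
toy_records = [
    (12.0, True), (45.0, True), (90.0, False),
    (300.0, True), (480.0, False), (2500.0, False),
]
for cutoff in (50, 500, 5000):
    print(cutoff, percent_immunogenic(toy_records, cutoff))
```

The same function applies to stability data if the comparison is reversed to keep complexes with a half-life at or above a threshold such as 1 h.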

Fig. 1 Percentages of immunogenic peptides in the collected dataset at various binding affinity and stability levels. a Relationship between affinity and immunogenicity. b Relationship between stability and immunogenicity

We also investigated the relationships between the binding abilities and the immunogenicity of self peptides presented by HLA class I molecules. Because self-reactive T cells are removed during negative selection, most self peptides were non-immunogenic. Thus, compared to non-self peptides, self peptides showed much lower percentages of immunogenic peptides at the corresponding affinity cutoffs. Nevertheless, the percentage of immunogenic peptides also increased with the binding affinity for the self peptides with a binding affinity stronger than 500 nM (Fig. 1a). The relationship between the stability and immunogenicity of self peptides was similar to that observed in non-self peptides, but the percentages at the corresponding stability cutoffs were higher than those of the non-self peptides (Fig. 1b), which might suggest that the binding stability of the HLA-I–peptide complex is a key factor for identifying self-reactive epitopes.

We then extracted the HLA-I-presented peptides with a binding affinity stronger than 500 nM or with a high stability (T1/2 ≥ 1 h) and compared the affinities and stabilities between the immunogenic and non-immunogenic groups. As shown in Fig. 2a, the immunogenic peptides showed a significantly higher affinity (lower IC50 values) than the non-immunogenic peptides; this difference was observed in both the HLA-A*02:01-restricted non-self and self datasets (non-self: p < 0.0001, self: p = 0.0203, unpaired two-tailed t tests). However, significant differences in the stability were not observed between these two groups (Fig. 2b). We also collected peptides in a HLA-A*02:01-restricted peptide dataset in which the affinity, stability, and immunogenicity had all been experimentally determined, and we performed the comparison described above; the same results were observed (Fig. S1, see supplementary data). Due to the lack of quantitative peptide data, the stabilities could not be compared between the immunogenic and non-immunogenic groups for the other HLA class I alleles. We compared the affinities between the immunogenic and non-immunogenic groups for the peptide data of HLA-A*01:01 and B*07:02, the two alleles other than HLA-A*02:01 that contained the largest number of peptides with verified binding affinity and immunogenicity. However, significant differences were not observed. This result might be due to the limited number of binding peptides with experimentally verified immunogenicity (A*01:01: 44 and B*07:02: 131) and the unbalanced distribution of the binding affinities (e.g., about 80 % of the non-immunogenic peptides had a binding affinity stronger than 100 nM in the collected HLA-B*07:02-restricted dataset).

Fig. 2 Comparison of the immunogenic and non-immunogenic peptides presented by HLA-A*02:01. a Comparison of the affinity of the immunogenic and non-immunogenic peptides with a binding affinity stronger than 500 nM. b Comparison of the stability of the immunogenic and non-immunogenic peptides with a high binding stability (T1/2 ≥ 1 h). P values were obtained using unpaired two-tailed t tests

Furthermore, we investigated the correlation between the stability and affinity of peptide–MHC-I complexes based on pMHCs whose binding affinity and stability were both experimentally determined. As shown in Fig. 3a, the stability of HLA-A*02:01-restricted peptides showed a significant tendency to increase with the affinity (R² = 0.2951, p < 0.0001), and this increase was only observed in the peptides with a binding affinity stronger than 500 nM. The correlation between stability and affinity was also examined for HLA-A*30:02-restricted peptides, the allele with the second largest number of peptides with verified binding affinity and stability, and similar results were observed (Fig. 3b). Given that a significant difference between the immunogenic and non-immunogenic peptides was only observed in pMHCs with a binding affinity stronger than 500 nM, we proposed that the stability of a pMHC could not be used alone as a predictor to distinguish immunogenic pMHCs from non-immunogenic pMHCs. For the peptides with high binding affinities, a high binding stability favored immunogenicity. In our collected datasets, more than 70 % of HLA-A*02:01-restricted peptides with a binding affinity stronger than 500 nM and a high stability (T1/2 ≥ 1 h) were found to be immunogenic. This percentage was higher than the values based on either factor alone at the corresponding thresholds.

Fig. 3 Correlation between the stability and affinity of HLA-I-restricted peptides. a HLA-A*02:01-restricted peptides; b HLA-A*30:02-restricted peptides

Predicting the immunogenicity of HLA-I-presented peptides based on binding affinity

Since the immunogenic peptides presented a significantly higher binding affinity than did the non-immunogenic peptides, we evaluated the ability of the binding affinity to distinguish immunogenic peptides from non-immunogenic peptides using the following three HLA-A*02:01-restricted 9-mer datasets: Dataset_pred, Dataset_exp, and Assarsson’s dataset. The performance was assessed by the ROC curves generated using the experimental affinity or the predicted affinity.

As shown in Fig. 4, the predicted affinity was a poor predictor for separation of the immunogenic peptides from the non-immunogenic peptides, and the area under the ROC curve (AUC) that was generated based on the predicted IC50 values was only 0.517 (Fig. 4a). However, the immunogenicity of pMHCs was predictable when using the experimental binding affinities (AUC of 0.650) (Fig. 4b). The prediction tool of immunogenicity provided by IEDB was useful when experimental binding affinity data were lacking, but the evaluation on Dataset_exp indicated that the experimental binding affinity was better at distinguishing immunogenic peptides from non-immunogenic peptides. Furthermore, we evaluated the overall performance on Assarsson’s 9-mer peptide data, where each peptide was bound to HLA-A*02:01 with an affinity stronger than 100 nM. The areas under the ROC curves generated using the experimental IC50 values and the predicted immunogenicity of the IEDB tool were 0.739 and 0.663, respectively (Fig. 4c), which further proved that the experimental binding affinity could be useful for defining immunogenic pMHCs.

Fig. 4 Prediction of the immunogenicity of HLA-A*02:01-restricted peptides using the binding affinity and the IEDB tool. Performances were evaluated for the peptides with a predicted IC50 < 500 nM (a), peptides with an experimental IC50 < 500 nM (b) and Assarsson’s dataset (c)

The effect of binding ability on the immunogenicity of HLA-II-presented peptides

During a cellular immune response, the TCRs on the surface of Th cells recognize the peptides presented by MHC class II. Therefore, the variables affecting the immunogenicity of HLA-II-presented peptides were also investigated in this study. Because of the biased composition of the collected self peptides presented by HLA-II molecules (more than 75 % of these peptides were immunogenic) and the relatively low number of self peptides presented by a specific HLA-II molecule, the peptides used here were non-self peptides. The percentage of immunogenic peptides increased with the binding affinity, and this was observed in the entire HLA-II-restricted dataset and the well-studied allele HLA-DRB1*01:01-restricted dataset (Fig. 5a); this finding was similar to the result for the peptide–HLA-I complexes. Significant differences in the binding affinities were also observed between the immunogenic and non-immunogenic HLA-II-restricted peptides with a strong binding affinity (IC50 < 500 nM, p = 0.0003) (Fig. 5b). Most of the HLA-DRB1*01:01-restricted peptides with a strong binding affinity were immunogenic (79/90), and significant differences between these two groups were not found (data not shown). In the collected stability dataset of peptide–HLA-II complexes, only 15 of the 94 peptide–HLA-II complexes with a high stability (T1/2 ≥ 1 h) were immunogenic. Each of the 15 immunogenic peptides had a binding affinity stronger than 1000 nM, and 12 of the 15 immunogenic peptides had an affinity stronger than 500 nM. Some of the peptides that bound to a specific HLA-II molecule with a high stability had a low binding affinity, which further supported the view that the stability of a pMHC could not be used alone as a predictor to distinguish immunogenic from non-immunogenic peptides. However, the high percentage of immunogenic peptides at the threshold of 500 nM indicated that the binding affinity is sufficient for distinguishing immunogenic from non-immunogenic HLA-II-restricted peptides.

Fig. 5 Effect of binding affinity on the immunogenicity of HLA-II-restricted peptides. a Relationship between affinity and immunogenicity. b Comparison of the affinity of the immunogenic and non-immunogenic peptides with a binding affinity stronger than 500 nM. p values were obtained using an unpaired two-tailed t test

We evaluated the efficacy of the binding affinity in predicting the immunogenicity of HLA-II-presented peptides with a binding affinity stronger than 500 nM, as was done for peptide–HLA-I complexes. The performance was assessed in the following three datasets: all HLA-II-restricted peptides with an experimental affinity stronger than 500 nM, HLA-DRB1*01:01-restricted peptides with an experimental affinity stronger than 500 nM, and HLA-DRB1*01:01-restricted peptides with a predicted affinity stronger than 500 nM (Fig. 6). The areas under the three generated ROC curves were 0.713, 0.792 and 0.760, respectively, for the above-mentioned datasets. This result suggested that the immunogenicity of HLA-II-presented peptides could be well predicted using the binding affinity and even the predicted binding affinity.

Fig. 6 Predicting the immunogenicity of peptide–HLA-II complexes based on the binding affinity. The performance was assessed using the following three datasets: all HLA-II-restricted peptides with an experimental affinity stronger than 500 nM (HLA-II); HLA-DRB1*01:01-restricted peptides with an experimental affinity stronger than 500 nM (HLA-DRB1*01:01_exp); and HLA-DRB1*01:01-restricted peptides with a predicted affinity stronger than 500 nM (HLA-DRB1*01:01_pred)

The ‘holes’ in the T cell repertoire

The above results indicated that binding affinity of pMHC could be a predictor of T cell immunogenicity, but its efficacy for HLA-I- and HLA-II-restricted complexes presented obvious differences. As shown in Table 1, a much better performance (AUC values) for distinguishing immunogenic from non-immunogenic peptides using the experimental or predicted binding affinity was observed in the HLA-II-restricted datasets. The percentages of immunogenic peptides in the HLA-II-restricted datasets with an experimental binding affinity stronger than 500 nM were also much higher than those in the HLA-I-restricted datasets. The relatively low percentage of immunogenic peptides in the HLA-II-restricted dataset with a predicted binding affinity stronger than 500 nM could be due to the relatively low performance of prediction tools for HLA-II-binding peptides. Resetting the threshold, we found that almost 80 % of HLA-II-restricted peptides with a predicted binding affinity stronger than 50 nM were immunogenic. Thus, we investigated how these large differences between the immunogenicity of HLA class I- and class II-restricted peptides were generated.

Table 1 Comparison of the effects of binding affinity on the immunogenicity of HLA class I and class II-restricted peptides

Recent studies have demonstrated that the degenerate T cell recognition of peptides on MHC class I molecules creates large holes in the CD8+ T cell (CTL) repertoire [31, 32]. Due to the deletion of self-reactive T cells caused by thymus selection and peripheral tolerance, foreign peptides that are highly similar to a self peptide do not trigger a T cell response. Foreign peptides that are never used in immune responses are referred to as the “holes in the repertoire.” The similarity of foreign peptides to self peptides was correlated with the length of the peptides. MHC class I molecules usually bind peptides of 8–10 residues in length in an extended conformation with the anchor residues (P2 and C-terminal positions) buried in specificity pockets. The middle positions (P3–P8) of the binding peptide bulge out of the binding groove and play the main role in contacting the TCR [33]. A similarity calculation, based on the middle positions, indicated that almost 30 % of foreign peptides binding HLA class I molecules overlapped with the self peptides, which strikingly agreed with the experimental results. The peptides binding to MHC-II molecules are much longer than MHC-I-binding peptides, and the TCR interacts with the binding core (9-core) and the N-terminal residues of the peptide [33]. According to the result of Calis et al. [31], in which the overlap of self and foreign peptides was lower than 1 % when every position of the MHC class I-binding peptide was taken into account, we inferred that the overlap of the foreign and self peptides that bound to MHC class II molecules was no more than 1 %.

The TCR repertoire of CD4+ and CD8+ T cells can then be described using the MHC-presented peptides (Fig. 7). Assuming that 2 % of all possible 9-mer peptides actually bind to a particular MHC molecule, the number of peptides that can be presented by a given MHC molecule was calculated to be 20⁹ × 0.02 ≈ 10¹⁰ (MHC-II-restricted peptides were calculated based on the binding core: 9-core). The estimated number of self peptides that can be presented by this MHC molecule is approximately 10⁵ (the upper limit) [34]. Although the number of self-reactive peptides was much smaller than the total number of MHC-presented peptides, the deletion of the self-reactive T cell repertoire made some foreign peptides that highly overlapped with self peptides unable to evoke an immune response due to the degenerate T cell recognition of pMHCs (the gray regions in the diagrams of Fig. 7). Because the self/non-self overlap in MHC-I-restricted peptides was much higher than that in MHC-II-restricted peptides, the holes in the CD8+ T cell repertoire (3 × 10⁹) were much larger than those in the CD4+ T cell repertoire (10⁸). The differences between the immunogenicity of MHC-I- and MHC-II-presented peptides can then be easily understood.
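The order-of-magnitude estimates in this paragraph can be checked with a few lines of integer arithmetic. The sketch below assumes, as the text does, 20 amino acids per position, a 2 % binding fraction, and self/foreign overlaps of roughly 30 % (MHC-I) and 1 % (MHC-II); the variable names are illustrative only.

```python
# All possible 9-mers over the 20 standard amino acids.
all_9mers = 20 ** 9

# About 2 % are assumed to bind a given MHC molecule.
presentable = all_9mers * 2 // 100

# The text rounds the presentable set to 10**10 before estimating the holes.
presentable_rounded = 10 ** 10

# Foreign peptides too similar to self: ~30 % for MHC-I, ~1 % for MHC-II.
holes_cd8 = presentable_rounded * 30 // 100
holes_cd4 = presentable_rounded * 1 // 100

print(f"all 9-mers:           {all_9mers:.3e}")
print(f"presentable (2 %):    {presentable:.3e}")
print(f"CD8+ repertoire hole: {holes_cd8:.0e}")
print(f"CD4+ repertoire hole: {holes_cd4:.0e}")
```

The exact product is 1.024 × 10¹⁰, which the text rounds to 10¹⁰; the hole sizes then follow as 3 × 10⁹ and 10⁸.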

Fig. 7 Diagram describing the TCR repertoire of the CD8+ and CD4+ T cells using the HLA-presented peptides. a CD8+ T cell (or CTL); b CD4+ T cell (or Th). Black self peptides; White foreign peptides; Gray foreign peptides with high similarity to self peptides

Discussion and conclusions

Immunogenicity is the ability of a MHC-presented peptide to induce an immune response and is an important factor that influences whether the peptide can be a T cell epitope. Understanding the variables affecting the immunogenicity of MHC-presented peptides will be useful for studying the cellular immune response and improving the prediction of T cell epitopes. The immunogenicity of a MHC-presented peptide is determined by the interaction between the TCR and the pMHC. The binding ability of a peptide to a MHC molecule, which can be quantitatively measured by the binding affinity or stability, is key to the interaction between the TCR and pMHC. Our results showed that the immunogenicity of HLA-I-presented peptides was still predictable using the experimental binding affinity, although approximately one-third of the peptides with a binding affinity stronger than 500 nM were non-immunogenic; in contrast, the immunogenicity of HLA-II-presented peptides was well predicted using the experimental affinity and even the predicted affinity. The positive correlation between the binding affinity and stability was only observed in peptide-HLA-I complexes with a binding affinity stronger than 500 nM, which suggested that the stability alone could not be used for the prediction of immunogenicity.

Our results contradicted the results of the recent study performed by Harndahl et al. showing that the stability is a better predictor than the peptide affinity for CTL immunogenicity [20]. In their study, the measured stability was significantly correlated with the affinity of the selected HLA-A*02:01-restricted peptides, all of which had a relatively high binding affinity (IC50 < 1000 nM) and more than 90 % of which bound more strongly than 500 nM. However, the affinities of the collected peptides in our study ranged from 10⁻¹ to 10⁶ nM. The significant correlation between the stability and affinity values was not observed in the HLA-A*02:01-restricted peptides with a binding affinity weaker than 500 nM (IC50 ≥ 500 nM). Given that a significant difference between the immunogenic and non-immunogenic peptides was only observed in pMHCs with a binding affinity stronger than 500 nM, we proposed that the stability of a pMHC could not be used alone as a predictor to distinguish immunogenic pMHCs from non-immunogenic pMHCs. Our results from the HLA-II-presented peptides, which showed that only 15 of the 94 collected HLA-DRB1*01:01-restricted peptides with a high binding stability of T1/2 ≥ 1 h were immunogenic, indirectly support our argument. A high binding stability favored the immunogenicity for peptides with high binding affinities, which was consistent with the results presented by Harndahl et al. [20]. In our collected datasets, more than 70 % of HLA-A*02:01-restricted peptides with a binding affinity stronger than 500 nM and a high stability (T1/2 ≥ 1 h) were found to be immunogenic. This percentage was higher than the values based on either factor alone at the corresponding thresholds. Improved accuracy for T cell epitope discovery has also been achieved by integrating the predictions for pMHC binding affinity and stability [35].

The immunogenicity of MHC-presented peptides could be predicted using the experimental binding affinity of pMHCs. For peptide–MHC-II, approximately 90 % of MHC-II-restricted peptides with a binding affinity stronger than 500 nM were immunogenic. The predicted affinity was also useful in distinguishing the immunogenic peptides from the non-immunogenic peptides. For HLA-DRB1*01:01-restricted peptides with a predicted affinity stronger than 500 nM, the area under the ROC curve generated using the predicted affinity values was 0.760, which almost corresponds to a good performance [30]. The immunogenicity model developed by Calis et al. [21] was built by analyzing the residue properties that enhanced the immunogenicity of MHC class I presented peptides with a predicted affinity stronger than 500 nM. The cross-validation and blind tests of the prediction model showed that the immunogenicity is predictable with an average AUC of 0.65 [21]. In this study, our result suggested that the immunogenicity of MHC-I-restricted peptides was also predictable using the experimental binding affinity (AUC = 0.650 for 9-mer peptides binding to HLA-A*02:01 with an affinity stronger than 500 nM). An evaluation of Assarsson’s 9-mer peptides, each of which showed a binding affinity to HLA-A*02:01 stronger than 100 nM, confirmed our results.

Since the immunogenicity of MHC-binding peptides with an affinity stronger than 500 nM could be predicted using their experimental binding affinity, there should be an optimal affinity threshold that determines the capacity of a MHC-binding peptide to elicit a T cell response. For MHC class I-restricted peptides, the threshold was approximately 50 nM, which agreed with Sette’s early result [10]. In the HLA-A*02:01-restricted epitopes of vaccinia virus identified by Assarsson et al. [18], each dominant epitope had an experimental binding affinity stronger than 50 nM, and a predicted affinity stronger than 50 nM was also observed in 12 of the 15 dominant epitopes. For MHC class II-restricted peptides, an experimental affinity stronger than 100 nM or a predicted affinity stronger than 50 nM was the best choice, at which more than 80 % of peptides were immunogenic. This result is important for the identification of T cell epitopes. Although peptides with high binding affinity could be more likely to evoke T cell responses, it should be noted that these epitopes might also lead to the development of immune escape variants. It has been suggested that mutations within CTL epitopes may be exploited by viruses, especially viruses with high mutagenicity, such as HIV and influenza A virus, to evade protective immune responses [36–38]. Therefore, conserved epitopes are likely the best choice for the development of vaccines against these viruses [39].

The degenerate T cell recognition of the pMHCs creates large holes in the T cell repertoire that account for the T cell tolerance to a large fraction of the presented foreign peptides [31]. A similarity calculation indicated that the overlaps of the foreign and self peptides presented by MHC-I and MHC-II molecules were 30 and 1 %, respectively. After self-reactive T cells were removed during negative selection, the ‘holes’ in the CD4+ T cell repertoire were much smaller than those in the CD8+ T cell repertoire. A recent study on the TCRβ repertoire of CD4+ and CD8+ T cells indicated that the estimated species richness of TCRβ is approximately five times greater in CD4+ than in CD8+ T cells with the same number of cells [40]. This result might imply that CD4+ TCRs recognize greater numbers of antigenic epitopes than do CD8+ TCRs, which is consistent with our finding that the number of foreign peptides recognized by CD4+ TCRs was greater than that recognized by CD8+ TCRs (the white regions in the diagrams of Fig. 7). The diagram describing the TCR repertoire using MHC-presented peptides should facilitate the understanding of the cellular immune response.