Binding of peptides to MHC II molecules plays a major role in governing adaptive immune responses. It allows peptides derived from pathogens in the extracellular compartment to be presented by professional antigen presenting cells (APCs) to T helper cells of the immune system.

These T cells might in turn activate the presenting cell to eliminate intracellular bacterial infections. For most antigens, T cell help is also needed to activate B cells to produce antibodies that may neutralize the pathogen. Over the last decade a number of different methods for prediction of binding to MHC II molecules have been developed, the best known being the TEPITOPE method [1]. Prediction of binding of peptides to MHC II is complicated by the immense polymorphism of the MHC class II alleles, since the many different encoded MHC class II molecules (more than 690 different HLA-DR alleles are known) bind very different sets of peptides. The TEPITOPE method covers 50 of these HLA-DR alleles. During the last decades, several data-driven, so-called allele-specific methods have been developed for alleles where sufficient numbers of binding peptides are known. These methods cover a very broad range of bioinformatics training algorithms including Gibbs sampler [2, 3], artificial neural networks [4, 5], support vector machines [6–8], hidden Markov models [9], as well as other (often exotic) motif search algorithms [10–18]. For a detailed review please refer to Nielsen et al. [19].

These methods can interpolate between peptide binding data and create predictions for peptides not present in the training set. Recently, pan-specific methods that in principle can make predictions for all alleles with known amino acid sequence have been developed [20–26]. These methods include information about the amino acid sequence of the MHC molecule as input, which allows them to integrate information across multiple alleles simultaneously, thus boosting the predictive performance and potentially extrapolating the predictions to previously uncharacterized MHC molecules. Several benchmark calculations have demonstrated the power of such pan-specific methods [27] and have shown that accurate predictions can be obtained also for alleles for which no or very limited binding data have been identified [21, 28].

One of the best performing pan-specific MHC class II prediction methods is the NetMHCIIpan method [29]. An important limiting factor for this method lies in the need for a pre-alignment of the input training data, identifying the peptide-binding core prior to the training of the method. Such pre-alignments require sufficient data to be available for all MHC molecules included in the training data in order to derive accurate allele-specific predictions. It has earlier been shown that the required number of peptide binding measurements for an MHC class II molecule is of the order of many hundreds [3, 19], which makes it very costly to develop accurate MHC class II predictions. In order to circumvent this, we here propose a less demanding, yet highly efficient method to generate MHC class II predictors. This method is a pan-specific version of the earlier published allele-specific NN-align algorithm [5] and does not require any pre-alignment of the input data. The method hence has the potential to benefit also from information from alleles covered by limited binding data. Here, we demonstrate its predictive power in a series of large-scale benchmark calculations.

Materials and methods


Quantitative peptide binding data covering 24 HLA-DR molecules were obtained from the IEDB database and combined with data from an in-house database containing MHC class II peptide binding affinity data obtained from a high-throughput peptide-binding screening assay described earlier [30]. The peptide coverage in the data set varied from a maximal coverage of 7685 peptide binding measurements for the DRB1*0101 allele to a minimal coverage of only 30 peptide binding measurements for the DRB1*1404 allele (see table 1). The peptide data were split into 5 groups used for cross validation using the approach described by Nielsen et al. [3] minimizing the sequence overlap between the training and test data. Each data set and the corresponding partitions are available online at

Table 1 Quantitative HLA-DR peptide binding data

A large set of MHC class II ligands from the SYFPEITHI database [15] (November 2009) was used as external evaluation set. Only ligands with at least four digit HLA-DR resolution were used. All ligands included in the training data were excluded from the evaluation set. The SYFPEITHI evaluation data sets consist of 1164 MHC class II ligands, restricted to a total of 28 HLA-DR alleles (see table 2).

Table 2 MHC class II ligands from the SYFPEITHI database

A second evaluation set consisted of HLA-DR class II restricted T cell epitopes downloaded from the IEDB database on June 28th, 2010 [31]. Also here, only epitopes with four digit HLA-DR resolution were used. As above, all epitopes included in the quantitative training data were excluded. Further, epitopes shorter than 9 or longer than 24 amino acids were excluded, since shorter peptides do not fit the 9 amino acid core of the HLA-DR binding motif, and longer peptides most likely have not been experimentally characterized as minimal epitopes. This leaves us with a set of 1325 epitopes covering 42 HLA-DR alleles (see table 3).

Table 3 HLA-DR restriction T cell epitope from the IEDB database


The pan-specific NetMHCIIpan-2.0 method is a hybrid of the earlier published pan-specific methods for MHC class I and class II binding, NetMHCpan [20, 21] and NetMHCIIpan [29], and the NN-align method recently published for allele-specific MHC class II binding predictions [5]. The overall method architecture is similar to the NN-align method, and the manner in which the MHC polymorphism is incorporated is similar to that of the NetMHCpan and NetMHCIIpan methods.

The method was implemented as a conventional feed-forward artificial neural network. Like the NN-align method, the method consists of a two-step procedure that simultaneously estimates the optimal peptide binding register (core) and network weight configuration. Initially, all network weights were assigned random values. Given this set of network weights, the core of a given peptide was identified as the highest scoring of all nonamers contained within the peptide. The score of a nonamer peptide was calculated using the conventional feed-forward algorithm. The network weights were updated using gradient descent back-propagation. Given a peptide core alignment, the weights were updated to lower the sum of squared errors between the predicted binding score and the measured binding affinity target value. A peptide core was presented to the network as described for the NN-align method, including encoding of peptide flanking regions (PFR), PFR length and the peptide length. The MHC environment defining the peptide binding strength was implemented in terms of the MHC pseudo sequence constructed from 21 polymorphic amino acid positions in potential contact with the bound peptide as described by Nielsen et al. [29]. Two types of sequence encodings (sparse and blosum) were applied for the peptide-core and MHC pseudo sequences as described by Nielsen et al. [32]. For each peptide core, the input to the neural network thus consisted of the peptide core and MHC environment residues ((9+21) × 20 = 600 inputs), the PFRs (2 × 20 = 40 inputs), the peptide length (2 inputs), and the lengths of the C- and N-terminal PFRs (2 × 2 = 4 inputs), resulting in a total of 646 input values. The peptide binding affinity IC50 values were encoded to the neural network as log-transformed values, using the relation 1 - log(aff)/log(15,000), where aff is the measured binding affinity (IC50) in nM units [32]. Note that we here use 15,000 as the base for the logarithmic transformation. This is in contrast to the 50,000 used in previous works by our group, and is chosen due to a lower sensitivity for weak binding peptides of the high-throughput peptide-binding screening assay described by Justesen et al. [30].
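The affinity transformation and the input-layer bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and clipping the target to the interval [0, 1] is an assumption made here for values outside the 1 to 15,000 nM range.

```python
import math

MAX_AFF_NM = 15_000.0  # base of the log transformation stated in the text


def encode_affinity(ic50_nm: float, max_aff: float = MAX_AFF_NM) -> float:
    """Map an IC50 value (nM) to a network target via 1 - log(aff)/log(max_aff).

    Clipping to [0, 1] is an assumption of this sketch, not stated in the text.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    target = 1.0 - math.log(ic50_nm) / math.log(max_aff)
    return min(1.0, max(0.0, target))


def input_size(core_len: int = 9, pseudo_len: int = 21, alphabet: int = 20) -> int:
    """Count the network inputs as itemized in the text."""
    core_and_mhc = (core_len + pseudo_len) * alphabet  # peptide core + MHC pseudo sequence
    flanks = 2 * alphabet                              # N- and C-terminal PFRs
    peptide_length = 2                                 # peptide length encoding
    flank_lengths = 2 * 2                              # lengths of the two PFRs
    return core_and_mhc + flanks + peptide_length + flank_lengths


if __name__ == "__main__":
    print(round(encode_affinity(500.0), 3))  # target for a 500 nM binder
    print(input_size())                      # total number of input values
```

With this transformation, strong binders (low IC50) map to targets near 1 and peptides at or above 15,000 nM map to 0.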

The networks were trained using 5-fold cross-validation. Network ensembles were trained with 40 hidden neurons. The procedure of i) identifying the optimal peptide core, and ii) updating the network weights to lower the predictive error was repeated for 500 cycles. Since the "search landscape" has a large set of local minima each with close to identical performance values, the network training was repeated 10 times, each with different initial configuration values. This led to significantly improved prediction accuracy (data not shown). In total 20 (2 encoding schemes*10 seeds) networks were created for each training/test set configuration. The binding core of a given peptide was assigned by a majority vote of the networks in the ensemble.
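The core identification step and the ensemble majority vote can be illustrated as follows. This is a sketch under stated assumptions: each network is represented by a hypothetical scoring function that maps a nonamer to a number, the vote is taken over core start positions, and ties are broken in favor of the first candidate encountered. None of these details are specified in the text.

```python
from collections import Counter
from typing import Callable, Sequence

CORE_LEN = 9  # length of the binding core (nonamer)


def best_core_offset(peptide: str, score: Callable[[str], float]) -> int:
    """Return the start position of the highest scoring nonamer in the peptide."""
    n_cores = len(peptide) - CORE_LEN + 1
    if n_cores < 1:
        raise ValueError("peptide is shorter than the binding core")
    # max() keeps the first offset on ties (a tie-breaking assumption of this sketch)
    return max(range(n_cores), key=lambda i: score(peptide[i:i + CORE_LEN]))


def ensemble_core(peptide: str, scorers: Sequence[Callable[[str], float]]) -> str:
    """Assign the binding core by majority vote over all scorers in the ensemble."""
    votes = Counter(best_core_offset(peptide, s) for s in scorers)
    offset = votes.most_common(1)[0][0]
    return peptide[offset:offset + CORE_LEN]


if __name__ == "__main__":
    # Toy stand-ins for trained networks (hypothetical, for illustration only)
    toy_scorers = [
        lambda core: core.count("L"),
        lambda core: 1.0 if core[0] == "F" else 0.0,
        lambda core: core.count("G"),
    ]
    print(ensemble_core("GGGFLLLLLLLLGGG", toy_scorers))
```

In the actual training loop, the offset found this way would define the input for the subsequent weight update, and the two steps would alternate for the stated number of cycles.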

Leave-one-out (LOO) network training and benchmark

Leave-one-(allele)-out experiments were conducted to investigate the predictive performance of the method in situations where binding data for a given allele was excluded from the training. Two types of LOO experiments were conducted. In the first type, peptide-binding data for a given allele were excluded from the training of the prediction method, and upon training, the predictive performance was evaluated using the peptide binding affinity values for the HLA-DR molecule in question. This is the LOO approach applied in the NetMHCIIpan-1.0 method [29] and allows for a direct comparison of this method to the method proposed here when trained on similar data sets.

Since many of the peptides in the training data have been measured for binding affinities to multiple alleles, the above LOO experiment can lead to a significant overestimation of the performance for a given prediction method. To reduce this bias, a second type of LOO experiment was conducted where all data representing a given peptide was excluded from the training of the prediction method. That is, if a given peptide was measured against multiple alleles including the allele in question, all these measurements were excluded from the LOO training. To avoid reducing the size of the training data too much, this second type of LOO training was performed as a three-fold cross-validation for alleles characterized by more than 200 data points.
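The difference between the two LOO set-ups can be made concrete with a small sketch. It assumes the binding data are held as (allele, peptide, value) tuples, which is a hypothetical layout chosen for illustration, and it does not cover the three-fold cross-validation used for alleles with more than 200 data points.

```python
from typing import List, Tuple

Record = Tuple[str, str, float]  # (allele, peptide, encoded affinity); assumed layout


def loo_training_set(records: List[Record], held_out_allele: str,
                     exclude_shared_peptides: bool) -> List[Record]:
    """Build the training set for a leave-one-allele-out experiment.

    exclude_shared_peptides=False: first LOO type, drop only the allele's own data.
    exclude_shared_peptides=True: second LOO type, additionally drop every
    measurement of any peptide that was measured against the held-out allele.
    """
    held_out_peptides = {pep for allele, pep, _ in records if allele == held_out_allele}
    training = []
    for allele, pep, value in records:
        if allele == held_out_allele:
            continue
        if exclude_shared_peptides and pep in held_out_peptides:
            continue
        training.append((allele, pep, value))
    return training


if __name__ == "__main__":
    data = [
        ("DRB1*0101", "PEPTIDEAAAAAAAA", 0.8),
        ("DRB1*0401", "PEPTIDEAAAAAAAA", 0.6),  # same peptide, other allele
        ("DRB1*0401", "OTHERPEPTIDEKKK", 0.2),
    ]
    print(len(loo_training_set(data, "DRB1*0101", False)))
    print(len(loo_training_set(data, "DRB1*0101", True)))
```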

Performance measures

The predictive performance was measured in terms of the area under the ROC curve (AUC) value and Pearson's correlation coefficient (PCC). The receiver operating characteristic (ROC) curve is a graphical plot of the sensitivity versus the false positive rate (1 - specificity) as the discrimination threshold is varied. Throughout this work, a binding threshold value of 500 nM was used to classify the peptides as binders or non-binders. The area under the ROC curve (AUC) gives an indication of the accuracy of a prediction method. An AUC value of 1 corresponds to perfect predictions and a value of 0.5 reflects random predictions. Likewise, PCC is a measure of the accuracy of a prediction method. It is obtained by dividing the covariance of the two variables by the product of their standard deviations. For perfect predictions PCC is 1 (or -1 for perfectly anti-correlated values), and for random predictions PCC is 0.
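Both performance measures can be computed with a few lines of plain Python. The sketch below is illustrative only: it assumes that higher prediction scores mean stronger predicted binding, that a peptide counts as a binder when its measured IC50 is strictly below the 500 nM threshold, and it computes the AUC through the equivalent pairwise (rank-based) formulation rather than by tracing the ROC curve.

```python
import math
from typing import Sequence


def auc(measured_ic50: Sequence[float], predicted: Sequence[float],
        threshold_nm: float = 500.0) -> float:
    """AUC as the fraction of (binder, non-binder) pairs ranked correctly.

    Ties in the prediction score count as half a correctly ranked pair.
    """
    pos = [p for a, p in zip(measured_ic50, predicted) if a < threshold_nm]
    neg = [p for a, p in zip(measured_ic50, predicted) if a >= threshold_nm]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    ic50 = [10.0, 100.0, 1000.0, 20000.0]
    scores = [0.9, 0.6, 0.7, 0.1]
    print(auc(ic50, scores))  # 3 of the 4 binder/non-binder pairs are ranked correctly
```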

Nearest neighbor distance calculation

The distance between two MHC alleles was estimated as described by Nielsen et al. [29] using the relation $d = 1 - \frac{s(A,B)}{\sqrt{s(A,A) \cdot s(B,B)}}$, where s(A, B) is the BLOSUM50 similarity score between the pseudo sequences of allele A and B, respectively. Next, the nearest neighbor distance for an allele is defined as the minimal distance to any allele included in the training data set.
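The nearest neighbor distance can be sketched as below, assuming the normalized form d = 1 - s(A,B)/sqrt(s(A,A)·s(B,B)) for the distance. The substitution score is passed in as a function; the identity score used in the demo is a hypothetical stand-in and not the BLOSUM50 matrix, so the printed numbers are illustrative only.

```python
import math
from typing import Callable, Iterable

SubScore = Callable[[str, str], float]  # score for aligning two residues


def similarity(a: str, b: str, sub_score: SubScore) -> float:
    """Sum of per-position substitution scores for two equal-length pseudo sequences."""
    if len(a) != len(b):
        raise ValueError("pseudo sequences must have equal length")
    return sum(sub_score(x, y) for x, y in zip(a, b))


def distance(a: str, b: str, sub_score: SubScore) -> float:
    """Normalized distance d = 1 - s(A,B) / sqrt(s(A,A) * s(B,B))."""
    s_ab = similarity(a, b, sub_score)
    return 1.0 - s_ab / math.sqrt(similarity(a, a, sub_score) * similarity(b, b, sub_score))


def nearest_neighbor_distance(query: str, training: Iterable[str],
                              sub_score: SubScore) -> float:
    """Minimal distance from the query allele to any allele in the training set."""
    return min(distance(query, t, sub_score) for t in training)


def identity_score(x: str, y: str) -> float:
    """Toy substitution score (NOT BLOSUM50): 1 for identical residues, else 0."""
    return 1.0 if x == y else 0.0


if __name__ == "__main__":
    print(nearest_neighbor_distance("ACDE", ["ACDF", "WWWW"], identity_score))
```

To reproduce the measure used in the paper, `identity_score` would be replaced by a lookup in the BLOSUM50 matrix and the toy strings by the 21-residue pseudo sequences.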


Pan-specific versus allele-specific predictions

In contrast to allele-specific MHC class II prediction methods, the pan-specific method outlined here is proposed to benefit from information even from alleles covered by limited binding data. To demonstrate this, table 4 shows the performance values obtained by the new NetMHCIIpan-2.0 method and the older NN-align method using 5-fold cross-validation. The NN-align method was trained in an allele-specific manner as described by Nielsen et al. [5]. As a reference, the performance of the TEPITOPE method is also included in the benchmark study. The predictive performance for each HLA allele was measured in terms of the area under the ROC curve (AUC) value and Pearson's correlation coefficient (PCC).

Table 4 Five-fold cross-validation performance of the pan-specific NetMHCIIpan-2.0 method compared to the allele-specific NN-align and TEPITOPE methods on the quantitative benchmark data set

From the results in table 4, it is apparent that the NetMHCIIpan-2.0 method significantly outperforms both the NN-align and TEPITOPE methods (p < 0.01, binomial test). For 9 of the 9 alleles covered by fewer than 400 peptide-binding measurements, we find that NetMHCIIpan-2.0 outperforms NN-align. These results strongly indicate that NetMHCIIpan-2.0 is capable of benefiting from information from the multiple alleles included in the benchmark to boost the predictive performance and deliver accurate predictions also for alleles covered by limited binding data. Only for 3 out of the 24 alleles does the NN-align method perform better than NetMHCIIpan-2.0. These alleles are all covered by more than 1500 peptide-binding measurements. This confirms the results obtained earlier for MHC class I binding predictions, namely that pan-specific predictions are particularly beneficial when binding data are scarce or absent [28]. What is also clear from the data in table 4 is that the NetMHCIIpan-2.0 method is capable of maintaining its high performance also for alleles not characterized by the TEPITOPE method.

NetMHCIIpan-1.0 versus NetMHCIIpan-2.0

In the pan-specific training algorithm implemented in the NetMHCIIpan-2.0 method, alignment and binding affinity prediction are performed simultaneously. To further demonstrate that this approach does indeed outperform the NetMHCIIpan-1.0 method, where the two steps were decoupled, we performed a series of LOO experiments as described in Materials and methods. In these experiments, peptide-binding data for a given allele were excluded from the training of the prediction method, and upon training, the predictive performance was evaluated using the peptide binding affinity values for the HLA-DR molecule in question. This experiment thus simulated prediction of binding to hitherto uncharacterized HLA-DR molecules. The first LOO experiment was conducted based on the binding data used in the original NetMHCIIpan paper covering 14 HLA-DR alleles [29]. Details of this analysis are shown in table 5. The average PCC and AUC values for the 14 LOO experiments were 0.541 and 0.768 for the NetMHCIIpan-1.0 method, and 0.606 and 0.799 for the NetMHCIIpan-2.0 method. This difference is statistically highly significant (p < 0.01, binomial test). Only for one allele (DRB1*1302) did the NetMHCIIpan-1.0 method achieve a higher performance than NetMHCIIpan-2.0. These results thus demonstrate that the training algorithm implemented in the NetMHCIIpan-2.0 method leads to significantly improved prediction accuracy compared to the algorithm employed in the NetMHCIIpan-1.0 method.
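The binomial test used for such per-allele comparisons can be illustrated with a simple sign test. The sketch assumes that each allele counts as one win or loss for a method, that ties are excluded, and that wins and losses are equally likely under the null hypothesis; whether the reported p-values are one-sided or two-sided is not stated in the text, so both variants are provided.

```python
from math import comb


def binomial_tail(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def sign_test_p(wins: int, losses: int, two_sided: bool = True) -> float:
    """Sign test p-value for 'wins' versus 'losses' (ties must be removed beforehand)."""
    n = wins + losses
    if n == 0:
        raise ValueError("no informative comparisons")
    if two_sided:
        return min(1.0, 2.0 * binomial_tail(n, max(wins, losses)))
    return binomial_tail(n, wins)


if __name__ == "__main__":
    # 14 LOO alleles, one method better for 13 of them
    print(sign_test_p(13, 1))
    print(sign_test_p(13, 1, two_sided=False))
```

For 13 wins out of 14 comparisons, both variants fall below the 0.01 level quoted above.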

Table 5 LOO benchmark comparison of the pan-specific NetMHCIIpan-2.0 and the NetMHCIIpan-1.0 methods

Next, we investigated to what extent expanding the peptide data set with broader allelic coverage and more binding data would lead to an improved predictive performance. To do this, we conducted a second series of LOO experiments comparing the LOO predictive performance of the NetMHCIIpan-2.0 method when trained on the old data set covering 14 HLA-DR alleles (termed OLD) to its performance when trained on the extended data set covering 24 HLA-DR alleles (termed NEW). The peptide overlap in both data sets is high, and many peptides have been measured for binding affinities against multiple alleles. This peptide overlap can impose a strong bias on the benchmark evaluation [20], and to lower this bias all peptides used to characterize a given allele were excluded from the training of the prediction method in the second LOO benchmark (for details see Materials and methods). Both the OLD and NEW methods were evaluated using the peptide binding data in the new peptide data set. The results of the extended LOO calculation are shown in table 6.

Table 6 The extended LOO benchmark

The results presented in table 6 clearly demonstrate that the enrichment of novel peptide binding data with a broader allelic coverage leads to an improved predictive performance of the pan-specific prediction method. We can quantify to what degree an allele will benefit from other alleles being present in the training data by calculating its distance to the nearest neighbor in the training data (see Materials and methods). Earlier work has demonstrated that this distance measure correlates strongly with the performance of pan-specific prediction methods [20, 21]. For 10 of the 11 alleles, where including the novel 10 alleles has decreased the nearest neighbor distance, the NEW method has a higher AUC predictive performance compared to the OLD. For the remaining 13 alleles, the 10 novel alleles have not altered the nearest neighbor distance and the performance of the two methods is similar. This strongly underlines the essential prerequisite for accurate pan-specific prediction methods demonstrated earlier for MHC class I, namely a population of the close neighborhood of un-characterized MHC molecules [20, 28].

Lin benchmark

The Lin benchmark consists of binding affinities of 103 overlapping peptides to seven common HLA-DR molecules (DRB1*0101, 0301, 0401, 0701, 1101, 1301, and 1501). The results of this benchmark are shown in table 7.

Table 7 Predictive performance in terms of the AUC on the Lin benchmark data set

The results in table 7 clearly show that NetMHCIIpan-2.0 outperforms both the earlier NetMHCIIpan-1.0 method, as well as the other methods included in the benchmark.

Identifying MHC class II ligands and T cell epitopes

The performance of the NetMHCIIpan-2.0 method on the large set of SYFPEITHI ligand and IEDB T cell epitope data was next investigated. The benchmark was performed as described by Nielsen et al. [29]. The ligand source protein was split into overlapping peptides of the length of the ligand/epitope. All peptides except the annotated HLA ligand/epitope were taken as negatives. This is a very stringent assumption since suboptimal peptides sharing the ligand binding-core are counted as negatives even though they could be presented on the HLA molecule. Thus, this setup is likely to underestimate the predictive performance, but the effect should be equal for all methods compared in the benchmark. For each protein-HLA ligand/epitope pair, the predictive performance was estimated as the AUC value. Table 8 gives a summary of the performance of this benchmark calculation (details can be found in Additional file 1: Table S1).
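The benchmark set-up for a single protein-ligand pair can be sketched as follows. The scoring function stands in for a trained predictor and is hypothetical, as is the toy protein sequence; with a single positive, the AUC reduces to the fraction of negative peptides that score below the annotated ligand, with ties counted as one half.

```python
from typing import Callable, List


def overlapping_peptides(protein: str, length: int) -> List[str]:
    """All overlapping peptides of the given length from the source protein."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def ligand_auc(protein: str, ligand: str, score: Callable[[str], float]) -> float:
    """AUC for one protein-ligand pair: the ligand is the only positive,
    all other peptides of the same length from the protein are negatives."""
    peptides = overlapping_peptides(protein, len(ligand))
    if ligand not in peptides:
        raise ValueError("ligand not found in source protein")
    negatives = [score(p) for p in peptides if p != ligand]
    if not negatives:
        raise ValueError("no negative peptides available")
    positive = score(ligand)
    wins = sum(1.0 if positive > s else 0.5 if positive == s else 0.0 for s in negatives)
    return wins / len(negatives)


if __name__ == "__main__":
    toy_protein = "MKTAYIAKQRQISFVK"  # hypothetical sequence
    toy_ligand = "AYIAKQRQI"

    def toy_score(peptide: str) -> float:
        # Stand-in for a trained predictor, for illustration only
        return peptide.count("A") + peptide.count("I")

    print(ligand_auc(toy_protein, toy_ligand, toy_score))
```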

Table 8 Endogenous HLA-DR ligand benchmark

Both the SYFPEITHI ligand and IEDB epitope benchmarks show that the NetMHCIIpan-2.0 method performs better than the original NetMHCIIpan-1.0 method. On a per-ligand basis, the NetMHCIIpan-2.0 method significantly outperforms NetMHCIIpan-1.0 for both data sets (p < 0.01, binomial test excluding ties). In terms of the per-allele performance, the NetMHCIIpan-2.0 method also achieved a higher performance than the NetMHCIIpan-1.0 method. This difference is, however, not statistically significant (p > 0.1, binomial test, in both cases). For the alleles characterized by the TEPITOPE method, the TEPITOPE method achieves the highest performance of the three methods for both data sets. This difference, however, is not statistically significant (p > 0.5 in all cases, binomial test). For alleles not characterized by TEPITOPE, the NetMHCIIpan-2.0 method significantly outperforms NetMHCIIpan-1.0 for the IEDB data set (p < 0.01, binomial test), whereas the two methods achieve a similar predictive performance for this set of alleles when evaluated on the SYFPEITHI data set.

We next investigated how the predictive performance of the NetMHCIIpan-2.0 method depended on the length of the ligand/epitope under investigation. Figure 1 shows histograms of the average AUC values for the NetMHCIIpan-2.0 (named 2.0) and NetMHCIIpan-1.0 (named 1.0) methods as a function of the ligand/epitope length for the SYFPEITHI and IEDB data sets, respectively.

Figure 1 Histogram of the predictive performance measured in terms of the AUC value for the ligands/epitopes in the SYFPEITHI/IEDB data sets as a function of the peptide length. 2.0 refers to the pan-specific method developed here, and 1.0 refers to the NetMHCIIpan-1.0 method. SYF refers to the SYFPEITHI ligand data set, and IEDB refers to the IEDB T cell epitope data set.

Figure 1 clearly demonstrates that the NetMHCIIpan-2.0 method, for the majority of peptide lengths, outperforms the NetMHCIIpan-1.0 method. Only for very short peptides (length equal to 9 for the SYFPEITHI data set and length equal to 10 for the IEDB data set) does the NetMHCIIpan-1.0 method achieve the highest AUC value. What is also clear for the IEDB data set is that both methods achieve their highest predictive performance for peptides of length less than 15 amino acids. The average AUC for epitopes with a length less than 15 amino acids is 0.823. This value is significantly higher than the average AUC for epitopes with a length greater than 15 (0.704, p < 0.005, t-test). This difference is not observed for the SYFPEITHI ligand data set, strongly suggesting that the longer epitopes in the IEDB data set are not "true" epitopes in the sense of defining the minimal HLA restriction element.


Development of accurate prediction algorithms for MHC class II binding is complicated by the fact that the MHC class II molecule has an open binding cleft, and that peptide binders are accommodated in the binding cleft in a binding register that is unknown a priori. Training of methods for prediction of peptide-MHC class II binding hence relies on either a two-step procedure, where first the binding register is identified and next the aligned peptides are used to train the binding prediction algorithm, or a procedure where these two steps are integrated and performed simultaneously.

We have earlier shown that developing allele-specific prediction methods for MHC class II binding using the latter approach leads to higher prediction accuracy [3, 5]. We have further demonstrated for MHC class I that training the predictors in a pan-specific manner, incorporating all binding data across multiple MHC molecules simultaneously in the training, leads to a significant boost in the predictive performance, in particular for MHC molecules characterized by few or no binding data [20–22, 28].

Based on these findings, we have in this paper developed a pan-specific method for prediction of MHC class II binding affinities. The method was trained on binding data covering multiple MHC class II molecules simultaneously, and does not require any prior alignment or binding register identification. The method was evaluated in several large-scale benchmarks and shown consistently to outperform all other methods investigated, including state-of-the-art allele-specific (NN-align [5]) and pan-specific (NetMHCIIpan [29]) methods, as well as the well-known TEPITOPE method [1]. In particular, it was demonstrated that the proposed method, due to its pan-specific nature, could boost performance for alleles characterized by limited binding data, and in such cases significantly outperform allele-specific methods. The method thus demonstrates great potential for efficient boosting of the accuracy of MHC class II binding prediction, as accurate predictions can be achieved for novel alleles at a highly reduced experimental cost, and pan-specific binding predictions can be obtained for all alleles with known protein sequence by a method trained using data with limited allelic coverage.

When benchmarked on large data sets of known HLA-DR ligands and epitopes, the method was shown to have a predictive performance comparable to that of TEPITOPE for alleles covered by this method and, perhaps more importantly, to maintain this high performance also for alleles not described by the TEPITOPE method.

For MHC class I, we have earlier demonstrated that a pan-specific predictor can benefit from being trained on cross-loci (and cross-species) peptide binding data [20]. The development of a cross-loci model for HLA class II is complicated by the fact that the HLA-DRA molecule is close to monomorphic (only two allelic versions exist). This is in contrast to HLA-DP and HLA-DQ, where both the α and β chains are highly polymorphic. Moreover, the structures of the HLA molecules are less conserved across the three loci for class II compared to class I, and finally very limited peptide binding data have been generated characterizing the different HLA-DP and HLA-DQ molecules. As of September 2010, only five HLA-DP and six HLA-DQ alleles have been characterized in the IEDB database with more than 200 peptide-binding measurements [31]. Nonetheless, large amounts of peptide binding data for the HLA-DP and HLA-DQ loci will most likely become available in the near future, providing a broader allelic coverage, and future evaluations will demonstrate whether MHC class II binding prediction algorithms using training algorithms like the one outlined in this work will also benefit from pan-specific training across the different loci.

The method and benchmark data sets described in this work are available at (method) and (benchmark data).