Background

Computational algorithms are increasingly utilised in biological modelling and offer the potential to reduce the time and expense of immunological assays. Computational algorithms were initially demonstrated as useful tools for predicting potential epitopes that might elicit quality T-cell responses [1, 2]. Computational algorithms that predict potential HLA binding T-cell epitopes can facilitate the design of vaccines capable of inducing T-cell immunity against HIV-1. The high variability of HIV-1 and the extensive genetic polymorphism of HLA molecules can be managed in silico, allowing immunogen optimisation to increase breadth and magnitude of T cell responses in respect of HLA allele frequencies and circulating virus strains in different populations. Bioinformatics approaches were previously applied as proof of concept for an HIV-1 peptide-based vaccine for the env and gag genes [3] in cynomolgus macaques for a broad spectrum of HIV-1 clades. Computational optimisation of immunogens facilitates the development of the multivalent and mosaic vaccines [4] necessary to control recombinant HIV-1 strains, an increasingly common occurrence in the epidemic in Uganda [5]. Computational approaches aim to identify optimal epitopes relevant to vaccine development and are not isolated to HIV-1 only, but a wide range of pathogens, including Ebola virus [6], therefore various statistical validation approaches have been applied for evaluation of these methods [7,8,9,10].

For HIV-1 vaccine design purposes an important consideration for the suitability of a computational algorithm is the breadth of discrete number of T-cell epitopes it generates that could reach particular levels of coverage [11] of circulating viruses. The higher the number of epitope variants the more the reduction in their requirements to attain optimum coverage levels for any epidemic. Previous data has shown that breadth of T-cell response is associated to viral set point in chronic HIV-1 infection [12,13,14,15,16,17]. In order to translate the computational epitope prediction into vaccine design, the number of discrete epitopes computationally generated from particular HIV-1 proteins is an important metric for further investigation [11].

A reliable pan-HLA-specific algorithm NetMHCpan4.0 [18,19,20] that has been improved by advances in HLA binding data, covers 172 MHC class I molecules from human (HLA-A, B, C, E), mouse (H-2), cattle (BoLA), primates (Patr, Mamu, Gogo) and swine (SLA) [20, 21], and can also predict binding to alleles devoid of experimental data basing on similarity to known binders and non-binders [22, 23]. This is an artificial neural network (ANN) algorithm for predictions of 8-14aa and capable of predicting epitopes for other HLA alleles using data for similar alleles by positional similarity of residues in their binding motifs. NetMHCpan4.0 is considered to be the tool of choice for such predictions considering the benchmarking done against other related tools [24]. Nevertheless to have a conclusive outcome of the computational performance we compared NetMHCpan4.0 to both an older and recent tool, NetCTL1.2 [25,26,27] and MHCflurry1.2.0 [28] respectively. The binding of CTL epitopes to MHC class I molecules is linear, anchoring at residues 2 and 9; hence the interface between ligand and CTL can be determined computationally [29]. Validation of such computational applications can be done by comparing their predictions with suitable experimental data. Despite the paucity of data validating the performance of computational methods relative to wet laboratory experiments, a few have documented them to achieve an area under the curve AUC of over 90% [18, 19, 30, 31] by isolated experimental data. We have not come across a wet experiment that evaluated computational predictors to achieve a robust AUC using a single set of wet laboratory experimental data. The previously reported 90% AUC is largely based on positional specific scoring algorithms (PSSM) for the collective isolated experiments alongside probability models used to establish affinity or binding scores. One study that explored the reliability of in-silico approaches in epitope prediction and its application for vaccine design reported a meagre 22, 44%, and relatively higher 78% match for three computational tools namely YFPEITHI, CTLPRED and IEDB respectively [32]. Using experimental epitope mapping data generated from 757 peptides tested on cells of 6 early HIV-1 infected individuals at paired time points, we show that NetMHCpan4.0 can be useful for markedly reducing pooled peptide experiments as demonstrated by the 95% experimental and computational concordance.

Methods

Experimental binder data

The data used was from an independent study that did not include this analysis in its objectives. Experimental data of peptides previously mapped for HIV-1 epitope recognition of 6 individuals for a separate study (Table 1) at 2 time points each was used for comparison with the computationally predicted binders. These were from a Ugandan early HIV-1 serodiscordant couple cohort approved by the Uganda Virus Research Institute (UVRI), Research and Ethics review board and the Uganda National Council of Science and Technology (UNCST). All participants provided informed consent. Six (6) participants whose experimental epitope recognition profile we evaluated were early HIV-1 infections (Table 1), enrolled under the following criteria: (i) detection of HIV-1 P24 antigen with a simultaneous negative HIV-1 antibody ELISA (2 participants) or documented HIV-1 sero-negative test in the previous 12 months (4 participants); (ii) HAART naïve (all). Early infection was determined following the Fiebig Staging criteria [33] as described elsewhere by Obuku A.E. et.al [34]..

Table 1 Participant characteristics, HIV-1 infecting clade, Fiebig stage and HLA class I haplotypes

The experimentally tested peptides totalled 757 (Fig. 1), were 17aa long, overlapping by 11aa and spanning the HIV-1 proteome consensus for subtypes A1 and D. Cultured ELISPOT assays using 200,000 cells/well as previously documented by Obuku AE. et.al [34]. and ex-vivo IFN-γ ELISPOT assay using 100,000 cells/well were used for testing peptide pools and epitope mapping respectively. Experimental positive pools were 3 times the background wells and at least 600 spot forming units per million cells. “Deconvolute This” software [35] was used to identify possible responding individual peptides from the pools or where it was not possible all the peptides in a pool were tested as single peptides.

Fig. 1
figure 1

ELISPOT peptide consort; the experimental peptide mapping data was generated by culture ELISPOT of multiple peptide pools tested in duplicate wells per time point, followed by ex-vivo ELISPOT of potential candidate epitopes. To experimentally map a single time point required at least 541 assay wells

HLA typing

High resolution reference strand conformation analysis HLA class I tissue typing for the early infected subjects was done using methods described elsewhere [36].

HIV-1 subtyping

HIV-1 subtyping determination was performed on the gag gene [37, 38] using Sanger method generated sequences. The sequences were input into the REGA HIV-1 automated subtyping tool to determine the HIV-1 clade [39, 40].

Computation epitope prediction

HIV-1 subtypes A1 and D consensus sequences were used as inputs for the computational epitope prediction. These peptide sequences were all for the year 2004 downloaded from the Los Alamos database (hiv.lanl.gov/content/sequence/NEWALIGN/align.html). The web version of NetMHCpan4.0 [19] (http://www.cbs.dtu.dk/services/NetMHCpan/) was configured to predict 9mer through 14mer epitopes for 22 HLA class I alleles (Table 1) that were expressed by the 6 HIV-1 infected donors. Linux version MHCflurry1.2.0 [28] was used to predict 9mer epitopes and an earlier tool NetCTL1.2 was also used to predict 9mer epitopes for the 22 HLA class I alleles expressed by the 6 study individuals. Perl version 5.26.2 was used to extract the binders from all the NetMHCpan4.0 predictions and also to compare the computational binders to the 93 mapped experimental 17aa peptides for 9mer through 14mer hits using a sliding window. An experimental peptide was considered a hit if any of the computational 9mer through 14mer sequence was contained in the 17 amino acid experimental peptide sequence as well as any of the HLA-A, B or C expressed by the individual matched the NetMHCpan4.0 HLA class I type(s). If multiple computational epitope predictions were contained in a single 17mer experimental peptide they were counted as a single hit. These were determined by a BLAST search of the computational binders against the derivative experimental peptides to determine computational predictions from the same test peptide. The accession numbers of the sequences used to determine the HIV-1 subtypes for 5 of the 6 study subjects are; KT825896, KT825897, KT825898, KT825899, KT825900, KT825901, KT825902, KT825903, KT825904, KT825905, KT825906, KT825907, KT825908, KT825909, KT825910, KT825911 and KT82512.

Data analysis

Statistics computations and plots were generated using SPSS version 24.0.0.0. The NetMHCpan4.0 computational performance was evaluated using a confusion matrix to classify true positives, true negatives, false positives and false negatives that were used for the Receiver Operator Characteristic (ROC) plot. The hit rate (sensitivity) and false hit rate (specificity) of binder predictions as determined by the NetMHCpan4.0 threshold of peptides within the top 2% (with a score of 2 or less) were calculated and the strength of the model was determined by calculating the area under the curve, AUC of the ROC plot [41,42,43]. Pearson’s correlation coefficient was used to evaluate the relationship between the number of epitopes with various HIV-1 genes. To evaluate if there were any differences in the early versus late time point peptides for the binding ranking of the experimentally mapped peptides as predicted by the computational score the Wilcoxon signed rank test was used. To evaluate if HIV-1 subtypes A1 and D affected the number of computational predictions generated, Fisher Exact Test was used. To determine whether multiple computationally predicted epitope sequences were derived from the same experimental peptide sequence, a local blast database was set up using Geneious version 9.0.5. Both HIV-1 clades A1 and D experimental consensus sequences were used separately each as a reference sequence for the blast. The computational peptide sequences were then aligned against the consensuses to evaluate those derived from a single 17 amino acid experimental peptide sequence. Where an experimental peptide was predicted by multiple or overlapping computational peptides, the average NetMHCpan4.0 score was assigned as the computational score for this peptide. This score was also used during the generation of the ROC curve and the confusion matrix. To compare the association between ELISPOT spot forming units and NetMHCpan4.0 scores or MHCflurry1.2.0 affinities and also the association between the values for the 2 computational tools, Pearson’s correlation coefficient was used.

Results

Number of experimental assays compared to computationally guided prediction assay projections

To experimentally determine epitopes for 757 peptides spanning the whole HIV-1 proteome for clades A and D as well as both time points of the 6 individuals required a total of 4230 test assay wells. For each test subject these included 9 antigen proliferation wells, 384 culture ELISPOT wells and an average of 164 epitope mapping ELISPOT wells (Range; 148–186 test wells). Using the 22 HLA alleles represented in the study subjects we were able to computationally predict 95% of the experimentally mapped epitopes. This approach could have reduced the test assays by eliminating all the T-cell antigen proliferation and culture ELISPOT steps totalling to 3258 assay wells (77%) and leaving only 972 (23%) epitope mapping assays required. Applying a pooling strategy to the computational predictions similar to that used in the experimental pooling where each pool contained approximately 20 peptides with a coverage of 3 per peptide pool, the 923 potential peptides (95% of experimental peptides for epitope mapping ELISPOT derived from the 972 (23%) eligible epitope mapping peptides) would make at most 46 pools. Consequently the computational prediction approach could have reduced the experimental assays by at least 80%.

Magnitude of epitope predictions are variable across HLA alleles, HIV-1 proteins and clades

The input HIV-1 subtypes A1 and D consensus whole proteome sequences evaluated for potential 9, 10, 11, 12, 13 and 14-mer binders to the 22 HLA alleles represented in the six patients, varied in the distribution of predicted binders across HIV-1 genes and HLA alleles. All the peptide hits predicted for 10 through 14-mer were also all predicted in the 9-mer set except for two 14-mer peptides. An expected positive correlation for HIV-1 protein length with number of epitopes predicted was observed as illustrated by Spearman’s rank order correlation; rs = 0.88 (Fig. 2, a and b). NetMHCpan4.0 predicted 95% (88/93) (Table 2) of the experimentally mapped peptides as binders and missed 5% (5 out of 93) (Table 3) for the 12-time points of the 6 participants. MHCflurry predicted 91% (85/93) of the experimental peptides and had a lot of similarity to NetMHCpan4.0 for the predicted HLA. NetCTL was the least performing tool with only 15% (14/93) predicted experimental peptides (Table 2).

Fig. 2
figure 2

NetMHCpan Binder Predictions. a Using our experimental peptide sequences as inputs into NetMHCpan4.0 to predict epitopes for 22 HLA types represented in the 6 HIV-1 Infected people, a heatmap showing absolute counts of computationally predicted 9-mer binders against HIV-1 genes was constructed. The dendrogram shows the nearest similarity for the number of predicted counts across HLA types; b the length of the HIV-1 protein sequence plotted against the absolute number of NetMHCpan4.0 predicted 9mer binders showing a positive correlation (Spearman’s correlation coefficient, rs = 0.88). The number of distinct predictions is dependent on the length of the HIV-1 sequence; c comparison of HIV-1 clade A and D absolute number of NetMHCpan4.0 predicted 9mer binders per HIV-1 gene for the wet experiment test peptide sequences. The algorithm predicted more binders for clade D than clade A

Table 2 Experimentally Mapped Peptides and Computationally Predicted Epitopes
Table 3 Peptides not predicted

Comparison of the various epitope prediction length set showed that the 9mer setting was ideal for NetMHCpan4.0. The number of predictions were 88, 79, 55, 39, 39 and 37 hits out of 93 for 9, 10, 11, 12, 13 and 14-mer epitopes respectively. Increasing the prediction length from 9mer through 14mer resulted in a smaller number of predicted binders as illustrated in Fig. 3. Since we held the assumption that our wet experimental data was the gold standard we evaluated the sensitivity and specificity of NetMHCpan4.0.The computational predictor had more predicted binders than those determined by the experimental mapping as presented in the confusion matrix in Table 4. The experimental positive’s count also shown in Table 2 under column “Hit No” shows the test peptide count (1through 88) that contained the computational 9-mer sequence. Multiple computational epitopes may be contained in a single experimental peptide, as shown in the column “NetMHCpan4.0 9-mer Epitope Prediction” in Table 2. Overall HIV-1 Clade A 9-mer predictions were fewer in number than clade D (Fig. 2, c) though the difference did not approach statistical significance.

Fig. 3
figure 3

Computational epitope prediction. NetMHCpan4.0 set length plotted against the number of predicted binders per HLA type shows that the number of predictions reduces as the input set length increases. The dotted line is the trend line, whereas the solid line is the line of best fit. The core 9mer epitope sequence was similar across 9mer through 14mer set length except for one 14-mer peptide (hit 72 in Table 2)

Table 4 Experimental and computational 9mer peptide confusion matrix

Comparison of experimentally mapped epitopes with in-silico prediction

The experimental peptide mapping data was derived from a baseline time point corresponding to HIV-1 Fiebig stages IV, V and VI (Table 1) and a later time point. Ninety-three (n = 93) epitopes were experimentally mapped of which 12 were recognized at both baseline and later time points, 34 only at baseline and 54 only at the later time point. Comparison of the ranked computational score for Netmhcpan4.0 binders of early (n = 34) versus later peptides showed that the later time point predictions were stronger binders reaching statistical significance (Wilcoxon signed rank p-value = 0.0000005) (Fig. 4). NetMHCpan4.0 ranked binders as those predicted to be in the top 2% and assigned a score of 0.2 or below. Any binder within the top 0.5% and assigned a score of 0.05 or below was ranked as a strong binder. Considering only the 9-mer computational predictions, peptides that were derived from the same 17-mer experimental peptide were determined by a BLAST mapping to their derivative sequences. The 17-mer peptides were then classified into a confusion matrix (Table 4) as true positives, false positives, true negatives or false negatives. From the classification the true positive rate (sensitivity) was plotted against the false positive rate (1-specificity) using an ROC curve and the AUC attained reached 0.928 (Fig. 5). Only 9-mer length epitopes were considered in the ROC analysis as increasing the length to 10-mer through 14mer NetMHCpan4.0 predictions neither raised the number of predicted binders nor improved the hit rate as all their predictions contained the sequence already predicted in the 9-mer set except 1, 14-mer peptide (hit 72 in Table 2). Comparison of the ELISPOT magnitude of response (spot forming units) did not show any association to either NetMHCpan4.0 scores or MHCflurry1.2.0 affinity values. Similarly a comparison of the latter 2 computational predictors did not show any association between their assigned “affinity” values. NetMHCpan4.0 registered the highest concordance to the wet experiments followed by MHCflurry1.2.0.

Fig. 4
figure 4

Early versus Late Peptides. Experimentally mapped peptides at baseline (n = 34) and at least 12 months later (n = 34) were compared using the 9-mer computational NetMHCpan4.0 scores of the hits. The lower the computational score the stronger the predicted binding. Late peptides were significantly stronger binders than early peptides (Wilcoxon signed rank test, p = 0.0000005)

Fig. 5
figure 5

ROC plot. False versus true positive rate for all 9-mer and a single 14-mer test peptides across the 22 test HLA class I types. The diagonal line shows the random guess whereas the red curve shows the observed experimentally mapped epitopes versus the NetMHCpan4.0 expected predictions

Discussion

In this analysis we showed that the computational method NetMHCpan4.0 predicted 95% of previously experimentally mapped HIV-1 epitopes in 6 HIV-1 infected individuals expressing a total of 22 different HLA class I alleles. In our IFN-γ ELISPOT assays we evaluated 757 17mer peptides overlapping by 11 amino acids and covering the whole HIV-1 subtype A1 and D consensus proteomes. Out of the 5 experimentally determined epitopes missed by the algorithm (Table 3), 4 were actually computationally predicted as binders but were not included for lack of concordance with the participant’s HLA alleles. About one third (37) of 125 total positive predictions were not experimentally supported in our tests. These do not necessarily represent false positives, as ELISPOT detection depends on the frequency of specific T cells in the participant’s repertoire, and we observed changes in dominant T cell specificities within a given participant between early and later time points after HIV-1 infection. A formal ROC evaluation of the score generated by NetMHCpan4.0 as a classifier for peptides recognised/not recognised by PBMC in IFN-g ELISPOT assays, produced an AUC of 0.928. Thus experimental confirmatory tests cannot be dropped altogether, however the NetMHCpan4.0 algorithm could provide a considerable saving of time and resources in verifying just the predicted epitopes.

As the participants had been enrolled in the acute/early phase of HIV-1 infection and we had observed intra-participant changes in epitope recognition between early and late time points after infection, we compared the binding scores of confirmed epitopes at these time points and found a statistically significant change towards recognition of higher binding peptides as the infection entered the chronic phase. This might represent better support of the T-cell response directed at more stable HLA/peptide complexes as the infection progresses into chronicity.

The NetMHCpan4.0 algorithm, which is based on binding affinity and integrates data on eluted naturally processed ligands, reflected optimal HLA class I binding for 9-mers, producing a decreasing number of predictions when the peptide size was increased from 9 to 11 amino acids. With a single exception, predicted binders between 11 and 14 amino acids included at least one 9mer predicted to bind on its own, suggesting a destabilizing effect of the extra amino acids beyond the canonical HLA class I binding pockets at positions 2 and 9 could account for fewer predictions.

Important limitations are the lack of predictions of HLA class II restricted epitopes, which might have contributed to a fraction of IFN-γ ELISPOT responses. Approximately 5% of the computational predictions may be false positives that only increase the size of planned wet experiments and approximately 1% of true positives may also be missed.

Conclusion

In this analysis, using NetMHCpan4.0, MHCflurry and NetCTL to predict previously experimentally mapped epitopes, we demonstrate that the computational methods reliably predict an acceptable portion of binder epitopes. We recommend the use of such computational methods to reduce the size of experiments required cost associated.