Introduction

Hepatitis C virus (HCV) is estimated to infect 185 million people worldwide [1]. Chronic HCV infection leads to progressive liver disease, being one of the major causes of hepatocellular carcinoma and one of the most common indications for liver transplantation [2]. The World Health Organization (WHO) strongly recommends combination therapy with pegylated interferon and ribavirin for chronically infected patients who qualify for treatment [1]. Recently, two NS3 protease inhibitors (boceprevir and telaprevir) have been approved by the US Food and Drug Administration, and the WHO conditionally (until more evidence has accumulated) recommends that these drugs should be given in combination with pegylated interferon and ribavirin for the treatment of chronic HCV genotype 1 infections. Also, the WHO strongly recommends that sofosbuvir be given in combination with ribavirin alone in patients who cannot tolerate interferon and are chronically infected with genotypes 1, 2, 3 and 4 [1]. However, these therapies are still not affordable in most developing countries. As a result, the development of an effective HCV vaccine is undoubtedly the best solution for the ultimate control of HCV infections, and is a public health priority.

Prophylactic vaccines against viral infections are generally aimed at inducing a humoral (B-cell) immune response, while therapeutic vaccines preferably activate both humoral and cellular (T-cell) immune responses [3]. A successful HCV vaccine will need to stimulate both arms of the adaptive immune response: although both cellular and humoral immune responses occur in a naturally infected host, the current consensus is that a strong cellular response is vital for viral clearance and protection [4].

The development of an effective HCV vaccine requires an understanding of the host’s adaptive immune response to natural infection. As with other viral infections, viral antigens are presented to CD4+ and CD8+ T-cells via human leukocyte antigen (HLA) class II and class I molecules, respectively [5]. Different HLA alleles have been found to be associated with the outcome of HCV infection. For example, HLA-A*11, HLA-Cw*04 and HLA-B*53 have been associated with HCV persistence [6, 7], while HLA-B*27, HLA-A*11:01, HLA-B*57, HLA-Cw*01:02 and HLA-A*03 have been associated with spontaneous HCV clearance [8, 9]. Also, HLA-DRB1*11 and HLA-DQB1*03:01 have been associated with decreased disease severity of HCV infection globally, suggesting that they may present HCV-derived epitopes to CD4+ T-cells more efficiently than other alleles and may thus promote viral clearance [10]. During acute HCV infection, the development and persistence of strong specific responses by CD8+ and CD4+ T-cells [11, 12] and of neutralizing antibodies [13] are associated with viral clearance, whereas HCV-specific CD8+ and CD4+ T-cell responses are usually transient or absent in patients who develop persistent infections. The rate of chronic liver disease progression has been shown to be determined by the magnitude of HCV-specific CD4+ T-cell responses, since these cells are essential for both the cellular and humoral responses [14]. CD8+ T-cells are essential for long-term protection against chronic HCV [15], while CD4+ T-cells play a role in viral clearance [16].

HCV evades the host’s immune system by generating immune escape variants through alteration of its HLA-restricted epitopes, thereby avoiding recognition by T-cells and neutralizing antibodies [14]. Thus, effective HCV vaccines will need to target protective epitopes that display minimal cross-genotype amino acid variability, as these will provide broad potency [17]. Peptides corresponding to protective epitopes are desirable vaccine candidates because they are easy to construct and produce, and they do not contain infectious materials [18]. The first step in the process of epitope-based vaccine design and development is the in-silico prediction of peptide binding affinities to HLA proteins [19].

Genotype 5 accounts for over 50% of HCV infections in South Africa [20], and is becoming more prevalent in Europe and North America [21]. It is the most conserved of the HCV genotypes, being classified into only one subtype (5a) [22–24]. The growing prevalence of HCV genotype 5a in different parts of the world necessitates its molecular characterization in order to improve the formulations of vaccine candidates that are in development. Thus, the aim of this study was to assess immunological determinants by predicting conserved epitopes in near full-length HCV genotype 5a sequences using a suite of online programmes, to support the design of new vaccine candidates.

Results

Prediction of T-cell epitopes

For HLA class I, a total of 24 antigenic epitopes were predicted in the near full-length genotype 5a consensus sequence (Table 1). Epitope NS3 1325–1333 covered 30 of the 47 HLA class I alleles analysed, indicating predicted binding to a broad range of alleles. In the conservation analysis against other genotypes, 8 of the 24 epitopes were 100% conserved in specific genotypes. Epitopes NS3 1332–1340 and NS5B 2557–2565 were highly conserved in all genotypes analysed, and epitope E2 684–692 was conserved in all genotypes except genotype 2, while 5 other epitopes were conserved in either 2 or 3 of the genotypes analysed. In addition, epitope NS4B 1832–1840 was conserved at anchor residue positions 2 and 9 in the genotype 1a, 1b, 2 and 4 epitope variants, while E2 677–685, NS3 1325–1333 and NS3 1357–1365 were each conserved in the epitope variants of at least 3 genotypes (Table 1).

For HLA class II, 77 epitopes were predicted (Table 2). Epitopes NS4B 1879–1887 and NS4B 1880–1888 covered all 51 HLA class II alleles analysed. In the conservation analysis against other genotypes, 31 of 71 epitopes were conserved in specific genotypes. Some epitopes were highly conserved in all genotypes (E2 507–515, E2 509–517, NS3 1253–1261, NS3 1254–1262, NS3 1327–1335, NS3 1392–1400, NS4B 1916–1924 and NS4B 1919–1927), while other epitopes were conserved in 1 to 4 of the genotypes analysed. Epitope NS4B 1886–1894 was conserved at anchor residue positions 1, 4, 6 and 9 in the epitope variants of all 6 genotypes, while E2 692–700, NS2 964–972, NS3 1418–1426 and NS4 1561–1569 were each conserved in the epitope variants of at least 5 genotypes (Table 2).

Epitopes NS3 1585–1593, NS5A 2285–2293 and NS5B 2889–2897 were predicted to cover both HLA class I and HLA class II alleles. The highest number of HLA class I binding epitopes was predicted within NS3 (63%), followed by NS5B (21%); for HLA class II, the highest number of epitopes was predicted in NS3 (30%), followed by NS4B (23%) (Table 3).

Table 1 HLA class I predicted epitopes of HCV genotype 5a, with their antigenicity prediction scores, number of alleles covered and conservation (in percentages) in different genotypes
Table 2 HLA class II predicted epitopes of HCV genotype 5a, with their antigenicity prediction scores, number of alleles covered and conservation (in percentages) in different genotypes
Table 3 Distribution of genotype 5a HLA class I and II predicted epitopes in each of the HCV genes

Epitope binding affinity to common South African HLA class I and HLA class II alleles

Genotype 5 epitopes and their genotypic variants were analysed for their binding affinity to the HLA class I and HLA class II alleles most common in South Africa, where genotype 5a predominates. For HLA class I, the 11 most common South African alleles were analysed: HLA-A (HLA-A*01:01, HLA-A*02:01, HLA-A*30:01 and HLA-A*30:02), HLA-B (HLA-B*07:02, HLA-B*08:01 and HLA-B*35:01) and HLA-C (HLA-C*04:01, HLA-C*06:01, HLA-C*07:01 and HLA-C*07:02). A limitation of ProPred 1 is that it does not cover most of these alleles, namely HLA-A*01:01, HLA-A*30:01, HLA-A*30:02, HLA-B*08:01, HLA-C*04:01, HLA-C*06:01, HLA-C*07:01 and HLA-C*07:02. As a result, the IEDB epitope analysis tool with the ANN prediction method was used to predict epitopes for all 11 alleles. Thirteen antigenic epitopes with high predicted binding affinity (IC50 < 50 nM) were identified. Most epitopes bound with high affinity to a single HLA class I allele, with the exception of epitope NS5B 2889–2897 (LSAFSLHSY), which bound with high affinity to 2 HLA-A alleles (HLA-A*01:01 and HLA-A*30:02). Four of the epitopes bound to HLA-B*35:01 and 3 bound to HLA-A*02:01; however, none of the epitopes bound to HLA-C*04:01, HLA-C*06:01 or HLA-C*07:02. Epitope NS3 1359–1367 showed a degree of promiscuity across HLA-A and HLA-B alleles. NS3 1359–1367 (HPNIEEVAL) bound with intermediate affinity to HLA-B*07:02, while its genotype 2, 3 and 4 variant (HSNIEEVAL) bound with poor affinity and its genotype 6 variant (HPNITETAL) bound with high affinity. For HLA-B*35:01, HPNIEEVAL and the genotype 6 variant HPNITETAL bound with high affinity, while the genotype 2, 3 and 4 variant HSNIEEVAL bound with poor affinity. Three of the thirteen epitopes (E2 684–692, NS3 1032–1040 and NS3 1359–1367) were predicted by both ProPred 1 and the IEDB analysis tool (Table 4).

Table 4 Binding affinity scores of predicted epitopes and their variants to common HLA class I allele types prevalent in South Africa

For HLA class II, the 4 most common HLA-DRB1 alleles (HLA-DRB1*03:01, HLA-DRB1*04:01, HLA-DRB1*11:01 and HLA-DRB1*15:01) were analysed. Nineteen antigenic epitopes with high predicted binding affinity (IC50 < 50 nM) were identified. HLA-DRB1*11:01 had the highest number of binding epitopes (10), followed by HLA-DRB1*04:01 and HLA-DRB1*15:01 (7 each). Epitope NS4B 1919–1927 (LIAFASRGN) was predicted to be the most promiscuous, binding to HLA-DRB1*04:01, HLA-DRB1*11:01 and HLA-DRB1*15:01 with high affinity and to HLA-DRB1*03:01 with intermediate affinity. This epitope is conserved in all genotypes. Epitopes E2 507–515, NS4B 1774–1782, NS4B 1919–1927 and NS4B 1920–1928 were highly conserved in all genotypes (Table 5).

Table 5 Binding affinity scores of predicted epitopes and their variants to common HLA class II allele types prevalent in South Africa

Validation of epitopes

Seven of the predicted epitopes matched epitopes in the IEDB resource database that had previously been confirmed experimentally as true positives by other studies. The majority of the epitopes predicted in this study have not previously been tested experimentally. The ‘true epitopes’ are marked with (#) in Tables 1, 2 and 4.

Discussion

Published studies of HCV epitopes have focused mainly on genotype 1 [41, 42], and most do not take into account the diversity of the other genotypes that are common in developing countries, including most African countries. In the present study, antigenic epitopes were predicted in HCV genotype 5a proteins from South Africa, followed by conservation analysis against randomly selected genotype 1–6 reference sequences from GenBank. Several studies have confirmed the value of immunoinformatics tools as good predictors for selecting HLA ligands, T-cell epitopes and immunogenicity [43]. As a result, several immunoinformatics methods have been developed to assist in the identification of HLA-binding peptides [44, 45].

For this analysis, near full-length sequences covering all HCV proteins, with the exclusion of the 3′ end of NS5B, were included to maximize the number of epitopes predicted. Using the whole viral genome to develop epitope vaccines offers potential control over the immune response and may eliminate side effects [43]; it also increases the chance of detecting the virus at any developmental stage [46]. It has been shown that multiple epitopes from different parts of the HCV genome are important for producing a vaccine that can elicit strong humoral immune responses and multiple specific cellular immune responses [47]. A polyepitope-based strategy with multiple components, combining the core, E1 and E2 proteins with conserved T-cell epitopes in the NS proteins, has been suggested as a good vaccine candidate for HCV [48].

A higher number of epitopes was predicted for HLA class II than for class I. The findings of this study are consistent with a study by Shehzadi et al. that predicted epitopes in genotype 3 from Pakistan. That study showed that the majority of predicted epitopes were found in the NS3 protein for both HLA class I and HLA class II alleles and that most of the epitopes were conserved among different genotypes [49]. Although the NS3 region is one of the conserved regions in HCV, several studies have reported nucleotide and amino acid variability both within the same genotype and between genotypes [50, 51]. A recent study that analysed 1568 NS3-protease sequences from genotypes 1–6 reported that the protease amino acid sequence was moderately conserved and that the majority of the amino acids clustered in small regions. Of the 181 amino acids analysed, 47% showed <1% variability among all HCV genotypes, and 17.1% of amino acid positions showed >25.1% variability [51]. NS3 is considered a good cellular target candidate for a therapeutic vaccine [52], since the majority of the HCV epitopes recognized by CD8+ and CD4+ T-cells are located in the NS3 region [53–56]. NS3-specific CD4+ and CD8+ T-cell responses have been reported in patients who responded to interferon therapy [57] and in spontaneous clearance of HCV [58].

Most of the predicted epitopes in the study sequence were conserved across different HCV genotypes, with a higher number of epitopes conserved at the anchor residues. The anchor residues are important for high-affinity binding of epitopes to HLA [59]. Epitope conservation might influence immunogenic potential, since mutations within epitopes can increase the chance of immune escape [60]. For a vaccine to be effective globally, the selected epitopes must cover the HLAs of different populations and must also be conserved among different genotypes. The high mutation rates of viral epitopes and HLA polymorphism are among the challenges associated with the development of peptide vaccines [61]. Successful epitope vaccine design requires a broad knowledge of HCV genotype diversity, which will help in the proper selection of conserved HCV-specific T-cell epitopes and thus in avoiding HCV immune evasion [62]. This study attempted to ensure maximal coverage of HLA polymorphism and of different genotypes by analysing conserved epitopes across different HLA alleles.

The majority of the epitopes predicted from the proteins of South African HCV genotype 5a isolates were good binders of HLA alleles that are found worldwide. HLA is both polygenic and polymorphic, and the pool of HLA molecules differs for every individual. Different HLA alleles bind peptides with a particular sequence pattern [63]. For an HLA allele to be covered by a set of epitopes, at least one of the epitopes should be capable of inducing an immune response when bound to the corresponding HLA molecule [46]. The epitopes predicted in this study bind to many HLA alleles, including those common in South Africa, and can be used for designing good vaccine candidates that will eventually work in genetically diverse populations. In-vitro and in-silico studies have shown that HLA alleles preferentially bind to conserved regions of viral proteins in human viruses [64].

Very few of the predicted epitopes were found to be experimentally confirmed true positives; however, this may be because most previous studies focused on genotype 1. A limitation of the study was the lack of in-vivo and in-vitro experiments to confirm the predicted immunogenic epitopes, which will be the focus of future studies. Nevertheless, in-silico studies still provide a basis for designing good vaccine candidates.

In conclusion, this study identified antigenic T-cell epitopes derived from genotype 5a sequences that are conserved among genotypes, are good HLA binders and can be good candidates for vaccine development. The predicted epitopes will contribute to the future design of an efficient vaccine that uses conserved epitopes to circumvent variation between genotypes and, as such, is able to induce broad HCV-specific immune responses. Epitopes conserved among different genotypes will be tested experimentally in the future to determine their involvement in the immune response.

Methods

Ethical statement

The study was approved by the Medunsa Research and Ethics Committee (MREC) of the Faculty of Health Sciences at the University of Limpopo as project no. MREC/p/142/2009:PG. The MREC is registered as an Independent Review Board (reference no. IRB00005122).

Prediction of T-cell epitopes

Genotype 5a full-length sequences available in GenBank and 6 near full-length sequences generated in a previous study conducted by our group [24] were aligned, and consensus sequences were created using BioEdit [65] for the prediction of T-cell epitopes. For HLA class I, prediction of binding alleles was performed using ProPred 1 (http://www.imtech.res.in/raghava/propred1/) at the 4% default threshold, with the proteasome and immunoproteasome filters kept on at a 5% threshold. ProPred 1 predicts antigenic epitopes for 47 HLA class I alleles [44]. For HLA class II, prediction was performed using ProPred (http://www.imtech.res.in/raghava/propred/) at the 3% default threshold. ProPred predicts antigenic epitopes for 51 HLA class II alleles [45].
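The input preparation described above can be illustrated with a minimal Python sketch, assuming the sequences are already aligned. It builds a simple majority-rule consensus and enumerates the overlapping 9-residue windows that a predictor would score. The function names and the toy alignment are hypothetical, the consensus rule is an assumption rather than BioEdit's actual settings, and the ProPred scoring matrices are not reproduced.

```python
from collections import Counter

def majority_consensus(aligned_seqs, gap="-"):
    """Return the most frequent non-gap residue at each alignment column."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != gap]
        if residues:  # columns consisting only of gaps are dropped
            consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)

def nonamers(sequence, length=9):
    """Yield (start, peptide) for every overlapping window; start is 1-based."""
    for i in range(len(sequence) - length + 1):
        yield i + 1, sequence[i:i + length]

# Toy alignment for illustration only; not the study sequences.
toy_alignment = [
    "HPNIEEVALS",
    "HPNIEEVALS",
    "HSNIEEVALS",
]
consensus = majority_consensus(toy_alignment)
windows = list(nonamers(consensus))
```

The start positions here are relative to the toy sequence; mapping them to polyprotein numbering, as used for the epitope names in this study, would require the offset of the aligned region.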

Antigenicity of the epitopes

The antigenicity scores of all the predicted epitopes were analysed using the VaxiJen v2.0 online antigen prediction server (http://www.ddg-pharmfac.net/vaxijen/). Epitopes with an antigenic score >0.5 were selected as antigenic. The VaxiJen server performs with 87% accuracy at an antigenic score threshold of 0.5 for viruses. VaxiJen v2.0 allows antigen classification based on the physicochemical properties of proteins without recourse to sequence alignment.
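The selection rule above amounts to a strict threshold filter. The short Python sketch below shows it under the assumption that scores have already been retrieved from the server and stored in a dictionary; the peptide labels and scores are hypothetical placeholders, not VaxiJen output.

```python
ANTIGENICITY_THRESHOLD = 0.5  # cutoff stated in the text

def select_antigenic(scores, threshold=ANTIGENICITY_THRESHOLD):
    """Keep peptides whose score is strictly greater than the threshold."""
    return {pep: s for pep, s in scores.items() if s > threshold}

# Hypothetical labels and scores for illustration only.
example_scores = {"peptide_1": 0.62, "peptide_2": 0.50, "peptide_3": 0.91}
selected = select_antigenic(example_scores)
```

A peptide scoring exactly 0.5 is excluded, because the criterion is a score greater than 0.5.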

Epitope conservation analysis

All predicted epitopes were analysed for conservation using the IEDB conservancy tool (http://tools.immuneepitope.org/tools/conservancy/iedb_input) at a threshold of 100% conservation against 406, 221, 98, 33, 45 and 45 randomly selected sequences from HCV genotypes 1a, 1b, 2, 3, 4 and 6, respectively, downloaded from the public database. An epitope was considered conserved in another genotype if it showed 100% identity across the epitope in at least 70% of the sequences of that genotype. In addition, epitope variants that were conserved in at least 70% of the sequences were analysed for conservation of anchor residues at positions 2 and 9 for HLA class I and positions 1, 4, 6 and 9 for HLA class II.
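The two conservation criteria described above can be sketched in Python as follows. This is an illustration of the stated rules, not the IEDB implementation: the function names are hypothetical, and the sketch assumes that the peptide aligned to the epitope position has already been extracted from each genotype sequence. The peptide sequences are taken from the Results; the list of ten genotype peptides is a made-up sample.

```python
CLASS_I_ANCHORS = (2, 9)          # anchor positions stated in the text
CLASS_II_ANCHORS = (1, 4, 6, 9)   # anchor positions stated in the text

def fraction_identical(epitope, peptides):
    """Fraction of peptides that match the epitope at every position."""
    return sum(p == epitope for p in peptides) / len(peptides)

def is_conserved(epitope, peptides, min_fraction=0.70):
    """Apply the rule: 100% identity in at least 70% of the sequences."""
    return fraction_identical(epitope, peptides) >= min_fraction

def anchors_conserved(epitope, variant, anchors):
    """True if the variant matches the epitope at every 1-based anchor position."""
    return all(epitope[p - 1] == variant[p - 1] for p in anchors)

# Made-up sample: 7 of 10 peptides are identical to the epitope.
sample = ["HPNIEEVAL"] * 7 + ["HSNIEEVAL"] * 3
conserved = is_conserved("HPNIEEVAL", sample)

# Variants reported in the Results for NS3 1359-1367.
gt234_anchor_ok = anchors_conserved("HPNIEEVAL", "HSNIEEVAL", CLASS_I_ANCHORS)
gt6_anchor_ok = anchors_conserved("HPNIEEVAL", "HPNITETAL", CLASS_I_ANCHORS)
```

In this example the HSNIEEVAL variant differs from the genotype 5a epitope at anchor position 2, whereas the HPNITETAL variant differs only at non-anchor positions.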

Validation of predicted epitopes

All the predicted epitopes were submitted to the IEDB database (http://www.immuneepitope.org/) to determine whether they had previously been tested by other studies. The IEDB contains experimentally confirmed data on antibody and T-cell epitopes, HLA binding, HLA restriction and HLA class.

Common South African HLA alleles

To predict epitope binding to the HLA alleles common in South Africa, the IEDB epitope analysis tool (http://tools.immuneepitope.org/tools/conservancy/iedb_input) was used for HLA class I with the artificial neural network (ANN) algorithm [66] on the IEDB server, while for HLA class II, ProPred (http://www.imtech.res.in/raghava/propred/) was used at the 3% default threshold. ProPred predicts antigenic epitopes for 51 HLA class II alleles [45]. The most common South African alleles were identified from published literature [67].
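The affinity categories used in the Results (high, intermediate and poor) can be expressed as a simple classification of predicted IC50 values. In the Python sketch below, the 50 nM cutoff for high affinity is the one stated in this study; the 500 nM boundary between intermediate and poor affinity is an assumption based on commonly used guidance and is not stated in the text. The function names are hypothetical and the IC50 values are invented for illustration.

```python
HIGH_AFFINITY_NM = 50.0           # cutoff stated in the text
INTERMEDIATE_AFFINITY_NM = 500.0  # assumed cutoff, not given in the text

def classify_affinity(ic50_nm):
    """Map a predicted IC50 (nM) to an affinity category."""
    if ic50_nm < 0:
        raise ValueError("IC50 must be non-negative")
    if ic50_nm < HIGH_AFFINITY_NM:
        return "high"
    if ic50_nm < INTERMEDIATE_AFFINITY_NM:
        return "intermediate"
    return "poor"

def high_affinity_alleles(predictions):
    """Return, sorted, the alleles an epitope is predicted to bind with high affinity."""
    return sorted(a for a, ic50 in predictions.items()
                  if classify_affinity(ic50) == "high")

# Invented IC50 values for illustration only; not results from this study.
example = {"HLA-A*01:01": 12.0, "HLA-A*30:02": 35.5, "HLA-B*07:02": 820.0}
binders = high_affinity_alleles(example)
```

Counting the alleles returned by `high_affinity_alleles` gives a simple measure of epitope promiscuity across the selected allele panel.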