Introduction

Campylobacter jejuni is a helical, non-spore-forming, microaerophilic, gram-negative bacterium and a major cause of campylobacteriosis worldwide (Friedman et al. 2000). Campylobacter spp. were regarded as zoonotic pathogens until C. jejuni was isolated from human feces in 1968 (Dekeyser et al. 1972). Since its discovery in the 1970s, C. jejuni has remained the most frequent cause of infectious diarrhea, affecting over 450 million people worldwide every year and contributing to a large economic burden (Friedman et al. 2000). The first C. jejuni genome (strain NCTC 11168) was sequenced in 2000, with 94.3 % of the genome coding for proteins (Parkhill et al. 2000). The pathogenesis mechanisms of C. jejuni are poorly understood because its virulence appears to be multifactorial, involving chemotaxis, motility, toxins, flagella, invasion and adherence, and surface polysaccharide structures (Ketley 1997). Antibiotic therapy traditionally involves erythromycin and ciprofloxacin, but many reports have documented resistance of C. jejuni to antibiotics including tetracycline, kanamycin, chloramphenicol, erythromycin, and ciprofloxacin (Alfredson and Korolik 2007, Thakur et al. 2010). Irrational use of antibiotics has escalated this resistance, posing a challenge to current treatment regimens. Thus, there is a pressing need to develop alternative treatments.

Vaccination has proven to be a cost-effective, safe, and efficient means of combating infectious diseases such as meningococcal disease, diphtheria, tetanus, poliomyelitis, pertussis, measles, mumps, and rubella in human health care (Moriel et al. 2008). The traditional approach to subunit vaccine development has drawbacks: it is time and labor intensive, and it fails where the microorganism cannot be cultured or obtained in sufficient amounts (Rinaudo et al. 2009). To limit rising antibiotic resistance and the increasing number of human infections, developing vaccines against C. jejuni is both indispensable and attractive. Some mutants of C. jejuni with defects in pili or invasion biosynthesis are being evaluated for their protective efficacy in animal models. Flagellin and adhesin proteins have been suggested as potential subunit vaccine candidates; for example, a recombinant truncated flagellin protein (rFla-MBP) conferred 60 % protection in a ferret model of diarrhea. Several killed whole-cell (WC) and heat-labile toxin (LT) adjuvanted vaccines are under development (O’Ryan et al. 2015, Albert 2014). In one such example, killed Campylobacter whole-cell (CWC) organisms adjuvanted with the heat-labile enterotoxin (LT) of Escherichia coli protected against intestinal colonization in mice and rabbits. However, there are currently no approved vaccines against Campylobacter-associated illness. The sequencing of many Campylobacter genomes, together with the development of omics techniques and advanced bioinformatics approaches, has significantly improved candidate epitope identification by reducing the arduous task of screening peptides for immunobiological properties. The present study employed a range of computational approaches to investigate the entire proteome of C. jejuni for identification of B- and T-cell epitopes as potential vaccine candidates. This study has important implications for the selection of vaccine candidates, a critical step in vaccine development.

Methods

Retrieving non-homologous proteins from pathogen whole proteome

As described in the workflow diagram (Fig. 1), the complete proteome of C. jejuni O:2 (strain NCTC 11168), comprising 1623 proteins, was retrieved from UniProt (Proteome ID UP000000799). Pathogen proteins non-homologous to the host were retrieved using a two-step filtration procedure. In the first step, sequences shorter than 100 amino acids (aa) were removed; because the average protein length in bacteria is 267 aa (Brocchieri and Karlin 2005), such short sequences were considered unlikely to represent functional proteins. In the second step, the remaining sequences were screened for homology with the host (Homo sapiens) proteome using BLASTp at an E-value cutoff of 0.05. Proteins that showed no hits below the E-value inclusion threshold were selected as non-homologous pathogen proteins.
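
The two-step filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: it assumes the proteome is available as FASTA text and that BLASTp was run with tabular output (`-outfmt 6`), where column 1 is the query ID and column 11 is the E-value. The function names and the toy records are hypothetical.

```python
def parse_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def filter_by_length(records, min_length=100):
    """Step 1: drop sequences shorter than min_length residues."""
    return {sid: seq for sid, seq in records.items() if len(seq) >= min_length}


def queries_with_hits(blast_tabular, evalue_cutoff=0.05):
    """Collect query IDs that have at least one hit at or below the cutoff.

    Assumes BLAST tabular output (-outfmt 6): query ID in column 1,
    E-value in column 11.
    """
    hits = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= evalue_cutoff:
            hits.add(fields[0])
    return hits


def non_homologous(records, blast_tabular, min_length=100, evalue_cutoff=0.05):
    """Step 2: keep long-enough proteins with no significant host hit."""
    long_enough = filter_by_length(records, min_length)
    homologous = queries_with_hits(blast_tabular, evalue_cutoff)
    return {sid: seq for sid, seq in long_enough.items() if sid not in homologous}


# Toy data (hypothetical IDs and sequences, for illustration only).
toy_fasta = (
    ">protA\n" + "M" * 120 + "\n"
    ">protB\n" + "M" * 50 + "\n"
    ">protC\n" + "K" * 150 + "\n"
)
toy_blast = (
    "protC\thumanX\t45.0\t140\t70\t2\t1\t140\t5\t144\t1e-20\t95.1\n"
    "protA\thumanY\t25.0\t40\t30\t0\t1\t40\t1\t40\t3.2\t20.0\n"
)
selected = non_homologous(parse_fasta(toy_fasta), toy_blast)
print(sorted(selected))
```

In the toy data, protB is removed by the length filter and protC by its significant human hit, so only protA survives both steps.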

Fig. 1 Schematic representation of the protocols used for epitope identification

Antigenicity and transmembrane prediction

To predict antigenic sequences, the non-homologous pathogen proteins were submitted to the VaxiJen server (Doytchinova and Flower 2007) with a threshold value of 0.7. VaxiJen is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Sequences with an antigenicity value above the threshold were submitted to PSORTb version 3.0 to retrieve outer membrane localized proteins. PSORTb uses a Bayesian network model to calculate a localization score for each of five sites (cytoplasmic, inner membrane, periplasmic, outer membrane, and extracellular), with a default score cutoff of 7.5 (Yu et al. 2010).
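
To illustrate how an ACC transformation turns sequences of different lengths into vectors of equal length, the sketch below computes, for each pair of descriptors j and k and each lag l, the average over positions i of z(j, i) × z(k, i + l). It is a simplified illustration and not VaxiJen's implementation: the descriptor table holds made-up placeholder values rather than published amino acid scales, and the maximum lag is an arbitrary assumption.

```python
def acc_transform(sequence, descriptors, max_lag):
    """Auto cross covariance transform of a protein sequence.

    descriptors maps each residue to a tuple of numeric properties.
    The result has n_desc * n_desc * max_lag elements regardless of
    sequence length, so proteins of any length become comparable vectors.
    """
    z = [descriptors[res] for res in sequence]
    n = len(z)
    n_desc = len(z[0])
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    vector = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


# Placeholder descriptors (hypothetical values, NOT real amino acid scales).
TOY_DESCRIPTORS = {"A": (1.0, 0.0), "K": (-1.0, 2.0), "L": (0.5, -1.0)}

v_short = acc_transform("AKLA", TOY_DESCRIPTORS, max_lag=2)
v_long = acc_transform("AKLAKLLAK", TOY_DESCRIPTORS, max_lag=2)
print(len(v_short), len(v_long))
```

Both toy sequences yield vectors of the same length, which is what allows a single classifier threshold to be applied across a whole proteome.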

T-cell epitope prediction

The NetCTL 1.2 server was used to predict cytotoxic T lymphocyte (CTL) epitopes from the antigenic sequences localized in the outer membrane, at a threshold value of 0.75 to maintain high sensitivity and specificity, and the prediction was restricted to 12 major histocompatibility complex class I (MHC-I) supertypes. NetCTL is an artificial neural network (ANN) and weight matrix-based tool that combines predictions of peptide MHC-I binding, proteasomal C-terminal cleavage, and TAP transport efficiency (Larsen et al. 2007). The CTL epitopes generated by NetCTL were assessed for allergenicity using the AllerHunter program, which is based on a support vector machine (SVM) and pair-wise sequence similarity (Muh et al. 2009). A threshold value of 0.06 was specified for prediction of cross-reactive allergens.
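
The idea of combining three component predictions into one score and applying a threshold can be sketched as below. This is an illustrative assumption about the form of the combination (a weighted sum), not NetCTL's code; the weights of 0.15 for cleavage and 0.05 for TAP transport are assumed here to correspond to the server defaults, and the peptide labels and component scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    mhc_binding: float   # MHC-I binding score
    cleavage: float      # proteasomal C-terminal cleavage score
    tap: float           # TAP transport efficiency score


def combined_score(c, w_cleavage=0.15, w_tap=0.05):
    """Weighted sum of the three components (weights are assumptions)."""
    return c.mhc_binding + w_cleavage * c.cleavage + w_tap * c.tap


def select_epitopes(candidates, threshold=0.75):
    """Keep (peptide, score) pairs whose combined score exceeds the threshold."""
    scored = [(c.peptide, combined_score(c)) for c in candidates]
    return [(pep, score) for pep, score in scored if score > threshold]


# Hypothetical candidates with made-up component scores.
toy_candidates = [
    Candidate("PEPTIDE_A", mhc_binding=1.0, cleavage=0.9, tap=2.0),
    Candidate("PEPTIDE_B", mhc_binding=0.5, cleavage=0.5, tap=1.0),
]
selected_epitopes = select_epitopes(toy_candidates)
print(selected_epitopes)
```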

An Immune Epitope Database (IEDB) tool that combines predictors of proteasomal processing, TAP transport, and MHC binding was used to predict antigen processing through MHC-I (Tenzer et al. 2005). IEDB is a comprehensive database of experimentally characterized B- and T-cell epitopes. The stabilized matrix method (SMM), which can model the sequence specificity of quantifiable biological processes (Peters and Sette 2005), was employed to compute half-maximal inhibitory concentration (IC50) values for peptide binding to MHC-I molecules. In conjunction with the IEDB tool, MHCPred, which uses a partial least squares-based multivariate statistical approach (Guan et al. 2003), was used to predict both MHC-I and MHC-II binders among the predicted peptides. Alleles with a predicted binding affinity (IC50) below 500 nM from either server were considered efficient peptide binders.
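
A minimal sketch of the IC50 filtering step is shown below, assuming the predictions of each server have already been parsed into a mapping from allele name to IC50 in nM. The function name, the data layout, and the IC50 values are hypothetical; only the 500 nM cutoff comes from the text.

```python
def merge_binders(predictions_by_server, ic50_cutoff=500.0):
    """Collect alleles predicted to bind below the cutoff by any server.

    predictions_by_server: {server_name: {allele: ic50_nM}}
    Returns {allele: {server_name: ic50_nM}} containing only the
    predictions that pass the cutoff.
    """
    merged = {}
    for server, predictions in predictions_by_server.items():
        for allele, ic50 in predictions.items():
            if ic50 < ic50_cutoff:
                merged.setdefault(allele, {})[server] = ic50
    return merged


# Hypothetical IC50 values (nM) for a single peptide.
toy_predictions = {
    "IEDB_SMM": {"HLA-A*11:01": 42.0, "HLA-A*02:01": 1800.0},
    "MHCPred": {"HLA-A*11:01": 95.0, "HLA-DRB1*01:01": 310.0},
}
binders = merge_binders(toy_predictions)
for allele in sorted(binders):
    print(allele, binders[allele])
```

Keeping the per-server values side by side, rather than collapsing them to one number, makes it easy to see which alleles are supported by both predictors.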

Epitope conservancy and HLA distribution analysis

For each identified peptide, the conservancy was predicted using the IEDB tool (Bui et al. 2007). The degree of conservation of each peptide was calculated as the fraction of protein sequences of different strains retrieved from UniProt that match the aligned peptide sequence above a defined identity level. An IEDB-based tool for human population coverage analysis (Bui et al. 2006) was used to study the distribution of human HLA alleles among the predicted epitopes. The predicted peptides with their corresponding MHC-I and MHC-II alleles were submitted with default parameter settings (the final set containing frequencies of 3245 alleles for 16 geographical areas, 21 ethnicities, and 115 countries). The predictions were made using the latest dataset from the Allele Frequency Net Database (AFND) (Gonzalez-Galarza et al. 2011).
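
The conservancy calculation described above can be approximated with an ungapped sliding-window comparison, as in the sketch below. This is a simplification for illustration and may differ from the IEDB tool's algorithm; the protein sequences are hypothetical, and treating "above a defined identity level" as "at or above" is an assumption.

```python
def best_identity(peptide, sequence):
    """Highest fraction of identical positions over all ungapped windows."""
    k = len(peptide)
    best = 0.0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(peptide, window))
        best = max(best, matches / k)
    return best


def conservancy(peptide, sequences, identity_threshold=1.0):
    """Fraction of sequences containing the peptide at the given identity."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    conserved = sum(
        best_identity(peptide, seq) >= identity_threshold for seq in sequences
    )
    return conserved / len(sequences)


# Hypothetical strain sequences used only to exercise the function.
toy_strains = [
    "MKKYIQDNFNFYLLA",   # exact match
    "MKKYIQDNFNFALLA",   # one mismatch (8/9 identical)
    "MAAAAAAAAAAAAAA",   # unrelated
    "GGYIQENFNFYGG",     # one mismatch (8/9 identical)
]
print(conservancy("YIQDNFNFY", toy_strains, identity_threshold=1.0))
print(conservancy("YIQDNFNFY", toy_strains, identity_threshold=0.6))
```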

Molecular docking studies of HLA-epitope

Designing epitope 3D structure

To study the molecular interactions between the predicted T-cell epitopes (YIQDNFNFY and NTDQAQGTV) and HLA molecules, PEP-FOLD, which is based on a hidden Markov model-derived structural alphabet (SA) (Thevenet et al. 2012), was used to predict the 3D structure of each peptide. PEP-FOLD generated five models for each input peptide sequence. The best model was selected for docking studies.

Docking

To validate our results, we performed a docking study of HLA-A*11:01 and the selected epitopes using Hex, a fast Fourier transform (FFT)-based protein docking server (Macindoe et al. 2010). The crystal structure of HLA-A*11:01 in complex with a SARS nucleocapsid peptide (PDB ID: 1X7Q) was reduced to HLA-A*11:01 alone and prepared by adding hydrogen atoms. Docking was then carried out in Hex using the prepared HLA-A*11:01 and the PEP-FOLD predicted epitopes as starting structures. The parameters were left at their defaults except for the correlation type, which was set to use both shape and electrostatics for the docking calculations. The best conformation was selected based on the Etotal (binding affinity) value. Complexes were visualized in the PyMOL molecular graphics package (Schrodinger 2010), and interactions were visualized in Ligplot (Laskowski and Swindells 2011).

B-cell epitope identification

The BCPred (El-Manzalawy et al. 2008) and AAP (Chen et al. 2007) methods available on the BCPred server, both of which use SVM-based classifiers, were used to identify potential antigens that can interact with B lymphocytes. Tools from IEDB were then employed to find B-cell epitopes and further screen the potential epitopes: Emini surface accessibility prediction (Emini et al. 1985), Karplus and Schulz flexibility prediction (Karplus and Schulz 1985), and Parker hydrophilicity prediction (Parker et al. 1986). The regions common to the predictions from both the BCPred server and the IEDB tools were considered potential B-cell epitopes. These epitopes were further filtered on allergenicity and antigenicity using AllerHunter and VaxiJen, respectively.
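
Finding the regions common to two sets of predictions amounts to intersecting residue ranges. The sketch below assumes each predictor's output has been reduced to a list of inclusive, 1-based (start, end) positions on the same protein; the function name, the optional minimum length, and the coordinates are hypothetical.

```python
def overlap_regions(regions_a, regions_b, min_length=1):
    """Intersect two lists of inclusive (start, end) residue ranges.

    Returns the sorted list of overlapping segments that are at least
    min_length residues long.
    """
    overlaps = set()
    for a_start, a_end in regions_a:
        for b_start, b_end in regions_b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if end - start + 1 >= min_length:
                overlaps.add((start, end))
    return sorted(overlaps)


# Hypothetical predicted regions on one protein.
server_regions = [(10, 29), (50, 69)]
property_regions = [(20, 35), (60, 62), (100, 110)]

print(overlap_regions(server_regions, property_regions))
print(overlap_regions(server_regions, property_regions, min_length=5))
```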

Results

Retrieving non-homologous proteins from pathogen whole proteome

The whole proteome of C. jejuni O:2 (strain NCTC 11168) comprises 1623 proteins. After filtering out protein sequences on the length criterion, 1500 proteins remained. These sequences were subjected to a homology search against the human proteome using BLASTp from a standalone BLAST suite, which yielded a total of 210 pathogen proteins that were non-homologous to humans. Identifying proteins non-homologous to humans is essential because it excludes the possibility of the peptide vaccine targeting host enzymes, thus avoiding adverse effects in humans (Butt et al. 2012). In addition, self-peptides can mount an autoimmune response in the host.

Antigenicity and transmembrane prediction

The VaxiJen server, used to assess the antigenicity of the protein sequences, predicted 157 proteins as antigenic above a threshold of 0.7. These were further analyzed for their cellular location, which revealed that 24 proteins were localized in the outer membrane. Identification of outer membrane proteins is critical for reliable and rapid identification of vaccine candidates, as many of the antigens that trigger immune responses appear to be secreted toxins or surface-exposed molecules (Doro et al. 2009). Outer membrane localized proteins were further analyzed for vaccine candidate identification.

T-cell epitope prediction

NetCTL predicted T-cell epitopes from each sequence against the MHC supertypes. Twenty-eight epitopes with a combined score above the threshold of 2 were selected from the outer membrane localized antigenic proteins. These epitopes were further assessed by AllerHunter for allergic cross-reactivity and by VaxiJen for antigenicity. This step identified four epitopes as potential T-cell epitopes (Table 1). For each epitope, the SMM-based IEDB MHC-I processing prediction tool retrieved the MHC-I alleles with an IC50 value below 500 nM, which were considered potential epitope binders. MHCPred predictions of MHC-I and MHC-II alleles as efficient epitope binders were combined with the IEDB tool predictions to generate a final list of potential binders for each epitope. The results are summarized in Table 2.

Table 1 Most probable predicted epitopes selected on the basis of their NetCTL (MHC binding, proteasomal processing, and TAP transport), AllerHunter (allergic cross-reactivity) and VaxiJen (antigenicity) score
Table 2 Predicted potential T-cell epitopes, along with their interacting MHC-I and MHC-II alleles with an affinity of <500 nM and corresponding IC50 values (in parentheses)

Epitope conservancy and HLA distribution analysis

For each predicted epitope, conservancy was determined using the IEDB conservancy tool, and the results are shown in Table 2. Epitope NTDQAQGTV was 75 % conserved at an identity of >60 %, while YIQDNFNFY was 50 % conserved at 100 % identity. Conservancy results for the other epitopes (RSDEAQTNY and KSDEEMEKY) were not encouraging. Because sequence data in the UniProt database are scarce, these conservancy results may not accurately reflect the true conservancy of the epitopes. Population coverage analysis was then performed for epitopes NTDQAQGTV and YIQDNFNFY, with their associated MHC-I and MHC-II alleles as input to the IEDB population coverage analysis tool. As shown in Table 3, the epitopes NTDQAQGTV and YIQDNFNFY covered 81.07 and 85.27 % of the world population, respectively. For epitope NTDQAQGTV, the maximum coverage of 85.99 % was in Europe, followed by 85.53, 84.25, and 80.44 % in the populations of South Africa, South Asia, and North Africa, respectively. For epitope YIQDNFNFY, the maximum coverage of 90.81 % was in Europe, followed by South Asia, North America, and Northeast Asia with coverage of 85.70, 84.08, and 82.80 %, respectively.
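
As a rough illustration of how allele frequencies translate into population coverage, the sketch below uses a simplified model: within one locus, the fraction of individuals carrying at least one binding allele is 1 − (1 − Σf)² under Hardy-Weinberg proportions, and loci are combined assuming independence. This is an assumption-laden sketch rather than the IEDB tool's procedure, and the allele frequencies are hypothetical.

```python
def locus_coverage(allele_frequencies):
    """Fraction of individuals carrying at least one of the given alleles.

    Assumes Hardy-Weinberg proportions at a single locus.
    """
    total = sum(allele_frequencies)
    if not 0.0 <= total <= 1.0:
        raise ValueError("allele frequencies at one locus must sum to [0, 1]")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(frequencies_by_locus):
    """Combine per-locus coverage assuming the loci are independent."""
    uncovered = 1.0
    for frequencies in frequencies_by_locus.values():
        uncovered *= 1.0 - locus_coverage(frequencies)
    return 1.0 - uncovered


# Hypothetical frequencies of epitope-binding alleles in one population.
toy_frequencies = {
    "HLA-A": [0.10, 0.20],
    "HLA-DRB1": [0.10],
}
coverage = combined_coverage(toy_frequencies)
print(f"{100 * coverage:.2f} %")
```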

Table 3 Population coverage of predicted epitopes NTDQAQGTV and YIQDNFNFY based on MHC-I and MHC-II restriction data; maximum population coverage was observed for Europe

Molecular docking studies of HLA-epitope

PEP-FOLD generates peptide structures by performing a series of simulations based on structural alphabet (SA) profiles derived from amino acid sequences. It then returns the representative conformation for the input epitope based on energy and population parameters. Using Hex, different conformations of the predicted epitopes bound in the MHC cleft were generated. Hex represents each molecule using 3D parametric functions describing surface shape, electrostatic charge, and potential distribution, which define the electrostatic and van der Waals interactions. The best conformation was then selected based on the binding affinity score, which depends on these interactions. The docked complexes were visualized in PyMOL as shown in Fig. 2. HLA-A*11:01 binds epitopes NTDQAQGTV and YIQDNFNFY with binding energies of −386.53 and −350.09 kcal/mol, respectively. Figure 3 shows the interactions involved in HLA-A*11:01 binding with the predicted epitopes. Epitope NTDQAQGTV interacts with HLA-A*11:01 through a hydrogen bond with Tyr 27 and van der Waals interactions with MHC residues Ser 4, Arg 6, Phe 8, Asp 30, Gln 96, Met 98, Tyr 113, Ala 211, Glu 212, Thr 233, and Phe 241. The MHC molecule interacts with epitope YIQDNFNFY through hydrogen bonds involving Asp 29 and Asp 30. Asp 30 forms two hydrogen bonds, with Tyr 1 and Ile 2 of the epitope (bond lengths 2.86 and 2.87 Å), and Asp 29 is hydrogen bonded to Tyr 1 with a bond length of 2.57 Å. Epitope YIQDNFNFY is held in the MHC cleft by hydrophobic interactions with Arg 6, Phe 8, Asp 102, Pro 210, Ala 211, Glu 212, Glu 232, Thr 233, Arg 234, Pro 235, Lys 243, and Phe 241. The involvement of common residues in interactions with different peptides suggests a crucial role for MHC residues Arg 6, Phe 8, Ala 211, Glu 212, and Phe 241 in MHC-peptide binding.
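
The shared contact residues can be checked directly from the two non-bonded contact lists given in this paragraph. The sketch below simply intersects them; the variable names are illustrative, and the residue lists are transcribed from the text.

```python
# Non-bonded (van der Waals / hydrophobic) MHC contact residues as listed
# in the text for each epitope.
ntdqaqgtv_contacts = {
    "Ser 4", "Arg 6", "Phe 8", "Asp 30", "Gln 96", "Met 98",
    "Tyr 113", "Ala 211", "Glu 212", "Thr 233", "Phe 241",
}
yiqdnfnfy_contacts = {
    "Arg 6", "Phe 8", "Asp 102", "Pro 210", "Ala 211", "Glu 212",
    "Glu 232", "Thr 233", "Arg 234", "Pro 235", "Lys 243", "Phe 241",
}


def residue_number(residue):
    """Sort key: the numeric position in a label such as 'Arg 6'."""
    return int(residue.split()[1])


shared_contacts = sorted(ntdqaqgtv_contacts & yiqdnfnfy_contacts, key=residue_number)
print(shared_contacts)
```

On the lists as printed, the intersection contains the five residues named above and also Thr 233, which appears in both contact sets.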

Fig. 2 Docked complexes of HLA-A*11:01 against predicted epitopes generated by Hex docking program. a Epitope NTDQAQGTV. b Epitope YIQDNFNFY

Fig. 3 Interactions involved in HLA-A*11:01 binding. a Epitope NTDQAQGTV. b Epitope YIQDNFNFY

B-cell epitope identification

Table 4 lists the B-cell epitopes predicted using the AAP, BCPred, and IEDB tools and then filtered on allergenicity and antigenicity. Antigenic regions common to both BCPred and AAP were submitted to the IEDB Emini surface accessibility tool to predict surface-exposed peptides. The predicted peptides were then checked for flexibility and hydrophilicity using the IEDB Karplus and Schulz flexibility prediction and Parker hydrophilicity prediction tools. This yielded a total of 25 peptides as B-cell epitopes. These peptides were then checked for allergenicity and antigenicity, which yielded four epitopes with an allergenicity score ≤0.06 and an antigenicity score >1, as shown in Table 4.

Table 4 Four most promising B-cell epitopes from combined predictions of AAP, BCPred, and IEDB tools (Emini surface accessibility, Karplus and Schulz flexibility, and Parker hydrophilicity), filtered based on their AllerHunter and VaxiJen scores

Discussion

Advances in sequencing technologies have brought remarkable progress in vaccinology, enabling researchers to move beyond the traditional approach. With computational methods, it is now feasible to access the entire antigenic repertoire of an organism. The reverse vaccinology (RV) approach arose from efforts to identify a vaccine against Meningococcus B (Men B), a pathogen that was intractable to conventional vaccinology because its capsular polysaccharide is identical to a human self-antigen (Giuliani et al. 2006). RV has since been applied to many pathogens (Maione et al. 2005, Thorpe et al. 2007). In the post-genomic era, omics data have been complemented by bioinformatics approaches that may lead to the discovery of unique antigens and eventually improve existing vaccines. Several groups have already proposed epitope-based vaccine candidates against C. jejuni, focusing on specific targets such as cytolethal distending toxin (CDT), the autotransporter protein CapA, and polysaccharide capsules (Ingale and Goto 2014, Ashgar et al. 2007, Guerry et al. 2012). Development of killed WC vaccines is complicated by the dearth of information on C. jejuni pathogenesis, and development of flagellin subunit vaccines is complicated by the antigenic diversity of Campylobacter flagellins. Recognizing these gaps in current efforts, we undertook a genome-wide in silico screen of C. jejuni aimed at identifying potential vaccine candidates against this organism and expediting efforts in this direction.

Currently, most vaccines rely on B-cell-mediated antibody immunity. However, T cells confer long-lasting immunity, whereas antibody-mediated immunity can be easily overcome by a surge of antigens (Bacchetta et al. 2005). Cytotoxic CD8+ T lymphocytes (CTL) limit the spread of infectious agents by recognizing and killing infected cells. Thus, in this study, we have proposed both B- and T-cell epitopes that could be experimentally tested for their efficacy in triggering humoral and cell-mediated immune responses. As described in the schematic workflow diagram (Fig. 1), we framed a set of criteria for identifying potential vaccine candidates: antigenicity, T-cell/B-cell processivity, interaction with HLA alleles, allergenicity, conservancy, and population coverage. Protective epitopes are not clearly defined for C. jejuni. Thus, when screening proteomic data, it is of utmost importance to select the proteins that can confer protection, and within those proteins to select segments with antigenic properties. An antigenicity filter was therefore applied at several stages of vaccine candidate identification. Initially, proteins with an antigenicity score above the threshold of 0.7 were selected as antigenic. The identified B- and T-cell epitopes were also filtered on the antigenicity criterion.

Physicochemical properties such as flexibility, hydrophilicity, and solvent accessibility are distinctive features of B-cell epitopes. These features have been exploited in many B-cell epitope prediction programs (Li et al. 2014). Initially, B-cell epitopes that could be proficiently processed by B lymphocytes were identified based on surface accessibility, flexibility, and hydrophilicity criteria. The NetCTL server predicted T-cell epitopes based on combined predictions of MHC class I binding, proteasomal C-terminal cleavage, and TAP transport efficiency. C. jejuni strains are highly diverse, which further complicates vaccine development against this pathogen. Conservation of the epitopes at the sequence level therefore suggests that these regions are important from an evolutionary point of view. Population coverage plays an essential role in the vaccine development process. Our predicted peptides showed good population coverage even though, for MHC-II, data were available only for the alleles HLA-DRB1*01:01, HLA-DRB1*04:01, and HLA-DRB1*07:01. Nevertheless, all the predicted nonamers interacted with the most common HLA allele, HLA-DRB1*01:01, as shown in Table 2. For the predicted epitopes, the highest population coverage in the developing world was in Asian and African countries, where the diarrheal incidence rate is reported to be the highest; among industrialized nations, it was highest for Europe and North America, consistent with the fact that the largest numbers of travelers to Asia and Africa come from these regions (Harris et al. 2011).

Further examination of the data shows that epitope NTDQAQGTV has a high antigenicity value (Table 2), whereas epitope YIQDNFNFY has the largest number of MHC-interacting alleles (29). Epitope YIQDNFNFY was identified from cmeD, which encodes the outer membrane component of the multidrug efflux system cmeDEF. In one study, cmeC, an essential outer membrane component of the cmeABC multidrug efflux pump, was proposed as a promising subunit vaccine candidate against C. jejuni infection using a chicken model (Zeng et al. 2010). cmeDEF also plays an important role in resistance to several antibiotics and toxic compounds. cmeABC and cmeDEF act synergistically in retaining cell viability and conferring antibiotic resistance (Akiba et al. 2006). Epitope NTDQAQGTV has a lower IC50 value (20.61 nM) for the MHC supertype HLA-A*11:01 than YIQDNFNFY, which has an IC50 of 69.82 nM with HLA-A*11:01. The results of the computational docking studies agree with these binding affinity values: NTDQAQGTV has a stronger affinity for HLA-A*11:01, with a binding energy of −386.53, while YIQDNFNFY binds in the groove of HLA-A*11:01 with a total energy of −350.09. Epitope YIQDNFNFY is more conserved at 100 % identity and has a high score as a processed peptide, as evidenced by its NetCTL score (Table 1). As seen in Table 3, population coverage analysis reveals that epitope YIQDNFNFY covers a large proportion of the human population. The lowest coverage for this epitope is in Central America (15.10 %), which is still much higher than the coverage of NTDQAQGTV in the same region (1.34 %).

Four potential B-cell epitopes were predicted with a VaxiJen score (antigenicity) >1 and an AllerHunter score (allergenicity) within the ≤0.06 threshold. The epitope YTGKAKRVNPNT has the highest antigenicity score of 1.6624, followed by IYRKHSNSSNS and RFSERKNKEE with antigenicity scores of 1.6432 and 1.3154, respectively. Based on the AllerHunter results, epitope NPQQEKSQN has the highest likelihood of being a non-allergen, as marked by the lowest score of 0.05, while the other B-cell epitopes YTGKAKRVNPNT, IYRKHSNSSNS, and RFSERKNKEE have an AllerHunter score of 0.06. Elicitation of effective immune responses depends on the specificity and diversity of the T-cell epitopes binding to HLA alleles. Because MHC is highly polymorphic, it is desirable to identify peptides that can bind to many MHC alleles (Germain 1994). Our predicted T-cell epitopes NTDQAQGTV and YIQDNFNFY bind to more than 20 MHC alleles and have broad human population coverage.

HLA-A*11:01 was selected for docking studies. The predicted T-cell epitopes interact with HLA-A*11:01 with varying affinities. The computational docking results indicate that the epitopes bind efficiently to HLA-A*11:01. We believe that such a systematic computational pipeline for prediction of vaccine candidates, when applied to the C. jejuni proteome, reveals epitopes that would be able to elicit an efficacious immune response.

There is a growing body of evidence for the indispensable role of bioinformatics approaches in translational medicine (Ashgar et al. 2007; Guerry et al. 2012; Ingale and Goto 2014; Binnewies et al. 2006; Huang et al. 2002). As shown in Fig. 4, the protein search space declined steadily at each step, significantly reducing the portion of the proteome to be searched for vaccine identification. We started with a proteome of 1623 proteins. After applying different filtration criteria at each step, we were left with eight antigenic sequences, which were eventually tested for the presence of vaccine candidates. In summary, omics-guided approaches and bioinformatics analyses offer broad potential for further development of novel therapeutics relevant to global health.
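
The step-wise reduction can be tabulated from the protein counts reported in the Results section (1623, 1500, 210, 157, and 24). The sketch below computes the share of the starting proteome retained at each step; the step labels and function name are illustrative.

```python
def reduction_table(steps):
    """Return (label, count, percent_of_initial) for each filtering step."""
    initial = steps[0][1]
    return [
        (label, count, round(100.0 * count / initial, 1))
        for label, count in steps
    ]


# Counts taken from the Results section of this paper.
pipeline_steps = [
    ("Whole proteome", 1623),
    ("Length >= 100 aa", 1500),
    ("Non-homologous to human", 210),
    ("Antigenic (VaxiJen > 0.7)", 157),
    ("Outer membrane localized", 24),
]

rows = reduction_table(pipeline_steps)
for label, count, percent in rows:
    print(f"{label:<28}{count:>6}{percent:>8} %")
```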

Fig. 4 Step-wise reduction in the total no. of proteins in search for the identification of vaccine candidates against C. jejuni

Conclusion

Traditional molecular immunology techniques for vaccine identification are time and labor intensive. A wide array of omics techniques, whole genome sequencing data, and novel bioinformatics approaches have substantially improved our systemic understanding of complex diseases. These techniques hold great potential for rapid and reliable genome-wide screening to identify vaccine candidates, and they have hastened the pace of vaccine development by significantly reducing the number of epitopes that need to be tested experimentally. Our predicted epitopes are prospective vaccine candidates on the grounds of high population coverage and interactions with many HLA alleles. In conclusion, an immunoinformatics-based approach was used to detect protective antigens in C. jejuni, which may serve as potential vaccine candidates to control campylobacteriosis once validated experimentally in vitro and in vivo. This immunoinformatics-based approach can be applied to other hosts or other enteric pathogens.