1 Introduction

The interaction between an antibody and its antigen is at the heart of the humoral immune response. The detection of highly immunogenic regions within a given protein, specifically those that elicit a humoral immune response, i.e., B cell epitopes, is central to many immunodetection and immunotherapeutic applications (Irving et al. 2001). Antibodies bind to their corresponding antigens at discrete sites known as antigenic determinants or epitopes. The precise localization of an epitope can be essential in the development of biomedical applications such as designed vaccines, diagnostic kits, and immunotherapeutics (Westwood and Hay 2001). Predicting epitopes is fundamental to understanding the basis of immunological discrimination between self and non-self, as well as mechanisms of bio-recognition in general. Since proteins are one of the most abundant and diverse classes of antigens, including antigens of infectious agents and allergens, much of the interest in antigen characteristics is focused on antigenic proteins (Rubinstein et al. 2008).

B cell epitopes are classified into two groups. The first group consists of linear or continuous epitopes. A continuous epitope comprises a single, consecutive stretch of amino acids in the protein sequence, which is specifically recognized by an antibody raised against the intact protein. The second group is formed by conformational or discontinuous epitopes. These are epitopes composed of residues that are separated in the protein sequence but lie in spatial proximity because of the protein fold (Van Regenmortel et al. 1996). It has been suggested that sequence-level analysis at the exon and peptide level can provide a valuable profile (Liou and Huang 2012). A specific, isolated linear peptide derived from the sequence of a given antigen has been shown to elicit antibodies that not only bind the peptide but also strongly cross-react with the native antigen. Peptides containing linear epitopes are therefore considered to have high potential for vaccines. In addition to their suitability for vaccines, peptides are easily synthesized, purified, stored, and handled (Andersen et al. 2008).

Neisseria meningitidis is a major cause of childhood meningitis and septicemia worldwide. Meningococcal disease is associated with a high fatality rate (approximately 10 %), and survivors may develop permanent abnormalities such as deafness, seizures, amputation, and mental retardation (Offit and Peter 2003). N. meningitidis, a Gram-negative encapsulated bacterium, is classified into serogroups according to the chemical composition and immunogenic properties of its capsular polysaccharides. Serogroups A, B, C, W135, and Y account for >95 % of infections. Capsular polysaccharide or capsular polysaccharide conjugate vaccines are available against serogroup A, C, Y, and W135 strains (Jódar et al. 2002; Morley et al. 2001; Rouphael and Stephens 2012). However, no capsule-based vaccine is available for N. meningitidis serogroup B. The immune system tolerates the serogroup B capsular polysaccharide because of its similarity to human polysialic acid, both consisting of repeating units of α(2 → 8)-linked N-acetyl neuraminic acid (sialic acid) (Finne et al. 1983). Therefore, our study focuses on proteins of the outer membrane for the development of an effective epitope vaccine against meningococcal diseases. In our previous work, we predicted and characterized T cell epitopes for epitope vaccine design from OMV proteins of N. meningitidis serogroup B (Chandra et al. 2010).

An unguided experimental search for immunogenic regions is laborious and resource intensive. Thus, computational approaches that are able to perform this task are desirable. The humoral immune response is based on antibody–antigen interaction, which gives rise to numerous protein–protein interfaces. We compared these interfaces with respect to physico-chemical properties at the amino acid level, derived from amino acid composition. Aside from the physico-chemical character of the interfaces, secondary structure content was also found to be an important property of protein–protein interfaces and has been used as one of the analysis parameters. In the present study, the outer membrane proteins of outer membrane vesicles of Neisseria meningitidis serogroup B were used for the prediction of linear B cell epitopes. Amino acid composition, physico-chemical properties, and secondary structure element preference were analyzed to identify putative patterns in the predicted epitopes.

2 Materials and methods

2.1 Collection of data

A set of 15 outer membrane (OM) proteins was selected from the 236 non-redundant proteins of the outer membrane vesicle (OMV) proteome; these represent only 6.4 % of the total number of proteins detected (Williams et al. 2007). The complete genome and protein sequences of N. meningitidis serogroup B (MC58) were taken from GenBank (NCBI) and UniProtKB. The selected protein sequences were retrieved in FASTA format and used for further analyses.

2.2 Vaccine candidate characterization

Amino acid compositions were computed using ExPASy’s ProtParam server (Gasteiger et al. 2005). The similarity of human proteins to the OM proteins of OMV was searched using BlastP. Protein secondary structure was predicted with APSSP2, the Advanced Protein Secondary Structure Prediction Server (Raghava 2002).

2.3 Epitope prediction

The Epitopia server (Rubinstein et al. 2009a, b) was used to predict epitopes in the manually curated dataset. Epitopia is a web-based server that aims to predict immunogenic regions in either a protein three-dimensional structure or a linear sequence. It implements a machine-learning algorithm that has been trained to discern antigenic features within a given protein, using a naïve Bayes classifier to predict the immunogenic potential of protein regions. The classifier was trained to recognize immunogenic properties using a benchmark dataset of 66 non-redundant validated epitopes derived from antibody–antigen co-crystal structures and 194 non-redundant validated epitopes derived from antigen sequences.

A given antigen input is divided into overlapping surface patches (or stretches in the case of a linear sequence input), each with the size of a typical epitope. For each patch (or stretch), Epitopia then computes the probability that it was drawn from the population of epitopes on which the classifier had been trained, with respect to each of its physico-chemical and structural-geometrical properties. The immunogenicity score is the sum of the logs of these probabilities and is assigned to the central residue of the patch or to the middle residue of the linear stretch (Rubinstein et al. 2009a, b).
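The scoring scheme described above can be illustrated with a minimal sketch. This is not Epitopia's implementation: the two properties, the Gaussian likelihoods with made-up parameters, and the window length of 7 are all hypothetical stand-ins for the trained model, chosen only to show how per-property log-probabilities are summed over a sliding stretch and assigned to its middle residue.

```python
import math

# Hypothetical residue groups used to define two toy properties.
CHARGED = set("DEKRH")
HYDROPHOBIC = set("AVILMFWC")


def fraction(stretch, group):
    """Fraction of residues in the stretch that belong to the group."""
    return sum(aa in group for aa in stretch) / len(stretch)


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a toy likelihood."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# (property function, mean, sd): made-up parameters, NOT a trained model.
MODEL = [
    (lambda s: fraction(s, CHARGED), 0.35, 0.15),
    (lambda s: fraction(s, HYDROPHOBIC), 0.25, 0.15),
]


def score_stretch(stretch):
    """Naive Bayes style score: sum of log-likelihoods over all properties."""
    return sum(math.log(gaussian_pdf(prop(stretch), mean, sd))
               for prop, mean, sd in MODEL)


def score_sequence(sequence, window=7):
    """Score every overlapping stretch and assign it to the middle residue.

    Residues too close to either end to be the centre of a full window
    keep the value None.
    """
    half = window // 2
    scores = [None] * len(sequence)
    for start in range(len(sequence) - window + 1):
        scores[start + half] = score_stretch(sequence[start:start + window])
    return scores
```

Calling `score_sequence("AVILDEKRDE")` returns one score per residue, with `None` for the three residues at each end that cannot be the centre of a seven-residue stretch.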

2.4 Validation of predicted epitopes

Predicted epitopes were searched against the IEDB database (www.immuneepitope.org) (Vita et al. 2010) to identify existing, experimentally characterized epitope entries. A match indicates that a predicted epitope may be immunogenic on the basis of previously reported experiments. All predicted epitopes were searched for linear epitope matches as substrings (Table 1).
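The substring matching step can be sketched as follows, assuming the known linear epitopes have already been exported to a local list of sequences. The function name and the example peptides are hypothetical and are not IEDB records; because the text does not state the direction of the match, the sketch reports a hit when either sequence contains the other.

```python
def substring_matches(predicted, known_epitopes):
    """Map each predicted peptide to the known epitopes it matches.

    A match means the predicted peptide occurs inside a known epitope
    or a known epitope occurs inside the predicted peptide
    (case-insensitive comparison).
    """
    hits = {}
    for peptide in predicted:
        query = peptide.upper()
        hits[peptide] = [
            known for known in known_epitopes
            if query in known.upper() or known.upper() in query
        ]
    return hits


# Hypothetical sequences for illustration only.
predicted = ["KDTNNNL", "GGGSSS"]
known = ["AAKDTNNNLTT", "TNNN", "QQQQ"]
matches = substring_matches(predicted, known)
```

Peptides with an empty list in `matches` have no recorded linear epitope counterpart in the supplied list.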

Table 1 Epitope predictions with position and number of residues in each predicted instance. Epitopes were organized according to rank of antigenicity (top 5)

3 Results and discussion

In OMV, out of 236 non-redundant proteins, only 15 (6.4 %) were predicted to be located in the outer membrane. In our previous study, we found that two of these proteins were similar to human proteins when the OM proteins were searched against the translated human genome using BlastP. The fkpA protein (NMB1567) has limited similarity to human FK506 binding protein 2 (FKBP2) and to AIP (aryl hydrocarbon receptor interacting protein), and omp85 (NMB0182) has limited similarity to human SAMM50 (sorting and assembly machinery component 50 homolog). Such similarity suggests possible immunological cross-reactivity between vaccine and host cell proteins and could therefore lead to autoimmunity. To avoid this risk, we selected 13 of the 15 OM proteins for epitope prediction (Chandra et al. 2010). These 13 OM proteins did not show any significant similarity to human proteins, which minimizes the chance of an autoimmune response. The results of the Epitopia server used for B cell epitope prediction in the present study are summarized in Table 1. The peptides are listed according to their rank (from 1 to 5) among the most significant immunogenic stretches obtained by the clustering procedure (Rubinstein et al. 2009a, b). The predicted epitopes were searched against the IEDB database for similarity to the linear epitope data it contains. We found that the predicted epitopes of the porA and porB proteins had positive entries as linear epitopes when searched as substrings. Two predicted epitopes of the NMB1961 protein were also present in the IEDB database as linear epitopes. Some other predicted epitopes also had IEDB entries, whereas the rest did not, presumably because they have not been used as vaccine candidates in any previous study.

3.1 Amino acid preference of epitopes

Feature selection is an important aspect of multifaceted analysis and has to be implemented properly for multidimensional datasets in bioinformatics (Hulse et al. 2012). The peptides of the 13 antigenic proteins were classified into epitope and non-epitope groups. The predicted peptides represent the epitopic region, while the OM protein sequence minus the epitope sequence represents the non-epitopic region. A dataset was constructed for each group individually. The amino acid preference of epitopes differs from that of the remaining antigen surface (Jones and Thornton 1995; Lo Conte et al. 1999). Here, the amino acid preference is evaluated using amino acid frequencies (Fig. 1). The overall amino acid composition is found to differ significantly between the epitope and non-epitope surface. The amino acid composition was computed and the results were evaluated in three categories of amino acids: hydrophobic, charged and polar, and other small amino acids. Our data show that charged and polar amino acids are found more frequently in the epitopic group than in the non-epitopic group (Fig. 1). In contrast, the non-epitopic group contains more hydrophobic amino acids than amino acids of the other categories (Fig. 1). These findings support previous reports (Bogan and Thorn 1998; Jackson et al. 1999) that charged residues are generally preferred in protein–protein interactions because of their capability to form a multitude of interactions.
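The comparison described above can be sketched in a few lines. The study computed compositions with ProtParam; the code below is only an illustrative stand-in, and the assignment of residues to the three categories is an assumption, since the text does not list the members of each class.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed class membership (not specified in the text).
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "charged_polar": "DEKRHNQSTY",
    "other_small": "GP",
}


def split_regions(protein, epitopes):
    """Return (epitope residues, non-epitope residues) of one protein.

    Every occurrence of each epitope is masked; the non-epitope region
    is the protein sequence minus the masked residues.
    """
    mask = [False] * len(protein)
    for epitope in epitopes:
        start = protein.find(epitope)
        while start != -1:
            for i in range(start, start + len(epitope)):
                mask[i] = True
            start = protein.find(epitope, start + 1)
    inside = "".join(aa for aa, m in zip(protein, mask) if m)
    outside = "".join(aa for aa, m in zip(protein, mask) if not m)
    return inside, outside


def composition(sequences):
    """Relative frequency of each of the 20 standard amino acids."""
    counts = Counter(aa for seq in sequences for aa in seq.upper()
                     if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def class_fractions(comp):
    """Sum the per-residue frequencies within each assumed class."""
    return {name: sum(comp[aa] for aa in members)
            for name, members in CLASSES.items()}


# Hypothetical toy protein and epitope, for illustration only.
epi, non_epi = split_regions("AVLIDEKRGP", ["DEKR"])
epi_classes = class_fractions(composition([epi]))
non_classes = class_fractions(composition([non_epi]))
```

In practice, `split_regions` would be applied to each of the 13 proteins and the resulting epitope and non-epitope strings pooled before calling `composition`.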

Fig. 1 Amino acid preference of epitopes and non-epitopes. The 20 standard amino acids and their frequency of usage are shown on the x and y axes, respectively. Amino acids are also grouped into three classes of biochemical properties (hydrophobic; charged and polar; other small), indicated at the top of the image

3.2 Epitopes’ secondary structure

An important structural aspect of an epitope that may distinguish it from the rest of the antigen is its secondary structure composition. To test which secondary structure elements the epitopes were enriched with, each amino acid was assigned to one of the following three secondary structure groups: (1) alpha-helices, 3₁₀-helices, and pi-helices were grouped as helix; (2) isolated beta bridges and extended beta strands were grouped as strands; and (3) turns, bends, and irregular structures were grouped as loops. This analysis revealed that epitopes were significantly enriched with loops and significantly depleted of helices and strands. The frequencies of loops, helices, and strands in our analysis were 0.58, 0.25, and 0.17, respectively (Fig. 2). Since loops tend to be more flexible than other, organized secondary structure elements (Jemmerson and Paterson 1985; Pellequer et al. 1991), these results suggest that epitopes are relatively flexible. The abundance of flexible secondary structure elements in epitopes may facilitate the conformational adjustment that epitopes undergo upon antibody binding.
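The three-group reduction described above can be sketched as follows. The single-letter codes (H, G, I for helices; B, E for bridges and strands; everything else as loop) follow the DSSP convention, which is an assumption here: the text names APSSP2 as the prediction server but does not state which state coding was used. The example string is hypothetical.

```python
from collections import Counter

# Assumed DSSP-style codes: H alpha-helix, G 3-10 helix, I pi-helix,
# B isolated beta bridge, E extended strand; T, S and all others are loops.
HELIX_CODES = set("HGI")
STRAND_CODES = set("BE")


def reduce_state(code):
    """Collapse one secondary structure code into helix, strand or loop."""
    if code in HELIX_CODES:
        return "helix"
    if code in STRAND_CODES:
        return "strand"
    return "loop"


def ss_frequencies(ss_string):
    """Fraction of residues falling in each of the three groups."""
    groups = ("loop", "helix", "strand")
    if not ss_string:
        return {g: 0.0 for g in groups}
    counts = Counter(reduce_state(code) for code in ss_string)
    return {g: counts[g] / len(ss_string) for g in groups}


# Hypothetical per-residue assignment for the residues of one epitope.
freqs = ss_frequencies("HHGIEEBTS-")
```

Pooling the per-residue assignments of all predicted epitopes into one string and passing it to `ss_frequencies` gives a loop/helix/strand distribution of the kind reported in the text.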

Fig. 2 Distribution of secondary structure elements in epitopes. The major secondary structure elements (loops, helices, and strands) and their frequency of distribution are shown on the x and y axes, respectively

3.3 Evolutionary conservation of epitopes

Functional regions on protein surfaces tend to be evolutionarily conserved relative to other regions (Zhou and Shan 2001). Epitopes may overlap such functional regions due to shared constraints imposed by the nature of protein–protein interactions. If so, epitopes should be more evolutionarily conserved than the remaining antigen surface. However, epitopes are enriched with unorganized secondary structures (loops). Amino acid replacements in surface loops reportedly do not usually perturb the three-dimensional structure of the protein, since surface loops are relatively flexible (Saunders and Baker 2002). Thus, the conservation of epitopes might be biased by their abundance of loops. These results imply that epitopes do not tend to overlap functional regions, but rather cover separate regions.

This study also suggests some new directions for further analysis of this kind of dataset. One approach would be a comparative analysis using the various available tools, servers, and databases under similar conditions and parameters to assess the performance of these methods. Such analyses also provide robust and reproducible data that experimental biologists can verify experimentally (Gupta et al. 2011). This approach could also be applied to other pathogens and diseases. The generated data could also be used in the development of novel prediction methods for linear and discontinuous epitope vaccine candidates by applying machine-learning techniques.

4 Conclusion

Predicting epitopes from the genome/proteome sequences of pathogens would greatly reduce time and cost and would be useful for experimental planning in the development of epitope vaccines. We have predicted numerous epitopes in OMV proteins, which may be useful for the early identification of meningitis and septicemia. The predicted epitopes may be used for safe vaccine development against meningococcal diseases. Epitopes are enriched with charged and polar residues and depleted of hydrophobic ones. Epitopes are also enriched in loops and depleted in helices and beta strands. The lack of conservation can be partially explained by the enrichment of loops in epitopes, as loops are relatively tolerant to amino acid replacements. An additional explanation for the lack of epitope conservation involves self-tolerance: the conserved regions of an antigen, which are expected to have much lower potential for eliciting an immune response, may also be present in the host itself. These results have important implications for the development and use of vaccines based on epitopes of OM proteins, subsequent to in vitro validation. Overall, the findings of this study should aid the process of vaccine design.