
15.1 Introduction

Coronaviruses are remarkably large, enveloped, positive-stranded RNA viruses whose nucleocapsid has helical symmetry. The word corona is Latin for “crown” and refers to the typical appearance of the virions under electron microscopy, where a fringe of large, globular surface projections resembles a crown. Coronaviruses are pathogens associated with severe respiratory symptoms and were first isolated from the nasal cavities of patients with the common cold in the early 1960s (de Groot et al. 2013; Brown et al. 2012). These isolates were named human coronavirus OC43 and human coronavirus 229E. A total of 40 sequenced genomes of different coronavirus strains are accessible from the National Center for Biotechnology Information (NCBI), of which 7 are pathogenic to humans. One coronavirus, SARS-CoV, was responsible for the outbreak of severe acute respiratory syndrome (SARS) in 2003, whereas Middle East respiratory syndrome coronavirus (MERS-CoV) caused the most recent outbreak, in 2012, producing acute respiratory disease with fever, cough and difficulty in breathing. After it was first reported from Saudi Arabia in 2012, this novel virus spread to other countries, including the United States, and was associated with a high death rate. MERS-CoV infections are highly communicable, and no specific antiviral treatment has been developed to date (Azhar et al. 2017).

This motivated us to apply the well-known reverse vaccinology (RV) approach to the available proteome of coronavirus. RV has been applied successfully to many prokaryotes, but there are very few known applications to eukaryotes and viruses, so it is worthwhile to explore its potential for identifying vaccine candidates against coronavirus. In essence, RV is an in silico examination of the pathogen’s proteome in search of antigenic, surface-exposed proteins. The approach was first applied successfully to Neisseria meningitidis serogroup B (Kelly and Rappuoli 2005), against which none of the prevailing techniques had yielded a vaccine. This chapter explores the potential of the RV approach to select probable vaccine candidates against coronavirus and to validate the results using docking studies.

15.2 The Elementary Concept of Reverse Vaccinology

Traditional approaches to vaccine development have undoubtedly been successful in controlling the alarming pathogenic diseases of their time. However, they suffer from certain limitations: they are very time-consuming, pathogens that cannot be cultivated under laboratory conditions are out of reach, and certain non-abundant proteins are not accessible (Rappuoli 2000). Consequently, a number of pathogenic diseases remain without a vaccine. Reverse vaccinology overcomes these limitations by starting from the genome sequence, which is translated in silico into the encoded proteins. All proteins encoded by the genome therefore become accessible, irrespective of their abundance or the conditions under which they are expressed. Reverse vaccinology owes its success to advances in sequencing technologies, which have flooded the genome databases with data that can be analysed computationally to reveal crucial aspects of the virulence factors of the pathogen concerned. The approach analyses the genome of the pathogen computationally and proceeds step by step to identify highly antigenic, secreted proteins with high epitope densities. The best epitopes are selected as potential vaccine candidates (Pizza et al. 2000). This approach has brought previously unapproachable pathogens into the spotlight, is emerging as a promising tool for the precise selection of vaccine candidates, and has brought peptide vaccines into wider use (Sette and Rappuoli 2010; Kanampalliwar et al. 2013).

15.3 Successful Applications of Reverse Vaccinology

Bexsero is the first universal serogroup B meningococcal vaccine developed using RV, and it has received a positive opinion from the European Medicines Agency (Gabutti 2014). Whether it is the discovery of pili in gram-positive pathogens, which were thought not to have any, or the identification of factor H-binding protein in meningococcus (Alessandro and Rino 2010), reverse vaccinology has delivered results that conventional approaches had not. Most applications of RV are against prokaryotes, and very few are against eukaryotes and viruses because of the complexity of their genomes. Corynebacterium urealyticum (Guimarães et al. 2015), Mycobacterium tuberculosis (Monterrubio-López et al. 2015), H. pylori (Naz et al. 2015), Acinetobacter baumannii (Chiang et al. 2015), Rickettsia prowazekii (Caro-Gomez et al. 2014), Neospora caninum (Goodswen et al. 2014) and Brucella melitensis (Vishnu et al. 2017) are examples of pathogens that have recently been studied with this in silico technique in order to identify epitopes with potential as vaccine candidates. Herpesviridae (Bruno et al. 2015) and hepatitis C virus (HCV) (Kolesanova et al. 2015) are examples of viruses that have been addressed using this approach.

15.4 Workflow of Reverse Vaccinology (with Example of Coronavirus)

15.4.1 Retrieval of Proteome of Different Strains of Coronavirus from NCBI

The proteomes of the different coronavirus strains of interest were downloaded from NCBI’s FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/; NCBI Resource Coordinators 2017). Proteome information for the sequenced viruses is available for download in several formats, including FASTA. Strains pathogenic to humans were selected for further analysis, and one of them was chosen as the seed genome on the basis of the literature. Sequence similarity searches using Blastp (http://blast.ncbi.nlm.nih.gov/blast, http://ugene.unipro.ru/) were performed to identify orthologs in the different strains (Altschul et al. 1990; Okonechnikov et al. 2012; Golosova et al. 2014). Multiple sequence alignment (MSA) was performed with ClustalW, and the phylogenetic tree was constructed using the neighbour-joining (NJ) method in the Unipro UGENE 1.16.1 bioinformatics toolkit (Okonechnikov et al. 2012).
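The ortholog search can be illustrated with a minimal sketch. It assumes that the Blastp hits have been exported in the 12-column tabular layout of command-line BLAST+ (`-outfmt 6`), which is not necessarily how the searches in this chapter were run, and it applies the identity cutoff of 30% used later in the results. The function name and the example rows are hypothetical and do not represent real search output.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_homologs(tabular_text, min_identity=30.0):
    """Return (query, subject, percent identity) for hits above the cutoff."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    hits = []
    for row in reader:
        identity = float(row["pident"])
        if identity > min_identity:  # "greater than 30% identity"
            hits.append((row["qseqid"], row["sseqid"], identity))
    return hits


# Hypothetical rows for illustration only, not real search results.
EXAMPLE = (
    "seed_protein_1\tstrainA_protein_7\t45.2\t300\t150\t4\t1\t300\t5\t304\t1e-80\t250\n"
    "seed_protein_1\tstrainB_protein_2\t22.0\t120\t90\t3\t10\t129\t8\t127\t0.002\t35.1\n"
)

if __name__ == "__main__":
    for query, subject, identity in filter_homologs(EXAMPLE):
        print(query, subject, identity)
```

Percent identity alone is a coarse criterion; in practice, alignment coverage and E-value would normally be inspected as well.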

15.4.2 Analysis of Secondary Structure of Proteins from Seed Genome

Secondary structure analysis of the proteins of the seed genome was carried out using the ExPASy portal. The aim was to estimate the solvent accessibility, instability index, theoretical pI, molecular weight, grand average of hydropathicity (GRAVY), aliphatic index, number of charged residues, extinction coefficient, etc. (http://web.expasy.org/protparam/; Gasteiger et al. 2005).
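Two of these parameters are simple enough to compute directly, which may help in interpreting the server output. The sketch below assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are illustrative, and the peptide is one of the 9-mer epitopes reported later in this chapter, used here only as a short input.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue.

    Raises KeyError if the sequence contains a non-standard residue.
    """
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    peptide = "VVCAITLLV"
    print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

A positive GRAVY value indicates a predominantly hydrophobic sequence and a negative value a hydrophilic one, which is how the scores are read in the results section.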

15.4.3 Subcellular Localization Predictions and Count of Transmembrane Helices

Virus-mPLoc was used to predict the localization of the viral proteins within infected host cells (http://www.csbio.sjtu.edu.cn/bioinf/virus-multi/; Shen and Chou 2010). This information is important for understanding the role and mechanism of the viral proteins in causing disease. In total, six subcellular locations were covered: host cytoplasm, viral capsid, host plasma membrane, host nucleus, host endoplasmic reticulum and secreted proteins. These predictions could help in formulating better therapeutic options against the virus. In the RV protocol, secreted and membrane proteins are of special interest and were therefore retained for further analysis. The number of transmembrane helices was predicted using TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/; Krogh et al. 2001).
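The filtering logic of this step can be sketched as follows. The sketch assumes that the Virus-mPLoc and TMHMM predictions have already been collected by hand into simple records; the class, the location strings and the accessions are hypothetical. The default limit of two transmembrane helices mirrors the cutoff applied in the results of this chapter and is exposed as a parameter rather than fixed.

```python
from dataclasses import dataclass

# Locations treated as surface-exposed or secreted (label strings are assumed).
SURFACE_LOCATIONS = frozenset({"host plasma membrane", "secreted"})


@dataclass(frozen=True)
class ProteinPrediction:
    accession: str
    locations: frozenset  # predicted subcellular locations
    tm_helices: int       # predicted number of transmembrane helices


def passes_localization_filter(pred, max_tm_helices=2):
    """Keep membrane or secreted proteins with at most max_tm_helices helices."""
    surface_exposed = not SURFACE_LOCATIONS.isdisjoint(pred.locations)
    return surface_exposed and pred.tm_helices <= max_tm_helices


# Hypothetical records for illustration only.
EXAMPLES = [
    ProteinPrediction("protein_A", frozenset({"host plasma membrane"}), 1),
    ProteinPrediction(
        "protein_B",
        frozenset({"host plasma membrane", "host endoplasmic reticulum"}),
        3,
    ),
    ProteinPrediction("protein_C", frozenset({"host cytoplasm"}), 0),
]

kept = [p.accession for p in EXAMPLES if passes_localization_filter(p)]

if __name__ == "__main__":
    print(kept)
```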

15.4.4 Signal Peptides

Signal peptides are known to influence immune responses and to possess high epitope densities. Moreover, most of the known vaccine candidates also possess signal peptides. Hence, it is worthwhile to predict signal peptides in the proteins prior to epitope prediction. The Signal-BLAST web server was used to predict the signal peptides (http://sigpep.services.came.sbg.ac.at/signalblast.html; Frank and Sippl 2008). The prediction options include best sensitivity, balanced prediction, best specificity and detect cleavage site only. We made predictions using each option, and the proteins predicted to contain a signal peptide by all four options were preferred for further investigation.
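The unanimity rule described above amounts to a simple consensus over the four prediction options. A minimal sketch, assuming the server results have been transcribed by hand into a dictionary of yes/no calls (the function name, data layout and accessions are hypothetical):

```python
# The four Signal-BLAST prediction options named in the text.
MODES = (
    "best sensitivity",
    "balanced prediction",
    "best specificity",
    "detect cleavage site only",
)


def unanimous_signal_peptides(calls):
    """Return accessions predicted to carry a signal peptide in all four modes.

    calls maps accession -> {mode: bool}; a missing mode counts as negative.
    """
    return sorted(
        accession
        for accession, by_mode in calls.items()
        if all(by_mode.get(mode, False) for mode in MODES)
    )


# Hypothetical calls for illustration only.
EXAMPLE_CALLS = {
    "protein_A": {mode: True for mode in MODES},
    "protein_B": {
        "best sensitivity": True,
        "balanced prediction": True,
        "best specificity": False,
        "detect cleavage site only": True,
    },
    "protein_C": {"best sensitivity": True},
}

if __name__ == "__main__":
    print(unanimous_signal_peptides(EXAMPLE_CALLS))
```

Requiring agreement across all options trades sensitivity for specificity, so a protein with a genuine signal peptide may be missed if any single option fails to detect it.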

15.4.5 Adhesion Probability

The most appropriate vaccine targets are those with adhesin-like properties, because such proteins not only mediate the adhesion of the pathogen to host cells but also facilitate transmission of the virus. Adhesins are known to be crucial for virulence and are located on the surface, which makes them readily accessible to antibodies. The stand-alone version of SPAAN, which has a reported sensitivity of 89% and specificity of 100%, was used to predict adhesion probabilities, and the proteins with an adhesion probability of 0.4 or higher were selected (Sachdeva et al. 2004).

15.4.6 BetaWrap Motifs

BetaWrap motifs are common in the virulence factors of pathogens. Proteins predicted to possess such motifs are therefore suitable candidates for reverse vaccinology studies. The BetaWrap server is the only online web server that makes such predictions. Proteins with a P-value lower than 0.1 were considered to contain BetaWrap motifs (http://groups.csail.mit.edu/cb/betawrap/betawrap.html; Bradley et al. 2001).

15.4.7 Antigenicity Predictions

To further assess the antigenic potential of the proteins, they were submitted to the VaxiJen server, version 2.0. VaxiJen is essentially an empirical method for identifying antigenic proteins, so proteins that are not found to be antigenic by other sequence-based methods can still be identified with it. This step confirms the antigenicity of the proteins selected in the preceding steps (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html; Doytchinova and Flower 2007).
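The three score-based checks (adhesion probability, BetaWrap P-value and VaxiJen score) can be summarized per protein as in the sketch below. It assumes the scores have been copied from the respective tools into a dictionary; the key names and example values are hypothetical. The SPAAN and BetaWrap cutoffs are those stated in the text, and the VaxiJen cutoff of 0.4 is the one quoted in the results section. The sketch only annotates each protein and deliberately does not combine the flags into a single decision, since the chapter applies them at different stages.

```python
# Cutoffs taken from the text of this chapter.
SPAAN_MIN_PROBABILITY = 0.4   # adhesion probability >= 0.4
BETAWRAP_MAX_P_VALUE = 0.1    # P-value < 0.1
VAXIJEN_MIN_SCORE = 0.4       # prediction value > 0.4


def annotate(scores):
    """Return boolean flags for one protein from its raw tool scores."""
    return {
        "adhesin": scores["spaan"] >= SPAAN_MIN_PROBABILITY,
        "betawrap": scores["betawrap_p"] < BETAWRAP_MAX_P_VALUE,
        "antigenic": scores["vaxijen"] > VAXIJEN_MIN_SCORE,
    }


# Hypothetical scores for illustration only.
EXAMPLE_SCORES = {
    "protein_A": {"spaan": 0.62, "betawrap_p": 0.03, "vaxijen": 0.51},
    "protein_B": {"spaan": 0.40, "betawrap_p": 0.45, "vaxijen": 0.40},
}

if __name__ == "__main__":
    for accession, scores in EXAMPLE_SCORES.items():
        print(accession, annotate(scores))
```

Note that the boundary cases differ: an adhesion probability of exactly 0.4 passes, whereas a VaxiJen score of exactly 0.4 does not.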

15.4.8 Allergenicity Predictions

To be a probable vaccine candidate, a protein should not exhibit the characteristics of an allergen, because allergens trigger type-1 hypersensitivity reactions that cause allergy. To rule out this possibility, the proteins were also subjected to allergenicity predictions using Allertop (http://www.pharmfac.net/allertop; Dimitrov et al. 2014) and AlgPred tools (http://www.imtech.res.in/raghava/algpred/submission.html; Saha and Raghava 2006a, b).

15.4.9 Similarity with Host Proteins

To check whether the filtered proteins show any similarity to host proteins, standard Blastp (http://blast.ncbi.nlm.nih.gov/blast) searches were performed. Where sequence similarity exists, there is a risk of eliciting immune responses against the host’s own cells.

15.4.10 Epitope Mapping

Predicting the epitopes that bind to MHC class I is the decisive step of RV for making valid vaccine predictions. T-cell epitopes were first selected using IEDB (http://tools.immuneepitope.org/mhci/), ProPred-I (http://www.imtech.res.in/raghava/propred1/; Singh and Raghava 2003), BIMAS (http://www-bimas.cit.nih.gov/molbio/hla_bind/; Parker et al. 1994) and NetCTL tools (http://www.cbs.dtu.dk/services/NetCTL/; Larsen et al. 2005). To be included in the hit list, an epitope had to be predicted by at least three of these four tools. B-cell epitopes were predicted using BepiPred (http://www.cbs.dtu.dk/services/BepiPred/; Larsen et al. 2006) and ABCPred tools (http://www.imtech.res.in/raghava/abcpred/ABC_submission.html; Saha and Raghava 2006a, b). The overlapping B-cell and T-cell epitopes were then identified.
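The selection logic of this step can be sketched in two parts: a vote across the four T-cell tools, followed by a positional overlap test against the B-cell epitopes. The sketch assumes that each tool’s output has been reduced to a set of peptide strings, and it treats two epitopes as overlapping when they share at least one residue position in the protein; the chapter does not state the exact overlap criterion, so this definition is an assumption. The function names, the toy protein and the peptides are hypothetical.

```python
from collections import Counter


def consensus_epitopes(predictions, min_tools=3):
    """Peptides predicted by at least min_tools of the supplied tools.

    predictions maps tool name -> iterable of predicted peptide strings.
    """
    votes = Counter()
    for peptides in predictions.values():
        votes.update(set(peptides))  # each tool contributes one vote per peptide
    return {peptide for peptide, n in votes.items() if n >= min_tools}


def spans(protein, peptide):
    """All 0-based, half-open (start, end) intervals where peptide occurs."""
    found = []
    start = protein.find(peptide)
    while start != -1:
        found.append((start, start + len(peptide)))
        start = protein.find(peptide, start + 1)
    return found


def overlapping(protein, t_epitopes, b_epitopes):
    """T-cell epitopes sharing at least one position with any B-cell epitope."""
    b_spans = [s for b in b_epitopes for s in spans(protein, b)]
    result = set()
    for t in t_epitopes:
        for t_start, t_end in spans(protein, t):
            if any(t_start < b_end and b_start < t_end for b_start, b_end in b_spans):
                result.add(t)
    return result


# Hypothetical toy data for illustration only.
PROTEIN = "MKTAYIAKQRQISFVKSHFSRQ"
T_CELL = {
    "IEDB": {"TAYIAKQRQ", "QISFVKSHF"},
    "ProPred-I": {"TAYIAKQRQ"},
    "BIMAS": {"TAYIAKQRQ", "QISFVKSHF"},
    "NetCTL": {"AYIAKQRQI"},
}
B_CELL = {"KQRQISFVKSHF"}

if __name__ == "__main__":
    hits = consensus_epitopes(T_CELL)
    print(hits, overlapping(PROTEIN, hits, B_CELL))
```

A stricter criterion, for example requiring the T-cell epitope to lie entirely within a B-cell epitope, would only change the comparison inside `overlapping`.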

15.4.11 Docking of the Predicted Epitopes with HLA-A*0201

The predicted epitopes were docked with the receptor HLA-A*0201 using ClusPro (http://cluspro.bu.edu/login.php; Kozakov et al. 2017), an automated protein-protein docking web server. Information on the conserved residues of the receptor site was obtained from the literature. The default parameters were used for docking (Comeau et al. 2004a, b; Kozakov et al. 2006).

15.5 Results and Discussion

15.5.1 Retrieval of Proteome from NCBI

A total of 40 different sequenced strains of coronavirus are available at NCBI, of which 7 are pathogenic to humans. Information on the source, host and collection of these strains is presented in Tables 15.1 and 15.2. This information can be obtained from NCBI’s genome database, the Virus Pathogen Database and Analysis Resource and the Genomes OnLine Database (Liolios et al. 2006; Pickett et al. 2012). The MERS strain was taken as the seed genome because it is the most prevalent and most severe of these strains. Its proteome consists of 11 proteins, as shown in Table 15.3. The results of the Blastp sequence similarity searches used to identify orthologs are shown in Table 15.4. Sequences with greater than 30% identity were considered homologs. The phylogenetic tree is depicted in Fig. 15.1; MERS-CoV, the seed genome, was found to cluster with different bat coronaviruses.

Table 15.1 Information of coronavirus strains available at NCBI
Table 15.2 Detailed information about the seven strains of coronavirus that are pathogenic to humans
Table 15.3 Information of MERS-CoV (NC_019843.3) proteins
Table 15.4 Results of homology searches of the proteins of the seed genome using Blastp
Fig. 15.1 Phylogenetic tree of 40 different strains of coronavirus based on whole genome sequences (the genome sequences were aligned using ClustalW, and the tree was created using the NJ method in the Unipro UGENE 1.15.1 bioinformatics toolkit)

15.5.2 Analysis of Secondary Structure

The results of the secondary structure analysis of the proteome using ExPASy tools are shown in Table 15.5. From the analysis of residue charge and pI values, it is concluded that six of the proteins are basic and positively charged, unlike allergens, which are acidic in nature. The other five proteins are acidic and carry a negative charge. The negative GRAVY scores of five proteins indicate that they are hydrophilic, with the majority of their residues positioned towards the surface. For the remaining six proteins the GRAVY score is positive, which means that they are hydrophobic. Proteins with an instability index below 40 are considered more stable than those with higher values. All the proteins have a molecular weight below 110 kDa except three (YP_009047202.1, YP_009047203.1 and YP_009047204.1). This favours the lighter proteins as targets, since their low molecular weight makes them easier to purify. The protein YP_009047204.1 is reported to be a spike glycoprotein. It is acidic, with a prominent negative charge and a negative GRAVY score, which suggests that it is hydrophilic and present on the surface. However, the envelope protein YP_009047209.1 and the membrane protein YP_009047210.1 are basic and hydrophobic.

Table 15.5 Secondary structure analysis of MERS-CoV proteins

15.5.3 Subcellular Localization Predictions

Figure 15.2 depicts the subcellular localization of the proteins of the seed genome, i.e. MERS-CoV. One protein was predicted to be localized in the host cytoplasm, four in the host membrane, two in both the host cell membrane and the endoplasmic reticulum (ER), and two in the ER only; two remained unrecognized. The known spike protein was predicted to be localized in the host ER. On the basis of these results, we selected the proteins located in the host membrane and those predicted to be localized in both the host membrane and the ER. The latter two are known from the literature to be the envelope protein and the membrane protein, and the known spike protein was also included in the filtered results. Of the filtered proteins, only two (YP_009047210.1 and YP_009047208.1) contain more than two transmembrane helices and were therefore filtered out. The results of the transmembrane helix predictions are tabulated in Table 15.6. Figure 15.3 depicts the subcellular localization of the proteins of all four selected genomes as predicted by the Virus-mPLoc tool.

Fig. 15.2 Subcellular localization of seed genome proteins predicted using Virus-mPLoc

Table 15.6 Subcellular localization prediction results using Virus-mPLoc
Fig. 15.3 Subcellular localization of proteins of all four selected genomes predicted using Virus-mPLoc

15.5.4 Signal Peptides

The proteins predicted by the Signal-BLAST web server to possess signal peptides are YP_009047204.1 and YP_009047205.1. The Signal-BLAST results are tabulated in Table 15.7.

Table 15.7 The signal peptide prediction results for proteins of MERS coronavirus strain

15.5.5 Adhesion Probability

This step takes into account the concept of adhesin-based virulence. Adhesins lead to pathogen recognition and the initiation of inflammatory responses by the host. SPAAN predicted 2 (YP_009047204.1 and YP_009047205.1) of the 11 proteins of the MERS strain to be adhesive (Table 15.8).

Table 15.8 Prediction results for the selection of adhesion proteins using SPAAN, BetaWrap predictions and antigenicity predictions using VaxiJen version 2.0

15.5.6 BetaWrap

Only one protein (YP_009047204.1) was predicted to contain BetaWrap motifs (Table 15.8). Hence, it is considered virulent and might be responsible for initiating the infection in the host.

15.5.7 VaxiJen 2.0

A total of 9 of the 11 proteins of the MERS strain were predicted to be antigenic (prediction values greater than 0.4). The proteins with accession numbers YP_009047206.1 and YP_009047208.1 were among the filtered proteins but were not predicted to be antigenic and were therefore filtered out. As a result, only four proteins (YP_009047204.1, YP_009047205.1, YP_009047207.1 and YP_009047209.1) were kept for further analyses.

15.5.8 AlgPred and Allertop

None of the 11 proteins of MERS-CoV showed any sign of allergenicity in the prediction results from the AlgPred and Allertop tools, which suggests that epitopes from these proteins are unlikely to trigger allergic responses if adopted as vaccine candidates.

15.5.9 Similarity with Host Proteome

None of the proteins of the MERS strain showed similarity to host proteins, which suggests that epitopes from these proteins can elicit the required immune response without the hazard of autoimmunity.

15.5.10 Epitope Mapping

In total, 12 different 9-mer epitopes with the potential to bind the receptors of both B-cells and T-cells were predicted. They are listed in Table 15.9 and are specific to the MERS-CoV strain: none of them was conserved in the proteins of other strains, whether pathogenic to humans or not.

Table 15.9 The results of overlapping T-cell and B-cell epitope predictions for four filtered proteins

15.5.11 Docking Analysis

Docking estimates the binding energy, and hence the strength of interaction, between the epitopes and the receptor in an appropriate orientation. The ClusPro docking server was used to dock the predicted 90 epitopes against HLA-A*0201. The structure of the receptor was available from PDB and was prepared before docking by removing the complexed self-peptide (4U6Y, resolution 1.47 Å, Bouvier et al. 1998). PEPstr (Peptide Tertiary Structure Prediction Server; Kaur et al. 2007) was used to derive the tertiary structure of the predicted peptides.

Figure 15.4 depicts the quaternary structure of the receptor HLA-A*0201 with its conserved active site, which is known to form complexes with peptides (Bouvier et al. 1998). The binding energies obtained from the docking analysis are listed in Table 15.9.

Fig. 15.4 3D structure of the receptor site of HLA-A*0201 visualized using Swiss PDB viewer 4.10. The residues shown as globular structures are known to be conserved and to form hydrogen bonds with the binding peptides

The 9-mer epitope VVCAITLLV at site 21 of protein YP_009047209.1 docked to the receptor with the lowest binding energy (−951.7) and formed 12 hydrogen bonds. The next epitope in the list, TLLVCMAFL at site 27, was from the same protein, YP_009047209.1. The predicted structures of the top five epitopes ranked by docking energy, together with snapshots of the docking results, are displayed in Figs. 15.5, 15.6, 15.7, 15.8 and 15.9.

Fig. 15.5 (a) 3D structure of the 9-mer epitope starting at position 21 (VVCAITLLV) of protein YP_009047209.1. (b) Docking results of epitope “VVCAITLLV” with the A chain of HLA-A*0201 using ClusPro. (c) Snapshot of the epitope docked in the pocket of the molecular surface of the receptor (all structures are visualized using Chimera 1.10.1)

Fig. 15.6 (a) 3D structure of the 9-mer epitope starting at position 27 (TLLVCMAFL) of protein YP_009047209.1. (b) Docking results of epitope “TLLVCMAFL” with the A chain of HLA-A*0201 using ClusPro. (c) Snapshot of the epitope docked in the pocket of the molecular surface of the receptor (all structures are visualized using Chimera 1.10.1)

Fig. 15.7 (a) 3D structure of the 9-mer epitope starting at position 716 (GLVNSSLFV) of protein YP_009047204.1. (b) Docking results of epitope “GLVNSSLFV” with the A chain of HLA-A*0201 using ClusPro. (c) Snapshot of the epitope docked in the pocket of the molecular surface of the receptor (all structures are visualized using Chimera 1.10.1)

Fig. 15.8 (a) 3D structure of the 9-mer epitope starting at position 18 (YVDVGPDSV) of protein YP_009047204.1. (b) Docking results of epitope “YVDVGPDSV” with the A chain of HLA-A*0201 using ClusPro. (c) Snapshot of the epitope docked in the pocket of the molecular surface of the receptor (all structures are visualized using Chimera 1.10.1)

Fig. 15.9 (a) 3D structure of the 9-mer epitope starting at position 160 (KMGRFFNHT) of protein YP_009047204.1. (b) Docking results of epitope “KMGRFFNHT” with the A chain of HLA-A*0201 using ClusPro. (c) Snapshot of the epitope docked in the pocket of the molecular surface of the receptor (all structures are visualized using Chimera 1.10.1)

The chief obstacle to developing a safe vaccine against any virus is the identification of protective antigens. The present study applies the reverse vaccinology approach to a selection of coronavirus proteomes in order to identify possible vaccine targets. The technique proved to be an effective way to predict 12 different epitopes from the selected seed genome. These epitopes are from the spike glycoprotein, NS3 protein, NS4B protein and envelope protein. Unfortunately, none of the epitopes was found to be conserved in other strains; all are specific to MERS-CoV. The docking analysis indicated favourable binding between the HLA-A*0201 receptor and the epitopes, and the conserved residues of the receptor site are involved in hydrogen bonding with epitope residues. The selected antigenic epitopes must still be validated in in vitro and in vivo studies to confirm their potential as vaccine candidates.