Introduction

A recent study suggests that coronaviruses (CoVs) and bats have co-evolved for millions of years but CoVs seldom jumped across to infect humans (Joffrin et al. 2020). They were considered a comparatively harmless pathogen causing mild respiratory illness in humans (Song et al. 2019). But in the last two decades after the incidences of SARS infection in 2003 and MERS infection in 2013, the understanding regarding CoVs has changed. SARS-CoV-2 has been reported to target pneumocytes which cause severe respiratory distress characterized by bronchiolitis, bronchitis and pneumonia (Peele et al. 2020)

SARS-CoV-2 like SARS and MERS belongs to the Coronaviridae family. The presence of spikes on the outer surface of CoVs has a resemblance to the crown and hence the name corona (in Latin corona means crown) (Pandey et al. 2020). SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA beta-coronavirus with 26–32 thousand base pairs (de Wilde et al. 2018). Its genome encodes both structural like E (envelop) protein, N (nucleocapsid) protein, M (membrane) protein, S (spike) protein and non-structural proteins (nsp) like proteases (Mousavizadeh and Ghasemi 2020). Out of all the four structural proteins, the S protein plays a prominent role in the binding of the SARS-CoV-2 to the host target receptors, especially through the ACE-2 (angiotensin-converting enzyme 2) receptor (Wrapp et al. 2020; Zhou et al. 2020). Cleavage of S protein by furin or other proteases has been reported in many of the avian and mammalian CoVs. The cleavage of S results in S1 which bears the receptor-binding domain (RBD) and S2 responsible for fusion (Wrapp et al. 2020).

Epitope-based vaccine approach has been employed to combat several diseases like a solid tumours (Knutson et al. 2001), malaria (Wang et al. 2004) and multiple sclerosis (Bourdette et al. 2005). Recently, many authors have reported epitope-based vaccine candidates for the mitigation of Covid-19 (Oany et al. 2014; Abdelmageed et al. 2020; Enayatkhani et al. 2020; Mukherjee et al. 2020; Naz et al. 2020; Abraham et al. 2020). Yang et al. have combined immunoinformatic methods with the deep learning algorithm to directly predict potential vaccine subunits from the S protein sequence of SARS-CoV-2 and constructed a multi-epitope vaccine for the SARS-CoV-2 virus (Yang et al. 2020b). Similarly, Singh et al. have designed multi-epitope vaccine peptides by targeting four different structural proteins (Spike, Envelop, Membrane and Nucleocapsid) of SARS-CoV-2 (Singh et al. 2020). Lin et al. have also employed several immunoinformatic tools to design a multi-epitope vaccine by targeting three different structural proteins of SARS-CoV-2 (Lin et al. 2020).

Employing modern computational tools, several compounds inhibiting different targets of SARS-CoV-2 have also been reported. Singh et al. have reported some 1,2,3-triazole scaffolds as potential main-protease (Mpro) inhibitors by using steered MD simulation studies (Singh et al. 2022). Employing in-silico studies, dicaffeoylquinic acid has also been reported as a potential S-protein and human ACE2 fusion inhibitor (Singh et al. 2021). Although several compounds have been identified and biologically evaluated for potential anti-SARS-CoV-2 activity, still it will take a long time for a new anti-COVID drug to come into the market. Computational studies help to reduce the initial timeline of drug discovery i.e., during discovery and pre-clinical stages but the clinical trials are still a lengthy stage. But in a short span of 2–3 years, several anti-COVID vaccines have been developed using various platforms and they have proved to be quite successful in containing the spread and severity associated with SARS-CoV-2. Moreover, the lesson learnt to develop a vaccine against this strain of the virus might come in handy in developing vaccines for different strains or different viruses in future.

In a short span of 2 years from the onset of Covid-19, several vaccines have been developed and used worldwide. Till now, ten vaccines have been approved for emergency usage by WHO (world health organization). For the first time novel vaccine platforms like RNA-based vaccines have been developed against a virus. A list of all the approved vaccines, their manufacturer and platform employed has been summarised in Table 1.

Table 1 Vaccines that have been approved by WHO for emergency use

In the present study, we have targeted the S protein of SARS-CoV-2 along with all the seven known strains of human coronaviruses (HCoVs) i.e., both alpha (HCoV-NL63 and HCoV-229E) and beta coronaviruses (HCoV-OC43, HCoV-HKU1, MERS, SARS-CoV and SARS-CoV-2). As S protein is a common structural protein among the HCoVs, hence, we have taken up the S protein sequence to screen for potential epitopes. The objective of this study was to develop a broad-spectrum vaccine candidate capable of conferring immunity against all the seven known strains of HCoVs. Our hypothesis of identifying an epitope capable of providing immunity against various CoVs is also supported by the recent work of Mateus et al., where they have identified cross-reactive T-cell epitopes of SARS-CoV-2 in unexposed humans. They have concluded that pre-existing immunity could be derived from exposure to other HCoVs like HCoV-OC43, HCoV-229E, HCoV-NL63, or HCoV-HKU1, which cause the common cold (Mateus et al. 2020). Yang et al. have reported that a vaccine candidate targeted towards the RBD of the S protein of SARS-CoV-2 led to the induction of protective immunity (Yang et al. 2020a). A flowchart summarizing the methodology and tools employed for the current work has been depicted in Fig. S2.

Materials and methods

Computational configuration of the in-silico studies

All the immunoinformatic exercises were carried out using free web-based tools. For cheminformatics experiments like molecular docking and molecular dynamics simulations were carried out by using Maestro (Release 2018–3) which is a commercial small-molecule drug discovery suite from Schrodinger Inc. (USA) on an HP computer with Linux Ubuntu 18.04.1 LTS operating system.

Retrieval of S protein sequences

For the present study, we have taken the sequences of S protein from seven strains of coronaviruses (CoVs): (i) NL63 (ii) 229E (iii) OC43 (iv) HKU1 (v) MERS (vi) SARS-CoV and (vii) SARS-CoV-2. The sequences were retrieved in the FASTA format from the UniProtKB database (Apweiler et al.).

Prediction of T-cell epitopes

Herein, we have predicted 9-mer CTL epitopes for each of the HCoV strains employing the NetMHCpan4.1 web server which is also recommended by IEDB (Immune Epitope Database) (Reynisson et al. 2020). The NetMHCpan-4.1 server employs artificial neural networks (ANNs) to predict the binding of epitopes to any MHC (major histocompatibility complex) molecule of known sequence. Its prediction method has been developed using a combined training set of > 850,000 quantitative BA (Binding Affinity) and Mass-Spectrometry EL (Eluted Ligands) peptides (Reynisson et al. 2020). In the present study, the HLA-A*24:02 supertype of the human leukocyte antigen (HLA) allele has been employed for all the predictions. The rationale behind selecting this supertype allele was the availability of a high-resolution X-ray crystal structure (PDB ID 3I6L) of HLA-A*24:02 in a complex with an epitope N1 derived from SARS-CoV N protein. The other reason behind the selection of this supertype was that according to Lu et al., one of the most common HLA-A alleles found in the Asian population is HLA-A*24:02 (Lu et al. 2005).

The best two 9-mer peptides from each HCoV strain were selected based on their binding affinity (nM) and percentile rank. Thus, a total of 14 CTL epitopes were selected for further study. But we wanted to refine our selection of CTL epitopes further and hence these epitopes were put for epitope conservancy analysis by employing the IEDB tool for the epitope conservancy analysis (Bui et al. 2007). For the present work, linear epitope conservancy analysis was done with a sequence identity threshold of 100%. The epitopes which were conserved in all the strains (reflected by their minimum sequence identity) were taken up for further analysis. The higher the value of minimum percentage identity, the higher the chances of those epitopes being conserved among all the strains. Further, these 14 epitopes were put for molecular docking studies using peptide docking under the Biologics suite of Maestro (Release 2018–3). The X-ray crystal structure of HLA-A*24:02 protein (PDB ID 3I6L) was retrieved from the protein data bank (PDB) (Berman et al. 2000; Liu et al. 2010). The retrieved protein structure was prepared using the protein preparation wizard of Maestro and the optimised protein structure was energetically minimized by employing the OPLS3e force field (Sastry et al. 2013; Kumar et al. 2019). The epitope sequences were imported into the workspace and using peptide builder peptides were generated. The generated peptides were energetically minimized and docked against the receptor grid generated at the binding site of epitope N1 derived from SARS-CoV N protein on the HLA-A*24:02 protein. The number of poses to return for each docking run was fixed at 10 as peptides are very flexible hence, several docking runs were used for each peptide. Each docking run employed a different conformation of the epitope to enhance the sampling (Tubert-Brohman et al. 2013). The MM-GBSA (molecular mechanics generalized Born model and solvent accessibility) was used as the scoring method which is more accurate than the Glide score method but less fast (Li et al. 2011).

The epitopes were ranked based on the docking score and the highest-ranked CTL epitope was further put for molecular dynamics (MD) simulations analysis by using the Desmond module of the Maestro suite (Bowers et al. 2006). The docked complex of the selected epitopes with the HLA-A*24:02 protein (PDB ID 3I6L) was first put for the system builder where the Na+ and Cl were added to neutralize the charges. A total of 9 Na+ ions were added. SPC (simple point charge) was used as the solvent system in an orthorhombic boundary box shape. The solvated system of epitope and the HLA-A*24:02 protein was put for minimization by using the SD (steepest descent) method. The convergence threshold was fixed at 1 kcal/mol/Å with a maximum iteration of 2000. A slow relaxation protocol was followed for equilibration at a temperature of 300 K and pressure of 1.01325 bar. Nose–Hoover thermostat was used to maintain the temperature while Martina–Tobias–Klein was the barostat employed. The minimized protein-epitope complex was simulated for 50 ns to analyse the stability of the epitope-protein complex under biological conditions. Based on the above methodologies i.e., percentile rank, binding affinity, minimum percentage conservancy and docking score, a total of ten CTL epitopes were selected for the design of vaccine candidate.

Prediction of linear B-cell epitope and helper T-cell (HTL) epitopes

B-cell epitopes are a specific part of the antigen recognized by the B lymphocytes of our immune system. For the present work, the BepiPred-2.0 webserver has been employed to predict linear B-cell epitopes of 14 sequence lengths for all seven HCoV strains. BepiPred-2.0 is superior to other sequence-based linear epitope prediction tools and employs RF (random forest) algorithm trained on peptides annotated from antigen–antibody protein complexes (Jespersen et al. 2017). HTL epitopes of 15 sequence lengths were predicted using the NetMHCII 2.2 Server for all seven strains (Reynisson et al. 2020). The NetMHCII 2.2 server employs artificial neuron networks to predict the binding of epitopes to HLA-DQ, HLA-DP, and HLA-DR alleles (Nielsen and Lund 2009).

Multi-epitope vaccine candidate construction and its conservancy analysis

The multi-epitope vaccine candidate was designed by linking different epitopes and adjuvants. CTLs epitopes were linked through an AAY linker while HTL and B-cell epitope was linked using a GPGPG linker. At the N-terminal, a 50 S ribosomal protein L7/L12 (Locus RL7_MYCTU) was added as an adjuvant (Accession no. P9WHE3) through an EAAAK linker. The adjuvant sequence was retrieved from the UniProt database (Bateman et al. 2017). At the carboxy-terminal end, a 6x-His tail was also added to generate a final vaccine construct. The final designed vaccine peptide was put for conservancy analysis without the adjuvant across the seven strains of the selected S-protein sequences of the HCoVs (Bui et al. 2007).

Prediction of IFN-gamma-inducing epitope

Cytokines like Interferon-gamma (IFN-γ) play a crucial role in the immune responses by stimulating NK (natural killer) cells and macrophages and providing an enhanced response to MHC antigen. Herein we have employed the IFNepitope server to predict 15-mer IFN-γ epitopes for the designed vaccine candidate. IFN- γ epitopes were predicted separately for the adjuvant and the main vaccine peptide due to the limitation on the number of sequences that can be used as input in the server. The server employs a support vector machine (SVM) hybrid approach for the prediction of IFN-γ epitopes (Dhanda et al. 2013).

Antigenicity, allergenicity and assessment of physicochemical properties of the vaccine construct

Antigenicity prediction for the vaccine construct was carried out using ANTIGENpro (Magnan et al. 2010) and the VaxiJen2.0 server. Like ANTIGENpro the antigenicity prediction by VaxiJen v2.0 is alignment-free and employs the physiochemical parameters of the protein (Doytchinova and Flower 2007). Allergenicity of the designed vaccine construct was predicted using AllerCatPro and AllergenFP servers. AllerCatPro predicts the allergenicity of the query proteins by comparing their 3D structure along with their amino acid sequence with a dataset of 4180 unique protein sequences which have been reported to be allergenic (Maurer-Stroh et al. 2019). To predict the allergenicity of proteins, AllergenFP employs a descriptor-based fingerprint approach and thus is an alignment-free tool (Dimitrov et al. 2014). The physiochemical parameters like theoretical pI, in vitro and in vivo half-life, amino acid composition, instability index, molecular weight, aliphatic index and GRAVY (grand average of hydropathicity) of the vaccine construct were analysed by the ProtParam web server (Gasteiger et al. 2005). The solubility of the vaccine construct was analysed by the Protein-Sol web server (Hebditch et al. 2017).

Secondary and tertiary structure prediction of the vaccine construct

The secondary structure of the designed vaccine construct was predicted by employing PSIPRED and RaptorX Property web servers. PSIPRED 4.0 performed Position-specific iterated BLAST (Psi-Blast), for the identification of sequences that showed considerable homology to the vaccine peptide (McGuffin et al. 2000). These sequences were selected to build a position-specific scoring matrix. The RaptorX Property web server employs a template-free approach to predict the secondary structure properties of the query protein. Its algorithm is based on DeepCNF (Deep Convolutional Neural Fields) to simultaneously predict secondary structure (SS), disorder regions (DISO) and solvent accessibility (ACC) (Wang et al. 2016). The tertiary structure of the designed multi-epitope vaccine candidate was predicted by the I-TASSER server webserver (Yang and Zhang 2015). The I-TASSER (Iterative Threading Assembly Refinement) server is an integrated platform to predict 3D structure from a peptide sequence. It generates 3D atomic models from multiple threading alignments and iterative structural assembly simulations and has been recognized as the best server for the prediction of protein structure (Roy et al. 2010). The best 3D-model for the vaccine construct obtained from the I-TASSER webserver was refined first by the ModRefiner and then by using the GalaxyRefine server. The refinement of protein structures by the ModRefiner server is based on a two-step, atomic-level energy minimization which leads to improvements in both local and global structures (Xu and Zhang 2011). GalaxyRefine method first rebuilds side chains and then carries out side-chain repacking and finally employs MD simulation for overall structural relaxation (Heo et al. 2013). Finally, the refined 3D model of the vaccine construct was validated using ProSA-web (Wiederstein and Sippl 2007) and ERRAT server (Colovos and Yeates 1993). Ramachandran plot was obtained from the protein structure quality tool of the Prime module in Maestro11.4.

Protein–protein docking studies

Molecular docking was carried out to understand the binding of the vaccine candidate with the TLR-3 receptor as the immune response depends upon the interaction between an antigenic molecule (here the designed vaccine candidate) and an immune receptor (in this case TLR-3). The first step in this milieu was the prediction of the binding sites of the proteins which was carried out using the CASTp server (Binkowski et al. 2003). Molecular docking of the vaccine candidate with the TLR3 (PDB ID: 1ZIW) (Choe et al. 2005) receptor was carried out using the HADDOCK 2.4 webserver (Van Zundert et al. 2016) and GRAMM-X Simulation webserver (Tovchigrechko and Vakser 2006). The interactions were visualized using LIGPLOT v.4.5.3 (Wallace et al. 1995). Finally, the binding energy of the top-ranked docking pose of the vaccine candidate-TLR4 complex was predicted using the PRODIGY webserver (Xue et al. 2016).

Prediction of discontinuous B-cell epitopes

It has been reported that more than 90% of B-cell epitopes are discontinuous, i.e. they comprise distantly separated segments in the pathogen protein sequence which are brought close by the folding of the protein (Barlow et al. 1986). Herein, we employed ElliPro for the prediction of discontinuous B-cell epitopes of the validated 3D structure of the vaccine candidate (Ponomarenko et al. 2008).

In-silico cloning and codon optimization of the vaccine construct

Java Codon Adaptation Tool (JCat) server was employed for reverse translation and codon optimization so that the designed multi-epitope vaccine candidate could be expressed in a selected expression vector (Grote et al. 2005). Codon optimization was carried out to express the final vaccine candidate in the E. coli (strain K12) host. The output of the JCat server comprises the codon adaptation index (CAI) and percentage GC content. CAI score reflects codon usage biases and ideally, the value of CAI should be 1.0 but a score above 0.8 is a good score (Grote et al. 2005). The gene sequences of the designed vaccine candidate were optimized, and two restriction sites (Nde I and Xho) were introduced at the C and N-terminals of the sequence. In the last step, the optimized gene sequence of the vaccine construct along with the inserted restriction sites was inserted into the pET-28a ( +) vector using the SnapGene programme.

Immune simulation

In-silico immune simulations were carried out for further characterization of the immunogenicity and immune response profile of the designed multi-epitope vaccine candidate. Herein, we employed the C-ImmSim server for immune simulation studies which is an agent-based model that employs a position-specific scoring matrix (PSSM) (Rapin et al. 2010). Three injections were given at four weeks intervals by keeping all simulation parameters at the default setting with time steps set at 1, 84, and 168 (each time step is 8 h and time step 1 is injection at time = 0).

Results and discussion

Prediction of T-cell epitopes

The goal of T-cell epitope prediction is to identify peptides with the shortest sequence length within an antigen that is capable of stimulating either CD4 or CD8 T-cells (Ahmed and Maeurer 2009). Antigen-presenting cells (APCs) present T-cell epitopes on the surface where they bind to either MHC class I protein or MHC-II proteins. It has been reported that MHC-I protein presents T-cell epitopes of 8–11 amino acid residue length while MHC-II presents longer peptides in the range of 13–17 amino acid residues (Steers et al. 2014). T-cell epitopes bound to MHC I protein are recognized by CD8 T-cells which become CTL (cytotoxic T lymphocytes), while those presented by MHC-II are recognized by CD4 T-cells and become helper T-cells (Sanchez-Trincado et al. 2017). Many authors have reported these tools for CTL epitope prediction not only against SARS-CoV-2 but also against SARS-CoV (Oany et al. 2014; Enayatkhani et al. 2020; Naz et al. 2020; Abraham et al. 2020; Panda et al. 2020). But this is the first such analysis involving seven strains of the HCoVs. Our present hypothesis to screen the S protein sequences of all the seven reported HCoVs for identifying CTL epitopes as a potential vaccine candidate against SARS-CoV-2 is also supported by the recent findings of Barun et al., who have demonstrated the presence of S-cross-reactive T cells in unexposed healthy individuals probably because of previous exposure to endemic coronaviruses like 229E or OC43 (Braun et al. 2020).

A total of 142 CTL (9-mer) epitopes were predicted to be strong binders for the seven selected S proteins of different HCoVs using the NetMHCpan4.1 server with default settings. We selected the best two epitopes based on their percentile rank, binding affinity (nM) and binding level for each of the HCoV strains. Thus, a total of fourteen 9-mer CTL epitopes were selected for further study. The conservancy analysis of these fourteen CTL epitopes showed that the QYIKWPWYI epitope from the SARS-CoV-2 strain had a minimum conservancy of 66.67% across all the strains. Another epitope with sequence YYNKWPWYI from the MERS strain showed a minimum conservancy of 55.56% across all the strains. Hence, these two CTL epitopes were selected for the final vaccine design. Peptide docking analysis of these epitopes with HLA-A*24:02 protein showed that these peptides showed favourable binding with the protein. But we selected only one CTL epitope (NYNYLYRLF) which showed the highest docking score of -12.759 kcal/mol. MD simulations study for this epitope showed that the epitope-protein complex was stable throughout the simulations period as reflected by their RMSD (root mean square deviation) plot in Fig. 1. A histogram representing the interaction of the epitopes with the amino acid residues of the protein was also plotted as shown in Fig. 2. The non-bonding interactions were similar to the ones observed during peptide docking studies, but water bridges were formed with LYS66, HIS70, ASP74, TYR84, THR143 and GLN155 residues. A 2D interaction diagram (Fig. 3) suggested that apart from intermolecular H-bond interactions, there were quite several intramolecular H-bond interactions that could make the epitope-protein complex more stable. The importance of intramolecular H-bonding for the epitopes binding to HLA-A*24:02 protein has also been highlighted by Liu et al. (Liu et al. 2010). Several other authors like Abraham et al. and Enayatkhani et al. have also employed MD simulation studies to analyse the stability of the designed epitopes and protein complex (Enayatkhani et al. 2020; Abraham et al. 2020). Hence, for the final vaccine design, a total of ten CTL epitopes (Table 2) were selected out of which seven were the best-ranked epitope from each strain, and two were the best-conserved epitope across all the strains and one was the best-docked epitope with HLA-A*24:02 protein.

Fig. 1
figure 1

RMSD plot of CTL-epitope 2 in complex with HLA-A*24:02 protein, where blue colour shows protein RMSD while red colour shows ligand RMSD

Fig. 2
figure 2

Histogram of the CTL-epitope contact with HLA-A*24:02 protein where green colour shows hydrogen bonding, grey colour shows hydrophobic interactions, blue colour shows water bridges, and pink colour shows ionic interactions

Fig. 3
figure 3

A schematic of detailed ligand atom interaction of CTL-epitope with HLA-A*24:02 protein (PDB ID 3I6L) in 2D where green colour shows hydrophobic interactions, blue colour shows polar interactions, pink arrows H-bond. Note: Interactions that occur more than 30.0% of the simulation time in the selected trajectory (0.00 through 50.02 ns), are shown. The percentage indicates the percent of simulation time for which contact was observed between the epitope and the receptor

Table 2 List of the selected CTL, HTL and linear B-cell epitopes from S protein sequences of different HCoV strains for constructing the multi-epitope vaccine candidate

Prediction of linear B-cell epitope and Helper T-cell (HTL) epitopes

A total of seven 14-mer B-cell epitopes (Table 2) were selected based on the epitope threshold and relative surface accessibility as predicted by the BepiPred2.0 server. A total of seven 15-mer high-binding MHC-II epitopes (Table 2) for human alleles HLA-DR, HLA-DQ and HLA-DP as predicted by the NetMHCII 2.2 webserver was selected. Some of the B-cell epitopes and HTL epitopes overlapped.

Multi-epitope vaccine candidate construction and its conservancy analysis

The final multi-epitope vaccine constructs comprised 10 CTL epitopes, 7 linear B-cell epitopes, and 7 HTL epitopes. The predicted CTL epitopes were joined by using an AAY linker while B-cell epitope and TL epitopes were fused with GPGPG linkers. The 50S ribosomal L7/L12 (Locus RL7_MYCTU) was selected as the adjuvant to accentuate antigen-specific immune responses and fused at the N-terminal by using the EAAAK linker. Finally, to help in protein identification and purification a 6xHis tail was inserted at the carboxy-terminal end of the vaccine peptide. Thus, the final multi-epitope vaccine candidate was constructed with a total of 532 amino acid residues derived from 24 merged multi-epitopes. The final designed vaccine peptide was put for conservancy analysis without the adjuvant across the seven strains of the selected S-protein sequences of the HCoV strains. The sequences of the vaccine candidate showed a minimum and maximum conservancy of 73.97% across all the strains of the HCoVs. A higher degree of conservation across all the HCoV strains might confer broader protection across multiple strains by recognition of the conserved epitopes by our immune system. Hence, we hypothesize that this vaccine construct which has more than 73% of sequences conserved across the seven HCoV strains might be a potential broad-spectrum vaccine candidate against HCoVs.

Prediction of IFN-gamma-inducing epitope

Due to the restriction on the number of residues that can be used as input in the IFNepitope server, IFN-γ inducing epitopes were predicted separately for the adjuvant and main vaccine peptide. For the adjuvant, a total of 127 potential IFN-γ inducing epitopes (15-mer) were predicted. For the main vaccine peptide, 389 potential 15-mer epitopes were predicted out of which 82 had a positive prediction score.

Antigenicity, allergenicity and assessment of physicochemical properties of the vaccine construct

As per the predictions of the VaxiJen 2.0 server, the antigenicity of the vaccine constructs along with the adjuvant sequence was found to be 0.7106 in a bacteria model at a threshold of 0.5 while the main vaccine sequence without the adjuvant showed a score of 0.8135. As per the predictions of the ANTIGENpro server the probable antigenicity of the vaccine candidate with the adjuvant was 0.829721 while without the adjuvant was 0.869507. This result indicates that the designed vaccine candidates are antigenic (with or without adjuvant) in nature. The main vaccine sequence seems to be more antigenic without the adjuvants as per the predicted scores. The allergenicity of the vaccine candidate was predicted by the AllerCatPro and AllergenFP servers and both predicted it to be non-allergenic. The molecular weight (MW) of the final vaccine candidate was predicted to be 56.5 kDa. The theoretical isoelectric point (pI) value was predicted to be 7.11. The half-life was predicted to be 30 h in mammalian reticulocytes (in vitro), > 10 h in E. coli and > 20 h in yeast (in vivo). The instability index (II) was computed to be 22.81 which classifies the vaccine candidate to be stable. It had an aliphatic index of 79.29 which suggests it to be thermostable (Ikai 1980). The GRAVY score for the vaccine candidate was computed to be − 0.407 which indicates that it could be hydrophilic (Ali et al. 2017). The predicted scaled solubility value was computed to be 0.432 which indicates that the vaccine construct would have an acceptable solubility profile.

Secondary and tertiary structure prediction of the vaccine construct

The secondary structure of the final multi-epitope vaccine peptide was computed, and it was predicted to have 20% alpha-helix, 20% beta-sheet and 59% coil (Fig. 4). The solvent accessibility of amino acid residues as predicted by the RaptorX property server showed that 49% of the total residues were exposed, 24% were medium exposed and 26% was buried. A total of 42 residues i.e., 7% of residues were computed to be in the disordered region. Five tertiary structure models of the final vaccine construct were predicted by the I-TASSER server based on 10 threading templates, of which PDB hits 1dd4A, 5tsjN, 1rquA, 2nbiA, 1dd3A, 2ocwA and 2ftc were the best hits. All of the 10 selected templates showed good alignment which was evident from their Z-score values which ranged from 1.02 to 5.35. The first model (Fig. 5a) was selected for further refining which had an estimated C-score of 0.77, TM score of 0.62 ± 0.14 and RMSD value of 9.2 ± 4.6 Å. The C-score range is between − 5 and 2, where higher values towards 2 indicate higher confidence. The TM-score is an indication of the structural similarity between two structures and unlike RMSD, it is not sensitive to local error (Zhang and Skolnick 2004). From the TM value of the selected model, we can infer that the model has the correct topology. This selected model was further put for refining first by ModRefiner and then by GalaxyRefine server which generated five models (Table 3). According to the model quality scores (Table 2) for all the five refined models, model 3 (Fig. 5b) was found to be the best model. It had a GDT-HA score of 0.9408, RMSD value of 0.443, MolProbity score of 2.072, clash score of 11.3 and a poor rotamers score was 0.2. The Ramachandran plot score for model 3 was predicted to be 91.5% which was less when it was put for validation by the Ramachandran plot analysis. The Ramachandran plot of the selected model 3 (Fig. 5c) suggests that it's 94.528% of the residues are in the favourable region with 2.64% of residues in the allowed region. The overall quality factor was computed by the ERRAT server and for the selected model 3, it was predicted to be 86.235. The ProSA-web computed the Z-score for model 3 and it was found to be − 2.41 (Fig. 5d).

Fig. 4
figure 4

Representation of the secondary structure of the vaccine sequence (showing strand, helix and coil) as predicted by the PSIPRED

Fig. 5
figure 5

Tertiary structure modelling, refining and validation. a The final homology 3D model of the vaccine construct as computed by the I-TASSER. b Refinement of the selected I-TASSER 3D model by GalaxyRefine where the refined 3D model (coloured) has been superimposed on the initial crude model (grey). c Ramachandran plot of the refined 3D model of the vaccine candidate d Plot by ProSA-web showing the Z-score

Table 3 Detailed result of the five models refined by the GalaxyRefine server

Protein–protein docking studies

The CASTp 3.0 server was employed to compute the protein binding and hydrophobic interaction sites on the surface of the TLR-3 receptor and the final refined 3D model of the vaccine construct. A total of 105 binding pockets were predicted for the vaccine candidate with different molecular surface areas and volumes. Similarly, for the TLR-3 protein (PDB ID 1ZIW), a total of 77 binding pockets were predicted. For the present work, a pocket (Fig. 6a) with a molecular surface area of 2125.000 Å2 and volume of 13,438.169 Å3 was selected for the TLR-3 receptor. For the vaccine candidate, the selected pocket (Fig. 6b) had a molecular surface area of 917.628 Å2 and a volume of 2414.204 Å3. Both the proteins were put for molecular docking simulations first using the GRAMM-X server which led to the generation of ten binding poses. The poses were visualised using the DIMPLOT of LIGPLUS version 4.5.3 and the best-docked pose of the vaccine-TLR-3 complex in 3D has been depicted in Fig. 7a and b. The non-bonding interactions between the docked protein complex were also visualized in 2D and have been shown in Fig. 8. It was observed that ARG306, TYR526, ANS521 and GLY46 residues of the vaccine construct showed hydrogen bonding with TRP273, ANS247, LEU243 and HIS432 residues of the TLR-3 protein respectively. The 2D interaction plot also showed a possible salt bridge between GLU307 and GLY46 of the vaccine construct with the LYS272 and SER481 of the TLR-3 protein. Further, the best-docked pose was put for the prediction of binding affinity (ΔG) and dissociation constant (Kd). The binding energy (ΔG) for the vaccine-TLR-3 complex was computed to be − 20.8 kcal/mol with a Kd value of 5.9 × 1016 M at 25.0 ℃. The Prodigy server also calculated the number of interfacial contacts (ICs) per property within the threshold distance of 5.5 Å and its ICs charged-charged, ICs polar-polar, and ICs apolar-apolar were computed to be 14, 7 and 64 respectively. It suggests that most of the interfacial contacts were hydrophobic.

Fig. 6
figure 6

The selected binding pocket of the proteins as predicted by the CASTp3.0 server. a The predicted binding pocket of the TLR-3 protein. b The predicted binding pocket of the vaccine candidate

Fig. 7
figure 7

Visualization of the best-docked poses for the vaccine-TLR-3 receptor complex. a 3D-interaction diagram of the protein–protein complex generated by the GRAMM-X server where blue mesh structure represents the vaccine candidate while coloured ribbon shows the TLR-3 receptor. b 3D-interaction diagram of the protein–protein complex generated by the HADDOCK2.4 server where the blue ribbon represents the vaccine candidate while the orange ribbon shows the TLR-3 receptor

Fig. 8
figure 8

2D-interaction diagram of the protein–protein complex generated by the GRAMM-X server where chain A is from the TLR-3 receptor and chain B is the vaccine construct

The HADDOCK2.4 server clustered 103 structures in 14 clusters, which represent 51% of the water-refined model generated by the HADDOCK. The clusters were ranked on several parameters and out of the 14 clusters, cluster 3 (cluster size 10) was reported to be the best cluster. It had a HADDOCK score of − 79.4 ± 7.1, RMSD from the overall lowest-energy structure was 43.8 ± 0.4, Van der Waals energy − 100.6 ± 15.7 kcal/mol, Electrostatic energy − 415.7 ± 78.8 kcal/mol, Desolvation energy − 27.2 ± 4.0 kcal/mol and Z-Score − 1.5. The detailed result for all the clusters has been depicted in Figs. 9 and 10. The best pose from cluster 3 was also put for the prediction of binding affinity (ΔG) and dissociation constant (Kd). The binding energy (ΔG) was computed to be − 20.3 kcal/mol with a Kd value of 1.3 × 1015 M at 25.0 ℃ which was similar to that of the best docked pose generated by the GRAMM-X server. The ICs charged-charged, ICs polar-polar, and ICs apolar-apolar were computed to be 16, 7 and 25 respectively.

Fig. 9
figure 9

Bokeh plot showing the result of various parameters for all the clusters by the HADDCOK2.4 server. a Plot between HADDOCK score and FCC (fraction of common contacts). b Plot depicting HADDCOK score vs interface-RMSD

Fig. 10
figure 10

Bokeh plot depicting the analysis of various energy parameters for all the clusters as predicted by the HADDOCK2.4 server. a Van der Waals energy b Electrostatic energy c Restraints energy

Prediction of discontinuous B-cell epitopes

The ElliPro server predicted a total of 7 discontinuous B-cell epitopes of the validated 3D structure of the vaccine candidate comprising 270 residues. The conformational B-cell epitopes had scores ranging from 0.598 to 0.875 where the best epitope had only three residues while the last ranked epitope comprised of 8 residues. This study suggests that the designed multi-epitope vaccine might lead to heightened immune response through the conformational B-cell epitopes also.

In-silico cloning and codon optimization of the vaccine construct

Immunoreactivity through serological assays is needed to validate a designed vaccine candidate and the first step for it is the expression of the multi-epitope vaccine candidate in a suitable host (Gori et al. 2013). Escherichia coli expression systems have been widely used to produce recombinant proteins and hence to achieve maximal protein expression, we selected the E. coli (strain K12) system for the codon optimization by the Java Codon Adaptation Tool (JCat) (Chen 2012; Rosano and Ceccarelli 2014). The optimized codon sequence for the final vaccine candidate was 1596 with a CAI value of 0.9237 and 53.57% of GC content. Ideally, the GC content should be in the range of 30% to 70% and hence, our vaccine candidate might show high-level expression in the bacterial host system. Finally, the optimized gene sequences of the designed vaccine candidate with two restriction sites (Nde I and Xho) at the C and N-terminals of the sequence were inserted into the pET-28a ( +) vector using SnapGene software to generate the sequences of the recombinant plasmid (6892 bp) as shown in the supplementary file (Fig. S1).

Immune simulation

The immune stimulation by the C-ImmSim server showed results that are consistent with actual immune responses. The primary response after the first dose of the vaccine was characterized by increased levels of IgM as shown in Fig. 11a. After the second and third doses of the vaccine, there was a remarkable increase in the levels of IgG1, IgG1 + IgG2, IgM, and IgG + IgM antibodies (Fig. 11a). The level of IgM + IgG peaked after the third dose to around 2,30,000 and IL-2 level (after 2nd dose) was around 9,00,000 which suggests a strong humoral immune response which is better than reported by Abraham et al. (around 2,00,000 for IgM + IgG after 3rd dose and IL-2 level was around 7,50,000 ng/ml after 2nd dose) for multi-epitope vaccine design against SARS-CoV-2 (Abraham et al. 2020). Similarly, when compared to Shey et al. for multi-epitope vaccine candidates (the level of IgM + IgG was around 6,50,000) against onchocerciasis our designed vaccine-elicited stronger immune response (Shey et al. 2019). It was also observed that after the second exposure the population of the antigen was also falling which indicates increased clearance by the immune system. B-cell population (Fig. 11b) and especially the B-memory cell population (> 800 cells per mm3) also increased and peaked after the third dose. Similarly, a greater response was observed in the cytotoxic T-cells (Fig. 11c) and helper-T cell populations (Fig. 11d) with corresponding memory development which lasted for many months. It was also observed that the levels of IFN-γ and IL-2 (Fig. 11e) increased significantly after the first dose of the vaccine and maintained their peak levels after multiple exposures to the antigen. The Simpson index, D which indicates a measure of diversity, predicted this vaccine candidate to have a diverse immune response. This diverse immune response might be possible as the vaccine comprises different epitopes.

Fig. 11
figure 11

Results of immune simulation by the C-ImmSim server. a Immunoglobulin and immunocomplexes production in response to antigen injections at different time intervals (black vertical lines); specific subclasses have been shown as coloured peaks. b The increase in the populations of B-cell post three doses of the vaccine. c The evolution of T-cytotoxic cell population per state post three vaccine doses. d The evolution of T-helper cell population per state post three vaccine doses. e The main plot depicts the levels of cytokine levels post three vaccine doses. The insert plot depicts the IL-2 level with the Simpson index, D which is indicated by the green dotted line

Conclusions

In the present study, several machine learning-based in-silico tools were used to design a potential broad-spectrum multi-epitope vaccine candidate against spike protein of human coronaviruses. To the best of our knowledge, it is one of the first studies, where multiple B-cell epitopes and T-cell epitopes (CTL and HTL) were predicted from the spike protein sequences of all the seven known human coronaviruses. The final vaccine candidate was found to have a minimum conservancy of 73.97% across all the seven HCoV strains and it might protect from SARS-CoV-2 as well as other HCoVs. The designed vaccine is predicted to be antigenic and non-allergenic. Its physicochemical properties are also in the acceptable range. Molecular docking of the refined tertiary structure of the vaccine candidate with the TLR-3 protein also indicated its favourable binding with the TLR-3 receptor. In-silico cloning study indicated that the designed vaccine candidate might show high-level expression in the bacterial host system. The immune simulation analysis suggests that the vaccine might elicit a strong immune response and when compared to other similar publications which have reported immune simulation for the designed vaccines, our designed vaccine candidate induced a higher level of immunoglobins (IgM + IgG), immunocomplexes and interleukins (IL-2). The Simpson index, D which indicates a measure of diversity, predicted this vaccine candidate to have a diverse immune response. The major limitation of this study is that it’s a computational study. MD simulation could have been done for 100 ns for more epitopes rather than one epitope that we selected based on the docking analysis. A population-wise study can also be done to understand the immune response elicited by the vaccine construct among different races. The predictions made by different in-silico tools need to be validated through various immunological assays.