Introduction

Tuberculosis remains at the forefront of infectious diseases, infecting at least 10 million people globally [1]. The disease has existed since the Palaeolithic era and is caused by various organisms, of which Mycobacterium tuberculosis (Mtb) solely infects humans [2]. Mtb caused an estimated 1.4 million deaths in 2019, threatening the livelihoods of populations, particularly in impoverished countries [1]. Mtb can exist in two known states within an individual: latent and active [3, 4]. Latent Mtb is a state of dormancy in the host that affects roughly one-third of the global population and can progress to the active form at any point [5, 6]. Despite copious research and the advent of the antibiotic era, Mtb has remained among the top 10 infectious diseases [7, 8]. Despite medicinal and preventative control measures, Mtb has developed resistance towards most treatment methods. The relationship between Mtb and COVID-19 remains unclear, although an association can be assumed on the basis that both affect the respiratory system and are transmitted similarly. Despite the efforts to curb COVID-19 infections around the globe, the pandemic threatens to reverse the progress made in the control of Mtb [9]. Vaccination has become the most popular method of preventing infectious disease within populations compared with any other option [10]. Research has come a long way since the Bacillus Calmette-Guérin (BCG) vaccine and the development of antibiotics used to inhibit Mtb. The BCG vaccine has been the only prevention method since its discovery in 1921, but it has since become unreliable as a control strategy [11]. The BCG vaccine gives inconsistent protection in adults who are immunosuppressed or immunocompromised, rendering it undesirable for these groups and further necessitating the development of a novel Mtb vaccine [12, 13].

Laboratory studies are time-consuming, labour-intensive, and expensive, and they expose trial patients to unnecessary risk [14]. Until recently, vaccine development has been an extensive process based on attenuated pathogens [15]. Using live pathogens comes with an increased risk of toxicity, which has forced vaccine developers to consider new avenues. Vaccine candidates consisting of T- and B-cell peptides that elicit an immune response are considered the new age of vaccine development [15]. The basic model of a subunit vaccine relies on recognition of the pathogen and the cellular immune response mounted by T cells [15, 16]. B-cell epitopes are regions of the antigen’s surface that bind to the antibodies produced, making them central to the adaptive immune response. A good vaccine candidate can induce antibody production, generate memory cells, and trigger the immune system [17]. Bioinformatic methods and techniques have grown rapidly over the years and have recently impacted immunological studies [16]. These methods can identify predicted vaccine candidates with limited resources. Immunoinformatic studies based on B- and T-cell peptide discovery have become necessary for building potential vaccines. This approach is termed ‘reverse vaccinology’, whereby peptides predicted from an antigenic sequence are used in a multi-epitope vaccine (MEV) candidate. A group of genes known as the PE_PGRS gene family, found within the Mtb genome, plays a role in the pathogenic pathway of Mtb, although the specific function of these genes is unknown [18]. Various studies have suggested that the PE_PGRS family is responsible for evading the host’s defence [19]. This gene group is specific to Mtb but remains under-investigated [20].
A study conducted by Bansal and colleagues demonstrated the high antigenic properties of PE_PGRS17, making it an excellent candidate for vaccine studies [21]. In the last decade, a large number of online tools have become available to assist in characterising the immunogenic properties of potential vaccine constructs. Potential biomarker candidates can be screened using in silico procedures before preclinical or clinical trials, saving considerable time and money. Thus, this study sought to investigate PE_PGRS17 as a probable vaccine candidate using an immunoinformatics approach. The ideal candidate would possess B-cell and T-cell epitopes that are immunogenic, antigenic, non-allergenic, and non-toxic. This study could be a stepping stone towards in vitro and in vivo studies and towards a preventative method for Mtb.

Methods and materials

Selection of Mtb strain, antigens, and retrieval of protein sequences

Because of the potential of the Mtb PE_PGRS biomarker family as a source of vaccine candidates, the protein sequence and genomic information of PE_PGRS17 (gene Rv0978c in Mycobacterium tuberculosis H37Rv; accessed on 01/08/2021) were retrieved from the MycoBrowser database (https://mycobrowser.epfl.ch/) (supplementary information) [22].

Prediction of antigenicity, allergenicity, and toxicity of the protein sequence

The antigenicity, allergenicity, and toxicity criteria must be met both before and after assembly of the vaccine construct. The biomarker sequence PE_PGRS17 was submitted to software that predicts its antigenic, allergenic, and toxic potential. Antigenicity was predicted using ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) and VaxiJen v2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html). ANTIGENpro is a sequence-based predictor whose results are summarised by an SVM classifier. It reports whether the protein is a probable antigen, together with the corresponding probability. The VaxiJen v2.0 server determined whether epitopes were probable antigens, and those that were probable antigens were selected for the vaccine construct. The threshold of the server was set to 0.4. The server uses the auto- and cross-covariance (ACC) transformation of protein sequences to produce its output.
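The ACC transformation mentioned above can be illustrated with a minimal sketch. It assumes each residue is described by a short vector of numeric descriptors and that covariance terms are averaged over all residue pairs separated by a given lag. The descriptor values in TOY_DESCRIPTORS and the function name acc_transform are hypothetical and chosen only for illustration; they are not the scales or code used by VaxiJen.

```python
from typing import Dict, List

# Hypothetical two-descriptor table for a four-letter toy alphabet.
# Real servers use published amino acid descriptor scales, which are
# not reproduced here.
TOY_DESCRIPTORS: Dict[str, List[float]] = {
    "A": [0.1, -0.5],
    "G": [-0.3, 0.2],
    "K": [0.8, 0.4],
    "L": [-0.9, -0.1],
}


def acc_transform(seq: str, descriptors: Dict[str, List[float]], max_lag: int) -> List[float]:
    """Return auto- and cross-covariance terms for lags 1..max_lag."""
    encoded = [descriptors[aa] for aa in seq]
    n = len(encoded)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    n_desc = len(encoded[0])
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                # j == k gives an auto-covariance term, j != k a cross-covariance term.
                total = sum(encoded[i][j] * encoded[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


short = acc_transform("AGKL", TOY_DESCRIPTORS, max_lag=2)
longer = acc_transform("AGKLLKGAAG", TOY_DESCRIPTORS, max_lag=2)
# Sequences of different length map to vectors of the same length.
print(len(short), len(longer))
```

The point of the transformation is visible in the last lines: sequences of different lengths are converted into vectors of equal length, which a classifier can then compare directly.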

Allergenicity is an essential property to consider when constructing the vaccine sequence. The servers AllerTOP v2.0 and AllergenFP were used to predict allergenicity. AllerTOP v2.0 (https://www.ddg-pharmfac.net/AllerTOP/) applies the ACC transformation to the protein sequence, described by amino acid E descriptors, and classifies it by its k nearest neighbours. Allergenicity also depends on the physicochemical properties of the protein. AllergenFP (https://ddg-pharmfac.net/AllergenFP/) was also used to classify the sequence as an allergen or non-allergen. The server implements a four-step algorithm: the properties of the protein, such as size and α-helix and β-strand forming propensities, are defined; the ACC transformation converts the resulting strings into equal-length vectors; these vectors are converted into fingerprints; and the fingerprints are compared using the Tanimoto coefficient. These steps classify the query as either an allergen or a non-allergen, a classification considered accurate enough to rely on for vaccine construction. The toxicity of the epitopes was predicted using the ToxinPred server (https://webs.iiitd.edu.in/raghava/toxinpred/multi_submit.php), and all non-toxic epitopes were chosen for vaccine construction.
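As a minimal sketch of the fingerprint comparison step, the Tanimoto coefficient of two binary fingerprints is the number of positions set in both divided by the number of positions set in either. The fingerprints below are made-up bit vectors, not AllergenFP output, and the function name tanimoto is an illustrative choice.

```python
from typing import Sequence


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints share no set bits; report 0.0 by convention here.
    return both / either if either else 0.0


query = [1, 1, 0, 0, 1]       # hypothetical query fingerprint
reference = [1, 0, 1, 0, 1]   # hypothetical reference fingerprint
print(tanimoto(query, reference))
```

A query would be assigned the class of the reference fingerprint with which it shares the highest coefficient.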

Prediction of B- and T-cell epitopes

B-cell prediction

Vaccines require a B-cell epitope to interact with B lymphocytes. The IEDB server (http://tools.iedb.org/bcell/) was used to analyse B-cell candidates that met the criteria for predicted score, antigenicity, allergenicity, and toxicity. The IEDB provides prediction methods that assess surface accessibility, antigenicity, and hydrophilicity. Bepipred Linear Epitope Prediction 2 was used to predict linear B-cell epitopes. The Emini Surface Accessibility Prediction tool was used to assess surface accessibility with a default threshold value of 1.000. The Kolaskar and Tongaonkar Antigenicity method was used to identify the antigenic sites of the candidate epitopes with a default threshold value of 1.032 (http://tools.iedb.org/bcell/result/). The Parker Hydrophilicity Prediction tool was used to identify hydrophilic, accessible, or mobile regions with a default threshold value of 1.695.
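Each of the scale-based methods above reduces to comparing a per-residue score profile against a threshold. The following sketch shows how contiguous stretches above a threshold could be extracted from such a profile. The score values are invented for illustration, and the function name regions_above_threshold is hypothetical; the IEDB tools compute and smooth the actual profiles themselves.

```python
from typing import List, Sequence, Tuple


def regions_above_threshold(
    scores: Sequence[float], threshold: float, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, 0-based and inclusive, of runs scoring above threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores) - 1))
    return regions


profile = [0.9, 1.1, 1.2, 0.8, 1.5]  # invented per-residue scores
print(regions_above_threshold(profile, threshold=1.0))
```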

Prediction of cytotoxic T lymphocyte (CTL) epitope/MHC class I binding

Another component of the vaccine construct is the cytotoxic T lymphocyte (CTL) epitope. The NetCTL 1.2 server (http://www.cbs.dtu.dk/services/NetCTL/) was used to predict CTL epitopes using three categories for validation: the binding affinity, proteasomal C-terminal cleavage, predicted using artificial neural networks (ANN), and TAP (Transporter Associated with Antigen Processing) transport efficiency, with thresholds set at 0.05, 0.15, and 0.75, respectively. The A1 supertype was used, and any prediction that met the abovementioned criteria was selected. The predicted nonamers were then submitted to the IEDB MHC I server for the prediction of CD8+ T-cell epitopes.
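Nonamer prediction starts from every overlapping 9-residue window of the protein. A minimal sketch of that enumeration is shown below; the sequence is a made-up placeholder rather than PE_PGRS17, and the function name sliding_peptides is hypothetical.

```python
from typing import List, Tuple


def sliding_peptides(seq: str, k: int) -> List[Tuple[int, str]]:
    """Return (1-based start position, peptide) for every k-residue window."""
    return [(i + 1, seq[i:i + k]) for i in range(len(seq) - k + 1)]


toy_sequence = "MSGAGGLTAKGG"  # placeholder sequence, 12 residues
for position, peptide in sliding_peptides(toy_sequence, k=9):
    print(position, peptide)
```

A sequence of length n yields n − 8 candidate nonamers, each of which is then scored by the prediction server.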

Prediction of helper T lymphocyte (HTL) epitope/MHC class II binding

The HTL epitopes were predicted using the IEDB MHC II server (http://tools.iedb.org/mhcii/). Many alleles are available, but the search was restricted to human HLA-DR and a 7-allele human leukocyte antigen (HLA) reference set [23]. This allele set was chosen to save time, since it is well studied in immunology research. The epitope length was set to 15-mer, and epitopes were selected on the basis of their percentile score. Epitopes with a lower percentile score have a higher affinity for MHC class II. The generated results were confirmed with the IEDB MHC I tool (http://tools.iedb.org/mhci/). The IFN-gamma epitope server (http://crdd.osdd.net/raghava/ifnepitope/scan.php) was used to predict interferon-gamma-inducing epitopes. Epitopes predicted with a positive value met the criteria and were selected for MEV construction.
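The two selection rules described above (a low percentile score and a positive IFN-gamma prediction) can be sketched as a simple filter. The peptides, scores, and the cut-off value of 10 are hypothetical; the text does not state the cut-off that was applied.

```python
from typing import List, NamedTuple


class HtlPrediction(NamedTuple):
    peptide: str
    allele: str
    percentile: float   # lower means stronger predicted binding
    ifn_positive: bool  # True if predicted to induce IFN-gamma


def select_htl(predictions: List[HtlPrediction], max_percentile: float) -> List[HtlPrediction]:
    """Keep IFN-gamma-positive predictions at or below the cut-off, best binders first."""
    kept = [p for p in predictions if p.percentile <= max_percentile and p.ifn_positive]
    return sorted(kept, key=lambda p: p.percentile)


# Made-up 15-mers and scores, for illustration only.
candidates = [
    HtlPrediction("GGAGGAGGNGGLLFG", "HLA-DRB1*01:01", 4.2, True),
    HtlPrediction("AGGAGGTGGAAGLLG", "HLA-DRB1*01:01", 0.9, True),
    HtlPrediction("LLGAGGAGGAGGNGG", "HLA-DRB1*01:01", 0.5, False),
    HtlPrediction("NGGAGGLLFGAGGAG", "HLA-DRB1*01:01", 35.0, True),
]
selected = select_htl(candidates, max_percentile=10.0)  # cut-off is an assumption
print([p.peptide for p in selected])
```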

Construction of multi-epitope vaccine candidate sequence

The epitopes were ranked by antigenicity, immunogenicity, non-toxicity, and non-allergenicity. The epitopes that met all the criteria were selected for the final vaccine construct. The CTL, HTL, and B-cell epitopes predicted above were combined into a multi-epitope vaccine (MEV) sequence. GPGPG linkers were used to join B-cell and HTL epitopes, whereas AAY linkers were used for CTL epitopes. The griselimycin sequence (PubChem CID: 429055) was used as an adjuvant to increase the immunogenicity of the vaccine and was joined to the construct by an EAAAK linker [24]. The adjuvant sequence was retrieved from the Antimicrobial Peptide Database (https://aps.unmc.edu/). The final construct was also assessed with the same software for antigenicity, allergenicity, and toxicity.
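The linker scheme can be sketched as plain string assembly. The sketch below assumes the adjuvant sits at both termini, joined by EAAAK, and that the GPGPG-linked block precedes the AAY-linked block. The order of the blocks and the linker between them (block_linker) are assumptions, and all peptide strings are placeholders rather than the epitopes selected in this study.

```python
from typing import List


def build_construct(
    adjuvant: str,
    b_cell: List[str],
    htl: List[str],
    ctl: List[str],
    block_linker: str = "GPGPG",  # assumption: linker between the two epitope blocks
) -> str:
    """Assemble adjuvant-EAAAK-[B-cell/HTL block]-[CTL block]-EAAAK-adjuvant."""
    gpgpg_block = "GPGPG".join(b_cell + htl)  # B-cell and HTL epitopes
    aay_block = "AAY".join(ctl)               # CTL epitopes
    core = block_linker.join(block for block in (gpgpg_block, aay_block) if block)
    return "EAAAK".join([adjuvant, core, adjuvant])


# Placeholder strings only; real epitopes and the adjuvant sequence are not shown.
construct = build_construct("ADJ", ["BBB"], ["HHH"], ["CCC", "DDD"])
print(construct)
print(len(construct))
```

Keeping the linkers as explicit arguments to the join calls makes it straightforward to regenerate the construct if an epitope fails the subsequent antigenicity, allergenicity, or toxicity checks.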

Physiochemical properties and solubility prediction of the MEV candidate

The Expasy ProtParam tool was used to predict a range of physiochemical properties of the MEV candidate, such as the grand average of hydropathicity (GRAVY), amino acid count, theoretical isoelectric point (pI), and instability index. The Protein-Sol server (https://protein-sol.manchester.ac.uk/) was used to predict the solubility of the protein. The query solubility is compared with the population average of the experimental dataset, and the query was found to have a higher solubility value than the population average.
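Two of these properties are simple enough to compute by hand, which can serve as a sanity check on server output. The sketch below computes GRAVY as the mean Kyte-Doolittle hydropathy per residue and the aliphatic index from the mole percentages of Ala, Val, Ile, and Leu. The function names and the example sequence are illustrative; this is not the ProtParam implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: sum of hydropathy values / length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


example = "GPGPGAAYEAAAK"  # linker motifs joined together, used only as a test string
print(round(gravy(example), 3), round(aliphatic_index(example), 2))
```

On the Kyte-Doolittle scale, more negative GRAVY values correspond to more hydrophilic proteins and more positive values to more hydrophobic ones.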

Secondary structure prediction of MEV candidate

Two online servers were used to predict aspects of the secondary structure of the MEV construct. PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) predicts secondary structure along with transmembrane topology, transmembrane helices, and fold and domain recognition. RaptorX Property (http://raptorx.uchicago.edu/StructurePropertyPred/predict/) was used to predict the secondary structure of the protein, that is, the conformation that each part of the sequence adopts. The server can also assess the solvent accessibility of the protein. The I-TASSER online server (https://zhanggroup.org/I-TASSER/) provided supplementary information and confirmation of the secondary structure prediction.

Tertiary structure, refinement, and validation of the MEV candidate

The three-dimensional tertiary structure was modelled using the I-TASSER server (https://zhanggroup.org/I-TASSER/). I-TASSER (Iterative Threading ASSEmbly Refinement) draws on structures in the Protein Data Bank, using the structure and function of other proteins to identify similar patterns and thus produce a predicted model. The program follows a sequence-to-structure-to-function approach, building 3D atomic models using alignment methods and simulations. The models are scored by TM-score, C-score, and RMSD to denote the quality of the model [25]. The GalaxyREFINE server (http://galaxy.seoklab.org/cgi-bin/report_REFINE.cgi?key=e0867efa82610f5bd5b374be756c152e) was used to refine the 3D model generated by I-TASSER. The server, released in 2013, applies a refinement strategy based on molecular dynamics simulation and repacking of side chains, which in turn relaxes the structure [26]. This method is recommended for improving prediction quality in computational studies. Validation with additional methods is required to confirm the quality of the model and can identify errors in its construction. The quality of the model was assessed with a Ramachandran plot generated using the Ramachandran Plot server (https://zlab.umassmed.edu/bu/rama/). The plot shows the favoured and unfavoured regions in which amino acid residues fall, from which the quality is deduced. Another quality test used the ProSA-web server (https://prosa.services.came.sbg.ac.at/prosa.php), which estimates the Z-score of the model. If the Z-score falls outside the range characteristic of native proteins, the structural model is considered to have potential errors [27].

Prediction of discontinuous B-cell epitopes

Computational analysis can be either sequence-based or structure-based, predicting linear or discontinuous B-cell epitopes, respectively. The online tool ElliPro (http://tools.iedb.org/ellipro/) was used to predict discontinuous B-cell epitopes, which in turn supports validation of the 3D model [28]. The tool employs three algorithms: it approximates the protein as an ellipsoid, calculates the protrusion index (PI) of the residues, and clusters the residues [29]. The score of each output epitope is the PI averaged over its residues. A PI of 0.9 means that 90% of the residues lie within the ellipsoid while 10% lie outside. The server considers each residue’s centre of mass, which was the reason for choosing ElliPro.

Molecular docking of the MEV with the immune receptor, TLR2

For a vaccine construct to elicit an effective immune response, it must interact with an immune receptor. The structure of TLR2 (PDB ID: 5D3I) was retrieved from the Protein Data Bank (https://www.rcsb.org/). The online servers used for molecular docking and for refinement of the docking were ClusPro 2.0 (https://cluspro.bu.edu/login.php), the HADDOCK server (https://wenmr.science.uu.nl/prodigy/), the PatchDock server (https://bioinfo3d.cs.tau.ac.il/PatchDock/), and the FireDock server (https://bioinfo3d.cs.tau.ac.il/FireDock/). The HawkDock server (http://cadd.zju.edu.cn/hawkdock/) was used to confirm the docking of TLR2 and the vaccine construct once again. The server provides a predicted visualisation as well as an MM-GBSA (Molecular Mechanics/Generalized Born Surface Area) score. The score is an estimate of binding affinity; the lower the score, the better the prediction.

Molecular dynamics simulation

Molecular dynamics simulation of the docked MEV candidate was conducted using the iMODS server (http://imods.chaconlab.org/), which was used to predict the stability and physical movement of the docked receptor and ligand molecules. The server is free and user-friendly and allows fast, reliable predictions [30].

Codon optimisation and in silico cloning

Expression studies are an important factor in drug-design-related studies. Codon optimisation is necessary to enhance protein expression. The Java Codon Adaptation Tool (JCat) (http://www.jcat.de/) web server was used to calculate the codon adaptation index (CAI) and GC content of the MEV candidate for protein expression using E. coli codon usage. For favourable transcription and translation, the ideal CAI lies between 0.8 and 1.0 and the ideal GC content between 30% and 70% [31].
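Both quantities can be checked independently of the server. The sketch below computes GC content as a percentage and CAI as the geometric mean of per-codon relative adaptiveness weights, then tests them against the ranges given above. The weight table is a tiny hypothetical example, not E. coli codon usage, and the function names are illustrative.

```python
import math
from typing import Dict


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def cai(dna: str, weights: Dict[str, float]) -> float:
    """Codon adaptation index: geometric mean of the relative adaptiveness of each codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    log_sum = sum(math.log(weights[codon]) for codon in codons)
    return math.exp(log_sum / len(codons))


def is_favourable(cai_value: float, gc_percent: float) -> bool:
    """Check the ranges quoted in the text: CAI 0.8 to 1.0 and GC content 30% to 70%."""
    return 0.8 <= cai_value <= 1.0 and 30.0 <= gc_percent <= 70.0


# Hypothetical weights for two alanine codons; real tables cover all codons.
toy_weights = {"GCT": 1.0, "GCC": 0.25}
toy_gene = "GCTGCTGCC"
print(gc_content(toy_gene), cai(toy_gene, toy_weights))
```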

The SnapGene trial version (https://www.snapgene.com/try-snapgene/) was used to visualise the multi-epitope vaccine gene sequence cloned into the E. coli pET-30a(+) plasmid vector. The restriction enzymes NotI and HindIII were used at the N and C terminals as restriction sites to insert the MEV fragment. This process is used to confirm the expression capabilities of the vaccine candidate.

Immune simulation

A simulation web server, C-IMMSIM (C language version of the IMMune system SIMulator) (https://150.146.2.1/C-IMMSIM/index.php), was used to develop immunogenic profiles of the abovementioned vaccine construct for Mtb. The server combines various models in one software package that assesses the humoral and cellular responses to the vaccine construct [32]. The preventive tuberculosis vaccine was administered in three doses at simulated time steps 1, 84, and 168. All other parameters were left at their defaults except the simulation volume and the number of simulation steps, which were set to 50 and 1100, respectively. The random seed was set to 1234, and the vaccine injection did not contain LPS.
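To relate the simulation time steps to real time, the following sketch converts steps to days. It assumes that one simulation step corresponds to 8 hours, which is the convention usually quoted for this simulator but is not stated in the text; HOURS_PER_STEP is therefore an assumption.

```python
HOURS_PER_STEP = 8  # assumption: one simulation step represents 8 hours


def steps_to_days(step: int) -> float:
    """Convert a simulation time step to elapsed days."""
    return step * HOURS_PER_STEP / 24


injection_steps = [1, 84, 168]
total_steps = 1100

for step in injection_steps:
    print(f"injection at step {step}: day {steps_to_days(step):.1f}")
print(f"simulation length: {steps_to_days(total_steps):.1f} days")
```

Under this assumption the three doses fall roughly four weeks apart, and the full run covers about one year.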

Results

Protein sequence

The MycoBrowser database was used to retrieve the amino acid sequence of the PE_PGRS17 protein in FASTA format. The protein (Rv0978c) belongs to the Mycobacterium tuberculosis PE family of glycine-rich proteins. The sequence was used in subsequent predictions of T- and B-cell epitopes, which can be used to develop a novel vaccine candidate for Mtb.

Prediction of antigenicity and allergenicity of all epitopes and vaccine candidates

The antigenic property was predicted using the ANTIGENpro and VaxiJen v2.0 web tools for the PE_PGRS17 biomarker sequence as well as for the epitopes predicted from this sequence. The PE_PGRS17 sequence was predicted to have an antigenic score of 1.0245. The VaxiJen server, set to a threshold of 0.4, gave an antigenicity score of 0.9778, whereas the ANTIGENpro tool gave an antigenicity score of 0.8724. Two servers, AllerTOP and AllergenFP, were used to check the allergenicity of the vaccine construct; both predicted the protein to be a non-allergen. All predicted epitopes were selected on the basis of their ‘probable antigen’, non-allergenic, and non-toxic prediction status before being assembled into the vaccine construct. The MEV vaccine subunit therefore meets all the criteria for a good antigen molecule.

B-cell prediction

Linear B-cell epitopes were predicted using the ABCpred server (Table 1). Only epitopes that were 15-mer in length, had a binding score greater than 0.9, and were highly antigenic, non-toxic, and non-allergenic were chosen for vaccine construction. Sequence analysis of linear B-cell epitopes was conducted, and epitopes that met the above criteria were subjected to the Bepipred linear epitope 2, Emini surface accessibility, Kolaskar & Tongaonkar antigenicity, and Parker hydrophilicity prediction methods, as seen in Fig. 1(a–b). All four epitopes passed three out of five predictions and were considered suitable for the potential vaccine construct.

Table 1 Prediction of linear B-cell epitopes. Only epitopes with a binding score greater than 0.9 that were probable antigens, non-toxic, and non-allergenic were selected for the final vaccine construct
Fig. 1

Analysis of the sequence of PE_PGRS17 from Mtb for B-cell epitope prediction. Yellow areas above the threshold line are proposed to be a positive result for B-cell epitopes and green areas are not. a Bepipred Linear Epitope Prediction 2; b Emini surface accessibility prediction; c Karplus and Schulz flexibility prediction; d Kolaskar and Tongaonkar antigenicity prediction; e Parker hydrophilicity prediction

Prediction of cytotoxic T lymphocyte epitope

The CTL epitopes were predicted using the NetCTL 1.2 web server set at specific thresholds. Only six epitopes met the criteria of binding capacity towards MHC-I, transport efficiency, non-allergenicity, antigenicity, and non-toxicity (Table 2). The predicted epitopes were then run through the IEDB server for confirmation.

Table 2 Prediction of CTL epitopes that possess a binding affinity to MHC-I A1-supertype alleles, C-terminal cleavage affinity, transport efficiency, antigenic, non-allergenic, and non-toxic to be considered for the vaccine construct

Prediction of helper T lymphocytes epitope

The IEDB MHC-II web server was used to predict HTL epitopes specific for human HLA alleles. A total of nine HTL epitopes showed promise for the final vaccine construct. These epitopes were chosen on the basis of binding affinity, antigenicity, non-allergenicity, and non-toxicity, as shown in Table 3. The selection also took interferon-gamma (IFN-γ) induction into account. IFN-γ has a role in the immune response of the cells and acts as a cytokine for CTL epitopes. IFN-γ-inducing epitopes were predicted using the SVM (support vector machine) method. The epitopes with a positive IFN-γ score were selected for vaccine assembly.

Table 3 Predicted selection of HTL epitopes that fulfilled all the criteria for antigenicity, non-allergenicity, and non-toxicity, and could also induce the IFN-γ immune response which binds to the HLA-DR group of alleles

Construction of multi-epitope subunit vaccine

The predicted B- and T-cell epitopes showed promising properties and were considered for the final vaccine construct. The adjuvant griselimycin (ID: AP02688) was placed at the N and C terminals of the vaccine construct to improve the immune response to the vaccine. The adjuvant is joined to the terminals via an EAAAK linker. GPGPG linkers were used to join B-cell and HTL epitopes, and AAY linkers to join CTL epitopes. The vaccine sequence was converted to FASTA format and assessed for antigenicity, allergenicity, toxicity, and solubility. Once it had fulfilled all the criteria for antigenicity, allergenicity, and toxicity, the vaccine construct was modelled and docked. A schematic representation of the final multi-epitope vaccine peptide of the current study is depicted in Fig. 2.

Fig. 2

A schematic representation of the constructed multi-epitope vaccine peptide. The peptide sequence was estimated to be 361 amino acids long. The N and C terminals contained griselimycin adjuvants (blue rectangular blocks) used for tuberculosis vaccine constructs. The EAAAK linker (green blocks) joins the adjuvant to the multi-epitope sequence. B-cell epitopes and HTL epitopes are linked using GPGPG linkers (red arrows) while the CTL epitopes are linked with AAY linkers (light blue speech bubbles)

Prediction of physiochemical properties and amino acid content of MEV candidate

The physiochemical properties (Table 4) help deduce the protein’s environmental fate and whether it poses a hazard. The molecular weight of the final protein was estimated to be 34,341 Da, with a theoretical isoelectric point (pI) of 5.82. There are more negatively charged residues than positively charged residues. The instability index was 15.23, indicating a stable protein during expression and increasing the probability of its further use. The high aliphatic index of 77.06 suggests that the protein can tolerate high temperatures. The GRAVY index was predicted to be 0.278, indicating a hydrophilic nature of the construct. The vaccine construct is also glycine-rich, as shown by the amino acid composition (Fig. 3a). The solubility score of the vaccine construct was predicted to be 0.455, which is higher than that of the average soluble E. coli protein (Fig. 3b).

Table 4 The predicted physiochemical properties computed by Protparam online server which assist in further analysis of the MEV candidate
Fig. 3

Predicted primary structure properties of the multi-epitope vaccine subunit. a Number of amino acid residues within the construct and b solubility of the constructed MEV candidate

Prediction of secondary structure and solubility of MEV candidate

The secondary structure of the MEV construct was predicted as 45% β-strand and 54% coil. The solvent accessibility prediction classified 37% of amino acid residues as exposed, 27% as moderately exposed, and 35% as buried. One percent of the amino acid residues were reported as distorted. Figure 4a and b show that the structure consists of coil and β-strands and contains mostly small non-polar residues, hydrophobic residues, a few polar residues, and aromatic residues.

Fig. 4

Estimated secondary structure information of the MEV construct. a Prediction of the shape of the residues including what areas of the cell it may be interacting with. The residues were just shown as a strand or a coil in this prediction; b shows the nature of the residue being either small non-polar, hydrophobic, polar, or aromatic residues

Tertiary structure modelling, refinement, and validation of MEV candidate

The I-TASSER server predicted several three-dimensional tertiary structures, with Z-scores ranging from 1.45 to 3.29 and confidence values (C-scores) from −2.77 to −1.88. The structure with the highest C-score is usually the most reliable, so the structure with a C-score of −1.88 was chosen for further analysis. The TM-score of the chosen model was predicted as 0.49 ± 0.15, and the root-mean-square deviation (RMSD) was calculated to be 11.0 ± 4.6 Å. To improve the quality of the selected model, the GalaxyREFINE web server was used. The program works by enhancing loop refinement and the energy function to create good-quality models [33]. The server produced five models; based on structural factors such as GDT-HA (0.9564), RMSD (0.398), and MolProbity (2.512), model 4 was selected as the best. Other factors reported by the web server, namely the clash value (25.1), rotamers value (0.4), and Rama favoured value (85.2), confirmed the selection. The structure was validated by generating a Ramachandran plot. The web tool estimated that 81.8% of amino acids are within the favoured region, 12.2% are in the allowed region, and 5.6% are outliers. The ProSA-web tool was used to verify the quality of the 3D model. The Z-score estimated by the ProSA-web server was −3.19 (Fig. 5).

Fig. 5

Tertiary structure prediction of the MEV candidate including refinement and validation of the model. a The predicted tertiary structure modelled using the I-TASSER server where the C-score was −1.88; b the refinement of the model was predicted using the GalaxyREFINE online tool using superimposition; c validation of the structure is conducted by Ramachandran analysis which shows 81.9% of residues in the favoured/preferred region, 12.05% in the allowed region, and 5.9% in the disallowed; d to deduce the quality of the structure, a Z-score of −3.19 was predicted by the ProSA server

Prediction of discontinuous B-cell epitopes

The ElliPro tool predicted 120 residues located in conformational B-cell epitopes (Table 5). Two epitopes were predicted, one of which had the maximum score of 0.76 and is therefore considered the discontinuous epitope shown in Fig. 6. Epitopes with scores higher than 0.69 were selected.

Table 5 The ElliPro database predicted 120 residues found in two highly scored conformational B-cell epitopes
Fig. 6

Three-dimensional representation of the predicted discontinuous epitopes of the peptide vaccine construct. The violet conformational surface describes the discontinuous B-cell epitope predicted and the grey stick skeletons represent the rest of the polyprotein of the vaccine subunit

Molecular docking of the vaccine construct with TLR2

Molecular docking was performed with multiple online programs to visualise the protein–protein interaction between the MEV construct and TLR2 (Table 6). Using several tools improves the quality and accuracy of the prediction. The servers ClusPro 2.0, PatchDock, FireDock, HADDOCK, and HawkDock were used to compare various aspects of the docking predictions. ClusPro 2.0 returns up to 10 docking models. The top-ranked model under the balanced coefficients option was selected and downloaded in PDB format for further analysis. The HawkDock server ranks predicted models according to binding energy (Fig. 7). MM/GBSA scoring is integrated into the server to highlight key binding residues [34]. The PatchDock program confirmed the docking obtained from the ClusPro server. Further analysis with the PRODIGY tool from the HADDOCK web server predicted a binding affinity of −15.6 kcal/mol. FireDock (Fast Interaction Refinement in molecular DOCKing) predicted the global binding energy of the construct docked with TLR2 to be −40.88 kcal/mol.

Table 6 A summary of the online-based docking tools used with the predicted measurements of binding energy
Fig. 7

The interaction between the multi-epitope subunit vaccine construct and receptor protein (TLR2). a The HawkDock result of the docked vaccine construct and TLR-2. b The refined docked vaccine component and TLR-2 whereby the ligand–protein is indicated by green colour, the receptor protein is indicated by blue colour, and the red regions indicate the top 10 binding residues of the predicted complex

Molecular dynamics simulation

The molecular dynamics simulation result from the iMODS server for the vaccine construct and TLR2 complex is depicted in Fig. 8. The method was used to determine the movement of atoms within a rigid body of the vaccine construct. Figure 8b shows the main-chain deformability graph of the docked structure. The peaks of the graph indicate the regions of the protein that are most deformable. The B-factor graph in Fig. 8c shows the relationship between the NMA and the corresponding PDB field. Figure 8d gives the eigenvalue of the construct as 8.478465e−06, which is related to the motion of the structure. The variance of the structure, described in Fig. 8e, is inversely related to the eigenvalue. The graph shows the cumulative variance in green and the individual variance in red. Another output of this online tool is a covariance matrix, which indicates coupling between pairs of residues. The motions of residue pairs can be correlated, uncorrelated, or anti-correlated, as seen in Fig. 8f. The last prediction is an elastic network map of the docked complex. Each dot in the graph represents one spring between the corresponding pair of atoms. The stiffness of the springs is indicated by a grey scale in Fig. 8g. Darker grey dots indicate stiffer springs whereas lighter grey dots indicate more flexible ones. The simulation conducted by the iMODS server suggests that the complex of the docked vaccine construct with TLR2 is stable and can therefore proceed to further analysis.
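The relationship between eigenvalues and variance can be illustrated with a short sketch. It assumes that the variance of each mode is taken as proportional to the inverse of its eigenvalue and then normalised to a percentage, which is one straightforward reading of the description above. The eigenvalues are invented, and variance_profile is a hypothetical name; this is not the iMODS implementation.

```python
from typing import List, Tuple


def variance_profile(eigenvalues: List[float]) -> Tuple[List[float], List[float]]:
    """Return (individual, cumulative) variance percentages for a list of mode eigenvalues."""
    inverse = [1.0 / value for value in eigenvalues]  # variance taken as 1 / eigenvalue
    total = sum(inverse)
    individual = [100.0 * value / total for value in inverse]
    cumulative: List[float] = []
    running = 0.0
    for share in individual:
        running += share
        cumulative.append(running)
    return individual, cumulative


toy_eigenvalues = [1.0, 2.0, 4.0]  # invented values, lowest mode first
individual, cumulative = variance_profile(toy_eigenvalues)
for mode, (ind, cum) in enumerate(zip(individual, cumulative), start=1):
    print(f"mode {mode}: individual {ind:.1f}%, cumulative {cum:.1f}%")
```

The mode with the smallest eigenvalue contributes the largest share of the variance, which is why a low first eigenvalue is read as a sign that the complex deforms easily along that mode.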

Fig. 8

The result for the iMOD server showing the molecular dynamic simulation of the docked vaccine construct with TLR-2. a Visualisation of the docked vaccine construct with the TLR-2. b Main-chain deformability graph analysis. c Experimental B-factors. d Eigenvalues related to each mode index representing the motion stiffness. e Variance against mode index. f Co-variance map of the docked system (correlated (red), uncorrelated (white), or anti-correlated (blue) motions). g Elastic network, dots are coloured according to their stiffness; the darker grey dots indicate stiffer springs and vice versa

Codon optimisation

An online tool, JCat, was used to reverse translate the protein sequence into a nucleotide sequence. The DNA sequence of the construct was 1083 bp in length. The same server predicted a Codon Adaptation Index (CAI) of 1.0 and a GC content of 58.4% for the nucleotide sequence. This suggests that the vaccine construct is likely to be well expressed in the E. coli K12 strain. Restriction sites for the enzymes KpnI and BstBI were added to the N- and C-terminal ends of the insertion fragment (vaccine construct), which was then cloned into the pET-30a(+) vector plasmid. Figure 9b was illustrated using the SnapGene free trial software.
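The GC content and sequence length reported above are simple quantities that can be checked independently of any server. The following Python sketch shows one way to do so; the function names and the short toy sequence are hypothetical, and the actual construct sequence is not reproduced here.

```python
def gc_content(dna):
    """Return the GC content of a DNA sequence as a percentage."""
    seq = dna.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_count(dna):
    """Return the number of complete codons; reject lengths not divisible by 3."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return len(dna) // 3


# Hypothetical toy sequence, used only to demonstrate the calculation.
toy_sequence = "ATGGGCGCCGGTAAA"
toy_gc = gc_content(toy_sequence)       # 9 of 15 bases are G or C
toy_codons = codon_count(toy_sequence)  # 15 bases form 5 codons
```

Applied to a sequence of 1083 bp, `codon_count` returns 361, so the reported length corresponds to a whole number of codons.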

Fig. 9

In silico restriction cloning. Computational restriction cloning of the reverse-translated MEV candidate fragment into the pET-30a(+) expression vector using the trial version of the SnapGene program. The black ring represents the vector backbone, and the red arrow represents the reverse-translated MEV fragment

Immune simulation

The C-ImmSim online tool was used to simulate the immune response and to assess whether immune memory develops. The immunogenic profile of the vaccine construct was estimated through this webserver. The simulation provides insight into how an immune system would behave when the MEV subunit is introduced into a patient. The results depicted in Fig. 10a–e indicate that the vaccine candidate can produce both humoral and cellular immune responses. The level of IgM concentration illustrated in Fig. 10a is evidence of a primary response. The secondary and tertiary responses are much higher than the primary response, indicating a positive immune response to the antigen molecule. After three doses of the injection, there was no sign of the antigenic molecule in Fig. 10a. The rise in the B-cell population was shown by the increase in IgG1 + IgG2, IgM, and IgG + IgM. An increase in helper and cytotoxic T cells is also predicted, indicating the development of memory cells to defend the host. The level of cytokines in the form of interferon-gamma and interleukins (anti-inflammatory cytokines) is seen to increase in Fig. 10e, showing replication inhibition and a T-cell-mediated immune response.

Fig. 10

The computational immune simulation of the vaccine construct using the C-ImmSim program. a The response of immunoglobulin production when exposed to the antigen. The antigen is depicted by a black vertical line and specific subclasses are multicoloured. b The population of B lymphocytes; y2 represents the scale of memory B cells subdivided into isotypes. c The state of the T-cytotoxic cell population after the administered injections. The blue line depicts cells in the resting state, i.e., cells that were not exposed to the antigen. The purple line represents the tolerance of the T cells when exposed to the antigen repeatedly. d The progression of T-helper cells. e An illustration of the concentration of cytokines and interleukins after repeated injections. The D curve, shown by a brown line in the graph, represents the diversity and is a signal of danger

Discussion

A large percentage of Mtb cases occur within developing countries [35]. Tuberculosis will continue to be a health threat in these countries unless sufficient innovative and critical research is dedicated to its eradication. Elimination of this dreadful disease can be achieved through proper immunisation of the populations that it affects most. McShane and Wilkie discuss ‘the need for a blueprint to progress in vaccine development’ as a strategy for the next decade of research. They identified 5 key areas in which to establish innovative and creative mechanisms to reduce tuberculosis globally [12]. One of these key areas is identifying the correlation between immunity and biomarkers to create an Mtb vaccine [12]. Vaccine informatics, or immunoinformatics, has become a fast-moving field that helps to avoid time-consuming vaccine development in the laboratory [15, 36]. An epitope-based prediction method is a safer, more specific, and more efficient method for vaccine development [37]. These computational methods have been applied to several infectious diseases, such as COVID-19 [38]. A summative study conducted by Maio et al. provides evidence that the PE_PGRS gene family is at the forefront of the interchange between host and pathogen in Mtb [18]. This protein family may hold a clue to novel Mtb treatment and contribute towards the ‘end TB by 2030 strategy’ proposed by the World Health Organisation [39]. Moodley et al. used in silico studies to investigate the structural and functional role of four members of the PE_PGRS family, PE_PGRS17, PE_PGRS31, PE_PGRS50, and PE_PGRS54 [40]. The biomarker PE_PGRS17 demonstrated potential for immunological studies and is thus considered for further immunogenic analysis.

Multiple prediction tools were used in this study to investigate potential B- and T-cell epitopes, which are responsible for humoral and cell-mediated immunity, respectively. B-cell epitopes are important because they stimulate the production of antibodies, known as immunoglobulins. These epitopes are either linear or conformational [41]. Linear B-cell epitopes are contiguous and are found in the primary structure of the protein. Discontinuous or conformational B-cell epitopes are brought closer together by protein folding [28]. This is a fundamental aspect of vaccine design [42]. T cells are subdivided into CD4+ and CD8+ cells, which recognise and interact with the major histocompatibility complex (MHC) on antigen-presenting cells (APCs). In summary, the T-cell receptors recognise the surface antigens found on these APCs [41]. This recognition is essential for multi-epitope vaccine advancements and immune simulation. A vaccine subunit was constructed from these epitopes using AAY, GPGPG, and EAAAK linkers.
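The assembly of a multi-epitope construct from epitopes and linkers can be sketched in a few lines of Python. The layout below is an assumption made for illustration only: EAAAK joins the adjuvant to the first epitope, AAY precedes epitopes in one group, and GPGPG precedes epitopes in another. The text does not state which linker was used at which position, and the adjuvant and epitope strings are placeholders rather than real sequences.

```python
def assemble_construct(adjuvant, groups, adjuvant_linker="EAAAK"):
    """Concatenate an adjuvant and groups of epitopes with linkers.

    groups is a list of (epitopes, linker) pairs. The first epitope overall
    is joined to the adjuvant with adjuvant_linker; every later epitope is
    preceded by the linker of its own group. This layout is an assumption.
    """
    pieces = [adjuvant]
    first = True
    for epitopes, linker in groups:
        for epitope in epitopes:
            pieces.append(adjuvant_linker if first else linker)
            pieces.append(epitope)
            first = False
    return "".join(pieces)


# Placeholder names stand in for real adjuvant and epitope sequences.
construct = assemble_construct(
    "<ADJ>",
    [
        (["<CTL1>", "<CTL2>"], "AAY"),
        (["<HTL1>", "<HTL2>"], "GPGPG"),
    ],
)
```

Keeping the linker as part of each group makes it straightforward to try alternative arrangements, for example a different epitope order, without changing the function.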

The adjuvant used in this construct is referred to as the griselimycin adjuvant. Adjuvants are used to enhance the immunogenicity of the vaccine construct, leading to increased antibody production [10]. In this study, an adjuvant was used to prepare for the possibility of in vivo work in the future. Adjuvants can also improve the cost-effectiveness of a vaccine by increasing its supply [43]. The current tuberculosis vaccine is non-adjuvanted, which raises the suspicion that this contributes to its poor performance in immunocompromised individuals [10]. The vaccine subunit was subjected to antigenicity, allergenicity, and toxicity predictions, which indicated an antigenic, non-allergenic, and non-toxic vaccine candidate. Regarding the physicochemical properties of the final candidate, the molecular weight is 34.4 kDa, slightly lower than that of an ideal vaccine candidate. Studies show that a molecular weight between 40 and 50 kDa encourages improved uptake by the lymphatic system [44]. Vaccines elicit greater immune responses when the T cells and B cells are localised within the lymphoid organs [45]. Methods that increase the hydrodynamic size of small vaccine molecules are favoured and help direct them towards lymphatic uptake [45]. The predicted isoelectric point of the subunit is 5.82, showing the weakly acidic nature of the predicted vaccine. The protein is estimated to be stable, with an instability index of 15.23, which is much lower than the threshold of 40. The aliphatic index reflects the relative volume occupied by aliphatic side chains and is associated with thermostability. The predicted value falls within the range of 66.5 to 84.33, and the protein is therefore considered thermally stable. These parameters are useful during expression studies [46]. The predicted protein was shown to be rich in glycine residues, a known feature of the PE_PGRS family [47], and was predicted to be soluble upon expression.
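Two of the physicochemical parameters discussed above can be illustrated with a short Python sketch. The aliphatic index is computed with the commonly used formula of Ikai, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percentage of each residue, and the stability check applies the threshold of 40 to the instability index. The function names and the test sequence are hypothetical; the sketch does not reproduce the output of any particular server.

```python
def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def is_predicted_stable(instability_index, threshold=40.0):
    """A protein is predicted to be stable if its index is below the threshold."""
    return instability_index < threshold


# Hypothetical four-residue sequence with 25% each of Ala, Val, Ile, and Leu.
toy_index = aliphatic_index("AVIL")
# Instability index reported in the text.
construct_is_stable = is_predicted_stable(15.23)
```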

A prediction of the secondary and tertiary structures of the vaccine subunit is vital for future analysis. The secondary structure prediction showed that 54% of the residues were in coil form, 45% were in β-strand form, and 1% were disordered. The alpha-helical coiled-coil has been considered an optimal scaffold structure that exposes vital epitope regions [48]. The solvent accessible surface area (ASA) is a prediction that helps assess protein structure and stability [49]. The result showed that 37% of residues are exposed and 35% of residues are buried. This result is considered an evolutionary property within protein families. The modelled three-dimensional tertiary structure of the construct was refined and showed high quality based on the Ramachandran plot. The plot suggested that around 81.8% of residues were in the favoured region, 12.2% were in the allowed region, and 5.9% were outliers. The z-score prediction shows that the model lies within the range of a good-quality structure. The model is of an acceptable standard and can be used in further docking studies.

Toll-like receptors enhance and stimulate the production of APCs and other necessary innate immune cells. TLR2 was chosen due to its ability to interact well with non-TLR molecules and its involvement in recognising Gram-positive bacteria [50]. This receptor can produce an innate immune response which could assist the vaccine construct [50]. Molecular docking of the 3-D model and TLR2 was conducted using a variety of online docking tools. The high binding affinity of the vaccine construct towards TLR2 makes this an acceptable docked complex. The docked complex can produce adaptive and innate immune responses. The HawkDock server integrates MM/GBSA scoring to highlight the key binding residues of the molecule [34]. The server also ranks the residues of the receptor and ligand molecules according to their binding energy. The top 10 were selected and are listed in Table 7. A molecular dynamics simulation of the TLR2-vaccine construct complex was performed to assess its stability. The result illustrated the motion and rigidity of the complex, which was predicted to be stable.
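The ranking of residues by binding energy can be sketched as a simple sort in Python. The residue labels and energy values below are hypothetical and are not taken from Table 7; the sketch only assumes that a more negative per-residue energy (in kcal/mol) indicates a more favourable contribution to binding.

```python
def top_binding_residues(energies, n=10):
    """Return the n residues with the most negative binding energies.

    energies maps a residue label to its binding energy contribution in
    kcal/mol; more negative values are treated as more favourable.
    """
    ranked = sorted(energies.items(), key=lambda item: item[1])
    return ranked[:n]


# Hypothetical residue labels and energies, for illustration only.
example_energies = {
    "ARG-10": -4.2,
    "GLU-55": 1.3,
    "TYR-23": -6.8,
    "LEU-71": -0.5,
}
top_two = top_binding_residues(example_energies, n=2)
```

The same function could be applied separately to the receptor and the ligand residues to obtain one ranked list for each molecule.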

Table 7 The top 10 receptor and ligand residues for the docked MEV and TLR2 molecule with their corresponding binding energies retrieved from the HawkDock server

In silico restriction cloning of the reverse-translated vaccine construct into the pET-30a(+) vector was evaluated. The CAI and GC content were predicted because they supply vital information about protein expression. The expression level of the vaccine candidate was predicted to fall within a good range for optimal expression. These predictions should be considered indicative and require wet-laboratory procedures for verification [31]. The predicted CAI and GC content will support in vitro expression studies. Immunogenic profiles of the vaccine construct were estimated using the online tool C-ImmSim. The antigen concentration decreases after the first injection due to the memory cells that build up an immune defence. The memory B and T cells (cytotoxic and helper) are responsive in the presence of the antigen molecule, which triggers a response. The level of secreted cytokines is high, which indicates replication inhibition and a T-cell-mediated immune response [46]. Further analysis should proceed via bacterial expression, which will allow the vaccine candidate to be used in various immunological studies that can validate the findings of this study. Pre-clinical studies, such as tissue-culture or cell-culture systems and animal testing, would thereafter lead the investigation.

Conclusion

Technology has opened new avenues in vaccine development that could complement current methods, such as in vitro pre-clinical trials, for Mtb. This study used online prediction tools to discover B- and T-cell epitopes and to assemble a multi-epitope vaccine candidate for Mtb. The constructed candidate met all the immunogenic properties predicted for an appropriate vaccine. The investigation characterised various properties of the construct, highlighting the nature of the sequence, the folding of the residues, and the tertiary structure. The binding affinity and stability of the docked vaccine complex with TLR2 were estimated using molecular dynamics simulation and found to be acceptable. Further analysis demonstrated that the predicted vaccine subunit may potentially provide the necessary immune response as a vaccine candidate. However, further lab-based immunological studies are required to validate the predicted immunoinformatics data in this study.