Introduction

Emerging literature on the development of drug resistance in pathogens compels us to establish and reinforce the fact that immunization is one of the most effective ways to provide long-lasting protection against microbial diseases (Jansen et al. 2018). Despite the availability of various vaccines on the market against different pathogens, there is always scope for improving existing vaccines or preparing new ones. For example, “first generation” vaccines [inactivated or live attenuated pathogens, e.g., Bacillus Calmette Guerin (BCG), plague, pertussis, polio, rabies, and smallpox] were developed long ago, while “second generation” vaccines (cell components, e.g., polysaccharides or protein antigens of the microorganisms, referred to as subunit vaccines) were developed in the last decades. Parallel to these vaccine candidates, the use of genetic material (DNA and RNA), genetically modified cells and non-virulent viruses packaged with antigenic DNA or RNA is being explored for vaccine development. Building on the lessons of first and second generation vaccines and on the evolution of advanced tools in vaccine science, there is enormous scope to improve the development of “third generation” vaccines with the help of bioinformatics, genomics, proteomics and other associated techniques. Because of several limitations of conventional approaches, it is essential to adopt well-proven and advanced techniques that accelerate vaccine development programs while saving time and energy (Rappuoli 2004; Bagnoli et al. 2011).

Current emphasis is on the identification of pathogen genes/proteins that play important roles in host-pathogen interaction, bacterial pathogenesis and survival in the host body. The last decades have witnessed a number of technological advances that have the potential to be exploited in the development of new vaccines; therefore, the use of these techniques to develop vaccines against important human pathogens is the focus of this article.

Genomics approaches

Genomics approaches have an advantage over conventional methods: they can be used for both cultivable and non-cultivable microorganisms, and they are highly efficient at identifying all possible antigens expressed universally in different pathogenic strains. Genome-based technologies are helpful to find genuine antigens as well as mimetic antigens that can induce protective immunity against bacterial epitopes (Seib et al. 2012). Gene-expression microarrays help to manage the complexity of the overwhelming data generated by many genomics techniques and offer a useful snapshot of the main cellular events that contribute to the process of microbial pathogenesis and to the identification of key potential vaccine candidates.

Comparative genomics and pan-genome analysis allow a deep, comprehensive study of intra- and interspecies antigen variability and distribution, and have been described for many pathogens such as Plasmodium, group B streptococcus (GBS), Neisseria meningitidis serogroup B (MenB) and Streptococcus pyogenes (Vernikos et al. 2015; Swapna and Parkinson 2017; Carlton et al. 2008; Margarit et al. 2009; McCarthy et al. 2018; Lin et al. 2018; Tettelin et al. 2002; Maione et al. 2005). Intra-species analysis helps by sorting out common antigens present in all prevalent strains that could act as universal vaccine candidates. Comprehensive methods like multigenome analysis or the pan-genome approach have been demonstrated to identify potential vaccine candidates against highly variable pathogens like S. pyogenes (Sharma et al. 2013). The development of a GBS vaccine to fight invasive neonatal disease is considered a priority by global health authorities (Lin et al. 2018). The genome sequences of GBS serotype III strain NEM316 and GBS serotype V strain 2603 V/R have been exploited to identify novel, universal vaccine candidates (Glaser et al. 2002; Tettelin et al. 2002, 2005; Maione et al. 2005). Huge intra-species diversity suggests that a single genome sequence is not entirely representative and does not offer a complete picture of the genetic variability of a species. Therefore, comparative genomics allows identification of potential antigens on the basis of sequence conservation across different serotypes and strains of a given pathogen (Sharma et al. 2013).
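The idea of sorting antigens by their presence across strains can be illustrated with a minimal Python sketch. It assumes that gene families have already been clustered and named by an upstream tool; the strain names, family names and both helper functions are hypothetical and are not taken from any of the cited studies.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (present in every strain) and accessory."""
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)                      # pan-genome: union of all strains
    core = set.intersection(*gene_sets) if gene_sets else set()
    return core, pan - core


def conservation(strain_genes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of strains that carry each gene family."""
    n_strains = len(strain_genes)
    counts: Dict[str, int] = {}
    for genes in strain_genes.values():
        for family in genes:
            counts[family] = counts.get(family, 0) + 1
    return {family: count / n_strains for family, count in counts.items()}


# Hypothetical toy input: gene families detected in three strains.
strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}

core, accessory = core_and_accessory(strains)
fractions = conservation(strains)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In this toy case only the families found in all three strains end up in the core set; in practice a less strict cut-off on the conservation fraction may be preferred, since a strictly universal antigen can be rare in highly variable species.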

Inter-species analysis guides us to identify and filter out antigens that show a high degree of similarity to genes present in the human microbial flora, thereby avoiding undesirable cross-reaction of vaccine-elicited antibodies with known benign commensal species. An inter-species comparison of the predicted protein sets of S. agalactiae, S. pyogenes, and S. pneumoniae has shown that approximately 50% of proteins are homologous, signifying substantial overlap in potentially relevant pathogenic mechanisms (Tettelin et al. 2002). This information may be exploited to develop a vaccine against multiple species of streptococci. However, genomics approaches identify potential vaccine candidate genes from pathogens grown in isolation, whereas the functional aspects of genome dynamics need to be analysed when microorganisms interact with the host. Several functional genomics approaches have been developed to compensate for this limitation.
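The filtering step against commensal flora can be sketched as follows, assuming that the best percent identity of each candidate against a commensal protein set has already been computed with a sequence search tool. The 40% cut-off, the protein names and the function are illustrative assumptions, not values from the cited work.

```python
from typing import Dict, Iterable, List, Tuple


def filter_cross_reactive(
    candidates: Iterable[str],
    best_commensal_identity: Dict[str, float],
    max_identity: float = 40.0,  # assumed cut-off in percent identity
) -> Tuple[List[str], List[str]]:
    """Separate candidates that resemble commensal proteins from those that do not."""
    kept, removed = [], []
    for protein in candidates:
        # A candidate with no reported hit is treated as having 0% identity.
        identity = best_commensal_identity.get(protein, 0.0)
        if identity >= max_identity:
            removed.append(protein)
        else:
            kept.append(protein)
    return kept, removed


# Hypothetical candidates and their best hits against commensal species.
candidate_names = ["protA", "protB", "protC"]
best_hits = {"protA": 82.5, "protB": 21.0}

kept, removed = filter_cross_reactive(candidate_names, best_hits)
print("kept:", kept)
print("removed:", removed)
```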

To understand the mechanism of microbial pathogenesis, it is essential to identify the set of genes responsible for the initiation and maintenance of an infection. Initially, there was a lack of suitable techniques for testing individual mutants in animal models to understand the function of genes during infection. The first group of techniques, including in vivo expression technology (IVET) and differential fluorescence induction (DFI), has been used to identify bacterial genes whose promoters are specifically induced in the infected host (Mahan et al. 1993; Bumann and Valdivia 2007; Adams and Jewett 2018; Roberfroid et al. 2016). In contrast, signature-tagged mutagenesis (STM) was designed to identify genes that are essential for the bacteria to survive in vivo. The technique is based on random mutagenesis of bacteria in which every mutant carries a distinct molecular signature (tag) that can be identified through hybridization (Hensel et al. 1995). The tags from a mixed population of bacterial mutants representing the inoculum and from bacteria recovered from infected hosts are detected by PCR, radiolabelling and hybridization analysis. STM has an advantage over approaches that rely on in vitro grown bacteria, which are likely to miss important protective antigens that function only in vivo (Hensel et al. 1995; Mazurkiewicz et al. 2006; Saenz and Dehio 2005; Ponnusamy et al. 2015). The second group of techniques, including gene expression microarrays, adds the further advantage of directly measuring gene expression levels on a true genome-wide scale. However, the application of these techniques to the analysis of bacterial pathogens during the infection process is still in its early stages. Another method, reverse transcriptase polymerase chain reaction (RT-PCR), complements the other methods by allowing accurate gene expression measurement on a sub-genomic scale. However, it remains difficult to study bacterial pathogenesis during the infection process with these techniques.
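The logic of an STM screen, comparing tag signals between the inoculum and the bacteria recovered from the host, can be sketched in Python. This is a simplified illustration: the signal values, the two thresholds and the function name are assumptions, and real screens require replicate animals and proper statistics.

```python
from typing import Dict, List


def attenuated_mutants(
    input_signal: Dict[str, float],
    output_signal: Dict[str, float],
    min_input: float = 100.0,  # assumed minimum signal to trust a tag in the inoculum
    max_ratio: float = 0.1,    # assumed output/input ratio below which a mutant counts as lost
) -> List[str]:
    """Return tags detected in the inoculum but strongly depleted after infection."""
    hits = []
    for tag, inoculum in input_signal.items():
        if inoculum < min_input:
            continue  # tag not reliably detected in the inoculum, so no call is made
        recovered = output_signal.get(tag, 0.0)
        if recovered / inoculum <= max_ratio:
            hits.append(tag)
    return sorted(hits)


# Hypothetical hybridization signals per tag.
inoculum_pool = {"tag01": 950.0, "tag02": 1020.0, "tag03": 40.0, "tag04": 870.0}
recovered_pool = {"tag01": 900.0, "tag02": 35.0, "tag03": 0.0}

lost_in_vivo = attenuated_mutants(inoculum_pool, recovered_pool)
print(lost_in_vivo)
```

Mutants whose tags disappear from the recovered pool carry insertions in genes that are candidates for in vivo survival functions; tags that are weak in the inoculum are excluded rather than scored.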

An anti-genomic approach, which combines genomic with serological antigen identification technologies, has been used to explore the antigenic repertoire of bacterial pathogens (Meinke et al. 2005; Etz et al. 2002; Fritzer et al. 2010; Nafarieh et al. 2017). This integrated approach is useful for antigen validation, as selected clones can be used directly for the generation of specific immune sera without the demanding task of high-throughput recombinant protein production. These sera can be used in surface protein localization studies and in vitro functional assays. The anti-genome of a particular pathogen, as defined by this method, typically consists of approximately 100 antigens, most of which are located on the cell surface or secreted into the external environment. Small-insert genomic libraries are also employed for the in vitro protein selection method termed ‘ribosome display’, which has been used to identify and characterize proteins of immunological importance on a genomic scale in human pathogens (Weichhart et al. 2003; Xiao et al. 2011). In addition, ‘lambda phage display’ has been used for domain mapping, antigen discovery and protein interaction studies to identify potential antigens (Nicastro et al. 2014).

Another antibody-based selection method, in vivo induced antigen technology (IVIAT), has been used to identify antigens that pathogens express only during human infection (Mahan et al. 1993; Hang et al. 2003; Lombardo et al. 2007). For example, genomic libraries of Vibrio cholerae expressed in Escherichia coli were probed by colony blotting using convalescent human sera. A major challenge in the development of efficient screening methods is the direct selection of protective candidates. For this purpose, expression library immunization (ELI) technology has been successfully applied to Mycobacterium tuberculosis. This strategy is based on immunization with plasmid DNA incorporating the whole genome split into small fragments. However, ELI is limited to the study of genes that can be expressed in eukaryotic cells, and it also requires animal models that are suitable for screening purposes (Barry et al. 2004; Talaat and Stemke-Hale 2005; Yang et al. 2017).

Bioinformatics and computational approaches

A genome sequence is a complete inventory of every possible vaccine candidate. Advanced bioinformatics tools can be used to examine the genetic content as well as the transcription and translation profiles of any pathogen to unravel more details of its pathogenicity. In the two decades since the first bacterial genome sequence (Haemophilus influenzae) was published, thousands of bacterial and viral genome sequences have been completed (http://www.ebi.ac.uk/genomes/bacteria.html, http://www.genomesonline.org/cgi-bin/GOLD/bin/sequencing_status_distribution.cgi, http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239&opt=Virus). The generated genomic information is used to screen the complete set of potential proteins encoded by a pathogen in the search for vaccine candidates, an approach referred to as reverse vaccinology (Bagnoli et al. 2011).

In silico whole-genome analysis tools accelerate the process of vaccine candidate identification through the integrated use of genomics and proteomics studies, because already available bioinformatics tools can also predict (I) the subcellular localization of vaccine candidate proteins, (II) vaccine candidates conserved among different strains and species, (III) the topology of surface proteins, (IV) the immunogenicity of different epitopes in vaccine candidate proteins, (V) the allergenic properties of proteins and (VI) the 3D structure of vaccine candidate proteins, which is used to analyse the accessibility of immunologically relevant epitopes (Gourlay et al. 2017; Alvarez et al. 2018). Successful use of bioinformatics tools has been reported for many bacterial and viral pathogens. For example, because of the heterogeneous distribution of group A streptococcus (GAS) in the world population (more than 200 serotypes have been reported) and the variation of amino acid sequences in proteins across all serotypes, it is very difficult to identify universal vaccine candidates. Therefore, a comprehensive in silico study has been reported (Sharma et al. 2013). In silico approaches help to predict the function of a particular gene, which then needs to be verified using proteomics and genomics tools (Table 1).
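How several of these predictions might be combined into a shortlist can be sketched in Python. The sketch assumes that each property has already been predicted by a dedicated tool; the record fields, the score scale and the thresholds are hypothetical and do not correspond to any specific program listed in Table 1.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProteinAnnotation:
    name: str
    localization: str        # e.g. "surface", "secreted" or "cytoplasmic"
    strain_fraction: float   # fraction of strains carrying a conserved copy
    epitope_score: float     # hypothetical immunogenicity score between 0 and 1
    predicted_allergen: bool


def shortlist(
    proteins: Iterable[ProteinAnnotation],
    min_strain_fraction: float = 0.9,  # assumed conservation cut-off
    min_epitope_score: float = 0.5,    # assumed immunogenicity cut-off
) -> List[ProteinAnnotation]:
    """Keep exposed, conserved, immunogenic, non-allergenic proteins, best score first."""
    exposed = {"surface", "secreted"}
    selected = [
        p for p in proteins
        if p.localization in exposed
        and p.strain_fraction >= min_strain_fraction
        and p.epitope_score >= min_epitope_score
        and not p.predicted_allergen
    ]
    return sorted(selected, key=lambda p: p.epitope_score, reverse=True)


# Hypothetical annotations for four proteins.
annotations = [
    ProteinAnnotation("p1", "surface", 0.98, 0.72, False),
    ProteinAnnotation("p2", "cytoplasmic", 1.00, 0.90, False),
    ProteinAnnotation("p3", "secreted", 0.95, 0.81, False),
    ProteinAnnotation("p4", "surface", 0.99, 0.88, True),
]

ranked = shortlist(annotations)
print([p.name for p in ranked])
```

Such a filter only narrows the list; as noted above, the surviving candidates still need experimental verification.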

Table 1 Some important available bioinformatics tools used for data mining and prediction of potential vaccine candidates

Proteomics approaches

Genomics approaches have several limitations: (1) mRNA expression levels do not represent the actual amount of active protein in a cell, (2) the gene sequence gives incomplete information about post-translational modifications, and (3) genome information does not describe dynamic cellular processes. To overcome these limitations, proteomics has been used in different ways to identify novel vaccine candidates against several human pathogens (Scoffone et al. 2020; Sousa et al. 2020; Sharma et al. 2013; Rodriguez-Ortega et al. 2006; Nilsson et al. 2018; Zielke et al. 2016; Couto et al. 2016; Lo et al. 2017). The present focus of proteomics is to identify surface proteins of pathogens that can be targeted as potential vaccine candidates. Several authors have analysed and identified microbial surface proteins by using two-dimensional gel electrophoresis (2-DE) coupled with mass spectrometry (MS) and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). However, hydrophobic proteins cannot be identified efficiently by this method. Subsequently, two-dimensional liquid chromatography coupled with tandem mass spectrometry (2D-LC-MS/MS) was introduced and found to be particularly useful for identifying proteins that are highly hydrophobic or basic, poorly expressed, of high molecular weight, or of extreme isoelectric point (Bagnoli et al. 2011; Gygi et al. 1999; Chen et al. 2006). An improved protocol was adopted to identify surface proteins (the surfome) with minimal contamination by cytoplasmic proteins, through careful surface digestion of live bacteria with different proteases followed by mass spectrometry analysis (Rodriguez-Ortega et al. 2006). The same techniques have been applied to bacterial culture supernatants to identify and analyse the bacterial ‘secretome’ (He et al. 2015; Ravipaty and Reilly 2010). Proteomic studies have also been used to study the role of the environment in regulating the pathophysiology of several microorganisms and to investigate host–microbial interactions (Chen et al. 2016; Agudo et al. 2004; Hardwidge et al. 2004). The application of proteomics provides major opportunities to elucidate disease mechanisms and to identify new and globally useful vaccine candidates against microbial infections. A comparative proteomic approach allows the selection of vaccine candidates based on differential expression in virulent versus non-virulent strains, invasive versus less invasive conditions or colonizing versus invasive strains (Shaw et al. 2002).

For large-scale quantitative protein expression analysis, ‘shotgun proteomics’ or multidimensional protein identification technology (MudPIT) was devised to identify proteins expressed in low abundance, which is not possible by 2-D gel electrophoresis (Wu and Yates 2003). However, this method was not found suitable for comparative analysis until isotope-coded affinity tags (ICATs) were developed for the detection of proteins expressed in abundance as well as at low levels (Gygi et al. 1999; Guina et al. 2003). A further advantage of the ICAT method is that it is based on post-isolation stable isotope labelling of proteins and is therefore not limited by the incompatibility of some cells and tissues with metabolic labelling. Because ICAT labels bind only to cysteine residues, cysteine-deficient proteins (10–20% in the case of bacteria) cannot be quantified. To overcome this problem, the cysteine-independent iTRAQ technique was introduced, which uses a set of four isobaric tags, each comprising an amine-specific (peptide N-terminus and lysine residues) reactive group, a neutral linker group (28–31 Da mass), and a reporter region (114–117 Da mass) (Snelling et al. 2007). These labels can therefore be used to simultaneously track up to four samples in a single experiment (Choe et al. 2005). Since the tags have the same total mass, each peak detected in MS represents a single peptide from the combined four samples. MS/MS of each peptide releases the reporter, allowing simultaneous quantitation and identification of the peptide. Another method, stable isotope labelling with amino acids in cell culture (SILAC), was designed for the quantification of proteins and has been employed on many pathogens to identify vaccine candidates (Ong et al. 2002; Kani 2017; Jang and Kim 2018). This method is similar to those described above, except that cells subjected to different biological conditions are grown in culture in the presence of an essential amino acid carrying a stable isotope (e.g., deuterium). For example, one sample could be incubated with the unlabelled amino acid and the test sample with a deuterated form. Because the amino acid is essential, the organism requires it for survival, and after several replication cycles all of that particular amino acid in the cell's proteins will be present in either the unlabelled (control) or deuterated (test) form, allowing true quantitation (Ong et al. 2002).
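The relative quantitation obtained from the four iTRAQ reporter regions (114–117 Da) described above can be illustrated with a short Python sketch. It assumes that reporter intensities have already been extracted from each MS/MS spectrum; the choice of channel 114 as reference, the use of the median over peptides and the intensity values are illustrative assumptions.

```python
from statistics import median
from typing import Dict, List

REPORTER_CHANNELS = (114, 115, 116, 117)  # reporter masses of the four isobaric tags


def peptide_ratios(reporter_intensity: Dict[int, float], reference: int = 114) -> Dict[int, float]:
    """Express each reporter channel of one peptide relative to the reference channel."""
    ref = reporter_intensity[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {ch: reporter_intensity[ch] / ref for ch in REPORTER_CHANNELS}


def protein_ratios(peptides: List[Dict[int, float]], reference: int = 114) -> Dict[int, float]:
    """Summarize a protein as the median ratio over its peptides, per channel."""
    per_peptide = [peptide_ratios(p, reference) for p in peptides]
    return {ch: median(r[ch] for r in per_peptide) for ch in REPORTER_CHANNELS}


# Hypothetical reporter intensities for three peptides of one protein.
peptide_spectra = [
    {114: 100.0, 115: 200.0, 116: 50.0, 117: 100.0},
    {114: 200.0, 115: 400.0, 116: 100.0, 117: 300.0},
    {114: 50.0, 115: 100.0, 116: 25.0, 117: 50.0},
]

ratios = protein_ratios(peptide_spectra)
print(ratios)
```

The median is used here only because it is less sensitive to a single outlying peptide than the mean; other summaries are equally possible.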

Quantitative comparison of protein expression in a variety of normal, developmental and disease states helps to understand the highly regulated and critically timed cellular processes occurring inside pathogenic bacteria. These processes can be characterized further by monitoring the fate of post-translationally modified (PTM) proteins that have a role in pathogenesis, which ultimately assists in identifying suitable vaccine candidates (Macek et al. 2019). The combination of proteomics and serological analysis gave rise to a technology named serological proteome analysis (SERPA), which has been useful in the identification of potential vaccine candidates (Klade 2002). These technologies provide valuable insights into the molecular basis of microbial pathogenesis and identify potential vaccine candidates that might not have been found using more conventional methods.

Antibody-profiling technologies, such as protein microarrays, have been used to estimate antibody responses to hundreds of recombinant antigens and allow the screening of high-density protein arrays for enzyme–substrate, DNA–protein, and protein–protein interactions (Bensi et al. 2012; Emili and Cagney 2000). Furthermore, long-lasting humoral responses against different pathogens can be analysed for diagnostics, for understanding pathogenic mechanisms and for the development of vaccines against bacteria, protozoa and viruses (Zhou et al. 2015; Kempsell et al. 2015; Felgner et al. 2009; Vigil et al. 2011; Crompton et al. 2010; Duke-Cohan et al. 2009; Fernandez et al. 2011). Despite several advantages, protein microarrays have some limitations: (1) non-recognition of misfolded or multimeric proteins; (2) the need for additional procedures to identify PTMs or non-protein antigens; (3) the need for an expensive fluorescent microarray scanner and sophisticated statistical methods; and (4) the need for expensive robotics for printing arrays. The antibody microarray has been found to be the most versatile multiplexed immunoassay technology and has been used for the exploratory detection and study of protein abundance, functional pathways, and potential vaccine/drug targets. Applications of antibody microarrays in basic biology and clinical studies have recently been reviewed in detail, providing insights into the current trends and future of protein analysis (Chen et al. 2018).

Apart from the application of omics-based tools and reverse vaccinology, the development of nucleic acid based vaccines has gained attention in the last three decades. These vaccines combine the positive aspects of live-attenuated vaccines, viral vectors and subunit vaccines. Nucleic acid based vaccines include viral vectors, plasmid DNA (pDNA) and RNA. Their advantages include (1) induction of both B- and T-cell responses; (2) specificity; (3) high stability; (4) low cost; and (5) no anti-vector immunity. Published research reports show faster progress in nucleic acid based vaccine development against viruses than against bacterial pathogens. A few reports have shown the potential of these vaccines against bacterial pathogens (Maruggi et al. 2017; Budachetri et al. 2020; Silveira et al. 2017).

Current status of vaccine research against major pathogens

The Covid-19 pandemic, the most catastrophic outbreak of the century, has killed thousands and infected millions of people worldwide, and it warrants a global, proactive and integrated research and vaccine development program. Other pathogens, such as those causing AIDS, malaria, tuberculosis, meningitis and dengue, are also major concerns for vaccine research. Most research groups are focused on intra-species conserved surface proteins of bacterial pathogens (GAS, Streptococcus pneumoniae, M. tuberculosis), multi-valent vaccines (Dengue virus and GAS), killed or attenuated whole organisms (Poliovirus, M. tuberculosis, Dengue virus and Helicobacter pylori), and capsular polysaccharides (GBS, Meningococcus and other Gram-positive bacterial pathogens) for vaccine development. The biggest challenge is vaccine development against highly variable and fast-evolving human pathogens such as coronaviruses, HIV, Influenza, Ebola and Nipah viruses.

In the last two decades we have witnessed the efficiency of powerful genomics, proteomics and bioinformatics approaches for the identification of vaccine candidates. However, despite fast and efficient target identification, scientists still face slow and laborious validation steps, the use of animal models for testing vaccine candidates, and clinical trials. In addition, safety and affordability are two important factors that must be taken into consideration in the development of modern vaccines. In our opinion, new vaccines should be highly effective in both developed and developing countries. The successful use of multi-genome analysis, screening and the reverse vaccinology approach to develop universal vaccines against highly variable pathogens like GAS and GBS may open new vistas for the development of universal vaccines against other human pathogens.