1 Background

Since December 2019, COVID-19, caused by the novel coronavirus SARS-CoV-2, has spread to more than 223 countries, causing a major health and economic crisis [1,2,3]. SARS-CoV-2, like other coronaviruses (CoVs), is an enveloped, positive-sense, single-stranded RNA virus. It belongs to the Coronaviridae family, the Orthocoronavirinae subfamily, and the Betacoronavirus genus [1,2,3]. Alpha (α) and Beta (β) coronaviruses mainly infect mammals, whereas Gamma (γ) and Delta (δ) coronaviruses predominantly infect birds [4, 5]. The SARS-CoV-2 genome, approximately 29,903 nucleotides long, contains 5’ and 3’ untranslated regions and 11 Open Reading Frames (ORFs) encoding 11 proteins, including the S protein [6]. SARS-CoV-2 is probably transmitted among humans via three primary pathways: inhaling respiratory droplets directly from infected persons; contact with contaminated environmental surfaces, known as “fomites”, followed by touching mucous membranes with soiled hands; or inhaling infected airborne particles. Recent research indicates a high correlation between the SARS-CoV-2 genome and the genomes of the bat-CoV RaTG13 and the pangolin-CoV MP789, followed by two other genomes: CoVZC45 and CoVZXC21 [7, 8].

The SARS-CoV-2 outbreak caused 2,797,435 deaths among a total of 127,847,262 confirmed cases (December 20, 2019 to March 29, 2021), with damage varying from one country to another. In the majority of cases, heavy damage was reported, such as in the USA, Brazil, and India, with 562,526, 312,299, and 161,881 deaths and 30,962,803, 12,534,688, and 12,039,644 confirmed cases, respectively, up to 29 March 2021 [9]. Nevertheless, in other regions, such as Laos, Vietnam, and Finland, damage seems to be limited, and the number of deaths did not exceed 0, 35, and 822, respectively [9]. Likewise, in other regions, such as Nicaragua, Sierra Leone, Vietnam, and Madagascar, cases and damage seem to be limited, and the number of deaths did not exceed three hundred. These variations in case fatality rates can be caused by different factors: political and economic strategies, cultural behavior, age, and health infrastructure [10, 11]. Furthermore, differences in the population’s immunological background are probably due to the vaccination strategies used in these countries [11,12,13,14,15]. In addition, different vaccines, such as BCG (Bacillus Calmette-Guérin), OPV (Oral Poliovirus Vaccine), and MMR (Measles, Mumps, and Rubella vaccines), have demonstrated an immune response that helps fight various pathogens [12, 14, 16, 17].

On the other hand, these variations may be attributed to the adoption of a universal and long-standing BCG vaccination policy, as BCG was found to be significantly protective among individuals for whom vaccination records were available [11, 18, 19]. In addition, the protective potential of the MMR vaccine was investigated based on a bioinformatic analysis of the S protein [20]. Based on this computational analysis, the MMR vaccine was suggested to be potentially protective for adults and to provide advantageous protection for children against COVID-19 as well. However, experimental analysis is required. Furthermore, pneumococcal vaccination with PCV13 was found to be very effective in a study of 137,037 individuals who received SARS-CoV-2 PCR tests [21]. A recent study reports strong similarities between the SARS-CoV-2 genome and the pneumococcal vaccine proteins PspA and PspC [22]. Indeed, other researchers found that polio, Haemophilus influenzae type B (Hib), varicella, geriatric flu, MMR, PCV13, and hepatitis A / B (HepA-HepB) vaccines administered in the past 1, 2, and 5 years are associated with decreased SARS-CoV-2 infection rates [22, 23].

In this work, we propose an in silico study to investigate the potential protective effect against COVID-19 of 14 vaccines (including those against Bordetella pertussis, Tetanus, Haemophilus influenzae type B (Hib), Corynebacterium diphtheriae, Streptococcus pneumoniae, Hepatitis A, and Hepatitis B). We aim to localize similar amino acid (aa) regions in the S protein of SARS-CoV-2 and in the main antigenic proteins of these vaccines, which may lead to the production of antibodies that cross-react with both the target pathogens and SARS-CoV-2. To achieve this goal, we used a combination of bioinformatics and signal processing tools to identify the amino acid (aa) sequences shared by the main antigenic protein of SARS-CoV-2 and the investigated vaccines.

2 Methods

Recent research [7] has suggested that the SARS-CoV-2 genome shares 96% genetic similarity with the RaTG13 bat coronavirus genome and 86% genetic similarity with the pangolin coronavirus genome. Subfigure (a) of Fig. 1 presents the distribution of nucleotide modifications, by percentage, along the SARS-CoV-2 genome compared with the RaTG13 and MP789 coronavirus genomes. The Spike protein region shows a high level of recombination between the RaTG13 and MP789 coronavirus genomes, as shown in subfigure (a) of Fig. 1. Specifically, between positions 1251 and 1600 (base pairs), the total nucleotide mutation between the pangolin coronavirus and SARS-CoV-2 is 14.28% (50 nucleotides). This is lower than the mutation between the bat coronavirus and the SARS-CoV-2 S gene (35%, 125 modified nucleotides). This research [7] suggests that the SARS-CoV-2 genome is the result of recombination events between two coronavirus genomes: the bat-CoV RaTG13 and the pangolin-CoV MP789.

Fig. 1

Percentage of nucleotide modifications in the SARS-CoV-2 genome compared with the RaTG13 and MP789 genomes [7]: (a) percentage of nucleotide modifications in the SARS-CoV-2 genome compared with the RaTG13 and pangolin genomes; (b) percentage of nucleotide modifications in the S gene of SARS-CoV-2 compared with the S genes of the RaTG13 and pangolin genomes
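The percentages quoted for the 1251 to 1600 window can be checked with simple arithmetic: if both endpoints are counted, the window spans 350 positions, so 50 modified nucleotides give about 14.28% and 125 give about 35.7%. The sketch below illustrates this kind of count on two pre-aligned sequences. The function name `percent_modified` and the toy fragments are hypothetical and are not taken from the cited study [7].

```python
def percent_modified(seq_a: str, seq_b: str, start: int, end: int) -> float:
    """Percentage of differing positions between two aligned sequences
    over the 1-based, inclusive window [start, end]."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    window_a = seq_a[start - 1:end]
    window_b = seq_b[start - 1:end]
    if not window_a:
        raise ValueError("empty window")
    # Count positions where the two aligned sequences disagree.
    mismatches = sum(a != b for a, b in zip(window_a, window_b))
    return 100.0 * mismatches / len(window_a)


# Toy aligned fragments (hypothetical, not real genome data).
reference = "ATGCATGCAT"
variant = "ATGAATGCTT"
toy_percent = percent_modified(reference, variant, 1, 10)

# Arithmetic behind the figures quoted in the text (window of 350 positions).
pangolin_percent = 100.0 * 50 / 350
bat_percent = 100.0 * 125 / 350
```

A real comparison would first require a nucleotide alignment of the genomes; this sketch assumes that alignment has already been done and contains no gaps.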

Figure 2 presents the flowchart of the localization methodology we adopted to find similar amino acid sequences between the SARS-CoV-2 genome and our investigated sequences. Here we used a combination of bioinformatics techniques and signal processing tools.

Fig. 2

Flowchart of the methodology used to localize similarities between two amino acid sequences

2.1 Vaccines and sequences investigated (databank accession numbers)

Our study focuses on the vaccines used in one of the countries with a very low number of confirmed COVID-19 cases. The investigated vaccines include the main older and more recent vaccines (14 in total), with 34 protein sequences: BCG, Poliovirus, Measles, Mumps, Corynebacterium diphtheriae, Tetanus, and Bordetella pertussis vaccines, and more recent vaccines against hepatitis B and A viruses, Rubella virus, Haemophilus influenzae type B (Hib), and Streptococcus pneumoniae (PCV10, pspC, protein PspC, Protein A) [24]. The amino acid sequences of the antigenic proteins (n = 34) constituting these vaccines, as well as the amino acid sequence of the Spike protein of the Wuhan SARS-CoV-2 strain, were obtained from the NCBI GenBank database (https://www.ncbi.nlm.nih.gov). Accession numbers are presented in Table 1.

Table 1 Investigated vaccines and their corresponding antigenic proteins obtained from NCBI Genbank (https://www.ncbi.nlm.nih.gov)

2.2 Amino acid sequence alignment and hot spot analysis

We aim to detect similar amino acid (aa) sequences between protein sequences. The presented results were obtained using the BLASTP program, which can detect identical and/or similar amino acids in two sequences. The results were displayed using BioEdit software (version 7.2.5) (http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html). Figure 3 presents an example of the results obtained after comparing the S protein of SARS-CoV-2 with two sequences of the tetanus vaccine. These results show the different similar patterns together with their positions in each sequence.

Fig. 3

Example of BioEdit (version 7.2.5; http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html) results identifying similar patterns located between the S protein of SARS-CoV-2 and two vaccine protein sequences corresponding to two pathogens (Poliovirus and Haemophilus influenzae serotype B (Hib))
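As a rough illustration of what locating shared segments involves, the sketch below lists every maximal run of identical consecutive amino acids common to two sequences, with 1-based start positions. It is a simplified stand-in, not BLASTP: it reports exact matches only and ignores substitution scoring and gaps. The function name `common_segments` and the toy sequences are hypothetical.

```python
def common_segments(seq_a: str, seq_b: str, min_len: int = 4):
    """Return maximal identical segments of length >= min_len shared by
    both sequences, as (segment, start_in_a, start_in_b) with 1-based starts."""
    results = []
    n, m = len(seq_a), len(seq_b)
    # prev[j] holds the length of the common run ending at
    # seq_a[i - 2] and seq_b[j - 1] (the previous row of the table).
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # The run is maximal if it cannot be extended to the right.
                at_end = i == n or j == m or seq_a[i] != seq_b[j]
                if at_end and curr[j] >= min_len:
                    length = curr[j]
                    results.append(
                        (seq_a[i - length:i], i - length + 1, j - length + 1)
                    )
        prev = curr
    return results


# Toy sequences (hypothetical, not taken from Table 1).
toy_spike = "MKTAYIAKQR"
toy_antigen = "GGTAYIAKLL"
shared = common_segments(toy_spike, toy_antigen, min_len=4)
```

The `min_len=4` default mirrors the shortest segment length reported in the Results; the quadratic running time is acceptable for sequences of a few thousand residues.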

2.3 Numerical mapping of amino acid sequences

Converting genomic sequences into numerical ones using signal processing tools is an important step in characterizing unknown regions [25,26,27,28]. It allows rapid visual inspection of similar patterns before a more precise analysis is carried out. Applying the Electron–Ion Interaction Potential (EIIP) coding technique yields a signal for each protein sequence. EIIP coding has been used to transform an amino acid (aa) sequence into a numerical representation [25]. We therefore converted our amino acid sequences into numerical ones using Table 2, where each amino acid is represented by its EIIP value.

Table 2 EIIP coding technique for transformation of the amino acid sequence into a signal

Figure 4 shows the 1-D signal representation of a protein sequence corresponding to 150 amino acids of the S protein of SARS-CoV-2. Each signal value corresponds to an amino acid position in the protein sequence.

Fig. 4

EIIP numerical representation (generated signal) of 150 amino acids of the SARS-CoV-2 Spike protein
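The mapping from an amino acid sequence to a signal can be sketched as a simple table lookup. The values below are the EIIP values commonly cited in the literature for the 20 standard amino acids; since Table 2 is not reproduced in this text, they should be treated as an assumption and checked against it. The function name `eiip_signal` and the example fragment are hypothetical.

```python
# Commonly cited EIIP values for the 20 standard amino acids
# (assumed here; verify against Table 2 before use).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(protein: str) -> list[float]:
    """Map each residue (one-letter code) to its EIIP value, so that the
    i-th signal sample corresponds to the i-th amino acid position."""
    signal = []
    for position, residue in enumerate(protein.upper(), start=1):
        if residue not in EIIP:
            raise ValueError(f"unknown residue {residue!r} at position {position}")
        signal.append(EIIP[residue])
    return signal


# Toy fragment (hypothetical, not a real Spike region).
toy_signal = eiip_signal("LIND")
```

Non-standard symbols such as X or gap characters are rejected rather than silently mapped, because a default value would distort the signal.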

After transforming the amino acid sequence into a signal, we sought a 2-D representation to better expose the information it may contain. The Continuous Wavelet Transform (CWT) (along 64 scales with w0 ~ 5.5) was applied to the protein signal to obtain a protein image (2-D scalogram) [27, 28].

$$w_{0} = 2\pi f_{0}$$

\(w_{0}\) corresponds to the number of oscillations of the wavelet, and \(f_{0}\) is the central frequency of the basic wavelet. These images (wavelet scalograms) are an effective way to detect homologous regions between two amino acid sequences [27, 28].
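To make the scalogram construction concrete, the sketch below computes the modulus of CWT coefficients for a 1-D signal. The text specifies only 64 scales and w0 of about 5.5; the complex Morlet mother wavelet, the integer scales 1 to 64, and the 1/sqrt(scale) normalization used here are assumptions made for illustration. The function names are hypothetical, and the code is written in plain Python for clarity rather than speed.

```python
import cmath
import math


def morlet(t: float, w0: float = 5.5) -> complex:
    """Complex Morlet wavelet (assumed mother wavelet) evaluated at t."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, n_scales: int = 64, w0: float = 5.5):
    """Return a list of n_scales rows; row s-1 holds |W(s, n)| for every
    position n, where W is the CWT of the signal at integer scale s."""
    n = len(signal)
    scalogram = []
    for scale in range(1, n_scales + 1):
        norm = 1.0 / math.sqrt(scale)
        # Conjugated, scaled wavelet sampled at every possible offset k - n.
        kernel = {
            offset: norm * morlet(offset / scale, w0).conjugate()
            for offset in range(-(n - 1), n)
        }
        row = []
        for pos in range(n):
            coeff = sum(signal[k] * kernel[k - pos] for k in range(n))
            row.append(abs(coeff))  # modulus of the complex coefficient
        scalogram.append(row)
    return scalogram


# Toy signal (hypothetical): a pure oscillation with a period of 16 samples.
toy_signal = [math.cos(2 * math.pi * k / 16) for k in range(128)]
toy_scalogram = cwt_scalogram(toy_signal)
```

For a protein, the input would be the EIIP signal of the sequence, and two regions would be compared by looking at their scalograms side by side. A vectorized implementation would be preferable for full-length sequences.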

3 Results

3.1 Numerical mapping

To transform a protein sequence into a numerical representation, we need a coding technique to obtain a signal and an analysis technique to obtain an image.

In this work, after testing different methods, we chose EIIP as the coding technique and wavelet analysis as the analysis technique. After applying the EIIP coding technique and the CWT to an amino acid sequence, we obtained a scalogram image (the modulus of the matrix of CWT coefficients). The 2-D amino acid scalograms are an effective way to see similarities between two sequences, where they exist. When two regions are similar, the two resulting patterns (scalograms) are also similar, which confirms the similarity. Figure 5 presents images corresponding to similar patterns between regions of the S protein of the Wuhan SARS-CoV-2 genome and some regions of our investigated amino acid sequences.

Fig. 5

Wavelet representation (scalograms) of similar regions (patterns) of the amino acid sequence of the SARS-CoV-2 S gene (QJT73034.1) compared with the vaccine amino acid sequences; the first column shows the S gene of the SARS-CoV-2 genome and the second shows the vaccine sequence

3.2 Alignment of amino acid sequences

As a result, similar segments (4–8 aa) between the S protein of SARS-CoV-2 and the antigenic proteins of our investigated vaccines were identified (Fig. 6 and Table 3). Only antigenic proteins included in the Hepatitis B, Hib, Poliovirus, and PCV10 vaccines showed 6 to 8 similar consecutive amino acids. The corresponding motifs are indicated in Table 3. PCV10 antigenic proteins shared 2 similar motifs ( ) of 8 and 6 amino acids.

Fig. 6

Highly similar sequences identified between our investigated vaccines and SARS-CoV-2 proteins using BioEdit software (http://www.mybiosoftware.com/bioedit-7-0-9-biological-sequence-alignment-editor.html)

Table 3 Total number of similar patterns identified between our investigated vaccines and SARS-CoV-2 proteins

Figure 6 contains 23 similar patterns, each corresponding to at least 4 similar consecutive amino acids shared by the S protein and our investigated vaccines. A total of 11 similar patterns of 6 or more amino acids were identified.

Table 3 presents the total number of occurrences of each similar pattern identified between the vaccines and the S protein of SARS-CoV-2.

To check whether these patterns are present in the other genes of the SARS-CoV-2 genome, we also mapped and compared all other SARS-CoV-2 genes with our investigated vaccines. We confirm that the similar patterns ( , , , and ) found between the SARS-CoV-2 S gene and the vaccines are not present in the other genes of the SARS-CoV-2 genome. Moreover, we detected a new similar pattern ‘ ’ between the BCG vaccine and the SARS-CoV-2 N gene (Fig. 7).

Fig. 7

Scalograms (amino acid wavelet images) of different amino acid regions of different genes of the SARS-CoV-2 genome compared with the vaccine amino acid sequences

4 Discussion

Several researchers report that the SARS-CoV-2 genome is the result of recombination between two beta-coronavirus genomes: pangolin and bat coronaviruses [7, 8, 29, 30]. Various in silico studies have examined whether vaccination, especially with pneumococcal vaccines, protects against symptomatic SARS-CoV-2 infection and death [31,32,33,34,35,36]. Our main objective was to find similar amino acid sequences in the S protein of SARS-CoV-2 and the investigated protein sequences mentioned above using a combination of bioinformatics and signal processing tools. This study, which uses an intelligent system, suggests a possible influence of the Hepatitis B, Hib, Poliovirus, Tetanus, and PCV10 vaccines in protecting against COVID-19. In addition, this work could explain the variation in the number of deaths among countries and supports the possibility that these vaccines contain antigens that might be cross-reactive with SARS-CoV-2. We therefore consider some of the investigated vaccines to be real supportive protective measures in the fight against the COVID-19 pandemic. A recent review [37] presents a comparative table of the multiple vaccine platforms that have been explored to develop a COVID-19 vaccine. Each vaccine platform has unique advantages and disadvantages. Most of these vaccines use either the Spike protein or its receptor-binding domain (RBD) as the antigen.

For this reason, we studied the effect of our investigated vaccines to see whether they could help boost the immune system against COVID-19. We investigated the possibility of protection against COVID-19 through cross-reactive antibodies induced by different existing vaccines. To achieve this goal, we used a combination of bioinformatics and signal processing tools to compare the amino acid sequences of the main S protein of SARS-CoV-2 and our investigated vaccines. Numerical mapping showed two similar patterns of 7 and 8 amino acids corresponding to an exposed site: ‘ ’ detected in HBsAg of the Hepatitis B vaccine and ‘ ’ detected in the PCV10 vaccine. Two other patterns (‘ ’ and ‘ ’) from two investigated vaccines (Polio and PCV10) were identified in the S gene of the SARS-CoV-2 genome. Given the novel aspects of the importance of Hepatitis B, Streptococcus pneumoniae, and Polio immunity as potential protective factors in the COVID-19 pandemic, measures to organize mass immunization against these pathogens should be strengthened. Vaccination with attenuated viruses may partially explain the low symptomatic infection rate among children, although other factors are yet to be considered.

5 Conclusions

The main aim of this work is to track the effect of existing vaccines against COVID-19 threats using suitable computational techniques. In conclusion, our results indicate that some antigenic proteins in the investigated vaccines may protect against severe COVID-19. We suggest that these types of vaccines can reduce the risk of COVID-19 mortality, but we note that this in silico study must be confirmed using in vitro analysis.