Introduction

The etiopathology of the diseasome induced by SARS-CoV-2 infection in the human host [1] is under intensive investigation. A likely mechanism is that the multitude of diseases encompassed within COVID-19 derives from molecular mimicry between viral and human proteins [2]. The rationale is that, following an infection, the immune responses raised against the pathogen can cross-react with human proteins that share peptide sequences (or structures) with the pathogen, thus leading to harmful autoimmune pathologies [3, 4]. Accordingly, the lung and airway dysfunctions associated with SARS-CoV-2 infection might be explained by the sharing of peptides between the SARS-CoV-2 spike glycoprotein and alveolar lung surfactant proteins [2]. In support of this thesis, additional reports [5,6,7,8] highlight molecular mimicry and cross-reactivity as capable of explaining the SARS-CoV diseases. Of special interest, cross-reactive T cell recognition between circulating “common cold” coronaviruses and SARS-CoV-2 has also been suggested [9].

Within this scientific framework, this study comparatively analyzed the peptide sharing between SARS-CoV-2 and mammalian species. Our reasoning is that, if molecular mimicry between SARS-CoV-2 and human proteins contributes to or causes COVID-19, then different levels or patterns of molecular mimicry with the virus should characterize the various animal species. Indeed, few data indicate that domestic animals, for instance dogs and cats, can either transmit the virus or develop the virus-associated diseasome [10]. Currently, the general consensus remains that there is no evidence that infected pets are a source of SARS-CoV-2 infection for people or other pets [11, 12].

Based on this rationale and using hexa- and heptapeptides as sequence probes [13,14,15], the peptide overlap between SARS-CoV-2 spike glycoprotein and mammalian proteomes was analyzed.

Methods

Peptide sharing analyses have been extensively described elsewhere [16, 17]. Briefly, the primary sequence of the SARS-CoV-2 spike glycoprotein (NCBI protein ID: QHD43416.1) was dissected into hexa- and heptapeptides offset by one residue (i.e., MFVFLV, FVFLVL, VFLVLL, FLVLLP, and so forth). We obtained 1268 hexapeptides and 1267 heptapeptides. Each viral hexa- or heptapeptide was then used as a probe to scan for occurrences of the same hexa- or heptapeptide in the reference proteomes of the following mammalian organisms (with taxonomy ID in parentheses): human, Homo sapiens (9606); mouse, Mus musculus (10090); rat, Rattus norvegicus (10116); cat, Felis catus (9685); dog, Canis lupus familiaris (9615); rabbit, Oryctolagus cuniculus (9986); chimpanzee, Pan troglodytes (9598); gorilla, Gorilla gorilla gorilla (9595); and rhesus macaque, Macaca mulatta (9544). Three viral proteomes were added as coronavirus controls: human coronavirus HKU1 (290028); human coronavirus 229E (11137); and human coronavirus OC43 (31631). The hexa- and heptapeptide matching analyses were conducted using the PIR Peptide Matching program [18].
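The dissection and scanning steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the PIR program used in the study: the function names `kmers` and `shared_kmers`, the dictionary layout of the proteome, and the two toy protein entries are assumptions introduced here. The nine-residue probe is the sequence start implied by the four hexapeptides quoted in the text.

```python
from collections import defaultdict


def kmers(sequence: str, k: int) -> list[str]:
    """Return all length-k windows of `sequence`, offset by one residue."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_kmers(probe: str, proteome: dict[str, str], k: int) -> dict[str, list[str]]:
    """Map each k-mer of `probe` to the IDs of proteome entries that contain it."""
    probe_set = set(kmers(probe, k))
    hits = defaultdict(list)
    for protein_id, seq in proteome.items():
        # Exact-match intersection between probe k-mers and this protein's k-mers.
        for pep in set(kmers(seq, k)) & probe_set:
            hits[pep].append(protein_id)
    return dict(hits)


# Sequence start implied by the hexapeptides MFVFLV, FVFLVL, VFLVLL, FLVLLP.
spike_start = "MFVFLVLLP"
print(kmers(spike_start, 6))  # ['MFVFLV', 'FVFLVL', 'VFLVLL', 'FLVLLP']

# Hypothetical two-entry "proteome", for illustration only (not real proteins).
toy_proteome = {"toyA": "AAAVFLVLLAAA", "toyB": "GGGGGGGG"}
print(shared_kmers(spike_start, toy_proteome, 6))  # {'VFLVLL': ['toyA']}
```

A sequence of length L yields L − k + 1 windows, which is consistent with the 1268 hexapeptides and 1267 heptapeptides reported for the spike glycoprotein.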

The expected value for hexapeptide sharing between two proteins was calculated by considering the number of all possible hexapeptides. Since each residue in a hexapeptide can be any of the 20 amino acids (aa), the number of all possible hexapeptides $N$ is given by $N = 20^{6} = 64 \times 10^{6}$. The number of expected occurrences is then directly proportional to the number of hexapeptides in the two proteins and inversely proportional to $N$. Assuming that the number of hexapeptides in the two proteins is $\ll N$ and neglecting the relative abundance of aa, we obtain by approximation an expected value for hexapeptide sharing of $1/N$, i.e., $20^{-6}$. By the same calculation, the expected value for heptapeptide sharing between two proteins is equal to $20^{-7}$.
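The arithmetic behind these values can be checked with a short Python sketch. The function names are hypothetical, and the product form n1 × n2 / N is an assumption made here: it is one reading of "directly proportional to the number of hexapeptides in the two proteins", under the stated simplifications (equiprobable, independent residues and peptide counts much smaller than N). With n1 = n2 = 1 it reduces to the 1/N value quoted in the text.

```python
AMINO_ACIDS = 20  # standard residues, assumed equiprobable (simplification from the text)


def n_possible(k: int) -> int:
    """Number of distinct peptides of length k over a 20-letter alphabet."""
    return AMINO_ACIDS ** k


def expected_shared(n1: int, n2: int, k: int) -> float:
    """Approximate expected number of shared k-mers between two proteins
    containing n1 and n2 k-mers; assumed form n1 * n2 / 20**k, meant only
    for n1, n2 << 20**k."""
    return n1 * n2 / n_possible(k)


print(n_possible(6))  # 64000000
print(n_possible(7))  # 1280000000
print(expected_shared(1, 1, 6))  # equals 20**-6, the hexapeptide value in the text
print(expected_shared(1, 1, 7))  # equals 20**-7, the heptapeptide value in the text
```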

Results

The graphical illustration of the peptide sharing between SARS-CoV-2 spike glycoprotein and the analyzed mammalian and coronavirus proteomes is reported in Fig. 1. The hexa- and heptapeptide sequences involved in the sharing are detailed in Tables S1 and S2, respectively.

Fig. 1. Peptide sharing between SARS-CoV-2 spike glycoprotein and mammalian and coronavirus proteomes. (a) Peptide sharing at the 6-mer level. (b) Peptide sharing at the 7-mer level.

Figure 1 shows that:

  • A massive heptapeptide sharing exists between SARS-CoV-2 spike glycoprotein and human proteins. Such a peptide commonality is unexpected and highly improbable from a mathematical point of view, given that, as detailed under the “Methods” section, the probability of the occurrence in two proteins of just one heptapeptide is equal to ~ $20^{-7}$ (or 1 out of 1,280,000,000). Likewise, the probability of the occurrence in two proteins of just one hexapeptide is close to zero, being equal to ~ $20^{-6}$ (or 1 out of 64,000,000).

  • Only the viral peptide sharing with the murine proteome and, to a lesser extent, with the rat proteome approaches that shown with human proteins;

  • Domestic animals, rabbit, and the three primates analyzed here have no or only a few peptide commonalities;

  • Likewise, the proteomes of the three human coronaviruses HKU1, 229E, and OC43, which were used as viral controls, have no or only a few peptides in common with the spike glycoprotein. In this regard, it seems that the SARS-CoV-2 spike glycoprotein is phenetically more similar to humans and mice than to its coronavirus “cousins”.

Conclusions

This study thoroughly quantifies the hexa- and heptapeptide sharing of SARS-CoV-2 spike glycoprotein, which is a major antigen of the virus, with mammalian proteomes. A massive peptide commonality is present with humans and mice, i.e., organisms that undergo pathologic consequences following SARS-CoV-2 infection. In contrast, no or only a few common peptides are present in mammals that have no major pathologic sequelae once infected by SARS-CoV-2 [10,11,12]. Hence, the data appear to provide indisputable proof in favor of molecular mimicry as a potential mechanism that can contribute to or cause the SARS-CoV-2-associated diseases [8].

As a second relevant observation, this study indicates that particular attention has to be paid to the choice of laboratory animals used in preclinical studies during the formulation and validation of anti-pathogen vaccines. In the present case, given the very low level of sequence similarity between SARS-CoV-2 spike glycoprotein and primate proteins, results obtained in studies that use primates as animal models, e.g., rhesus macaque [19], would be unreliable because cross-reactivity and the related autoimmunity cannot be verified in the absence of shared sequences. In this regard, the data illustrated in Fig. 1 explain why, as highlighted by Hogan [20], “SARS-CoV infection of cynomolgus macaques did not reproduce the severe illness seen in the majority of adult human cases of SARS” [21]. Indeed, no clinical signs of disease or marked lung pathology were seen in a study in which both rhesus and cynomolgus macaques were infected with SARS-CoV [22], and the authors concluded that the macaque model is of limited utility in the study of SARS and the evaluation of therapies. Likewise, McAuliffe et al. [23] described similar findings: “SARS-CoV administered intranasally and intratracheally to rhesus, cynomolgus and African Green monkeys replicated in the respiratory tract but did not induce illness”.

As for domestic animals and cattle, coronaviruses have long been known to be enteric pathogens of cats (FeCoV), dogs (CaCoV), cattle (BCoV), and swine (TGEV) [24]. Nonetheless, SARS-CoV-2 does not appear to be pathogenic for domestic animals and cattle. Indeed, the scarce or null susceptibility to SARS-CoV-2-induced pathologies is attested by the American Veterinary Medical Association, which states verbatim: “during the first five months of the COVID-19 outbreak (January 1 – June 8, 2020), which includes the first twelve weeks following the March 11 declaration by the WHO of a global pandemic, fewer than 20 pets have tested positive, with confirmation, for SARS-CoV-2 globally. This despite the fact that as of June 8, the number of people confirmed with COVID-19 exceeded 7 million globally and 1.9 million in the United States” (https://www.avma.org/).

In conclusion, in light of the data shown in Fig. 1 and given susceptibility parameters such as age and health status, only aged mice appear to be a correct animal model for testing an anti-SARS-CoV-2 spike glycoprotein vaccine to be used in humans [25, 26].

Finally, this study reiterates the concept that only vaccines based on minimal immune determinants that are unique to pathogens and absent from the human proteome might prove safe and efficacious [16, 27,28,29,30].