Introduction

Viral infections of the respiratory tract are one of the leading causes of morbidity and mortality in humans worldwide, especially in infants and children (Jartti et al. 2012). In addition to causing respiratory disease, respiratory viruses are able to enter the brain (neuroinvasion) and infect nerve cells (neurotropism) (Desforges et al. 2014). Coronaviruses (CoVs) are among the most important respiratory viruses and have a non-segmented RNA genome. CoVs are phylogenetically classified into four genera: α-, β-, γ-, and δ-CoVs. Some CoVs, such as severe acute respiratory syndrome coronavirus (SARS-CoV) and human CoV strains 229E (HCoV-229E), OC43 (HCoV-OC43), NL63 (HCoV-NL63), and HKU1 (HCoV-HKU1), can predispose to neurological injury (Bergmann et al. 2006; Morgello 2020). In particular, HCoV-OC43, HCoV-HKU1, and HCoV-229E infections have been shown to be associated with nervous system injury in pediatric patients (Lau et al. 2006; Principi et al. 2010; Yeh et al. 2004).

Much attention has been given recently to the novel SARS-CoV-2 and its related coronavirus disease 2019 (COVID-19). Neurological syndromes including abnormalities in smell and taste, stroke, and acute necrotizing hemorrhagic encephalopathy have been observed in SARS-CoV-2-infected patients (Beyrouti et al. 2020; Poyiadji et al. 2020; Xydakis et al. 2020). Although these clinical findings suggest that SARS-CoV-2 could have neuroinvasive and neurotropic potential, whether SARS-CoV-2 plays a direct causative role remains to be determined. Moreover, SARS-CoV-2-related neurological abnormalities mostly occur in severe cases, in which virus-induced immune system hyperactivity, the “cytokine storm,” contributes to disease severity (Wu et al. 2020). SARS-CoV-2 infection is less likely to be symptomatic or result in severe disease in infants and children compared with adult patients (Zimmermann and Curtis 2020). Furthermore, whether SARS-CoV-2 can be detected in the nervous system and directly predispose to neurological abnormalities in infants and children had not been reported as of April 2020. In addition, very limited information is available to describe the possible evolutionary and molecular relationships of SARS-CoV-2 with other neuroinvasive and neurotropic RNA viruses that have the potential to result in infection in pediatric patients.

The purpose of the present study was to compare phylogenetically the whole-genome sequences of SARS-CoV-2 and non-segmented RNA viruses, including CoVs and other viruses that have the potential to infect the nervous system of infants and children, using bioinformatics methods. The conserved domains (CDs) of SARS-CoV-2 were identified, and multiple sequence alignment (msa) was used to compare selected CDs that exist in both CoVs and members of other RNA viral families. Finally, the surface spike (S) glycoprotein and its protease cleavage sites were aligned among the CoVs to investigate their potential contribution to neurovirulence. The S protein of SARS-CoV-2 consists of two functional domains: the receptor binding domain (RBD) of the S1 protein, which binds to the host cell receptor angiotensin-converting enzyme 2 (ACE2), and the S2 protein, which mediates fusion of the viral and host cell membranes (Lan et al. 2020). After binding to ACE2, the virus requires S protein priming by the cellular proteases furin and transmembrane serine protease 2 (TMPRSS2) for entry and membrane fusion (Hoffmann et al. 2018). Furin cleaves the S protein at the S1/S2 boundary (Andersen et al. 2020), whereas TMPRSS2 cleaves the S protein at the S1/S2 boundary or within the S2 subunit (Hoffmann et al. 2020; Lan et al. 2020).

Methods

The whole-genome sequences of 32 non-segmented RNA viruses including 10 CoVs (Table 1) were retrieved from the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA) for phylogenetic analysis, which was conducted with MEGAX (Penn State University, PA, USA). Three major criteria were used for selecting virus sequences for the phylogenetic analysis. The viruses (1) have humans as host, except murine hepatitis virus strain JHM (MHV-JHM), which has the mouse as host but shows a high degree of neuroinvasion and neurotropism (Bergmann et al. 2006; Desforges et al. 2014); (2) have been reported to be associated with neurological diseases in humans, especially in fetuses, neonates, or children if possible; and (3) have been isolated from the human (especially fetal, neonatal, or pediatric) nervous system, such as brain, cerebrospinal fluid (CSF), spinal cord, and sensory organs, if possible. Furthermore, because the SARS-CoV-2-related multisystem inflammatory syndrome (MIS) was initially reported in children in Europe and North America (DeBiasi et al. 2020; Riphagen et al. 2020; Verdoni et al. 2020), an Italy variant (GenBank: MT077125) and a USA variant (GenBank: MT325563) were added to the present study alongside the original Wuhan variant (SARS-CoV-2-Wuhan, GenBank: NC_045512). All genomic sequences were aligned with the ClustalW algorithm, and phylogenetic relationships were inferred by the maximum likelihood method and the Tamura-Nei model (Kumar et al. 2018; Tamura and Nei 1993). The exception was the sequence identity analysis between the three SARS-CoV-2 variants, which was performed with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). The reliability of phylogenetic inference at each branch node was estimated by the bootstrap test with 1000 replications (Felsenstein 1985). Bootstrap values greater than 80% were considered statistically significant for grouping (Zhu et al. 2016).
The trees were visualized with Dendroscope-3 (University of Tübingen, Baden-Württemberg, Germany) (Huson and Scornavacca 2012). Searches for each genomic open reading frame (ORF) and for the CDs encoded by the ORFs were performed with the ORF Finder and Conserved Domain Database webservers (NCBI), respectively (Marchler-Bauer et al. 2015). For the S protein sequence alignment, CoVs that do not infect humans, namely porcine hemagglutinating encephalomyelitis virus (PHEV, GenBank: KY419112), feline coronavirus (FCoV, GenBank: DQ010921), and transmissible gastroenteritis virus (TGEV, GenBank: KX900411) (Bergmann et al. 2006), were also analyzed for comparison. RStudio (RStudio, Inc., Boston, MA, USA) with the msa package (Bodenhofer et al. 2015) was used for multiple protein sequence alignment. The results were visualized with RStudio and LaTeX with the TEXshade package (Beitz 2000).
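The bootstrap test described above rests on resampling alignment columns with replacement and counting how often a grouping reappears. The following is a minimal Python sketch of that idea only; it is not the MEGAX implementation, and the function names (`bootstrap_alignment`, `support_percent`, `is_supported`) are hypothetical helpers introduced for illustration. Tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    Columns are drawn with replacement, and the same column indices
    are applied to every sequence so that rows stay aligned.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def support_percent(replicates_with_group, n_replicates):
    """Percentage of replicate trees in which a grouping was recovered."""
    return 100.0 * replicates_with_group / n_replicates


def is_supported(percent, threshold=80.0):
    """Apply the 'greater than 80%' cutoff stated in the Methods."""
    return percent > threshold


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    toy = ["ACGTAC", "ACGAAC", "TCGTAC"]  # toy sequences, not study data
    print(bootstrap_alignment(toy, rng))
    print(is_supported(support_percent(1000, 1000)))  # grouping found in every replicate
```

In a full analysis, a tree would be inferred from each of the 1000 replicates and the counts passed to `support_percent`.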

Table 1 Selected non-segmented RNA viruses (Homo sapiens as host) analyzed in phylogenetic analysis of this study

Results

SARS-CoV-2 is genetically distant from other non-segmented RNA viruses

Figure 1 b shows a circular phylogram of the evolutionary relationship between SARS-CoV-2 (blue-labelled) and 29 non-segmented RNA viruses known to infect the nervous system (Messacar et al. 2018). Nineteen of these RNA viruses (red-labelled) are also known to be neuroinvasive and neurotropic in infants and children (Messacar et al. 2018). According to the Clustal Omega analysis, the three SARS-CoV-2 variants share more than 99.97% genomic sequence identity. Because the branch lengths of the phylogram are proportional to the amount of inferred evolutionary change, the phylogram indicates that SARS-CoV is the closest relative of SARS-CoV-2 among the viruses analyzed. Furthermore, SARS-CoV-2 is grouped closely with SARS-CoV, with a bootstrap value of 100%. However, the bootstrap value for separating SARS-CoV-2-USA from SARS-CoV-2-Wuhan was 63.70%, suggesting that support for subdividing these two SARS-CoV-2 variants is low. The CoV sequences formed a genetic group distinct from the other virus sequences and are evolutionarily distant from members of the other viral families. The bootstrap value for dividing the CoV family from its closest viral family, the picornaviruses, was 43.60%.

Fig. 1

Evolutionary history of SARS-CoV-2 compared with selected CoVs and neuroinvasive and neurotropic non-segmented RNA viruses. Evolutionary analyses were conducted in MEGAX. The nucleotide sequences were aligned with ClustalW and the results were visualized with Dendroscope-3. The percentage of replicate trees in which the associated clusters were found by the bootstrap test (1000 replicates) is shown next to the branches. The accession numbers of the viruses analyzed in the present study are shown. The highest log likelihood of the tree is − 541499.02. The branch of SARS-CoV-2 is labelled in blue, whereas the branches of other viruses infecting the nervous system of infants and children are labelled in red. This analysis involved 32 nucleotide sequences and contained a total of 32822 positions in the final dataset. The scale bar corresponds to a genetic distance of 0.2 (20% genetic variation) between sequences

SARS-CoV-2 conserved domains are found in other neuroinvasive and neurotropic RNA viruses

Figure 2 a is a schematic diagram showing some important SARS-CoV-2 ORFs and their encoded CDs. Similar to other CoVs, SARS-CoV-2 contains at least six ORFs in its genome, which encode nonstructural (nsp), structural, and accessory proteins. The four main structural proteins, spike (S), membrane (or matrix), envelope (E), and nucleocapsid, are encoded by ORFs 26, 75, 5, and 31, respectively, near the 3′-terminus. SARS-CoV-2 has two functional CDs, macrodomain (macro_X_nsp3-like) and viroporin (E protein) (Madan et al. 2005), which can be found in the other neuroinvasive and neurotropic RNA viruses, and also contains common RNA viral domains such as RNA-directed RNA polymerase (RdRp). Figure 2 b (I, II) and c (I, II) show the msa results for macro_X_nsp3-like and viroporin, respectively. In both macro_X_nsp3-like and viroporin, all three SARS-CoV-2 variants have 100% identity and similarity in protein sequence. The macro_X_nsp3-like is encoded by ORF9 and can be found in CoVs and togaviruses. Compared with the neuroinvasive and neurotropic viruses, SARS-CoV-2 shares 33.3% identity and 47.1% similarity with HCoV-OC43, 36.3% identity and 47.9% similarity with HCoV-229E, 31.3% identity and 41.5% similarity with rubella, 30.3% identity and 44.6% similarity with eastern equine encephalitis virus, 30.9% identity and 42.7% similarity with western equine encephalitis virus, 37.1% identity and 50.4% similarity with Venezuelan equine encephalitis virus, and 26.5% identity and 37.1% similarity with Chikungunya virus (Fig. 2b-II). Higher-level grouping of macro_X_nsp3-like is strongly supported within CoVs (bootstrap values, 99 to 100%), but not between CoVs and other neuroinvasive and neurotropic viruses (bootstrap value, 54.3%, Fig. 2b-III).

Fig. 2

a Schematic representation of the SARS-CoV-2 complete genome (accession number: NC_04551) and selected ORFs and encoded conserved domains. Macrodomain (macro_X_nsp3-like), S protein, and viroporin (E protein), investigated in the present study, are highlighted in blue. The arrows indicate the protease priming sites on the S protein. b The macrodomain (macro_X_nsp3-like) sequence of SARS-CoV-2 was compared with SARS-CoV, MERS-CoV, HCoV-OC43, HCoV-229E, HCoV-HKU1, HCoV-NL63, MHV-JHM, rubella, eastern equine encephalitis virus, western equine encephalitis virus, Venezuelan equine encephalitis virus, and Chikungunya virus. c The viroporin sequence of SARS-CoV-2 was compared with SARS-CoV, MERS-CoV, HCoV-OC43, HCoV-229E, HCoV-HKU1, HCoV-NL63, MHV-JHM, coxsackievirus, echovirus, poliovirus, and HIV-1. RStudio with the msa package, including the ClustalW command, was used for multiple sequence alignment. The results were visualized with RStudio and LaTeX with the TEXshade package. Identical residues at a position were shaded in blue or purple if the proportion of matching residues was higher than 50% or 80%, respectively. Residues that are not identical but similar to the consensus sequence were shaded in red. Furthermore, the degree of protein sequence conservation and amino acid properties such as charge and hydrophobicity are shown as color scales and bar graphs along the alignment. At the top of the plot, residue conservation is shown as bars and the charge of the amino acid side chain is shown as color scales (red: acidic; blue: basic). Hydrophobicity is shown at the bottom of the plot (upper red box: hydrophobic; lower box: hydrophilic). The degree of similarity and identity between all sequences in the alignment is shown in tables (b-II, c-II). The bootstrap consensus trees inferred from 1000 replicates represent the evolutionary history of the macrodomain (macro_X_nsp3-like) (b-III) and viroporin (c-III) among the viruses analyzed.
The percentage of replicate trees in which the associated viruses clustered together in the bootstrap test is shown next to the branches. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed

The viroporin is encoded by ORF 5 and can be found in CoVs, coxsackievirus, echovirus, poliovirus, and HIV-1. Compared with the neuroinvasive and neurotropic viruses, SARS-CoV-2 shares sequence identity and sequence similarity with HCoV-OC43 (20.2% identity, 40.5% similarity), HCoV-229E (23.6% identity, 40.2% similarity), coxsackievirus (13.5% identity, 25.6% similarity), echovirus (13.5% identity, 24.3% similarity), poliovirus (13.5% identity, 31.0% similarity), and HIV-1 (9.4% identity, 27.0% similarity) (Fig. 2c-II). Support for higher-level grouping of viroporin is high within CoVs (bootstrap values, 74 to 100%) but low between CoVs and other neuroinvasive and neurotropic viruses (bootstrap value, 61.7%, Fig. 2c-III).
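Percent identity and similarity values such as those reported above can be derived from any pair of aligned sequences. The Python sketch below illustrates one way to do so under stated assumptions: the residue groups in `SIMILAR_GROUPS` are a hypothetical physicochemical grouping (not the grouping used by TEXshade), and the denominator is every column in which at least one sequence has a residue. Other tools make different choices, so the output will not necessarily reproduce the figures in this study.

```python
# Hypothetical similarity groups, chosen only for illustration.
SIMILAR_GROUPS = [
    set("AVLIMC"),  # aliphatic / sulfur-containing
    set("FWY"),     # aromatic
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("STNQ"),    # polar, uncharged
]


def _similar(a, b):
    """True if two different residues fall in the same similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(seq_a, seq_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Columns where both sequences have a gap are skipped. Columns with a
    gap in only one sequence count toward the denominator (an assumption).
    Identical residues also count as similar.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == gap or b == gap:
            continue
        if a == b:
            identical += 1
            similar += 1
        elif _similar(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Toy aligned peptides, not sequences from this study.
    print(identity_similarity("MKV-L", "MRVAL"))
```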

Diversity of spike protein may determine the severity of SARS-CoV-2 infection

As the S protein is a major neurovirulence factor of several CoVs, particular attention was given to aligning the S protein among different CoVs (Miura et al. 2008; Phillips et al. 2002). Fingerprint analysis was performed for the RBD of S1 (Fig. 3a) and for the S2 protein (Fig. 3b) in order to gain an overview of the sequence identity and similarity among the different CoVs. The complete sequence is depicted in one single line, and the amino acid residues are presented as colored vertical lines. Higher similarities correspond to darker vertical lines. As shown in Fig. 3 a and b, the protein sequences of S1 RBD and S2 have 100% identity and similarity between all three SARS-CoV-2 variants. However, the S1 RBD is less conserved than S2. In the S1 RBD, SARS-CoV-2 shares 73.4% identity and 82.8% similarity with SARS-CoV, 20.8% identity and 32.5% similarity with HCoV-OC43, 15.6% identity and 26% similarity with HCoV-229E, 20.5% identity and 33.7% similarity with MHV-JHM, and 20% identity and 29.6% similarity with HCoV-HKU1. All of these viruses possess neuroinvasive and neurotropic properties. However, the S1 RBD has only 12.6% identity and 22.9% similarity to that of TGEV, a virus that has not been reported to infect the human nervous system thus far. In the S2 protein, SARS-CoV-2 shares 89.4% identity and 95.5% similarity with SARS-CoV, 42.3% identity and 59.7% similarity with HCoV-OC43, 33.5% identity and 50.6% similarity with HCoV-229E, 41.3% identity and 59.7% similarity with MHV-JHM, and 39.1% identity and 57.2% similarity with HCoV-HKU1, and relatively less identity and similarity with TGEV (32.2% identity and 47.9% similarity). Figure 3 c shows the protease cleavage sites at the S1/S2 boundary and within the S2 protein. All three SARS-CoV-2 variants have the same cleavage motifs. The consensus motifs of the cleavage sites at the S1/S2 boundary (sites 1 and 2) are much less conserved than the motifs within the S2 protein.
The cleavage site 1 at the S1/S2 boundary of SARS-CoV-2, SARS-CoV, MHV-JHM, HCoV-HKU1, HCoV-OC43, PHEV, and MERS-CoV is prone to mutation, and site 1 in HCoV-229E, HCoV-NL63, TGEV, and FCoV does not contain a consensus motif. A polybasic cleavage site (RRAR) for furin was found at the S1/S2 boundary of SARS-CoV-2 (Fig. 3c). This polybasic cleavage site was observed in MHV-JHM as well, but not in the other CoVs and viruses studied. Furthermore, the cleavage site 2 at the S1/S2 boundary of SARS-CoV-2 shows a motif similar to that of SARS-CoV, MHV-JHM, HCoV-HKU1, HCoV-OC43, and PHEV, but not to that of HCoV-229E, HCoV-NL63, TGEV, FCoV, and MERS-CoV. These results suggest that the receptor binding and protease priming mechanisms of SARS-CoV-2 are probably more similar to those of neuroinvasive and neurotropic CoVs than to those of the other CoVs.
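A polybasic motif such as RRAR can be located in an unaligned protein sequence with a simple pattern search. The Python sketch below scans for the commonly cited minimal furin recognition pattern R-X-X-R, where X is any residue; the function name `find_furin_like_sites` and the example peptides are illustrative assumptions, and a pattern match by itself does not demonstrate that cleavage occurs.

```python
import re

# Minimal furin-like pattern R-X-X-R. The lookahead lets overlapping
# candidates be reported instead of only the first non-overlapping hit.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(protein):
    """Return (0-based start index, 4-residue motif) for each R-X-X-R hit.

    Expects an unaligned sequence: gap characters would be treated as X.
    """
    return [(m.start(1), m.group(1))
            for m in FURIN_MINIMAL.finditer(protein.upper())]


if __name__ == "__main__":
    # Short illustrative peptides: the first contains the RRAR motif
    # described in the text; the second is a hypothetical control without it.
    print(find_furin_like_sites("TNSPRRARSVAS"))
    print(find_furin_like_sites("TNSPSVAS"))
```

A stricter pattern, for example requiring a basic residue at the third position, could be substituted by changing the regular expression.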

Fig. 3

Multiple sequence alignments of S1 RBD, S2 protein, and S protein protease cleavage sites among CoVs. An overview of the sequence similarities of S1 RBD (a) and S2 protein (b) is provided by fingerprint plots, which depict the complete sequence in one single line. The residues are presented as colored vertical lines (red: similar; blue: ≥ 50% conserved; purple: ≥ 80% conserved). The S1 RBD was much less conserved than the S2 protein. The star symbol (*) in a indicates that the whole S1 protein sequences of four CoVs (HCoV-229E, HCoV-NL63, TGEV, FCoV) were used for msa analysis because their S1 RBDs could not be found in the NCBI Conserved Domain Database. The degree of similarity and identity between all sequences in the alignment is shown in tables. c The protease cleavage sites of the S protein were compared among CoVs with the same msa methodology as described in Fig. 2 b and c. There are two cleavage sites at the S1/S2 boundary. However, both are less conserved than the cleavage site within the S2 protein. The polybasic cleavage sites (RRAR) for furin observed in SARS-CoV-2 and MHV-JHM at the S1/S2 boundary are underlined in red
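The conservation shading used in the fingerprint plots can be approximated by classifying each alignment column by the fraction of sequences that share its most common residue. The Python sketch below applies the ≥ 50% and ≥ 80% thresholds from the caption; it is not the TEXshade algorithm, it omits the "similar" (red) class, and the function name `classify_columns` is a hypothetical helper.

```python
from collections import Counter


def classify_columns(alignment, gap="-"):
    """Label each alignment column by how conserved its top residue is.

    Returns one label per column: "purple" (>= 80% of sequences share
    the most common residue), "blue" (>= 50%), or "none". Gaps are never
    counted as the conserved residue, but gapped sequences still count
    in the denominator (an assumption of this sketch).
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for i in range(n_cols):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            labels.append("none")
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        fraction = top_count / n_seqs
        if fraction >= 0.8:
            labels.append("purple")
        elif fraction >= 0.5:
            labels.append("blue")
        else:
            labels.append("none")
    return labels


if __name__ == "__main__":
    # Toy alignment, not data from this study.
    toy = ["MKAL", "MKGL", "MRSL", "MRTL", "MRT-"]
    print(classify_columns(toy))
```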

Discussion

The primary objectives of the current study were to determine the possible evolutionary and molecular relationships between SARS-CoV-2 and non-segmented RNA viruses, especially the viruses that can infect the nervous system in infants and children. Furthermore, the consensus sequence motifs of the S protein and its protease cleavage sites were examined to explore their potential roles in neurovirulence.

Three of the CoV members (HCoV-HKU1, HCoV-OC43, and HCoV-229E) have been reported to result in injury of the nervous system in pediatric patients. Therefore, it remains possible that SARS-CoV-2 is also neuroinvasive, neurotropic, and even neurovirulent in infants and children, because neurological impairment has been reported in SARS-CoV-2-infected patients (Mao et al. 2020; Xydakis et al. 2020). Although SARS-CoV-2 is genetically distant from the other members of the RNA viral families, its macrodomain (macro_X_nsp3-like) and viroporin (E protein) can also be found in RNA viruses that infect the nervous system of infants and children. In the present investigation, the macrodomain (macro_X_nsp3-like) was found in CoVs and togaviruses. The macrodomain plays an important role in viral replication and pathogenesis. It has been reported that MHV was unable to cause hepatitis or had reduced neurovirulence after the catalytic site of the macrodomain had been mutated (Fehr et al. 2015; Park and Griffin 2009). The macrodomain activity in togaviruses has been shown to affect neurovirulence in mice (Abraham et al. 2020). Furthermore, the macrodomain promotes virulence and suppresses interferon (IFN) expression in mice during the early stages of SARS-CoV infection (Fehr et al. 2016). Since SARS-CoV-2 shares high macrodomain sequence identity (80.3%) and similarity (89.3%) with SARS-CoV (Fig. 2b-II), and IFN acts as a modulator of blood-brain barrier (BBB) integrity for viral neuroinvasion (Miner and Diamond 2016), suppression of IFN by macrodomain activity may result in BBB leakage after viral infection and further facilitate neuroinvasion.

Viroporins are a family of small transmembrane proteins that includes the E protein of CoVs (Madan et al. 2005). In the present study, viroporin was detected in CoVs, picornaviruses such as coxsackie-, echo-, and polioviruses, and retroviruses such as HIV-1, which is consistent with previous reports (Gonzalez 2015; Madan et al. 2005; Nieva et al. 2012). Viroporin may play a role in neurotropism, because some CoVs such as the neurovirulent MHV with E gene deletion were severely disabled from infecting new host cells, with significantly reduced viral titers (DeDiego et al. 2007). Moreover, viroporin in HCoV-OC43 has been shown to be a determinant of neurovirulence and central nervous system (CNS) pathology (Madan et al. 2005; Stodola et al. 2018). During SARS-CoV infections, viroporin can activate caspase-1 by activating the NLRP3 inflammasome (Chen et al. 2019; Farag et al. 2020). Caspase-1 cleaves pro-interleukin (IL)-1β to mature IL-1β. As a major proinflammatory cytokine, IL-1β facilitates neuroinvasion by disrupting BBB integrity (Miner and Diamond 2016). High viroporin sequence identity (95.9%) and similarity (97.2%) were observed between SARS-CoV-2 and SARS-CoV (Fig. 2c-II). Thus, it might be speculated that SARS-CoV-2 is neuroinvasive by means of viroporin-induced inflammation causing BBB leakage. Although the exact functions of macrodomain and viroporin in SARS-CoV-2 have not been elucidated, the sequence identity and similarity suggest that SARS-CoV-2 may share similar mechanisms with neuroinvasive and neurotropic viruses to infect the nervous system in pediatric patients, which include suppression of IFN by macrodomain activity and/or increased membrane permeability and inflammation induced by viroporin. However, whether such mechanisms in neuroinvasive and neurotropic viruses differ from those in non-neuroinvasive and non-neurotropic viruses has not been studied, and further investigation is necessary.

The present analysis also showed that the S1 RBD and the protease cleavage sites at the S1/S2 boundary are much less well conserved among CoVs than the S2 protein and the protease cleavage site within S2, respectively. These findings are consistent with previous reports (Perlman and Wheeler 2016) and suggest that each CoV may need a specific binding receptor and protease to bind to its target cells. The high diversity of the S1 RBD and of the cleavage sites at the S1/S2 boundary may also determine the severity of the injury to the host resulting from the viral infection. However, cell entry of both SARS-CoV-2 and SARS-CoV requires ACE2 binding and TMPRSS2 priming (Hoffmann et al. 2020; Li et al. 2003). This is probably because they share high sequence identity and similarity in the S1 RBD and protease cleavage sites. ACE2 receptor expression can be found in human brain and brain-derived microvascular endothelial cells (Li et al. 2007), whereas TMPRSS2 gene expression appears to be low or absent in the human brain (Glowacka et al. 2011; Vaarala et al. 2001). Furthermore, TMPRSS2 expression in human immature BBB endothelial cells has not been reported. This could explain why SARS-CoV-2 infection is usually mild in the nervous system of adults and children. Besides ACE2 binding and TMPRSS2 priming, SARS-CoV-2 cell entry requires furin, which is highly expressed in the CNS. The polybasic furin cleavage site (RRAR) was observed in all three SARS-CoV-2 variants and in the highly neurovirulent MHV-JHM (Fig. 3c). This is important for understanding the role of SARS-CoV-2 in neurological diseases, because other proteases associated with S protein priming, such as TMPRSS2, may not be present in the human brain. This suggests that SARS-CoV-2 has great potential to be neurotropic and even neurovirulent in the nervous system.

There are several limitations to the study. First, not all viral sequences selected from the NCBI database were obtained from nervous tissues of pediatric patients with neurological diseases. This means that not all of the results described may have direct relevance to pediatric nervous system diseases. This could be addressed in future studies once viral sequences have been obtained from nervous tissues of pediatric patients with neurological diseases. Secondly, although homology of the macrodomain and viroporin domains was predicted between SARS-CoV-2 and the other RNA viruses studied, this homology may not be functionally relevant, especially in neurological diseases in pediatric patients. There is no direct evidence showing that both domains of SARS-CoV-2 play important roles in neuroinvasion and neurotropism in pediatric patients. However, this homology prediction can still be useful in proposing and testing hypotheses in molecular biology, such as hypotheses about drug design, ligand binding site location, and substrate specificity (Vyas et al. 2012; Xiang 2006). Thirdly, it will be important to understand the function and similarity of the predicted domains in the three-dimensional (3D) functional protein structure between SARS-CoV-2 and other RNA viruses, which was not examined in the current study. However, the 3D crystal structure of the SARS-CoV-2 macrodomain has been shown to have high similarity to that of SARS-CoV (Frick et al. 2020). The 3D modeling of the SARS-CoV-2 S protein using bioinformatic analyses has recently been reported, showing that SARS-CoV-2 and SARS-CoV have similar receptor utilization, but with low amino acid similarity in the RBD (Jaimes et al. 2020), and that the S proteins of SARS-CoV-2 and SARS-CoV are structurally and evolutionarily related (Baig et al. 2020). The 3D structure of viroporin in SARS-CoV-2 has not been reported so far. However, Surya et al. (2018) reported a nuclear magnetic resonance (NMR) structure of the E protein of SARS-CoV, which will be helpful in studying the function and structure of viroporin in SARS-CoV-2 in the future.

Conclusion

In summary, although there is no evidence of SARS-CoV-2 directly causing any known human neuropathology, SARS-CoV-2 shares some close molecular and structural similarities with neuroinvasive and neurotropic non-segmented RNA viruses. This leads to speculation about the possible involvement of SARS-CoV-2 in causing neurological abnormalities in pediatric patients.