Background

In December 2019, a novel coronavirus (SARS-CoV-2) was first identified as a zoonotic pathogen of humans in Wuhan, China, causing a respiratory infection with associated bilateral interstitial pneumonia. The disease caused by SARS-CoV-2 was named COVID-19 by the World Health Organization (WHO) and has been classified as a global pandemic, since it has spread rapidly to all continents. As of May 20, 2020, there have been 4,889,287 confirmed COVID-19 cases worldwide, with 322,457 deaths reported to the WHO [1]. Whilst clinical and epidemiological data on COVID-19 have become readily available, information on the pathogenesis of SARS-CoV-2 infection has not been forthcoming [2]. Transcriptomic and proteomic data on the host response against SARS-CoV-2 are scarce, and no effective therapeutics or vaccines for COVID-19 are available yet.

Coronaviruses (CoVs) typically affect the respiratory tract of mammals, including humans, and lead to mild to severe respiratory tract infections [3]. Many emerging human coronavirus (HCoV) infections have spilled over from animal reservoirs, such as HCoV-OC43 and HCoV-229E, which cause mild diseases such as common colds [4, 5]. During the past two decades, two highly pathogenic HCoVs, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), have led to global epidemics with high morbidity and mortality [6]. In this period, a large amount of experimental data associated with the two infections has led to a better understanding of the molecular mechanism(s) of coronavirus infection and has enhanced pathways for developing new drugs, diagnostics and vaccines; however, the host factors stimulating (proviral factors) or restricting (antiviral factors) infection remain poorly understood [7]. The structures of many SARS-CoV and MERS-CoV proteins, and their biological interactions with other viral and host proteins, have been widely explored, together with experimental testing of small-molecule inhibitors with antiviral effects [8, 9]. ACE2, expressed in type 2 alveolar cells in the lung, has been identified as the receptor of SARS-CoV and SARS-CoV-2, while dipeptidyl peptidase 4 (DPP4) was identified as the specific receptor for MERS-CoV [10, 11].

The investigation of the structural genomics and interactomics of SARS-CoV-2 can be implemented through systematic mapping of protein–protein interactions (PPI) between SARS-CoV-2 and the human host, and through an integrated bioinformatics approach [12, 13]. Structural analysis of specific SARS-CoV-2 proteins, in particular Spike glycoproteins (S-glycoproteins), and of their interactions with human proteins, can guide the identification of putative functional sites and help to better define the pathologic phenotype of the infection. This functional interaction analysis between the host and other HCoVs, combined with an evolutionary sequence analysis of SARS-CoV-2, can be used to guide new treatment and prevention interventions.

Here, we investigated biologically and clinically relevant molecular targets of three human coronavirus (HCoV) infections using a network-based approach. A functional analysis of the HCoV–host interactome was carried out in order to provide a theoretical host–pathogen interaction model for HCoV infections, and to predict viable models for SARS-CoV-2 pathogenesis. Three HCoVs causing respiratory diseases were used as model targets, namely SARS-CoV, which shares a strong genetic similarity with SARS-CoV-2, MERS-CoV and HCoV-229E.

Methods

Comparative reconstruction of S-glycoprotein in HCoVs

The reconstruction of the virus–host interactome was carried out using the RWR algorithm to explore the human PPI network and the multilayer PPI platform enriched with gene expression data sets. A total of 259 CoV sequences, infecting different animal hosts (Additional file 1: Table S1), were downloaded from the GISAID and NCBI databases in order to evaluate the variability in the S gene. SARS-CoV, HCoV-229E, MERS-CoV and other CoV full-genome sequence groups were aligned with MAFFT [14]; synonymous and non-synonymous mutations and amino acid similarity were calculated using the SSE program with a sliding window of 250 nucleotides and a step of 25 nucleotides [15]. A homology model was built for the amino acid sequence of the S-glycoprotein, derived from the full genome sequence “SARS-CoV-2/INMI1/human/2020/ITA” (MT066156.1). The Swiss pdb server was used to construct three-dimensional models of the SARS-CoV-2 S-glycoprotein [16]. Among proteins with a known 3D structure, the best match with the “SARS-CoV-2/INMI1/human/2020/ITA” sequence was 6VSB.1, which was evaluated considering the identity between the two amino acid sequences and the QMEAN value provided by the Swiss pdb server. The model of a single chain was superimposed on the three-dimensional structure of the S-glycoprotein single chain of SARS-CoV (5WRG), HCoV-229E (6U7H.1) and MERS-CoV (5X59), using Chimera 1.14 [17]. In order to better evaluate the conservation of the sequence at each site, all sequences were aligned with MAFFT and the topology of all structures was compared. A detailed description of the reconstruction of the S-glycoprotein structure is reported in Additional file 2.
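The sliding-window scan described above can be illustrated with a minimal Python sketch. It is not the SSE program: it computes a plain, uncorrected proportion of differing sites (p-distance) between two aligned sequences, and the function name and toy sequences are hypothetical. Only the window of 250 nucleotides and the step of 25 nucleotides are taken from the text.

```python
def sliding_p_distance(seq_a, seq_b, window=250, step=25):
    """Return (window_start, p_distance) pairs for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        diffs = 0
        compared = 0
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        for a, b in pairs:
            if a == "-" or b == "-":
                continue  # skip alignment gaps
            compared += 1
            if a != b:
                diffs += 1
        distance = diffs / compared if compared else float("nan")
        results.append((start, distance))
    return results


# Toy aligned sequences (hypothetical data, 400 nt each):
# the last 100 positions of `alt` are replaced by a run of T.
ref = "ACGT" * 100
alt = ref[:300] + "T" * 100

profile = sliding_p_distance(ref, alt)
for start, distance in profile:
    print(start, round(distance, 3))
```

Windows that overlap the altered 3' end show a higher distance than windows over the identical 5' region, which is the kind of regional contrast the genome-wide diversity plot is meant to reveal.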

PPI and gene co-expression network

Network analysis, based on protein–protein interactions and gene expression data, was performed in order to view all possible virus–host protein interactions during HCoV infections. Since the SARS-CoV-2 genome exhibits substantial similarity to the SARS-CoV genome [18], and consequently so does the proteome [19], we hypothesized that several molecular interactions observed in the SARS-CoV interactome will be preserved in the SARS-CoV-2 interactome. Virus–host interactomes (SARS-CoV, MERS-CoV, HCoV-229E) were inferred from published PPI data, using two publicly accessible databases (STRING Viruses and VirHostNet), as well as published scientific reports with a focus on virus–host interactions [20,21,22]. As a next step, the virus–host PPI list extracted in this first step was merged with additional PPI databases, i.e. BioGrid, InnateDB-All, IMEx, IntAct, MatrixDB, MBInfo, MINT, Reactome, Reactome-FIs, UniProt, VirHostNet, BioData and the CCSB Interactome Database, using the R packages PSICQUIC and biomaRt [23, 24]. In total, a large PPI database was assembled, including 13,020 nodes and 71,496 interactions.

The gene expression data set was built from the Protein Atlas database, using tissue and cell line data [25]. To identify the most likely interactions, and to obtain functional information, random walk with restart (RWR), a state-of-the-art guilt-by-association approach implemented in the R package RandomWalkRestartMH [26], was used. It allows a proximity network to be established around a given protein (seed) in order to study its functions, based on the premise that nodes related to similar functions tend to lie close to each other in the network. For each node, a score was computed as a measure of proximity to the seed protein. The S-glycoproteins of SARS-CoV, MERS-CoV and HCoV-229E were used as seeds in the application of the RWR algorithm.
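As an illustration of the proximity scoring described above, the following minimal Python sketch runs a random walk with restart on a single undirected network. It is a generic implementation of the technique, not the RandomWalkRestartMH code; the restart probability of 0.7, the function name and the toy graph are assumptions, since the text does not state the parameters used.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Stationary visiting probability of each node for a walk that
    returns to the seed set with probability `restart` at every step.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    """
    nodes = list(adjacency)
    seeds = list(seeds)
    if not seeds or any(s not in adjacency for s in seeds):
        raise ValueError("seeds must be non-empty and present in the graph")
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - restart) * p[n]
            neighbours = adjacency[n]
            if not neighbours:
                nxt[n] += mass  # isolated node: the walker stays in place
                continue
            for m in neighbours:
                nxt[m] += mass / len(neighbours)
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy network (hypothetical): the seed stands for a viral protein.
toy_graph = {
    "seed": {"a", "b"},
    "a": {"seed", "c"},
    "b": {"seed"},
    "c": {"a", "d"},
    "d": {"c"},
}
scores = random_walk_with_restart(toy_graph, ["seed"])
# Rank host nodes by proximity to the seed, closest first.
ranked = sorted((n for n in scores if n != "seed"),
                key=lambda n: scores[n], reverse=True)
print(ranked)
```

Selecting the first k entries of such a ranking corresponds to keeping the k proteins closest to the seed.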

Functional enrichment analysis

To evaluate the functional pathways of proteins involved in the host response, gene enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) human pathways and Gene Ontology databases. Network representation of the gene enrichment analysis was performed using ShinyGO v0.61 [27]. Statistical significance was assessed by calculating the false discovery rate (FDR).
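A common way to obtain such FDR values is a hypergeometric over-representation test followed by Benjamini–Hochberg correction. The sketch below shows that calculation in plain Python; it is an assumption about the general procedure and not the ShinyGO implementation, and the gene identifiers, pathway names and background size are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, K of which are in the pathway."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(gene_list, pathways, background_size):
    """Return (pathway, p-value, FDR) tuples sorted by FDR."""
    genes = set(gene_list)
    names = list(pathways)
    pvals = [
        hypergeom_pvalue(len(genes & pathways[name]), len(genes),
                         len(pathways[name]), background_size)
        for name in names
    ]
    fdr = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, fdr), key=lambda row: row[2])


# Hypothetical input: four selected genes and two toy pathways.
selected = ["g1", "g2", "g3", "g4"]
toy_pathways = {
    "pathway_A": {"g1", "g2", "g3", "g9"},
    "pathway_B": {"g4", "g10", "g11", "g12", "g13"},
}
table = enrich(selected, toy_pathways, background_size=100)
for name, pval, fdr in table:
    print(name, pval, fdr)
```

Pathways whose adjusted value falls below the chosen FDR threshold would be reported as enriched.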

Results

Structure of S-glycoprotein CoVs

To evaluate the diversity along the full genome, pairwise distance was calculated on the 259 CoV genomes. Diversity was distributed along the entire CoV genome, with the most conserved region located in Orf1ab, as expected, while the spike gene region exhibited a rather high diversity (Additional file 2: Figure S1), consistent with the key role of the S-glycoprotein during viral entry in specific hosts [28].

Consequently, the analysis was focused on the S-glycoprotein, as a key virus component involved in host interaction [29]. A 3D model of the S-glycoprotein of the SARS-CoV-2 sequence (MT066156.1), obtained at the Laboratory of Virology, National Institute for Infectious Diseases “L. Spallanzani” IRCCS, was built using the Swiss pdb server (Additional file 2: Figure S2a, b). The SARS-CoV-2 S-glycoprotein structure was then compared to those of other HCoVs, as shown in Additional file 2: Figure S2. The S-glycoprotein structures of the various HCoVs were very similar overall. In particular, a strong similarity was shown in the RBD (nCov: residues 319–591) [30], and this was most evident for the comparison between SARS-CoV-2 and SARS-CoV, which share the same cell receptor (ACE2). The amino acid differences among the S-glycoproteins of the selected HCoVs (SARS-CoV-2, MERS-CoV, SARS-CoV, HCoV-229E) are shown in Additional file 2: Figure S3, where a lower topological similarity was observed with the HCoV-229E S-glycoprotein, which binds a different host receptor.

Overall, the pattern arising from such comparison was consistent with specific host receptors, as well as with different host reservoirs and ancestry [31].

Human CoV and host interactome

An interactome map was built to highlight biological connections between the S-glycoprotein and the human proteome. Using the analysis pipeline described in the Methods, a large PPI database was assembled, including 13,020 nodes and 71,496 interactions between the human host and the three selected viruses (SARS-CoV, MERS-CoV and HCoV-229E).

The interactome reconstruction was obtained with the RWR analysis, identifying the 200 proteins closest to each seed, i.e. the S-glycoproteins of HCoV-229E, SARS-CoV and MERS-CoV (Additional file 2: Figures S4–S6). The lists of genes selected by the RWR algorithm for HCoV-229E, SARS-CoV and MERS-CoV, along with their proximity scores, are reported in Additional file 1: Tables S2–S4. In order to further dissect the S-glycoprotein–host interactions, enrichment analysis was carried out with the Reactome and KEGG databases. Reactome pathway enrichment analysis revealed biological pathways of DNA repair, transcription and gene regulation for the HCoV-229E S-glycoprotein, with high significance (FDR < 0.01%). KEGG pathway enrichment analysis revealed ubiquitin-mediated proteolysis as the most significant pathway (FDR < 0.01%), as well as cellular proliferation pathways associated with other viral infections (Hepatitis B, measles, Epstein–Barr virus infection and Human T-cell leukemia virus 1 infection) and with carcinogenesis (Fig. 1). Next, the RWR algorithm was applied to a multilayer network built on the PPI interactome and on the Gene Coexpression (COEX) network, again with the S-glycoprotein of HCoV-229E as seed. The results highlighted a set of genes that are connected in both the PPI and COEX analyses, including ANPEP, RAD18, APEX, POLH, APEX1, TERF2, RAD51, CDC7, USP7, XRCC5, FEN1 and PCNA, all associated with the GO biological process category of DNA repair (FDR < 0.0001%) (Fig. 2). The same analyses were conducted for SARS-CoV and MERS-CoV.
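The multilayer step can be pictured as a random walk with restart in which the walker may also jump between the PPI layer and the COEX layer at the same node. The Python sketch below is a simplified illustration under stated assumptions: the inter-layer jump probability (delta), the restart probability, the equal restart weight given to each layer and the geometric-mean aggregation of per-layer scores are all illustrative choices that are not given in the text, and the toy layers are hypothetical.

```python
from math import prod


def multiplex_rwr(layers, seed, restart=0.7, delta=0.5,
                  tol=1e-10, max_iter=1000):
    """Random walk with restart over layers that share one node set.

    layers: list of dicts, each mapping node -> set of neighbours in that layer.
    Returns node -> geometric mean of its probabilities across layers.
    """
    n_layers = len(layers)
    if n_layers < 2:
        raise ValueError("at least two layers are required")
    nodes = list(layers[0])
    states = [(l, v) for l in range(n_layers) for v in nodes]
    # Restart on the seed node, with equal weight in every layer (assumption).
    p0 = {s: (1.0 / n_layers if s[1] == seed else 0.0) for s in states}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {s: restart * p0[s] for s in states}
        for (l, v) in states:
            mass = (1.0 - restart) * p[(l, v)]
            neighbours = layers[l][v]
            others = [k for k in range(n_layers) if k != l]
            # A node isolated in this layer sends all its mass to other layers.
            jump = delta if neighbours else 1.0
            for k in others:
                nxt[(k, v)] += mass * jump / len(others)
            for m in neighbours:
                nxt[(l, m)] += mass * (1.0 - jump) / len(neighbours)
        change = sum(abs(nxt[s] - p[s]) for s in states)
        p = nxt
        if change < tol:
            break
    return {v: prod(p[(l, v)] for l in range(n_layers)) ** (1.0 / n_layers)
            for v in nodes}


# Toy layers (hypothetical): "a" is linked to the seed in both layers,
# "b" only in the PPI layer.
ppi = {"seed": {"a", "b"}, "a": {"seed"}, "b": {"seed", "c"}, "c": {"b"}}
coex = {"seed": {"a"}, "a": {"seed", "c"}, "b": set(), "c": {"a"}}

multi_scores = multiplex_rwr([ppi, coex], "seed")
multi_ranked = sorted((n for n in multi_scores if n != "seed"),
                      key=lambda n: multi_scores[n], reverse=True)
print(multi_ranked)
```

With this aggregation, a node that is close to the seed in both layers ranks above a node supported by only one layer, which mirrors the idea of genes connected in both the PPI and COEX analyses.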

Fig. 1
KEGG human pathway and Reactome pathways enrichment analysis for 200 proteins identified by RWR algorithm using S-glycoprotein of HCoV-229E

Fig. 2
PPI-COEX multilayer analysis, based on human PPI interactome and COEX network, with top 50 closest proteins/genes identified by RWR, using S-glycoprotein of HCoV-229E. Edges in blue represent protein-protein interactions, while red edges are coexpressions

The Reactome pathway enrichment analysis for SARS-CoV revealed a connection of the S-glycoprotein with early activation of the innate immune system, such as the Toll-like receptor cascade and TGF-β, with strong significance (FDR < 0.0001%), while the KEGG pathway enrichment analysis revealed an association with cellular proliferation, TGF-β and other infection-related pathways (FDR < 0.0001%) (Fig. 3). The PPI-COEX multilayer analysis highlighted a set of genes that are connected in both the PPI and COEX analyses, i.e. CLEC4G, CLEC4M, CD209, ACE2 and RPSA, all associated with the GO biological process category of SARS-CoV entry into the host cell (FDR < 0.01%) (Fig. 4). For MERS-CoV, the Reactome pathway enrichment analysis showed a strong association with membrane signals activated by GPCR ligand binding (FDR < 0.0001%), and with chemokine/chemokine receptor pathways. Consistent results were obtained with KEGG pathway enrichment, which highlighted cytokine–cytokine receptor and chemokine signalling pathways (FDR < 0.0001%) (Fig. 5). Finally, the PPI-COEX multilayer analysis highlighted, for both PPI and COEX, CCR4, CXCL2, CXCL10, CXCL9, PF4, PF4V1, CCL11, CXCL11, XCL1, CXCR4 and CXCL14, all genes assigned to GO biological processes involved in the chemokine cascade (FDR < 0.0001%), in line with the results obtained with the enrichment analyses (Fig. 6).

Fig. 3
KEGG human pathway and Reactome pathways enrichment analyses for 200 proteins identified by RWR algorithm using S-glycoprotein of SARS-CoV

Fig. 4
PPI-COEX multilayer analysis based on human PPI interactome and COEX network, with top 50 closest proteins/genes identified by RWR, using S-glycoprotein of SARS-CoV. Edges in blue represent protein-protein interactions, while red edges are coexpressions

Fig. 5
KEGG human pathway and Reactome pathways enrichment for 200 proteins identified by RWR algorithm using S-glycoprotein of MERS-CoV

Fig. 6
PPI-COEX multilayer analysis based on human PPI interactome and COEX network, with top 50 closest proteins/genes identified by RWR, using S-glycoprotein of MERS-CoV. Edges in blue represent protein-protein interactions, while red edges indicate coexpressions

Discussion

In-depth comparative analysis of S-glycoprotein

We applied network analysis, based on protein–protein interactions and gene expression data, in order to describe the interactome of the coronavirus S-glycoprotein and host proteins, with the aim of better understanding SARS-CoV-2 pathogenesis. A preliminary structural analysis was conducted on the S-glycoprotein of SARS-CoV-2 as compared to the other three HCoVs, using the S-glycoprotein as a model to shed light on the host–pathogen interaction in the dynamic process of SARS-CoV-2 infection. Although the amino acid sequences of the S-glycoprotein differed between the various HCoVs, the structural analysis showed high similarity; the best 3D structural overlap was found for SARS-CoV and SARS-CoV-2, consistent with ACE2 being the predicted shared receptor.

Of note, the newly discovered SARS-CoV-2 genome has revealed differences between SARS-CoV-2 and SARS or SARS-like coronaviruses [31]. Although no amino acid substitutions were present in the receptor-binding motifs, which directly interact with the human receptor ACE2 protein in SARS-CoV, six mutations were identified in the other regions of the RBD [31, 32]. On the other hand, the genomic comparative analysis highlighted the strong diversity in the S gene among CoVs in different hosts, confirming the biologically vital role of the S-glycoprotein as a key factor in viral entry in cross-species transmission events [28].

In addition, the comparative 3D structural data may facilitate the definition of already known antibody epitopes in the S-glycoprotein of other coronaviruses; they will also be useful in rational vaccine design and in gauging virus-directed immune responses after vaccination [30]. In fact, the S-glycoprotein remains an important target for vaccines and drugs previously evaluated in SARS and MERS, while a neutralizing antibody targeting the S-glycoprotein could provide passive immunity. The host interactome linked to the S-glycoprotein of SARS-CoV and MERS-CoV mainly highlighted innate immunity pathway components, such as Toll-like receptors, cytokines and chemokines. The 3D structural analysis confirmed that the S-glycoprotein of SARS-CoV-2 has a strong similarity in 3D structure with that of SARS-CoV [18].

Host interactome in all three HCoV infections

The reconstruction of the virus–host interactome was carried out using the RWR algorithm to explore the human PPI network and by studying the PPI and COEX multilayer network. The PPI network topology of the host interactome in all three infections indicated the presence of several hub proteins. In the HCoV-229E–host interactome, hub positions were held by RAD18 and APEX, which play an important role in the repair of UV-induced DNA damage during S phase [33].

For the SARS-CoV interactome, the hub genes were ACE2, CLEC4G and CD209, which are known interactors of the SARS-CoV S-glycoprotein [34, 35].

In fact, two independent mechanisms have been described as triggers of SARS-CoV infection: proteolytic cleavage of ACE2 and cleavage of the S-glycoprotein. The latter activates the glycoprotein for cathepsin L-independent host cell entry. Activation of the S-glycoprotein by cathepsin L during host cell entry has been reported for several CoV infections, such as HCoV-229E and SARS-CoV [36, 37]. A recent study speculated that this interaction will be preserved in SARS-CoV-2 [19], but that it might be disrupted if a substantial number of mutations occur in the receptor binding site of the S gene. Likewise, the S-glycoprotein of SARS-CoV-2 is expected to interact with the type II transmembrane serine protease (TMPRSS2) and is probably involved in the inhibition of antibody-mediated neutralization [38, 39]. It is rather unexpected that, for this virus, no intracellular pathways were highlighted by the multilayer analysis, suggesting that this field is still open to further investigation.

In MERS-CoV infection, a hub gene role was described for DPP4, which is known to regulate cytokine levels through catalytic cleavage [40]. Immune cell-recruiting chemokines and cytokines, such as IP-10/CXCL-10, MCP-1/CCL-2, MIP-1α/CCL-3 and RANTES/CCL-5, can be strongly induced by MERS-CoV, and show higher inducibility in human monocyte-derived macrophages by MERS-CoV than by SARS-CoV infection [41]. The cellular proliferation pathways, involved immediately after virus entry, were described in all three models, consistent with the inhibition of cell proliferation and the cytotoxic effect caused by HCoV infections [42, 43]. Finally, the biological pathways revealed by enrichment analysis in all models supported early activation of the innate immune system, such as the Toll-like receptor cascade and TGF-β for SARS-CoV, or chemokine, cytokine and infection-related pathways for MERS-CoV, with strong significance for both.

Pathogenic model for HCoV infections

We constructed a host molecular interactome with SARS-CoV, MERS-CoV and HCoV-229E, assuming that most of these interactions, especially for SARS-CoV, are shared with SARS-CoV-2. A network-based methodology, along with a guilt-by-association algorithm (RWR), was applied to define the pathological model of COVID-19 and to support the search for treatments against SARS-CoV-2, using existing transcriptomic and proteomic information.

Based on the main pathways identified by the network-based interactome analysis, the following issues require further study:

First, the predicted receptor for SARS-CoV-2 has been inferred to be ACE2, i.e. the same receptor used by SARS-CoV, based on the high similarity of the S-glycoproteins of the two viruses, and this is the basis for hypothesizing the use of SARS-CoV as a model for the virus–host interactome in COVID-19;

Second, mitogen-activated protein kinase (MAPK) is a major cell signalling pathway that is known to be activated by diverse groups of viruses, and plays an important role in the cellular response to viral infections. MAPK interacting kinase 1 (MNK1) has been shown to regulate both cap-dependent and internal ribosomal entry site (IRES)-mediated mRNA translation;

Third, the identification of the MAPK pathway in the SARS-CoV model is highly consistent with an in vivo model, where p38 MAPK was found to be increased in the lungs of mice infected with SARS-CoV [44];

Fourth, the identification of the TGF-β pathway in the S-glycoprotein-induced interactome for SARS-CoV is of particular interest, due to previous evidence that this virus, and in particular its protease, triggered TGF-β through the p38 MAPK/STAT3 pathway in alveolar basal epithelial cells [45, 46];

Fifth, innate immune pathways, such as Toll-like receptor, cytokine and chemokine pathways, were identified in the S-glycoprotein-induced models of SARS-CoV and MERS-CoV.

Every described pathway can be matched with clinical aspects; the data presented in this report may therefore help to design a ‘blueprint’ for SARS-CoV-2-associated pathogenicity.

The severity and the clinical picture of SARS-CoV and MERS-CoV infections could be related to the activation of an exaggerated immune mechanism, causing uncontrolled inflammation [47]; however, the role of a strong immune response in SARS-CoV-2 infection severity is still uncertain.

Nevertheless, we may consider that host kinases link multiple signalling pathways in response to a broad array of stimuli, including viral infections. TGF-β, produced during the inflammatory phase by macrophages, is an important mediator of fibroblast activation and tissue repair. High levels of systemic inflammatory cytokines/chemokines have been widely reported for MERS-CoV infections [48,49,50], correlating with immunopathology and massive infiltration into the lungs [51]. HCoV-229E infection can also be described with this distance model, although this infection was not associated with severe respiratory disease. In fact, HCoV-229E is responsible for mild upper respiratory tract infections, such as common colds, with only occasional spreading to the lower respiratory tract, but it interacts with dendritic cells in the upper respiratory tract, inducing a cytopathic effect [52].

Conclusions

In conclusion, we developed a network-based model, which could be the framework for a structure-guided research process and for the pathogenetic evaluation of specific clinical outcomes. Accurate structural 3D protein models and their interactions with host receptor proteins can allow a more detailed theoretical disease model to be built for each HCoV infection, and can support the drawing of a disease model for COVID-19. Our analyses suggest that it is important to carry out in silico experiments and simulations through specific algorithms.

Limitations to our study

A single protein, namely the S-glycoprotein, was used as seed; therefore, the highlighted interactions were limited to those connected with this single viral protein. However, this is a proof-of-concept study, from which it appears that a similar approach may be used to study other viral proteins interacting with host cell pathways.

Another limitation is that the pathway analysis did not consider tissue and cell type diversity. Finally, the low threshold established for the number of nodes found by RWR (200) limited the reconstruction of entire pathways. However, this was a software-imposed threshold.

In summary, the interactome analysis helped to guide the design of novel models of SARS-CoV pathogenicity.