Introduction

The severe acute respiratory syndrome coronavirus (SARS-CoV) is an alarming global health concern starting from the early twenty-first century. This virus is named Coronaviruses (CoVs) because of its characteristic halo structure under an electron microscope (corona, crown-like). Latin word “corona” means crown or “halo” and coronavirus particles display a crown-like fringe typically referred to as “spikes” under electron microscopy1. The CoVs are non-segmented single-stranded RNA viruses covered with envelope which can cause illness ranging in severity from the common cold to severe and fatal illness or even death. On the basis of serotype and genome, the coronavirus subfamily is divided into four genera: α, β, γ and δ, which have long been recognized as important veterinary pathogens that cause severe to lethal respiratory and enteric diseases in birds as well as mammals. Consequently, the SARS-CoV is a severe respiratory tract disease which was originally distinguished in Guangdong Province, China in 2002 and afterward it spread to 29 countries and was first authoritatively perceived in March 20032. In excess of 8,000 likely cases and 774 deaths were accounted for worldwide between March 2003 and July 2003 because of the outbreak of this coronavirus (CoV)3. During the outbreak, the normal death rate was around 9.6 percent4,5. Koch's proposed that SARS-CoV was identified with pathogenesis and it is a huge danger to human health6. Moreover, Acute Respiratory Distress Syndrome (ARDS) was found in 16% of the all-out SARS-CoV patients and the death rate became half in case of these kinds of SARS-CoV patients7,8. The year 2020 was started with the alarming of SARS-CoV-2 infections that has outbreak the novel coronavirus disease 2019 (COVID-19) and become a global pandemic. It has genetic similarity with SARS-CoV9. The COVID-19 causing a massive loss of lives was formally announced a pandemic by the WHO on 11 March, 202010. The COVID-19 affected persons suffer from fever, shortness of breath and cough within an incubation period of 2–14 days with both symptomatic and asymptomatic symptoms9. At present, the COVID-19 pandemic is a deadly and dangerous health concern around the globe. Coronaviruses are single-stranded, positive-sense RNA virus which has huge genomes of viral RNA11. Current examinations revealed that SARS-CoV-2 has a genomic structure near that of other beta-coronaviruses12. The novel Corona virus-2 makes a contrast in homogeneity with MERS-CoV and SARS-CoV while it has been sorted with the beta-coronaviruses. However, the SARS-CoV-2 genes seem to be as 89.10% nucleotide likeliness as well as < 80% nucleotide uniformity compared to the SARS-CoV genes13,14. SARS-CoV-2 has been identified as the seventh known human coronavirus (HCoV) from a similar family after 229E, NL63, OC43, HKU1, MERS-CoV and SARS-CoV15.

Nevertheless, the destructive stream/wave of SARS-CoV-2 to human life that causes millions of lives loss globally along with other paralyzing effects on economy as well as human civilization is unparallel and is close to every door. Almost all the countries are affected by this devastating wave but the United States, Brazil, Italy, Russia, Spain, UK and India are among the top six countries where SARS-CoV-2 has spread the most according to the report of Worldometer (https://www.worldometers.info/coronavirus/). As of 16 June 2021, around 4 million peoples died out of 175 million SARS-CoV-2 infections and gradually infected peoples are increasing worldwide. Scientists and pharmaceutical companies around the globe are trying to find out the way of escape from this devastating pandemic situation by means of discovering drugs and effective vaccines against COVID-19 virus. According to the published report in GISAID (Global Initiative on Sharing All Influenza Data) database on 20 November 2020, the SARS-CoV-2 strains were clustered into 8 clades (S, L, V, G, GR, GH, GV and others) based on their RNA sequence patterns and gradually clades are increasing. The unstable patterns of RNA sequences make this virus infections untouched and uncontrolled because once vaccine is going to be prepared based on a sequence of SARS-CoV-2, but afterwards this virus is automatically changing its genomic structure by mutations which through all the efforts in vein. Indeed, the mechanism is mysterious and horrible to control this virus.

Though a number of vaccines including Pfizer, Moderna, Sputnik, AstraZeneca, Ad5-nCoV, EpiVacCorona, BBIBP-CorV, BBV152, CoronaVac, WIBP are now available against SARS-CoV-2 and some are in progress16,17, researchers are worried about their effectiveness due to unstable RNA sequence of coronaviruses. For example, recently we observed that some already vaccinated peoples are also suffering from SARS-CoV-2 infections in our surrounding. Therefore, besides the vaccines, different variants of well-established drugs are also required for the treatment against coronavirus to save human lives. However, new drug discovery is a tremendous challenging, time consuming and expensive task. The main challenges are to explore drug target receptor proteins responsible for diseases and drug candidate small molecules that can reduce the diseases by the interaction with the target proteins. Genomic biomarkers induced proteins are considered as the key drug target receptor proteins. Transcriptomics analysis is a widely used popular approach to explore genomic biomarkers18,19,20,21,22. The repurposing of existing drugs for a particular disease could reduce the time and cost compared to de novo drug development. By this time, several authors proposed host/SARS-CoV-2 transcriptome-guided several sets of candidate drugs by molecular docking with different sets of target proteins (receptors) for the treatment against SARS-CoV-2 infections23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49. However, their published data did not display any common set of receptors and/or drugs, and so far, none of them yet investigated the resistance of their suggested drugs against the independent receptors proposed by others. Therefore, how a vaccine or a drug can be effective for all the peoples around the world is still a question mark. In this study, our main objectives are (1) computational identification of genomic biomarkers (drug targets) for SARS-CoV-2 infections highlighting their functions, pathways, regulatory factors and associated comorbidities (2) exploring candidate drugs for the treatment against different variants of SARS-CoV-2 infections with comorbidities and (3) In-silico validation on the resistance of the proposed candidate drugs against the state-of-the-art alternatives top-ranked independent receptors associated with SARS-CoV-2 infections published by others. The overall analyses plan is summarized in Fig. 1 below.

Figure 1
figure 1

The pipeline of this study.

Results

Identification of differentially expressed genes (DEGs)

We identified the set of 109 DEGs as \(({A}_{UR}\cup {A}_{DR})\cup ({B}_{UR}\cup {B}_{DR})\) between COVID-19 infected and control samples from RNA-Seq dataset by using two popular statistical R-packages (DESeq2 and edgeR) as introduced in the material and method section. The edgeR method identified the set of 100 DEGs as \(({A}_{UR}\cup {A}_{DR})\), where 18 DEGs were upregulated (AUR) and the rest 82 DEGs were downregulated (ADR) (Fig. 2a). The DESeq2 method identified the set of 29 DEGs as \(({B}_{UR}\cup {B}_{DR})\), where 15 DEGs were upregulated (BUR) and the rest 14 DEGs were downregulated (BDR) (Fig. 2b). The set of 20 DEGs was commonly identified by two methods as \(\left({A}_{UR}\cup {B}_{UR}\right)\cap \left({A}_{DR}\cup {B}_{DR}\right)\), which is displayed by Venn diagram in Fig. 2c. Both the methods detected the DEGs with the significant level at adjusted p-value < 0.05 and the fold change threshold |log2(aFCg)|> 1 by controlling the FDR at 5%. We separated upregulated and downregulated DEGs by using fold change criteria log2(aFCg) > 1 and log2(aFCg) < − 1 respectively. However, we observed that a set of 2 DEGs denoted by C = {ZNF638-IT1, FOSB} showed contradictory results by edgeR and DESeq2. So, we removed these 2 DEGs from further analysis. Finally, we considered the set of 107 DEGs as \(\left({A}_{UR}\cup {B}_{UR}\cup {A}_{DR}\cup {B}_{DR}\right)-C,\) which consisted of the set of 16 upregulated DEGs as \({DEG}_{UR}=\left({A}_{UR}\cup {B}_{UR}\right)-C\) and the set of 91 downregulated DEGs as \({DEG}_{DR}=\left({A}_{DR}\cup {B}_{DR}\right)-C\) (Table 2). To visualize sample clusters (case/control) and outliers simultaneously based on DEGs, we constructed the scatter plot of first and second principal components (PCs) of DEGs (Fig. 2d). We observed control samples are clearly separated from the case samples. So, the DEGs-set has a strong prognosis power. We also observed that 6 samples were contaminated by outliers (indicated with round line).

Figure 2
figure 2

(a) Volcano plot of the DEGs by edgeR Method and (b) by DESeq2 method. (c) Venn diagram of two DEGs-sets identified by DESeq2 and edgeR to show the common and uncommon genes. (d) The scatter plot first two principal components (PCs) of DEGs to see their prognostic performance of the case vs control.

Table 2 The identified upregulated and downregulated DEGs-set.

Protein–protein interaction (PPI) network analysis of DEGs

The PPI network of DEGs was constructed to detect the most representative DEGs so called hub-DEGs/proteins (see Fig. 3). A topological exploration based on dual-metric measurements (degree (> 10) and betweenness) was utilized to select the top-ranked 10 hub-DEGs/proteins that are SNRPD2, ZFP36, NEDD9, KLF6, USP53, IGF2, TLR2, PAG1, GUCY1A2 and CXCL2, where 2 hub-DEGs (IGF2 and SNRPD2) were upregulated and the remaining 8 were found as downregulated (see Table 2). These hub-DEGs/proteins were considered as the key controller of SARS-CoV-2 infections and drug target receptors.

Figure 3
figure 3

The PPI network of DEGs.

GO functional and KEGG pathway enrichment analysis of hub-DEGs

The GO functional enrichment analysis revealed that our proposed hub-DEGs are significantly enriched with numerous biological processes (BPs), molecular functions (MFs) and cellular components (CCs) (Table 3 and Supplementary File S1). The Table 3 shows top 10 significantly enriched GO-terms for each of three categories (MFs, BPs and CCs). We observed that 3 MFs (protein binding, binding, molecular function) were significantly enriched with all 10 hub-DEGs, 1 MF (protein-containing complex binding) was enriched with 3 hub-DEGs (IGF2, TLR2, GUCY1A2), 1 MF (C–C chemokine binding) was enriched with ZFP36 hub-DEG and the rest 5 MFs (triacyl lipopeptide binding, lipopolysaccharide immune receptor activity, lipopeptide binding, NAD+ nucleotidase, cyclic ADP-ribose generating and NAD+ nucleosidase activity) were significantly enriched with TLR2. Out of top 10 significantly enriched GO-terms of BPs by hub-DEGs, we observed that 2 BPs (biological process and cellular process) were highly enriched by all of 10 hub-DEGs, 5 BPs, (biological regulation, response to stimulus, positive regulation of cellular process, signal transduction and immune system process) were enriched by the individual sub DEGs-sets including 9, 8, 7, 7 and 6 hub-DEGs, respectively. The other 3 GO-terms (cellular response to lipopolysaccharide, cellular response to molecule of bacterial origin and cellular response to biotic stimulus) of BPs were enriched by the same 3 hub-DEGs (ZFP36, TLR2, CXCL2). Among the top 10 significantly enriched GO-terms of CCs by hub-DEGs, 2 CCs (cellular component and cellular anatomical entity) were enriched by all 10 hub-DEGS, 7 hub-DEGs significantly enriched with the cytoplasm of CCs, 2 hub-DEGs (TLR2 and PAG1) significantly enriched with 3 GO-terms membrane raft, membrane microdomain and membrane region of CCs and the rest 4 CCs (Dcp1-Dcp2 complex, Toll-like receptor 1-Toll-like receptor 2 protein complex, pICln-Sm protein complex and RISC-loading complex) were enriched by ZFP36, TLR2, SNRPD2 and ZFP36, respectively.

Table 3 The top 10 significantly enriched GO terms for each of BPs, MFs and CCs with the hub-DEGs/proteins.

The KEGG pathway enrichment analysis exposed that our proposed hub-DEGs are significantly enriched with plentiful pathway categories (see Fig. 4 and Supplementary File S1). The Fig. 4 showed the top 10 significantly enriched KEGG-pathway categories. We observed that the KEGG root term is significantly enriched with 7 hub-DEGs (SNRPD2, ZFP36, IGF2, TLR2, GUCY1A2, and CXCL2). Each of three pathway categories legionellosis, amoebiasis and rheumatoid arthritis was enriched by two hub-DEGs (TLR2 and CXCL2). Each of two pathway categories (proteoglycans in cancer and PI3K-Akt signaling) was enriched by two hub-DEGs (TLR2 and IGF2). The pathway category Kaposi sarcoma-associated herpesvirus infection was enriched by two hub-DEGs (ZFP36 and CXCL2). The two KEGG pathway categories (long-term depression and renin secretion) was enriched by the hub-DEG GUCY1A2. The rest one pathway category IL-17 signaling pathway was enriched by the hub-DEG CXCL2.

Figure 4
figure 4

The top 10 significantly enriched KEGG pathways with the hub-DEGs/proteins. The associated hub-DEGs are displayed in the right side of each bar. The hub-DEGs with bold represents upregulated genes and others are downregulated genes.

The gene regulatory network (GRN) analysis of hub-DEGs

Transcription factors (TFs) and microRNAs (miRNAs) are considered as the most important transcriptional and post transcriptional molecular regulatory factors of genes. We constructed undirected interaction network between TFs and hub-DEGs to explore key regulatory transcriptional factors of hub-DEGs (Fig. 5). In this network, hub-DEGs were represented by round nodes with red color and TFs were represented by square nodes with blue color, where larger numbers of connectivity were represented by the larger nodes. We selected FOXC1, GATA2, SRF, FOXL1, YY1 and NFIC as the top 6 key regulatory TFs of hub-DEGs based on higher degree of topological measures.

Figure 5
figure 5

The TFs and hub-DEGs interaction network, where the blue squared nodes represent the TFs and the red round nodes represent the hub-DEGs. The key TFs are represented by the larger squared nodes.

To explore key regulatory post-transcriptional factors of hub-DEGs, we constructed undirected interaction network between miRNAs and hub-DEGs (Fig. 6). In this network, hub-DEGs were represented by round nodes with red color and miRNAs were represented by octagonal nodes with green color, where larger numbers of connectivity were represented by the larger nodes as before. We selected the miRNAs namely, miR-107, miR-16-5p, miR-103a-3p, miR-27a-3p, miR-155-5p and miR-1-3p as the top 6 key regulatory post-transcriptional factors of hub-DEGs based on higher degree of topological measures.

Figure 6
figure 6

The miRNAs and hub-DEGs interaction network where the small green octagonal nodes stand for miRNAs and the round nodes with red color represents the hub-DEGs. The key miRNAs are represented by the larger highlighted octagonal shaped green colored nodes in the figure.

Association of hub-DEGs with comorbidities

To investigate the association of hub-DEGs with other diseases, we performed their interaction network analysis. The Fig. 7a and Supplementary Table S2 show the disease versus hub-DEGs interaction network analysis results. We observed that IGF2 gene is associated with 120 diseases including Cardiovascular Diseases, Colorectal Neoplasms, Cardiomyopathies, Liver carcinoma, Anemia; the CXCL2 gene is associated with 19 diseases including rheumatoid arthritis, heart failure, hypertensive disease, IBN inflammation, pulmonary fibrosis, acute lung injury while the ZFP36, KLF6, GUCY1A2 and PAG1 was linked with 18 diseases including liver cirrhosis, experimental prostatic neoplasms, stomach carcinoma, inflammation, arthritis, especially which could be a severe comorbidities of the COVID-19 patients. To assess the association of hub-DEGs with lung cancer, we performed multivariate survival analysis of lung cancer patients with expressions of hub-DEGs. The log-rank test was used to test the significant difference between two survivals curves (Fig. 7b) based on all hub-DEGs. We observed the significant difference between the low and high-risk group in survival probability, which indicates that the hub proteins are significantly associated with lung cancer.

Figure 7
figure 7

(a) The disease versus hub-DEGs interaction network finds the comorbidities (b) The multivariate survival curves of lung cancer patients based on hub-DEGs.

Drug repurposing by molecular docking study

We considered 10 hub-proteins corresponding to our proposed 10 hub-DEGs and their regulatory 5 key TFs-proteins (FOXC1, GATA2, SRF, FOXL1 and YY1) as the m = 15 drug target receptors. The SARS-CoV-2 3CL protease-guided top-ranked n = 90 drugs out of 3410 FDA approved antiviral drugs50, were considered as the drug agents that were mentioned previously in the materials and method section. We downloaded 3D structure of 7 hub proteins (SNRPD2, ZFP36, NEDD9, IGF2, TLR2, and CXCL2) from Protein Data Bank (PDB)51 with source codes 5xjs, 4j8s, 2l81, 1igl, 1fyw, and 5ob5, respectively. The 3D structure of 3 hub proteins (KLF6, USP53, GUCY1A2, and PAG1) were downloaded from SWISS-MODEL52 using UniProt53 ID of Q99612, Q70EK8, P33402, and Q9NWQ8, respectively. The 3D structure of 3 TFs proteins (GATA2, SRF and YY1) were downloaded from PDB with source codes 5o9b, 1hbx, and 1ubd, respectively, and the rest 2 TFs proteins (FOXC1 and FOXL1) were downloaded from SWISS-MODEL using UniProt ID of Q12948 and Q12952. The 3D structures of 90 FDA-approved drugs were downloaded from PubChem database54. Then we performed molecular docking study to calculate binding affinity scores for each pair of receptors and drug agents. Then we ordered the receptor proteins according to the descending order of row sums of the binding affinity score matrix A = (Aij) and drug agents according to the column sums to select the top-ranked few drug agents as the candidate drugs. Figure 8a displayed the image of binding affinity matrix \({{\varvec{A}}}^{*}=\left({A}_{ij}^{*}\right)\) corresponding to the ordered target proteins in Y-axis and top-ordered 50 drug agents out of 90 in X-axis. We observed that the first two top lead compounds (lead1: Torin-2 and lead2: Rapamycin) produce binding affinity scores less than or equal to − 8.0 kcal/mol with all of our proposed 15 receptor proteins. The next (3–7)th top lead compounds (lead3:Radotinib, lead4:Ivermectin, lead5:Thiostrepton, lead6:Tacrolimus and lead7:Daclatasvi) produced binding affinity scores less than or equal to − 8.0 kcal/mol with our proposed 13 receptor proteins out of 15. The rest 83 lead compounds produced binding affinity scores less than or equal to − 8.0 kcal/mol with the fewer number of receptors. Therefore, we considered the top 7 lead compounds (Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus and Daclatasvi) as the most probable candidate drugs for SARS-CoV-2 infections.

Figure 8
figure 8

Molecular docking results computed with autodock vina. Red colors indicated the strong binding affinities between target proteins and drug agents, and green colors indicated their weak bindings. (a) Image of binding affinities based on the top 50 ordered drug agents out of 90 in X-axis and ordered 15 target proteins (proposed) in Y-axis. (b) Image of binding affinities based on the proposed ordered 7 candidate drugs in X-axis and ordered 11 independent receptors (already published) in Y-axis. (c) Image of binding affinities based on the proposed ordered 7 candidate drugs in X-axis and original & protonated (at pH-7)* receptors in Y-axis, where * indicates the protonated receptors.

To validate our proposed 7 candidate drugs by molecular docking study with already published independent receptor proteins (genomic biomarkers) associated with SARS-CoV-2 infections available in the literature, we reviewed 22 published articles associated with SARS-CoV-2 infections those provided transcriptome guided hub-proteins (genomic biomarkers). We found total of 193 hub-proteins published in those 22 articles, but there was no hub-protein commonly published in all articles (Table 4). We found 11 hub-proteins (ICAM1, IRF7, MX1, NFKBIA, STAT1, IL6, TNF, CCL20, CXCL8, VEGFA, and CASP3) which were commonly reported in at most 3 articles out of 22 (Table 4). We considered these 11 hub-proteins as the publicly available top ranked receptor proteins associated with SARS-CoV-2 infections to validate the proposed repurposed drugs by molecular docking. The 3D structures of these 11 (ICAM1, IRF7, MX1, NFKBIA, STAT1, IL6, TNF, CCL20, CXCL8, VEGFA, and CASP3) proteins were retrieved from Protein Data Bank (PDB) with codes 5mza, 2o61, 3szr, 1nfi, 1bf5, 1il6, 1tnf, 2jyo, 1ikl, 1cz8, and 1gfw, respectively. Then molecular docking interactions of our proposed drugs with the publicly available top ranked receptor proteins were performed. Their binding affinities (kcal/mol) were visualized in Fig. 8b. We observed that their binding affinities ranged between (− 12.1 to − 7) kcal/mol and average binding affinities were less than or equal to − 9.5 kcal/mol which indicates the strong binding capacity. Then we compared the docking results of top-ranked eight receptor proteins (four from the proposed set and the other four from the published set) with their protonation state at their physical conditions of salinity = 0.15, internal dielectric = 10, pH = 7, and external dielectric = 80 (see Supplementary File S4)55. The docking analysis showed the significant binding affinities ranged between (− 11 to − 7.7) kcal/mol with the protonated receptors (Fig. 8c). We observed that both original and protonated (*) receptors produce almost similar binding scores, which indicate the proposed drugs might be effective against the protonated receptors also.

Table 4 Different key protein lists for SARS-CoV-2 infection published by different research articles in different international reputed journals.

Table 5, Supplementary Files S3 and S4 show the summary results of interacting properties of our target proteins with top-ranked drugs (lead compounds) that produced the highest binding affinity scores. The 3D structure of their interacting complex is shown in the 4th column of Table 5. The 2D schematic diagram of these 3 target proteins with mentioned candidate drugs interaction is given in the 5th column highlighting their neighbor residues (within 4 Å of the drug). Key interactions amino acids and their binding with potential targets were shown in the last column.

Table 5 Top 3 potential targets and top 3 lead compounds based on docking results. Lead three compounds Torin-2, Rapamycin, and Radotinib were selected by investigating the binding affinity score. The 3D structure of hub protein with candidate drugs is shown in 4th column. The 2D Schematic diagram of hub protein with candidate drugs interaction is given 5th column and neighbor residues (within 4 Å of the drug) are shown. Key interactions amino acids and their binding with potential targets were shown in the last column.

Molecular dynamic (MD) simulations

Among the proposed candidate drugs, Torin-2, Rapamycin and Radotinib were the top ranked three candidate drugs (Table 5). Therefore, these top three drug agents were selected for their stability analysis through 100 ns MD-based MM-PBSA simulations.

RMSD analysis

From Fig. 9, we observed that all the six systems are significantly stable between the variations of moving and initial drug-target complexes. We calculated their RMSD (root mean square deviation). Figures 9a,b, represents the RMSD corresponding to the proposed receptors (TLR2, USP53) and independent receptors (IRF7, STAT1), respectively. All the systems projected the RMSD around 2 to 4 Å except USP53_Radotinib complex which showed the RMSD around 3.5 to 6.5 Å. The average RMSD for TLR2_Torin-2, TLR2_Rapamycin and USP53_Radotinib complexes were 3.3 Å, 3.7 Å and 4.7 Å, respectively. TLR2_Torin-2 complex showed slight fluctuation in Cα backbone around 40,000 ps and was stabilized in the remaining simulation. Similarly, a streak of continuous fluctuation was found in the TLR2_Rapamycin complex, ranging from 30,000 to 56,000 ps, followed by inconsiderable change. For USP53_radotinib complex, a negligible fluctuation was observed in the starting 12,000 ps and around 62,000 ps to 80,000 ps and remained stable thereafter throughout the simulation. On the other hand, the average RMSD was found to be 3.3 Å for IRF7_Torin-2 complex with slight fluctuation in Cα backbone around 12,000 to 16,000 ps and was stabilized in the remaining simulation. For STAT1_Rapamycin complex, the average RMSD was found to be 2.9 Å. The average RMSD for the STAT1_radotinib bound complex was 2.4 Å, with an overall RMSD of approximately 2.6 Å, indicating that it was comparably more stable among the six selected systems. However, the data indicates that all the systems showed stable internal motion.

Figure 9
figure 9

The RMSD analysis results for the variations of moving and initial drug-target complexes with 100 ns based MD simulations. (a) represented the RMSD with proposed top ranked two receptors (TLR2, USP53) and (b) represented the RMSD with top ranked two independent receptors (IRF7, STAT1).

Binding free energy

Here we have calculated the MM-PBSA binding energy for three drug agents as mentioned previously, Fig. 10a,b represents the binding energy with the top ranked two proposed (TLR2, USP53) and independent (IRF7, STAT1) receptors, respectively. On an average, USP53_Radotinib, TLR2_Torin-2 and TLR2_Rapamycin complexes produced binding energies 144.44 kJ mol−1, 107.97 kJ mol−1 and 67.64 kJ mol−1, respectively. On the other hand, STAT1_Radotinib, STAT1_Rapamycin and IRF7_Torin-2 complexes produced average binding energies 59.264 kJ mol−1, 93.333 kJ mol−1 and − 52.638 kJ mol−1, respectively.

Figure 10
figure 10

Binding free energy (in kJ mol−1) of each snapshot was calculated through 100 ns MD-based MM-PBSA simulations (a) represented the binding free energy with proposed top ranked two receptors (TLR2, USP53) and (b) represented the binding free energy with top ranked two independent receptors (IRF7, STAT1).

Discussion

The current study analyzes the high throughput RNA-Seq data to identify key genomic biomarkers (hub DEGs/proteins) highlighting their GO terms and KEGG pathways, key regulatory components (TFs and miRNAs), associated comorbidities and repurposable drugs for the treatment against SARS-CoV-2 infections by using the integrated bioinformatics approaches that were summarized in Fig. 1. Totally 109 DEGs were identified between SARS-CoV-2 infected and control samples; among them 16 upregulated and 91 down regulated genes (Table 2) were finally reported. Among them 107 DEGs encoded proteins were used to construct the PPI network (Fig. 3) which revealed ten hub-DEGs/proteins (SNRPD2, ZFP36, NEDD9, KLF6, USP53, IGF2, TLR2, PAG1, GUCY1A2 and CXCL2) which are considered as the key genomic biomarkers for SARS-CoV-2 infections.

The GO functional enrichment analysis of the proposed hub-DEGs/proteins and all the DEGs reflected the significant molecular functions that are highly linked with the COVID-19 infection and proliferation in host cell (Table 3 and Supplementary Files S1 and S2). Among the enriched MFs, the lipopolysaccharide (LPS) immune receptor activity driven by TLR2 hub-DEG is associated with the LPS-induced production of pro-inflammatory cytokines reduction, inflammation by affecting the lungs LPS due to the COVID-19 infection56,57,58. The most important and significant functions namely, cytokine regulation, produces the unnecessary “cytokine storm" that promote the adverse events like alveolar damage and fibrosis59,60 on COVID-19 patients. The interleukin (IL) regulatory pathways are crucial for the important pathophysiological mechanisms called systemic inflammation and cytokine release syndrome61,62,63 which were significantly associated with the hub genes (Supplementary File S1). The C–C chemokine binding functions driven by ZFP36 hub-DEG are directly involved with the T-cell induced pathogen burden controlling which is also an important receptor group protein for COVID-1963,64,65. The NAD + nucleotidase MF (steered by TLR2 hub-DEG) was found to have protective roles, and mitigate the disease severity if administered prophylactically, and its anti-hyper inflammation properties66,67 and the Dcp1–Dcp2 complex (steered by ZFP36 hub-DEG) play a positive role in viral infection68. The above enrichment analysis noticeably focuses on the association of the identified hub proteins with the diverse significant functions that are crucial for COVID-19. Moreover, the functional enrichment and pathway analysis of all the identified DEGs were recorded, and it was found that the functional pathways enriched by the key DEGs were also enriched by all the DEGs significantly (Supplementary File S1 and S2). The common functional pathways enrichment showed the biological uniformity characteristics among the all identified DEGs and the hub-DEGs.

The KEGG pathway analysis of the proposed hub-DEGs/proteins showed some enriched significant pathways. The top 10 significant KEGG pathways included the legionellosis related pathway, IL-17 signaling pathway, Rheumatoid arthritis pathway, PI3K-Akt signaling pathway, Kaposi’s sarcoma-associated herpesvirus infection and the proteoglycans in cancer pathways (Fig. 4). One of the most important pathways driven by TLR2 and CXCL2 hub-DEG namely, the legionellosis which is a typical pneumonia and exposes the cough, shortness of breath, high fever, muscle pains, and headaches69. These symptoms are also highly related and most common symptoms for COVID-19 positive patient70. During this pandemic situation, the emergency COVID-19 positive patients were permitted to treat with the Hydroxychloroquine (HCQ) although its molecular mechanisms were not completely known and later on WHO advised to avoid this for the treatment. The HCQ -is also commonly used in rheumatic disease treatment and it has been shown that the patients with rheumatoid arthritis (RA) represent lower risk of COVID-19 infection64. In our analysis, the hub-DEGs TLR2 and CXCL2 significantly enriched the rheumatoid arthritis pathway which indicates that these genes may have the antagonized property against the COVID-19 infection. The other significant pathways are Kaposi’s sarcoma-associated herpesvirus infection (steered by ZFP36 and CXCL2), PI3K-Akt signaling pathway and the proteoglycans in cancer pathways (steered by IGF2 and TLR2). The Kaposi’s sarcoma-associated herpesvirus infection is associated with the lung infection71, and the PI3K-Akt signaling pathway mainly works with the cell cycle and also with the various proteins function72,73. On the other hand, the proteoglycans in cancer pathways is treated as an important cancer related pathway in human74. Therefore, based on the molecular pathway enrichment analysis, it can be presumed that the proposed hub proteins may have significant roles in SARS-CoV-2 infection and proliferation and may be treated as prominent therapeutic target.

The TFs versus Hub-DEGs interaction network analysis revealed 6 key TFs-proteins (FOXC1, GATA2, SRF, FOXL1, YY1 and NFIC) as the transcriptional regulatory factors of hub-DEGs (Fig. 5). The basal-like breast cancer (BLBC), Alzheimer’s disease, tissue invasion are highly associated with the FOXC1 protein75,76. The GATA2 protein is associated with breast and kidney cancer related pathway, when the higher expression pattern of YY1 protein increases the tumour size, higher TNM stage77,78,79, the FOXL1 TF are related with proliferation, cell-cycle80. The SRF protein is associated with the regulation of cell survival and cell cycle progression in cardiac fibroblasts81. The NFIC protein has greater involvement with the tumor genesis of breast cancer, gastric cancer, and glioma82,83,84. Also the identified TFs proteins has a significant involvement in various biological functions and pathways19,20,21,22,85. The miR-107, miR-16-5p, miR-103a-3p, miR-27a-3p, miR-155-5p and miR-1-3p were identified as the key post transcriptional regulatory factors of hub-DEGs (Fig. 6). The miR-107 (microRNA) has direct interaction with the Coxsackie B3 virus (CVB3) replication and release86. The miR-16-5p represented higher expression pattern in lung cancer cell87 and the miR-103a-3p and miR-27a-3p has a positive correlation with the renal inflammatory dysfunction, cell proliferation and apoptosis88,89 while the miR-155-5p is associated with breast cancer90 and the miR-1-3p has the probable interaction with the SARS-CoV291. The above discussion gives the evidence that the proposed regulatory TFs and the miRNAs have an enormous correlation with various biological functions and processes that are closely connected with the symptoms of SARS-CoV-2 infections and proliferation process.

The diseases versus hub-DEGs interaction network analysis showed that the predicted hub-DEGs are associated with various types of cancers and other complex diseases including the respiratory cases (Fig. 7, Supplementary Table S2). The IGF2 was connected with maximum number of diseases in the network followed by the other hub-DEGs. We observed that IGF2 gene is associated with 120 diseases including Cardiovascular Diseases, Colorectal Neoplasms, Cardiomyopathies, Liver carcinoma, Anemia; the CXCL2 was associated with 19 diseases including Rheumatoid Arthritis, Heart failure, Hypertensive disease, Inflammation, Pulmonary Fibrosis, Acute Lung Injury while the ZFP36, KLF6, GUCY1A2 and PAG1 was linked with 18 diseases including Liver Cirrhosis, Experimental Prostatic Neoplasms, Stomach Carcinoma, Inflammation, Arthritis, especially which could be a severe comorbidities of the COVID-19 patients. Among the associated diseases six diseases were connected with two hub-DEGs, notably, inflammation, mammary neoplasms, myocardial ischemia, colorectal neoplasms, somatic mutation, schizophrenia. The inflammation is considered as vital COVID-19 related comorbidity while others are also crucial. The hub-DEGs those are related with the above comorbidities also play a significant role for these diseases during the COVID-19 affection. The association of hub-DEGs with several diseases is also supported by the literature review. For example, the hub-protein SNRPD2 is significantly associated with histologic grade in Hepatocellular carcinoma (HCC), mild cognitive impairment (MCI) and Alzheimer's disease (AD)92,93. The hub-protein ZFP36 is associated with breast cancer and tumor-suppressive actions during hepatic tumor progression94,95. The hub-protein NEDD9 is significantly associated with head and neck and lung cancers96,97. The hub-protein KLF6 has a direct involvement in ovarian cancer cell proliferation and metastasis promotion and also works as a critical regulator of pathogenic myeloid cell activation in human98,99. The USP53 genes has a greater involvement in cholestatic liver disease100,101 and the IGF2 proteins are widely known as the diabetes associated protein and can control the insulin secretion in β-cells during fasting102. However, many genes/proteins related to lung disease including cancer is highly interacting with SARS-CoV-2 infections, since the patients suffer from the major complexities when the virus infects the lung. The idiopathic pulmonary fibrosis (IPF) is treated as one of the most crucial and serious risk factors of COVID-19103, since the COVID-19 positive patients have a greater chance for being enhanced with the IPF which creates numerous complications and leads a high risk to recover from COVID-19104,105. The TLR2 is highly responsive in immune-enhancing activity106 and the Type 2 lung inflammation is associated with the PAG1 gene expression107.

The multivariate survival analysis for lung cancer patients is based on low and high expressions of hub-DEGs significantly differentiated the survival curves (Fig. 7b), which indicates that the lung cancer is significantly influenced by the hub-DEGs of SARS-CoV-2 infection. Also, the patients with the lung cancers belong to the high risk of mortality from COVID-19 infection. The above discussion indicates that the proposed genomic biomarkers responsible for SARS-CoV-2 infection are also associated with various comorbidities including diabetes, lung diseases, and respiratory disease and immune systems. Therefore, covid-19 patients usually suffer from multiple complexities and reach to the severe condition that has complex comorbidities108,109,110.

To explore effective drugs for the treatments against SARS-CoV-2 infections with comorbidities, we considered the proposed human genomic biomarkers guided 10 hub-proteins (SNRPD2, ZFP36, NEDD9, KLF6, USP53, IGF2, TLR2, PAG1, GUCY1A2, and CXCL2) and 5 key TFs proteins (FOXC1, GATA2, SRF, FOXL1, and YY1) as the drug target receptor proteins and performed their docking analysis with the SARS-CoV-2 3CL protease-guided top ranked 90 FDA approved repurposable drugs. Then we selected top ranked 7 drugs (Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus, and Daclatasvir) as the candidate drugs for the treatment against SARS-CoV-2 infection, where the first two drugs showed strong binding affinities with all the target proteins (Fig. 8b). Among the identified candidate drugs, the Ivermectin and the Rapamycin were used to treat the COVID-19 affected patients although it has a lack of wide range of information about their activity against the SARS-CoV-2 virus111. Since Ivermectin has a potential antiviral activity, it has been used in the treatment of various virus infection including the SARS-CoV-2 treatment by dosing solely or with a combining other drugs112,113. Moreover, many studies suggested to use the Ivermectin as a potential therapeutic for COVID-19114,115. On the other hand, Rapamycin is widely used as inhibitor of protein synthesis and constrains the expression of pro-inflammatory cytokines such as IL-2, IL-6 and, IL-10116, therefore it has been also given for COVID-19 treatment116,117. Rapamycin can also interact with the spike protein of the SARS-CoV-2 and work in mTOR pathway inhibitors118,119,120,121. This result is indicating the consistency of therapeutic potentiality of the proposed hub proteins and TFs for the COVID-19 treatment. The remaining drugs were found as new potential drug candidates based on their binding affinity with the hub proteins and TFs. The Torin-2 is considered as a kinase inhibitor which worked in the PI3K-Akt/mTOR signaling pathway117,119,122, which supports our previous pathway analysis for the hub proteins. Radotinib (IY-5511) were being prescribed for the chronic myeloid leukaemia (CML)123,124 whereas Thiostrepton was used for acute kidney injury treatment125. The evidences show that the Tacrolimus has a positive inhibitory impact on the COVID-19 patients with comorbidities like kidney and liver transplantation126,127. Moreover, we validated these seven candidate drugs against the state-of-the-art alternatives already published top-ranked 11 independent and 8 protonated receptors and found their strong binding affinities (Fig. 8c), which indicates that the proposed drugs are effective against the state-of-the-arts alternatives SARS-CoV-2 infection causing independent receptor proteins also. Finally, we examined the stability of top-ranked three drugs (Torin-2, Rapamycin and Radotinib) by using 100 ns MD-based MM-PBSA simulations for two top ranked proposed (TLR2, USP53) and independent (IRF7, STAT1) receptors, and observed their stable performance according to the laws of physics128,129. Therefore, the proposed candidate drugs might play the vital role for the treatment against different variants of SARS-CoV-2 infections with comorbidities since our proposed target proteins are also associated with several comorbidities. The present study emphasises the further wet lab experimental validation for both the proposed target proteins and candidate drugs.

Conclusion

The present study aims to explore genomic biomarkers (drug targets) for SARS-CoV-2 infections highlighting their functions, pathways, regulatory factors, associated comorbidities and candidate drugs. To achieve the goal, at first 109 DEGs between SARS-CoV-2 and control sample were detected from RNA-Seq profiles. The top ranked 10 hub-DEGs/proteins (TLR2, USP53, GUCY1A2, SNRPD2, NEDD9, IGF2, CXCL2, KLF6, PAG1 and ZFP36) were identified as genomic biomarkers by the PPI network analysis of DEGs with the NetworkAnalyst tool and STRING database. Gene-set enrichment analysis (GSEA) through the GO functional and KEGG pathway was then conducted to predict the associated functions and pathways of these genomic biomarkers. The gene regulatory network (GRN) analysis identified top ranked 5 TFs proteins (FOXC1, GATA2, SRF, FOXL1 and YY1) and 6 miRNAs (miR-107, miR-16-5p, miR-103a-3p, miR-27a-3p, miR-155-5p and miR-1-3p) as the transcriptional and post-transcriptional factors, respectively. The diseases versus genomic biomarkers interaction network analysis, survival analysis of lung cancer patients with genomic biomarkers and literature review showed that our proposed genomic biomarkers are associated with various types of comorbidities including diabetes, lung and heart diseases, respiratory disease and immune systems. Then we considered 10 hub-proteins (proposed genomic biomarkers) and 5 key TFs-proteins as the drug target receptor proteins to explore effective drugs by molecular docking analysis with the SARS-CoV-2 3CL protease-guided top 90 FDA approved anti-viral drugs. Then we selected top ranked 7 candidate drugs (Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus and Daclatasvir) with respect to our proposed 15 target proteins for the treatment against SARS-CoV-2 infection. Then we investigated the resistance of our proposed candidate drugs against the state-of-the-art alternatives of recently published top ranked 11 independent receptors for SARS-CoV-2 infections and observed that our proposed drugs are also effective against those independent receptors. Finally, we investigated the stability performance of top three drugs (Torin-2, Rapamycin and Radotinib) by using 100 ns MD-based MM-PBSA simulations for two top ranked proposed receptors (TLR2, USP53) and top two independent receptors (IRF7, STAT1), and observed their stable performance. In the context of already published host transcriptome-guided candidate drugs for covid-19, so far, no researchers yet investigated the resistance of their suggested drugs against the state-of-the-art alternatives independent receptors proposed by others computationally. In this study, we considered this issue. Thus, we may sate that this study is partially unique. As covid-19 is a new coronavirus disease, there has been little research on exploring globally effective drugs. In this regard, this research on coronavirus disease might open up new possibilities to explore globally more effective drugs computationally.

Materials and methods

Data sources and descriptions

We used both original data and metadata associated with SARS-CoV-2 infections to reach the goal of this study as described below.

Collection of RNA-Seq profiles as original data (case/control)

We collected the original host RNA-Seq count dataset to explore genomic biomarkers and drug target key receptor proteins associated with SARS-CoV-2 infection. This dataset was downloaded from the NCBI Gene Expression Omnibus (GEO) data repository with the accession number GSE150316. This dataset consisted of 88 samples generated from different organs including lung, heart, jejunum, liver, kidney, bowel, fat, skin, bone marrow and placenta, where 5 samples were COVID-19 negative (control). The count of each sample was generated on 59,091 transcripts. This dataset was first analyzed by Desai et al.130 to investigate the temporal and spatial heterogeneity of host genome response to SARS-CoV-2 pulmonary infection. In our case, we considered only lung tissue samples to avoid the spatial heterogeneity problem from the dataset. Our analyzed original dataset consisted of 35 lung tissue samples infected with SARS-CoV-2 (case) and 5 control samples.

Collection of drug agents and receptor proteins as metadata

We collected SARS-CoV-2 3CL protease-guided top listed 90 drugs out of 3410 FDA approved antiviral drugs published by Beck et al.50 as the meta drug agents to explore few top ranked host transcriptome-guided drugs against SARS-CoV-2 infections by molecular docking with our proposed receptor proteins. The 3D structures of 90 FDA-approved drugs (Supplementary Table S1) were downloaded from PubChem database54. To validate our proposed host transcriptome-guided repurposed drugs by molecular docking with the top listed receptor proteins that were published in different reputed journals, we reviewed 22 different articles infections28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49 associated with SARS-CoV-2 infections and selected 11 top listed receptor proteins out of 193. Then their docking performance with our proposed drugs were investigated whether the drugs are keeping consistency with high binding affinities as in our proposed drug target hub-DEGs.

Integrated bioinformatics and system biology analyses

The integrated bioinformatics and system biology approaches were utilized in this study as described below:

Identification of differentially expressed genes (DEGs)

Identification of differentially expressed genes (DEG) is one the most important tasks in this study. There are several methods for identification of DEGs from RNA-Seq profiles. Most of them are sensitive to outlying observations. There are a few robust approaches for identification of DEGs from RNA-Seq profiles. However, non-robust approaches are slightly better than robust approaches in absence of outliers, while the robust approaches are much better than the classical approaches in presence of outliers. Therefore, we considered edgeR131 as a popular non-robust and DESeq2132 as a popular robust approaches to take their advantages in our analyses, since few RNA-Seq counts are often contaminated by outliers due to several steps involves in the data generating process. However, normalization is a compulsory step for RNA-Seq data analysis. It removes systematic technical bias from the data and makes the samples comparable. The edgeR approach utilizes TMM (trimmed mean of M values) normalization, whereas the DESeq2 utilizes the median-of-ratios method. The edgeR method was formulated based on generalized linear model (GLM) of the negative binomial family. It assumes negative binomial (NB) distribution for the read counts and uses the empirical Bayes for squeezing the tag-wise dispersions toward common dispersion, whereas DESeq2 also considered GLM of the NB family and assumes NB distribution for the read counts and uses the empirical Bayes to shrink gene-wise dispersion estimates towards fitted values. The edgeR approach utilizes the likelihood ratio test (LRT) statistic to calculate the p-values and test the null hypothesis of no differential read counts between case and control groups, whereas the DESeq2 utilizes the Wald test statistic to calculate the p-values and test the null hypothesis of no differential read counts between case and control groups. The p-values of both the edgeR and the DESeq2 approaches are then adjusted for multiple testing using the procedure of Benjamini and Hochberg133.

In this paper, we considered the gth gene (g = 1,2, …, 59,091) as a differentially expressed gene (DEG) between case and control groups if its adjusted pg-value < 0.05 along with |log2(aFCg)|> 1 by controlling the false discovery rate (FDR) at 5%, otherwise, it was considered as equally expressed gene (EEG). The gth gene was considered as upregulated or downregulated DEG if its adjusted pg-value < 0.05 along with log2(aFCg) > 1 or log2(aFCg) < − 1, respectively. Here aFC is the abbreviation of average fold change which is defined as \({\text{aFC}}_{g}={\overline{x} }_{g}/{\overline{y} }_{g}\) (the fold change of \({\overline{x} }_{g}\) with respect to \({\overline{y} }_{g}\)), where \({\overline{x} }_{g}\) and \({\overline{y} }_{g}\) are the averages of normalized counts of case and control groups with respect to gth gene, respectively. For example, a change from \({\overline{y} }_{g}=3\) to \({\overline{x} }_{g}=9\) produces aFAg = 3 which is referred to as a "threefold upregulated in average". Similarly, a change from \({\overline{y} }_{g}=9\) to \({\overline{x} }_{g}=3\) produces aFAg = 1/3 which is referred to as a "threefold downregulated in average". Let AUR and ADR were the upregulated and downregulated DEGs-sets respectively, detected by edgeR. Again, let BUR and BDR were the upregulated and down-regulated DEGs-sets respectively, detected by DESeq2. Then we defined upregulated gene-set as \(({A}_{UR}\cup {B}_{UR})\) and down-regulated as \(({A}_{DR}\cup {B}_{DR})\) by combining the results DEGs results of edgR and DESeq2. Let C was the set of contradictory upregulated and downregulated DEGs estimated by both edgeR and DESeq2. For example, if a DEG \(g\in {A}_{UR}\) but \(g\notin {B}_{UR}\) or \(g\in {A}_{DR}\) but \(g\notin {B}_{DR}\), then the DEG ‘g’ was considered as a contradictory DEG. We removed this type of contradictory DEGs from further analysis. Then the final DEGs-set for further analysis was defined as

$$DEG=\left[\left({A}_{UR}\cup {B}_{UR}\right)\cup \left({A}_{DR}\cup {B}_{DR}\right)\right]-C=[\left({A}_{UR}\cup {B}_{UR}\right)\cup \left({A}_{DR}\cup {B}_{DR}\right)]\cap {C}^{/},$$

where, upregulated DEGs-set was defined as

$${DEG}_{UR}=\left({A}_{UR}\cup {B}_{UR}\right)-C=\left({A}_{UR}\cup {B}_{UR}\right)\cap {C}^{/}$$

and the downregulated DEGs-set was defined as

$${DEG}_{DR}=\left({A}_{DR}\cup {B}_{DR}\right)-C=\left({A}_{DR}\cup {B}_{DR}\right)\cap {C}^{/}.$$

Functional and pathway enrichment analysis of Hub-DEGs

Gene ontology (GO) functional and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment/annotation/over-representation analysis134 is a widely used approach to determine the significantly annotated/enriched/over-represented functions/classes/terms (biological processes (BP), molecular functions (MF) and cellular components (CC)) and pathways by the identified DEGs/Hub-DEGs. The BP is a change or complex of changes during the granularity period of the cell that is mediated by one or more gene products for different biological objectives. The MFs are the biochemical activities of gene products. The CC is a place in a cell in which a gene product is active. KEGG pathway is a collection of experimentally validated pathway maps135,136,137 representing our knowledge of the molecular interaction, reaction and relation networks for metabolism, cellular processes, genetic information processing, organismal systems, environmental information processing, human diseases and drug development. Let Si is the annotated gene-set corresponding to ith type of biological functions or pathways given in the database and Mi is the number of genes in Si (i = 1, 2,…,r); N is the total number of annotated genes those construct the entire combine set \(S=\bigcup_{i=1}^{r}{S}_{\text{i}}={S}_{i}\bigcup {S}_{i}^{c}\) such that\(N\le \sum_{i=1}^{r}{M}_{i} ;\) where \({S}_{i}^{c}\) is the complement set of Si. Again, let n is the total number of DEGs of interest and ki is the number of DEGs belonging to the annotated gene-set Si. This problem is summarized by the following contingency table (Table 1):

Table 1 Contingency table.

The probability of observing exactly ki DEGs in Si (ith functional/pathway annotated gene-set) out of n DEGs can be modeled by the hypergeometric distribution. Hence, the probability of observing ki or more genes in Si out of n DEGs can be calculated by the cumulative probability as follows

$${p}_{i}=1-{\sum }_{j=0}^{{k}_{i}}\frac{\left(\genfrac{}{}{0pt}{}{Mi}{j}\right)\left(\genfrac{}{}{0pt}{}{N-Mi}{n-j}\right)}{\left(\genfrac{}{}{0pt}{}{N}{n}\right)}, i=\text{1,2},\dots ,r.$$

The subset of DEGs belonging to Si is said to be significantly enriched if its adjusted p-value (pi) is less than 0.05 by controlling the FDR at 5%. The g:GOSt core, a free web tool for functional analyses (https://biit.cs.ut.ee/gprofiler/gost), embedded with the g:Profiler web server138 utilizes the cumulative hypergeometric approach to calculate these p-values (pi’s) and their adjusted p-values for multiple testing using the procedure of Benjamini and Hochberg133. The g:Profiler is a regularly updated database for performing GO functional and KEGG pathway enrichment analysis. Therefore, in this paper, we performed functional and pathway enrichment analysis by using the g:GOSt tool entrenched in the g:Profiler web server to disclose the statistically significant GO terms of biological processes, molecular functions and cellular components and KEGG pathways for DEGs associated with the SARS-CoV-2 infections.

Protein–protein interaction (PPI) network analysis of DEGs

Protein–protein interactions (PPIs) are the physical magnetism of two or more protein molecules that occur due to biochemical reactions steered by hydrogen bonding, electrostatic forces and the hydrophobic effect. Generally, a protein cannot work without interaction with one or more other proteins. The PPI network contributes to the formation of larger protein complexes for performing a specific task139. It carries out many molecular functions and biological processes including protein function, cell-to-cell interactions, metabolic and developmental control, disease incidence, and therapy design. A PPI network is represented as an undirected graph, where nodes and edges indicate proteins and their interactions, respectively. A node having the largest number of significant interactions/connections/edges with other nodes is considered as the top ranked hub-protein. Therefore, the PPI network analyses of DEGs are now widely using to explore hub-DEGs/proteins. In this study, the PPI network of DEGs was constructed by using the NetworkAnalyst140 with the STRING database141 plagued in Cytoscape. A topological exploration based on dual-metric measurements degree (> 10) and betweenness were utilized to determine the highly representative DEGs/proteins those are also known as hub-DEGs/proteins.

Regulatory network analysis of hub-DEGs

A gene regulatory network (GRN) shows molecular regulators that interact with each other to control the gene expression levels of mRNA and proteins. Transcription factors (TFs) and microRNAs (miRNAs) are considered as the most important molecular regulators of genes. A transcription factor (TF) is a protein that binds to a specific DNA region (promoter/enhancer) and regulates gene expression by promoting or suppressing transcription. TFs are considered as the main players in GRN. A miRNA is a small single-stranded non-coding RNA molecule (containing about 22 nucleotides) that works in RNA silencing and post-transcriptional regulation of gene expression. There are up to 1600 TFs and 1900 miRNAs in the human genome. A TFs and hub-DEGs/proteins interaction network is considered as an undirected graph, where nodes indicate TFs or hub-DEGs and edges represents interactions between TFs and hub-DEGs, respectively. A TF-node having the largest number of significant interactions/connections/edges with hub-DEGs nodes is considered as the top ranked hub-TF regulator of hub-DEGs. To explore hub-TFs of hub-DEGs, we contracted the interaction network between TFs and hub-DEGs by using the NetworkAnalyst tool140 based on JASPAR database142. Similarly, miRNA and hub-DEGs interaction network was constructed through the NetworkAnalyst based on TarBase V8.0143 database to identify the key regulatory miRNAs for hub-DEGs. These key regulatory biomolecules were selected based on the highest topological matrices (degree of connectivity and betweenness) applied on the interaction network.

Association of hub-DEGs with comorbidities

To investigate the association of hub-DEGs with other diseases, we performed diseases versus hub-DEGs interaction network analysis by using the NetworkAnalyst tool140 based on DisGeNET144 database. We also performed survival analysis based on the expression of hub-DEGs with lung cancer patients by using Kaplan–Meier (KM) plotter145 to investigate the association of hub-DEGs with lung cancer, since SARS-CoV-2 samples were collected from lung tissue and also it affects the lung mostly. The KM plotter utilizes the log rank statistic to test the significance of association.

Drug repurposing by molecular docking study

To propose in-silico validated efficient FDA approved repurposed drugs for the treatment of SARS-CoV-2 infections, we employed molecular docking between the target receptor proteins and drug agents. We considered our proposed hub-DEGs based hub-proteins and associated TFs proteins as the drug target receptor proteins and SARS-CoV-2 transcriptome-guided top listed 90 drugs out of 3410 FDA approved antiviral drugs published by Beck et.al. 202050 as drug agents or ligands for docking analysis. The molecular docking study requires 3-Dimensional (3D) structures of both receptor proteins and candidate drugs. We downloaded the 3D structure of all targeted receptor proteins from Protein Data Bank (PDB)51 and SWISS-MODEL52, a homology modeling based database. The 3D structures of our selected 90 drug agents (Supplementary Table S1) were downloaded from PubChem database54. The 3D structure of the target proteins was visualized using Discovery Studio Visualizer 2019146 and the water molecules, co-crystal ligands which were bound to the protein were removed. Further, the protein was prepared using USCF Chimera and Autodock vina147 in PyRx open source software by adding charges and minimizing the energy of the protein and subsequently converting it to pdbqt format147,148,149. The exhaustiveness parameter was set to 8. The Protein–Ligand Interaction Profiler (PLIP) web service150 and PyMol151 were used to analyze the docked complexes for surface complexes, types and distances of non-covalent bonds. Let Aij denotes the binding affinity between ith target protein (i = 1, 2, … , m) and jth drug agent (j = 1, 2, … , n). Then target proteins are ordered according to the descending order of row sums \({\sum }_{j=1}^{n}{A}_{ij}\), j = 1,2, … , m, and drug agents are ordered according to the descending order of column sums \({\sum }_{i=1}^{m}{A}_{ij}\), j = 1,2, … , n, to select the top ranking few drug agents as the candidate drugs. The average binding affinity score less than or equal to − 8.0 was considered as the better drug selection criterion against the receptor proteins. Then we validated the proposed repurposed drugs by molecular docking with the top listed receptor proteins associated with SARS-CoV-2 infections that were obtained by the literature review. To select the top listed receptor proteins associated with SARS-CoV-2 infections, we reviewed 22 recently published articles28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49 and selected the top listed 11 receptor proteins.

Molecular dynamic (MD) simulations

MD simulations were carried out by using YASARA Dynamics software152, and the AMBER14 force field153 to study the dynamic behavior of the top-ranked protein–ligand complexes. A total of six different systems were used to run MD simulation. The systems included top three hits, TLR2_Torin-2, TLR2_Rapamycin, USP53_Radotinib complexes corresponding to our proposed receptors and another three hits, IRF7_Torin-2, STAT1_Rapamycin, STAT1_Radotinib complexes from top listed independent receptors. Before simulation, the target-drug complex's hydrogen bonding network was optimized and solvated by a TIP3P154 water model in a simulation cell. Periodic boundary conditions were maintained with a solvent density of 0.997 g L −1. Titratable amino acids in the protein complex were subjected to pKa calculation during solvation. The initial energy minimization process of each simulation system, consisting of 41,645 ± 10, 41,645 ± 10, 95,924, 105,924 ± 10, 105,924 ± 10 and 85,924 atoms for TLR2_Torin-2, TLR2_Rapamycin, USP53_Radotinib, STAT1_Rapamycin, STAT1_Radotinib and IRF7_Torin-2 complexes was performed by a simulated annealing method respectively, using the steepest gradient approach (5000 cycles). Each simulation was run with a multiple time step algorithm155, using a time-step interval of 2.50 fs under physiological conditions (298 K, pH 7.4, 0.9% NaCl)156. All bond lengths were constrained using the linear constraint solver (LINCS)157 algorithm, and SETTLE158 was used for water molecules. Long-range electrostatic interactions were described by the PME159 methods, and, finally, 100 ns MD simulation was accomplished at Berendsen thermostat160 and constant pressure. The trajectories were recorded every 250 ps for further analysis, and subsequent analysis was implemented by default script of YASARA161 macro and SciDAVis free software available at http://scidavis.sourceforge.net/. All snapshots were then subjected to YASARA software's MM-PBSA (MM-Poisson–Boltzmann surface area) binding free energy calculation using the formula below162,

$$\text{Binding free Energy}= {\text{E}}_{\text{potReceptor}}+{\text{E}}_{\text{solvReceptor}}+ {\text{E}}_{\text{potLigand}}+ {\text{E}}_{\text{solvLigand}}- {\text{E}}_{\text{potComplex}}- {\text{E}}_{\text{solvComplex}}.$$

Here, MM-PBSA binding energy was calculated using YASARA built-in macros using AMBER 14 as a force field, with larger positive energies indicating better binding163. The PBSA is one of the most appealing solvation systems used in computer-aided drug design techniques to determine binding energy of protein–drug complexes164,165.