Introduction

Sepsis is a major cause of mortality and morbidity in the intensive care unit1. According to the sepsis-3 definition, sepsis is defined as life-threatening organ dysfunction caused by a dysregulated host response to infection2. Sepsis can be caused by any type of infection, including bacteria, influenza, pneumonia, and food poisoning3. Characterized by systemic inflammatory response syndrome (SIRS) and a suspected or confirmed infection4, sepsis can induce fatal medical conditions, such as septic shock accompanied by low blood pressure and a high rate of mortality4 and acute respiratory distress syndrome (ARDS)5. However, the differences between these sepsis-related diseases are still unclear.

Transcriptome analysis is a cost-effective tool to explore gene expression in sepsis patients, and to develop diagnostic biomarkers and targeted therapies4,6. While recombinant human activated protein C (rhAPC) is the sole US Food and Drug Administration (FDA)-approved medicine for severe sepsis treatment7, contemporary systems biology methods hold the potential to unravel the underlying disease mechanisms and bridge the gap between research into clinical practice8.

Several sepsis studies involve gene-centric transcriptome analysis. For example, gene signatures for severity and endotype were identified from RNA-Seq data4. Transcriptome analysis revealed the association of sepsis survival with a robust immune response and the presence of missense variants in VPS9D19. Microarray analysis of sepsis patients with ARDS revealed inflammasome-regulated cytokines such as IL-18 are important in acute lung injury10. Upregulated genes related to low-density neutrophils in sepsis leukocytes were identified11. Most of these studies used traditional differential gene expression analysis (DGE) that predominantly focused on individual genes, leading to challenges in result reproducibility due to gene functional redundancy across studies. In contrast, module-based analysis, which centers on clusters of co-expressed genes, has proven to be more reliable than single gene analysis12. Therefore, we collected six independent datasets and used the module-based analysis to find recurrent modules.

Here, we applied weighted gene co-expression network analysis (WGCNA) on the largest RNA-Seq sepsis dataset GSE185263 to identify reference modules, to which other datasets can be projected. We found four critical modules associated with cell cycle, neutrophil activation, and natural killer cell mediated cytotoxicity. These modules effectively distinguish between patients with varying survival times. Furthermore, our analysis unveiled sepsis-related drugs, some of which are well-supported by existing literature. Our analysis offers significant insights into sepsis diagnosis, prognosis, and treatment.

Results

Co-expressed gene modules identified in sepsis

Although sepsis transcriptome has been profiled in several studies, the small sample size and microarray platform limit the analysis power. We used the currently largest sepsis RNA-Seq dataset GSE185263 to construct a gene co-expression network (Fig. 1A). Using the WGCNA method, a total of 37 co-expressed gene modules were identified (Table 1). To test if these modules are stable, the module stability analysis was performed. Results showed that the average module stability was larger than 0.96 with a standard deviation of 0.015 (Fig. 1B). Functional enrichment analysis revealed that these modules were associated with transcription factors, translation, immune response, cell cycle, mitochondrion, leukocyte activation, platelet activation, and interferon signaling (Table 1). These modules are associated with SARS-CoV-2 infection, except that M42 is associated with mouse hepatitis virus (MHV) infection, M50 and M57 are associated with SARS-COV-2 infected mouse heart, and M70 is associated with SARS-CoV-2 and seasonal coronavirus host factors. While M60 is novel, it is not enriched in any diseases. Figure 1C shows the differential expression patterns of module hub genes across disease status, cell types, and cell status. All the module genes and related information is provided in Supplementary Table 1.

Figure 1
figure 1

Gene modules identified in the sepsis RNA-Seq dataset GSE185263 are stable. (A) Cluster dendrogram showing the cluster tree and module assignment. Each module is denoted with different colors. (B) Radar plot showing the correlation of the original module connectivity and the sampled one for each module. (C) Expression of hub genes in each module in different diseases, cell types, and cell status. TS T cell subclusters, NS NK cell subclusters, MS monocyte subclusters, MK megakaryocyte subclusters, DS dendritic cell (DC) subclusters, BS B cell subclusters.

Table 1 Functional and immune cell annotation of the identified modules.

Modules were associated with immune cell populations

To check if these identified modules were associated with immune cell proportion, CIBERSORT was used to deconvolute GSE185263 into immune cell proportion. Correlation analysis between immune cell proportion and module expression indicates the links between them (top assignment with an absolute R > 0.6 and P < 0.01, Table 1). These results were confirmed in a single-cell transcriptome dataset. For example, IGHV3-21 is the hub gene of M27 and is highly expressed on the BS3 sub-cluster which is composed of B cells (Fig. 1C). Indeed M27 expression was significantly and positively correlated with plasma B cell abundance. In several modules (M5, M17, M30, M33, M49, and M56), the reverse correlation between neutrophils and, B cell, T cells was observed. This may indicate that neutrophils suppress T cell proliferation.

Modules were differentially expressed in sepsis and control

As the high dimensional sepsis transcriptome data has been reduced to tens of modules, module expression was compared between different sepsis groups. We found that 27 modules were differentially expressed between sepsis and normal groups. Among them, 17 modules were up-regulated and 10 modules were down-regulated in sepsis compared to control (Fig. 2A). Modules M14, M63, and M64 were also significantly changed in non-survived patients compared with survived patients (Fig. 2B). The expression patterns of hub genes of these modules were confirmed in another dataset (Fig. 2C). The three modules could perfectly discriminate samples of sepsis and control in both datasets GSE185263 and GSE65682 (Fig. 3). Other modules including M2, M33, M35, M57, and M63 had an AUC larger than 0.9.

Figure 2
figure 2

Significantly changed modules in sepsis RNA-Seq dataset GSE185263. (A) The sepsis transcriptome was compared with normal control. Green bars indicate upregulated modules. Red bars indicate downregulated modules. The statistical significance was set at 0.01/38. (B) non-survived patients were compared with survived patients. (C) The expression of hub genes of M14, M63, and M64 was confirmed in another sepsis dataset.

Figure 3
figure 3

Modules M14, M63, and M64 can recurrently discriminate samples of sepsis and control in both datasets GSE185263 and GSE65682.

Modules were differentially expressed in SIRS, sepsis, septic shock, and ARDS

Severe sepsis or septic shock is characterized by an excessive inflammatory response to infectious pathogens5. We analyzed the dataset GSE63042 which contains transcriptome data for SIRS, sepsis, septic shock, and sepsis death. Multivariate analysis of variance (MANOVA) was applied to dataset GSE63042. M33 was upregulated in severe sepsis, sepsis shock, and sepsis death compared to SIRS. M52 was downregulated in severe sepsis and sepsis death compared to SIRS. M57 was upregulated in sepsis death compared to severe sepsis. M59 was upregulated in sepsis death compared to SIRS, uncomplicated sepsis, and septic shock. M63 was upregulated in severe sepsis, septic shock, and sepsis death compared to SIRS (Fig. 4).

Figure 4
figure 4

Modules had different expression patterns in SIRS, sepsis, and septic shock.

ARDS is a devastating complication of severe sepsis, which results in a high mortality rate. We analyzed the dataset GSE32707 and found that M27, M49, M59, and M63 were upregulated and M30, M35, M39, M40, M42, M60, M66, and M70 were downregulated in day 0 ARDS compared to normal control.

Modules were associated with the survival of critically ill patients with sepsis

To check if the identified modules are associated with survival, we performed survival analysis in the dataset GSE65682 of critically ill patients with sepsis. Modules M17, M49, M52, and M59 were found to be associated with patient survival after adjusting for endotype class and age (Fig. 5). Hub genes of these modules were also associated with patient survival (Fig. 5). In sepsis dataset GSE185263, we found that M52 (R = − 0.34, P = 8E−11) and M60 (R = 0.35, P = 7E−12) had the highest absolute correlation value with Sequential Organ Failure Analysis (SOFA) score. M60 is enriched with non-coding genes. Survival analysis for all the modules can be explored at the sepsis gene co-expression database (https://liqs.shinyapps.io/sepsis/).

Figure 5
figure 5

Modules M17, M49, M52, and M59, and hub genes TBX21 and CEACAM6 were associated with patient survival after adjusting for endotype class and age. The analysis was performed in dataset GSE65682.

Module expression was changed in severe A (H1N1) pandemic influenza

Critically ill patients with sepsis caused directly by influenza viruses, or by influenza-induced secondary bacterial infections are increasing worldwide. Therefore, we performed differential analysis for GSE27131 to examine if these modules are associated with disease development in severe A (H1N1) pandemic influenza. ME59 was found to be strongly up-regulated on day 6 compared to day 0 (P = 0.003), but not changed on day 0 compared to the control. Thus, M59 was associated with severe influenza development. We found that the M59 hub gene CEACAM6 and its co-expressing CEACAM8 were both up-regulated in the influenza dataset (Fig. 6A,B). CEACAM6 has been reported as a protein receptor for the influenza A virus13. Interestingly, CEACAM6-high airway neutrophils and epithelial cells are a feature of severe asthma14. In our analysis, the module was associated with neutrophil activation (P = 5E−22), indicating CEACAM6 as a potential biomarker for severe influenza that requires mechanical ventilator support. The finding was confirmed in another dataset GSE32707, in which both day 0 and day 7 sepsis patients with acute respiratory distress syndrome (ARDS) had stronger M59 expression compared to control (P < 0.0001)10. Genes CEACAM6 and CEACAM8 were also confirmed to be up-regulated (Fig. 6C,D). In M59, many genes were co-expressed with CEACAM6 (Fig. 6E), such as CEACAM8 which can also separate patients with long and short survival (Fig. 6F).

Figure 6
figure 6

M59 hub gene CEACAM6 was highly co-expressed with CEACAM8, which is also prognostic for survival. (A) CEACAM6 was up-regulated in day 6 septic influenza patients. (B) CEACAM8 was up-regulated in day 6 septic influenza patients. (C) CEACAM6 was up-regulated in day 7 septic ARDS patients. (D) CEACAM8 was up-regulated in day 7 septic ARDS patients. (E) The top 100 connections in M59. (F) CEACAM8 can separate patients with long and short survival.

Modules can separate patients with severe community-acquired pneumonia

Diagnosis of severe pneumonia remains challenging because of a lack of correlation between the clinical status and etiology15. Causes of severe respiratory failure include bacterial pneumonia, influenza pneumonia, mixed bacterial and influenza A pneumonia, and non-infective SIRS. We used dataset GSE40012 to test if the modules can separate pneumonia patients with different etiologies. For each disease, we selected the top one or two modules to describe here. For example, M56 can separate bacterial pneumonia from other pneumonia (Fig. 7A). M17, M27, and M59 can separate influenza pneumonia from other pneumonia (Fig. 7B,C). M69 can separate mixed-type pneumonia from other pneumonia (Fig. 7D). M67 can separate SIRS from other pneumonia (Fig. 7E). M14 can separate health control from pneumonia (Fig. 7F).

Figure 7
figure 7

Modules can separate pneumonia patients with different etiologies. The analysis was performed in GSE40012. (A) Bacterial pneumonia. (B,C) Influenza pneumonia. (D) Mixed bacterial and influenza A pneumonia. (E) SIRS. (F) Health control.

Candidate drugs associated with sepsis

Four modules M17, M49, M52, and M59 were associated with patient survival. We searched the genes of these modules in DSigDB to identify related chemicals. We found that M17 was enriched with testosterone (P = 3E−145) and phytoestrogens (P = 2E−55) signature genes. M49 was enriched with Ibuprofen (P = 6E−4) and urea (P = 8E−4) signatures genes. M52 was enriched with Dichlorvos (P = 1E−6) and ZIRAM (P = 1E−6) signatures genes. M59 was enriched with Potassium persulfate (P = 2E−5) and adenylyl sulfate (P = 4E−4) signatures genes. GWAS Catalog analysis revealed that M49 and M59 genes were associated with monocyte percentage of white cells (P = 0.004) and vitamin B12 levels (P = 0.007) respectively. We also performed docking analysis to the hub gene of M49 AQP9 in complex with urea (Fig. 8A). The interaction may occur at sites Ala214 and Asn216 (Fig. 8B).

Figure 8
figure 8

Docking analysis for M49 hub gene AQP9 in complex with urea. (A) AQP9 in complex with urea. (B) The pose view shows the interaction sites.

A user-friendly database for sepsis gene co-expression was constructed

Although sepsis transcriptomes have been analyzed in several publications, there are still few web tools available for researchers. Thus, we for the first time constructed an easy-to-use web tool that provides informative data about the above sepsis gene co-expression network. The modules of the database include module gene list, module hub gene list, module network visualization, gene function investigation, and differential module analysis (Fig. 9). The database is available at https://liqs.shinyapps.io/sepsis/. The database will provide a useful tool for researchers to generate testable hypotheses.

Figure 9
figure 9

An easy-to-use gene co-expression database for sepsis was constructed.

Discussion

We first applied WGCNA to the currently largest RNA-Seq sepsis dataset and identified 37 modules. The clinical relevance of these modules was validated in sepsis, sepsis shock, SIRS, ARDS, and severe influenza A patients. Although gene co-expression networks have been constructed in several studies, they are limited to sepsis samples from the microarray platform1,16. For example, two studies used the same method and dataset but got results with few overlapped genes1,17. It has been reported that module-based analysis will give more consistent results18. Therefore, we comprehensively analyzed the sepsis network and provided detailed information about the modules.

We used the RNA-Seq dataset GSE185263 for reference module identification and projected transcriptome data from the other 5 datasets to them. Based on these modules, we can reduce the high dimensional transcriptome data to tens of modules, and then correlate modules with clinical data to infer important modules. Among the identified modules, two modules M34 and M69 were related to innate immunity, and two modules M30 and M35 were related to mitochondrion. These modules were annotated with immune cells. With this information, the modules, immune cell types, and clinical parameters can be linked. For example, M30 was annotated as Th1 CD4+ T cell and was upregulated in ARDS. Increased mitochondrial biogenesis can prolong CD4+ T cell activation, indicating the critical role of M30 in ARDS19. M33 was enriched with activated myeloid leukocytes and was annotated as neutrophils and CD8+ T cells (–). Neutrophils can suppress CD8+ T cells20. M33 was upregulated in sepsis, whose expression was negatively correlated with CD8+ T cell proportion, indicating the role of M33 in the dysfunctional/decreased CD8+ T cells. Interestingly, PD-1 expression was also upregulated in sepsis (data not shown), indicating the unfavorable immune status with decreased and exhausted CD8+ T cells in sepsis21. M43 was enriched with the B cell receptor signaling pathway and was annotated as B cell and monocyte (–). M43 was downregulated in sepsis, indicating the increased monocyte in sepsis. Inflammatory monocytes can hinder antiviral B cell responses22. M63, the most significantly upregulated module in sepsis, was annotated as endothelial and CD4+ T cell (–). Endothelial are activated in sepsis23. It has been reported that activated murine lung endothelial can induce Tregs to suppress CD4+ T cell proliferation24. M69, upregulated in sepsis, was annotated as pDCs and involved in Interferon alpha/beta signaling. pDCs are a specialized cell type producing natural interferon (IFN)25. Increased circulating pDCs during sepsis but decreased pDCs in nonsurvivors compared to survivors was observed, indicating it may function as an early predictive biomarker for the outcome of sepsis26.

Hub genes of the modules are important in sepsis with literature support. For example, the Hub gene of M43 PAX5 is the guardian of B cell identity and function27. TBX21 expression in M52 is inversely associated with SOFA and mortality28. SLC51A in M63 is identified as a reduced mortality signature4. AQP9 in M49 can regulate neutrophil cell migration and impact sepsis survival29. Mutation of RBCK1 in M69 is associated with recurrent episodes of sepsis in children and causes death30. E3 ubiquitin ligase gene TRIM31 is involved in the development of sepsis31. TRIM10 in M23 is of the same gene family member, but with unexplored roles in sepsis.

Many biomarkers based on a single gene cannot be reproduced across studies, even studies with the same datasets analyzed. We used a more stable module to discriminate samples. The modules could separate sepsis and control samples reproducibly in two independent datasets GSE185263 and GSE65682. In the pneumonia dataset GSE40012, M17 had the highest ability to separate influenza pneumonia from other pneumonia. M17 is involved in the cell cycle. A previous study showed that cell-cycle regulation genes were signatures for influenza, which is different from that caused by bacterial pathogens or SIRS15. This information suggests that our results are reliable. However, due to the small sample size, the result of modules separating SIRS, sepsis, sepsis shock, and ARDS is not satisfactory. Current literature also does not provide a satisfactory molecule for discrimination of these diseases, which may be attributed to the inherent heterogeneity of the complex disease6. We used the module-centric method to overcome the problem as modules are more stable than genes. Besides, a cohort with a larger sample size should be used when aiming to address such issues.

In the candidate drugs analysis, we found that many of the results have related literature support. For example, M17 was associated with testosterone and phytoestrogens. Testosterone is the major sex hormone in males, which plays a key role in immune depression32. It has been reported that the hospital mortality rate was higher in male than in female patients32. In mice, the depletion of testosterone can improve survival after polymicrobial sepsis32. Interestingly, in dataset GSE65682, high M17 was associated with short survival of sepsis patients. Therefore, testosterone can be a useful marker and target for disease prognosis and treatment. Isoflavones are a common type of phytoestrogens. Soy isoflavone daidzein pretreatment can improve survival in a mouse model of sepsis33. Hub gene of M17 NUSAP1 can be downregulated by 17b-estradiol and soymilk which contains a large quantity of daidzein34, indicating the potential mechanism of action. High M49 expression patients have longer survival. M49 was associated with ibuprofen and urea. Ibuprofen has been applied in treating patients with severe sepsis35, however, its mechanism is still unknown36. Daily ibuprofen increased the kidney urine output by increasing some phosphorylated forms of AQP237. The hub gene of M49 is AQP9 which is involved in urea elimination38. Thus, urea may play a negative role in sepsis and can be a potential biomarker. Interestingly, a recent report concluded that blood urea nitrogen (BUN) level is independently and positively linked with the presence and severity of sepsis in neonatal39. Thus, urea may play a key role in sepsis development, and may serve as a treatment target. M52 was associated with dichlorvos and ZIRAM (fungicide). Dichlorvos is a highly hazardous pesticide that can cause sepsis40. M59 was associated with potassium persulfate and adenylyl sulfate. Critically ill patients with abnormal K+ levels had a higher incidence of ICU mortality than patients with normal K+ levels41. Besides, M49 and M59 were also associated with monocyte percentage of white cells and vitamin B12 levels. Interestingly, monocyte counts were independently associated with mortality in patients with sepsis42. Serum vitamin B12 levels have been positively correlated with increased mortality in critically ill patients43. Thus, our analysis generated novel hypotheses that merit future validation. These drugs may provide new treatment options for sepsis.

Conclusion

In conclusion, our study identified prognostic modules, hub genes, and drugs for sepsis. These modules had good discrimination ability in multiple datasets. A sepsis gene co-expression database was first developed. Our analysis provides important information for the sepsis study.

Material and methods

Sepsis datasets

Six sepsis datasets were retrieved from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database44. Two of the datasets were from RNA-Seq and four were from gene expression microarray. Dataset GSE185263 currently is the largest RNA-Seq sepsis dataset, with 348 sepsis samples and 44 healthy controls4. Detailed information about these datasets is shown in Table 2. Expression matrices for these datasets were downloaded directly from the Series Matrix File(s) link provided in the GEO database. The GSE185263 expression matrix was filtered with a mean count > 10 and a standard deviation > 0.1 before downstream analysis. Finally, 15,688 genes were retained for gene co-expression network analysis. For other datasets, all the gene IDs were converted to Ensembl as the ID types across datasets were different.

Table 2 Information for the datasets analyzed in the study.

Gene co-expression module identification

Weighted gene co-expression network analysis (WGCNA) was performed to identify gene co-expression modules. According to the WGCNA R package manual45, parameters were set as following: softPower = 16, corOptions =  “use = '‘p’”, networkType =  “signed”, minModuleSize = 30, deepSplit = 4, MEDissThres = 0.2. It has been reported that rank-based networks require a lower power parameter to achieve a scale-free network46. Another advantage of this conversion is that novel transcriptomes can be projected to the reference modules to calculate module-level expression values12. Therefore, we converted the expression values into ranks before network analysis. Pearson correlation coefficient was calculated for each gene in the gene rank matrix. An adjacency matrix was constructed by raising the correlation matrix to a power of 16. This step generated a scale-free network, which is a common property of the biology network. The weighted network was transformed into a network of topological overlap (TO)—a metric that measures not only the correlation of two genes but also the extent of their shared correlations across the weighted network. Genes were hierarchically clustered based on their TO. Finally, co-expression gene modules were identified by the Dynamic Tree Cut algorithm45. As genes in a module are highly correlated, module genes can be summarized as a module eigengene (ME) by singular value decomposition47. WGCNA provides information about gene module assignment, gene connectivity, and module expression information. Connectivity is the sum of correlations of a gene with all other genes in the module or network. Hub gene in a co-expression module tends to have the highest connectivity. Module stability was tested for each module by half sampling 1000 times and was represented by a correlation of intra-module connectivity between the original one and the sampled one in the form of mean ± standard deviation48. The module-level expression for other datasets was calculated by the moduleEigengenes function.

Module annotation

Functional enrichment analysis of modules was performed in the gProfileR package49. For simplicity, only representative terms with significance P < 0.01 were recorded. CIBERSORT tool was used to estimate the immune cell proportions in the blood from transcriptome data50,51. ME matrix was correlated with the immune cell proportion matrix by Spearman correlation. For each module, only the top correlated immune cell type with an absolute correlation R > 0.6 was kept. Drug Signatures Database (DSigDB) was used to convert module gene sets into drugs52. Enrichr was used to check the overlap of the module genes with known disease gene sets53. GWAS Catalog was used to test if the known SNPs in a module gene set were associated with phenotypes54. The significance was set at 0.05.

Statistical analysis

The Student’s t-test was performed when comparing two groups of expression values. The significance value was adjusted as 0.05/n, where n indicates the times comparison performed according to the Bonferroni correction. Analysis of variance (ANOVA) was performed when comparing expression data from more than two groups for a single dependent variable. Multivariate Analysis of Variance (MANOVA) was performed when comparing expression data from more than two groups for multiple dependent variables. An adjusted P value smaller than 0.05 was considered statistically significant. The Student’s t-test ANOVA, and MANOVA were performed in R package base, car, and stats55.