Introduction

Sepsis manifests as signs of infection in conjunction with acute organ dysfunction [1]. The high mortality rate due to severe sepsis remains a serious problem despite the increasing understanding of the pathogenesis and the continuous advances in modern treatment techniques, such as appropriate antibiotics, aggressive resuscitation and organ support [2, 3]. Therefore, there is an urgent need to search for an effective treatment for sepsis patients to improve therapeutic efficacy and prognosis.

Pyroptosis is a form of programmed cell death that results in loss of plasma membrane integrity and is induced by the activation of inflammasome sensors [4]. It can be triggered by microbial infection, and an appropriate level of pyroptosis protects multicellular host organisms against bacterial and other microbial infections [5]. However, excessive pyroptosis can lead to massive inflammatory reactions, such as septic shock and multiorgan failure [6]. Although a correlation between pyroptosis and sepsis has been reported, the specific regulatory mechanism linking the two has not been elucidated.

In this study, we used multiple bioinformatics methods to explore genes associated with sepsis and investigated the genetic connection between pyroptosis and sepsis. The pyroptosis-related genes (PRGs) most closely related to sepsis were then identified with the aid of machine learning algorithms. These PRGs could be employed as biomarkers for disease diagnosis and therapy monitoring, as well as a reference for early therapeutic targets in sepsis. Long noncoding RNAs (lncRNAs), a type of non-protein-coding transcript, are involved in messenger RNA (mRNA) splicing, maturation and stabilization [6, 7]. lncRNAs have been shown to play a non-negligible regulatory role in the pathophysiological mechanisms and organ dysfunction of sepsis [8]. We therefore constructed a competing endogenous RNA (ceRNA) network of lncRNAs around the PRGs [9].

Methods

Data retrieval and processing

We searched the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), a public repository containing a large amount of tumour and non-tumour data, using the search terms "(((Expression profiling by array [Filter]) AND Homo sapiens [Organism]) AND blood [Sample Source]) AND sepsis". Two suitable datasets (GSE134347 and GSE32707) were obtained and normalized with the “sva” R package. Fifty-nine and sixty-nine samples that were not relevant to this study were excluded from the two datasets, respectively. Thirty-three PRGs were collected from prior reviews (Additional file 4: Table S4). Detailed information on the GEO datasets is listed in Table 1. The flow diagram of the study is shown in Fig. 1.

Table 1 Details of the datasets
Fig. 1 Workflow chart of data preparation, processing, analysis, and validation

Identification of DEGs

Identifying differentially expressed genes (DEGs) helps to distinguish different physiological states and to understand the differences between them at the genetic level. The expression matrix of the healthy and sepsis groups was collated and subjected to differential expression analysis with the “limma” R package. Genes with |log2 Fold Change| > 1 and P value < 0.05 were considered DEGs. These genes were visualized as a volcano plot and a heatmap with the "ggplot2" and "pheatmap" R packages.
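The DEG thresholds described above can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study; it only shows how the two cut-offs would be applied to a table of per-gene statistics. The `GeneStat` type, the `select_degs` helper and the toy values are hypothetical.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    symbol: str
    log2_fc: float  # log2 fold change, sepsis versus healthy
    p_value: float


def select_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene symbols passing both cut-offs."""
    up, down = [], []
    for g in stats:
        if abs(g.log2_fc) > lfc_cutoff and g.p_value < p_cutoff:
            (up if g.log2_fc > 0 else down).append(g.symbol)
    return up, down


# Hypothetical toy statistics, not values from the study
toy_stats = [
    GeneStat("GENE_A", 2.3, 0.001),    # passes both cut-offs, upregulated
    GeneStat("GENE_B", -1.4, 0.020),   # passes both cut-offs, downregulated
    GeneStat("GENE_C", 0.6, 0.0001),   # fold change too small
    GeneStat("GENE_D", 1.8, 0.200),    # P value too large
]
up_genes, down_genes = select_degs(toy_stats)
print(up_genes, down_genes)
```

A gene must satisfy both conditions, so a large fold change with a non-significant P value is excluded.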

Weighted gene coexpression network analysis (WGCNA)

The “WGCNA” package was employed to identify coexpression modules associated with sepsis [10]. First, we chose the optimal soft threshold to construct the adjacency matrix and converted it into a topological overlap matrix (TOM). Subsequently, we constructed modules by hierarchical clustering and assigned a colour to each module, with different colours denoting different modules. The genes in these modules were considered sepsis-related module genes. The genes located in the module most relevant to sepsis were used for subsequent correlation analysis.
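The two matrix steps named above can be sketched in plain Python. This is an illustration, not the WGCNA package code: it assumes an unsigned network, in which the adjacency is the absolute correlation raised to the soft-threshold power, and it uses the standard unsigned topological overlap formula. The function names and the toy correlation matrix are hypothetical; β = 6 is the power reported in the Results.

```python
def soft_threshold_adjacency(cor, beta):
    """Unsigned adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom


# Hypothetical 3-gene correlation matrix
toy_cor = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.7],
    [0.8, 0.7, 1.0],
]
toy_adj = soft_threshold_adjacency(toy_cor, beta=6)
toy_tom = topological_overlap(toy_adj)
for row in toy_tom:
    print([round(v, 3) for v in row])
```

Raising correlations to a power suppresses weak correlations far more than strong ones, which is what pushes the network toward a scale-free topology; hierarchical clustering is then typically run on the dissimilarity 1 − TOM.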

GO and KEGG pathway enrichment analysis of the sepsis-related genes

Genes belonging to both the sepsis-related module genes and the DEGs were regarded as sepsis-related genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses measure how genes are distributed across a phenotype-related gene list and thereby evaluate their contribution to the phenotype. GO and KEGG enrichment analyses of these genes were therefore performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID: https://david.ncifcrf.gov) [11]. The results were visualized using the “ggplot2” package.
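The gene-set intersection and the idea behind over-representation testing can be sketched as follows. This is a simplified illustration, not DAVID's procedure: DAVID uses a modified Fisher exact test (the EASE score), whereas the sketch uses a plain hypergeometric tail probability followed by a Benjamini-Hochberg adjustment. All gene names, counts and function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_background,
    of which n_in_term are annotated to the term."""
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, x) * comb(n_background - n_in_term, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene sets: sepsis-related genes are the overlap of the two lists
module_genes = {"G1", "G2", "G3", "G4"}
degs = {"G2", "G3", "G5"}
sepsis_related = module_genes & degs

# Hypothetical counts: 20 background genes, 5 in the term, 5 selected, 5 overlapping
p_term = hypergeom_enrichment_p(20, 5, 5, 5)
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
print(sorted(sepsis_related), p_term, fdr)
```

Terms whose adjusted value falls below 0.05 would be reported as enriched under this scheme.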

Analysis of the protein‒protein interaction network and hub genes

We used the Search Tool for the Retrieval of Interacting Genes (STRING) database (https://string-db.org/) to perform protein‒protein interaction (PPI) analysis and explore the interconnections between proteins. The raw PPI network was then downloaded and built in Cytoscape, a widely used visualization tool. Hub genes were screened from the sepsis-associated genes as the genes shared by the results of multiple algorithms in CytoHubba, a Cytoscape plugin. The degree of correlation among hub genes was visualized with the "corrplot" package. To explore the main mechanisms of hub genes in the pathogenesis of sepsis and their associated pathways, we performed functional enrichment analysis using the R package “clusterProfiler”, and FDR < 0.05 was considered significant.
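The hub-gene screening logic (take the top-ranked genes under each algorithm, then keep the genes common to all rankings) can be sketched as below. This is not CytoHubba code: only the degree score is computed here, the second ranking is a hypothetical stand-in for another CytoHubba algorithm, and the helper names and toy network are assumptions made for illustration.

```python
def degree_scores(edges):
    """Number of interaction partners per node in an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def top_n(scores, n):
    """The n highest-scoring genes; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:n]}


def hub_genes(rankings, n=30):
    """Genes present in the top n of every ranking."""
    return set.intersection(*(top_n(scores, n) for scores in rankings.values()))


# Hypothetical toy PPI network
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
toy_rankings = {
    "Degree": degree_scores(toy_edges),
    # Hypothetical scores standing in for a second algorithm
    "OtherScore": {"P1": 0.9, "P2": 0.1, "P3": 0.8, "P4": 0.2, "P5": 0.3},
}
toy_hubs = hub_genes(toy_rankings, n=2)
print(toy_hubs)
```

Requiring agreement across rankings is conservative: a gene that scores highly under only one centrality measure is dropped.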

Unsupervised consensus clustering

We applied an unsupervised clustering approach to subtype the sepsis samples based on the 33 PRGs provided by prior reviews, using the ConsensusClusterPlus R package. The number of clusters k was varied from 2 to 9. The cumulative distribution function (CDF) and the area under the CDF curve were used to determine the optimal number of clusters. Subsequently, we verified the clustering results with principal component analysis (PCA).
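The two quantities used to choose k can be sketched in Python. This is a simplified illustration rather than the ConsensusClusterPlus implementation: the cluster assignments of each resampling run are taken as given, the consensus value of a pair is the fraction of runs containing both samples in which they share a cluster, and the area under the empirical CDF is approximated with a midpoint rule. The function names and toy runs are hypothetical.

```python
def consensus_matrix(items, runs):
    """runs: list of dicts mapping each sampled item to its cluster label."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for run in runs:
        for a in run:
            for b in run:
                i, j = index[a], index[b]
                sampled[i][j] += 1
                if run[a] == run[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def cdf_area(matrix, steps=100):
    """Approximate area under the empirical CDF of the off-diagonal consensus values."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    if not values:
        return 0.0
    area = 0.0
    for s in range(steps):
        c = (s + 0.5) / steps  # midpoint of the current interval on [0, 1]
        area += sum(v <= c for v in values) / len(values) / steps
    return area


# Hypothetical toy runs for k = 2; the second run omits sample s4
toy_items = ["s1", "s2", "s3", "s4"]
toy_runs = [
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
    {"s1": 1, "s2": 1, "s3": 0},
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
]
toy_consensus = consensus_matrix(toy_items, toy_runs)
toy_area = cdf_area(toy_consensus)
print(toy_consensus, toy_area)
```

In a clean clustering the consensus values sit near 0 or 1, so the CDF is flat in between; comparing the area across k values shows where adding another cluster stops improving stability.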

Identification and validation of pyroptosis-related genes

The R package “caret” was used to build the least absolute shrinkage and selection operator (LASSO) model, an established machine learning algorithm, to screen the key genes closely related to sepsis from the hub genes. The external dataset GSE32707 was used to validate the accuracy of the model with receiver operating characteristic (ROC) curves, and an area under the curve (AUC) greater than 0.65 was taken to indicate acceptable accuracy. By intersecting the key genes obtained above with the 33 pyroptosis genes, we obtained the pyroptosis genes most strongly associated with sepsis. Finally, the NetworkAnalyst online tool (https://www.networkanalyst.ca/) was used to construct a lncRNA network to gain more insight into the role played by pyroptosis genes in sepsis.
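The ROC-based validation step can be illustrated with a short Python sketch. It does not reproduce the caret/LASSO model; it only shows how an AUC would be computed from predicted scores and true labels and compared with the 0.65 cut-off stated above. The AUC is calculated through its rank interpretation: the probability that a randomly chosen sepsis sample receives a higher score than a randomly chosen control, with ties counted as one half. The function name and toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC from scores and binary labels (1 = sepsis, 0 = control)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores on an external validation set
toy_scores = [0.90, 0.80, 0.40, 0.35, 0.10]
toy_labels = [1, 0, 1, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(round(toy_auc, 3), toy_auc > 0.65)
```

An AUC of 0.5 corresponds to random ranking, so a cut-off of 0.65 is a modest bar for a diagnostic marker.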

Results

Data preprocessing and DEG screening between healthy and sepsis samples

The results of the normalization of the GSE134347 expression data for 83 healthy and 156 sepsis samples are shown in Fig. 2A and B. The PCA results (Fig. 2C) and the heatmap of the standardized expression data (Fig. 2D) showed that the healthy samples could be clearly distinguished from the sepsis samples, which facilitated further analysis. After deleting duplicate gene symbols, we screened 575 DEGs based on |log2 Fold Change| > 1 and P value < 0.05 (Fig. 2E). The heatmap showed 30 upregulated and 30 downregulated genes (Fig. 2F). The details of the DEGs are provided in Additional file 1: Table S1.

Fig. 2 Data preprocessing and DEG screening. A, B Before and after data normalization. C PCA: The farther the two samples are from each other, the greater the difference is between the two samples in gene expression patterns. D Heatmap: Gene expression differed between the samples of the two groups. E The volcano plot of DEGs: The red points represent upregulated genes, and blue points represent downregulated genes. F The heatmap of DEGs: The upregulated genes are shown in red, and downregulated genes are shown in blue

Identification of core modules by WGCNA

We used the WGCNA algorithm to identify the gene modules most closely related to sepsis. A soft-threshold power of β = 6, with a scale-free fit index of R² = 0.85, was selected to construct the scale-free network (Fig. 3A). In total, we identified 27 colour modules with different correlations with sepsis. The brown module, which included 2,955 genes, exhibited the strongest relationship with sepsis (r = 0.83, P = 4e−62) (Fig. 3B). The relationship between modules and disease status was expressed as the module significance (MS), and gene significance (GS) was defined as the correlation between a gene and the clinical phenotype. Using cut-offs of GS = 0.7 and module membership (MM) = 0.7, 382 genes most strongly associated with sepsis were screened from this module (Additional file 1: Table S1) (Fig. 3C).
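The GS and MM screening can be sketched in Python under stated assumptions: GS is taken as the Pearson correlation between a gene's expression and a binary sepsis indicator, MM as the Pearson correlation between the gene and the module eigengene, and a gene is kept when the absolute value of both exceeds 0.7. Whether the study applied absolute values is an assumption, and the eigengene (normally the first principal component of the module) is simply supplied as a hypothetical vector.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_module_genes(expression, trait, eigengene, cutoff=0.7):
    """Keep genes whose |GS| and |MM| both exceed the cut-off."""
    kept = []
    for gene, values in expression.items():
        gs = abs(pearson(values, trait))      # gene significance
        mm = abs(pearson(values, eigengene))  # module membership
        if gs > cutoff and mm > cutoff:
            kept.append(gene)
    return kept


# Hypothetical toy data: three healthy samples followed by three sepsis samples
toy_trait = [0, 0, 0, 1, 1, 1]
toy_eigengene = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
toy_expression = {
    "G1": [1.0, 1.2, 1.1, 3.0, 3.2, 3.1],  # tracks both trait and eigengene
    "G2": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # unrelated to either
}
toy_kept = select_module_genes(toy_expression, toy_trait, toy_eigengene)
print(toy_kept)
```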

Fig. 3 Identification of core modules by weighted gene coexpression network analysis (WGCNA). A Left: Analysis of the scale-free index for various soft-threshold powers (β). Right: Analysis of the mean connectivity for various soft-threshold powers. B The correlation of genes with sample modules is demonstrated by a heatmap. C The relevance of members in the brown module and sepsis

Genes and pathway enrichment analysis

We identified 170 sepsis-related genes (Additional file 1: Table S1) as the overlap between the sepsis-related module genes and the DEGs (Fig. 4A). The GO terms for the 170 genes fall into three categories: biological process (BP), cellular component (CC), and molecular function (MF) (Fig. 4B). The GO results were mainly associated with inflammation, cornification, and granule membrane, with terms such as cellular response to lipopolysaccharide, tertiary granule membrane, extracellular space, and NAD+ nucleotidase, cyclic ADP-ribose generating (Additional file 2: Table S2). In addition, we carried out a KEGG pathway enrichment analysis on the 170 genes (Fig. 4C). The only pathway that met FDR < 0.05 was leishmaniasis. Detailed results of the KEGG analysis are shown in Additional file 2: Table S2. These results support the present study and provide a basis for further research (Additional files 3, 4).

Fig. 4 GO and KEGG analysis of 170 sepsis-related genes. A Venn diagram showing 170 sepsis-related genes obtained by DEGs and WGCNA. B All terms of GO categories of biological process (red), cellular component (blue) and molecular function (green). C KEGG pathway analyses of 170 genes

Identification and analysis of hub genes in sepsis by the PPI network

We obtained a PPI network with an interaction score of 0.400 from the STRING database, comprising 163 nodes and 226 edges (Additional files 5, 6, 7, 8, 9, 10). We applied six algorithms (Degree, EPC, MCC, DMNC, Closeness, Betweenness) to mine 13 hub genes from the PPI network (Additional file 3: Table S3), defined as the intersection of the top 30 genes of each algorithm (Fig. 5C). Figure 5A displays the network map of the top 30 genes of the Degree algorithm. The most significant module of the PPI network, identified with the MCODE plugin, is shown in Fig. 5B. Expression analysis revealed that all 13 genes were expressed at higher levels in sepsis samples than in healthy samples (Fig. 5D). We calculated the correlations among the hub genes, and all of them were significantly positively correlated (Fig. 5E). The correlation network diagram likewise showed the close connections among them (Fig. 5F).

Fig. 5 Identification and analysis of hub genes by the PPI network. A The top 30 genes of the Degree algorithm of the PPI network. B The most significant module of the PPI network. C The hub genes were identified by six algorithms (Degree, EPC, MCC, DMNC, Closeness, and Betweenness). D Violin plot of hub gene expression. E Correlation heatmap of hub genes. F Correlation network map of hub genes

Enrichment analysis of hub genes

To further investigate the connection between sepsis development and the hub genes, we performed GO and KEGG enrichment analyses. GO enrichment analysis indicated that the hub genes were concentrated in defence response, inflammation regulation and multiple receptor activation (Fig. 6A–C). According to the KEGG analysis, the hub genes were involved in various signalling pathways, including the prolactin signalling pathway, leishmaniasis, the IL-17 signalling pathway, and growth hormone synthesis, secretion and action (Fig. 6D–G) [12,13,14]. Detailed results of the enrichment analysis are shown in Additional file 2: Table S2. These results support a strong association between the hub genes and sepsis, as well as the variation of hub genes across immune and inflammatory conditions.

Fig. 6 GO and KEGG analysis of hub genes. A–C GO enrichment analysis of hub genes (A: BP, B: CC, C: MF). The size of the node represents the gene count. D KEGG enrichment analysis of hub genes; the colour of the bar represents the P value. E–G Prolactin signalling pathway, leishmaniasis, and IL-17 signalling pathway. Red indicates high expression in the pathway, and green indicates low expression in the pathway

Correlation of sepsis and pyroptosis based on subtype clustering

Subtype analysis of the sepsis samples was performed based on the 33 PRGs provided by prior reviews. According to Fig. 7B and C, either k = 2 or k = 3 (k: number of clusters) would be acceptable; however, when the samples were divided into 3 groups, some could not be well clustered, so we separated the data into 2 groups. The consensus matrix for k = 2 (Fig. 7A) shows a well-defined two-block structure. As shown in Fig. 7D and E, the 33 PRGs could distinguish Cluster 1 from Cluster 2 from two different perspectives, and we concluded that grouping the sepsis samples by PRG expression with k = 2 was appropriate. These results suggest a possible correlation between PRGs and sepsis.

Fig. 7 Identification of consensus clusters by pyroptosis-related genes. A When k = 2, there is a correlation between groups. B Relative change in the area under the cumulative distribution function (CDF) curve for k values from 2 to 9. C Consensus clustering CDF when the k value ranges from 2 to 9. D PCA of pyroptosis-related genes in the sepsis samples (Cluster 1 is marked in blue, and Cluster 2 is marked in red). E PCA of pyroptosis-related genes in the sepsis samples (Cluster 1 is marked in orange, and Cluster 2 is marked in purple)

Analysis and screening of PRGs

We obtained 8 key genes from the 13 hub genes by applying the LASSO machine learning algorithm (Fig. 8A1, A2). We used GSE32707 as an external dataset to evaluate the performance of the model with ROC curves (Fig. 8B). The AUC value of the LASSO model was 0.74, and we considered it the optimal sepsis prediction model. Of the 8 key genes related to sepsis, only NLRC4 was among the 33 PRGs provided by prior reviews (Fig. 8C). Finally, ROC curves were plotted based on the external validation dataset (GSE32707) to verify the potential value of NLRC4 as an early diagnostic marker or therapeutic target for sepsis patients. The AUC value of NLRC4 was 0.67, which exceeded the 0.65 threshold, and NLRC4 was therefore identified as a sepsis-related key gene (Fig. 8D). To explore the upstream regulators of the PRG associated with sepsis, we used the NetworkAnalyst online tool to predict the miRNAs targeting NLRC4. The StarBase database (https://starbase.sysu.edu.cn/) was then employed to predict lncRNAs based on hsa-miR-335-5p and hsa-miR-146a-5p and to construct a ceRNA network with 1 mRNA (NLRC4), 2 miRNAs (hsa-miR-335-5p, hsa-miR-146a-5p) and 6 lncRNAs (MIR29B2CHG, TMEM161B-AS1, KCNQ1OT1, NEAT1, AC016876.2, XIST) (Fig. 8E).

Fig. 8 Analysis and screening of PRGs associated with sepsis. A Eight sepsis-related key genes obtained using the LASSO algorithm. B Application of an external dataset to validate the predictive model. C The PRG mostly associated with sepsis was identified by a predictive model, the GeneCards database and prior reviews. D Applying an external dataset to validate the PRG mostly associated with sepsis. E Construction of the ceRNA network around the PRG mostly associated with sepsis

Discussion

Sepsis is currently one of the major global health burdens and the leading cause of death for patients in intensive care units (ICUs) [15]. Therefore, there is an urgent need to find a target that can be used for early diagnosis or effective treatment to improve diagnostic efficiency, patient prognosis and quality of life. In this study, we first screened 170 genes associated with sepsis by WGCNA and differential expression analysis, from which 13 genes closely connected to sepsis were identified. The results of functional enrichment analysis suggested that these genes were mainly involved in the regulation of the inflammatory response and the positive regulation of bacterial and fungal defence responses, all of which indicated an association with the pathogenesis and course of sepsis. These findings provide a theoretical basis for further studies of the 13 genes and support the validity of the results.

Pyroptosis is a mode of programmed cell death that is distinct from apoptosis and is involved in the innate immune response, the activation of immune cell phagocytosis and the clearance of pathogens [16, 17]. During sepsis pathogenesis, an inappropriate or excessive inflammatory response may cause secondary infection or even organ failure [18]. Correspondingly, excessive pyroptosis can also lead to an uncontrollable inflammatory response, resulting in a poor prognosis [19, 20]. A growing number of studies have attempted to elucidate the relationship between the pathogenesis of sepsis and pyroptosis. It has been shown that caspase-1 activated by LPS can act on the pannexin-1 and P2X7 signalling pathways to induce pyroptosis and severe inflammatory responses, and this could be a potential target for the treatment of gram-negative bacterial sepsis [21]. Additional studies have demonstrated that downregulation of miR-21 suppresses caspase-1 activation and GSDMD cleavage, acting through protein A20 to regulate the nuclear factor kappa B (NF-κB) pathway, which indicates that miR-21 is an essential positive regulator of pyroptosis and septic shock [22, 23]. Therefore, further exploration of the role played by pyroptosis in the pathogenesis of sepsis may provide novel potential therapeutic targets for sepsis. Machine learning, a well-established technology in the biomedical field, plays an important role in improving the efficiency of clinical diagnosis and in selecting treatment options. We applied machine learning algorithms combined with relevant reviews and databases to screen for PRGs associated with sepsis and ultimately identified NLRC4 as a potentially effective therapeutic target for sepsis.

The NOD-like receptor (NLR) family CARD domain-containing protein 4 (NLRC4) was initially described as a pro-apoptotic protein and was shown to detect cytosolic flagellin [24,25,26]. NLRC4, a pivotal component of the inflammasome, is involved in endogenous danger signalling responses to multiple microbial stimuli and in macrophage pyroptosis [27]. Recruitment of the NLRC4 inflammasome may have a substantial effect on gram-negative bacterial infections, especially those associated with Salmonella typhimurium [28]. It has been reported that overexpression of NLRC4 increases macrophage inflammasome activity, leading to infantile enterocolitis and recurrent macrophage activation syndrome [29, 30]. In addition, another study found that decreased NLRC4 reduced the inflammatory response; during gram-positive pneumonia, NLRC4 knockdown mice exhibited reduced inflammation and controlled bacteria more effectively than wild-type infected mice [30, 31]. Pyroptosis is a proinflammatory form of regulated cell death dependent on caspase-1 activation [5]. Upon recognition of danger- or pathogen-associated molecular patterns, the inflammasome initiation sensor (NLRC4) activates caspase-1, which is considered the canonical mode of inflammasome activation [32]. In this study, NLRC4 was highly expressed in all patients with sepsis; therefore, it is reasonable to believe that NLRC4 may cause pyroptosis by activating caspase-1 and promoting the inflammatory response, which consequently contributes to the development of sepsis. As a result, NLRC4 may be considered a potential therapeutic target of sepsis for further research. Some lncRNAs have been demonstrated to act as important regulators in the pathogenesis of sepsis [33]. For example, significant differences in the expression of lncRNA ENST00000504301.1 and ENST00000452391.1 have been reported between sepsis survivors and nonsurvivors [34, 35]. To explore the impact of the pyroptosis gene NLRC4 on sepsis in more depth, we predicted its upstream miRNAs and lncRNAs and constructed a ceRNA network of 6 lncRNAs (MIR29B2CHG, TMEM161B-AS1, KCNQ1OT1, NEAT1, AC016876.2, XIST) and 2 miRNAs (hsa-miR-335-5p, hsa-miR-146a-5p) around NLRC4.

This study has limitations. The predicted lncRNAs and miRNAs cover a wide range and require more experimental data and literature for corroboration. Additionally, the PRGs identified in this study as potential therapeutic targets for sepsis require further literature support and basic experimental validation.

Conclusions

In this study, we identified NLRC4 as a PRG associated with sepsis based on machine learning and constructed a ceRNA network of lncRNAs and miRNAs around NLRC4; these molecules may serve as early molecular biomarkers or therapeutic targets for sepsis. These molecular markers deserve further study and require additional datasets and experimental validation at the cellular or specimen level.