Introduction

Acute respiratory distress syndrome (ARDS) is a type of respiratory failure characterized by rapid and widespread inflammation of the lungs, accompanied by hypoxemia, reduced lung compliance, and bilateral alveolar opacities on chest imaging [1]. Globally, more than 3 million patients develop ARDS each year, and ARDS accounts for 10% of admissions to intensive care units (ICUs) [2]. The overall prognosis of ARDS is poor, with a mortality of approximately 40% [3]. Furthermore, survivors often suffer adverse sequelae, such as exercise limitation and physical and cognitive impairment [4].

Sepsis is the most common trigger of ARDS and the leading cause of ARDS mortality [5]. Clinical research has shown that sepsis-related ARDS has a worse recovery, a higher overall disease severity and a higher mortality rate than non-sepsis-related ARDS [6]. Sepsis is the body’s extreme response to an infection. When the immune response to infection becomes dysregulated and the infection cannot be cleared, sepsis develops through pro-inflammatory immune mechanisms. The latest definition from the NIH NHLBI panel states that sepsis is a severe endothelial dysfunction caused by both intravascular and extravascular infections, resulting in damage to the microcirculation [7]. The severe inflammatory response caused by sepsis can alter the permeability of lung epithelial cells and capillary endothelial cells. The influx and apoptosis of alveolar macrophages and neutrophils eventually lead to diffuse alveolar injury and severe hypoxia, which are the clinical features of ARDS [8]. In addition, the clinical study by Gong et al. showed that patients with pneumonia-induced severe sepsis are more likely to develop ARDS than those with extrapulmonary sources of infection [9]. However, ARDS is a highly heterogeneous syndrome: the plasma molecular alterations of ARDS differ according to its cause [10], and not all sepsis patients develop ARDS. The current treatment of ARDS is not significantly different from that of sepsis, and mechanical ventilation remains the preferred life-saving strategy. Current management can neither identify or predict the progression to ARDS in patients with sepsis nor reduce mortality [11]. Therefore, the development of early diagnostic biomarkers and specific treatments for sepsis-induced ARDS is essential.

High-throughput gene analysis is a powerful tool for revealing the key pathways and genes of diseases, and in recent years it has also been applied to ARDS research. Genome-wide association studies have identified several candidate genes related to the development of ARDS, including interleukin 6 (IL6), interleukin 10 (IL10), interleukin 1 receptor antagonist (IL1RN), vascular endothelial growth factor A (VEGFA; also known as VEGF), angiotensin-converting enzyme (ACE), soluble mannose-binding lectin 2 (MBL2) and visfatin (NAMPT) [12]. Acosta-Herrera et al. found associations between ARDS and both VEGF signaling and neuron projection morphogenesis using lung tissue from an animal model of sepsis [13]. Wang et al. compared polymorphonuclear neutrophil (PMN) transcriptome alterations in sepsis patients and ARDS patients, and proposed that GAPDH, MAPK8, PIK3CB and MMP9 may play important roles in the progression of ARDS [14]. These results not only improve our understanding of the mechanism of sepsis-induced ARDS, but also suggest that analyzing potential ARDS-related genes and pathways based on gene expression characteristics may be a breakthrough for understanding the genetic mechanism of ARDS.

Recently, with the development of analytical techniques in systems biology, gene network analysis has been widely used in disease-related high-throughput omics studies [15]. Gene network analysis can catalog, integrate and quantify molecular interactions at the genomic scale and identify key network features associated with disease processes, providing an excellent complement to the traditional single-gene approach [16, 17].

In this study, we conducted an integrated analysis, at both the expression level and the network level, of whole blood microarray profiles from pneumonia-induced sepsis patients, sepsis-induced ARDS patients and healthy controls (Fig. 1). A network approach was adopted to identify the key genes and biological processes closely related to the development of ARDS and to predict possible upstream regulatory factors. Our results showed that a panel composed of these genes is a potential biomarker of sepsis-induced ARDS and may help to better understand the occurrence and development of ARDS.

Fig. 1

Workflow of this study. A discovery dataset and a validation dataset were downloaded from the GEO database. An integrated analysis of gene expression and gene networks was performed on the discovery dataset to identify hub genes related to sepsis-induced ARDS. ROC analysis of the identified hub genes was then performed in the validation dataset to demonstrate their efficacy

Materials and methods

Microarray data acquisition

The Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) is a free, global public database that stores genomics and transcriptomics data, including high-throughput sequencing and microarray expression files. We searched the GEO database for ARDS-related studies and found two datasets that met our requirements, namely that they include samples from healthy controls, patients with pneumonia-induced sepsis alone, and patients with ARDS developed from pneumonia-induced sepsis. The dataset with the larger sample size (GSE32707) was used as the discovery set, and the other dataset (GSE66890) was used as the validation set. Institutional Review Board approval was not required, because our study was based on a public database and did not involve the collection of animal or human samples.

Dataset GSE32707 was submitted by Dolinay et al. and approved by the Partners Human Research Committee [18]. The dataset contains 123 whole blood samples, including 58 patients with sepsis alone, 31 patients with sepsis-induced ARDS, and 34 healthy controls (Table S1). The detailed diagnostic criteria and demographic information can be found in the original manuscript [18].

The validation dataset GSE66890 was submitted by Kangelaris et al. and approved by the University of California, San Francisco Institutional Review Board [19]. The dataset included 28 patients with sepsis alone and 29 sepsis patients with ARDS (Table S1). The normalized data were available and were downloaded directly. Patient information, blood sample collection and the generation of the microarray profiles are described in detail in the published manuscript [15].

Data pre-processing

The raw data were background-corrected and quantile-normalized using the limma package in R (ver 4.0.3) [20]. Outlier samples were detected by calculating standardized sample network connectivity Z-scores, and samples with a Z-score < -2.5 were removed [21]. We then clustered the samples using the hclust function and removed the samples that were farthest from all other samples. The biomaRt package was used to map Illumina probes to gene symbols. Only protein-coding genes were kept in our study. The collapseRows function was used to combine multiple probes annotated to the same gene symbol.
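
The connectivity-based outlier screen can be illustrated with a minimal Python sketch. It is not the authors' R code: the signed sample adjacency ((1 + r) / 2)^2 and the function names pearson, connectivity_z_scores and flag_outliers are assumptions made for illustration, and only the cut-off of -2.5 is taken from the text.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity_z_scores(samples):
    """Standardized connectivity of each sample in a sample-by-sample network.

    samples: list of expression vectors, one per sample, same gene order.
    Assumption: adjacency between samples i and j is ((1 + r_ij) / 2) ** 2.
    """
    n = len(samples)
    connectivity = []
    for i in range(n):
        k_i = sum(
            ((1 + pearson(samples[i], samples[j])) / 2) ** 2
            for j in range(n)
            if j != i
        )
        connectivity.append(k_i)
    mu, sd = mean(connectivity), stdev(connectivity)
    return [(k - mu) / sd for k in connectivity]


def flag_outliers(samples, threshold=-2.5):
    """Indices of samples whose connectivity Z-score falls below the threshold."""
    z = connectivity_z_scores(samples)
    return [i for i, score in enumerate(z) if score < threshold]
```

Because the Z-score is computed from the sample standard deviation, a single sample can only fall below -2.5 when at least nine samples are present, so the screen is meaningful only for reasonably large cohorts.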

Differential expression analysis

In the present study, we used the limma package in R to identify differentially expressed genes (DEGs) among the sepsis-alone group, the sepsis-induced ARDS group and the healthy control group [22]. The Benjamini-Hochberg (BH) method was used to estimate the false discovery rate (FDR) [23]. An adjusted p-value less than 0.05 was used as the threshold of significance.
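
As an illustration of the BH adjustment, the following Python sketch computes adjusted p-values from a list of raw p-values. It is a generic implementation of the step-up procedure rather than the limma code used in the study, and the function name bh_adjust is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Indices of tests whose BH-adjusted p-value is below alpha."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q < alpha]
```

A gene would be called a DEG under the threshold above when its adjusted value is below 0.05.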

Functional and pathway enrichment analysis

Gene Ontology (GO) annotation and enrichment analyses were performed using Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis software (IPA, http://www.ingenuity.com). These two approaches reflect different philosophies about how gene function is altered. GSEA uses all genes to investigate the alterations of biological functions caused by disease, showing which biological functions tend to be up-regulated and which down-regulated [24]. In contrast, the canonical pathway analysis in IPA asks which pathways the DEGs are primarily involved in; the same pathway may therefore contain both up- and down-regulated genes, given that these genes may activate or repress each other. We used both methods to obtain potentially complementary results that provide more accurate information. We used the WebGestaltR package in R to perform GSEA against the non-redundant GO biological process database (1,000 permutations). Results with an absolute normalized enrichment score (|NES|) greater than 1.5 and an FDR less than 0.25 were considered significant [25]. Subsequently, we used IPA to perform over-representation analysis of canonical pathways for the DEGs from each comparison. One-sided Fisher’s exact p-values were calculated, and p < 0.05 was considered significant.
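
The one-sided Fisher’s exact test behind over-representation analysis is equivalent to an upper-tail hypergeometric probability. The sketch below shows that calculation in Python; it does not reproduce IPA's internal implementation, and the function and argument names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, set_size, deg_count, universe):
    """One-sided (upper-tail) Fisher's exact p-value for over-representation.

    overlap:   number of DEGs that belong to the pathway
    set_size:  number of genes annotated to the pathway
    deg_count: total number of DEGs
    universe:  total number of genes considered
    Returns P(X >= overlap) for X ~ Hypergeometric(universe, set_size, deg_count).
    """
    upper = min(set_size, deg_count)
    total = comb(universe, deg_count)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, fisher_enrichment_p(2, 4, 3, 10) asks how likely it is that at least 2 of 3 DEGs fall in a 4-gene pathway when 10 genes are considered in total.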

Weighted gene co-expression network analysis (WGCNA)

Gene co-expression network analysis is a widely used approach for exploring the correlation structure of coordinated gene alterations in disease. In this study, we used the WGCNA package in R to identify co-expression clusters based on all genes [26, 27]. Briefly, we calculated the gene correlation matrix and converted it to an adjacency matrix. Next, a signed weighted correlation network was constructed based on a fit to scale-free topology. The dynamic tree cut method was used to detect co-expressed gene clusters, called modules. The following parameters were used: networkType = "signed", corFnc = "cor", TOMType = "signed", TOMDenom = "mean", mergeCutHeight = 0.25, deepSplit = 4, minModuleSize = 30. Each module was labeled with a distinct color, and genes labeled gray did not belong to any co-expressed gene module. We identified the key modules related to ARDS based on the module-group correlation, the gene-group correlation within the module, and the degree of enrichment of DEGs in the module. The module-group correlation was calculated from the module eigengene (ME), which is the first principal component of the module and represents the expression of the module. The enrichment of DEGs in each module was assessed by a one-sided Fisher’s exact test, and a BH-adjusted FDR less than 0.05 was used as the threshold of significance. Functional analysis of the key module associated with ARDS was conducted with IPA software based on the canonical pathway database.
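
The conversion from correlation to a signed adjacency can be sketched as follows. This assumes the standard WGCNA signed definition, a_ij = ((1 + cor_ij) / 2)^power, and is a plain-Python illustration rather than the package code; the soft-thresholding power is left as a parameter, and the function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(expr, power):
    """Signed weighted adjacency matrix for a gene-by-sample expression table.

    expr:  list of per-gene expression vectors (same sample order)
    power: soft-thresholding power chosen from the scale-free topology fit
    """
    n = len(expr)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = ((1 + pearson(expr[i], expr[j])) / 2) ** power
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """Whole-network connectivity of each gene (row sum excluding the diagonal)."""
    return [sum(row) - row[i] for i, row in enumerate(adj)]
```

In a signed network, strongly negative correlations map to an adjacency near 0 and uncorrelated pairs to 0.5^power, so a large power suppresses weak and negative relationships while preserving strong positive co-expression.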

Protein-protein interaction (PPI) network analysis and ROC analysis

STRING (https://www.string.org), a web-based database, was used to construct the protein-protein interaction network for the genes of the key modules. Cytoscape (ver 3.8.0, https://www.cytoscape.org) was employed to visualize the PPI network. The CytoHubba plugin (ver 0.1) provided the degree of each node in the PPI network, and the top 10 genes by degree were considered hub genes [28]. Another plugin, iRegulon (ver 1.3), was applied to predict the potential upstream regulatory factors (URFs) of the hub genes, such as transcription factors (TFs) [29]. We analyzed the correlation between the URFs with the highest NES and the hub genes to establish a regulatory gene panel with a significant correlation structure. Logistic regression and receiver operating characteristic (ROC) curve analysis were performed to assess the diagnostic value of this gene panel for sepsis and sepsis-induced ARDS.
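
The area under the ROC curve (AUC) used to summarize diagnostic value can be computed directly from predicted scores, as in the sketch below. It uses the rank-based (Mann-Whitney) formulation and assumes that the scores are, for example, fitted probabilities from the logistic regression model; the function name roc_auc is hypothetical and the code is not the authors' implementation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: predicted score per subject (higher means more likely a case)
    labels: 1 for cases, 0 for controls
    Returns the probability that a random case scores higher than a random
    control, counting ties as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.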

Results

Data processing of microarray

Following the pre-processing procedure described above, we removed 19 outlier samples (Fig. S1). The 47,220 Illumina probes detected in the raw data were annotated to 18,066 protein-coding genes for subsequent analysis.

Identification of DEGs

In total, we identified 439 DEGs (BH-adjusted p < 0.05, Fig. 2A, Table S2). Among them, 180 DEGs were identified between the sepsis-alone group and the control group, 150 between the sepsis-induced ARDS group and the control group, and 162 between the sepsis-alone group and the sepsis-induced ARDS group (Fig. 2B). Although ARDS developed from sepsis, the ARDS group shared few DEGs with the sepsis-alone group (nearly 70% of their DEGs were unique). The unsupervised hierarchical clustering heatmap displayed the expression changes of these genes in the three groups (Fig. 2C). These results suggested that the expression of many genes was disrupted during the progression from sepsis to ARDS, and that some key molecules might serve as biomarkers for prediction, prevention and treatment.

Fig. 2

Analysis of differentially expressed genes (DEGs). (A) Volcano plots display the DEGs in the three comparisons. Red points represent DEGs, and gray points represent non-significant genes. (B) Venn plots show the shared and unique DEGs across the comparisons. (C) Heatmap shows the expression changes in the sepsis-alone group, the sepsis-induced ARDS group and the control group. Red represents up-regulation and blue represents down-regulation

Functional and pathway enrichment analysis for DEGs

The enrichment analyses of biological functions and pathways were performed using GSEA and IPA software. Consistent with the DEG results, both the sepsis-alone group and the sepsis-induced ARDS group had independent changes in biological functions or pathways compared with controls (Fig. 3A). A total of 54 GO terms and 90 canonical pathways were significantly enriched based on GSEA and IPA, respectively (Table S2). The sepsis-alone group and the ARDS group shared 7 enriched GO functions and pathways with concordant regulatory directions, such as neutrophil mediated immunity, NADH dehydrogenase complex assembly and dopaminergic related pathways (Fig. 3A, Fig. S2). The dysfunctions unique to the sepsis-alone group mainly included mitochondrial energy metabolism processes, such as oxidative phosphorylation and fatty acid metabolism, and some neural pathways. The altered functions unique to the ARDS group mainly involved cell cycle and apoptosis related functions (Fig. 3B-C).

Fig. 3

Function and pathway enrichment analysis. (A) Venn plots show the overlap of GO functions and pathways that are significantly enriched in the ARDS group and the sepsis-alone group. (B) The bubble chart shows the result of gene set enrichment analysis (GSEA). The size of each point represents the absolute value of the normalized enrichment score (|NES|) and the color intensity of each point represents the significance. (C) Heatmap of the canonical pathways that are significantly enriched in DEGs. The color intensity of each cell is scaled by -log10(p-value). Asterisks indicate significance. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001

Gene co-expression network analysis

Subsequently, we constructed a signed gene co-expression network for all genes to explore the changes in gene correlation relationships related to ARDS. The detailed parameters are described in the Methods section. A soft-thresholding power of 20 was used to make the network approximate a scale-free topology (Fig. S3A). A total of 49 co-expressed gene modules were detected with this power, and module sizes ranged from 37 to 1,375 genes (Fig. 4A, Table S3). To identify the modules related to ARDS, we calculated the module-group relationships based on Pearson correlation analysis and obtained 13 significant modules (Fig. 4B). Then, we calculated the gene-group relationships in each module (Fig. 4C). The darkgrey module had the largest absolute correlation coefficient and module gene significance and was therefore considered the key module (Table S3, Fig. 4D). In the functional analysis of the darkgrey module, we found 57 significantly enriched pathways, including a number of immune/inflammation-related signaling pathways, such as Antigen Presentation Pathway, B Cell Development, Th1/Th2 Pathway and IL-4 Signaling, as well as neuroinflammatory signaling pathways and fatty acid metabolism pathways (Fig. 5A, Table S3).

Fig. 4

Gene co-expression network analysis. (A) Heatmap plot of the gene network. The heatmap depicts the topological overlap matrix (TOM) among all genes. Light color represents low overlap and progressively darker red represents higher overlap. Blocks of darker color along the diagonal are the modules. (B) Heatmap quantifying module-group associations. Rows are labeled by the names and colors of the modules. The text in each row indicates the correlation coefficient and p-value of the correlation analysis between each module eigengene (ME) and the groups. Red means positive correlation and blue means negative correlation. (C) Average significance of genes in each module. (D) The bubble plot shows the modules that are significantly enriched in DEGs. A one-sided Fisher’s exact test was performed, and the Benjamini-Hochberg method was used to adjust the FDR. The size of each point represents the number of DEGs in each module and the color intensity represents the significance

Fig. 5

Functional analysis and protein-protein interaction (PPI) network of the darkgrey module. (A) Significantly enriched canonical pathways for the darkgrey module. (B) PPI network of genes in the darkgrey module. (C) Boxplots show the expression of hub genes in the three groups

PPI network and ROC analysis

We constructed a PPI network from the 171 genes in the darkgrey module using the STRING database (Fig. 5B). The degree of each node was calculated, and the top 10 genes were considered hub genes (Fig. 5B, Table S4). Four of them, CSF1R, HLA-DRA, IRF8 and MPEG1, were differentially expressed among the sepsis-alone group, the sepsis-induced ARDS group and controls. The plugin iRegulon identified 14 potential URFs upstream of these four key genes based on the largest NES (Table S4), of which 13 were detected by the microarray profiling. MZF1, EOMES and MGA showed significant positive correlations with the hub genes, whereas LTF, TBX18, TBX5 and TBX6 showed significant negative correlations with the hub genes (Fig. 6A). Among these potential URFs, EOMES (p = 0.002 for ARDS vs. sepsis) and LTF (p = 0.018 for sepsis vs. control, p = 0.013 for ARDS vs. control) showed a trend of expression differences among groups (Fig. 6B, Fig. S4). Therefore, we took EOMES, LTF, CSF1R, HLA-DRA, IRF8 and MPEG1 as a combined diagnostic panel and assessed its diagnostic efficiency by ROC analysis (Fig. 6C). The combined gene panel had excellent diagnostic ability for both ARDS and sepsis versus controls (AUC = 0.914 for ARDS, AUC = 0.900 for sepsis), and could also distinguish sepsis-induced ARDS from sepsis alone (AUC = 0.746). An independent dataset (GSE66890) confirmed the diagnostic potential of the combined gene panel, with an AUC of 0.769 (Fig. 6C). These results suggested that the gene biomarker panel was reliable and robust and could be used for the diagnosis of sepsis-induced ARDS.

Fig. 6

URFs and ROC analysis. (A) Correlation analysis between URFs and hub genes. Pearson correlation, p < 0.05. (B) Expression of URFs. *p < 0.05, **p < 0.01. (C) ROC curve analysis of hub genes and URFs. The left panel shows the ROC analysis between the disease groups and the control group. The right panel shows the ROC analysis between sepsis patients and ARDS patients

Discussion

In this study, we performed an integrated analysis, at the gene expression and gene network levels, of microarray expression profiles from sepsis-induced ARDS patients, sepsis patients and healthy controls. Specifically, we compared the DEGs among these three groups and explored their functional pathways. Furthermore, we screened the hub genes for sepsis-alone patients and sepsis-induced ARDS patients, and found that the panel composed of these hub genes had good diagnostic efficacy.

Several previous studies have explored risk factors for sepsis-induced ARDS, including pneumonia infection [30] and blood endocan levels [31]. However, few studies have systematically compared sepsis alone and sepsis-induced ARDS at the genetic level. In addition, objective biomarkers for sepsis-induced ARDS are lacking.

In this study, both the GSEA results (based on continuous gene expression features) and the IPA results (based on differential gene expression features) suggested that the neutrophil-mediated inflammatory response and mitochondrial dysfunction are the major characteristics of ARDS caused by sepsis. ARDS is an acute inflammatory disease [8, 32], and neutrophils are considered an important component of the inflammatory microenvironment in ARDS [33]. Neutrophils are activated by dual feedback from exogenous and endogenous inflammatory stimuli after lung injury [34]. These activated neutrophils release cytotoxic substances, such as reactive oxygen species (ROS), telomerase and various pro-inflammatory factors, which further aggravate the inflammation [35]. In addition, Nguyen et al. and Teixeira et al. showed that neutrophils promote the development of ARDS by assembling and activating NADH oxidase complexes to produce ROS [36, 37]. This is consistent with our finding that the mitochondrial function-dependent NADH dehydrogenase complex assembly process was altered in ARDS developed from sepsis (Fig. 3B-C). Mitochondria and several ATP-producing genes are the main sources of ROS products, and these have performed well in predicting the survival rate of ARDS patients [38]. Compared with the sepsis-alone group, the sepsis-induced ARDS group also showed unique dysregulated functions related to cell fate, such as apoptosis signaling. Apoptosis of lung endothelial cells (ECs) is one of the main pathological characteristics of ARDS [39]. Several studies have shown that elevated levels of ATP or adenosine can promote endothelial cell apoptosis through multiple signaling pathways [40, 41]. Extracellular supplementation of ATP or adenosine can reduce Ras methylation and Ras GTPase activity by inhibiting isoprenylcysteine-O-carboxyl methyltransferase (ICMT), which in turn inhibits the activation of downstream signaling molecules, including Akt, ERK-1 and ERK-2, and thereby induces apoptosis of ECs [40].

Furthermore, co-expression network analysis and PPI network analysis identified four hub DEGs. Among these hub genes, colony stimulating factor 1 receptor (CSF1R), the receptor for a cytokine that controls the production, differentiation, and function of macrophages, was significantly up-regulated in the sepsis-alone group and the sepsis-induced ARDS group. Previous evidence showed that excessive recruitment and activation of macrophages from the blood, as well as of resident alveolar macrophages (AM), may be key factors in the development of ARDS [42,43,44,45,46]. Macrophages can be activated through the classical JAK/STAT1 pathway when interferon-γ (IFN-γ) binds to cell surface receptors [47,48,49]. The down-regulation of interferon regulatory factor 8 (IRF8) in sepsis patients and ARDS patients promoted inflammation and infection and activated macrophages through IFN-γ (Fig. 5C) [50]. The decreases in macrophage-specific marker (MPEG1) and major histocompatibility complex, class II, DR alpha (HLA-DRA) in sepsis patients and ARDS patients were consistent with these findings (Fig. 5C) [51, 52].

Moreover, we also identified two possible URFs of the hub genes, which were significantly increased in sepsis patients and ARDS patients (Fig. 5). Lactotransferrin (LTF) has been shown to be a major innate immune responder and to play an important role in controlling the development of acute septic inflammation [53,54,55]. Although LTF is not a transcription factor, it has serine protease activity, which can cleave arginine-rich regions in a variety of microbial virulence proteins. This function contributes to the regulation of antimicrobial activity [56]. Neutrophils can directly produce LTF, and the release of LTF plays a pivotal role in the development and resolution of inflammation [57]. Previous studies showed that eomesodermin (EOMES) can promote IFN-γ production and cytotoxicity in CD8 T cells [58]. In CD4 T cells, EOMES can either induce the production of IFN-γ by Th1 cells or promote Tr1 cells by driving IL-10 production [59, 60]. The regulation of T cells and their products by EOMES was consistent with the results of our module pathway enrichment analysis. Combining the two URFs with the four hub genes, we found that the six-gene diagnostic panel was highly efficient in distinguishing healthy controls, sepsis patients and sepsis-induced ARDS patients.

Conclusions

In the current study, we performed an integrated analysis based on gene expression and gene networks and identified key regulators in the progression from sepsis to ARDS. A six-gene panel comprising EOMES, LTF, CSF1R, HLA-DRA, IRF8 and MPEG1 was discovered and validated, with high accuracy in both sepsis subjects and sepsis-induced ARDS subjects. Our findings provide meaningful biomarkers for diagnosis and clues to the pathogenic mechanisms of sepsis and ARDS.

Limitations

This study has some limitations. Firstly, it was limited by the datasets available in public databases and by their sample sizes. Secondly, our study used bioinformatics methods to screen marker genes of sepsis-induced ARDS with high diagnostic efficiency, and the diagnostic performance of these genes needs to be verified in larger datasets. Thirdly, the study lacks in vivo and in vitro validation. In future work, we will collect large-scale clinical samples from multiple centers to confirm the stability of the predictive power of these markers, and assess their therapeutic potential through in vivo and in vitro experiments.