Introduction

Sepsis, a syndrome of organ dysfunction caused by a dysregulated response to infection, is one of the major causes of high mortality and poor clinical outcomes in intensive care units (ICUs) [1, 2]. Studies have reported short-term and long-term sepsis mortality ranging from 20 to 50% [3,4,5]. Early identification of septic patients at higher risk of poor prognosis is therefore important, because it allows physicians to intervene promptly and improve clinical outcomes [6].

Artificial neural networks (ANNs), a type of machine learning algorithm, have been widely applied in medical research [7,8,9]. One study of 21,892 cases showed that an ANN model performed well in predicting 14-day hospital readmission in patients with pneumonia [10]. Another recent cancer study demonstrated that an ANN model could simultaneously predict multiple co-occurring symptoms, including the risk of pain, psychological disorders and lack of well-being [11]. During the COVID-19 pandemic, researchers in Brazil applied an ANN model to generate daily and cumulative forecasts of cases and deaths so that government officials and medical agencies could act more quickly and reliably [12].

In the present study, we aimed to explore the capability of an ANN model to predict clinical outcomes in sepsis using the publicly accessible Medical Information Mart for Intensive Care III (MIMIC-III) database.

Methods

Database and patients

The MIMIC-III database is a publicly available, US-based critical care database. It documents clinical and laboratory data for 53,423 hospital admissions of patients aged ≥ 16 years admitted to the ICU between 2001 and 2012, and for 7870 neonates admitted between 2001 and 2008 [13]. The database mainly includes charted events such as demographics, vital signs, laboratory tests, vital status, medications, imaging reports, and clinical outcomes.

All patients with sepsis (ICD-9 code 995.91) in MIMIC-III (version 1.4) were enrolled in this study. Exclusion criteria were as follows: more than 5% of individual data missing, and age less than 18 years.
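The cohort definition can be expressed as a filter over per-patient records. This is a minimal sketch, not the authors' extraction code: the record layout (`age`, `icd9_codes`, and `values` with `None` marking a missing item) and the function names are hypothetical. In MIMIC-III, ICD-9 codes are stored without the decimal point, so 995.91 appears as "99591".

```python
# Hypothetical record layout; field names are illustrative, not MIMIC-III column names.
SEPSIS_ICD9 = "99591"  # ICD-9-CM 995.91 (sepsis), stored without the decimal point


def missing_fraction(values):
    """Share of a patient's extracted variables that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def is_eligible(patient, max_missing=0.05, min_age=18):
    """Sepsis code present, age >= 18 years, and no more than 5% of variables missing."""
    return (
        SEPSIS_ICD9 in patient["icd9_codes"]
        and patient["age"] >= min_age
        and missing_fraction(patient["values"]) <= max_missing
    )


cohort = [
    {"age": 67, "icd9_codes": ["99591"], "values": [1.0] * 20},
    {"age": 17, "icd9_codes": ["99591"], "values": [1.0] * 20},  # excluded: age
    {"age": 70, "icd9_codes": ["99591"], "values": [None] * 2 + [1.0] * 18},  # excluded: 10% missing
]
included = [p for p in cohort if is_eligible(p)]
```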

Data extraction

From the MIMIC-III database, the following general variables were extracted for the first 24 h after ICU admission: age at the time of hospital admission, gender, admission type, marital status, ethnicity, ICU department, comorbidities (renal disease, coronary artery disease (CAD), diabetes, and hypertension), sequential organ failure assessment (SOFA) score and acute physiology and chronic health evaluation (APACHEII) score. The length of stay (LOS) in ICU and in-hospital mortality were also collected.

Clinical and laboratory variables recorded within 24 h after admission were also extracted, including systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate (HR), respiratory rate (RR), white blood cells (WBC), neutrophils, lymphocytes, sodium, chloride, platelets (PLT), red cell distribution width (RDW), mean corpuscular volume (MCV), hematocrit, glucose, prothrombin time (PT), partial thromboplastin time (PTT), albumin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin, urea nitrogen, creatinine, lactate, total calcium, and anion gap. The neutrophil-to-lymphocyte ratio (NLR) was defined as the ratio of neutrophils to lymphocytes. Multiple imputation was used to handle missing data in order to maximize statistical power and minimize bias.
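Derived variables such as NLR are computed per patient before modeling. The sketch below is a minimal illustration with a hypothetical function name; returning `None` for missing inputs (leaving them to the multiple-imputation step) is an assumption, since the text does not say whether NLR was derived before or after imputation.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR = neutrophils / lymphocytes; both inputs must use the same units (e.g. % or 10^9/L)."""
    if neutrophils is None or lymphocytes is None:
        return None  # left missing here; assumed to be filled by multiple imputation
    if lymphocytes <= 0:
        raise ValueError("lymphocyte value must be positive to compute NLR")
    return neutrophils / lymphocytes
```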

Statistical analysis

Categorical variables are described as frequencies and proportions, and continuous variables as medians with interquartile ranges (IQRs) or means (SD). The chi-squared test (categorical variables) or the Mann–Whitney U test (continuous variables) was used to compare the survivor and nonsurvivor groups.
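For readers who want to reproduce the group comparisons outside SPSS, both tests can be sketched in plain Python. This is a minimal sketch with hypothetical function names: the chi-squared test is shown for a 2 × 2 table without Yates' continuity correction, and the Mann–Whitney U p-value uses the large-sample normal approximation without tie correction, so p-values may differ slightly from SPSS output.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared for [[a, b], [c, d]] (no Yates correction); returns (stat, p), df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p


def mann_whitney_u(x, y):
    """U for sample x using midranks for ties; two-sided p from the normal approximation."""
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j) / 2 + 1  # average rank of a tied run (ranks start at 1)
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```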

First, univariable analysis was used to identify variables that differed significantly between the two groups. These variables were then entered into a multivariable logistic regression model to construct the predictive model. Finally, receiver operating characteristic (ROC) analysis for predicting 30-day mortality was performed and the area under the curve (AUC) was calculated. Accuracy, sensitivity, and specificity were also calculated to evaluate the predictive performance of the different models. Optimal cut-off values were determined using the Youden index (sensitivity + specificity - 1); for each variable, the value that maximized the Youden index was taken as the optimal cut-off.
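Both the AUC and the Youden-optimal cut-off can be computed directly from predicted scores and observed outcomes. The sketch below is a minimal illustration, not the SPSS procedure used in the study: the AUC is computed as the probability that a randomly chosen nonsurvivor has a higher score than a randomly chosen survivor (ties count one half), and the cut-off is found by scanning the observed values for the maximum Youden index, treating a value at or above the cut-off as predicting death. Function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(labels, scores):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1.

    A record is predicted positive (death) when its score is >= the cut-off.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best[1]:
            best = (t, j)
    return best
```

For variables where lower values indicate higher risk (such as albumin), the scores can be negated before calling `youden_threshold`, and the returned cut-off negated back and read as "at or below". The pairwise AUC loop is quadratic in sample size, which remains manageable for a cohort of a few thousand patients.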

Statistical analysis was performed using SPSS software (version 26). A p value < 0.05 was considered statistically significant.

ANN model

For our ANN model, we used a multilayer perceptron architecture trained with the backpropagation algorithm [14, 15]. The basic ANN structure had three layers: an input layer, a hidden layer and an output layer (Fig. 2). Variables that differed significantly between the survivor and nonsurvivor groups in univariable analysis were entered into the input layer. As shown in Fig. 2, our ANN was composed of 1 input layer of 12 nodes, 1 hidden layer of 6 nodes, and 1 output layer of 2 nodes.
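The 12-6-2 layout described above can be illustrated with a small feed-forward network trained by backpropagation. The paper used PyTorch; the sketch below is written in plain Python to stay dependency-free and is not the authors' model. The sigmoid hidden activation, the softmax output over the two classes (survived, died), the weight initialization and the learning rate are all assumptions, because the text does not specify them; `TinyMLP` is a hypothetical name.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """Hypothetical 12-6-2 perceptron: sigmoid hidden layer, softmax output layer."""

    def __init__(self, n_in=12, n_hidden=6, n_out=2, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, softmax(logits)

    def loss(self, x, target):
        """Cross-entropy for one record; target is 0 (survived) or 1 (died)."""
        _, p = self.forward(x)
        return -math.log(p[target])

    def backprop_step(self, x, target, lr=0.1):
        """One stochastic gradient-descent update computed by backpropagation."""
        h, p = self.forward(x)
        # dLoss/dlogit for softmax + cross-entropy is p - one_hot(target).
        d_out = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        # Error at each hidden unit, using the output weights before they are updated.
        d_hid = [
            sum(d_out[k] * self.w2[k][j] for k in range(len(d_out))) * h[j] * (1.0 - h[j])
            for j in range(len(h))
        ]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
```

In PyTorch, a comparable structure would be an `nn.Sequential` of two `nn.Linear` layers with a sigmoid between them, trained with `nn.CrossEntropyLoss` (which applies the softmax internally); this describes one possible equivalent, not the authors' code.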

The study population was randomly divided into a training set (n = 1689) and a validation set (n = 1176) at a ratio of 6:4 by simple randomization in R, with the random seed fixed using set.seed(). An oversampling algorithm was applied to address class imbalance [16]. The training set was used to construct the models and the validation set to test their predictive performance (Table 2). The predictive performance of the ANN for 30-day mortality was evaluated by averaging results across fivefold cross-validation [11], and the average accuracy, sensitivity, and specificity were calculated. The predictive performances of ANN, logistic regression, APACHEII, and SOFA scores were compared in the training and validation sets. The ANN model was implemented in PyTorch (version 1.2.0).
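The data-partitioning steps can be sketched as follows. This is a minimal sketch under stated assumptions: simple random oversampling of the minority class stands in for the algorithm in [16], which may differ (for example SMOTE), and the seed value is hypothetical. Python's `random` module will not reproduce a split made with R's set.seed(). The helper names `split_6_4`, `random_oversample` and `kfold_indices` are hypothetical.

```python
import random

SEED = 2021  # hypothetical seed; the paper's seed value is not reported


def split_6_4(ids, seed=SEED):
    """Shuffle record IDs with a fixed seed and split them 60/40 into training/validation."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def random_oversample(rows, labels, seed=SEED):
    """Duplicate randomly chosen minority-class rows until both classes are equally frequent."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    out_rows = majority + minority + extra
    out_labels = [1 - minority_label] * len(majority) + [minority_label] * (len(minority) + len(extra))
    return out_rows, out_labels


def kfold_indices(n, k=5, seed=SEED):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n records."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

In this sketch, oversampling is intended for training data only, so that duplicated records cannot leak into the data used for evaluation.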

Results

General characteristics of sepsis in MIMIC-III

A total of 5403 patients with sepsis were initially identified. After applying the exclusion criteria, 2874 patients were included in our study (Fig. 1). The 30-day mortality was 29.8%. The median age of the cohort was 67 years, and 55.7% were male. With respect to marital status, 6.8% were divorced, 44.5% married, 28.4% single, and 15.4% widowed. Most patients were white (72.7%). Ninety-six percent of patients were emergency admissions, and more than half (65.9%) were admitted to the medical ICU (MICU). The prevalences of renal disease, CAD, diabetes and hypertension were 8.4%, 15.9%, 5.4%, and 37.9%, respectively. The median SOFA and APACHEII scores were 2 and 14, respectively. The median LOS in the ICU and in hospital were 3 and 8 days, respectively (Table 1).

Fig. 1

Flow chart of patient enrollment and study design

Table 1 General characteristics of sepsis in MIMIC-III

Baseline characteristics of training and validation sets

Table 2 shows the baseline characteristics of the training and validation sets. Apart from diabetes (P = 0.021), no significant differences were found between the two sets in the other variables, including age (P = 0.213), gender (P = 0.994), DBP (P = 0.310), SBP (P = 0.763), HR (P = 0.122), RR (P = 0.148), renal disease (P = 0.930), CAD (P = 0.542), hypertension (P = 0.774), PLT (P = 0.849), AST (P = 0.303), sodium (P = 0.931), glucose (P = 0.194), chloride (P = 0.510), MCV (P = 0.096), ALT (P = 0.420), neutrophils (P = 0.144), urea nitrogen (P = 0.617), PTT (P = 0.886), hematocrit (P = 0.355), PT (P = 0.949), anion gap (P = 0.070), RDW (P = 0.612), lymphocytes (P = 0.063), WBC (P = 0.089), NLR (P = 0.088), total calcium (P = 0.381), lactate (P = 0.790), albumin (P = 0.169), creatinine (P = 0.893), total bilirubin (P = 0.743), APACHEII (P = 0.581), SOFA (P = 0.671), LOS in hospital (P = 0.386) and 30-day mortality (P = 0.153).

Table 2 Baseline characteristics of training and validation sets

Multivariable logistic regression analysis

As shown in Table 3, age (P < 0.001), AST (P < 0.001), MCV (P = 0.001), ALT (P < 0.001), urea nitrogen (P < 0.001), PTT (P < 0.001), PT (P < 0.001), RDW (P < 0.001), lactate (P < 0.001), albumin (P < 0.001) and total bilirubin (P < 0.001) differed significantly between the survivor and nonsurvivor groups in the training set.

Table 3 Comparison of variables between survivor and nonsurvivor groups in training set

The 11 variables were entered into the multivariable logistic regression analysis, and 9 variables were identified as independent factors associated with 30-day mortality (Table 4): age (odds ratio (OR) 1.030, 95% CI 1.020–1.039), AST (OR 1.000, 95% CI 1.000–1.001), urea nitrogen (OR 1.008, 95% CI 1.004–1.013), RDW (OR 1.161, 95% CI 1.098–1.227), lactate (OR 1.189, 95% CI 1.115–1.268), albumin (OR 0.581, 95% CI 0.447–0.708), total bilirubin (OR 1.059, 95% CI 1.029–1.091), PT (OR 1.031, 95% CI 1.010–1.052) and PLT (OR 0.999, 95% CI 0.998–1.000).
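Each OR above is the exponentiated logistic-regression coefficient for a one-unit increase in the variable, so variables measured in small units relative to their range (for example AST in U/L or PLT counts) can show ORs very close to 1.000 even when significant. The sketch below shows the standard conversion with a Wald 95% CI and the rescaling of a per-unit OR to a k-unit change; it assumes the coefficient `beta` and its standard error `se` are available (they are not reported here), and the helper names are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Logistic-regression OR = exp(beta) with a Wald 95% CI exp(beta +/- z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def or_for_k_units(or_per_unit, k):
    """OR for a k-unit increase, assuming the log-odds are linear in the variable."""
    return or_per_unit ** k


# Age: OR 1.030 per year (Table 4) corresponds to roughly 1.34 per 10 years.
age_or_per_decade = or_for_k_units(1.030, 10)
```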

Table 4 Multivariate logistic regression analysis of variables associated with 30-day mortality

ANN model development

The main structure of the ANN is illustrated in Fig. 2. Eleven variables that differed significantly between the two groups (age, AST, MCV, ALT, urea nitrogen, PTT, PT, RDW, lactate, albumin and total bilirubin) were selected for the input layer. The output layer was 30-day hospital mortality. Fig. 3 shows the normalized importance of all 11 variables. The four most important variables were albumin (100.00%), PT (85.73%), RDW (82.81%), and lactate (76.75%).
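Normalized importance, as plotted in Fig. 3, expresses each variable's importance as a percentage of the most important variable, so the top variable (albumin) is 100%. The text does not state how raw importance was computed, so the sketch below only shows the rescaling step, using made-up raw values; `normalized_importance` is a hypothetical helper.

```python
def normalized_importance(raw):
    """Scale each variable's raw importance to a percentage of the largest one."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


# Made-up raw importances for illustration only.
example = normalized_importance({"albumin": 0.20, "PT": 0.10, "lactate": 0.05})
```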

Fig. 2

The main structures of artificial neural networks. RDW red cell distribution width, PT prothrombin time, PTT partial thromboplastin time, ALT alanine aminotransferase, AST aspartate aminotransferase, MCV mean corpuscular volume

Fig. 3

The normalized importance of 11 variables for predicting 30-day mortality by artificial neural networks. RDW red cell distribution width, PT prothrombin time, PTT partial thromboplastin time, ALT alanine aminotransferase, AST aspartate aminotransferase, MCV mean corpuscular volume

Predictive performance of different models in training set and validation set

As shown in Table 5, the predictive performance of ANN, logistic regression, APACHEII and SOFA scores was evaluated in the training and validation sets. In the training set, the accuracies of the four models were 0.866, 0.711, 0.615, and 0.574, respectively (P < 0.001). The sensitivities were 0.850, 0.662, 0.569, and 0.619, respectively (P < 0.001). The specificities were 0.410, 0.337, 0.367 and 0.413, respectively (P = 0.029). The AUCs of ANN, LR, APACHEII and SOFA scores were 0.873, 0.720, 0.629 and 0.619, respectively (P < 0.001). In the validation set, the accuracies of the four models were 0.735, 0.722, 0.401, and 0.609, respectively (P = 0.272). The sensitivities were 0.624, 0.604, 0.333, and 0.416, respectively (P = 0.197). The specificities were 0.772, 0.744, 0.841, and 0.788, respectively (P = 0.095). The AUCs of ANN, LR, APACHEII, and SOFA scores were 0.811, 0.752, 0.607, and 0.628, respectively (P = 0.002).
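For reference, the three threshold-dependent metrics in Table 5 follow from a 2 × 2 confusion matrix. This minimal sketch assumes that death within 30 days is the positive class and uses hypothetical counts; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix (positive = death)."""
    sensitivity = tp / (tp + fn)  # share of nonsurvivors correctly flagged
    specificity = tn / (tn + fp)  # share of survivors correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts: 40 deaths, 60 survivors.
acc, sens, spec = classification_metrics(tp=30, fn=10, tn=50, fp=10)
```

Because accuracy equals π × sensitivity + (1 − π) × specificity, where π is the proportion of patients who died, it always lies between sensitivity and specificity.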

Table 5 Predictive performances of different models in training set and validation set

Comparison of the predictive performances in different models

Figure 4 shows the ROC curves of ANN, LR, APACHEII, and SOFA scores for the training set (A) and the validation set (B); the ANN model had the highest AUC in both sets. Table 6 compares the AUCs of ANN, LR, APACHEII and SOFA scores between the training and validation sets. The AUC of the ANN model differed significantly between the two sets (P < 0.001), whereas no significant difference was found for logistic regression (P = 0.067), APACHEII score (P = 0.174) or SOFA score (P = 0.350).

Fig. 4

The receiver operating characteristic curves of ANN, LR, SOFA, APACHEII in predicting 30-day mortality in sepsis. 4A: Training set; 4B: Validation set. ANN artificial neural networks, SOFA sequential organ failure assessment, APACHE acute physiology and chronic health evaluation, LR logistic regression

Table 6 Comparison of predictive performance between training set and validation set

Discussion

In this study, we developed an ANN model to predict 30-day mortality in sepsis. To the best of our knowledge, this is the first study to investigate the performance of an ANN model in predicting short-term outcomes in sepsis using the MIMIC-III database.

Compared with LR, ANNs are better suited to modeling nonlinear relationships and variables with complex interdependencies [17]. A Korean study of 13,402 patients, in which 1260 bacteremia episodes were identified, found that an ANN model performed well in the early detection of bacteremia, with an AUC of 0.729 and a sensitivity of 0.810 [18]. Another study found that an ANN model applied to the prediction of individual apnea and hypopnea episodes in people with obstructive sleep apnea syndrome had both good specificity and good sensitivity [19]. In our study, the ANN model, with a validation AUC of 0.811, was significantly superior to LR, SOFA score and APACHEII score.

The four most important variables in our ANN model were albumin, PT, RDW, and lactate. Accumulating evidence indicates that all four are associated with clinical outcomes in sepsis [20,21,22].

Albumin, the main plasma protein maintaining plasma colloid osmotic pressure and capillary membrane permeability, has been associated with the occurrence and clinical outcomes of sepsis [23]. One study showed that a low serum albumin level (< 29.2 g/L) was an independent risk factor for 28-day mortality in sepsis [24]. Furthermore, daily changes in albumin were significantly associated with mortality during the ICU stay in sepsis patients [25]. Another retrospective study found that in sepsis, the probability of survival decreased by 63.4% when serum albumin was ≤ 2.45 g/dl on admission, and by 76.4% when the lowest serum albumin during hospitalization was ≤ 1.45 g/dl [26].

Previous research has shown that coagulation function on ICU admission is associated with mortality in sepsis [21]. In septic shock, survival curve analysis showed that a higher PT/INR (> 0.16) was associated with a significantly higher risk of 28-day mortality than a lower level (< 0.16) [27]. A recent COVID-19 study found that non-survivors with sepsis had higher PT and APTT [28]. In sepsis, infection and activation of the innate immune system trigger coagulation, leading to sepsis-associated coagulopathy with excessive consumption of coagulation factors [29].

RDW, a measure of the variability in size of circulating red blood cells, has been identified as a predictive indicator in various disorders [30,31,32,33]. A sepsis study of 566 patients with an overall mortality of 29% showed that higher RDW was independently associated with 28-day mortality [34]. Another study of the association between RDW and in-hospital mortality in sepsis found that RDW had good predictive performance, with an AUC of 0.867 [35]. In a study of sepsis-induced acute respiratory distress syndrome, Cox regression showed that RDW was also an independent prognostic marker [36].

Lactate has been reported as a predictor of death in patients with or without sepsis [37]. Hyperlactatemia was more frequent in septic shock and was associated with a lower survival rate [38]. A prospective study of 1233 adults in the UK showed that a lactate ≥ 2 mmol/L was associated with increased mortality and identified the patients with suspected sepsis who had the highest risk of in-hospital mortality [39]. In adults with sepsis, lactate showed prognostic accuracy for mortality similar to that of SOFA [4]. Recent research showed that in polymicrobial sepsis, lactate promotes lactylation/acetylation of macrophage high mobility group box-1 (HMGB1) and its exosomal release, leading to disrupted endothelial integrity and increased vascular permeability [40].

In this study, we developed an ANN-based predictive model for 30-day mortality in sepsis. The model may help detect patients at higher risk of poor prognosis early, allowing physicians to intervene in a timely manner to improve clinical outcomes. Although the current model cannot directly guide ICU management, models targeting shorter-term outcomes, such as respiratory failure or vasopressor initiation within 48 h, may be more relevant because they could inform disposition decisions.

Our study has some limitations. First, the MIMIC-III database includes data collected before 2012, whereas the Sepsis-3 definition was published in 2016. Differences in the definition of sepsis across periods should be considered when applying our ANN model. Second, because of the high percentage of missing values in MIMIC-III, not all variables that may affect clinical outcomes in sepsis were included and analyzed. For example, antibiotic administration and its timing were not analyzed, which may confound the association with 30-day mortality. Third, only an ANN model was evaluated; whether other machine learning models offer better predictive performance warrants further investigation. Fourth, the primary outcome was 30-day mortality, and deaths occurring out of hospital within 30 days might have been missed. Fifth, only 30-day mortality was investigated; other outcomes, including complications and long-term prognosis, were not. Future studies with larger samples and longer follow-up are needed to explore how to improve clinical outcomes in sepsis.

Conclusion

In this study, we developed an ANN model for predicting 30-day mortality in sepsis. The model may help with the early detection of patients at higher risk of poor prognosis.