Introduction

Hepatic encephalopathy (HE), one of the most common complications of liver cirrhosis, is defined as a brain dysfunction [1]. Significant HE occurs in approximately 30–40% of patients with cirrhosis due to hepatocellular dysfunction and portosystemic shunt [2]. Typical clinical manifestations of HE include various neuropsychiatric disorders, such as progressive disorientation, sleep disorders, inappropriate behavior, somnolence, coma, asterixis, hypertonia, hyperreflexia, and extrapyramidal dysfunction [3, 4]. HE is a serious complication of cirrhosis associated with significant mortality and a heavy financial burden [5]. Specifically, the costs associated with HE reach $11.6 billion and outweigh those of other decompensating events in liver cirrhosis [6]. Although outcomes for HE patients have improved over the past decade, several studies have indicated that prognosis and quality of life remain poor [7]. HE typically heralds hepatic decompensation, and its development is usually associated with high morbidity, implying the need for liver transplantation [8,9,10]. Accordingly, it is critical to develop a simple, easily accessible model to estimate the risk of 28-day mortality in patients with HE.

Many existing scoring systems have been used to evaluate the prognosis of patients with liver cirrhosis, but none specifically targets HE patients. The Model for End-Stage Liver Disease (MELD) score has been widely used as a predictive tool for liver disease severity [11]. Building on the MELD algorithm, Biggins et al. proposed a new score, the Model for End-Stage Liver Disease-Sodium (MELD-Na), which predicts more accurately than MELD [12].
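For orientation, the two scores can be sketched as below. This is a minimal sketch that assumes the commonly cited UNOS-style coefficients and bounds; the text does not state which formula variant was used in this study, and the function and parameter names are illustrative only.

```python
import math

def meld(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """MELD score in a commonly cited form (assumed variant, not confirmed by the study)."""
    # Values below 1.0 are raised to 1.0 so that no logarithm is negative;
    # creatinine is additionally capped at 4.0 mg/dL.
    bili = max(bilirubin_mg_dl, 1.0)
    inr_b = max(inr, 1.0)
    crea = min(max(creatinine_mg_dl, 1.0), 4.0)
    return (3.78 * math.log(bili)
            + 11.2 * math.log(inr_b)
            + 9.57 * math.log(crea)
            + 6.43)

def meld_na(meld_score, sodium_mmol_l):
    """MELD-Na in the UNOS-style form (assumed variant), sodium bounded to 125-137 mmol/L."""
    na = min(max(sodium_mmol_l, 125.0), 137.0)
    return meld_score + 1.32 * (137.0 - na) - 0.033 * meld_score * (137.0 - na)
```

Both scores rely only on bilirubin, INR, creatinine and sodium, which is why they are easy to compute at the bedside but ignore other markers of illness severity.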

Machine learning (ML), a branch of artificial intelligence, is not limited by the distribution of the data and can handle complex relationships as well as high-dimensional data [13,14,15]. Consequently, this study aimed (1) to identify the significant prognostic factors for HE patients from a large database and to construct and validate a model that predicts 28-day mortality, and (2) to compare the prognostic performance of this model with that of the MELD and MELD-Na scores.

Methods

Data source

Data for this retrospective cohort study were obtained from a large critical care database: the Medical Information Mart for Intensive Care (MIMIC)-IV version 2.0 [16]. MIMIC-IV is a large, single-center, freely available database with comprehensive, high-quality data on patients admitted to the intensive care units (ICUs) at the Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2008 and 2019. This study was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA, USA) and Massachusetts Institute of Technology (Cambridge, MA, USA). Individuals who have passed the Collaborative Institutional Training Initiative examination can access the database. We completed the online course and obtained access to the database (certification number: 48120484). The study was reported according to the REporting of studies Conducted using Observational Routinely Collected Health Data (RECORD) statement [17].

Participant selection

In this study, we used the International Classification of Diseases (ICD)-9 code “5722” to identify patients with hepatic encephalopathy. All patients had underlying liver cirrhosis that led to hepatic encephalopathy. We also attempted to use ICD-9 codes to identify the causes of HE. Initially, a total of 1940 HE patients were extracted from the database. The exclusion criteria were (1) multiple ICU admissions, (2) age < 18 years, and (3) ICU stay < 24 h. Since all protected health information was de-identified, the requirement for individual patient consent was waived.
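The three exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the record fields (`subject_id`, `age`, `icu_hours`) are hypothetical names rather than MIMIC-IV column names, and it assumes that a patient with more than one ICU admission is dropped entirely, which the text does not specify.

```python
def apply_exclusions(stays):
    """Keep adults with exactly one ICU admission lasting at least 24 h."""
    # Count ICU stays per patient to detect multiple admissions.
    counts = {}
    for stay in stays:
        counts[stay["subject_id"]] = counts.get(stay["subject_id"], 0) + 1
    return [
        stay for stay in stays
        if counts[stay["subject_id"]] == 1   # (1) no multiple ICU admissions
        and stay["age"] >= 18                # (2) adults only
        and stay["icu_hours"] >= 24          # (3) ICU stay of at least 24 h
    ]

# Hypothetical example records, not study data.
example = [
    {"subject_id": 1, "age": 54, "icu_hours": 72},
    {"subject_id": 2, "age": 61, "icu_hours": 12},
    {"subject_id": 3, "age": 47, "icu_hours": 30},
    {"subject_id": 3, "age": 47, "icu_hours": 96},
    {"subject_id": 4, "age": 17, "icu_hours": 48},
]
eligible = apply_exclusions(example)
```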

Predictors of HE

Candidate predictors extracted from MIMIC-IV included baseline information and laboratory parameters. The candidate predictors were: age, sex, race, body mass index (BMI), myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, peptic ulcer disease, diabetes, paraplegia, renal disease, malignant cancer, severe liver disease, metastatic solid tumor, acquired immunodeficiency syndrome (AIDS), temperature, mean arterial pressure (MAP), heart rate, respiratory rate, red blood cell (RBC), white blood cell (WBC), hemoglobin (HGB), platelet (PLT), red cell distribution width (RDW), hematocrit (HCT), activated partial thromboplastin time (APTT), prothrombin time (PT), international normalized ratio (INR), bicarbonate, lactate, base excess (BE), anion gap, chloride, calcium, sodium, potassium, glucose, creatinine, blood urea nitrogen (BUN), total bilirubin (TBIL), albumin, alanine transaminase (ALT), aspartate transaminase (AST), alkaline phosphatase (ALP), urine output, sepsis-related organ failure assessment (SOFA) and acute physiology score III (APSIII). All data used for prediction were collected within the first 24 h after ICU admission.

Statistical analysis

Missing data are unavoidable in the MIMIC database, and this study used multiple imputation to account for missing data. The specific missing number (%) for included variables in the dataset before imputation is shown in Supplementary Table 1.

Values are presented in Table 1 as means with standard deviations (if normally distributed) or medians with interquartile ranges (IQR) (if not) for continuous variables, and as total numbers with percentages for categorical variables. Proportions were compared using the χ2 test or Fisher’s exact test, whereas continuous variables were compared using the t-test or Wilcoxon rank sum test, as appropriate.

Table 1 Baseline characteristics of the MIMIC-IV cohorts

Recursive feature elimination (RFE) based on random forest (RF) was used as the feature selection method in this study. In brief, RFE repeatedly fits the model on progressively smaller sets of features until it reaches a specified termination criterion. In every cycle, the features of the trained model are ranked by importance and the least important are removed, which also reduces dependency and collinearity among the remaining features. Features were considered in groups of 8/16/24/32/40/48/ALL (ALL = 51 variables, Fig. 1), according to the ranks obtained from the feature selection method.
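The elimination loop described above can be sketched generically. This is not the study's implementation: as a stand-in for random-forest importance, the sketch uses the absolute Pearson correlation with the outcome, and all names and data are hypothetical. With a model-based importance, the scores would change after each refit, which is why they are recomputed in every cycle.

```python
def pearson_abs(xs, ys):
    """Absolute Pearson correlation; a simple stand-in for model-based importance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def rfe_ranking(columns, outcome, importance=pearson_abs):
    """Drop the least important feature each cycle; return names from most to least important."""
    remaining = dict(columns)
    eliminated = []
    while remaining:
        # Re-score the surviving features in every cycle.
        scores = {name: importance(vals, outcome) for name, vals in remaining.items()}
        worst = min(scores, key=scores.get)
        eliminated.append(worst)
        del remaining[worst]
    return eliminated[::-1]

# Toy data (hypothetical): three features and a binary outcome.
toy = {"a": [0, 1, 2, 3], "b": [1, 0, 1, 0], "c": [0, 0, 1, 1]}
ranking = rfe_ranking(toy, [0, 0, 1, 1])
top_two = ranking[:2]  # analogous to evaluating the top 8/16/24... features
```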

Fig. 1

Study flow diagram and methods used for data extraction, training, and testing. ICU, intensive care unit; MIMIC-IV, Medical Information Mart for Intensive Care-IV; HE, hepatic encephalopathy; ML, machine learning; NNET, artificial neural network; GBM, gradient boosting machine; RF, random forest; BT, bagged trees

Four ML algorithms were then used to develop models: artificial neural network (NNET), gradient boosting machine (GBM), RF, and bagged trees (BT). We first randomly assigned 70% of the patients to the training cohort and 30% to the validation cohort. The training cohort was used to build the models, and the validation cohort was used to validate them. During model construction, we used internal validation to evaluate the stability of the prediction model in the training cohort. Ten-fold cross-validation was used as the resampling method to find the optimal hyperparameters: in each iteration, nine folds were used for training and the remaining fold was held out to tune the hyperparameters. This process was repeated 30 times, so that each sample was used for both training and testing and the data were used to the fullest. Finally, the performance of each model was evaluated in the validation cohort.
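The resampling scheme can be sketched as index bookkeeping. This is a minimal sketch under stated assumptions: the split is a plain random split (the text does not say whether it was stratified by outcome), and the function names and seeds are illustrative.

```python
import random

def train_validation_split(ids, train_fraction=0.7, seed=0):
    """Randomly assign ids to a training and a validation cohort."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def repeated_kfold(ids, k=10, repeats=30, seed=0):
    """Yield (repeat, fold, fit_ids, held_out_ids) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(repeats):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        # Deal the shuffled ids into k folds of nearly equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for f in range(k):
            held_out = folds[f]
            fit = [x for j, fold in enumerate(folds) if j != f for x in fold]
            yield r, f, fit, held_out

train_ids, validation_ids = train_validation_split(range(601))
resamples = list(repeated_kfold(train_ids, k=10, repeats=30))
```

Each candidate hyperparameter setting would be fitted on `fit_ids` and scored on `held_out_ids`, and the setting with the best average score over all resamples would be refitted on the full training cohort.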

After this, all models were assessed using multiple metrics based on the model performance. We calculated the median and 95% confidence intervals (CIs) of the area under the curve (AUC), accuracy, sensitivity, specificity, negative predictive value, and positive predictive value as measures of model performance.
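The performance measures listed above follow directly from the confusion matrix and from pairwise comparisons of predicted scores. The sketch below uses hypothetical labels and scores and a 0.5 threshold chosen only for illustration; it does not reproduce the confidence interval calculation, whose method is not described in the text.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels (1 = died)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }

def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))

# Hypothetical example, not study data.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
metrics = confusion_metrics(labels, [1 if p >= 0.5 else 0 for p in probs])
area = auc(labels, probs)
```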

We used the R package "iml" to compute Shapley values and thereby evaluate the importance of the variables included in the model. Shapley values enhance the interpretability of ML by describing the relative contribution of each variable within each predictive model. Specifically, the Shapley value of a feature A is its average marginal contribution to the prediction across all combinations of the other features.
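To make the definition concrete, the sketch below computes exact Shapley values for a toy value function by enumerating every coalition of the other features. It is an illustration of the concept, not of the "iml" package, which to our understanding approximates these values by sampling; the feature names and the value function are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value_fn(coalition | {f}) - value_fn(coalition))
        phi[f] = total
    return phi

def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    v = 0.0
    if "x1" in coalition:
        v += 2.0
    if "x2" in coalition:
        v += 1.0
    if "x1" in coalition and "x2" in coalition:
        v += 1.0  # interaction term, shared equally between x1 and x2
    return v

phi = shapley_values(toy_value, ["x1", "x2", "x3"])
```

The values sum to the difference between the output with all features and the output with none, which is what allows them to be read as each feature's share of a prediction.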

All analyses were performed using the statistical software package R version 4.0.2 (http://www.R-project.org, The R Foundation). P-values < 0.05 (two-sided test) were considered statistically significant.

Results

Baseline characteristics

As shown in Fig. 1, there were 870 patients with HE in the MIMIC-IV database; of these, 601 were eligible for this study after exclusion. Of the eligible patients, 489 (81.36%) survived, whereas 112 (18.64%) died within 28 days. The process of data extraction, training preparation, and testing with the different ML algorithms is also shown in Fig. 1. Patients who died were more likely to have worse baseline conditions than survivors. The main causes of HE were viral hepatitis, alcoholic liver disease, and autoimmune hepatitis. Details are listed in Table 1.

Variable importance

The RFE algorithm selected the following 8 important predictors: APSIII, SOFA, INR, TBIL, albumin, BUN, AKI and mechanical ventilation. All 8 variables were used in subsequent analyses for all models in both the training and testing sets.

Prediction performance in testing set

The discriminatory capabilities of all the models for predicting 28-day mortality are shown in Fig. 2 and Table 2. The NNET, GBM, RF and BT models were developed in the training set and attained AUCs of 0.837, 0.769, 0.789, and 0.741, respectively, in the testing set (Fig. 2). NNET had the highest predictive performance among the four models (AUC: 0.837, 95% CI: 0.774–0.901), while BT had the poorest discriminative ability (AUC: 0.741, 95% CI: 0.654–0.829) (Table 2).

Fig. 2

The receiver operating characteristic curves of four different machine learning models in the validation cohort. ROC, receiver operating characteristic; AUC, area under the curve; NNET, artificial neural network; GBM, gradient boosting machine; RF, random forest; BT, bagged trees

Table 2 Prediction performance of the machine learning models in the test set

Figure 3 shows the calibration curves of the models. The Hosmer–Lemeshow goodness-of-fit test was also performed. Specifically, the chi-squared value was calculated from the observed and model-predicted values in each group, and the corresponding P-value was then obtained. In the belt plot of the calibration curve, the fit of the prediction model was considered good if the 95% CI region contained the 45° diagonal bisector, whereas a P-value < 0.05 indicated a poor fit. The NNET model had good calibration, with a P-value of 0.323.
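The chi-squared value of the Hosmer–Lemeshow test can be sketched as follows. This is a minimal sketch under stated assumptions: patients are sorted by predicted risk and split into equal-sized groups (ten by convention; the number used in the study is not stated), the function name is illustrative, and the data are hypothetical. The P-value would then be read from a chi-squared distribution with (groups − 2) degrees of freedom, which is not computed here.

```python
def hosmer_lemeshow_chi2(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-squared statistic from observed and predicted deaths per risk group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        expected = sum(p for p, _ in chunk)   # predicted deaths in the group
        if expected <= 0 or expected >= size:
            continue  # skip degenerate groups to avoid division by zero
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return chi2

# Hypothetical predictions: two risk groups of four patients each.
probs = [0.25] * 4 + [0.75] * 4
well_calibrated = hosmer_lemeshow_chi2([0, 0, 0, 1, 1, 1, 1, 0], probs, groups=2)
miscalibrated = hosmer_lemeshow_chi2([1, 1, 0, 0, 1, 1, 1, 0], probs, groups=2)
```

A statistic near zero means that observed and predicted deaths agree within each risk group; larger values give smaller P-values and indicate poorer calibration.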

Fig. 3

The calibration curve for different machine learning models in the validation cohort. NNET, artificial neural network; GBM, gradient boosting machine; RF, random forest; BT, bagged trees

The 8 predictor variables in the NNET model are shown in Fig. 4. Each variable had a different Shapley value for 28-day mortality. In general, TBIL, APSIII, and albumin had relatively higher Shapley values in the NNET model; that is, these variables had a greater impact on the model output. Additionally, the AUCs of MELD and MELD-Na for predicting 28-day death were 0.728 (95% CI: 0.677–0.779) and 0.711 (95% CI: 0.658–0.765), respectively (Fig. 5).

Fig. 4

The Shapley values for different variables in the NNET model. APSIII, acute physiology score III; ALB, albumin; SOFA, sepsis related organ failure assessment; INR, international normalized ratio; BUN, blood urea nitrogen; AKI, acute kidney injury; TBIL, total bilirubin; NNET, artificial neural network

Fig. 5

The ROC curves of NNET, MELD and MELD-Na. ROC, receiver operating characteristic; NNET, artificial neural network; MELD, Model for End-Stage Liver Disease; MELD-Na, Model for End-Stage Liver Disease-Sodium

Discussion

Liver cirrhosis results from the progression of various acute and chronic liver diseases, and HE is a common critical complication of decompensated cirrhosis. This retrospective study analyzed a relatively large population from MIMIC-IV and found that 8 variables were independent predictors of 28-day death. Notably, all indicators were obtained within the first 24 h of ICU admission, providing an early window for identifying severely ill patients with HE. Among the four ML models that were validated, NNET was the best, combining good discrimination (AUC = 0.837) with good calibration. Our model could potentially help clinicians select therapeutic strategies. The features in the final model were APSIII, SOFA, INR, TBIL, albumin, BUN, AKI and mechanical ventilation, which is consistent with the findings of other published studies.

A recent study also identified SOFA and systemic inflammatory response syndrome (SIRS) as factors associated with 30-day mortality in patients with HE [18]. Patients with HE are functionally immunosuppressed and susceptible to infection [19], which is a frequent precipitant of organ dysfunction. Likewise, the resultant organ failure, as indicated by high SOFA and APSIII scores, implies significant mortality [20, 21]. Changes in TBIL and albumin often reflect liver function in patients with liver cirrhosis, and impaired liver function is closely associated with a poor prognosis [22]. Peng Y et al. concluded that TBIL was independently correlated with in-hospital death in cirrhotic patients with HE [23]. Bai Z et al. reported that albumin level was an independent risk factor for HE-associated mortality during hospitalization in cirrhosis (OR = 0.864, 95% CI = 0.771–0.967) [24]. Other studies have also validated this finding [25,26,27,28]. In a preliminary observation, Udayakumar N et al. found that high serum bilirubin in chronic liver disease was a simple parameter predicting a poor outcome in patients with HE [9]. Moreover, serum bilirubin levels have been reported elsewhere as prognostic biochemical markers, consistent with our observations [29, 30]. One previous investigation identified elevated TBIL and BUN and decreased albumin as factors associated with poor prognosis in patients with acute HE [23]. In addition, Hung TH et al. found that AKI increased the 3-year mortality of cirrhotic patients with HE [31]. Another study also found that AKI was the main independent predictor of ICU death and 1-year mortality [32]. A possible mechanism is that increased ammonia in the bloodstream exacerbates cerebral edema in cirrhotic patients with HE [33,34,35]. Additionally, according to Cui Y et al., hepatorenal syndrome (HRS) was an independent factor associated with 3-month death [36].
Notably, decreased kidney function, including HRS, usually accompanies end-stage liver disease. Renal dysfunction can increase BUN, and the resultant diffusion of BUN into the intestine can in turn enhance ammonia uptake and worsen HE [36]. A previous study explored prognostic factors in 180 cirrhotic patients with HE who were admitted to the ICU and found that the use of mechanical ventilation was a significant risk factor for mortality [37]. Saffo S et al. demonstrated that mechanical ventilation was the strongest predictor of in-hospital mortality in their primary analysis (OR, 3.00; 95% CI, 2.14–4.20; P < 0.001) and in all sensitivity analyses [38]. Benhaddouch Z et al. came to a similar conclusion [8]. This may be because patients with hepatic encephalopathy are at greater risk of the complications of mechanical ventilation owing to possible underlying circulatory, neurologic, and immunologic disturbances [39]. Mechanical ventilation itself may aggravate the existing condition, manifesting as impaired cardiovascular function, cognitive function, and immune defense; consequently, patients with hepatic encephalopathy may be particularly susceptible to complications such as shock, progressive delirium, and infection [40, 41]. Although MELD and MELD-Na are accurate, highly specific scores that are commonly used to reflect liver disease severity and quantify mortality risk [42, 43], their predictive performance for 28-day mortality in patients with HE remains unclear. Hence, we constructed ROC curves in this study and found that the AUCs of the MELD and MELD-Na scores for 28-day mortality were lower than that of NNET, which suggests that our model was superior to these scores in terms of discrimination.
The reason for this may be that a few simple indicators (creatinine, bilirubin, and INR) alone cannot precisely assess the degree of cirrhosis [44, 45]. Moreover, a frequently reported drawback of the MELD score is that it lacks objective parameters reflecting the patient's physical and nutritional status, such as albumin [46].

This study had several strengths. First, multiple ML algorithms were used to build the predictive model. Second, the 8 variables selected for the final model are readily available in clinical practice, enabling the model to be implemented easily in the real world.

This study had several limitations. First, because the model was based on MIMIC-IV, which is a single-center database, it still needs to be externally validated in other datasets. Second, all variables were measured only once, early after ICU admission, without accounting for their dynamic changes. Third, because the model was derived from a retrospective cohort, it needs further prospective validation before being considered for clinical application. Lastly, the majority of the patients included in this study were White; therefore, these findings may not extrapolate to other populations, such as Asian populations.

Conclusions

In this study, we proposed an individualized ML-based predictive model for 28-day mortality in HE patients upon ICU admission. We demonstrated that APSIII, SOFA, mechanical ventilation, INR, TBIL, albumin, BUN and AKI are crucial for predicting 28-day mortality. The model had superior performance to the MELD and MELD-Na scores in predicting 28-day mortality. In the future, real-time prediction of mortality risk among HE patients might be realized, which, in turn, could help optimize treatment and improve clinical prognosis.