Acute kidney injury (AKI) is common in the intensive care unit (ICU), and there is evidence that even a small increase in serum creatinine may be associated with increased risk of mortality [1,2,3]. AKI can be defined by either an elevation in serum creatinine or a reduced urine output according to the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines [4]. Oliguric AKI constitutes a substantial proportion of the overall AKI population, and it imposes a great challenge for fluid management. Pathophysiologically, oliguria may represent an adaptive response in AKI, and once an effective circulatory volume is restored by positive fluid balance, urine output would improve. Under this circumstance, fluid administration or positive fluid balance can be considered beneficial and those who improve with more fluid can be considered as having volume-responsive (VR) AKI.

Intravenous fluid challenges are often used in critical care to restore blood pressure to improve urine output in patients with hypotension and oliguria, respectively [5, 6]. Recent epidemiological evidence, however, suggested that large positive fluid balance may not be useful to improve urine output in many patients with AKI and can even be harmful in worsening renal function through a number of possible mechanisms including excessive kidney edema [7]. This latter type of AKI can be considered as volume-unresponsive (VU) AKI [8]. Because early improvement in organ dysfunction is associated with an improved survival [9], it would be best if we can adjust the fluid treatment strategy by identifying which patients are having VR or VU AKI.

In animal models, fractional excretion of electrolytes was found to perform well in early differentiation between VR- and VU-AKI [10]. However, these promising results could not be replicated in human studies. Legrand et al. investigated the discrimination of urinary sodium to distinguish volume responsiveness, and it was found to have limited predictive value [11]. Currently, there is little clinical information on how we can identify patients with AKI who are VR and VU in terms of urinary output response. We hypothesized that advanced machine learning techniques may be useful to identify the most important clinical factors that can differentiate between patients with VR and VU AKI. In this study, we aimed to use machine learning techniques to develop and validate an AKI fluid-responsiveness model, called extreme gradient boosting (XGBoost), and compared the performance of this model to a conventional logistic regression model.


Source of data

A large US-based critical care database named Medical Information Mart for Intensive Care (MIMIC-III) was analyzed [12]. The MIMIC-III database is an integrated, de-identified, comprehensive clinical dataset containing all the patients admitted to the ICUs of Beth Israel Deaconess Medical Center in Boston, MA, from June 1, 2001, to October 31, 2012. There were 53,423 distinct hospital admissions for adult patients (aged 16 years or above) admitted to the ICUs during the study period. Since the study was an analysis of a third-party anonymized publicly available database with pre-existing institutional review board (IRB) approval, IRB approval from our institution was exempted. The study was reported according to the recommendations of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [13].


Patient eligibility was considered when urine output was less than 0.5 ml/kg/h for the first 6 h after ICU admission. This definition was consistent with the urine output component of the KDIGO criteria [4]. To examine the impact of fluid resuscitation on subsequent urine output response, only patients with substantial fluid intake (> 5 l) within 6 h following the initial 6 h for assessing oliguria were eligible (Fig. 1). Fluid intake and urine output were extracted from the nursing chart system, which more accurately reflected the actual volume intake or output than the medical order system. Patients who left ICU during the observation period were excluded. Furthermore, patients receiving any diuretics and/or renal replacement therapy (RRT) on day 1 were excluded.

Fig. 1
figure 1

Schematic illustration of the time windows for the definition of oliguria and fluid responsiveness. Oliguria was defined as urine output < 0.5 ml/kg/h for the first 6 h after ICU admission. Fluid responsiveness was defined as urine output > 0.65 ml/kg/h for 18 h after initiation of fluid loading. It is noted that the time window for the definition of oliguria preceded the exposure of fluid input

Outcome (volume responsiveness)

The urine output within an 18-h period following the initial 6 h for defining oliguria was used as the outcome. Patients were considered as VR-AKI if he/she had urine output greater than 0.65 ml/kg/h, corresponding to a 30% increase as compared with the baseline value. Otherwise, they were defined as VU-AKI.

Predictors of VR-AKI

Routinely collected clinical and laboratory variables obtained within the first 6 h of ICU admission were assessed for their ability to predict volume responsiveness. For some variables with multiple measurements, both the maximum and minimum values were assessed. Age, gender, ethnicity, admission type, elective surgery, type of ICU and presence of infection, and vital signs including respiratory rate, blood pressure, heart rate, and temperature were analyzed. In addition, laboratory data including glucose, white blood cell count (WBC), hematocrit, chloride, potassium, sodium, lactate, creatinine, blood urea nitrogen (BUN), coagulation profile, PaO2, PaCO2, and pH were included.

Because this was a hypothesis-generating epidemiological study, no attempt was made to estimate the sample size of the study; instead, all eligible patients in the database were included to maximize the statistical power of the predictive model. Because missing data may create bias, variables with > 70% missing values were excluded from further analysis. Other variables with a lesser degree of missing values were analyzed using multiple imputation method [14].

Statistical analysis

Clinical characteristics between VR-AKI and VU-AKI groups were compared using either Student t test or rank-sum test as appropriate. Chi-square test or Fisher’s exact test was employed to compare the differences of the categorical variables [15]. A stepwise logistic regression model was used to select variables which were predictive of volume responsiveness. Both forward selection and backward elimination were used, testing at each step for variables to be included or excluded. Akaike Information Criterion (AIC) was used as the selection criteria to eliminate the predictors [16].

Extreme gradient boosting (XGBoost) combined with decision trees was employed to predict VR versus VU. A classification tree was used as the weak learner, and the learning objective function was binary logistic. The boosting method works by iteratively refitting a weak classifier (decision tree) to residuals of previous models. Each successive classifier focused more on misclassified observations during the previous round of fitting [17]. In this study, we employed 300 rounds of iterations for cross-validation process, which were expected to result in a powerful ensemble classifier with superior predictive accuracy. Overfitting can be a major problem in using machine learning techniques. The ability to understand the complex relationship in data while avoiding overfitting requires fine-tuned hyperparameters. XGBoost hyperparameters included learning rate, minimum loss reduction required to make a further partition on a leaf node of the tree, maximum depth of a tree, subsample ratio of the training instance. The original dataset was randomly partitioned into 5 equal-sized subsamples for bootstrap validation (BV). Specifically, 4 subsamples were used to train the model, which was then validated in the remaining 1 subsample. Hyperparameters were considered to be sufficiently tuned if (1) the BV training log-loss decreased as the number of trees increased and (2) BV testing log-loss was less than 0.693 and only slightly greater than the training log-loss (e.g., a log-loss of 0.693 is the performance of a binary classifier that performs no better than chance: − log 0.5 ≈ 0.693). We used a loop function (grid search) to select the hyperparameters that the minimum training log-loss should be greater than the 85th percentile, and the minimum testing log-loss should be less than the 8th percentile. After choosing the hyperparameters, the BV process was run for 100 times to determine the number of trees required for the final model. The number of trees in the final XGBoost model was determined by the minimum BV testing log-loss in each of the last 100 BV iterations and computed the 5th percentile of that distribution. This was a conservative approach to the selection of the number of trees, which was also described in the literature [18].



Of the 10,795 patients with urine output < 0.5 ml/kg/h for the first 6 h after ICU admission, 7491 patients (69.4%) received fluid intake > 5 l within the following 6 h. A number of 809 patients were excluded because they received diuretics and/or RRT on the first day. A total of 6682 patients were included in our analysis; 2456 patients had VR-AKI, and 4226 patients had VU-AKI on day 1 in ICU (Fig. 2).

Fig. 2
figure 2

Flow chart of patient selection

The differences in characteristics between VR and VU groups are described in Table 1. VR group had more patients of elective surgery prior to ICU admission than the VU group (18.2% vs. 13.6%; p < 0.001). The maximum serum creatinine concentration (1.74 ± 1.50 vs. 1.26 ± 0.98 μmol/l; p < 0.001) was higher, and the minimum bicarbonate concentration (20.82 ± 5.41 vs. 22.24 ± 4.32 mmol/l; p < 0.001) and maximum hematocrit (35.14 ± 5.69 vs. 35.93 ± 5.79%; p < 0.001) were lower in the VU group. The VR group had higher urinary pH (5.78 ± 0.86 vs. 5.66 ± 0.79; p < 0.001), lower urinary creatinine (111.61 ± 73.01 vs. 132.51 ± 79.89 mg/dl; p < 0.001), higher systolic blood pressure (89.16 ± 16.46 vs. 85.09 ± 18.03 mmHg; p < 0.001), higher albumin (3.03 ± 0.63 vs. 2.83 ± 0.66; p < 0.001), lower rate of mechanical ventilation (31.3% vs. 42.7%; p < 0.001), vasopressor use (22.8% vs. 30.0%; p < 0.001), and infection (47.6% vs. 62.4%; p < 0.001) than the VU group (Table 1).

Table 1 Characteristics between fluid responsive and non-responsive groups

The stepwise logistic regression model

The results of stepwise logistic regression model are shown in Table 2. As expected, advanced age (odds ratio [OR] for a 20-year increase, 0.69; 95% confidence interval [CI], 0.63 to 0.75), a higher serum creatinine (OR for each 0.1 mg/dl increase, 0.98; 95% CI, 0.97 to 0.99), and lactate (OR for each 1-mmol/l increase, 0.91; 95% CI, 0.88 to 0.94) were associated with decreased probability of volume responsiveness. On the contrary, a greater value of albumin (OR for each 1-g/dl increase, 1.53; 95% CI, 1.38 to 1.71), systolic BP (OR for each 20-mmHg increase, 1.13; 95% CI, 1.04 to 1.23), and minimum bicarbonate (OR for each 5-mmol/l increase, 1.12; 95% CI, 1.02 to 1.22) were associated with increased probability of volume responsiveness (Table 2).

Table 2 Multivariable logistic regression model with stepwise variable selection

The XGBoost model

The hyperparameters used in our analysis were as follows (determine by grid search): learning rate = 0.04, minimum loss reduction = 10, maximum tree depth = 9, subsample = 0.6, and number of trees = 300. With these hyperparameters, bootstrap validation (BV) training log-loss decreases as the number of trees in an ensemble increases, and the BV testing log-loss was less than 0.693 and only slightly more than BV training log-loss as the tree grows (Fig. 3). Feature importance was calculated by the sum of the decrease in error when split by a variable, which reflects the contribution each variable makes in classifying VR versus VU. The urinary creatinine was the most important variable to distinguish VR and VU group, followed by maximum BUN, age, albumin, and maximum temperature (Fig. 4).

Fig. 3
figure 3

Training process of the extreme gradient boosting machine. Sample output of bootstrap validation (BV) during XGBoost hyperparameter tuning, using the values specified in the final XGBoost model (learning rate = 0.04, minimum loss reduction = 10, maximum tree depth = 9, subsample = 0.6, and number of trees = 300). Log-loss value for the training and testing datasets is shown in the vertical axis. The dashed vertical line indicates the number of rounds with the minimum log-loss in the test sample. The conditions of well-tuned model were satisfied: BV training log-loss decreases as the number of trees in an ensemble increases, and BV testing log-loss is less than 0.693 (e.g., a log-loss of 0.693 is the performance of a binary classifier that performs no better than chance: − log 0.5 ≈ 0.693) and only slightly more than BV training log-loss as the tree grows.

Fig. 4
figure 4

Feature importance derived from XGBoost model. Abbreviations and annotations: creat.u, urinary creatinine; bun_max, maximum blood urea nitrogen; creatmax0d, maximum creatinine on the day of ICU admission; diasbp_min, minimum diastolic blood pressure; inr_max, maximum international normalized ratio; heartrate_max, maximum heart rate; sysbp_min, minimum systolic blood pressure; first_careunitCSRU, first care unit is cardiac surgery recovery unit; mech_vent, mechanical ventilation; ph.u, urinary pH; TSICU, trauma-neuro surgical ICU; vaso, vasopressor

Model performance

Model discrimination was assessed using the area under receiver operating characteristic curve (AU-ROC). The XGBoost had a significantly greater AU-ROC than the logistic regression model (AU-ROC, 0.860; 95% CI, 0.842 to 0.878 vs. 0.728; 95% CI, 0.703 to 0.753, respectively; Fig. 5). Table 3 describes the classification or confusion matrix for the two models in identifying the VR and VU status.

Fig. 5
figure 5

Receiver operating characteristic curve for estimating the discrimination of the logistic regression model and XGBoost model

Table 3 Classification matrix for the XGBoost and logistic regression models in the out-of-sample validation cohort


In this hypothesis-generating study, we showed that some clinical factors were more likely to be associated with VR-AKI than VU-AKI. Using advanced machine learning techniques, we could identify some important clinical factors associated with VR-AKI such as age, urinary creatinine concentration, maximum BUN concentration, and albumin. These results have some implications and require further consideration.

First, an ability to accurately identify volume responsiveness in critically ill patients with AKI is clinically important to avoid both hypervolemia and hypovolemia. Currently, there is a lack of a reliable tool to distinguish between VR and VU AKI at an early stage. In this study, we showed that sophisticated machine learning techniques such as the XGBoost modeling can enrich the amount of information we can obtain from analyzing a database and allow us to develop and validate a better-performing predictive model compared to the conventional logistic regression technique. The potential usefulness of the model is that it can help to stratify oliguria patients immediately after ICU admission. As a result, large volume fluid can be more accurately given to patients who are very likely to respond to fluid challenge. There is evidence that fluid overload can result in organ dysfunctions, prolonged mechanical ventilation, and even death [19,20,21]. Thus, it is of vital importance to identify patients who will benefit from fluid resuscitation. However, the present study cannot provide a higher level of evidence on the effectiveness of the XGBoosting model. Future randomized controlled trials comparing the treatments with and without the prediction model are warranted to explore the effectiveness.

Second, our results showed that urinary creatinine was potentially useful to differentiate between patients in AKI who were VR and VU. Probably, patients with higher serum creatinine may also have higher excretion of creatinine to the urine. Since the former is a biomarker of kidney injury (e.g., higher serum creatinine was associated with higher risk of intrinsic injury), the latter is also associated with increased risk of VU-AKI. The utility of urinary biochemistry to predict AKI outcome has been controversial in the literature. Although urinary biomarkers such as creatinine and fractional excretion of electrolytes were significantly different between VR and VU groups in some animal and human studies [10], there are also studies showing that urinary biochemistry may not be useful in differentiating between VR and VU AKI [22,23,24,25]. In our study, we could not analyze the ability of urinary sodium and potassium to differentiate between VR and VU AKI because a large proportion (> 70%) of patients did not have this data.

Third, we found that patients with AKI and oliguria after elective surgery were more likely to respond to fluid challenges in univariate analysis. Patients who underwent elective surgery are generally in better clinical condition than patients requiring urgent surgery or with a medical emergency. Postoperative oliguria can be explained by hypovolemia due to intraoperative and postoperative insensible fluid loss. As such, they will be more likely to benefit from a larger amount of fluid after major surgery. However, the association of elective surgery with volume responsiveness disappeared after adjusting for other physiological variables, indicating that hypovolemia can be represented by these variables such as systolic blood pressure, heart rate, and hematocrit. We found that an increased hematocrit within the first 24 h of ICU admission was also an independent predictor of VR (in both the logistic and XGBoost models). This result could be explained by the fact that hematocrit has a direct relationship with the intravascular plasma volume and a higher hematocrit may suggest a relative hypovolemic state [26]. Conversely, a higher serum creatinine on ICU admission might indicate established renal intrinsic damage, which is more likely to be unresponsive to fluid challenges.

This study has some strengths and weaknesses. The XGBoost modeling is a novel technique that has not been widely adopted in critical care research. The XGBoost algorithm has been successfully used in some complex scenarios such as the prediction of the failure of the treatment for parapneumonic empyema [18], in which the predictive accuracy of the XGBoost model was significantly better than a generalized linear model. This is not surprising because the XGBoost model is an ensemble of weak prediction trees, which is able to capture complex relationships in data without the need for high-order interactions and non-linear functions to be explicitly specified [27]. Furthermore, this technique is well designed to prevent overfitting by cross-validation and regularization [17]. Our results suggest that this technique has the potential to improve the power of critical care epidemiological studies in the future. Nonetheless, this was a hypothesis-generating study, and external validation of our model is essential to confirm its utility. A limitation of this study is that we did not have data on the indications for large volume resuscitation. The study was not a designed clinical trial that the indications for large volume loading could be prespecified. However, we have randomly selected 30 cases and found that patients receiving > 5 l fluid during a 6-h period were those with indications for fluid loading in order to increase the urine output. Furthermore, the overall population had low blood pressure (mean SBP, 85 mmHg), elevated heart rate (mean, > 100/min), and lactate (mean, 3 mmol/l) in our study, which were consistent with the indications for fluid loading (Table 1). Clinicians might have given their patients fluid for a variety of different reasons such as hypotension or elevated heart rate in addition to any attempt to improve oliguria. The study only explored the short-term effect of large volume intake, other long-term outcomes such as persistent AKI, organ-failure free days, and mortality were not investigated.


In conclusion, this hypothesis-generating study showed that some clinical factors were more likely to be associated with VR-AKI than VU-AKI. The XGBoost modeling technique could identify the predictors of VR-AKI that were not apparent using logistic regression, resulting in a better-performing predictive model to identify patients with VR-AKI. Further epidemiological studies using advanced machine learning techniques to validate our results will help us to identify the most suitable patients to be included in clinical trials assessing the benefits of fluid therapy in AKI.