Background

Acute kidney injury (AKI) is a serious complication after cardiac surgery with an incidence of 5–30% depending upon procedure type and definitions used [1,2,3,4,5]. It is associated with an increased rate of mortality, hospital length of stay, and healthcare cost [6, 7]. As the incidence of AKI is higher after cardiac surgery as compared to medical and noncardiac surgical populations [8], much research has been dedicated to the identification of modifiable risk factors and/or derivation of AKI risk prediction models in this group [9,10,11,12].

Recent research demonstrates that there is no standard approach to AKI prediction for patients undergoing cardiac surgery. Existing predictive models are based on different combinations of risk factors and rely heavily on intra- and post-operative events to achieve predictive accuracy [12, 13], whereas preoperative risk stratification, which is most important, remains challenging. In addition, most existing predictive models were developed to identify patients at risk of severe AKI requiring renal replacement therapy [5, 12], despite mild AKI being associated with up to a threefold increase in the risk of short- and long-term mortality after cardiac surgery [3, 14].

Renal function has long been held as a surrogate for systemic perfusion, and accurate preoperative prediction of AKI can help to identify patients who may benefit most from intensive monitoring and personalized management strategies throughout the perioperative period. With the advent of artificial intelligence (AI) in medicine, machine learning (ML) methods such as Random Forests have been applied successfully to create accurate and reliable predictive models in several fields of study [15, 16]. Moreover, hybrid ML algorithms offer improved performance [17], interpretability and ease of use, making the AI “explainable” to clinicians.

We performed a case study to: (1) derive and internally validate a preoperative model to predict AKI of any severity after cardiac surgery, using a hybrid ML approach consisting of Random Forests followed by high-performance logistic regression, and (2) compare the performance of this ML model with traditional and enhanced regression models. We hypothesized that the ML model would outperform traditional models, both in terms of performance and parsimony.

Methods

Design and selection criteria

The study protocol was approved by the University of Ottawa Heart Institute Research Ethics Board, which waived the requirement for individual patient consent. We conducted a retrospective study of adult patients (age ≥ 18 years) who underwent major cardiac surgery requiring cardiopulmonary bypass between November 1st, 2009 and March 31st, 2015 at the University of Ottawa Heart Institute. Patients who underwent off-pump or thoracic aortic procedures, cardiac transplantation and insertion of ventricular assist devices, as well as those who were dialysis-dependent at baseline, were excluded from the study.

Data sources

We performed a retrospective analysis of prospectively collected data from Cardiocore. Cardiocore is a multimodular data reservoir that captures detailed demographics, comorbidities, physiologic and procedural details, and perioperative outcomes for all patients who undergo cardiac procedures at the University of Ottawa Heart Institute, a university-affiliated tertiary cardiac care referral center that performs the full scope of cardiac procedures. It is formally managed by a multidisciplinary committee and undergoes regularly scheduled quality assurance audits [18].

Study outcome

Postoperative AKI was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria as a serum creatinine increase ≥ 26 μmol/l within 48 h following surgery or an increase of ≥ 50% from baseline within 7 postoperative days [19].
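The creatinine-based definition above can be expressed as a short rule. The following is a minimal sketch only: the function name, argument names, and the assumption that creatinine values arrive already grouped by time window are illustrative and are not taken from the study's code.

```python
def meets_creatinine_criteria(baseline, within_48h, within_7d):
    """Apply the creatinine criteria stated in the text (values in umol/l).

    baseline   -- preoperative serum creatinine
    within_48h -- postoperative values measured within 48 h of surgery
    within_7d  -- postoperative values measured within 7 postoperative days
    """
    # Absolute rise of at least 26 umol/l within 48 h
    absolute_rise = any(value - baseline >= 26 for value in within_48h)
    # Relative rise of at least 50% from baseline within 7 days
    relative_rise = any(value >= 1.5 * baseline for value in within_7d)
    return absolute_rise or relative_rise
```

For a hypothetical patient with a baseline of 80 μmol/l, a value of 125 μmol/l on postoperative day 5 satisfies the relative criterion even if no 48-h value rose by 26 μmol/l.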

Candidate variables

We included, a priori, preoperative factors known to be or that could be associated with cardiac surgery-associated AKI based on previous research (Additional file 1: Table S1). Demographic factors included: age [5, 20], sex [11], body mass index (BMI) [20, 21], smoking status [20], and alcoholism status. Preoperative patient characteristics included: estimated glomerular filtration rate (eGFR) [20, 22], preoperative anemia [20, 23], left ventricle ejection fraction [20], Cardiac Anesthesia Risk Evaluation (CARE) mortality risk score [24, 25], a history of atrial fibrillation [9], hypertension [20], coronary artery disease, Canadian Cardiovascular Society (CCS) grading of angina severity [26], recent myocardial infarction within 6 weeks prior to surgery, New York Heart Association (NYHA) Functional Class [13], right-sided heart failure, infective endocarditis [27], peripheral arterial disease [20], carotid disease [9], cerebrovascular disease related and unrelated to carotid disease [11], presence of residual neurologic deficit after stroke, seizure disorder, smoking, diabetes [5, 9, 20], preoperative cardiogenic shock [22], preoperative intra-aortic balloon pump therapy and cardiac arrest [5, 22, 27]. Procedure-related characteristics included: operative priority [22, 28], procedure type [13, 20, 22], and redo sternotomy [29].

Statistical analysis

We divided the cohort randomly into derivation (70%) and validation (30%) samples.
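A random 70/30 split of this kind can be sketched as follows. The function name, the use of patient identifiers as input, and the fixed seed are assumptions made for illustration; the study does not report how the randomization was implemented.

```python
import random


def split_cohort(patient_ids, derivation_fraction=0.7, seed=2015):
    """Randomly partition patient identifiers into derivation and validation sets."""
    shuffled = list(patient_ids)
    # A seeded generator makes the split reproducible (the seed is arbitrary)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Every patient lands in exactly one of the two sets, so the validation sample is never seen during model derivation.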

We created three AKI risk prediction models in the derivation sample: (1) a hybrid ML algorithm, consisting of Random Forests followed by high-performance logistic regression, (2) a traditional statistical model that employed backward variable selection, and (3) an enhanced statistical model that used 500 bootstrap samples for backward variable selection [30]. A data analysis and statistical plan was written and filed with a private entity (institutional review board) before data were accessed.

Derivation using a hybrid ML algorithm

Details of the Random Forests method have been described elsewhere [31,32,33]. In short, we used a bootstrap sample of the data to build each of the classification trees. A random subset of variables was selected at each split, thereby constructing a large collection of decision trees with controlled variation. The Random Forests trees are not pruned, so as to obtain low-bias trees (Additional file 2: Figure S1). Every tree in the forest casts a “vote” for the best classification for a given observation, and the class receiving the most votes becomes the prediction for that specific observation.
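The three ingredients described above (bootstrap sampling, a random subset of variables at each split, and majority voting) can be illustrated with a deliberately simplified toy. This sketch is not the “randomForest” implementation used in the study: each “tree” here is a single-split stump on binary variables rather than an unpruned tree, and all function names and default values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, candidate_features):
    """Choose the binary feature (among the candidates) with the fewest
    training errors; return (feature, label_if_0, label_if_1)."""
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for f in candidate_features:
        sides = {0: Counter(), 1: Counter()}
        for row, y in zip(rows, labels):
            sides[row[f]][y] += 1
        # Majority label on each side; use the overall majority if a side is empty
        pred = {v: (c.most_common(1)[0][0] if c else overall)
                for v, c in sides.items()}
        errors = sum(pred[row[f]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, pred[0], pred[1])
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_candidate_features=2, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n rows with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of variables made available for the split
        feats = rng.sample(range(p), n_candidate_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Each stump casts one vote; the class with the most votes wins."""
    votes = Counter(l1 if row[f] else l0 for f, l0, l1 in forest)
    return votes.most_common(1)[0][0]
```

An odd number of stumps is used so that a two-class vote cannot end in a tie.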

The derivation dataset was first sampled to create an in-bag partition (2/3 of the derivation sample) to construct the decision tree, and a smaller out-of-bag partition (1/3 of the derivation sample) to test the constructed tree and evaluate its performance by computing (Additional file 3: Figure S2): (1) misclassification error, (2) C-statistics and Hosmer–Lemeshow (H–L) p-value, and (3) model performance (i.e., sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV]). Then, we performed tenfold cross-validation to evaluate the model. The optimal number of trees and the subset of variables at each node were selected using the “tuneRF” function in R (version 3.2.3) to minimize the misclassification error. Random Forests calculates estimates of variable importance for classification using the permutation variable importance measure (VIM) [31], which is based on the decrease in classification accuracy when the values of a variable in a node of a tree are permuted randomly. In our cohort, the optimal misclassification rate was achieved by using 700 classification trees and 10 variables available for splitting at each tree node.
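The idea behind permutation-based variable importance can be sketched in a model-agnostic form: shuffle one variable, re-score the model, and record the loss in accuracy. This is an illustration under stated assumptions rather than the exact Random Forests procedure, which computes the measure tree by tree on out-of-bag observations; the function names and the `predict` callable are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after randomly permuting one column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only the chosen column shuffled
        permuted = [list(r[:feature]) + [v] + list(r[feature + 1:])
                    for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats
```

A variable the model never uses has an importance of exactly zero, whereas permuting an informative variable lowers accuracy.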

In this analysis, we converted all categorical variables into a set of binary variables indicating the absence or presence of a given categorical effect, both to reduce the computational complexity of tree creation and to mitigate the inherent bias of Random Forests, which favors categorical variables with multiple degrees of freedom [34]. We identified a subset of the top 30 predictor variables out of the 43 candidate variables and incorporated them into a high-performance logistic model (SAS 9.4, SAS Institute, USA) to identify the best parsimonious model [35]. We used the Schwarz Bayesian Criterion (SBC) as a penalized measure of fit for the logistic regression model to avoid over-fitting [36]. A model with a smaller SBC value is preferred over a model with a larger SBC value.
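For reference, the SBC is conventionally computed as −2 ln L + k ln n, where L is the maximized likelihood, k the number of estimated parameters, and n the number of observations. The sketch below uses that standard formula; the function name and the example values are hypothetical and are not results from this study.

```python
import math


def schwarz_bayesian_criterion(log_likelihood, n_parameters, n_observations):
    """SBC = -2 * ln(L) + k * ln(n); smaller values indicate a preferred model."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


# Hypothetical comparison: a 12-parameter model against a 30-parameter model
# whose log-likelihood is only slightly better.
sbc_small = schwarz_bayesian_criterion(-1000.0, 12, 4565)
sbc_large = schwarz_bayesian_criterion(-998.0, 30, 4565)
```

Because the penalty grows with the number of parameters, the smaller model is preferred in this hypothetical comparison despite its marginally lower likelihood.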

Derivation using traditional and enhanced statistical approaches

The traditional model employed logistic regression with an automated backward variable selection algorithm and generalized linear model. To prevent overfitting, the association of covariates with postoperative AKI had to have a significance level ≤ 0.001 to remain in the model [37].

The enhanced statistical approach employed backward variable selection for logistic regression models within 500 random bootstrap samples drawn with replacement from the original cohort [30], using a significance level ≤ 0.001 for backward stepwise selection to prevent overfitting [37]. We selected variables that were significant in predicting AKI in 50% or more of the bootstrap samples. We then averaged the regression coefficients for each variable across the 500 bootstrap samples.
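The aggregation step of this approach can be sketched as below, taking the per-sample selection results as given. Two points are assumptions made for illustration: the input format (one dictionary of retained variables and their coefficients per bootstrap sample) and the choice to average each coefficient over the samples in which the variable was retained, since the text does not specify how samples that dropped the variable were handled.

```python
def aggregate_bootstrap_models(bootstrap_coefficients, min_inclusion=0.5):
    """Keep variables retained in at least `min_inclusion` of the bootstrap
    samples and average their coefficients.

    bootstrap_coefficients -- list of {variable: coefficient} dicts, one per
    bootstrap sample, holding only the variables that survived selection.
    """
    n_samples = len(bootstrap_coefficients)
    totals, counts = {}, {}
    for coefficients in bootstrap_coefficients:
        for variable, beta in coefficients.items():
            totals[variable] = totals.get(variable, 0.0) + beta
            counts[variable] = counts.get(variable, 0) + 1
    # Assumption: average over the samples in which the variable was retained
    return {variable: totals[variable] / counts[variable]
            for variable in totals
            if counts[variable] / n_samples >= min_inclusion}
```

In the study's setting, the input list would hold 500 entries, one for each bootstrap sample.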

Point score assignment and internal validation

For each of the three models, we assigned integer scores to retained covariates using the method described by Sullivan et al. [38] (Additional file 4). We then assessed the discrimination (C statistics or AUC) and calibration (Hosmer–Lemeshow (H–L) goodness-of-fit test and a decile-decile calibration plot of the observed and predicted outcome) of each model using the validation datasets.
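The two validation measures can be sketched with their textbook definitions. The C statistic is the proportion of case/non-case pairs in which the case received the higher predicted risk (ties count one half), and the H–L statistic sums, over risk-ordered groups, the squared difference between observed and expected events divided by its binomial variance. Function names are hypothetical, and the grouping rule (equal-sized groups after sorting) is an assumption; statistical packages may handle ties and group boundaries differently.

```python
def c_statistic(predicted_risks, outcomes):
    """Probability that a random case is ranked above a random non-case."""
    cases = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    controls = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    concordant = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return concordant / (len(cases) * len(controls))


def hosmer_lemeshow_chi2(predicted_risks, outcomes, n_groups=10):
    """H-L chi-square over groups of increasing predicted risk.

    Assumes no group has an expected count of exactly 0 or its full size.
    """
    pairs = sorted(zip(predicted_risks, outcomes))
    n = len(pairs)
    chi2 = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        expected = sum(p for p, _ in group)
        observed = sum(y for _, y in group)
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / len(group)))
    return chi2
```

The H–L p-value is then obtained from a chi-square distribution, conventionally with the number of groups minus two degrees of freedom; that step is omitted here.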

The Random Forests analyses were performed in R statistical software (version 3.2.3) using the “randomForest” package [32]. All methods were performed in accordance with the international guidelines for developing and reporting predictive models in biomedical research. The traditional and enhanced statistical models, as well as point score assignment and internal validation, were performed using SAS 9.4 (SAS Institute, USA).

Results

Of 6522 patients who met the selection criteria, 1760 (27.0%) developed AKI within 7 days of surgery. The baseline characteristics of patients with and without postoperative AKI are reported in Additional file 5: Table S2. These baseline characteristics were similarly distributed across the derivation and validation datasets (Additional file 6: Table S3). Compared to those without AKI, patients who developed AKI were more likely to have undergone complex, emergent surgery, to have higher overall preoperative risk (CARE score ≥ 3), and to have a history of atrial fibrillation, cerebrovascular disease, anemia, and endocarditis.

The crude and adjusted odds ratios representing the relationship between candidate risk factors and AKI are presented in Additional file 7: Table S4.

Hybrid ML algorithm

The accuracy of the Random Forests model was 92.8% in the derivation sample, and 75.5% after tenfold cross-validation. The resulting top 30 predictor variables are summarized in Fig. 1.

Fig. 1 Description of the top 30 variables for prediction of AKI after cardiac surgery. Abbreviations: CCS class, Canadian Cardiovascular Society (CCS) grading of angina severity; Recent MI, Recent MI within 30 days of surgery; NYHA class, New York Heart Association Function class; BMI, body mass index; CARE score, Cardiac Anesthesia Risk Evaluation (CARE) mortality risk score; CABG, Coronary artery bypass grafting; GFR, glomerular filtration rate

After applying high-performance logistic regression to achieve parsimony, the final ML model consisted of 12 variables, including: CARE score (2–4), BMI, hypertension, atrial fibrillation, NYHA Class 3, left ventricle ejection fraction < 35%, anemia, emergent operative status, redo sternotomy, combined CABG/valve surgery, former smoker, and preoperative intra-aortic balloon pump use (Table 1).

Table 1 The risk model of AKI derived through a hybrid Machine Learning approach

The model performance in the derivation sample is presented in Table 2.

Table 2 Performance of the risk models in the derivation dataset

The mean total risk score was 10.16 (SD = 5.54) across retained covariates. The total risk score was strongly associated with postoperative AKI (OR = 1.20, 95% CI 1.18–1.22) in univariate logistic regression. The predicted probability threshold with the optimal operating characteristics (defined by the smallest squared distance between the point (0, 1) in the upper left-hand corner of ROC space and any point on the ROC curve) [39] was a predicted risk of 3% (sensitivity, 67.1%; specificity, 94.1%; PPV, 50.2%; NPV, 87.6%). Using a predictive probability of 50% yielded the following results: sensitivity, 31.2%; specificity, 94.4%; PPV, 71.1%; and NPV, 78.6%. The risk prediction model remained robust after internal validation (AUC = 0.75; H–L χ2 = 5.34, p = 0.804) (Additional file 8: Figure S3).
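The threshold criterion referenced above, choosing the ROC point closest to the upper left-hand corner (0, 1), can be sketched as follows. Function names and the rule of classifying a patient as positive when the predicted risk is at or above the threshold are assumptions for illustration, and the sketch assumes both outcome classes are present.

```python
def confusion_metrics(predicted_risks, outcomes, threshold):
    """Sensitivity, specificity, PPV and NPV at a given risk threshold."""
    tp = sum(p >= threshold and y == 1 for p, y in zip(predicted_risks, outcomes))
    fp = sum(p >= threshold and y == 0 for p, y in zip(predicted_risks, outcomes))
    fn = sum(p < threshold and y == 1 for p, y in zip(predicted_risks, outcomes))
    tn = sum(p < threshold and y == 0 for p, y in zip(predicted_risks, outcomes))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else None,
        "npv": tn / (tn + fn) if tn + fn else None,
    }


def closest_to_top_left(predicted_risks, outcomes):
    """Threshold whose ROC point (1 - specificity, sensitivity) has the
    smallest squared distance to the corner (0, 1)."""
    best = None
    for threshold in sorted(set(predicted_risks)):
        m = confusion_metrics(predicted_risks, outcomes, threshold)
        d2 = (1 - m["specificity"]) ** 2 + (1 - m["sensitivity"]) ** 2
        if best is None or d2 < best[0]:
            best = (d2, threshold)
    return best[1]
```

The same `confusion_metrics` function can be evaluated at any fixed cut-off, such as a predicted probability of 50%, to compare operating characteristics across thresholds.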

Traditional statistical model

The final traditional model consisted of six predictor variables: CARE score, heart failure (HF), anemia, smoking, BMI, and redo sternotomy (Table 3).

Table 3 The “traditional” risk model of AKI derived through logistic regression with automated backward variable selection

The mean total risk score was 8.67 (SD = 16.86). The total risk score was significantly associated with postoperative AKI (OR = 1.04, 95% CI 1.03–1.05). The model performance in the derivation sample is presented in Table 2. The predicted probability threshold with the optimal operating characteristics [39] was a predicted risk of 2% (sensitivity, 62.2%; specificity, 65.8%; PPV, 40.9%; and NPV, 82.1%). Using a predictive probability of 50% yielded the following results: sensitivity, 12.9%; specificity, 95.7%; PPV, 56.2%; and NPV, 73.6%. In the validation sample, the point score model was modestly discriminative (AUC = 0.70) but poorly calibrated (H–L χ2 = 20.32, p < 0.001) (Additional file 9: Figure S4).

Enhanced statistical model using bootstrapping methods

The final enhanced model consisted of 10 predictor variables, including: CARE score, hypertension, atrial fibrillation, HF, smoking status, BMI, surgery type, redo sternotomy, and preoperative intra-aortic balloon pump use (Table 4).

Table 4 The “enhanced” risk model of AKI derived through logistic regression with backward stepwise variable selection using 500 bootstrap samples

The mean total risk score was 11.16 (SD = 15.24). The total risk score was significantly associated with AKI (OR = 1.16, 95% CI 1.14–1.17). The model performance in the derivation sample is presented in Table 2. The predicted probability threshold with the optimal operating characteristics [39] was a predicted risk of 2% (sensitivity, 66.3%; specificity, 79.1%; PPV, 47.5%; and NPV, 84.4%). Using a predictive probability of 50% yielded the following results: sensitivity, 24.3%; specificity, 96.4%; PPV, 66.3%; and NPV, 76.6%. The risk prediction model remained robust after internal validation (AUC = 0.74; H–L χ2 = 8.9442, p = 0.347) (Additional file 10: Figure S5).

Discussion

To our knowledge, this study is the first to date that uses a hybrid ML approach to derive and validate a model to predict cardiac surgery-associated AKI of any severity, using only preoperative variables. Our findings suggest that a hybrid ML algorithm predicts better, and is computationally more efficient, than traditional and enhanced techniques for risk modeling.

Previous research has shown that the use of automated variable selection methods could result in the selection of non-reproducible sets of independent variables, thus biasing the estimated regression coefficients [40]. Because of this, the use of backward variable selection in repeated bootstrap samples would likely result in improved estimation of regression coefficients with narrower confidence intervals [30]. Our hybrid ML approach benefits from its ability to accommodate inter-correlation between multiple explanatory variables and to protect against over-fitting the data [15], and thus outperforms both traditional and enhanced regression models.

Several cardiac surgery-associated AKI risk models have been proposed to date, with the models predicting renal replacement therapy being most robust [9,10,11]. Despite the clinical importance of renal replacement therapy, its low incidence rate (2–3%), late occurrence [41], and end-stage physiology limit the practical benefit of these risk models. In contrast, mild AKI is very common (pooled incidence rate of 22.3%) [42] and contributes to considerable perioperative and long-term morbidity and mortality [14]. The kidneys are sensitive to unfavorable physiologic processes in the setting of cardiac surgery, which include hypotension, low cardiac output syndrome, systemic inflammation resulting from the mechanical trauma of extracorporeal red blood cells coming into contact with artificial surfaces [43, 44], as well as the catecholamine surge, decreased vasomotor reactivity and the mismatch of medullary blood flow and renal oxygen consumption that occur during the post-bypass period. Taken together, accurate preoperative prediction of AKI of any severity, prior to exposure to intra- and post-operative stresses, affords clinicians the greatest window of opportunity to proactively intensify physiologic monitoring and to personalize fluid management and hemodynamic goals in order to optimize systemic and renal perfusion in at-risk patients [18].

We used KDIGO to define AKI [19], which enables standardization of reporting and compatibility with similar studies. Our high quality, comprehensive clinical databases provided a large number of standardized candidate variables for ML and statistical modeling. Our ML risk model contains 11 variables that are etiologically associated with AKI after cardiac surgery [12]. We found that our ML model was more accurate than the traditional and enhanced statistical models (AUC = 0.75 vs. 0.70 and 0.73, respectively).

In addition, the ML and enhanced statistical models were well calibrated, while the traditional statistical model was not. From a practical perspective, the ML model was more computationally efficient than the enhanced backward selection algorithm using 500 bootstrap samples. Our findings are consistent with the literature, where recent medical applications of ML have shown a high degree of accuracy in predicting various outcomes across a spectrum of clinical settings and diseases [45, 46].

Few published studies to date have predicted cardiac surgery-associated AKI of any severity. Our ML risk model had a higher predictive ability and was more parsimonious (AUC = 0.75, H–L p = 0.804) than a recent preoperative model for cardiac surgery-associated AKI of any severity (AUC = 0.73, H–L p = 0.490) [20], which was derived using a traditional statistical approach and consisted of 15 risk factors. This model was developed using prospectively collected data from over 30,000 subjects undergoing cardiac surgery at three hospitals in the UK and was externally validated. Our ML model also had similar predictive accuracy and better calibration compared to another contemporary preoperative risk score [22] for any-stage AKI, consisting of 10 risk factors (AUC = 0.77, H–L p = 0.06), that was derived using bootstrapping methods and was validated internally. Of note, in the latter model AKI was defined as that occurring within 30 days of cardiac surgery. This definition likely captures events occurring during surgical readmissions or during complicated and prolonged postoperative stays. These events may be unrelated to the index surgery and may thus be impractical for informing preventative therapy in the intraoperative setting.

Two other published risk models for predicting AKI of any severity after cardiac surgery combined various pre-, intra- and postoperative factors [13, 47]. These studies demonstrate that the addition of perioperative factors could improve model performance (AUC = 0.84, and AUC = 0.81, respectively). Further research could investigate the additive predictive value of key perioperative variables, such as hypotension and low cardiac output, to produce “staged models”. Such models would inform preoperative AKI risk stratification for the planning and personalization of pre- and intraoperative management, and would enhance prognostication based on intra- and post-operative events.

Clinical prediction models and associated risk-scoring systems are popular statistical methods as they permit a rapid assessment of patient risk without the use of computers or other electronic devices [48]. The additive point score assigned to each predictor in the developed models to predict AKI of any severity was derived from well-fit logistic regression models, and can readily be applied at the bedside. These validated scores to predict AKI of any severity following cardiac surgery will aid in clinical decision-making, patient counseling and informed decision-making, resource utilization, and preoperative medical optimization [12]. Future research is recommended to prospectively assess the efficacy of these models in preventing perioperative AKI by enhancing personalized fluid and hemodynamic management and by minimizing exposure to nephrotoxins.

Our findings should be interpreted in light of several limitations. First, our study was conducted in the setting of a single tertiary care hospital. Therefore, our ML model needs to be externally validated before it can confidently be used at other institutions and in other geographic regions. Second, a relatively small number of covariates was included in this study. The performance of the Random Forests approach may be improved in the presence of a larger distribution of covariates [49]. Third, our risk model is tailored to patients undergoing procedures involving cardiopulmonary bypass and may not be applicable in the setting of off-pump CABG [50]. Fourth, we did not incorporate urine output criteria in identifying patients with AKI, because this information was not available in our databases. Finally, unmeasured confounding characteristics are an important consideration in any retrospective analysis.

Conclusions

In summary, we derived and internally validated an accurate and well-calibrated preoperative risk model for cardiac surgery-associated AKI of any severity. We found in this study that risk modeling using a hybrid ML approach led to better model performance than parametric statistical approaches, without sacrifice of computational efficiency. Further studies are needed to externally validate this model, as well as to derive and validate staged models to better inform management and prognostication.