Introduction

Hyperkalemia is a common complication of chronic kidney disease (CKD). Its prevalence is approximately 9% among CKD patients overall and about one-third among non-dialysis CKD patients under nephrology care [1, 2]. Risk factors for hyperkalemia in CKD patients include congestive heart failure, diabetes, older age, a high-potassium diet, and medications such as renin-angiotensin-aldosterone system inhibitors and beta-blockers [3,4,5]. Hyperkalemia is associated with mortality, CKD progression, hospitalization, and high healthcare costs [1, 3].

Several studies have addressed the prediction of hyperkalemia. A claims-based study conducted in the U.S. successfully predicted hyperkalemia in CKD patients using logistic regression [6]. Several deep-learning models performed well in predicting hyperkalemia from electrocardiograms in CKD patients and in the emergency department [7, 8].

Recently, machine learning methods have been developed to handle complex, high-dimensional data and are increasingly applied in clinical medicine. The eXtreme Gradient Boosting (XGBoost) algorithm developed by Chen et al. [9], one of the state-of-the-art gradient boosting algorithms, has performed excellently in a number of medical problems [10,11,12]. In this study, we aimed to develop a machine learning model using the XGBoost algorithm to predict hyperkalemia in patients with advanced CKD at the outpatient clinic, and to compare its performance with that of conventional logistic regression models and two nephrologists.

Materials and methods

Data source

This retrospective study used data collected every 3 months through the pre-end-stage renal disease (pre-ESRD) program, which was initiated by Taiwan’s National Health Insurance Administration (NHIA) and is implemented in most hospitals in Taiwan to provide high-quality care for patients with CKD stages 3b, 4, and 5 [13]. We used data collected at a single medical center in central Taiwan from January 1, 2010, to December 31, 2020. This study was approved by the Institutional Review Board of Changhua Christian Hospital (IRB number-210423). All laboratory data were measured in a laboratory accredited by the College of American Pathologists’ Laboratory Accreditation Program.

Study population

Eligible patients were those who had at least two outpatient visits within three months between January 1, 2010, and December 31, 2020. We excluded patients aged ≤ 20 years and patients with estimated glomerular filtration rates (eGFRs) ≥ 30 mL/min/1.73 m², restricting the cohort to advanced CKD because advanced CKD patients with hyperkalemia have higher medical expenses and mortality rates [14, 15]. We also excluded patients without serum potassium values at the t-th or the t + 1-th clinic visit. The t-th clinic visit refers to the visit whose laboratory tests were used as inputs for developing the prediction models. We randomly divided the study participants into training (~ 75%) and testing (~ 25%) datasets by patient identification, so that no patient contributed data to both datasets.
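A patient-level split of this kind can be sketched in plain Python as below. This is a minimal illustration, not the study's code: the record layout (a hypothetical `patient_id` key), the random seed, and the exact 25% test fraction are assumptions. The point is that patients, not visits, are shuffled, so every visit of a given patient lands in the same dataset.

```python
import random
from collections import defaultdict


def split_by_patient(visits, test_fraction=0.25, seed=42):
    """Split visit records into training/testing sets at the patient level.

    `visits` is a list of dicts with a hypothetical "patient_id" key.
    All visits of one patient go to the same set, so no patient is in both.
    """
    by_patient = defaultdict(list)
    for v in visits:
        by_patient[v["patient_id"]].append(v)

    patient_ids = sorted(by_patient)      # sort first for reproducibility
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])

    train, test = [], []
    for pid, records in by_patient.items():
        (test if pid in test_ids else train).extend(records)
    return train, test


# Synthetic example: 8 patients with 3 visits each
visits = [{"patient_id": p, "visit": t} for p in range(8) for t in range(3)]
train, test = split_by_patient(visits)
train_ids = {v["patient_id"] for v in train}
test_ids = {v["patient_id"] for v in test}
print(len(train_ids), len(test_ids), train_ids & test_ids)
```

Splitting on patients rather than visits avoids leakage: repeated visits of one patient are strongly correlated, so a visit-level split would let the testing set contain near-copies of training data.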

Model development

Predictors

Figure 1 shows how the predictors used for model development and prediction were generated. The predictors consisted of demographics, laboratory tests, medical history based on ICD-9 and ICD-10 codes (Supplementary Table S1), and medications (Supplementary Table S2). Missing laboratory values were imputed using the K-Nearest Neighbors approach [16], separately for the training and testing sets.
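The following is a minimal pure-Python sketch of K-nearest-neighbors imputation, assuming each row holds one visit's laboratory values with `None` marking a missing entry. The distance (Euclidean over co-observed features, rescaled by the fraction observed) mimics scikit-learn's `nan_euclidean` metric, but `knn_impute`, `k`, and the toy values are illustrative assumptions; the study does not report its imputation settings. Calling the function once per dataset mirrors the separate imputation of the training and testing sets, so the two datasets never donate values to each other.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature in the k nearest rows.

    Distance: Euclidean over features observed in both rows, scaled up by
    (total features / shared features). Donor rows must have the target
    feature observed.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        sq = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(sq * len(a) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            imputed[i][j] = sum(v for _, v in donors) / len(donors) if donors else None
    return imputed


# Hypothetical columns: potassium (mEq/L), blood urea nitrogen (mg/dL)
train_lab = [[4.5, 60.0], [5.0, None], [4.8, 58.0], [5.9, 90.0]]
test_lab = [[5.2, None], [5.6, 80.0], [4.9, 70.0]]
train_filled = knn_impute(train_lab, k=2)   # training set imputed on its own
test_filled = knn_impute(test_lab, k=2)     # testing set imputed on its own
print(train_filled[1], test_filled[0])
```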

Fig. 1
figure 1

Model development and prediction of a single visit

The primary outcome was hyperkalemia (serum K > 5.5 mEq/L) at the t + 1-th visit; the models were built to predict whether or not it would occur.
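As an illustration of how each visit t can be paired with the same patient's next visit and labeled, here is a hedged sketch. The dictionary layout and the `potassium` key are hypothetical. The cutoff uses a strict inequality to match K > 5.5 mEq/L, so a value of exactly 5.5 is labeled negative, and pairs with a missing potassium value at either visit are dropped.

```python
HYPERKALEMIA_CUTOFF = 5.5  # mEq/L; strict ">" as in the outcome definition


def make_visit_pairs(visits):
    """Pair each visit t with the same patient's visit t + 1.

    `visits` maps a patient ID to a chronologically ordered list of dicts
    (hypothetical "potassium" key). Returns (record_at_t, label) tuples,
    where label = 1 if potassium at t + 1 exceeds the cutoff.
    """
    pairs = []
    for records in visits.values():
        for current, nxt in zip(records, records[1:]):
            if current.get("potassium") is None or nxt.get("potassium") is None:
                continue  # missing potassium at t or t + 1: pair excluded
            label = int(nxt["potassium"] > HYPERKALEMIA_CUTOFF)
            pairs.append((current, label))
    return pairs


visits = {
    "A": [{"potassium": 4.9}, {"potassium": 5.6}, {"potassium": 5.5}],
    "B": [{"potassium": 5.1}, {"potassium": None}, {"potassium": 6.0}],
}
pairs = make_visit_pairs(visits)
print([label for _, label in pairs])
```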

Prediction machine learning algorithms

We built a binary prediction model using XGBoost and used a grid search with tenfold cross-validation to find the best hyperparameters. XGBoost is an ensemble, decision-tree-based learning algorithm built on gradient boosting. The core concept of gradient boosting decision trees is to generate many weak classifiers iteratively and combine them into a strong classifier: each new decision tree is fitted sequentially to the errors of the trees built before it [17]. Other advantages of XGBoost include flexible hyperparameter tuning, regularization to control overfitting, and parallel computation to reduce processing time [9, 12].
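To make the boosting idea concrete, the toy below fits depth-one trees (stumps) on a single feature, each to the current residuals y - p, which are the negative gradient of the log-loss. It is plain first-order gradient boosting written for illustration, not XGBoost: XGBoost additionally uses second-order gradient information, regularized tree growth, and many other refinements. The feature values, `n_rounds`, and `learning_rate` are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(x, residuals):
    """Best single split on 1-D feature x by squared error (needs >= 2 distinct x)."""
    best = None
    for thr in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        if not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda xi: lv if xi <= thr else rv


def gradient_boost(x, y, n_rounds=50, learning_rate=0.3):
    """Each new stump fits the errors (y - p) left by the ensemble so far."""
    base = math.log(sum(y) / (len(y) - sum(y)))  # log-odds of the prevalence
    trees = []

    def raw_score(xi):
        return base + sum(learning_rate * tree(xi) for tree in trees)

    for _ in range(n_rounds):
        residuals = [yi - sigmoid(raw_score(xi)) for xi, yi in zip(x, y)]
        trees.append(fit_stump(x, residuals))
    return lambda xi: sigmoid(raw_score(xi))


# Synthetic data: potassium at visit t, hyperkalemia at visit t + 1
x = [4.0, 4.2, 4.5, 4.8, 5.0, 5.3, 5.6, 5.8, 6.0, 6.2]
y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
model = gradient_boost(x, y)
print(round(model(4.0), 3), round(model(6.2), 3))
```

A grid search would wrap a fit like this in cross-validation and compare settings such as tree depth, number of rounds, and learning rate.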

Human-machine competition

Two nephrologists participated in our study. They predicted whether or not hyperkalemia would occur in the t + 1-th clinic visit using the data of the t-th clinic visit. We assessed their performance using the testing dataset and compared their results with those of XGBoost and the logistic regression model.
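If the nephrologists gave yes/no predictions (the text does not say whether they gave probabilities), their confusion-matrix metrics can be computed as in this sketch, in which the function name and toy labels are assumptions. A binary rater yields a single ROC point, and the trapezoidal AUC through (0, 0), (1 - specificity, sensitivity), and (1, 1) reduces to (sensitivity + specificity) / 2, which is one common way to assign an AUC to a human reader.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for yes/no predictions (1 = hyperkalemia)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    fp = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
        # One ROC point joined to (0, 0) and (1, 1): trapezoid area = (sens + spec) / 2
        "auc_single_point": (sensitivity + specificity) / 2,
    }


# Toy example: true outcomes vs. a rater's yes/no calls
truth = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
calls = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
m = binary_metrics(truth, calls)
print(m)
```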

Statistical analyses

We compared baseline characteristics between training and testing datasets. Categorical variables were presented as proportions and continuous variables were presented as mean values with standard deviations. Numerical variables of clinical characteristics were compared using Student’s t-test. The chi-squared test was used to compare differences in categorical variables.
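The two baseline comparisons can be sketched with the standard library alone: a pooled-variance Student’s t statistic and a Pearson chi-squared test for a 2 × 2 table, using toy data. Obtaining a p-value for t requires the t distribution, for which a library such as SciPy (`scipy.stats.ttest_ind`) would normally be used; for a chi-squared statistic with one degree of freedom the upper tail equals `erfc(sqrt(chi2 / 2))`. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, whereas this sketch does not, and the study does not state which variant it used.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic and degrees of freedom


def chi2_2x2(table):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = [[a, b], [c, d]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # exact upper tail for 1 degree of freedom
    return chi2, p


t, df = student_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])   # e.g. a lab value by dataset
chi2, p = chi2_2x2([[10, 20], [20, 10]])               # e.g. a comorbidity by dataset
print(t, df, round(chi2, 3), round(p, 4))
```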

We conducted multivariable logistic regression analyses as a reference model. The overall performance of the models in the testing dataset was assessed by calculating the area under the receiver operating characteristic curve (AUC) and the associated 95% confidence interval (CI). The AUC values were compared using the DeLong test. The net benefit of the XGBoost model was assessed using decision curve analysis (DCA), and its clinical practicability was further assessed using clinical impact curves (CIC) [18, 19]. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated to evaluate model performance. Finally, we used the SHAP (SHapley Additive exPlanations) framework to evaluate the impact of features in our model [20].
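A compact pure-Python sketch of three of these quantities follows: the AUC with a DeLong-based 95% CI, the DeLong test for two correlated AUCs measured on the same patients, and the decision-curve net benefit, TP/n - FP/n × p_t/(1 - p_t) at threshold probability p_t. Function names and toy scores are assumptions; the study ran its analyses in Python and R and does not say which packages produced these statistics. The pairwise comparison is quadratic in sample size, which is fine for illustration, and the CI is simply clipped to [0, 1].

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 for ties."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _components(scores, labels):
    """AUC plus DeLong structural components V10 (per positive) and V01 (per negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def auc_with_ci(scores, labels, z=1.96):
    auc, v10, v01 = _components(scores, labels)
    se = math.sqrt(_cov(v10, v10) / len(v10) + _cov(v01, v01) / len(v01))
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two AUCs computed on the same cases."""
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(v10_a)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(v01_a))
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))


def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit when treating everyone with prob >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.35]   # toy predicted probabilities
model_b = [0.6, 0.3, 0.5, 0.4, 0.2, 0.55, 0.1, 0.45]
print(auc_with_ci(model_a, labels))
print(delong_test(model_a, model_b, labels))
print(net_benefit(model_a, labels, threshold=0.35))
```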

Machine learning algorithms and statistical analyses were performed using Python version 3.9.12, scikit-learn version 1.0.2, and R version 4.2.0.

Results

General demographics

A total of 1,526 patients (6,949 visits) in the training dataset and 439 patients (2,054 visits) in the testing dataset met our inclusion criteria (Fig. 2). Baseline patient characteristics are presented in Table 1. The mean patient age was 69.39 years and 49.2% were female. Patients in the testing dataset were older and more likely to be female and to have diabetes, cardiovascular disease, cancer, hypertension, hyperlipidemia, and dementia. Patients in the testing dataset also had a higher proportion of angiotensin receptor blocker prescriptions and a lower proportion of calcium polystyrene sulfonate use. The prevalence of hyperkalemia (K > 5.5 mEq/L) at the t + 1-th visit was 6.6% in the training dataset and 6.8% in the testing dataset.

Fig. 2
figure 2

Participant flow diagram. Abbreviations: OPD, outpatient department

Table 1 Baseline characteristics in the training and testing datasets

Development of the XGBoost model and the human-machine competition

The detailed results of the human-machine competition are shown in Table 2 and Fig. 3. In detecting hyperkalemia, the XGBoost model had the highest AUC, PPV, and accuracy in the human-machine competition. The AUC of the XGBoost model was significantly higher than those of the two nephrologists (0.867, 95% CI 0.840–0.894, vs. 0.745, 95% CI 0.704–0.789, and 0.741, 95% CI 0.700–0.783, respectively) but did not differ significantly from that of logistic regression. Based on DCA, the XGBoost and logistic regression models provided a greater net benefit than the two nephrologists (Fig. 4a). The clinical impact curve in Fig. 4b indicates that the XGBoost model also provided clinical net benefit across a wide range of threshold probabilities.

Fig. 3
figure 3

Receiver operating characteristic (ROC) curves for XGBoost, logistic regression models, and nephrologists in the testing dataset

Important features of the XGBoost model and results of multivariable logistic regression

As shown in Fig. 5, the top 5 features of the XGBoost model were, in order, the potassium level at the t-th visit, blood urea nitrogen, calcium polystyrene sulfonate use, angiotensin receptor blocker (ARB) use, and hemoglobin. Supplementary Table S3 shows the results of the univariable and multivariable logistic regression analyses. In the multivariable analysis, the top 5 significant variables for hyperkalemia by P value were the potassium level at the t-th visit (odds ratio [OR], 6.96; 95% CI, 6.05–8.02; P < 0.001), ARB use (OR, 1.40; 95% CI, 1.19–1.64; P < 0.001), hemoglobin (OR, 0.92; 95% CI, 0.87–0.97; P = 0.001), congestive heart failure (CHF) (OR, 0.8; 95% CI, 0.68–0.95; P = 0.008), and calcium polystyrene sulfonate use (OR, 1.29; 95% CI, 1.06–1.56; P = 0.009) (Table 3).
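Odds ratios like these relate to logistic-regression coefficients by OR = exp(β), with a Wald 95% CI of exp(β ± 1.96·SE). The sketch below shows that conversion and its inverse, using the reported potassium interval (6.05–8.02) purely as a worked example and assuming it is a Wald interval symmetric on the log scale. The back-calculated β ≈ 1.94 and SE ≈ 0.072 are derived from the rounded published values, not taken from the study's output, and the implied OR (about 6.97) differs from the reported 6.96 only by rounding.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta, se, z=Z95):
    """OR and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower, upper, z=Z95):
    """Recover beta and SE from a reported Wald CI (assumed symmetric on the log scale)."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


# Worked example with the reported potassium interval
beta, se = coef_from_ci(6.05, 8.02)
print(round(beta, 3), round(se, 4), [round(v, 2) for v in odds_ratio_ci(beta, se)])
```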

Fig. 4
figure 4

A. Decision curve analysis (DCA) of the XGBoost model, the logistic regression (LR) model, and the nephrologists. The XGBoost and LR models demonstrated a larger net benefit than the nephrologists across threshold probabilities. B. Clinical impact curve (CIC) of the XGBoost model

Fig. 5
figure 5

Top 5 important features of the XGBoost model by SHAP value

Table 2 Performance comparison between XGBoost model, logistic regression model, and nephrologists in the training and testing datasets
Table 3 The top 5 significant variables by P value in the multivariable logistic regression model

Discussion

In the present study, we developed an XGBoost model to predict hyperkalemia in advanced CKD patients using data from an outpatient clinic. The XGBoost model outperformed two nephrologists; however, the difference in AUC between the XGBoost and logistic regression models was not statistically significant.

The prevalence and incidence rates of ESRD in Taiwan are the highest in the world [21, 22]. In 2006, Taiwan’s NHIA launched the pre-ESRD program to reduce the burden of CKD; given this burden, nephrologists often need to attend to more than 20 CKD patients in a single clinic session. Clinical decision-making tools could help physicians in Taiwan make better decisions in caring for patients, especially when they see many CKD patients in one clinic. Hyperkalemia is a frequent complication of CKD because the diseased kidney has a limited ability to increase potassium excretion [4, 23]. In CKD patients, hyperkalemia is associated not only with muscle weakness and fatal arrhythmia but also with high insurance costs [1, 24]. Thus, we investigated whether the XGBoost model improved hyperkalemia prediction for CKD patients. In this study, the XGBoost model performed best in the human-machine competition on evaluation metrics such as the AUC, accuracy, NPV, and PPV. In addition, the XGBoost model had a higher net benefit than the logistic regression model, which may lead to better clinical outcomes [25].

XGBoost is an efficient and flexible gradient boosting algorithm that has made accurate predictions for clinical problems. XGBoost achieved high accuracy in predicting COVID-19 severity in the US, predicted kidney outcomes in immunoglobulin A nephropathy excellently, and outperformed other models in predicting 2-year dementia risk [11, 26, 27]. In this study, the XGBoost model performed best; however, the differences in evaluation metrics between the XGBoost and logistic regression models were not statistically significant. Evidence suggests that logistic regression is not inferior to machine learning for clinical prediction models [28]. One possible reason why machine learning does not perform better on clinical problems is that clinical prediction tasks often have a poor signal-to-noise ratio, low-dimensional data, and small sample sizes [28, 29].

Machine learning and logistic regression models usually rely on different variables with divergent rankings [29]. In addition, machine learning models are regarded as black boxes, so physicians may doubt their results [30]. In this study, we explored whether the XGBoost model relied on clinically reasonable variables. We used SHAP to visualize the five most important features in the XGBoost model and compared them with the top variables of the logistic regression model. Four variables ranked highly in both models: hemoglobin, the serum potassium value at the t-th visit, angiotensin receptor blocker use, and calcium polystyrene sulfonate use. A high potassium value at the t-th visit and calcium polystyrene sulfonate use both imply that the patient’s baseline potassium level is high. Angiotensin receptor blocker use can induce hyperkalemia by lowering the serum aldosterone level and decreasing renal blood flow [31, 32]. Lower hemoglobin levels were associated with hyperkalemia; possible underlying conditions include iron-deficiency anemia, sickle cell anemia [33], and gastrointestinal bleeding [34]. From these results, we believe that the XGBoost algorithm developed a reliable prediction model based on clinically meaningful variables.
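For readers unfamiliar with SHAP, the sketch below computes exact Shapley values for a toy risk function by enumerating feature coalitions, replacing absent features with a single baseline (reference) input. The model, the feature order (potassium, ARB use, hemoglobin), and all numbers are hypothetical. Tree-specific algorithms such as TreeSHAP compute comparable attributions for XGBoost models efficiently, and their conventions for absent features can differ from this single-baseline version. The values always sum to model(x) - model(baseline), and averaging their absolute values across patients gives a global feature ranking of the kind shown in Fig. 5.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline input.

    Features outside a coalition take their baseline value. Cost grows as
    2**n_features, so this only suits toy models.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with a potassium x ARB interaction."""
    k, arb, hb = z
    return 0.8 * k + 0.3 * arb + 0.2 * k * arb - 0.1 * hb


x = [5.8, 1.0, 9.0]          # hypothetical patient at visit t
baseline = [4.6, 0.0, 11.5]  # hypothetical reference values
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])
```

The interaction term is split between potassium and ARB use, while hemoglobin, which enters linearly, receives exactly its own contribution.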

A clinical decision support system with reasonable clinical performance could be built on this model to help physicians identify patients at high risk of hyperkalemia. The system would alert the CKD team to patients at risk of hyperkalemia so that the team can prescribe medications to prevent it and ask patients to return earlier for follow-up under multidisciplinary care. Nevertheless, the present study has some limitations. First, this is a single-center study without external validation, so the model may not be directly applicable to other hospitals. Second, the dataset did not include vital signs, blood gas data, oral sodium bicarbonate use, body weight, other nutritional parameters, lifestyle, or physical status, all of which may affect the potassium level. Third, the pre-ESRD program data in Taiwan were collected every 3 months, so we could not retrieve data from any patient encounters between two scheduled visits. Finally, the dataset contained missing values; thus, a prospective study with complete data collection is recommended to verify our findings.

In conclusion, the XGBoost model showed better predictive performance for hyperkalemia than nephrologists in an outpatient clinic. These results indicate that the model may serve as a decision-making tool to help physicians take better care of patients. Further prospective studies are needed to validate our findings.