Background

The global impact of stroke is substantial: it ranks second among causes of mortality and third among causes of disability, with an estimated annual cost exceeding US$891 billion worldwide [1, 2]. Notably, ischemic strokes constitute over 60% of all stroke events [3]. Renal impairment is a critical adverse complication in patients with acute ischemic stroke (AIS), often induced by factors such as mechanical thrombectomy, and it increases the risk of mortality [4,5,6]. Existing research has primarily focused on acute kidney injury (AKI) and chronic kidney disease (CKD), with few reports addressing the renal function trajectory during the 7–90 days following kidney injury [7, 8].

AKI and CKD do not represent distinct clinical syndromes but rather frequently present as a disease continuum [9]. No consensus exists for defining criteria to evaluate kidney recovery after AKI [10]. The 2012 Kidney Disease Improving Global Outcomes (KDIGO) guideline first introduced the term ‘Acute Kidney Diseases and Disorders’, defining it as abnormalities in kidney function and/or structure lasting less than 3 months, which includes AKI [11]. The 2017 Acute Disease Quality Initiative (ADQI) workgroup defines acute kidney disease (AKD) as acute or subacute damage and/or loss of kidney function persisting for 7 to 90 days following an AKI-triggering event [12]. Although the diagnostic criteria for AKD differ between the two guidelines, both stress the importance of considering AKD as a condition of equal significance to AKI.

Artificial intelligence (AI) is at the forefront of digital medicine [13]. Machine learning (ML), a fundamental branch of AI, excels in deciphering complex nonlinear associations among multidimensional features [14]. It has been extensively applied in healthcare, in areas such as medical diagnostics and disease risk prediction [15, 16]. Numerous studies employ ML models to predict mortality risk in patients with heart failure or sepsis and in those undergoing surgical interventions [17,18,19]. These studies predominantly utilize decision tree-based algorithms, which handle nonlinear features more effectively and mitigate overfitting compared to traditional regression models. In addition, ML significantly enhances outcome interpretability by elucidating influential variables, complex internal operations, and learned decision-making paths. SHapley Additive exPlanations (SHAP), a prominent interpretive method, quantifies the marginal contribution of each feature upon integration into a ‘black-box’ model, providing explanations at both global and local levels [20, 21]. Its strength lies in quantifying both the magnitude and the direction of each feature’s impact on the model’s output. In assessing mortality risk for AIS patients, research primarily focuses on those in the intensive care unit (ICU) [22, 23], which creates a gap in prognostic evaluations for non-ICU AIS patients. Studies involving non-ICU AIS patients face challenges related to imbalanced data distribution, with a mortality rate of less than 5%, and this imbalance remains unaddressed [24]. Importantly, there is a dearth of research dedicated to predicting the impact of AKD on the mortality of AIS patients.

Hence, this study aimed to achieve the following objectives: (1) evaluate the incidence of AKI, AKD, and mortality among AIS patients; (2) assess mortality risk using various ML algorithms and identify the optimal model; (3) utilize SHAP analysis to elucidate the contributions of individual features to the outcome and unveil the underlying decision-making process; (4) compare the predictive capabilities of using AKD independently or in combination with AKI for predicting mortality; (5) develop a user-friendly online prediction tool for estimating the probability of mortality in AIS patients.

Materials and methods

Study design

This retrospective cohort study involved 1633 patients diagnosed with AIS between January 2020 and June 2021. Patients were randomly split into a training set (85% of samples), used for model building, and a test set (15% of samples) that was not seen during model development and was used to assess the final model’s performance. During the training phase, we employed a grid search with tenfold cross-validation to fine-tune model hyperparameters and prevent overfitting [25].
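The partitioning described above can be sketched in a few lines. This is an illustration only, assuming a simple shuffled split and contiguous folds; the function names, the seed, and the use of integer patient IDs are hypothetical and are not taken from the study code.

```python
import random


def train_test_split_ids(patient_ids, test_fraction=0.15, seed=42):
    """Randomly split patient IDs into a training set and a held-out test set."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed (an assumption) for reproducibility
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) index lists for k folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# 1633 patients, as in the cohort; the IDs themselves are placeholders.
train_ids, test_ids = train_test_split_ids(range(1633))
folds = list(k_fold_indices(len(train_ids), k=10))
print(len(train_ids), len(test_ids), len(folds))  # 1388 245 10
```

In a grid search, each candidate hyperparameter combination would be fitted on the training indices of every fold and scored on the corresponding validation indices, and the combination with the best average score would be retained. The test set plays no part in that loop.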

Patients diagnosed with AIS were included according to the International Classification of Diseases version 10 (ICD-10). Individuals meeting any of the following criteria were excluded: (1) age < 18 years; (2) hospitalization duration < 24 h; (3) hospital-acquired or traumatic brain injury with concurrent stroke, or comorbid intracranial tumor, transient ischemic attack, or other intracranial disorders; (4) concurrent Stage 5 CKD, undergoing renal replacement therapy, or having undergone kidney transplant; and (5) patients with incomplete data recording.

Data collection

Clinical information was extracted using natural language processing and parsing methods applied to structured data within the electronic health record. Data pertaining to demographic characteristics, medical history, and comorbidities were collected upon admission. Medication records were compiled during hospitalization, with particular attention to instances where these medications were administered before the onset of kidney injury. Comprehensive blood counts, coagulation markers, blood chemistry analyses, and urine tests were conducted within 1 week of admission. Initially, we included 104 readily available features based on expert clinical opinions and literature reviews. Following the removal of features with a missing proportion greater than 15%, we retained 86 features for building the prediction models.

Outcome definitions

The study investigated AKI and AKD as short-term outcomes, and mortality as a long-term outcome. AKI was defined in accordance with the 2012 KDIGO criteria as either an increase in serum creatinine (Scr) of at least 0.3 mg/dL from baseline within 48 h or an increase to at least 1.5 times the baseline value within 7 days [11]. As stipulated by the 2017 ADQI guidelines, AKD was characterized by the acute or subacute impairment and/or loss of kidney function occurring within 7 to 90 days following an AKI event [12]. Based on the diagnostic criteria for AKI and AKD, patients exhibited three distinct renal function trajectories following kidney injury: (1) AKI recovery, indicating that Scr returned to baseline value within 7 days; (2) subacute AKD, denoting a slow increase in Scr levels lasting more than 7 days (AKD without AKI); and (3) AKD with AKI, representing the persistence of stage 1 or greater AKI for ≥ 7 days after an AKI initiating event (AKI progressing to AKD). The final classification encompassed four categories: (1) no kidney disease (NKD), (2) AKI recovery, (3) subacute AKD, and (4) AKD with AKI. Mortality was defined by the vital status for survival or death at the last follow-up. Clinical features, including renal function trajectories, were used to develop a risk prediction model, with mortality as the binary endpoint, to evaluate mortality risk in AIS patients.
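The creatinine-based AKI check and the four-category classification can be illustrated as follows. This is a simplified sketch, not the study's implementation: it uses the KDIGO thresholds (a rise of at least 0.3 mg/dL within 48 h, or at least 1.5 times baseline within 7 days), measures time from the baseline sample, and takes the AKD status as a given flag rather than deriving it. The function names and the data layout are hypothetical.

```python
def meets_aki_criteria(baseline_scr, measurements):
    """Check the KDIGO serum creatinine criteria for AKI.

    measurements: list of (hours_since_baseline, scr_mg_dl) tuples.
    """
    for hours, scr in measurements:
        # Absolute rise of at least 0.3 mg/dL within 48 h of baseline.
        if hours <= 48 and scr - baseline_scr >= 0.3:
            return True
        # Relative rise to at least 1.5 times baseline within 7 days.
        if hours <= 7 * 24 and scr >= 1.5 * baseline_scr:
            return True
    return False


def renal_trajectory(has_aki, has_akd):
    """Map AKI and AKD status onto the four trajectory categories."""
    if has_aki and has_akd:
        return "AKD with AKI"   # AKI progressing to AKD
    if has_aki:
        return "AKI recovery"   # AKI that does not progress to AKD
    if has_akd:
        return "subacute AKD"   # AKD without AKI
    return "NKD"                # no kidney disease


# Hypothetical patient: baseline 1.0 mg/dL, 1.4 mg/dL at 24 h, no AKD later.
aki = meets_aki_criteria(1.0, [(24, 1.4), (120, 1.1)])
print(renal_trajectory(aki, has_akd=False))  # AKI recovery
```

Urine output criteria and AKI staging are deliberately left out, because the text defines AKI through Scr only.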

The baseline Scr level was defined as the initial Scr measurement obtained upon hospital admission. The timing of AKI and AKD diagnosis was determined when patients initially met the respective diagnostic criteria. Each patient underwent a minimum of three Scr tests, which included two tests during their hospitalization and one at their first follow-up appointment. If elevated Scr levels did not return to baseline, additional tests were performed weekly during hospitalization or at the subsequent follow-up. The estimated glomerular filtration rate (eGFR) was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) creatinine formula [26].
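For readers who wish to reproduce the eGFR calculation, a sketch of the CKD-EPI creatinine equation is given below. The text does not state which version of the equation was applied; the 2021 race-free form is assumed here purely for illustration, and the function names are hypothetical. Scr values reported in μmol/L are converted to mg/dL by dividing by 88.4.

```python
def umol_to_mg_dl(scr_umol_l):
    """Convert serum creatinine from umol/L to mg/dL."""
    return scr_umol_l / 88.4


def egfr_ckd_epi_2021(scr_mg_dl, age_years, is_female):
    """eGFR (mL/min/1.73 m^2) from the 2021 CKD-EPI creatinine equation.

    Assumption: the 2021 race-free version; the source cites the CKD-EPI
    creatinine formula without specifying the version.
    """
    kappa = 0.7 if is_female else 0.9
    alpha = -0.241 if is_female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (
        142
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if is_female:
        egfr *= 1.012
    return egfr


# Hypothetical 70-year-old man with an Scr of 107 umol/L.
example_egfr = egfr_ckd_epi_2021(umol_to_mg_dl(107), age_years=70, is_female=False)
print(round(example_egfr, 1))
```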

Model development and interpretation

The following eight ML models were trained: (1) light gradient boosting machine (LightGBM), (2) GBM, (3) random forest (RF), (4) K-nearest neighbors (KNN), (5) multi-layer perceptron (MLP), (6) naive Bayes (NB), (7) support vector machine (SVM), and (8) logistic regression (LR). LightGBM and GBM are gradient-based learning frameworks that employ decision trees and boosting. LightGBM, in comparison to GBM, shortens training times and reduces memory usage by partitioning data using histograms [27]. RF constructs individual decision trees using random subsets of the training data and combines their results through majority voting for classification [28]. KNN is a frequently used supervised learning algorithm that conducts classification or regression based on feature similarity among neighboring data points [29]. MLP relies on the stacking of multiple layers of neurons, employing layer-wise propagation and nonlinear activation functions to learn and represent intricate data relationships [30]. NB is rooted in Bayes’ theorem and performs classification by calculating the posterior probabilities of different categories under given feature conditions [31]. SVM is a supervised learning algorithm that makes predictions by identifying the optimal separating hyperplane [32]. LR is a linear model that predicts probabilities based on the logistic function [33]. All models were trained on the same dataset, with consistent imputation and scaling techniques applied.

SHAP was used to interpret the results of the top-performing model. Features with positive SHAP values enhance the output, with larger numerical values indicating more significant contributions [34]. SHAP summary plots offer visualizations of essential feature rankings and the overarching relationships and directions concerning features and outcomes. SHAP force and decision plots offer an intuitive visualization of how distinct features influence an individual prediction.
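The notion of a marginal contribution can be made concrete with an exact Shapley value computation on a toy model. The sketch below is not the SHAP library and not the study's model: the two-feature function, its coefficients, and the single all-zero reference point are invented for illustration, and exact enumeration is only feasible for a handful of features. Tree-based explainers rely on faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are set to their value in `background`
    (a single reference point, which is a simplifying assumption).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Toy risk score with an interaction term; coefficients are made up."""
    return 0.05 + 0.20 * z[0] + 0.10 * z[1] + 0.30 * z[0] * z[1]


phi = shapley_values(toy_risk, x=[1, 1], background=[0, 0])
print([round(p, 2) for p in phi])  # [0.35, 0.25]
```

The two values sum to 0.60, which equals the difference between the prediction for this case (0.65) and the prediction at the reference point (0.05). This additivity is what allows a force plot to build an individual prediction up from the base value one feature at a time.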

Data balancing

In our study, there exists an imbalance, as the mortality rate is approximately 5%. To address this imbalance, we utilized a weight rebalancing technique to adjust the weights of both the majority and minority classes [35]. Solely the training dataset underwent balancing. The test datasets remained unaltered to evaluate model performance using representative data. The scikit-learn Python library includes a built-in parameter called “class weight” or “weights” for LR, RF, LightGBM, SVM, and KNN. The model automatically assigns a weight to each class that is inversely proportional to its frequency. The balanced weight for each class is calculated using the equation: Class weight = total number of samples/(number of classes × class sample size). The class weight for mortality was 10.34, while the class weight for non-mortality was 0.53 when the “balanced” option was used. In the case of the NB classifier, we established a prior probability of 0.5 for each class to achieve group balance. In future work, we plan to adjust class weights in the MLP classifier by modifying the loss function’s weights.
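The class weight equation can be checked with a short calculation. The sketch below uses the full-cohort counts (79 deaths among 1633 patients) as an illustrative input, because the exact class counts in the training set are not reported; the function name is hypothetical.

```python
def balanced_class_weights(class_counts):
    """Class weight = total samples / (number of classes * class sample size)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}


# Illustrative counts taken from the whole cohort, not from the training set.
weights = balanced_class_weights({"mortality": 79, "non-mortality": 1633 - 79})
print({label: round(w, 2) for label, w in weights.items()})
# {'mortality': 10.34, 'non-mortality': 0.53}
```

With these counts the formula returns approximately 10.34 and 0.53, in line with the weights reported above. Each misclassified death therefore contributes roughly twenty times as much to the training loss as a misclassified survivor.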

AI-driven web application

A web-based calculator for predicting mortality among AIS patients was developed using the “Streamlit” application (https://share.streamlit.io/) to implement the optimal model [36]. To enhance the user-friendliness of the web calculator, this study introduced two panels: one for inputting model parameters and obtaining mortality probability, and another for providing a model introduction.

Statistical analysis

Features with missing values exceeding 15% were omitted from the dataset. Multiple imputation techniques were then applied to estimate the missing data. Utilizing LR to compute the required sample size with mortality as the outcome, we ascertained that a minimum of 801 patients is essential to achieve a statistical power of 90% for the detection of an effect size of 0.10 at a two-sided significance level (α) of 0.05. Normally distributed continuous features are reported as the mean ± standard deviation (SD) and were compared using the independent t test. Non-normally distributed features are presented as the median (interquartile range) and were compared using the Mann–Whitney U test. Categorical features are reported as percentages and were compared using Pearson’s Chi-squared test. We evaluated the models’ predictive performance using a variety of metrics, including the area under the receiver operating characteristic curve (AUROC), precision, recall, accuracy, F1 score, Brier score loss (BSL), Matthews correlation coefficient, and decision curve analysis (DCA). The AUROC and F1 score were utilized to identify the optimal model. A two-tailed P value of less than 0.05 was considered statistically significant. Our analysis was conducted using the Python programming language (Python Software Foundation, version 3.9.13) within the integrated development environment Visual Studio Code 1.81.1.
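Several of the threshold-based metrics listed above follow directly from the confusion matrix. The sketch below gives their standard definitions, together with the Brier score and the net benefit used in DCA; the function names and the example counts are hypothetical and do not correspond to any result in this study.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, accuracy and Matthews correlation coefficient."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given threshold probability, as used in DCA."""
    return tp / n - fp / n * threshold / (1 - threshold)


# Made-up counts for a rare outcome: 10 events among 100 patients.
metrics = classification_metrics(tp=6, fp=4, tn=86, fn=4)
print({name: round(v, 3) for name, v in metrics.items()})
```

In this made-up example the accuracy is 0.92 while the F1 score is only 0.60, which illustrates why accuracy alone is a weak criterion when about 5% of patients experience the outcome, and why the F1 score was considered alongside the AUROC.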

Results

Study cohort

A retrospective review of medical records was conducted for 1876 AIS patients from January 2020 to June 2021, of whom 1633 were eligible for further analysis (Fig. 1). Table 1 presents the baseline characteristics of the study population, and Table S1 stratifies the same cohort based on mortality. The incidence rates of AKI, AKD, and mortality were 14.57% (238/1633), 19.72% (322/1633), and 4.84% (79/1633), respectively. From the perspective of renal function trajectories, a total of 495 patients (30.31%) developed acute/subacute kidney dysfunction (meeting AKI and/or AKD criteria), comprising 257 patients (15.74%) with subacute AKD, 173 patients (10.59%) who experienced recovery from AKI, and 65 patients (3.98%) meeting both AKI and AKD criteria. Increased mortality rates were noted in elderly individuals (mean age: 73 vs. 68 years), those experiencing fever (15.19% vs. 8.04%), and patients with AKD coupled with AKI (31.65% vs. 13.92% in subacute AKD, 25.32% in AKI recovery, and 29.11% in NKD patients).

Fig. 1 Architectural diagram of study

Table 1 Baseline characteristics of inpatients [mean ± SD; n (%)]

Model performance

A comprehensive set of 86 features served as predictors for mortality and were integrated into the ML models. Among all ML models, the LightGBM model displayed the best performance, with an AUROC of 0.96 and an F1 score of 0.47 (Fig. S1, Table S1, and Table S2). After data balancing, the model showed no significant difference in AUROC and accuracy, but it achieved a better balance between precision and recall (Table 2 and Table S3). When the model incorporated only the top 10 features, the AUROC remained high at 0.93, while maintaining a balance between precision and recall. Consequently, the LightGBM model was utilized in later stages for result interpretation and the development of an AI-driven web application. DCA revealed that the LightGBM model possessed high clinical utility (Fig. S2). Additional information concerning various performance metrics, such as accuracy, BSL, and Matthews correlation coefficient, is available in Table 2 and Table S2.

Table 2 Performance of LightGBM model for predicting mortality*

SHAP interpreter for the model

Figure 2A, B illustrates the SHAP summary plot of the LightGBM model. The top five features associated with mortality were ACEI/ARB, renal function trajectories (including AKI recovery, subacute AKD, and AKD with AKI), neutrophil count, diuretics use, and Scr. Substituting “AKD grade” for “renal function trajectories” in predicting the risk of mortality resulted in a decrease in the model’s AUROC to 0.92, which was lower than that of the predictive model constructed by combining AKI and AKD. Furthermore, the importance ranking of “AKD grade” falls outside the top 15 and is not a primary feature for predicting mortality (Fig. S3).
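A global importance ranking of this kind is commonly obtained by averaging the absolute SHAP values of each feature across patients. The sketch below shows that aggregation on a small invented matrix; the feature labels echo those in the text, but the numbers are placeholders and the function name is hypothetical.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names, top_k=10):
    """Rank features by mean absolute SHAP value across all patients."""
    n = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Placeholder SHAP values: one row per patient, one column per feature.
feature_names = ["ACEI/ARB", "renal function trajectory", "neutrophil count"]
shap_rows = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.25, -0.02],
    [0.50, 0.15, 0.03],
]
ranking = rank_by_mean_abs_shap(shap_rows, feature_names, top_k=2)
print([name for name, _ in ranking])  # ['ACEI/ARB', 'renal function trajectory']
```

Taking the absolute value before averaging matters: a feature that raises risk in some patients and lowers it in others would otherwise appear unimportant.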

Fig. 2 The SHAP summary plots for the LightGBM model and force plots for two representative patients. A The ranking of feature importance within the mortality prediction model. Features with higher mean absolute SHAP values have greater predictive influence. B Each dot represents the SHAP value of a specific feature for an individual, with red and blue indicating high and low feature values, respectively. On the x-axis, a positive or negative SHAP value signifies that the feature positively or negatively influenced the mortality prediction for the individual. C A personalized explanation for a case with a mortality probability below 10% and an actual outcome of survival. Features are ranked from the center to both ends based on the extent of their impact. The impact of a feature on the model’s output is directly proportional to the size of the arrow. The positive impact of a feature is depicted in red, elevating the prediction from the base value, while the negative effect is shown in blue, lowering the prediction. Certain features, such as Scr (107 μmol/L) and TBIL (13.6 μmol/L), exhibit a positive influence, while the absence of ACEI/ARB, diuretics, and antibiotics, as well as the absence of kidney disease, contribute negatively to predicting mortality. D A personalized explanation for a case with a mortality probability exceeding 90% and an actual outcome of mortality. The base value represents the averaged predicted results.

The SHAP interaction plot visually elucidates the interplays among the top 15 features in mortality model (Fig. S4). SHAP dependence plots illustrate the impact of a single feature or the interaction between two features on mortality prediction (Fig. S5). The force plots (Fig. 2C, D) depict the prediction process for two representative patients. The cases shown in Fig. S6 illustrate patients with similar predicted probabilities, yet the constituent feature compositions leading to these predictions differ.

AI-driven web application

Employing LightGBM for mortality prediction, we created an AI-driven web application within the Streamlit framework. In the test set (Table 2), compared to the LightGBM model built with all features, the model constructed with the top ten features showed no significant decrease in accuracy (0.89 vs. 0.91) or AUROC (0.93 vs. 0.96), with a slight increase in the F1 score (0.50 vs. 0.47). Therefore, this study utilizes the top ten features for constructing an online predictive model. When users visit the website, they input feature data, which are then encoded and sent to the server for real-time mortality prediction. No private data are required besides feature information, and all input is promptly deleted after generating the prediction result. The calculator is accessible at https://strokemortalityapppy-gupkbhhnwkoghqnhvtul8b.streamlit.app/.

Discussion

To the best of our knowledge, this study is the first to develop and compare multiple ML models for predicting mortality in AIS patients using AKD data. Among 1633 AIS patients, the mortality rate was 4.84%, and 30.31% of patients developed acute/subacute kidney dysfunction. Of these, 65 (3.98%) met both AKI and AKD criteria, 257 (15.74%) developed subacute AKD, and 173 (10.59%) experienced recovery from AKI. LightGBM demonstrated the strongest predictive performance, achieving an AUROC of 0.96 for mortality prediction. The five most important features for assessing mortality risk were ACEI/ARB, renal function trajectories, neutrophil count, diuretic use, and Scr. Compared to using AKD alone, the combined use of AKI and AKD enhanced the model’s predictive performance. We further employed various SHAP plots to interpret the “black box model” at both the global and local levels. Ultimately, an AI-driven web application based on the LightGBM model was created for inputting patient data to facilitate clinicians’ assessment of mortality in AIS patients.

Huang et al. applied various ML algorithms, including eXtreme Gradient Boosting (XGBoost), to develop a mortality prediction model for severe stroke patients [37]. XGBoost outperforms traditional regression models, especially in handling imbalanced and high-dimensional data. Our study compared different ML models using AUROC and F1 scores, and LightGBM demonstrated superior predictive performance. In contrast to XGBoost, LightGBM effectively mitigates overfitting through gradient-based one-side sampling and exclusive feature bundling. In addition, it enhances computational speed and reduces memory usage by employing histogram techniques and a leaf-wise growth strategy [27].

The prediction of mortality risk in AIS patients primarily focuses on ICU patients [22, 23, 37]. Wang et al. developed a mortality prediction model for non-ICU AIS patients using various ML algorithms [24]. However, this study encountered data imbalance issues that remained unaddressed. Several investigations employing regression models have identified AKI and CKD as significant risk factors for mortality in AIS patients [38,39,40]. The impact of renal function trajectory between 7 and 90 days on mortality remains unclear. This study marks the first attempt to analyze the relationship between AKD and mortality in AIS patients. It underscores that comprehensive renal function trajectories encompassing both AKI and AKD are more vital and precise in predicting mortality risk compared to isolated AKD. This highlights the importance of monitoring the renal function trajectory from 7 to 90 days, even when AIS patients have subacute kidney dysfunction or experience rapid kidney function recovery within 7 days after AKI.

Our study utilized a variety of SHAP plots to address the challenge of the ‘black box’ in mortality risk assessment. Among these, the SHAP summary plot prioritized features based on their importance, identifying ACEI/ARB and renal function trajectories as the two most critical indicators for predicting mortality. SHAP dependence plots demonstrated that patients with acute or subacute kidney injury, particularly those with AKD and AKI, showed an increased risk of mortality associated with ACEI/ARB use. SHAP force plots and decision plots revealed variations in feature contributions for patients with similar predicted probabilities, effectively enhancing the personalization and transparency of the decision-making process.

Our study has some limitations to acknowledge. First, this study lacks specific stroke-related information that could influence mortality, such as the NIHSS score. Second, the follow-up period was too brief to ascertain whether patients developed CKD. Consequently, this study did not assess the influence of AKD on the emergence of new-onset CKD. Third, we have no data specifying the time interval between AIS onset and Scr measurement. However, patients with acute strokes are usually promptly admitted to the hospital, and blood samples are drawn shortly after their arrival. Consequently, the time lapse is unlikely to exceed a few hours. Fourth, the AI-driven web application is designed to assist clinicians in identifying AIS patients with an elevated risk of mortality, rather than serving as a replacement for clinical diagnosis. Due to the retrospective nature of data collection, it is crucial to undertake additional validation using an independent population to ensure robust predictive validity across diverse usage scenarios. Fifth, our study is limited to a single center. To enhance the robustness of our findings and ensure their applicability across various scenarios, we will validate our results using an independent population.

Conclusions

In summary, AKD plays a crucial role in evaluating the mortality risk of AIS patients. Comprehensive renal function trajectories, encompassing both AKI and AKD, are of paramount importance for predicting mortality. The LightGBM model exhibited robust performance as a tool for mortality prediction in AIS patients. The utilization of this AI-driven web application has the potential to significantly reduce mortality rates and assist physicians in making informed treatment decisions.