Introduction

ST-segment elevation myocardial infarction (STEMI) is a severe form of acute myocardial infarction (AMI) associated with a poor prognosis and high morbidity and mortality [1,2,3]. Risk factors for STEMI include tobacco use, dyslipidemia, hypertension, diabetes mellitus (DM), and a family history of coronary artery disease (CAD) [4,5,6]. In particular, STEMI patients with type 2 diabetes mellitus (T2DM) face an increased risk of cardiovascular complications, with a myocardial infarction rate 2–4 times higher than that in non-diabetic patients [7]. Although recent studies have shown that proactive management of T2DM significantly reduces cardiovascular complications and mortality, the overall prognosis of STEMI patients with T2DM remains poor [8,9,10,11].

Traditional risk models, such as the GRACE and TIMI scores, have been used to predict patient outcomes [12, 13]. However, these scores have limited prognostic performance, and there is a risk of delayed scoring. In contrast, machine learning (ML) captures interaction patterns between variables and has demonstrated superior predictive power to conventional statistical methods, particularly for predicting in-hospital mortality and short-term outcomes in acute myocardial infarction patients [14, 15].

Currently, no ML model is available for predicting in-hospital mortality in STEMI patients with T2DM. Therefore, this study aims to develop an accurate and effective ML model to predict outcomes in STEMI patients with T2DM, supporting better treatment decisions and reducing periprocedural complications in these patients.

Methods

Study population

Patients were recruited from the Affiliated Hospital of Zunyi Medical University between January 2016 and June 2020 [5]. All patients met the diagnostic criteria for STEMI and underwent primary percutaneous coronary intervention (PCI) according to current guidelines [16]. All procedures involving human participants were performed in accordance with the Declaration of Helsinki. The study was approved by the Ethical Evaluation Committee of Zunyi Medical Hospital (ZMU〔2022〕1-177).

Definitions and data collection

Data on demographics, clinical outcomes, and procedural characteristics were collected using a standardized form. Baseline patient characteristics were obtained from medical records and from standardized in-person interviews during the index admission for AMI. T2DM was defined as a chart diagnosis of diabetes or the use of glucose-lowering medication at AMI presentation [17]. Delayed presentation was defined as a time from symptom onset to first medical contact of more than 12 h. This cohort study was reported in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. Predictive factors were assessed without knowledge of the participants' outcomes, and in-hospital deaths were recorded by an independent observer.

ML algorithm methods

We developed six ML algorithms to model our data: random forest (RF), CatBoost classifier (CatBoost), Naive Bayes (NB), logistic regression (LR), gradient boosting classifier (GBC), and extreme gradient boosting (XGBoost). Using a randomization process, we split the dataset into a training set (70%) used to develop the ML models and a validation set (30%) used to examine model performance. Ten-fold cross-validation was applied to ensure the robustness of the validation results.
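As an illustration of this pipeline, the sketch below shows a 70/30 split and 10-fold cross-validated training of the six classifiers using scikit-learn, CatBoost, and XGBoost; the input file name, column names, and hyperparameters are hypothetical and not taken from the study.

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

# Hypothetical input: one row per patient, numeric/encoded features plus the outcome label.
df = pd.read_csv("stemi_t2dm_cohort.csv")
y = df["in_hospital_death"]
X = df.drop(columns=["in_hospital_death"])

# 70% training / 30% validation split, stratified on the mortality label.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

models = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "GBC": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
}

for name, model in models.items():
    # 10-fold cross-validated AUC on the training data, followed by a final fit.
    cv_auc = cross_val_score(model, X_train, y_train, cv=10, scoring="roc_auc")
    model.fit(X_train, y_train)
    print(f"{name}: mean 10-fold CV AUC = {cv_auc.mean():.3f}")
```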

Missing data

Complete case data were collected from the electronic health records (EHRs) and analyzed. To simplify the review and ensure accuracy, variables with more than 20% of observations missing were removed.
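One way the 20% missingness filter and complete-case selection could be implemented with pandas is sketched below; the file name and column layout are hypothetical.

```python
import pandas as pd

df = pd.read_csv("stemi_t2dm_cohort.csv")  # hypothetical EHR extract

# Remove any variable with more than 20% of observations missing.
missing_fraction = df.isna().mean()
df = df.loc[:, missing_fraction <= 0.20]

# Complete-case analysis: keep only patients with no missing values in the retained variables.
df = df.dropna()
```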

Statistical analysis

Continuous variables are presented as medians (IQR), and categorical variables as n (%). During training, the ML models were tuned to avoid overfitting and were internally validated on all data via 10-fold cross-validation. Model performance was assessed with the area under the curve (AUC), recall, precision, and F1 score. In the ROC analysis of the entire dataset, 95% confidence intervals were used to assess statistical significance and compare models. All statistical analyses were conducted using Python (version 3.7) and R (version 4.0.2).
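A hedged sketch of how these metrics could be computed for one fitted model on the validation set follows; the text does not state how the 95% confidence intervals were obtained, so a simple bootstrap is shown here as one possible approach.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

def evaluate(model, X_val, y_val, n_boot=1000, seed=42):
    """Point estimates for AUC, precision, recall, and F1, plus a bootstrap 95% CI for the AUC."""
    prob = model.predict_proba(X_val)[:, 1]
    pred = (prob >= 0.5).astype(int)
    metrics = {
        "AUC": roc_auc_score(y_val, prob),
        "precision": precision_score(y_val, pred),
        "recall": recall_score(y_val, pred),
        "F1": f1_score(y_val, pred),
    }
    # Bootstrap resampling of the validation set for the AUC confidence interval (illustrative).
    rng = np.random.default_rng(seed)
    y_arr, aucs = np.asarray(y_val), []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_arr), len(y_arr))
        if y_arr[idx].min() == y_arr[idx].max():  # skip resamples containing a single class
            continue
        aucs.append(roc_auc_score(y_arr[idx], prob[idx]))
    metrics["AUC 95% CI"] = tuple(np.round(np.percentile(aucs, [2.5, 97.5]), 3))
    return metrics
```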

Results

A total of 438 patients with STEMI registered in the database between January 2016 and January 2020 were included (Fig. 1). The median patient age was 62 (52–71) years, 71% were male, and 42 (9.5%) patients died in hospital. All patients underwent emergency PCI. A comparison of the demographic data and baseline characteristics between patients is shown in Table 1. Six ML models (LR, RF, CatBoost, XGBoost, GBC, and NB) were developed to predict in-hospital mortality based on all available features. A comparison of the predictive performance of the six ML models in the validation set is presented in Table 2. CatBoost (AUC = 0.92 [95% CI: 0.909–0.922]), XGBoost (AUC = 0.88 [95% CI: 0.875–0.891]), RF (AUC = 0.89 [95% CI: 0.883–0.904]), GBC (AUC = 0.91 [95% CI: 0.903–0.916]), NB (AUC = 0.87 [95% CI: 0.859–0.878]), and LR (AUC = 0.84 [95% CI: 0.833–0.855]) showed similar discrimination in our study. The GRACE risk assessment tool (AUC = 0.83 [95% CI: 0.789–0.862]) also demonstrated good discriminatory ability. We investigated ensembling and combining different models to further optimize performance, including bagging, boosting, and stacking methods, but found no significant improvement in the ML models' performance.
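For illustration, the stacking experiment mentioned above could be set up along the following lines, reusing the data split and base learners from the training sketch in the Methods; this is a sketch of the general technique, not the study's exact configuration.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Stack the tree-based learners (from the earlier `models` dict) under a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", models["RF"]),
        ("gbc", models["GBC"]),
        ("xgb", models["XGBoost"]),
        ("cat", models["CatBoost"]),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=10,
)
stack.fit(X_train, y_train)
print("Stacked model AUC:", roc_auc_score(y_val, stack.predict_proba(X_val)[:, 1]))
```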

Fig. 1
figure 1

Flow diagram outlining the study process

Fig. 2
figure 2

The relative importance of variables in the machine learning algorithms

Table 1 Demographics and clinical characteristics of patients with and without mortality in the cohort
Table 2 Comparison of validation set results of the machine learning models

Figure 2 illustrates the relative importance of the variables in each ML algorithm. Excluding the logistic regression model, the intersection of the top 10 most important variables across the other five ML models is shown in Fig. 3. A general trend emerges: despite slight differences in variable importance across these ML algorithms, out-of-hospital cardiac arrest (OHCA), the GRACE score, and blood urea nitrogen (BUN) ranked in the top five (Fig. 3). The neutrophil-to-lymphocyte ratio (NLR), B-type natriuretic peptide (BNP), and cystatin C ranked in the top ten. Conversely, cardiac troponin T and creatinine contributed little to the prediction. In the CatBoost model, the importance of the high-ranking variables decreased in the following order: GRACE, OHCA, myoglobin, BNP, and NLR. Patients who are hypoperfused on admission for STEMI remain particularly difficult to save. The CatBoost model demonstrated the highest performance of the predictive models, with an AUC of 0.92, precision of 0.79, and accuracy of 0.93. Therefore, we selected the CatBoost model as the final predictive model for application to the validation set (Fig. 4).
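One way such a cross-model ranking could be assembled is sketched below, reusing the fitted models and validation split from the Methods sketch; the article does not state how variable importance was derived, so model-agnostic permutation importance is shown here as one possibility.

```python
import pandas as pd
from sklearn.inspection import permutation_importance

def top10(model, X_val, y_val, seed=42):
    """Return the 10 highest-ranked features of a fitted model by permutation importance."""
    result = permutation_importance(
        model, X_val, y_val, scoring="roc_auc", n_repeats=20, random_state=seed
    )
    imp = pd.Series(result.importances_mean, index=X_val.columns)
    return set(imp.sort_values(ascending=False).head(10).index)

# Intersection of the top-10 variables across the five non-logistic models.
five_models = {name: m for name, m in models.items() if name != "LR"}
shared = set.intersection(*(top10(m, X_val, y_val) for m in five_models.values()))
print("Variables ranked in the top 10 by all five models:", shared)
```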

Fig. 3
figure 3

Cross-verification of the top 10 variables of the machine learning models

Fig. 4
figure 4

Visualization of CatBoost model performance in the validation set. (A) ROC curve. (B) Precision-recall curve. (C) Calibration curve. (D) Classification report. (E) Decision boundary. (F) Feature SHAP values
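For panel F, a typical SHAP workflow for the fitted CatBoost model might look roughly as follows; this is a sketch assuming the `models` dict and validation features from the Methods sketch, and it does not reproduce the published figure's styling.

```python
import shap

cat_model = models["CatBoost"]  # fitted CatBoost model from the training sketch

# Tree-based SHAP explainer and per-feature SHAP values on the validation set.
explainer = shap.TreeExplainer(cat_model)
shap_values = explainer.shap_values(X_val)

# Beeswarm-style summary plot of feature SHAP values (analogous to Fig. 4F).
shap.summary_plot(shap_values, X_val)
```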

Discussion

This study aimed to develop an accurate and user-friendly prediction model for STEMI patients with T2DM to enhance therapeutic decision-making and reduce periprocedural complications. Our approach leveraged machine learning (ML) algorithms, known for their exceptional performance over traditional regression methods in large-scale outcome prediction [14, 15, 18, 19]. We focused on integrating ML techniques with preoperative clinical data for this patient group.

We addressed the challenge of potential data lag in risk models that incorporate preoperative, intraoperative, and postoperative clinical data, as this lag can delay the prediction of adverse events. Although logistic regression is often chosen for traditional binary outcome prediction models because of its strong predictive power and interpretability, CatBoost showed superior performance in our study. Across the five ML models, the GRACE score and OHCA were the strongest predictors in almost all analyses. Coronary artery disease, particularly acute coronary syndrome, is the leading cause of OHCA in Asians, and the survival rate for Asians with OHCA is 3.0% [20]. Post-resuscitation treatment depends on emergency coronary angiography and PCI. This study therefore again validated the effectiveness of the GRACE score in a cohort of patients with T2DM and STEMI, while finding that ML models offered better discrimination and accuracy. Moreover, in the setting of imbalanced medical samples, some studies propose that model ensembling techniques can significantly improve model performance [14, 21].

ML in clinical medicine faces challenges such as data quality and processing complexity, and ML model features are often subtle and difficult for clinicians to interpret. Despite these challenges, ML has achieved significant breakthroughs in biomedicine. For example, ML algorithms have been used to analyze complex, high-dimensional data in 12-lead ECGs to predict artery occlusion in myocardial infarction cases [22]. Liang et al.'s study effectively used ML to predict heart-failure risk during hospitalization in patients with acute anterior-wall ST-segment elevation myocardial infarction, employing parameters such as VF, CAP, age, LVEF, and NT-proBNP peak levels; this approach enabled the identification of high-risk patients and guided personalized, proactive management strategies [23]. Further, research by Tofighi et al. demonstrated the efficacy of ML in identifying high-risk STEMI patients for adverse events during follow-up, aiding in crafting individualized treatment plans to improve outcomes and lessen disease burden [24]. Avvisato's study illustrated ML's capacity to dissect individual variations in the whole-brain functional connectome, aiding the diagnosis of neurological diseases and the prediction of clinical outcomes [25]. These predictive models can be integrated into the emergency visit process to help physicians judge a patient's prognosis. By prioritizing higher-risk patients for more immediate and comprehensive care and monitoring, or scheduling follow-ups for lower-risk patients, this approach not only optimizes emergency physicians' workflow but also ensures that patients receive care tailored to their specific risk profiles. Therefore, understanding the combined significance of clinical data samples and of the models and methods used to accurately diagnose diseases or predict adverse events will remain a persistent focus at the intersection of medicine and ML.

Limitations

This study has some limitations. First, as a single-center study, the generalizability of our results may be limited; moreover, the retrospective design raises the possibility of selection bias, which could affect the interpretation of the results and the applicability of the prediction models. Second, like similar studies, ours relied on clinical data elements such as medical history, physical examination, and laboratory findings as input features; this reliance potentially restricts the model's applicability in real-world settings and leaves some models lacking interpretability. Third, as in most studies [26], a single dataset was used for training and testing, and the sample size was insufficient for precise testing and training, mainly because of a lack of adequate positive samples. Even though synthetic sampling techniques were employed, the improvement was not significant, indicating a need for studies with larger sample sizes in this field. Fourth, the absence of external validation could lead to overfitting. Fifth, while including the GRACE score may enhance the model's predictive accuracy, it can also complicate interpretation, because the score itself is a composite measure; this layer of abstraction may obscure the individual impact of the clinical variables that constitute the GRACE score, and heavy reliance on an existing score can affect the robustness of the model, making it more sensitive to any biases or limitations inherent in the GRACE score. Lastly, the indicators included in the analysis were not comprehensive, missing variables such as the timeline of medical visits and medication use, which could influence the outcomes. Future research should therefore employ larger sample sizes and more rigorous methodologies.

Conclusion

In this study, various ML methods were used to establish in-hospital mortality prediction models for AMI complicated by diabetes. Compared with the traditional model and the GRACE score, the ML model based on the CatBoost algorithm provided more accurate predictive discrimination. Although this ML model currently shows good predictive ability, it still requires rigorous external validation.