Introduction

Acute coronary syndromes (ACS) are the unstable and progressive stages of coronary heart disease and include ST-segment elevation myocardial infarction (STEMI), non-ST-elevation myocardial infarction (NSTEMI), and unstable angina (UA) [1]. Although advances in early reperfusion therapy and adjuvant drug therapy have improved the prognosis of ACS patients, ACS remains the leading cause of death worldwide [2]. More than 5% of patients with ACS die in hospital [3], with rates as high as 26.7% in some subgroups [4] and up to 26.5% during long-term follow-up [5]. Appropriate management can significantly improve the prognosis of patients with ACS; timely and accurate identification of mortality risk and early, appropriate risk stratification are therefore essential.

Traditional risk stratification of ACS patients is based on risk scoring systems, of which the Global Registry of Acute Coronary Events (GRACE) risk score and the Thrombolysis in Myocardial Infarction (TIMI) risk score are the most widely used mortality risk prediction tools [6, 7]. Although these risk scores have been validated and are generally accepted, they have limitations in current clinical practice. First, they were developed from data collected in earlier randomized controlled trials, a period when contemporary therapies for acute myocardial infarction (AMI) were not widely available and drug-eluting stents and newer-generation antiplatelet agents had not yet been introduced; the predictive performance of these risk scores in current practice is therefore questionable [8]. Second, these risk scores use only a selective set of variables chosen with traditional statistical methods, which inevitably limits the number of predictors and raises the possibility that important information is missed [9]. In addition, traditional risk scores focus on predicting short-term mortality, such as in-hospital, 14-day, and 30-day mortality, and less on long-term mortality risk [10]. Therefore, there has been widespread interest in exploring more accurate and comprehensive mortality risk prediction models.

Machine learning (ML) is a subdiscipline of artificial intelligence that uses algorithms to identify patterns in large data sets with multiple variables; these algorithms can be continuously improved with additional data, resulting in models that can predict various outcomes [11]. ML constructs models from test inputs and correlates all or some predictor variables with the outcomes to make data-driven predictions or decisions [12]. In recent years, ML has been increasingly used in the medical field, especially in cardiovascular medicine, as the availability of medical data and computational analysis capabilities continue to grow. Emerging research indicates that ML models have great potential as clinical tools for accurately predicting the risk of death in ACS patients [13]. However, the performance of different models tends to vary, and it is unclear whether ML models perform robustly in predicting the risk of death in ACS. Therefore, we performed this systematic review and meta-analysis.

Methods

This meta-analysis was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) criteria [14]. Before study initiation, the protocol was registered in the PROSPERO International Prospective Register of Systematic Reviews (CRD42022322721).

Retrieval strategy

We extensively searched the PubMed, Embase, Web of Science, and Cochrane Library databases from inception to March 12, 2022, with no language restrictions, to identify studies of machine learning for predicting the risk of death in patients with ACS. We searched using a combination of MeSH (Medical Subject Headings) terms and free-text terms, organized in three sets; at least one term from each set had to match:

  1. "acute coronary syndromes", "myocardial infarction", "ST-segment elevation myocardial infarction", "non-ST-segment elevation myocardial infarction", and "unstable angina pectoris";

  2. "machine learning", "deep learning", "migration learning", "random forest", "artificial neural network", "support vector machine", "nomogram", "XGboost", "decision trees", and "predictive models";

  3. "death", "mortality", and "survival" (see Additional file 1: Table S1 for the specific retrieval strategy).

Inclusion and exclusion criteria

The inclusion criteria were as follows:

  1. Participants had a precise diagnosis of ACS, including STEMI, NSTEMI, and UA.

  2. ML models and predictor variables were clearly described.

  3. ML algorithms and predicted outcomes were provided. The outcome metrics of interest were the C-index, area under the curve (AUC), sensitivity, specificity, accuracy, confusion matrix, prediction model type, and key predictors.

The exclusion criteria were as follows:

  1. Conference abstracts, letters, editorials, systematic reviews or meta-analyses, consensus statements, and guidelines.

  2. Studies in which the time of death was unknown.

  3. Studies whose main interest was assessing risk factors without constructing a mortality risk prediction model.

  4. Studies for which the full text was not available.

Data extraction

Two reviewers (X–XZ and XW) constructed standardized forms based on the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [15] and independently extracted data from the original research reports. Extracted data included study characteristics (first author, year of publication, study type, sample source, number of participants and number of events, time to death or follow-up), ML characteristics (external validation method, variable selection method, model type, predictors included in the final model and their number), reported outcomes (C-index, AUC, sensitivity, specificity, and accuracy), and methodological information. If an article described more than one model, we extracted the data for each model separately. For articles examining the performance of the same predictive model across multiple outcomes or timepoints, we retained the model for the stated outcome or timepoint as the primary analysis for that study.

Quality assessment

We assessed the risk of bias in the reported prediction models using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), which is designed to systematically evaluate diagnostic or prognostic prediction models [16]. It contains 20 signalling questions covering four domains: participants, predictors, outcomes, and statistical analysis. Each question is answered "yes", "probably yes", "probably no", "no", or "no information". A domain was considered low risk if all of its questions were answered "yes" or "probably yes", and high risk if at least one question was answered "no" or "probably no"; it was considered unclear if at least one question was answered "no information" and no question indicated high risk. A model was rated low risk when all domains were low risk, high risk when at least one domain was high risk, and unclear when at least one domain was unclear and the remaining domains were at low risk of bias. Two investigators (X–XZ and LX–X) completed the assessment independently; discrepancies were discussed with a third party and resolved by consensus.
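
For transparency, the domain- and model-level decision rules described above can be written as a small helper. This is a minimal sketch of our reading of the PROBAST rating logic, with function names of our own choosing; it is not part of the PROBAST tool itself.

```r
# Rate one PROBAST domain from its signalling-question answers.
rate_domain <- function(answers) {
  if (any(answers %in% c("no", "probably no"))) return("high")
  if (any(answers == "no information"))         return("unclear")
  "low"  # all remaining answers are "yes" or "probably yes"
}

# Rate the overall model from its four domain ratings.
rate_model <- function(domains) {
  if (any(domains == "high"))    return("high")
  if (any(domains == "unclear")) return("unclear")
  "low"
}

# Example: a single "no" in one domain makes the whole model high risk.
domain1 <- rate_domain(c("yes", "no", "probably yes"))
rate_model(c(domain1, "low", "low", "low"))  # "high"
```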

Outcome indicators

The outcome indicators for this systematic review were the C-index and the accuracy of the mortality risk models. Accuracy was obtained either directly from the confusion matrix or indirectly from the sensitivity and specificity combined with the number of deaths and the total sample size, as illustrated in the sketch below.
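
As a worked illustration of the two routes to accuracy described above, a minimal sketch with hypothetical confusion-matrix counts is:

```r
# Route 1: accuracy directly from a 2x2 confusion matrix (hypothetical counts).
tp <- 40; fn <- 10; fp <- 30; tn <- 920
accuracy_direct <- (tp + tn) / (tp + fn + fp + tn)             # 0.96

# Route 2: accuracy reconstructed indirectly from sensitivity and specificity
# combined with the number of deaths and the total sample size.
sens    <- tp / (tp + fn)                                      # 0.80
spec    <- tn / (tn + fp)                                      # ~0.97
deaths  <- tp + fn                                             # 50
n_total <- tp + fn + fp + tn                                   # 1000
accuracy_indirect <- (sens * deaths + spec * (n_total - deaths)) / n_total

all.equal(accuracy_direct, accuracy_indirect)                  # TRUE
```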

Data analysis

We performed a random-effects meta-analysis to pool the C-index and accuracy of all included studies, with 95% confidence intervals (CIs) as the effect measure. For original studies that lacked confidence intervals or standard errors for the C-index, we estimated them according to the formula proposed in the relevant study [17], given below, where c is the reported C-statistic, n is the number of observed events, m is the number of non-events, and m* = n* = ½(m + n) − 1. All statistical analyses were performed with R 4.2.0 (R Development Core Team, Vienna, http://www.R-project.org) using the metafor, Matrix, and meta packages.

$$\mathrm{SE}(c)\approx \sqrt{\frac{c\left(1-c\right)\left[1+\dfrac{n^{*}\left(1-c\right)}{2-c}+\dfrac{m^{*}c}{1+c}\right]}{mn}}$$
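
A minimal sketch of how this standard-error approximation and the subsequent random-effects pooling could be implemented with the metafor package is shown below; the helper function name and the per-model values are ours and purely illustrative.

```r
library(metafor)

# Approximate SE of a reported C-statistic using the formula above:
# cstat = reported C-statistic, n = number of events, m = number of non-events.
se_cstat <- function(cstat, n, m) {
  n_star <- m_star <- 0.5 * (m + n) - 1
  sqrt(cstat * (1 - cstat) *
         (1 + n_star * (1 - cstat) / (2 - cstat) + m_star * cstat / (1 + cstat)) /
         (m * n))
}

# Hypothetical per-model data: reported C-index, events, and non-events.
dat <- data.frame(cindex    = c(0.84, 0.87, 0.81),
                  events    = c(120, 85, 60),
                  nonevents = c(1900, 1400, 950))
dat$sei <- with(dat, se_cstat(cindex, events, nonevents))

# Random-effects pooling of the C-index.
res <- rma(yi = cindex, sei = sei, data = dat, method = "REML")
summary(res)
```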

Results

Literature screening results

The search identified a total of 28,084 articles, of which 6,424 were duplicates. After screening titles and abstracts, 20,588 irrelevant articles were excluded, leaving 1,072 articles for full-text screening. In the end, 50 eligible articles [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67] evaluated the predictive effectiveness of ML algorithm-based models for predicting the risk of death in patients with ACS. Figure 1 shows the screening process.

Fig. 1

Flow chart of literature search and selection

Eligible studies and the characteristics

All included studies were published as research articles in peer-reviewed medical journals from 2006 to 2022. We noted a broadly upward publication trend, with 32 studies (64%) published since 2019, 18 studies (36%) published from 2006 to 2018, and none before 2006. The 50 included studies involved 1,592,034 participants. Using single-arm meta-analysis, we estimated in-hospital, 30-day, 3- or 6-month, and 1 year or more mortality at approximately 5.2% (95% CI 3.6–7.5), 6.5% (95% CI 4.9–8.5), 6.3% (95% CI 2.4–15.3), and 8.3% (95% CI 5.9–11.5), respectively (see Additional file 1: Fig. S1 for specific results). These articles described a total of 216 ML models for predicting the risk of death in ACS, 119 of which were externally validated, covering more than 12 common model types, including logistic regression (LR), random forest (RF), artificial neural network (ANN), decision tree (DT), support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), naive Bayes (NB), adaptive boosting (AdaBoost), Bayesian network classifier (BN), k-nearest neighbors (KNN), and linear discriminant analysis (LDA). LR (n = 74) was the most commonly used modeling method, followed by RF (n = 35); the popularity of LR may reflect the ease with which its model scores can be visualized as nomograms. Twenty-one studies drew their samples from public databases, including the Medical Information Mart for Intensive Care III (MIMIC-III) database and multiple national acute myocardial infarction registries; 8 studies used samples from clinical trials; and 21 studies used samples from electronic health records. The key characteristics of the included studies are presented in Additional file 1: Table S2.
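
The pooled mortality estimates above come from a single-arm (proportion) meta-analysis; a minimal sketch of that step with the meta package, using hypothetical event counts rather than the actual study data, could look like this:

```r
library(meta)

# Hypothetical per-study death counts and sample sizes.
dat <- data.frame(study  = c("Study A", "Study B", "Study C"),
                  deaths = c(55, 120, 34),
                  n      = c(1000, 2600, 720))

# Meta-analysis of proportions on the logit scale.
m <- metaprop(event = deaths, n = n, studlab = study, data = dat, sm = "PLOGIT")
summary(m)
```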

Quality assessment of the selected studies

We assessed the risk of bias for all developed or externally validated models according to PROBAST. Overall, the ML models had a high risk of bias, mainly because many studies drew their subjects from retrospective case–control designs or did not estimate the sample size according to the events-per-variable (EPV) principle, and thus could not guarantee an adequate sample size. In addition, 59 models were not externally validated. Moreover, for feature selection, most of the included risk models relied on univariable screening followed by stepwise regression, LASSO regression, or the feature importance rankings of the models themselves (e.g., RF, SVM). We summarized the risk of bias of the models across the four PROBAST domains (Fig. 2).

Fig. 2

Risk of bias assessment (using PROBAST) based on four domains

A meta-analysis of prediction model C-index and actual prediction accuracy

C-index for in-hospital mortality

The training cohort of 19 studies [18, 19, 22, 23, 26, 27, 29, 32, 34, 38, 40, 44,45,46, 54, 59,60,61, 64] developed 69 ML models predicting in-hospital mortality with a combined C-index of 0.8491 (95% CI 0.8337–0.8649). The validation cohort of 15 studies [18, 19, 23, 26,27,28, 32, 38, 45, 46, 54, 59,60,61, 64] reported 53 ML models predicting in-hospital mortality with a combined C-index of 0.8633 (95% CI 0.8467–0.8802) (Table 1) (Additional file 1: Fig. S2A).

Table 1 C-index for in-hospital mortality

C-index for 30-day mortality

The training cohort of 10 studies [20, 30, 41, 49,50,51,52, 64, 65] developed 30 ML models predicting 30-day mortality with a combined C-index of 0.8208 (95% CI 0.7940–0.8485). The validation cohort of 5 studies [30, 35, 50, 52, 64] reported 17 ML models predicting 30-day mortality with a combined C-index of 0.8296 (95% CI 0.8134–0.8462) (Table 2) (Additional file 1: Fig. S2B).

Table 2 C-index for 30-day mortality

C-index for 3- or 6-month mortality

Twenty ML models predicting 3- or 6-month mortality were developed in the training cohort of 6 studies [23, 24, 26, 49, 56, 66] with a combined C-index of 0.8227 (95% CI 0.8001–0.8460). The validation cohort of 5 studies [23, 24, 39, 56, 66] reported 8 ML models predicting 3- or 6-month mortality with a combined C-index of 0.8205 (95% CI 0.7881–0.8541) (Table 3) (Additional file 1: Fig. S2C).

Table 3 C-index for 3- or 6-month mortality

C-index for 1 year or more mortality

The training cohort of 19 studies [23, 25,26,27, 31, 33, 36, 37, 42, 43, 47, 48, 52, 55, 57, 62,63,64, 67] developed 58 ML models predicting 1 year or more mortality with a combined C-index of 0.8352 (95% CI 0.8214–0.8493). The validation cohort of 11 studies [21, 23, 25, 26, 33, 42, 47, 48, 52, 57, 64] reported 41 ML models predicting 1 year or more mortality with a combined C-index of 0.8197 (95% CI 0.8042–0.8354) (Table 4) (Additional file 1: Fig. S2D).

Table 4 C-index for 1 year or more mortality

Accuracy for in-hospital mortality

The accuracy of 42 ML models to predict in-hospital mortality was reported in the training cohort of 15 studies [18, 19, 22, 23, 25, 27, 34, 38, 40, 44, 53, 54, 59,60,61] with a combined accuracy of 0.8434 (95% CI 0.8166–0.8669). The validation cohort of 13 studies [18, 19, 26,27,28, 38, 46, 53, 54, 59,60,61, 64] reported the accuracy of 45 ML models to predict in-hospital mortality with a combined accuracy of 0.8569 (95% CI 0.8411–0.8715) (Table 5) (Additional file 1: Fig. S2E).

Table 5 Accuracy for in-hospital mortality

Accuracy for 30-day mortality

The training cohort of 4 studies [20, 51, 52, 65] reported the accuracy of 17 ML models to predict 30-day mortality with a combined accuracy of 0.8257 (95% CI 0.7694–0.8707). The validation cohort of 2 studies [30, 64] reported the accuracy of 11 ML models to predict 30-day mortality with a combined accuracy of 0.8282 (95% CI 0.7922–0.8591) (Table 6) (Additional file 1: Fig. S2F).

Table 6 Accuracy for 30-day mortality

Accuracy for 3- or 6-month mortality

The training cohort of 3 studies [23, 24, 26] reported the accuracy of 15 ML models to predict 3- or 6-month mortality, with a combined accuracy of 0.7089 (95% CI 0.6737–0.7418). The validation cohort of 1 study [24] reported the accuracy of 2 ML models to predict 3- or 6-month mortality, with a combined accuracy of 0.7303 (95% CI 0.7184–0.7418) (Table 7) (Additional file 1: Fig. S2G).

Table 7 Accuracy for 3- or 6-month mortality

Accuracy for 1 year or more mortality

The training cohort of 12 studies [23, 26, 31, 33, 36, 37, 43, 48, 52, 55, 62, 63] reported the accuracy of 31 ML models predicting 1 year or more mortality with a combined accuracy of 0.7697 (95% CI 0.7360–0.8002). The validation cohort of 7 studies [21, 26, 33, 47, 48, 57, 64] reported the accuracy of 33 ML models predicting 1 year or more mortality with a combined accuracy of 0.7837 (95% CI 0.7455–0.8175) (Table 8) (Additional file 1: Fig. S2H).

Table 8 Accuracy for 1 year or more mortality

Predictive variables for risk of death in ACS

To clarify which variables carried the greatest predictive power in the ML models, we further examined and tallied the reported predictors; 27 ML prediction models (10.96%) did not specify the predictors they used. Age was the most widely used predictor for both short-term and long-term mortality prediction. Sex, systolic blood pressure, heart rate, serum creatinine, Killip classification, diastolic blood pressure, glucose, and hemoglobin were also important predictors, all ranking among the top 15 predictor variables. The most common predictors in ML models for each mortality timepoint are shown in Fig. 3 (top 15); a sketch of the tallying step follows below.
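
The ranking summarized in Fig. 3 amounts to a frequency count of the predictors reported for each model. A minimal sketch of that tally, using hypothetical predictor lists rather than the extracted data, is:

```r
# Hypothetical lists of predictors reported by three models.
predictors <- list(
  model_1 = c("age", "sex", "systolic blood pressure", "serum creatinine"),
  model_2 = c("age", "heart rate", "Killip classification", "glucose"),
  model_3 = c("age", "sex", "hemoglobin", "diastolic blood pressure")
)

# Count how often each predictor appears across models and rank by frequency.
freq <- sort(table(unlist(predictors)), decreasing = TRUE)
head(freq, 15)  # corresponds to the top-15 ranking shown in Fig. 3
```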

Fig. 3
figure 3

A variables of in-hospital mortality; B variables of 30-day mortality; C variables of 3- or 6-month mortality; D variables of 1 year or more mortality

Discussion

Our systematic review included 50 original studies that reported 216 ML mortality risk prediction models constructed on large samples. We found that (1) ML models predicting death in ACS patients at different timepoints showed relatively good performance, with favorable pooled C-index and accuracy; (2) the specific model type and the variables included substantially affected predictive performance, with certain ML models showing excellent performance in predicting death in ACS patients; and (3) age, sex, systolic blood pressure, serum creatinine, Killip classification, heart rate, diastolic blood pressure, blood glucose, and hemoglobin were commonly used predictors, with age being the most commonly used and most important predictor variable.

ACS is a leading cause of death worldwide, and rapid identification of high-risk patients remains an unmet clinical need. For this purpose, several risk scores have been developed, among which the GRACE risk score is one of the best-validated predictive tools. However, this score was built on traditional statistical methods that, although useful and robust [13], have inherent limitations. These limitations restrict their ability to handle large data sets with many variables and samples [68], and the chosen predictor variables are applied in the same way for every individual and uniformly across their range [13]. There are non-linear relationships and complex interactions between ACS risk factors, and large population-level studies can provide critical insights into these risk factors [69]. Therefore, the inherent characteristics of traditional statistical methods may lead to low model predictive power. ML is an area of artificial intelligence that is part of a broader approach to data analysis [68]. Unlike traditional predictive models that use selected variables for computation, ML algorithms can easily combine many variables while capturing complex relationships between them [69, 70], and can search for the best fit randomly or deterministically according to different algorithms [13] to construct robust prediction models. Most importantly, ML algorithms can often better describe the complexity and unpredictability of human physiology [70]. These advantages make ML technology suitable for the medical field, especially for outcome prediction. Recent studies have shown that ML algorithms outperform traditional statistical modeling methods [13]. Our systematic review supports this view, showing that ML algorithm-based prediction models achieve desirable pooled C-index and accuracy.

To visualize the contribution of each characteristic to the risk of death in ACS, we calculated the importance of each characteristic. We identified nine variables commonly used to predict short- and long-term mortality in patients with ACS, including five variables from the GRACE score (age, systolic blood pressure, serum creatinine, Killip classification, heart rate) and four additional variables (sex, diastolic blood pressure, blood glucose, and hemoglobin). These parameters describe non-modifiable risk factors and different pathophysiological contexts, such as hemodynamics, cardiovascular metabolism, and cardiomyocyte injury. Existing theories and studies also support these risk factors. Age was a well-established predictor and the most commonly used predictor variable in our study. Age-related pathological changes in the vascular system play a key role in morbidity and mortality in the elderly: with increasing age, changes in cardiovascular structure, function, and phenotype accelerate the progression of coronary artery disease, so older patients have more severe lesions and a higher risk of death [71].

There are sex differences in the outcomes of ACS patients. Studies have consistently shown that women with ACS have poorer short- and long-term outcomes than men, with a disproportionate risk of death [72, 73]. Female patients with ACS have different risk factor profiles and clinical presentations than male patients. In general, female ACS patients have a higher prevalence of cardiovascular risk factors, such as diabetes and hypertension, and of psychosocial risk factors, such as depression [74]; more atypical symptoms, such as neck pain, fatigue, and dyspnea [75]; longer duration of ischemia owing to pre-hospital delays, delayed evidence-based diagnosis, and inadequate treatment [76]; as well as a higher rate of complications during revascularization [77]. All of these factors contribute to the high risk of death in women with ACS.

Hyperglycemia on admission for ACS is associated with poor outcomes regardless of diabetes status and independent of pre-existing metabolic dysregulation [78]. The mechanisms underlying the association between hyperglycemia and increased mortality in ACS patients are multifactorial. Increased local and systemic inflammatory responses [79], altered platelet function and thrombo-fibrinolytic activity [80], increased oxidative stress [81], endothelial dysfunction [82], arrhythmic tendencies [83], and impaired myocardial contractility [84] all ultimately lead to an increased atherosclerotic burden, plaque instability, and an increased risk of death.

We found hemoglobin to be a significant predictor of death in ACS. Recent studies have consistently shown that anemia on admission is associated with poorer outcomes in ACS, as evidenced by increased mortality at different observed timepoints [85, 86]. An imbalance between myocardial oxygen supply and demand is central to the development of ACS. Low hemoglobin levels worsen myocardial ischemic injury by reducing the oxygen supply to the damaged myocardium, while the increased myocardial oxygen demand arising from the need for a higher cardiac output to maintain adequate systemic oxygen delivery further exacerbates this imbalance [87]. Other plausible explanations are the reduced number and impaired function of peripheral endothelial progenitor cells and the impaired vascular healing capacity in ACS patients with low hemoglobin levels [88]. In addition, studies have shown that the inflammatory marker C-reactive protein (CRP) is negatively correlated with hemoglobin levels in patients with ACS, which may further increase the risk of death [89].

The present study found that LR is currently the most widely used modeling method in ACS mortality risk prediction. LR performed well across mortality timepoints and sometimes even outperformed other algorithms, underscoring the importance of valid predictors. Therefore, the development or updating of prediction models should favor valid, easily collected, minimally invasive predictors.

Limitations and strengths

Although the current results indicate that the predictive ability of ML models appears satisfactory, the included original studies have some methodological flaws and limitations. First, in the current study, ML models were pooled separately by time of death, and their discriminative performance was assessed by C-index and accuracy; however, most risk models were not constructed with overfitting in mind. Second, much of the risk model data was derived from retrospective case–control studies, and most models were evaluated by splitting the data into training and validation cohorts at a fixed ratio (e.g., 7:3) rather than through external validation in an entirely new cohort. In addition, the modeling methods of most of the studied ML models were not clearly described; for example, information on hyperparameter tuning and external validation of the ML algorithms was insufficiently disclosed. The development of predictive models helps in clinical decision-making and resource allocation, but the risk of bias, reproducibility, and potential usefulness of predictive models can only be fully assessed if the modeling steps are adequately and clearly reported [90]. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement provides a checklist of 22 items that increases the transparency of predictive modeling studies [91]. We strongly recommend detailed and standardized reporting of predictive models according to the TRIPOD statement, which not only helps to improve the quality of ML models but also helps to assess their reliability and increase their credibility.

Although we acknowledge the limitations of the original studies, we believe that our meta-analysis has merit and clinical relevance. First, this is the first meta-analysis to systematically assess the predictive value of ML models for death at different timepoints in patients with ACS. Second, we ranked the model variables of the original studies by importance, identifying the most valuable predictors of ACS death and highlighting variables not included in the GRACE risk score, which can guide the further development of mortality risk models. In addition, the original studies included in our systematic review used different authoritative databases, including the MIMIC-III database and multiple national registry databases for acute myocardial infarction; this breadth of data sources adds, to some extent, to the reliability of our results.

Conclusion

Risk stratification of ACS patients is crucial for the early identification of high-risk patients so that effective interventions can be provided. ML models are good tools for predicting both short-term and long-term mortality risk in ACS patients. The commonly used predictors were age, sex, systolic blood pressure, serum creatinine, Killip classification, heart rate, diastolic blood pressure, blood glucose, and hemoglobin. These findings can guide the development or updating of future risk scoring systems.