Introduction

Premenopausal women typically exhibit a lower risk of ST-elevation myocardial infarction (STEMI), but this risk escalates with age and the emergence of cardiovascular disease (CVD) risk factors, leading to more severe outcomes compared to men1,2,3. Studies indicate a higher in-hospital mortality rate for women with STEMI, as well as a higher prevalence of comorbidities such as hypertension, diabetes, and obesity. However, most randomized clinical trials have limited female representation, which raises concerns about the relevance of their findings4,5,6,7.

Risk scoring systems such as Thrombolysis in Myocardial Infarction (TIMI) and the Global Registry of Acute Coronary Events (GRACE) are vital for predicting STEMI mortality. However, they were derived largely from Western populations in the 1990s and early 2000s and inadequately represent the diverse Asian population8,9,10. Furthermore, discrepancies in risk factors, such as differing smoking rates between Western and Asian females with acute myocardial infarction (AMI), and atypical AMI symptoms in women, limit their global applicability11. Additionally, their reliance on logistic regression (LR) imposes limitations such as rigid data assumptions. These issues underscore the need for new methods tailored to predict mortality in Asian female STEMI patients12,13,14.

Machine learning (ML), with its diverse statistical techniques and algorithms, presents a powerful alternative to traditional risk-scoring systems, enabling computers to learn from data and improve decision-making in healthcare without explicit programming15,16,17,18. Commonly used ML algorithms include LR, support vector machine (SVM), k-nearest neighbours (KNN), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost)19,20,21,22. These algorithms have been especially beneficial for patient subgroups defined by specific characteristics such as age and comorbid diabetes, yielding higher area under the curve (AUC) values than traditional methods9,23,24,25,26,27.

Despite ML's growing presence in cardiology, research focused on STEMI in Asian women remains limited. Studies have been reported on risk factors in multi-ethnic cohorts and age-related CVD patterns using ML algorithms; however, gender-specific ML-based models are scarce28,29. This gap highlights the urgent need for gender-specific ML models in cardiology tailored to Asian women with STEMI.

Ensemble ML, an advanced ML method, combines multiple models to improve predictive accuracy and adaptability, which is very useful in healthcare's complex environment16. Its application is evident in CVD research, where studies using ensemble ML report better illness prediction accuracy and patient outcomes30,31,32. Ensemble ML is reported to outperform single ML algorithms, which is crucial in medical fields where precision impacts patient survival33. Feature selection techniques further optimize ML models in healthcare and are essential for identifying mortality risk factors in high-risk STEMI patients34,35. However, few studies have applied ensemble ML and feature selection methods to women with STEMI.

Addressing the underrepresentation of Asian women in STEMI-related ML models, our study explores both base and ensemble ML models, employing six established algorithms (SVM, KNN, DT, RF, XGBoost, and AdaBoost) as base learners. We also focus on identifying key factors associated with in-hospital mortality among multi-ethnic Asian women, a demographic often neglected in existing models, using ML feature selection methods. We aim to compare traditional risk scores with both base ML and advanced ensemble ML models, employing feature selection techniques rooted in RF and SVM algorithms. This also involves analysing our models against diverse registry data and evaluating a model specifically tailored for women against a more general model encompassing all STEMI patients. Ultimately, our goal is to improve prediction accuracy, fostering more personalized and effective clinical decision-making for Asian women with STEMI.

Materials and methods

Study design and setting

We conducted a retrospective cohort analysis using anonymised data from the National Cardiovascular Disease Database (NCVD-ACS) from 2006 to 2016. The NCVD, which is supported by the Ministry of Health Malaysia (MOH) and the National Heart Association of Malaysia (NHAM), collects detailed information on patients diagnosed with Acute Coronary Syndrome (ACS), which includes conditions like STEMI and non-ST segment elevation myocardial infarction (NSTEMI). It includes a wide range of patient information from 24 collaborating Malaysian hospitals, including demographics, treatments, and medications 36.

The study focused on female STEMI patients to address a research gap in this demographic, particularly in Malaysia. The data, gathered from a network of healthcare facilities in both urban and rural areas, represent an extensive and robust sample for research. The study proposes the application of advanced ML techniques to construct predictive models tailored to the unique epidemiological profiles of Asian women with STEMI, hence improving the personalization and effectiveness of their clinical care. The study's workflow and methods are shown in Fig. 1.

Figure 1

Research workflow and methodology applied in this study.

Participants

The cohort for this study was drawn from the NCVD-ACS registry and spanned the years 2006 to 2016. Our primary analysis included only female STEMI patients with complete data records. For our secondary analysis, we broadened the scope by incorporating three distinct datasets to enhance the robustness and generalizability of our findings:

  • Women complete dataset: consisting of female patients with complete data, allowing a focused analysis on the intended demographic with no missing values in predictor variables.

  • Women imputed dataset: including a larger dataset with missing values addressed through multivariable imputation, increasing female patient records to represent a broader range of clinical circumstances.

  • General complete dataset: including complete data for both male and female STEMI patients, which provides a comparative perspective across genders and allows us to examine the model's performance in a broader context.

Data source

Our study utilized anonymized patient data from the NCVD-ACS registry spanning 2006 to 2016. A total of 15,407 consecutive in-hospital STEMI cases were recorded, of which 6299 were complete cases (with no missing values on predictors). Of these 6299 complete cases, 871 were female patients and formed the dataset for the primary analysis.

In 2007, the Medical Review & Ethics Committee (MREC) of the MOH of Malaysia approved the NCVD registry study (Approval Code: NMRR-07-20-250). The MREC waived patient informed consent for NCVD37,38. This study was also authorized by the UiTM ethics committee (Reference number: 600-TNCPI (5/1/6)) and NHAM. The data were anonymized before use; our research concerns only the values and features, and we had no access to patients' personal information.

The dataset used in this study includes each patient's information at the time of STEMI hospitalization. Based on the data available at the time, predictions for in-hospital mortality were developed, with the model being utilized once per patient. During the hospital stay, no more predictions were made, aligning the prediction frequency with the crucial decision-making period at the time of patient admission.

Variables and data preprocessing

Variables

STEMI was defined as persistent ST-segment elevation ≥ 1 mm in two contiguous electrocardiographic leads, or the presence of a new left bundle branch block in the setting of positive cardiac markers39. Input variables are the features used in developing a model to predict the outcome (in-hospital mortality). A total of 48 variables (9 continuous, 39 categorical) from the complete dataset were used in this study (Supplementary Table 1). The categories of variables used were sociodemographic characteristics, CVD diagnosis and severity, CVD risk factors, CVD comorbidities, non-CVD comorbidities, clinical presentation, baseline investigation, electrocardiography (ECG), treatments, and pharmacological therapy. Variables used for model development comprised those recorded at first contact in the emergency department as well as those recorded during the hospital stay. Our study adopts the following approach to address the dynamic nature of patient data during hospitalization:

  • Clinical history, examination, and investigation findings: based on information obtained at the time of admission, these provide a baseline understanding of each patient's initial status.

  • Treatment: we include the initial medical responses and interventions, as well as the primary treatment administered during hospitalization.

  • Medication: recognizing that medication regimens can change, our models consider the final pharmaceutical regimen recommended before discharge, capturing any substantial changes in treatment.

  • Outcome variable (in-hospital mortality): determined based on the patient's condition at the time of discharge, providing a specific endpoint for each case.

The mortality period begins on the day of hospital admission; for in-hospital mortality, the calculation period began with the first hospital admission. Deaths were confirmed through record linkage with the Malaysian National Registration Department. The registry does not collect information on short-term complications, such as heart failure. Planned follow-up data points were intended to collect this information, but we omitted them from this study due to the high rate of missing values. To increase the significance of this study, we centred our algorithm on policy-altering endpoints such as death, an approach also taken in similar publications9,40,41. The missing rates for each variable utilised in this study are presented in Supplementary Table 1.

Data splitting

We used stratified random sampling to split the dataset into model development (70%) and validation (30%) sets, following Kuhn and Johnson42, to avoid data leakage43. In cases of multiple admissions, a unique patient identifier ensured that all of a patient's records were consistently assigned to either the training or the testing set, preserving anonymity44.

Data pre-processing methods such as imputation (on missing cases) and balancing (both complete and missing cases) were performed on training data only. Meanwhile, normalization was performed separately on the training and testing data. We assessed the performance of the developed models and TIMI using the validation set, which comprises the 30% of the data not used for model development.
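The stratified 70/30 split described above can be sketched as follows. This is a minimal illustration in Python (the study itself was carried out in R), assuming one record per patient; the function name, the seed, and the synthetic labels are hypothetical, and the patient-identifier grouping for multiple admissions is not reproduced here.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), sampling each outcome class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train_idx.extend(indices[:n_train])   # development portion of this class
        test_idx.extend(indices[n_train:])    # validation portion of this class
    return sorted(train_idx), sorted(test_idx)

# Synthetic labels with the reported class counts: 73 deaths (1), 798 survivors (0).
labels = [1] * 73 + [0] * 798
train_idx, test_idx = stratified_split(labels)
train_deaths = sum(labels[i] for i in train_idx)
test_deaths = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), train_deaths, test_deaths)
```

Because each class is split on its own, the mortality proportion is nearly identical in both sets. The exact set sizes depend on the rounding rule, so this sketch need not reproduce the study's partition.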

Data balancing

Our dataset had a significant class imbalance, with non-survival cases (n = 73) accounting for approximately 8.38% of the total dataset (n = 871) and survival cases (n = 798) accounting for 91.62%. To mitigate the imbalance issue and improve the robustness of our model, we used the ROSE package to combine up-sampling and down-sampling techniques on the training data45. The class distribution was adjusted to better reflect a balanced scenario, improving the reliability of subsequent analyses and the predictive performance of the developed models. To preserve the integrity and representativeness of real-world clinical scenarios, this treatment was not applied to the validation dataset.
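As an illustration of combined up- and down-sampling, the sketch below resamples a training set to an approximately 50:50 class ratio while keeping its size unchanged. It is a simplified Python stand-in and not the ROSE package itself; the function name, the target proportion, and the synthetic training labels are assumptions made for the example.

```python
import random

def balance_both(rows, labels, p_minority=0.5, seed=1):
    """Over-sample the minority class and under-sample the majority class."""
    rng = random.Random(seed)
    n = len(rows)
    minority = min(set(labels), key=labels.count)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_min = round(n * p_minority)
    n_maj = n - n_min
    # Minority rows are drawn with replacement, majority rows without.
    chosen = rng.choices(minority_idx, k=n_min) + rng.sample(majority_idx, k=n_maj)
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]

# Hypothetical training set: 51 deaths (1) and 559 survivors (0); rows are just ids.
train_rows = list(range(610))
train_labels = [1] * 51 + [0] * 559
bal_rows, bal_labels = balance_both(train_rows, train_labels)
print(len(bal_labels), sum(bal_labels))
```

Only the training data pass through this step; the validation data keep their original class distribution, as stated above.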

Data imputation

Because the registry data were collected prospectively, the proportion of missing values across variables was outside our control. An incomplete record was defined as one with up to 30% of variables missing. The data were classified as missing completely at random: the probability of a value being missing is independent of both the observed values and the unobserved data. For the secondary analysis, we handled missing cases by multivariable imputation using chained equations with predictive mean matching from the MICE R package46. This method imputes each missing value with an actual observed value from another case whose predicted value is closest.
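The donor-matching idea behind predictive mean matching can be sketched for a single incomplete variable with one predictor. This is a simplified Python illustration, not the MICE implementation: it omits the chained-equations loop and the random draw of regression parameters, and the function name, the number of donors k, and the example values are hypothetical.

```python
import random

def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y with observed y values from cases with similar predictions."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    n = len(obs)
    # Ordinary least-squares fit of y on x using the observed cases only.
    mx = sum(x[i] for i in obs) / n
    my = sum(y[i] for i in obs) / n
    sxx = sum((x[i] - mx) ** 2 for i in obs)
    sxy = sum((x[i] - mx) * (y[i] - my) for i in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    pred = [intercept + slope * xi for xi in x]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # The k observed cases whose predicted value is closest act as donors.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out[i] = y[rng.choice(donors)]
    return out

# Hypothetical ages and systolic blood pressures with two missing readings.
age = [45, 52, 58, 61, 66, 70, 74, 79]
sbp = [118, None, 130, 135, None, 142, 150, 155]
imputed = pmm_impute(age, sbp)
print(imputed)
```

Every imputed entry is a value that was actually observed in another case, which is the property described in the paragraph above.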

Data normalization

Data normalization was used to reduce the bias of features that contribute more numerically to pattern class discrimination42. We applied standardization (z-score normalization) to the continuous variables (age, heart rate, systolic and diastolic blood pressure, total cholesterol, high-density lipoproteins (HDL), low-density lipoproteins (LDL), triglyceride, fasting blood glucose) in this study.
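A minimal sketch of z-score standardization for one continuous variable is given below, in Python rather than the R used in the study. The age values are hypothetical, and each set is standardized with its own mean and standard deviation, which is one reading of the description that normalization was done separately on the training and testing data.

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical ages in years for a training and a testing subset.
train_age = [48, 55, 61, 67, 74]
test_age = [52, 63, 70]
train_z = zscore(train_age)
test_z = zscore(test_age)
print([round(z, 2) for z in train_z])
```

A common alternative, not stated in the text, is to reuse the training mean and standard deviation when transforming the testing data.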

Data analysis

Primary analysis

A total of 6299 in-hospital STEMI complete cases were identified (with no missing values on predictors). Of these, 871 female patients were extracted and used as the final dataset for the primary analysis. This rendered a full predictor set of 48 variables (9 continuous, 39 categorical) for the study, as shown in Table 1.

Table 1 Summary statistics of the complete and imputed dataset.

Secondary analysis

Secondary analyses on the best-performing algorithm were carried out as follows:

(i) For the 15,407 STEMI cases with missing data, we employed multivariable imputation using chained equations to estimate missing values, creating a comprehensive dataset for modelling. This allowed us to include a total of 2197 additional female patients in our analysis, broadening the scope and applicability of our results.

(ii) A total of 4369 patients out of 6299 in-hospital STEMI patients with complete cases, including both male and female patients, were used to train the algorithm with the best performance. Both a women-specific model and a population-specific model were tested and compared using identical testing datasets (262 cases) from the primary analysis of all cases.

Additional statistics

This study presents the mean and standard deviation (SD) of continuous variables as well as the frequencies of categorical variables. Correlation analysis was used to examine associations between variables. Univariate analysis used a Chi-square test for categorical variables and a two-sided independent Student's t-test for continuous variables (p < 0.05) to identify significant variables. Pair-wise corrected resampled t-tests were used to compare the base and ensemble ML model performance49,67. A p-value less than 0.001 indicated statistical significance.

Feature selection

RF and SVM algorithms produced better results than the other base learners in this study. Hence, ranked features from the RF and SVM algorithms were used for feature selection. The sequential backward elimination (SBE) algorithm removes irrelevant features in ascending order of importance using the model significance value47. SBE was applied iteratively to the RF- and SVM-ranked variables in ascending order48. At each iteration, the prediction models were trained and then evaluated using the 30% validation dataset that was not used for model development. The models' predictive performance was calculated, and the models with the highest performance and fewest variables were chosen. Then, the base and ensemble ML models were constructed using the selected features from RF and SVM.
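The elimination loop can be sketched as below, under the assumption that features arrive ranked from most to least important and that the lowest-ranked remaining feature is dropped at each step. This is an illustrative Python sketch, not the authors' R code; the scoring function is a hypothetical stand-in for training a model and measuring its validation AUC.

```python
def backward_elimination(ranked_features, score_fn):
    """Drop the lowest-ranked feature step by step and keep the best-scoring subset."""
    current = list(ranked_features)
    best_features, best_score = list(current), score_fn(current)
    while len(current) > 1:
        current = current[:-1]        # remove the least important remaining feature
        score = score_fn(current)
        if score >= best_score:       # on ties, prefer the smaller feature set
            best_features, best_score = list(current), score
    return best_features, best_score

# Hypothetical scorer: informative features raise the score, noise lowers it slightly.
INFORMATIVE = {"killip_class", "age", "systolic_bp"}

def toy_score(features):
    n_good = sum(1 for f in features if f in INFORMATIVE)
    n_noise = len(features) - n_good
    return 0.5 + 0.1 * n_good - 0.01 * n_noise

ranked = ["killip_class", "age", "systolic_bp", "noise_a", "noise_b"]
selected, selected_score = backward_elimination(ranked, toy_score)
print(selected, round(selected_score, 2))
```

In practice score_fn would refit the chosen learner on the training data restricted to the candidate features and return its performance on held-out data.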

Model development

Base ML algorithms

ML algorithms such as SVM52, KNN53, DT54, RF54, XGBoost55, and AdaBoost56 were used to develop prediction models for women with STEMI in R (Version 4.1.2).

SVM is a robust learning algorithm that was used in this study with both a linear and a radial basis function (RBF) kernel. KNN is a simple supervised ML algorithm that has seen widespread use in healthcare for classification and regression problems (Bansal et al., 2018). DT is a non-parametric supervised learning technique used for classification and regression. RF employs bagging with DT as the base classifier to generate multiple small decision trees, and the final prediction is the class that receives the most votes from the trees. XGBoost is an implementation of gradient boosting with additional regularisation, which improves model generalisation and helps to prevent overfitting. AdaBoost is an adaptive learning algorithm that transforms weak learners into a strong learner through multiple iterations. These algorithms were chosen based on previous CVD mortality-related research22,24,27,28,57,58. All the hyper-parameters utilised in the development of base and ensemble ML models were tuned using a combination of random search and manual tuning (refer to Supplementary Table 2).

Ensemble ML algorithms

Stacking, a type of ensemble ML algorithm, is a meta-learning strategy that uses the predictions of multiple base learners as input for training a new meta-learner, which makes the final prediction. It can be more effective than any individual algorithm in classification and regression problems. In this study, six commonly used ML algorithms (SVM, KNN, DT, RF, XGBoost, and AdaBoost) are used as base learners, followed by three commonly used meta-learners: RF, generalised linear model (GLM), and generalized boosted models (GBM)59,60,61. Ten-fold cross-validation was used during model development on the training set to avoid overfitting49.

Model evaluation

Model calibration was evaluated using standardized measures on the untouched raw validation dataset62. The primary evaluation metric, the AUC, was chosen based on research establishing its effectiveness in a wide range of class distributions, including imbalanced datasets63,64. While AUC-PR provides more granularity for minority class predictive performance, AUC is still a widely accepted measure of overall diagnostic accuracy. Additional metrics included accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), which provide a comprehensive view of model performance across both classes. To compare the predictive performance of ML models, a paired resampled t-test was used65. In addition, the net reclassification index (NRI) was calculated to determine the percentage improvement in identifying both positive and negative cases with the best model compared to the TIMI risk score66.
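For reference, the evaluation metrics named above can be computed from predicted probabilities as in the following Python sketch. The AUC is obtained from pairwise comparisons of positive and negative cases (the Mann-Whitney formulation), and the remaining metrics use a 0.5 probability cut-off; the labels and scores are hypothetical and are not study data.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(y_true, scores, threshold=0.5):
    """Threshold-based metrics; assumes every cell denominator is non-zero."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical outcomes (1 = died) and predicted mortality probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
model_auc = auc(y_true, scores)
metrics = confusion_metrics(y_true, scores)
print(round(model_auc, 3), metrics)
```

The AUC does not depend on the cut-off, whereas sensitivity, specificity, PPV, and NPV all change with the chosen threshold.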

Results interpretation

Due to their black-box nature, ML models are difficult to implement in clinical medicine. Because model-agnostic explanation methods treat the model as a black box, perturbing the input and observing the predictions can reveal the behaviour of the underlying model50. Perturbing components that are understandable to humans makes the resulting explanation interpretable. Thus, we interpret the best ML model in this study using local interpretable model-agnostic explanations (LIME)51. LIME employs a simple linear model to approximate a black-box model locally, as opposed to globally.

Comparative analysis

The TIMI scores computed from the NCVD registry data were used as the comparator. Using the 30% validation set that was not used for model development, the AUC of the TIMI score was compared with that of the developed base and ensemble ML models. A performance breakdown graph was also created to compare the models with the TIMI score at cut-off points drawn from clinical practice and the literature. The ML high-risk population in this study is defined by a mortality probability of greater than 50%, which is equivalent to a TIMI score of greater than 5.
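The risk strata defined above can be tabulated with a few lines of Python. This is an illustrative sketch with hypothetical patients: the TIMI high-risk stratum is taken as a score above 5, and the ML high-risk stratum as a predicted probability of at least 0.5, the cut-off used when the results are reported.

```python
def mortality_by_stratum(is_high_risk, died):
    """Observed mortality rate within the low- and high-risk strata."""
    rates = {}
    for name, flag in (("low", False), ("high", True)):
        outcomes = [d for h, d in zip(is_high_risk, died) if h == flag]
        rates[name] = sum(outcomes) / len(outcomes) if outcomes else None
    return rates

# Hypothetical validation patients: TIMI score, ML probability, outcome (1 = died).
timi = [2, 4, 6, 7, 3, 8, 5, 9]
ml_prob = [0.10, 0.20, 0.70, 0.40, 0.05, 0.90, 0.30, 0.80]
died = [0, 0, 1, 0, 0, 1, 0, 1]

timi_rates = mortality_by_stratum([t > 5 for t in timi], died)
ml_rates = mortality_by_stratum([p >= 0.5 for p in ml_prob], died)
print(timi_rates, ml_rates)
```

A stratification that concentrates the observed deaths in its high-risk stratum yields a higher mortality rate there and a lower rate in the low-risk stratum.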

Ethical declaration

This study was authorized by the UiTM Research Ethics Committee (Reference: 600-TNCPI (5/1/6)), with the approval code REC/673/19. The UiTM Ethics Committee operates in accordance with the ICH Good Clinical Practice Guidelines, the Malaysia Good Clinical Practice Guidelines and the Declaration of Helsinki.

Results

Patient characteristics

The characteristics of patients utilised in this study are detailed in Table 1. In the complete cases dataset, the mean age of in-hospital female STEMI survivors is 61.8 (SD 11.5) years, while the mean age of non-survivors is 67 (SD 9.8) years. Nearly 90% of the patients were non-smokers. 73% of the patients have a hypertension history, and 57% have diabetes. 32% of patients received percutaneous coronary intervention (PCI) treatment. The reported overall hospital mortality rate for women was 8.4%.

Table 1 also displays the summary statistics for the imputed dataset. The overall mortality rate for women was 12.8%. There were significant differences in systolic blood pressure, Killip class, fasting blood glucose, and the use of beta-blockers, ACE inhibitors, and oral hypoglycaemic agents between survivors and non-survivors in both the complete cases and imputed datasets (p < 0.001 for all).

Feature selection

SBE feature selection methods were combined with the ML algorithms SVM and RF to construct predictive models with optimal performance (refer to methods). The comparison between the features selected by ML feature selection and those in the TIMI risk score is illustrated in Table 2. Killip class, fasting blood glucose, age, systolic blood pressure, beta-blocker use, and percutaneous coronary intervention were common predictors in both ML feature selection models in this study. The best SVM Linear model was built using twelve features selected by the SVM-based feature selection method. Age, Killip class, and systolic blood pressure are predictors shared by the TIMI risk score for STEMI and the best model. The ranking of the selected features by variable importance is presented in Supplementary Table 3.

Table 2 Comparison between features selected by ML feature selection with TIMI risk score.

Algorithm performance on complete cases

On the 30% validation dataset, the models constructed using the complete set (48 variables) and the reduced sets of variables generally demonstrated higher predictive performance than the TIMI risk score (Table 3). Except for base DT and ensemble GBM, most ML models outperformed the TIMI risk score for the prediction of in-hospital mortality in women with STEMI. The model with the best performance was base SVM (SVM selected var; p < 0.001). Table 4 provides a detailed performance evaluation of ML models relative to the TIMI risk score.

Table 3 The AUC of TIMI risk score and ML models with and without feature selection based on a 30% validation dataset.
Table 4 Detailed performance metrics of ML models with and without feature selection for women STEMI patients.

The predictive performance of ML models constructed with SVM-selected features (AUC ranging from 0.70 to 0.93) was better than that of models constructed with RF-selected features (AUC ranging from 0.60 to 0.90). There was a significant difference between the base SVM-Linear (SVM selected var) algorithm and the base SVM-Linear (RF selected var) algorithm (p < 0.001). The ensemble RF model (AUC: 0.91, CI: 0.87–0.96) performed best among the ensemble ML models (Table 2). However, the base SVM with the linear kernel (SVM selected var) algorithm demonstrated the highest predictive performance with a reduced number of predictors (12 predictors) for in-hospital mortality prediction in STEMI patients (AUC = 0.93, 95% CI = 0.89 to 0.97) compared to other base and ensemble ML models.

Secondary analysis on best performing model

The best-performing ML model, base SVM (SVM selected var), was also trained on an imputed dataset and a general dataset (complete cases that are not gender-specific). Both models were then evaluated on the complete cases validation dataset. This enables a valid comparison between models constructed with imputed, general, and complete cases data (Table 5).

Table 5 Detailed performance metrics of best SVM Linear model (SVM selected var) on the imputed dataset and general dataset for STEMI women patients.

The SVM (SVM selected var) model trained on the imputed dataset performed comparably to the model trained on the complete dataset when both were evaluated on the same complete cases validation dataset (AUC = 0.89, CI: 0.81–0.96 vs AUC = 0.93, CI: 0.89–0.98; p = 0.540). The difference between the complete cases model and the imputed model was not statistically significant.

Using the complete cases validation dataset, the model trained with women's complete cases performed better compared to the models trained with complete cases data that are not gender specific: SVM (SVM selected var) (AUC = 0.93, CI: 0.89–0.98 vs AUC = 0.92, CI: 0.87–0.97) (p < 0.001).

Model interpretation

LIME provides explanations for any individual patient, and the contribution of a given variable may change depending on other features of the patient. The contributions of the variables used for prediction by LIME analysis are illustrated for dead (Fig. 2) and alive (Fig. 3) cases respectively using the best performing model, base SVM Linear (SVM selected var) model.

Figure 2

LIME model plots explaining individual predictions for dead cases.

Figure 3

LIME model plots explaining individual predictions for alive cases.

Each graph illustrates the ten variables that best characterise the prediction in the local region. The blue bars represent variables that increase the predicted probability (supports), while the red bars represent variables that decrease it (contradicts). For instance, for the dead cases, a high Killip class > 3 and no PCI with high systolic blood pressure (patient #1) or an older age > 74 years (patient #68) are variables that strongly indicate non-survival. Likewise, not receiving PCI combined with high fasting blood glucose > 14.3 (patient #2), and older age > 74 years with higher blood pressure (patient #3), were also strong indicators of non-survival. Pharmacological interventions are noted as variables that contradict and lower the predicted probability of non-survival (patients #2 and #3). For patients who are alive (Fig. 3), a younger age of 58 years, the absence of chronic renal disease, a lower Killip class < 2, and a lower fasting blood glucose < 6.7 are all supportive of the survival outcome.

Comparison with TIMI conventional risk score

Using the same validation set, TIMI achieved a lower AUC of 0.81 (0.72–0.89) than most of the ML models, the exceptions being the base DT and ensemble GBM models. Figures 4 and 5 plot the mortality risk of the women STEMI patients as predicted by the TIMI risk score and by the best-performing model, base SVM Linear (SVM selected var), respectively. For the women patients, the ML score categorized patients as low risk at a probability of < 50% and as high risk at ≥ 50%. This is equivalent to a low-risk TIMI score of ≤ 5 and a high-risk TIMI score of > 5, as reported previously68.

Figure 4

Mortality rate distribution on the validation set of TIMI risk scores.

Figure 5

Mortality rate distribution on the validation set of base SVM (using SVM variables) model.

Table 6 tabulates the percentage of mortality in the patients with predicted low risk (TIMI score: ≤ 5; ML probabilities < 0.5) and high risk (TIMI score: > 5; ML probabilities: ≥ 0.5). In the high-risk group, ML models predicted mortality better in comparison to TIMI for in-hospital death in women STEMI patients.

Table 6 Percentage of mortality in patients with predicted low risk (TIMI score: ≤ 5; ML probabilities < 0.5) and high risk (TIMI score: > 5; ML probabilities: ≥ 0.5).

NRI analysis

For the in-hospital model, reclassifying the women STEMI patients with the base SVM (SVM selected var) produced a net reclassification improvement of 18.8% (p < 0.00001) over the original TIMI risk score.

  

Number of individuals, by machine learning risk stratum, and reclassification relative to the TIMI score

TIMI score     Machine learning             Reclassification                     Net correctly
               Low risk     High risk       Increased risk    Decreased risk     reclassified (%)

Individuals with events (died) (n = 22)
Low risk       0            5               5                 1                  18
High risk      1            16

Individuals without events (alive) (n = 240)
Low risk       143          35              35                37                 0.83
High risk      37           25

Net reclassification index (NRI): 18 + 0.83 = 18.83

Z, p-value: Z = \(\frac{18.83}{\sqrt{\frac{5+1}{{22}^{2}}+\frac{35+37}{{240}^{2}}}}\) = 161.19, p < 0.00001

Conclusion: It was statistically significant. The ML model has better predictive ability than the TIMI risk score in predicting mortality among Asian women with STEMI, and the proportion of correct classifications increased by 18.8%.
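The arithmetic behind the reclassification table can be retraced with the short Python sketch below, using only the counts reported above (5 up and 1 down among the 22 deaths; 35 up and 37 down among the 240 survivors). The function names are hypothetical. The last line evaluates the Z expression with the numerator on the percentage scale (18.83), as it is written in the table.

```python
from math import sqrt

def nri_components(up_ev, down_ev, n_ev, up_non, down_non, n_non):
    """Event, non-event and total NRI as proportions."""
    event_nri = (up_ev - down_ev) / n_ev          # deaths correctly moved up
    nonevent_nri = (down_non - up_non) / n_non    # survivors correctly moved down
    return event_nri, nonevent_nri, event_nri + nonevent_nri

def nri_denominator(up_ev, down_ev, n_ev, up_non, down_non, n_non):
    """Square-root term in the denominator of the Z expression."""
    return sqrt((up_ev + down_ev) / n_ev ** 2 + (up_non + down_non) / n_non ** 2)

counts = (5, 1, 22, 35, 37, 240)
event_nri, nonevent_nri, total_nri = nri_components(*counts)
denominator = nri_denominator(*counts)
z_percentage_scale = 18.83 / denominator

print(round(100 * event_nri, 2), round(100 * nonevent_nri, 2), round(100 * total_nri, 2))
print(round(z_percentage_scale, 2))
```

The table rounds the event component down to 18%, which is why its total of 18.83 is slightly below the unrounded sum computed here. Note also that the Z statistic scales with the units of the numerator, so its value depends on whether the NRI is entered as a percentage or as a proportion.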

Discussion

This study developed and evaluated ML models to predict in-hospital mortality in Asian women with STEMI, comparing them with traditional risk scores like TIMI. Notably, to our knowledge it is the first study to apply ensemble ML models in this context, achieving higher accuracy than conventional risk scores. Key findings include: the crucial role of feature selection in enhancing model performance; the identification of consistent predictors such as systolic blood pressure, Killip class, fasting blood glucose, beta-blockers, ACE inhibitors, and oral hypoglycaemic medications; improved performance of ML models using selected features, with the SVM linear model built on SVM-selected features showing the highest accuracy and outperforming ensemble ML; the finding that most ML models, except DT and GBM, outperform the TIMI score; and the use of LIME for model interpretability. These results underscore the value of advanced ML in specific clinical settings, enhancing predictive accuracy and decision-making in treating STEMI in Asian women.

Feature selection enhances ML model performance in our study, aligning with findings from Perez et al.69. Applications of feature selection algorithms increase ML model performance 70,71,72,73,74,75, as seen in this study with the RF (11 predictors) and SVM (12 predictors) models. However, this approach contrasts with other mortality post-STEMI studies where models using larger sets of predictors showed optimal performance 35,76. ML with significant predictors improves risk stratification in Asian STEMI women, providing clinicians with a prognostic tool for better emergency care management.

This study's findings also reveal that ensemble ML methods show promise in predicting in-hospital mortality for Asian female patients, though their performance did not consistently exceed that of base ML algorithms. In particular, base learners like SVM (AUC: 0.93) and RF (AUC: 0.90) performed on par with ensemble ML models. In medical contexts, even small increases in predictive model performance are crucial77. However, the ensemble ML method does not always outperform the base model78. In this study, the ensemble ML models did not improve significantly on the best-performing base learner, SVM, which is consistent with the literature27,50.

The best-performing model, base SVM Linear, identified a high-risk group with higher observed mortality than the group classified as high-risk by TIMI. Despite its widespread use in Asia, the TIMI risk score was originally developed from a predominantly Western Caucasian cohort with limited Asian representation and only 25% female participants, indicating an underrepresentation of women. In our study, the TIMI score achieved an AUC of 0.81 in a non-restricted PCI-eligible population, higher than the AUC of 0.78 reported for the fibrinolytic-eligible STEMI population in the original TIMI study79. The SVM algorithm's robustness in managing high-dimensional and constrained datasets renders it well suited to predicting in-hospital mortality, and its proficiency in modelling non-linear decision boundaries is beneficial for assessing severe AMI prognosis80,81.

The net reclassification improvement (NRI) was further used for a detailed assessment of model enhancement relative to the TIMI score. The NRI, though less commonly reported in medical research, measures how well a new model reclassifies individuals into appropriate risk categories82. In our study, the NRI indicated a significant 18.8% improvement in classification over the TIMI score, suggesting that our ML models not only predict more precisely but also better reflect actual patient outcomes. The NRI was evaluated on a dataset separate from that used for model development, providing an unbiased comparison with TIMI and reinforcing the validity of our results.
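To make the reclassification idea concrete, the sketch below computes a categorical NRI in its commonly used form: the net proportion of patients who died that the new model moves to a higher risk category, plus the net proportion of survivors that it moves to a lower one. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function name, the integer coding of risk categories, and the toy data are all hypothetical, and the study's actual category cut-offs are not reproduced here.

```python
from typing import Sequence


def net_reclassification_improvement(
    old_cat: Sequence[int],
    new_cat: Sequence[int],
    event: Sequence[int],
) -> float:
    """Categorical NRI of a new model versus a reference score.

    old_cat: risk category from the reference score (e.g. 0 = low, 1 = high).
    new_cat: risk category from the new model, using the same coding.
    event:   1 if the outcome (in-hospital death) occurred, 0 otherwise.
    """
    if not (len(old_cat) == len(new_cat) == len(event)):
        raise ValueError("all inputs must have the same length")

    up_e = down_e = n_e = 0   # movements among patients with the event
    up_n = down_n = n_n = 0   # movements among patients without the event
    for old, new, died in zip(old_cat, new_cat, event):
        if died:
            n_e += 1
            up_e += int(new > old)
            down_e += int(new < old)
        else:
            n_n += 1
            up_n += int(new > old)
            down_n += int(new < old)

    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")

    # Upward moves are correct for events; downward moves for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical toy data (illustration only, not study data).
toy_old = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1]
toy_new = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
toy_event = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_nri = net_reclassification_improvement(toy_old, toy_new, toy_event)
```

In the toy data, a net one of four deaths is moved up and a net one of six survivors is moved down, giving an NRI of 0.25 + 0.167, or about 0.417. A positive value indicates that, on balance, the new model reclassifies patients in the correct direction relative to the reference score.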

Our ML models, using feature selection, identified age, Killip class, and systolic blood pressure as key predictors, consistent with the univariate analysis and LIME. LIME analysis indicated that older age, increased fasting blood glucose, and absence of percutaneous coronary intervention (PCI) were associated with higher mortality risk, consistent with existing research. However, the influential features identified by LIME should be regarded as preliminary and not as evidence of causality; they require further validation in prospective studies or randomized controlled trials83,84.

Older female STEMI patients have a higher incidence of coronary artery disease than males2, and Killip class is a key predictor of mortality in STEMI patients6,85,86. This is consistent with our study and with previous ML-based mortality studies40. Women with STEMI face higher mortality owing to factors such as atypical symptoms, delayed treatment, and less frequent use of cardiac catheterization. Our study found that only 34% of Asian STEMI patients received PCI, highlighting a need for improved care. Heart rate is a crucial factor in in-hospital mortality87, and the use of beta-blockers post-STEMI is linked to better outcomes5,7,86,88.

This study has limitations. Notably, we could validate the ML models only against the TIMI score, because the parameters needed to calculate the GRACE score, unlike those for the TIMI score, were not collected at patient admission. The TIMI score is adopted at admission because of its simplicity and its development for short-term risk stratification, and because its performance has been found to be similar to that of the GRACE score for predicting in-hospital mortality. Collecting information for two risk scores was therefore considered redundant89.

Future research will aim to use high-performance computing and larger datasets to improve the predictive performance of ensemble techniques. Because ML models rely on the representativeness of their data rather than on medical expertise, they may exhibit biases and require ongoing validation with real-world data, which can be facilitated by hospital electronic health record systems. Integrating these models into hospital systems for physician use, and validating them in clinical registries rather than administrative databases, will be key areas of future investigation.

Conclusion

This work demonstrates the effectiveness of both base and ensemble ML models, when combined with feature selection, in predicting in-hospital mortality in Asian women with STEMI. Our findings highlight the potential for combining these ML models with conventional risk-scoring approaches such as TIMI to improve mortality risk assessment in this specific group, opening up the possibility of more nuanced and effective therapeutic decision-making. The improved predictive accuracy achieved by these models allows for better patient communication and awareness, and allows healthcare practitioners to optimize their management methods and resource allocation. Our findings also pave the way for future research to test these models and potentially integrate them into clinical practice, which could lead to more tailored care and improved outcomes for women with STEMI.