Introduction

Cardiovascular disease is a major cause of illness and death worldwide [1]. Acute chest discomfort accounts for five to twenty percent of emergency room visits [2,3,4,5], and its causes range from benign to potentially fatal conditions such as aortic dissection, pulmonary embolism, and acute coronary syndrome [6, 7].

In a busy emergency setting, it is difficult to classify and treat the subgroups of people with cardiovascular disease; hence, prediction scores such as HEART [7, 8], GRACE (Global Registry of Acute Coronary Events) [9], and TIMI (Thrombolysis in Myocardial Infarction) [10] have emerged to aid disposition and decision-making. However, in overcrowded emergency rooms, the time required to apply these scoring systems can be a barrier [11]. In recent years, artificial intelligence (AI) and machine learning (ML) applications in emergency care have therefore emerged [12], with studies showing that ML outperforms conventional measures by handling the many factors accessible via electronic medical records and big data [13,14,15]. Several studies have used AI algorithms to generate mortality risk scores as a basis for triage prioritization [16,17,18,19,20].

For the evaluation of major adverse cardiac events (MACE), previous studies have used regression-based models such as the Framingham Risk Score (FRS), GRACE, and TIMI [21]. However, these models do not consider the intricate relationships between clinical factors. The systematic grid search approach is deemed superior to complicated multivariable logistic regression (LR), is simpler to execute with non-functional data, and has shown high performance based on clinical presentation parameters [22]. In addition, evaluating predictor variables with various machine learning algorithms and a systematic grid search serves well in the analysis of non-functional data with hyperparameters.

An artificial neural network (ANN) emulates the structural framework of biological neural networks. Comprising interconnected artificial neurons arranged in layers, this computational model undergoes a training process that optimizes the connection weights. Through this refinement, the algorithm learns to discern patterns, relationships, and features within data, enabling predictions and classifications on novel, unseen datasets. Therefore, in our study, we aim to develop and validate an ANN model, tuned with a systematic grid search and based on triage vitals and cardiovascular symptoms, to predict MACE at the triage level in an emergency setting. This approach is intended to capture intricate connections among factors and to produce reliable forecasts of cardiac outcomes. Such a model could help prevent MACE by allowing early detection and timely intervention in the emergency department.

Methods

Study design, setting, and population

This was a cross-sectional study undertaken in the emergency department of Aga Khan Hospital, an urban, 62-bed emergency department that serves almost 60,000 patients yearly. The triage data used for this research were gathered from hospital electronic databases from December 2017 to December 2020. All individuals aged 18 years and older who arrived at the emergency room with cardiovascular symptoms were included. For the index ED visit, the criteria for cardiovascular symptoms (chest pain, sweating, shortness of breath, shoulder pain, arm pain, jaw pain, impending doom, hypotension, etc.) were derived according to the International Classification of Diseases, Tenth Revision, Chapter 11 (ICD-10) [23]. Patients were excluded if they lacked a disposition record, were dead on arrival, were transferred to another hospital from the ED, or left against medical advice (LAMA).

Between January 1, 2017, and December 31, 2020, a total of 292,953 patients presented to the triage of the emergency department, where the emergency severity index (ESI), a five-level scale, was used to triage the patients [24]. Of these, 85,929 patients younger than 18 years were removed, and 25,573 were excluded owing to missing information, including missing hospital registration numbers and missing outcome information. In day-to-day emergency services, some patients reach the triage counter but are not admitted into the emergency room because of overcrowding (diversion) or because they left without being seen (LWBS). A further 21,440 patients were excluded due to transfer out, leaving against medical advice (LAMA), or death on arrival (DOA). The remaining 69,317 were sent home, and the actual sample size was 97,333 (Fig. 1).

Fig. 1

Flow chart showing the exclusion criteria and final number of patients included in the analysis

The sample comprised all individuals who presented to the triage of our emergency department with cardiovascular-related complaints. According to the ESI triage categorization, patients were further subdivided into high-risk (P1 and P2) and low-risk (P3, P4, and P5) groups. Reporting followed the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist for observational studies.

Data features and missing data imputation

The dataset comprised 44 feature variables describing demographic and clinical characteristics of patients, several of which were predictors of MACE, mortality, and cardiac arrest in prior research. The variables included age (years), gender, triage vitals (systolic blood pressure, diastolic blood pressure, respiratory rate, oxygen saturation, and temperature), presenting symptoms, cardiac diagnosis, pre-existing comorbidities, triage category, triage processing time windows, and disposition, as well as the outcome variables of all-cause in-hospital (up to 30 days) mortality, cardiac arrest, and MACE (ST-segment elevation myocardial infarction (STEMI), non-ST-elevation myocardial infarction (NSTEMI), acute pulmonary edema, heart failure, and cardiogenic shock). Electrocardiography was not used owing to its technical limitations and interpretive variability. Our computerized triage record had 2.6% missing triage vitals and 0.6% missing categorical covariates, which were imputed using a method based on the random forest (RF) algorithm. The imputation algorithm was initialized with the mean value for continuous variables and the most frequent category for categorical variables.
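The initialization step of the imputation can be sketched as follows. This is a minimal illustration only, assuming records are plain dictionaries in which `None` marks a missing value; the column names `sbp` and `triage` and the helper `initial_fill` are hypothetical, and the subsequent random-forest refinement of these starting values is not shown.

```python
from collections import Counter
from statistics import mean


def initial_fill(rows, continuous, categorical):
    """Return copies of rows whose missing values (None) get a starting guess."""
    fills = {}
    # Continuous variables start from the mean of the observed values.
    for col in continuous:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = mean(observed)
    # Categorical variables start from the most frequent observed category.
    for col in categorical:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = Counter(observed).most_common(1)[0][0]
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]


# Toy records (hypothetical values, not study data).
rows = [
    {"sbp": 120, "triage": "P2"},
    {"sbp": None, "triage": "P2"},
    {"sbp": 140, "triage": None},
    {"sbp": 100, "triage": "P3"},
]
filled = initial_fill(rows, continuous=["sbp"], categorical=["triage"])
```

In a random-forest imputation scheme, these starting values would then be replaced iteratively by model predictions until the imputed values stabilize.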

Outcome measurements

We defined MACE (STEMI, NSTEMI, acute pulmonary edema, heart failure, and cardiogenic shock) during the hospital stay as the primary outcome of our investigation. The secondary outcomes were all-cause in-hospital (up to 30 days) mortality and cardiac arrest following the initial ED visit.

Data processing and model development

In the absence of an independent study cohort for the validation of models, we have adopted the train-test split approach to develop and test the prediction accuracy of the models. The complete dataset was split into two sets: the training dataset, which consisted of 80% randomly selected cases, and the testing dataset, which consisted of the remaining 20% of random observations (holdout cases). The training dataset was used to train the model for outcome prediction, and the testing dataset was used to assess the performance of the trained model for outcome prediction. Three binary classifiers, ANN, RF, and LR, were trained and evaluated for three distinct tasks, namely the prediction of MACE, in-hospital (up to 30 days) mortality, and cardiac arrest.
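The 80/20 random holdout split can be sketched as below. This is an illustrative sketch rather than the study's code: the function name `train_test_split_indices` and the fixed seed are assumptions made for reproducibility of the example, and the exact split sizes in the study may differ depending on how rounding was handled.

```python
import random


def train_test_split_indices(n_cases, test_fraction=0.2, seed=42):
    """Randomly partition case indices into training and holdout (testing) sets."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # the seed is an assumption for this sketch
    n_test = round(n_cases * test_fraction)
    return indices[n_test:], indices[:n_test]


# 97,333 is the sample size reported in the study.
train_idx, test_idx = train_test_split_indices(97_333)
```

Every case lands in exactly one of the two sets, so the holdout cases never influence training.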

ANN classifier

The optimal hyperparameter setting and structure of the ANN model were identified with the help of a grid search strategy. Based on the classifier’s performance, a model with four hidden layers was used. Grid search was used to adjust a total of nine hyperparameters of the ANN model: the number of neurons in the first hidden layer [search space 1000, 800, 600], the number of neurons in the second hidden layer [search space 600, 400, 200], the number of neurons in the third hidden layer [search space 400, 200, 100], the number of neurons in the fourth hidden layer [search space 100, 50, 25], the learning rate [search space 1e-1, 1e-2, 1e-3, 1e-4, 1e-5], the dropout rate [search space 0.4, 0.5, 0.6, 0.7], the “LeakyReLU” activation function parameter alpha [search space 0.01, 0.02, 0.03, 0.04, 0.05], the batch size for batch normalization [search space 8, 16, 32, 64], and the number of epochs [search space 10, 15, 20, 30]. Each hidden layer was followed by a dropout layer and batch normalization. The activation function was “LeakyReLU”, with a “binary crossentropy” loss function. Using the grid search strategy, the ideal hyperparameter settings and training parameters were identified separately for the three tasks, namely the prediction of MACE, cardiac arrest, and in-hospital (up to 30 days) mortality.
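The search space listed above can be written down as a grid and enumerated as in the following sketch. The dictionary keys and the helper `iter_configs` are hypothetical names chosen for illustration; only the candidate values come from the text. Whether every combination was actually trained is not stated, so the count below is simply the size of the exhaustive grid.

```python
from itertools import product
from math import prod

# Candidate values as listed in the text; key names are illustrative assumptions.
ann_grid = {
    "units_layer1": [1000, 800, 600],
    "units_layer2": [600, 400, 200],
    "units_layer3": [400, 200, 100],
    "units_layer4": [100, 50, 25],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],
    "dropout_rate": [0.4, 0.5, 0.6, 0.7],
    "leaky_relu_alpha": [0.01, 0.02, 0.03, 0.04, 0.05],
    "batch_size": [8, 16, 32, 64],
    "epochs": [10, 15, 20, 30],
}


def iter_configs(grid):
    """Yield every hyperparameter combination in the grid as a dictionary."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Size of the exhaustive grid: the product of the number of options per hyperparameter.
n_configs = prod(len(options) for options in ann_grid.values())
first_config = next(iter_configs(ann_grid))
```

An exhaustive search over this grid would cover 129,600 configurations per prediction task, which illustrates why grid search over deep models is computationally demanding.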

RF classifier

In the same way, three different RF classifiers were trained and tested to see how well they could predict MACE, in-hospital (up to 30 days) mortality, and cardiac arrest. The grid search approach was adopted to optimize four hyperparameters of the RF classifier. The four hyperparameters and their corresponding search spaces were as follows: the number of trees [search space 100, 200, 500], the size of the random subsets of features to consider when splitting a node [search space ‘auto’, ‘sqrt’, ‘log2’], the depth of each tree in the forest [search space 6, 7, 8, 9, 10], and the criterion for splitting nodes in a decision tree [search space ‘Gini’, ‘entropy’].
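The two node-splitting criteria in the RF search space correspond to two standard impurity measures. The sketch below gives their textbook definitions for a list of class labels; the function names are illustrative and this is not the study's implementation.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class proportions."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


# A perfectly mixed binary node is maximally impure under both measures.
mixed_node = [0, 0, 1, 1]
# A node containing a single class has zero impurity.
pure_node = [1, 1, 1]
```

A split is preferred when it reduces the chosen impurity measure the most across the resulting child nodes; in practice the two criteria often select similar splits.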

LR classifier

Finally, classifiers based on binary LR were developed to predict the three outcome variables, namely MACE, in-hospital (up to 30 days) mortality, and cardiac arrest.

In addition to comparing ANN, RF, and LR with one another for predicting mortality, cardiac arrest, and MACE, the Emergency Severity Index (ESI), a routinely used risk stratification modality in the ED, was compared with the three classifiers. The prediction performance of each classifier and of the ESI was evaluated using the validation dataset, i.e., the 20% holdout (testing) dataset.

In the validation dataset, all models were tested for their overall accuracy of prediction. The sensitivity (recall), specificity, precision, and F1-score of each model were calculated. Using the predicted probabilities of the ANN, RF, and LR classifiers, receiver operating characteristic (ROC) curve analysis was performed, and the area under the curve (AUC) was evaluated for each of the three outcomes.
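The evaluation metrics named above follow from the confusion matrix and from the ranking of predicted probabilities. The sketch below uses standard definitions on toy data; the function names, the 0.5 decision threshold, and the example values are assumptions for illustration, and no guard against empty classes is included.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for binary labels (1 = event, 0 = no event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike sensitivity, specificity, and the F1-score, the AUC does not depend on the choice of decision threshold, which is why it is reported alongside them.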

Feature selection

Using the “featurewiz” technique, significant features for the classification of the three outcome variables, namely in-hospital (up to 30 days) mortality, cardiac arrest, and MACE, were selected separately. The selected features for each of the three tasks are described in the supplemental files (Table S1 and Table S2). The prediction performance of the three classifiers was examined using both the full and the feature-selected datasets.

Results

A total of 97,333 patients were included in our study, and 33.3% of individuals had cardiovascular symptoms. The mean age of patients was 54.08 years (SD 19.18), and 5170 (53.0%) were male (Table 1). The average ED admission time was 7.87 h (SD 4.37), and the average ED departure time was 8.97 h (SD 4.37). The distribution of the ESI levels was as follows: P1 30,037 (30.9%), P2 33,986 (34.9%), P3 30,411 (31.2%), P4 1688 (1.7%), and P5 1211 (1.2%). According to risk classification, 64,023 patients (65.8%; P1 and P2 combined) were classified as high-risk. At the time of presentation in the ED, shortness of breath was the most prevalent presenting symptom, observed in 22,881 cases (23.5%).

Table 1 Comparison of training and validation dataset and outcomes of emergency triage dataset (n = 97,333)

MACE was observed in 23,052 (23.7%) of the patients, in-hospital (up to 30 days) mortality in 10,888 (11.2%) patients, and cardiac arrest in 5483 (5.6%) patients.

The distribution of diagnoses showed that STEMI was the most frequent (6229), followed by NSTEMI (5627), cardiogenic shock (4810), acute pulmonary edema (1783), and heart failure (1171). Seasonal variation in mortality and cardiac arrest was observed in our data (Fig. 2). A total of 34,679 (35.6%) patients presented in winter, i.e., December, January, and February, whereas 38,509 (39.5%) presented in summer, i.e., April, May, and June of 2020. Most patients, 68,893 (70.8%), presented on weekdays.

Fig. 2

Seasonal variation of the number of cases of mortality and cardiac arrest

The best-performing sequential ANN architectures selected for in-hospital (up to 30 days) mortality, cardiac arrest, and MACE had 491,801, 353,951, and 178,501 trainable parameters, respectively. The distribution of trainable parameters by layer is presented in Additional file 1.

The AUC in the validation dataset for predicting in-hospital mortality was 0.931 for the ANN, 0.911 for the RF, and 0.889 for the LR classifier, with F1-scores of 0.610, 0.593, and 0.585, respectively (Fig. 3). Similarly, the AUC in the validation dataset for cardiac arrest was 0.968 for the ANN, 0.962 for the RF, and 0.946 for the LR classifier, with F1-scores of 0.67, 0.61, and 0.49, respectively (Fig. 4). For the prediction of MACE, the AUC in the validation dataset was 0.973 for the ANN, 0.964 for the RF, and 0.966 for the LR classifier, with F1-scores of 0.694, 0.671, and 0.499, respectively (Fig. 5). The AUCs of the models were compared to test for significant differences (Table 2).

Fig. 3

Precision-recall and ROC curves comparing the artificial neural network (ANN) with the RF and LR models for predicting 30-day in-hospital mortality, with and without feature selection

Fig. 4

Precision-recall and ROC curves comparing the artificial neural network (ANN) with the RF and LR models for predicting cardiac arrest in the ED, with and without feature selection

Fig. 5

Precision-recall and ROC curves comparing the artificial neural network (ANN) with the RF and LR models for predicting MACE in the ED, with and without feature selection

Table 2 Comparison of AUC of different models

The sensitivity for the prediction of in-hospital mortality was 94.6%, 87.9%, and 79.7% for the ANN, RF, and LR classifiers, with specificities of 93.3%, 93.3%, and 93.4%, respectively. Similarly, the sensitivity for cardiac arrest prediction using ANN, RF, and LR classifiers was 93.4%, 94.8%, and 68.5%, with specificities of 97.2%, 96.8%, and 96.4%, respectively. Furthermore, the sensitivity for MACE prediction using ANN, RF, and LR classifiers was 99.3%, 99.4%, and 99.2%, respectively, with the specificities being 94.5%, 94.2%, and 94.2%, respectively (Table 3).

Table 3 Sensitivity and specificity analysis of the reference and machine learning models in the overall and selected validation set

Discussion

In this retrospective, single-center, cross-sectional study, we describe a new, more accurate way to predict MACE, all-cause in-hospital (up to 30 days) mortality, and cardiac arrest using an ANN tuned with a systematic grid search. The extensive dataset, comprising 97,333 patients, allowed for a robust analysis of the performance of the proposed ANN model in comparison with the RF and LR classifiers and the routinely used Emergency Severity Index (ESI).

Of the three models we evaluated, the ANN achieved an AUC of 0.973 for MACE, 0.968 for cardiac arrest, and 0.931 for in-hospital (up to 30 days) mortality. The implementation of an ANN model for early prediction of MACE at ED triage has the potential to change current practice and improve patient outcomes. The systematic grid search approach aims to enhance the model's performance by identifying well-performing hyperparameter settings. Comparison with traditional risk assessment methods provides insight into the added value of the proposed model in the emergency care setting. After tuning the hyperparameters of the various models, the ANN with a systematic grid search provided the highest AUC among the models compared (RF and LR) for all three outcomes. The integration of machine learning models into routine ED procedures presents challenges related to interpretability, scalability, and real-time applicability. Addressing these challenges is crucial for the successful implementation of such models in clinical practice [25].

The use of artificial intelligence for the prediction of various outcomes in the emergency department has gained popularity among researchers in recent years. Jang DH et al. [26] developed and evaluated ANN classifiers for early detection of patients at risk of cardiac arrest in overcrowded emergency departments. The research utilized a single-center electronic health record (EHR)-based approach and compared three ANN models (multilayer perceptron-MLP, long-short-term memory-LSTM, and hybrid) with other classifiers such as the modified early warning score (MEWS), logistic regression, and random forest. In a dataset of 374,605 emergency department visits, the ANN models consistently outperformed non-ANN models. The area under the receiver operating characteristic curve (AUROC) values for ANN models (MLP 0.929, LSTM 0.933, and hybrid 0.936) surpassed those of non-ANN models, with the hybrid model demonstrating the highest performance. Similar to our findings, ANN classifiers exhibited superior test characteristics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), particularly when compared with MEWS thresholds and each other.

The study by Wu CC et al. [27] addresses the lack of risk scores to distinguish non-ST-elevation myocardial infarction (NSTEMI) from non-cardiogenic chest pain, aiming to reduce misdiagnosis in emergency departments. Employing an artificial intelligence (AI) approach, an ANN model was developed using data from 268 chest pain patients. The model demonstrated high accuracy (92.86%) and an impressive AUC of 98.4%. The ANN model exhibited strong sensitivity (90.91%), specificity (93.33%), positive predictive value (76.92%), and negative predictive value (97.67%).

Another study by Hong WS et al. [28] aimed to predict hospital admission at ED triage by incorporating patient history alongside triage information. In a retrospective analysis of 560,486 adult ED visits, three types of classifiers (logistic regression, gradient boosting, and deep neural networks) were trained on datasets containing triage information, patient history, and the full set of variables. The models demonstrated robust predictive capabilities, with the inclusion of patient history significantly enhancing performance compared to triage information alone. The low-dimensional XGBoost model, utilizing variables such as ESI level, outpatient medication counts, demographics, and hospital usage statistics, achieved an impressive AUC of 0.91. The findings underscore the effectiveness of machine learning in predicting hospital admission, emphasizing the importance of incorporating patient history for improved accuracy in admission risk assessment during ED triage.

Similarly, Goto T, et al. [29] have recently assessed the performance of machine learning approaches in predicting clinical outcomes and disposition for children in ED triage, comparing them with conventional triage methods. The study focused on critical care (admission to ICU and/or in-hospital death) and hospitalization outcomes. Machine learning models (lasso regression, RF, gradient-boosted decision tree, and deep neural network) were developed using routinely available triage data as predictors. Results showed that machine learning models, especially for hospitalization prediction, outperformed conventional triage methods, demonstrating higher discriminative ability and reducing both undertriage and overtriage of pediatric patients. The study concludes that machine learning-based triage could enhance prediction accuracy and improve patient disposition in pediatric emergency settings.

The significant cardiovascular burden of the South Asian population is reflected in our study’s prevalence of 33.3%, owing to a Mediterranean diet and a sedentary lifestyle [30]. The high prevalence of cardiovascular symptoms and better prediction of MACE, cardiac arrest, and in-hospital 30-day mortality by ANN provide significant evidence to incorporate this model in electronic triage systems at emergency departments.

Our data also contrasts with the emergency severity index (ESI), which is used to stratify patients at risk in routine clinical practice but has poor predictive capacity for identifying critically ill patients and a high degree of heterogeneity within each triage category [31]. Future research should compare this model to emergency physician gestalt in low-resource emergency rooms and assess patient outcomes.

Limitations

Our research has several limitations. Firstly, our retrospective analysis employs data from a single institution and lacks external validation; hence, the performance of our model may not be generalizable. Secondly, ANN has issues with interpretability and inference, may not operate well with huge non-functional datasets, and requires a significant amount of time to generate findings. We used the systematic grid search approach to address this issue, and we feel our method can accommodate hyperparameters given the retrospective nature of the data. Thirdly, we did not test the influence of our model on real-time data and clinical practice. This was beyond the scope of our research, but we will assess it in a future investigation. Fourthly, the potential for systemic bias in nurses’ practices is more prevalent in low- and middle-income countries (LMICs). There is an element of subjectivity in setting the triage ESI level, and any systemic bias would be mirrored in the model, limiting generalizability. Applying the ANN model in settings with limited resources will require an electronic health record with operational e-triage tools to support decisions in real time. This is uncommon in many rural and some metropolitan places, which is also one of the weaknesses of our study. Nevertheless, this research supports the notion that LMICs could employ ANN as a support tool to aid doctors and reduce medical errors in their EDs.

More broadly, the application of artificial intelligence and machine learning in healthcare poses several difficulties, such as malpractice responsibility, patient satisfaction, insurance coverage, damage to physical integrity, innovation expenses, legal challenges, healthcare professional liability, and a dearth of high-quality data [32].

Conclusion

For healthcare insurance in poor nations, a transparent and efficient data governance mechanism is essential, with technical and regulatory requirements supplemented using a humanistic-centered approach. ANN with systematic grid searching predicted MACE, cardiac arrest, and in-hospital 30-day mortality in triaging ED patients with cardiovascular symptoms with higher accuracy in contrast to LR and RF models. Our prediction model can, therefore, aid emergency room doctors in making prompt triage choices for patients with cardiovascular symptoms by categorizing and prioritizing patients in the early phase based on their triage presentation criteria.