Introduction

Symptoms suggestive of myocardial infarction (MI) are a major reason for presentation to emergency departments (ED) worldwide [1]. Measurement of cardiac troponin is crucial to diagnose or rule out non-ST-elevation MI (NSTEMI) [2, 3]. For the management of patients with suspected NSTEMI, current guidelines recommend the application of high-sensitivity cardiac troponin (hs-cTn) assay-specific thresholds such as the 99th percentile or study-derived cut-offs for measurements obtained directly at presentation and, depending on the selected diagnostic approach, during serial sampling after one, two or three hours [3,4,5,6,7].

Application of fixed assay-specific hs-cTn thresholds combined with predefined time points of serial sampling remains challenging in busy emergency settings, where patient characteristics differ widely across the globe. Besides, in the context of suspected NSTEMI, clinicians do not interpret hs-cTn concentrations and thresholds in isolation, but in combination with ECG findings and clinical characteristics, such as chest pain onset time, cardiovascular risk factors, age, sex, and other comorbidities, which are largely neglected in most current diagnostic algorithms [8]. Thus, a diagnostic algorithm that simultaneously includes hs-cTn concentrations, their dynamic change during flexibly timed resampling, ECG findings, and the most relevant and immediately available clinical variables constitutes an unmet clinical need in patients with suspected MI, both in the ED and in the ambulatory care setting.

Based on prior work [9], we derived and validated a machine-learning model that estimates the individual probability of NSTEMI in patients presenting with symptoms indicative of MI. This model accounts for immediately available confounding clinical variables, allows for flexible timing of potential serial sampling and can be applied using most established hs-cTn assays, including point-of-care assays. We aimed to demonstrate its clinical application in patients with suspected NSTEMI and (1) defined the model’s overall diagnostic accuracy, (2) assessed the clinical performance according to MI probability thresholds in heterogeneous clinical conditions, and (3) finally compared the model’s clinical utility against currently recommended assay-specific thresholds. Overall, this work is intended to pave the way towards the routine clinical implementation of medical decision support systems to enable a rapid, efficient and safe diagnostic process in patients with suspected MI.

Methods

Study design and populations

In the “Artificial intelligence in suspected myocardial infarction study” (ARTEMIS), we derived and externally validated diagnostic models by estimating the probability of MI using machine learning (probability machines) in adult patients presenting to the ED with symptoms suggestive of MI. We excluded patients presenting with ST-segment elevation MI. The overall study concept is displayed in Fig. 1. Briefly, probability machines for MI were derived in the BACC (Biomarkers in Acute Cardiac Care; NCT02355457) study, which is an ongoing prospective observational diagnostic study performed at the University Heart & Vascular Center Hamburg, Germany [10, 11]. The probability machines were then externally validated in the stenoCardia (Study for Evaluation of New Onset Chest Pain and Rapid Diagnosis of Myocardial Necrosis; NCT03227159) cohort, an observational study that prospectively enrolled patients with suspected acute coronary syndrome at the EDs of the University Medical Center Mainz, the Federal Armed Forces Hospital Koblenz, and University Hospital Hamburg-Eppendorf between 2007 and 2009 [12, 13]. To confirm the generalizability and global applicability of the newly developed and validated diagnostic models in clinically and geographically widely varying settings, anonymized individual-level data of thirteen additional cohorts from nine countries and four continents were transferred to the University Medical Center Hamburg-Eppendorf, Germany, to centrally apply the diagnostic models on the harmonized data in the global generalization dataset (see Supplementary Appendix for detailed description).

Fig. 1

Study concept and diagnostic model development. This figure displays the overall study design including study populations, development of the diagnostic model, model validation and generalization, as well as comparison to the current standard of care

All studies were carried out according to the principles of the Declaration of Helsinki and approved by the local ethics committees. Participation was voluntary; each patient gave written informed consent. The TRIPOD checklist for this study is provided in Table S1 in Supplementary Appendix.

Adjudication of final diagnosis

The primary outcome of this study was the diagnosis of NSTEMI at time of ED presentation, which included type 1 and type 2 MI. In the derivation and validation dataset, the final diagnosis of MI was adjudicated after patient discharge by two cardiologists independently considering all available clinical, imaging, electrocardiographic and hs-cTn information. Cases in which the two initial adjudicators disagreed were reviewed by a third cardiologist. Detailed information on the adjudication process in each cohort including the generalization dataset may be found in the Supplementary Appendix.

Outcome data

For prognostic evaluation, we collected data on incident MI, excluding the index events, as well as all-cause death within 30 days after ED presentation.

Troponin measurements

Concentrations of cardiac troponin were measured by five hs-cTnI assays (Architect® i2000 platform by Abbott; Atellica® IM platform by Siemens Healthineers; Atellica VTLi® point-of-care device by Siemens Healthineers; Access® platform by Beckman Coulter; PATHFAST® Analyser by PHC) and one hs-cTnT assay (Elecsys® Cobas e411 platform by Roche Diagnostics) in blood samples collected at time of ED presentation and serially thereafter, either as part of routine clinical care or in batches of samples that had been stored at −80 °C. Targeted timing of the second blood draw differed between the participating studies and ranged from one to three hours. The time elapsed between serial study blood samples in the ED was documented. Additional information regarding the hs-cTn assays used in all ARTEMIS study cohorts is provided in the Supplementary Appendix.

Clinical variables

In total, 18 patient-specific as well as hs-cTn-related variables readily available at time of ED presentation and all previously associated with myocardial infarction were considered for model development. The most important clinical variables were selected for the final model (see Supplementary Appendix).

Statistical analysis and model development

A detailed statistical description is provided in the Supplementary Appendix and summarized in Figure S1. Briefly, for each of the six hs-cTn assays studied, we derived, validated, and globally applied two machine-learning diagnostic models, which estimate the individual probability of an acute MI in individuals presenting to the ED with suspected MI: One model was based on a single hs-cTn measurement obtained at time of ED presentation, the second model on two serial hs-cTn measurements. Modeling steps in the model derivation phase included multiple imputation of missing co-variables, cross-validation in all modeling and variable selection steps, and combination of multiple machines in a super learner with equal weights. Probability estimates of the super learner were calibrated in all validation and generalization studies.
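The equal-weight combination step described above can be illustrated with a minimal sketch. It assumes each machine has already produced one MI probability per patient; the function name `equal_weight_super_learner` and all numbers are hypothetical and are not taken from the ARTEMIS code, which was written in R. Imputation, cross-validation and calibration are deliberately left out.

```python
from statistics import fmean


def equal_weight_super_learner(machine_probs):
    """Average per-patient MI probabilities from several machines with equal weights.

    machine_probs: list of lists, one inner list of probabilities per machine,
    all of the same length (one entry per patient).
    """
    if not machine_probs:
        raise ValueError("at least one machine is required")
    n_patients = len(machine_probs[0])
    if any(len(p) != n_patients for p in machine_probs):
        raise ValueError("all machines must score the same patients")
    # zip(*...) groups the estimates of all machines for each patient.
    return [fmean(per_patient) for per_patient in zip(*machine_probs)]


# Hypothetical outputs of four machines for three patients (illustrative only).
logistic_spline = [0.01, 0.20, 0.80]
gradient_boosting = [0.03, 0.30, 0.90]
mars = [0.02, 0.10, 0.70]
elastic_net = [0.02, 0.20, 0.80]

combined = equal_weight_super_learner(
    [logistic_spline, gradient_boosting, mars, elastic_net]
)
```

With equal weights, no single machine dominates the combined estimate, which is one plausible reason to prefer this over learned weights when the machines perform similarly.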

The diagnostic performance of the models across the spectrum of possible MI probability thresholds was evaluated in one percent increments. Diagnostic performance measures were obtained from random effect meta-analyses and included negative and positive predictive value (NPV and PPV), sensitivity and specificity, the proportion of patients below or above a given MI probability threshold, and the corresponding 30-day incidence of MI or death. The resulting tables and figures can be used to identify patients at low risk of MI who are suitable for outpatient management or those at high risk who are suitable for inpatient or invasive strategies. To illustrate the clinical applicability and to contrast the performance of the novel diagnostic model with the current state-of-the-art approach, we compared the diagnostic performance measures of our diagnostic model with the 0 h, 0/1 h and 0/2 h strategies recommended by the ESC guideline [4].
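As a rough illustration of the threshold evaluation, the following sketch computes NPV, PPV, sensitivity, specificity and the proportion of patients below each threshold for a single cohort. It is a simplified assumption of how such a table could be built: the function name `rule_out_measures` and the toy data are hypothetical, and the pooling across cohorts by random effect meta-analysis used in the study is not shown.

```python
def rule_out_measures(probs, has_mi, threshold):
    """Diagnostic measures when 'probability < threshold' is read as test-negative."""
    tn = fn = tp = fp = 0
    for p, mi in zip(probs, has_mi):
        if p < threshold:
            if mi:
                fn += 1  # MI missed by rule-out
            else:
                tn += 1
        else:
            if mi:
                tp += 1
            else:
                fp += 1

    def ratio(num, den):
        # Return None when a measure is undefined (empty denominator).
        return num / den if den else None

    return {
        "threshold": threshold,
        "npv": ratio(tn, tn + fn),
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "proportion_below": ratio(tn + fn, len(probs)),
    }


# Toy cohort: estimated MI probabilities and adjudicated diagnoses (illustrative only).
probs = [0.002, 0.004, 0.008, 0.015, 0.03, 0.40, 0.75, 0.90]
has_mi = [False, False, False, False, False, True, False, True]

# Thresholds from 1% to 99% in one percent increments.
table = [rule_out_measures(probs, has_mi, t / 100) for t in range(1, 100)]
```

Each row of `table` corresponds to one candidate threshold, so a rule-out cut-off can be chosen by scanning for the largest `proportion_below` that still meets a required NPV.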

To make the algorithm readily available and applicable to clinicians, a mobile application is currently being developed based on the present models, which are easily transferable to other systems. In the mid-term, semi-automated integration of the diagnostic models into local electronic health record systems as a medical decision support system is envisioned.

All statistical analyses were performed in R version 4.2.0 [14].

Results

Study populations

The models were developed in 2575 patients with suspected MI in the derivation cohort BACC and then applied in 1688 patients of the validation cohort stenoCardia as well as in 23,411 patients of the global generalization dataset. Baseline characteristics of the derivation, validation and global generalization cohorts can be found in Table 1 and Tables S2, S3, S4, S5. In the overall dataset, median age was 61 [50, 73] years, 55.8% were male and 46.1% presented to the ED within the first three hours after symptom onset. Prevalence of MI ranged from 5.5 to 16.8% across the study cohorts. During follow-up, 643 (2.7%) incident cardiovascular deaths and 1007 (4.8%) MIs were observed.

Table 1 Baseline characteristics for derivation, validation, and generalization cohorts

Serial measurements of all hs-cTn assays were available in the derivation dataset, but availability of measurements varied among the validation and generalization cohorts (Figure S1). Overall, at time of ED presentation, hs-cTnT Elecsys was the most widely used assay with measurements available in 20,001 patients, followed by hs-cTnI Architect in 14,255, hs-cTnI Atellica in 8332, hs-cTnI Access in 6946, hs-cTnI Pathfast in 3246 and hs-cTnI Atellica VTLi in 1088 patients (Fig. 2).

Fig. 2

Discrimination measures using the diagnostic model based on a single and on a serial hs-cTn measurement per assay, summarized across the validation and generalization cohorts. This figure summarizes the discrimination measures AUC and LogLoss with 95% CI for each hs-cTn assay using the diagnostic model with single and serial hs-cTn measurements. The displayed measures represent the summarized values from the validation and generalization cohorts. Detailed results from each cohort are displayed in Figure S5. Abbreviations: AUC, area under the curve; CI, confidence interval; hs-cTn, high-sensitivity troponin

Model derivation

Among the 18 variables investigated, 9 variables for the single hs-cTn measurement and 8 variables for the serial hs-cTn measurement were selected (Table S6, Fig. 3). Based on these variables, four different learning machines were selected and combined into a super learner for each diagnostic model. For the single hs-cTn diagnostic model, multivariable logistic regression with restricted cubic splines, gradient boosting, multivariate adaptive regression splines and elastic net were selected. For the serial hs-cTn diagnostic model, multivariable logistic regression with restricted cubic splines, gradient boosting, multivariate adaptive regression splines and random forest were selected. Both diagnostic models performed better than models based on hs-cTn alone, models including information on eGFR, or the full models (Figures S2, S3, S4). The machine-learning-based super learner outperformed classical multiple logistic regression for both the single and serial validation models (Figure S3). Specifically, it performed better than any single machine for the single hs-cTn measurement. The diagnostic model using single or serial hs-cTn measurements showed high discriminative accuracy for each evaluated troponin assay (Figure S5).

Fig. 3

Diagnostic pathway in patients with suspected myocardial infarction: the machine-learning supported clinical application. This figure displays the clinical workflow to estimate the individual MI probability using the ARTEMIS diagnostic model. Abbreviations: CAD, coronary artery disease; ECG, electrocardiogram; MI, myocardial infarction; hs-cTn, high-sensitivity cardiac troponin

Model validation

In the validation dataset, the diagnostic model performed better than models based on hs-cTn alone, models including information on eGFR or a model including all offered clinical variables (Figures S2, S3, S4). Observed and predicted risks of MI were similar for all assays in the derivation data and, after calibration, in the validation data (Figure S6). When applying the diagnostic model based on a single or a serial hs-cTn measurement in the validation dataset, we observed an increase in AUC and a decrease in logLoss and Brier Score (Figure S5).
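For readers less familiar with the three measures named above, the following sketch shows their standard definitions on a toy example. It is not the study's implementation; the function names and data are hypothetical, and the AUC is computed by direct pairwise comparison, which is only practical for small samples.

```python
import math


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def auc(probs, labels):
    """Probability that a random MI case scores above a random non-case (ties count 0.5)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions and adjudicated outcomes (1 = MI), illustrative only.
probs = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

print(auc(probs, labels), log_loss(probs, labels), brier_score(probs, labels))
```

A higher AUC indicates better discrimination, whereas lower logLoss and Brier score indicate probability estimates that are closer to the observed outcomes.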

Global generalization

In the global generalization dataset, observed and predicted risks of MI were again similar for all assays after re-calibration (Figure S7). The discriminative accuracy using the diagnostic model was high across all cohorts (Figure S5; Table S7). When summarizing the measures across the validation and generalization cohorts, the AUCs were similar for all hs-cTn assays applied (Fig. 2). In detail, the AUCs were 0.95 (95%CI 0.94–0.96) and 0.98 (95%CI 0.97–0.99) for the single and serial hs-cTn diagnostic model using the Access assay, and 0.92 (95%CI 0.89–0.94) and 0.96 (95%CI 0.95–0.98), for the Architect assay, respectively. For the Atellica assay, the AUC was 0.93 (95%CI 0.90–0.97) and 0.96 (95%CI 0.94–0.98), and 0.86 (95%CI 0.82–0.89) and 0.92 (95%CI 0.90–0.95), for the Atellica VTLi point-of-care assay, respectively. For the Elecsys assay, the AUC was 0.89 (95%CI 0.87–0.92) and 0.94 (95%CI 0.92–0.96) and the patient-near Pathfast assay revealed an AUC of 0.95 (95%CI 0.94–0.97) and 0.98 (95%CI 0.97–0.99), respectively.

Clinical application

To illustrate the clinical usability, we calculated the diagnostic measures for each possible MI probability threshold. Across the range of thresholds, we observed a decreasing NPV and sensitivity with increasing MI probability, while PPV, specificity and 30-day mortality continuously increased (Figure S8, Tables S8, S9). As examples, the diagnostic measures to rule out MI in individuals with an MI probability below 0.5%, below 1% and below 2% are depicted in Table 2 using both diagnostic models with single and serial hs-cTn measurements. When using a single hs-cTn measurement and an MI probability of less than 0.5%, we observed very high NPVs of 99.6% or greater. When using serial hs-cTn measurements and an MI probability of, e.g., less than 2%, we observed excellent diagnostic measures with NPV values of 99.5% or above and a rule-out proportion of at least 60% of the population. Importantly, these values were associated with a low risk of 30-day mortality ranging between 0.6 and 1.1%.

Table 2 Diagnostic measures of selected MI probability thresholds for rule-out of MI

Comparison to standard of care

Comparative analyses using a single hs-cTn measurement approach based on the ESC algorithms versus the ARTEMIS pathway are depicted in Table 3. Using the ARTEMIS pathway and considering an MI probability threshold < 0.5% to identify subjects eligible for direct rule-out of MI, the safety, quantified by NPV and sensitivity, was very high and similar to that of the direct rule-out approach of the ESC algorithms. Importantly, however, the proportion of patients qualifying for direct and safe rule-out based on a single hs-cTn measurement was increased by a factor of two to three by our machine-learning-based model, ranging between 30 and 49%, as compared to 14 and 15% using the direct rule-out approach provided by the ESC algorithms. Using an MI probability of > 50% as a direct rule-in criterion, high accuracy, quantified by the PPV and specificity, was achieved. The accuracy and proportions of direct rule-in were similar to the ESC algorithms. Furthermore, the observational zone after a single hs-cTn measurement was reduced for all hs-cTn assays by 10–33% when using the ARTEMIS pathway. For the serial hs-cTn measurement approach, a selection of possible ARTEMIS thresholds to define rule-out and rule-in of MI resulted in overall comparable diagnostic performances when directly compared to the ESC 0/1 h and 0/2 h algorithms (Table S10).
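The single-measurement triage logic described above can be sketched as a simple three-way rule. The thresholds (< 0.5% for rule-out, > 50% for rule-in) are those named in the text; the function name `triage`, the category labels and the example probabilities are hypothetical, and the sketch is not a substitute for clinical judgement.

```python
from collections import Counter


def triage(mi_probability, rule_out_below=0.005, rule_in_above=0.50):
    """Assign a patient to rule-out, rule-in or the observational zone."""
    if not 0.0 <= mi_probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if mi_probability < rule_out_below:
        return "rule-out"
    if mi_probability > rule_in_above:
        return "rule-in"
    return "observe"


# Hypothetical model outputs for six patients (illustrative only).
estimated = [0.001, 0.003, 0.02, 0.20, 0.65, 0.004]
zones = Counter(triage(p) for p in estimated)
share_observe = zones["observe"] / len(estimated)
```

Tightening or relaxing the two thresholds directly trades the size of the observational zone against the safety of rule-out and the accuracy of rule-in.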

Table 3 Diagnostic performance comparison of the direct rule-out or rule-in approach based on a single hs-cTn measurement of the ESC 0/1 h algorithms and the ARTEMIS diagnostic model

Exemplary clinical use cases

The general workflow and the potential clinical application of the ARTEMIS pathway are displayed in Fig. 3 and the Supplementary Appendix (Figure S10). The interpretation of cardiac troponin, which can be measured with a large variety of hs-cTn assays in ARTEMIS, in combination with other easily available clinical variables may inform the treating physicians in real time about the individual probability of MI, either in the form of a mobile application or, if embedded in the local electronic health record system, as a medical decision support system. In this way, ARTEMIS may guide safe, efficient and immediate medical decisions in patients presenting with suspicion of MI.

Discussion

Extending prior work [9], we derived, validated, and generalized a personalized diagnostic model to immediately, accurately, and safely quantify the probability of MI. Using individual-level data from more than 27,000 patients with suspected acute MI across four continents, nine countries and 14 prospectively established real-world cohorts, we applied various machine-learning tools and developed a super learner, resulting in two diagnostic models. Their clinical application allows providers to determine the probability of MI with high diagnostic accuracy. The personalized model (1) works irrespective of which hs-cTn assay is used, (2) integrates the information of important and rapidly available clinical variables, (3) requires neither assay-specific cut-offs nor fixed timing of serial sampling, (4) can be applied after calibration in various clinical settings with widely varying pre-test probabilities and (5) offers a selection of probability thresholds (e.g., 0.5%, 1% or 2% MI probability) which allows for safe and immediate discharge in a very high proportion of patients.

While the application of hs-cTn assays improves visibility of even minor myocardial injury and allows for early detection of MI, clinical management and decision-making have become more challenging [4, 13, 15]. Consequently, various assay-specific hs-cTn algorithms have been developed and implemented to efficiently diagnose and triage patients with suspected MI [16,17,18]. Although these algorithms allow for major advances in rapid and safe clinical decision-making, they still rely on inflexible rules for the timing of hs-cTn resampling (1, 2 or 3 h), apply assay-specific thresholds of mostly very low concentrations, and do not account for clinical variables such as age, sex, risk factors, chest pain onset time, and others. In consequence, the assay-specific 0/1 h and 0/2 h or 0/3 h algorithms, as suggested by the European Society of Cardiology for example, are not fully implemented in global clinical routine [4].

To extend the advantage of hs-cTn usage in clinical routine and, in interaction with hs-cTn point-of-care tests, enable a safe application in ambulatory settings as well, we extend the concept of risk probabilities introduced recently [9] towards a highly accurate personalized diagnostic model. As the model was trained using eleven (selected out of an initial 18) clinical variables including time of chest pain onset, time between serial sampling, ECG, age, sex, and cardiovascular risk factors and nearly all hs-cTn tests currently available, it provides the highest possible diagnostic accuracy and allows for rapid and safe decision-making. Both single and serial sampling models achieve excellent diagnostic accuracy and offer the opportunity to select rule-out thresholds which allow rapid and safe discharge in a high proportion of patients. To achieve the best balance between high safety and high efficacy, a low MI probability threshold (e.g., 0.5%, 1% or 2%) is recommended for rule-out after single or serial testing, respectively. Compared with previous data on the performance of the ESC 0/1 h algorithm reporting a rule-out proportion of 44–57%, the rule-out proportions achieved by the application of the thresholds of the diagnostic models are larger and range, e.g., for a serial rule-out cut-off < 2%, between 60 and 76% [18, 19]. This improvement is most apparent for a single measurement approach, which allows direct rule-out of MI in 30–49% of the overall population compared to 13–15% using the ESC algorithm [18,19,20,21,22].

As the model is based on heterogeneous global data, it is calibrated for European, Australian, New Zealand, North American, and Japanese conditions and, therefore, can be generally applied. The model also integrates two point-of-care hs-cTn assays (Pathfast and Atellica VTLi). When hs-cTn point-of-care assays are used, the ARTEMIS model can be applied in outpatient settings and, therefore, might improve diagnostic accuracy and speed in outpatient care and could reduce the number of hospital admissions.

In general, machine-learning diagnostic and prediction models need to fulfill high methodological, clinical and regulatory standards before being used by healthcare professionals in clinical practice [23]. A recent report raises 12 critical questions, all of which have been positively addressed by the current algorithm [23]. In particular, the sample size is appropriate, validation has been extensively performed, and the outcome variable is labeled reliably, replicably, and independently.

Prior work already introduced machine-learning concepts to provide an individualized and objective assessment of the likelihood of myocardial infarction [24]. It presented, for the first time, the concept of machine learning to improve the diagnostic accuracy of MI diagnosis and rule-out. Although this work paved the way towards modern diagnostic approaches and performs well in routine clinical practice [25], it relies on only two predefined clinical variables beyond hs-cTn, age and sex, and it is restricted to one specific hs-cTnI assay. It further highlights the need for model calibration prior to application in the population, which was limited in this first concept [25]. The ARTEMIS model has been calibrated for heterogeneous clinical conditions globally but requires further calibration of the super learner for each clinical setting in which it will be directly applied. In consequence, the concept and construction of the ARTEMIS model will enable both the inclusion of any hs-cTn assay entering the market and local calibration to the settings in which it will be clinically applied.

The integration of the selected, easily available variables, including whichever hs-cTn test is available, supports an app- or middleware-guided safe, efficient and immediate medical decision. Whereas the ARTEMIS pathway might be suitable for embedded middleware approaches, which enable integration into the hospital-based electronic health record system, app-based solutions might be more suitable for ambulatory care or independent emergency settings.

Some limitations should be considered when interpreting the findings. First, the outcome diagnoses of MI were adjudicated in each cohort separately and were not based on a harmonized standard operating procedure. Second, our models were validated to estimate the individual risk of MI in patients with clinically suspected MI. This does not include other acute conditions that may lead to acute chest pain, such as pulmonary embolism or aortic dissection. Therefore, the estimated MI probabilities must always be considered in the clinical context and should not be used as the only basis for decision-making. Finally, our diagnostic models were derived, validated, and generalized using data from multiple prospective diagnostic studies, but have not been prospectively tested in clinical routine. Therefore, to assess real-world performance not only in the ED but also in other clinical settings (e.g., in ambulatory care or in the preclinical setting in ambulances), prospective clinical trials directly applying the ARTEMIS diagnostic model and comparing it against standard of care are needed.

In conclusion, we developed, validated, and globally applied the easily applicable diagnostic ARTEMIS model considering immediately available variables to estimate the individual risk of MI in patients with suspected MI. The model can be used with most hs-cTn assays currently available and allows for rapid and safe discharge of a very high proportion of patients. Its digital application might improve routine clinical practice globally and enable a personalized diagnostic evaluation of suspected MI.