Background

Chest pain and related symptoms are the most common reasons for patients to present to the emergency department (ED) [1, 2]. Presentations are extremely heterogeneous, with a wide spectrum of underlying conditions ranging from lethal diseases such as acute myocardial infarction (AMI) to minor acute problems such as intercostal neuralgia. Ruling in or ruling out high-risk conditions in a timely manner is of great importance and a great challenge [3,4,5,6]. Furthermore, the majority of undifferentiated acute chest pain patients are low risk and do not require further invasive tests or admission [6, 7]. Therefore, risk stratification for chest pain patients at EDs has been recommended in several guidelines [6, 8] to not only identify as many true low-risk patients as possible but also avoid missing major adverse cardiac events (MACEs).

The Global Registry of Acute Coronary Events (GRACE) score is an objective prediction tool for definite acute coronary syndrome (ACS), incorporating age, vital signs, kidney function, ECG and troponin levels [9]. This tool has been validated for risk stratification of individuals with acute chest pain [10,11,12,13,14,15,16,17]. In particular, the 0 h/3 h algorithm incorporating the GRACE score is recommended (Class I, Level B) for risk stratification and rule-out of AMI in patients with suspected non-ST-elevation ACS by the 2015 European Society of Cardiology (ESC) guideline [6].

However, there are six formulas used to calculate the GRACE score for different outcomes, including those for predicting in-hospital death [9], in-hospital death or myocardial infarction (MI) [18], death within 6 months after discharge [19], death or MI within 6 months after discharge, death from admission to 6 months later [20], and death or MI from admission to 6 months later [20]. None of these formulas are specific for rule-out/rule-in of high-risk conditions in patients with undifferentiated chest pain presenting to the ED. The GRACE models have been compared with other scores for stratifying undifferentiated chest pain, such as the History, ECG, Age, Risk Factors, and Troponin (HEART) score and the Thrombolysis in Myocardial Infarction (TIMI) score [21, 22]. Generally, GRACE was inferior to the HEART score, and the most common GRACE score applied was the one for predicting in-hospital death [11,12,13,14,15,16]. The questions of why this model is selected and whether it is the most appropriate one remain unanswered. No study has comprehensively assessed these scores in detail in chest pain patients. Therefore, the superiority of certain GRACE scores remains unclear.

Using a range of model performance indices, we aimed to evaluate the performance of six GRACE models and compare their discrimination, reclassification and diagnostic accuracy with those of HEART and TIMI scores to rule out/rule in 30-day MACEs among acute chest pain patients presenting to the ED.

Methods

Study design

This is a secondary analysis of a previous cohort. We prospectively collected data through an observational study of acute chest pain patients from August 24, 2015 to September 30, 2017 in EDs of two public hospitals in China, the urban ED of the Qilu Hospital of Shandong University (a university-affiliated teaching hospital) and the rural ED of the People’s Hospital of Wenshang County. This study was approved by the ethics committees of the collaborating hospitals. Written informed consent was obtained from all participants.

Patient enrolment

Patients were consecutively recruited if they were aged 18 or older and had acute nontraumatic chest pain and troponin tests. Acute symptoms of myocardial ischaemia or an ischaemic equivalent, such as epigastric discomfort, dyspnoea or fatigue, were also considered as chest pain according to the American Heart Association case definitions [23].

To assess the performance of risk scores for the stratification of non-ST-elevation chest pain, patients with ST-elevation myocardial infarction (STEMI) were excluded. Patients who were unable or unwilling to provide informed consent were also excluded from the analysis.

Data collection and measurements

Clinical information was extracted from the medical records and collected through patient interviews by research assistants using a standardized case report form (CRF); the variables were in accordance with the international standards [23]. Demographics, risk factors, previous medical history, symptom characteristics, physical examination, vital signs, troponin values, laboratory tests, triage, treatments and outcomes were covered.

Patients participated in follow-up telephone interviews at 30 days after enrolment, and information about MACEs and hospital attendances was collected.

Risk score calculations

Methods and formulas used to calculate the GRACE risk scores have been described in detail previously [24], including all 6 models for predicting in-hospital death (IHDth), in-hospital death or MI (IHDthMI), death within 6 months after discharge (OH6mDth), death or MI within 6 months after discharge (OH6mDthMI), death from admission to 6 months later (IH6mDth), and death or MI from admission to 6 months later (IH6mDthMI) (Table 1). These scores were calculated retrospectively using the prospectively obtained data. An ECG with ST depression (new horizontal or down-sloping ST depression ≥0.05 mV in two contiguous leads) or ST elevation (new ST elevation at the J point in two contiguous leads with the cut-points: ≥0.1 mV in all leads other than leads V2-V3 where the following cut points apply: ≥0.2 mV in men ≥40 years; ≥0.25 mV in men < 40 years, or ≥ 0.15 mV in women) [25] was defined as ischaemic ST deviation. Two independent cardiologists interpreted the ECGs blinded to the clinical data, troponin levels and events. Discrepancies were evaluated by a third cardiologist. The troponin results from the first blood sample arranged by emergency physicians in their daily work were used to calculate scores. The 99th percentile of the upper reference limit (URL) was used as the cutoff for determining positivity. In-hospital PCI and in-hospital CABG were assigned a score of 0 because the GRACE models were used to stratify chest pain patients immediately after arrival at ED and the PCI or CABG performed during admission were not applicable. In emergency care practice, not all GRACE predictor variables can be collected completely, especially the serum creatinine test (not routinely assessed) and the Killip class (not rated in patients without AMI). 
Here, we performed two kinds of assessment: one was based on the complete GRACE variables with the creatinine value and Killip class assigned as zero if absent; the other was based on the deletion of creatinine and Killip class from all observations using mini-GRACE, which has been introduced in the development of NICE guideline 94 and validated through a large MINAP registry of patients with ACS [26, 27]. The HEART and TIMI scores were calculated according to previous studies (Supplementary Table 1) [22, 28].

Table 1 Variables included in six GRACE models

Clinical outcomes

The primary outcome was the composite endpoint of MACEs within 30 days, including death from all causes, AMI (index and subsequent), emergency revascularization, cardiac arrest and cardiogenic shock. AMI was diagnosed according to the third universal definition of MI, as detection of a rise and/or fall in cardiac biomarkers with at least one value above the 99th percentile of the URL, together with symptoms, ECG changes or imaging indicative of new ischaemia [25]. Cardiogenic shock was defined as persistent (> 30 min) systolic blood pressure (SBP) of less than 90 mmHg and/or cardiac index < 2.2 L/min/m² secondary to cardiac dysfunction, requiring intravenous inotropic or mechanical support [23]. Two senior cardiologists from the clinical events committee adjudicated the MACEs independently using all available clinical records, and discrepancies were evaluated by a third senior physician. If patients were lost to follow-up, a local death registry was used to supplement the survival status.

Statistical analysis

Continuous variables are presented as the mean (standard deviation), and categorical variables are presented as the number of cases (percentage). Baseline characteristics between the MACE and no MACE groups were compared using t tests for continuous variables and chi-square (χ2) tests for categorical variables. The Pearson product-moment correlation coefficient (r) was used to describe the direction and quantify the strength of the linear association between GRACE scores and the incidence of MACEs in individuals with chest pain. Calibration was evaluated using the Hosmer-Lemeshow goodness-of-fit test (HLT). A low HLT χ2 and a P value > 0.05 indicate agreement between observed and predicted probabilities of an event and good model fit. Discrimination of scores was assessed by the area under the curve (AUC) of receiver operating characteristic (ROC) curves. An AUC of 0.8 to 0.9 is considered excellent and 0.7 to 0.8 acceptable [29]. Taking into account the implicit correlation between the curves of these scores, we used the DeLong test to compare any two AUCs [30]. Reclassification was performed to assess how well one risk score improved predictions compared with another, based on the category-free net reclassification improvement (NRI) and absolute integrated discrimination improvement (IDI). Diagnostic accuracy with 95% confidence intervals (CIs) of the different scores was determined, including sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV). To guarantee the safety of discharging low-risk patients, cutoff values of each model to obtain sensitivities of at least 95%, 98% and 99% were identified. High-risk cutoff values were set at a specificity of ≥95% or ≥90%. The sensitivity and specificity of different strategies were compared using the McNemar test, while proportions of patients classified as low risk and high risk, NPV and PPV were compared using the χ2 test.
A P value of less than 0.05 (two-sided significance testing) was considered statistically significant in the analysis. All statistical analyses were performed using SAS V.9.4 (SAS Institute Inc., Cary, North Carolina, USA) or MedCalc V.18.11.3 (MedCalc Software, Ostend, Belgium).
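The paired comparison of sensitivities described above can be illustrated with a minimal sketch of the McNemar test. The study does not state which variant was used in SAS or MedCalc, so the exact binomial form below is an assumption, and the function name and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: patients flagged by score A but not by score B
    c: patients flagged by score B but not by score A
    (Assumed exact binomial variant; the paper does not specify one.)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair falls either way
    # with probability 0.5, so the tail is a Binomial(n, 0.5) sum.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts among patients with a MACE
# (illustrative only, not taken from the study).
p_value = mcnemar_exact(b=12, c=3)
print(f"exact McNemar P = {p_value:.4f}")
```

Only the discordant pairs enter the test, which is why it suits comparisons of two scores applied to the same patients.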

Results

Study population

A total of 3536 patients with acute nontraumatic chest pain and initial cTn tests presented in the participating EDs from August 24, 2015 to September 30, 2017. Some patients were excluded due to denial of informed consent (77) and diagnosis of STEMI (472). There were 88 patients with insufficient information to calculate the GRACE scores, including 74 due to no initial ECG and 14 due to no SBP values. For 13 surviving patients, follow-up contacts were unsuccessful. Eventually, 2886 patients remained for analysis (Fig. 1). Baseline characteristics and initial evaluation between patients with and without 30-day MACEs are compared in Table 2. Patients with 30-day MACEs tended to be older, be male, have a higher burden of risk factors and have significantly higher GRACE scores than those without 30-day MACEs (P < 0.001).
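The patient flow reported above can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Counts taken from the study population paragraph.
presented = 3536
exclusions = {
    "declined informed consent": 77,
    "STEMI": 472,
    "no initial ECG": 74,
    "no SBP value": 14,
    "lost to follow-up": 13,
}

# Patients remaining after all exclusions.
analysed = presented - sum(exclusions.values())
print(analysed)  # 2886
```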

Fig. 1

Study flowchart. cTn cardiac troponin, ECG electrocardiography, STEMI ST-segment elevation myocardial infarction, SBP systolic blood pressure

Table 2 Baseline characteristics of the study population

Outcomes

There were 590 (20.4%) chest pain patients with adjudicated MACEs in 30 days after presentation, including 52 patients (1.8%) who died from all causes, 549 (19.0%) with index AMI, 24 (0.8%) with subsequent AMI, 10 (0.3%) who underwent emergency revascularization, 32 (1.1%) who experienced cardiac arrest and 32 (1.1%) who experienced cardiogenic shock.
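The reported percentages follow from the 2886 patients analysed. A short sketch of the check is given below; the helper name `pct` is hypothetical. The component counts sum to more than 590, which presumably reflects patients who experienced more than one type of event.

```python
N_ANALYSED = 2886  # patients in the final analysis, from the text


def pct(count: int, total: int = N_ANALYSED) -> float:
    """Percentage of the analysed cohort, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Event counts as reported in the Outcomes paragraph.
components = {
    "all-cause death": 52,
    "index AMI": 549,
    "subsequent AMI": 24,
    "emergency revascularization": 10,
    "cardiac arrest": 32,
    "cardiogenic shock": 32,
}

print(f"any 30-day MACE: 590 ({pct(590)}%)")
for name, count in components.items():
    print(f"{name}: {count} ({pct(count)}%)")

component_total = sum(components.values())
```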

Correlation between GRACE scores and actual event rates

All six GRACE models showed good positive linear correlation with the actual MACE rates in patients with undifferentiated chest pain (Fig. 2). The GRACE (IHDthMI) and GRACE (IH6mDthMI) exhibited very strong relationships, with r values of 0.913 and 0.920, respectively (P < 0.001).

Fig. 2

Pearson product-moment correlation between GRACE scores and actual event rates. GRACE Global Registry of Acute Coronary Events, MACE major adverse cardiac events
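As an illustration of the correlation analysis, the following minimal sketch computes Pearson's r between score bands and observed event rates. How the scores were grouped in the study is not stated here, so the band midpoints and rates below are hypothetical values chosen only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical score-band midpoints and observed MACE rates
# (not study data).
band_midpoints = [50, 80, 110, 140, 170, 200]
mace_rates = [0.02, 0.06, 0.15, 0.30, 0.48, 0.65]

print(f"r = {pearson_r(band_midpoints, mace_rates):.3f}")
```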

Agreement between observed and predicted probabilities of an event

As shown in Fig. 3, the predicted probabilities of an event were very close to the observed event rates across deciles for five of the GRACE models. The HLT P values for the GRACE (IH6mDthMI), GRACE (IHDth), GRACE (IH6mDth), GRACE (OH6mDth) and GRACE (OH6mDthMI) were 0.113, 0.446, 0.608, 0.312 and 0.073, respectively. However, the P value of GRACE (IHDthMI) was < 0.001, indicating a poor fit for this model.

Fig. 3

Hosmer-Lemeshow goodness-of-fit tests of six GRACE models. GRACE Global Registry of Acute Coronary Events, MACE major adverse cardiac events
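The Hosmer-Lemeshow statistic compares observed and expected event counts within risk groups. The sketch below is a minimal version under stated assumptions: groups are supplied as (size, observed events, mean predicted probability), the degrees of freedom are taken as the number of groups minus 2, and the P value uses a closed form that holds only for even degrees of freedom (8 for deciles). The decile values are hypothetical.

```python
from math import exp


def hosmer_lemeshow(groups):
    """HL chi-square from (n, observed_events, mean_predicted_prob) groups.

    Assumes every mean predicted probability lies strictly between 0 and 1.
    """
    chi2 = 0.0
    for n, observed, p_mean in groups:
        expected = n * p_mean
        # Compact form combining the event and non-event contributions.
        chi2 += (observed - expected) ** 2 / (expected * (1 - p_mean))
    return chi2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


# Hypothetical deciles (not study data).
deciles = [
    (100, 3, 0.02), (100, 4, 0.05), (100, 9, 0.08), (100, 11, 0.12),
    (100, 18, 0.17), (100, 21, 0.22), (100, 30, 0.28), (100, 33, 0.35),
    (100, 47, 0.45), (100, 58, 0.60),
]
stat = hosmer_lemeshow(deciles)
p_value = chi2_sf_even_df(stat, df=len(deciles) - 2)
print(f"HL chi2 = {stat:.2f}, P = {p_value:.3f}")
```

A P value above 0.05 would be read, as in the Methods, as no evidence of disagreement between predicted and observed rates.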

Discrimination

The ROC curves of all the GRACE models, HEART and TIMI scores are depicted in Fig. 4. The AUCs of GRACE (IHDthMI), GRACE (IH6mDthMI), GRACE (IHDth), GRACE (IH6mDth), GRACE (OH6mDth) and GRACE (OH6mDthMI) were 0.83 (0.81, 0.84), 0.82 (0.81, 0.83), 0.75 (0.73, 0.76), 0.73 (0.72, 0.75), 0.72 (0.70, 0.73) and 0.70 (0.68, 0.71), respectively. The AUCs of GRACE (IHDthMI) and GRACE (IH6mDthMI) were comparable to the C-statistic of the HEART score, 0.82 (0.80, 0.83), and superior to those of the other GRACE models and the TIMI score, 0.71 (0.69, 0.73) (P < 0.001). The C-statistics of the GRACE models and the HEART and TIMI scores in each participating hospital are presented in Supplementary Table 2.

Fig. 4

Receiver operating characteristic curves of six GRACE models (a), HEART and TIMI scores (b) for the prediction of MACE within 30 days. GRACE Global Registry of Acute Coronary Events, HEART History, ECG, Age, Risk factors, Troponin, MACE major adverse cardiac events, TIMI Thrombolysis in Myocardial Infarction. * Significantly different from the HEART score. # Significantly different from the TIMI score
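The AUC (C-statistic) reported here equals the probability that a randomly chosen patient with a MACE has a higher score than a randomly chosen patient without one, with ties counted as one half. A minimal sketch of that definition follows; it is a quadratic-time illustration with made-up data, not the routine used in the study, and it does not implement the DeLong comparison.

```python
def auc(scores, labels):
    """AUC as the fraction of event/non-event pairs ranked correctly.

    labels: 1 for patients with the event, 0 otherwise.
    Ties between an event and a non-event score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores and outcomes for illustration only.
example_scores = [60, 85, 90, 120, 150, 170]
example_labels = [0, 0, 1, 0, 1, 1]
print(f"AUC = {auc(example_scores, example_labels):.2f}")
```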

Reclassification

As shown in Table 3, the GRACE (IHDthMI) improved risk classifications of chest pain patients with positive NRI and IDI over the other GRACE models and the TIMI score (P < 0.001), while it presented comparable ability to the HEART score. The same trend was seen in the GRACE (IH6mDthMI) model.
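For readers less familiar with the reclassification indices, the sketch below shows the usual definitions of the category-free NRI and the absolute IDI, computed from the predicted probabilities of two models for the same patients. The function names and the four-patient data set are hypothetical, and no significance test is included.

```python
def continuous_nri(p_old, p_new, labels):
    """Category-free NRI: net upward moves in events minus non-events."""
    def net_up(pairs):
        up = sum(1 for old, new in pairs if new > old)
        down = sum(1 for old, new in pairs if new < old)
        return (up - down) / len(pairs)

    events = [(o, n) for o, n, y in zip(p_old, p_new, labels) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(p_old, p_new, labels) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes are required")
    return net_up(events) - net_up(nonevents)


def idi(p_old, p_new, labels):
    """Absolute IDI: gain in mean predicted risk separation."""
    def mean(values):
        return sum(values) / len(values)

    def separation(p):
        ev = [v for v, y in zip(p, labels) if y == 1]
        ne = [v for v, y in zip(p, labels) if y == 0]
        return mean(ev) - mean(ne)

    return separation(p_new) - separation(p_old)


# Hypothetical predicted probabilities (not study data).
p_old = [0.2, 0.4, 0.6, 0.3]
p_new = [0.1, 0.3, 0.8, 0.5]
labels = [0, 0, 1, 1]
print(continuous_nri(p_old, p_new, labels), round(idi(p_old, p_new, labels), 3))
```

Positive values of both indices mean the new model moves patients with events toward higher predicted risk and patients without events toward lower predicted risk.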

Table 3 The reclassification measurements of the GRACE (IHDthMI) and GRACE (IH6mDthMI) compared with other scores

Rule-out and rule-in of events

For discharging low-risk patients safely with sensitivity ≥95% (Table 4), GRACE (IHDthMI) ≤81 and GRACE (IH6mDthMI) ≤79 identified 868 (30%) and 821 (28%) patients as low risk, respectively, significantly more than the other GRACE models and a HEART score ≤ 3 (22%). Additionally, GRACE (IHDthMI) ≤81 and GRACE (IH6mDthMI) ≤79 ruled out MACEs with NPVs of 0.967 (0.955, 0.979) and 0.965 (0.952, 0.977), respectively, exceeding the other GRACE models and comparable to a HEART score ≤ 3 with an NPV of 0.970 (0.957, 0.983). If the sensitivity was set at ≥98%, the proportion of patients identified as low risk decreased to 14% for GRACE (IHDthMI) ≤64 and GRACE (IH6mDthMI) ≤61, which remained superior to HEART ≤2 (11%) and TIMI =0 (12%) (Supplementary Table 3). If the sensitivity was raised to ≥99%, the proportions of low-risk patients dropped below 10% for the GRACE models but were still significantly greater than that for a HEART score ≤ 1 (3%) (Supplementary Table 3).

Table 4 Performance of different models in terms of diagnostic accuracy at cutoff values with certain sensitivity and specificity levels

Regarding ruling in MACEs with specificity ≥95% (Table 4), GRACE (IHDthMI) > 186 and GRACE (IH6mDthMI) > 161 recognized 12% and 11% of patients as high risk, respectively, more than HEART ≥8 (9%) and TIMI ≥5 (8%). The sensitivities of GRACE (IHDthMI) > 186 and GRACE (IH6mDthMI) > 161 were also better than those of HEART ≥8 and TIMI ≥5 without compromising specificity. As shown in Supplementary Table 4, GRACE (IHDthMI) > 168 and GRACE (IH6mDthMI) > 146 recognized 19% and 18% of patients as high risk, proportions smaller than that (22%) identified by the HEART score (7–10). However, GRACE (IHDthMI) > 168 had greater specificity, 0.904 (0.892, 0.916), and PPV, 0.591 (0.550, 0.633), than HEART ≥7, which had a specificity of 0.870 (0.856, 0.884) and a PPV of 0.534 (0.496, 0.573).
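The cutoff selection used in this subsection can be sketched as follows. A score above the cutoff is treated as test-positive, matching the "≤ c is low risk" and "> c is high risk" conventions in the text. The function names and the ten-patient data set are hypothetical, both outcome classes are assumed to be present, and confidence intervals are omitted.

```python
def ratio(num, den):
    """Safe division; returns None when the denominator is zero."""
    return num / den if den else None


def metrics(scores, labels, cutoff):
    """Diagnostic accuracy when score > cutoff is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > cutoff and y == 1)
    fn = sum(1 for s, y in pairs if s <= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s > cutoff and y == 0)
    tn = sum(1 for s, y in pairs if s <= cutoff and y == 0)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "ppv": ratio(tp, tp + fp),
        "low_risk_fraction": (tn + fn) / len(pairs),
        "high_risk_fraction": (tp + fp) / len(pairs),
    }


def rule_out_cutoff(scores, labels, min_sensitivity):
    """Largest c for which 'score <= c is low risk' keeps the target sensitivity."""
    best = None
    for c in sorted(set(scores)):
        # Sensitivity can only fall as the cutoff rises, so stop at the first miss.
        if metrics(scores, labels, c)["sensitivity"] >= min_sensitivity:
            best = c
        else:
            break
    return best


def rule_in_cutoff(scores, labels, min_specificity):
    """Smallest c for which 'score > c is high risk' reaches the target specificity."""
    for c in sorted(set(scores)):
        if metrics(scores, labels, c)["specificity"] >= min_specificity:
            return c
    return None


# Hypothetical scores and 30-day outcomes (not study data).
demo_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
demo_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
low = rule_out_cutoff(demo_scores, demo_labels, min_sensitivity=1.0)
high = rule_in_cutoff(demo_scores, demo_labels, min_specificity=1.0)
print(low, high, metrics(demo_scores, demo_labels, low))
```

Choosing the largest rule-out cutoff that still meets the sensitivity target maximizes the proportion of patients classified as low risk, which is the quantity compared between scores in Table 4.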

Performance of the mini-GRACE models

The mini-GRACE (IHDthMI), mini-GRACE (IH6mDthMI), mini-GRACE (IHDth), mini-GRACE (IH6mDth) and mini-GRACE (OH6mDth) showed positive linear correlations with the actual MACE rates (r ≥ 0.793, P < 0.001). A very strong relationship remained in the mini-GRACE (IH6mDthMI) (r = 0.917). The mini-GRACE (IHDth), mini-GRACE (IH6mDth) and mini-GRACE (OH6mDth) had good calibration (P ≥ 0.517) while the other two did not (Supplementary Table 5). The mini-GRACE (IHDthMI) and mini-GRACE (IH6mDthMI) models, with AUCs of 0.82 (0.80, 0.83) and 0.81 (0.79, 0.82), respectively, were still superior to other models in discrimination and reclassification (Supplementary Figure 1) (Supplementary Table 6).

Discussion

This study provides the first comprehensive evaluation and comparison of all six GRACE risk-prediction models in patients with undifferentiated chest pain. In the two Chinese EDs included in this study, all six GRACE models showed a positive linear correlation with actual MACE rates, and five of the models had good calibration. All the C-statistics were ≥ 0.70. The GRACE (IHDthMI) and GRACE (IH6mDthMI) exhibited very strong relationships with actual MACE rates (r > 0.9) and showed excellent discriminatory capability (AUC > 0.80). The AUC, NRI and IDI results indicated that GRACE (IHDthMI) and GRACE (IH6mDthMI) were comparable to the HEART score and superior to the other models.

The GRACE risk scores were developed using multivariable regression to assist cardiologists in estimating the risk of different outcomes in hospitalized patients with ACS and have been indicated to provide the most accurate stratification of risk of ACS both on admission and at discharge [31, 32]. Each model is specific to one kind of outcome: death or the composite of MI and death, during hospitalization, within 6 months after discharge, or from admission to 6 months later. The MI referred to here is the subsequent AMI occurring after the index ACS. However, for undifferentiated chest pain, the high-risk conditions are mainly captured by a composite endpoint of index AMI, subsequent AMI, death, emergency revascularization, cardiac arrest and cardiogenic shock within 30 days after presentation to the ED [33]. The incidence of index AMI is much greater than that of subsequent AMI, as shown in our study. Our results suggested that the GRACE models showed at least a moderate correlation with the actual incidence of MACEs in the undifferentiated chest pain cohort. In particular, very strong correlations appeared in GRACE (IHDthMI) and GRACE (IH6mDthMI). Furthermore, the predicted probabilities of an event and the observed event rates did not differ significantly across deciles for five of the GRACE models. Therefore, there are foundations for the GRACE models to provide accurate stratification of patients with acute chest pain.

In previous studies, C-statistics for predicting 30-day MACEs in chest pain patients were evaluated only for the GRACE (IHDth), with AUCs of 0.60 to 0.83, which were always inferior to those of the HEART score [10,11,12,13,14,15,16]. Consistently, we found that the AUC of the GRACE (IHDth) was 0.75 (0.73, 0.76), which was indeed lower than that of the HEART score in this study. However, GRACE (IHDth) was neither the only GRACE model nor the best GRACE model for stratifying chest pain. The GRACE (IHDthMI) and GRACE (IH6mDthMI) had better overall discriminatory capability (AUC > 0.8) and reclassification, without a difference from the HEART score. Although the performance of all these models was not as good in the rural hospital as in the urban hospital, the advantages of GRACE (IHDthMI), GRACE (IH6mDthMI) and HEART were consistent in both EDs. Significantly positive NRI and IDI in this study showed that the GRACE (IHDthMI) and GRACE (IH6mDthMI) could provide a higher predicted probability of an event for high-risk patients and a lower predicted probability for low-risk patients than the other four models. A possible explanation is that the events predicted by these two GRACE models include AMI rather than death alone, although not the index AMI. Compared with the models for events after discharge, the periods in the hospital or from admission to 6 months later were closer to the 30-day follow-up after presentation to the ED. Our results did not refute previous conclusions but complemented them by providing a more complete picture of the GRACE models.

Exact cutoff values should be determined for clinical use to identify low-risk patients for safe and early discharge without compromising the immediate treatment of high-risk chest pain. Reaney et al. found that GRACE (IHDth) 0–55 could reach a sensitivity of 95.2% and NPV of 95.8%, identifying 21.2% of patients as low risk. GRACE (IHDth) ≥119 defined 16% of patients as high risk (specificity 89.8%; PPV 48.1%) [16]. Poldervaart et al. determined GRACE (IHDth) ≤72 as the cutoff, and 19.1% of patients were classified as low risk (sensitivity 95%; NPV 96%) [14]. Cullen et al. chose the cutoff of GRACE (OH6mDth) ≤50 to determine low risk (24% of patients) with a sensitivity of 98.9%, and the cutoff for recognizing high risk (28% of patients) was ≥100, with a specificity of 76.2% [17]. In our study, the performance of GRACE (IHDth) was relatively consistent with that of previous studies, with a value of ≤79 identifying 18% of patients as low risk (sensitivity 95.1%; NPV 94.4%) and a value of > 145 defining 16% of patients as high risk (specificity 90.0%; PPV 49.9%). At the same sensitivity and specificity, the GRACE (IHDthMI) and GRACE (IH6mDthMI) outperformed the GRACE (IHDth) and the other GRACE models. Although there is no rigorous standard for the sensitivity of risk-stratification models for chest pain, an international survey suggested that clinicians may expect a sensitivity of 99% or higher for AMI or other MACEs [34]. If the sensitivity was set at ≥99%, GRACE (IHDthMI) and GRACE (IH6mDthMI) were still superior, but the proportions of low-risk patients would drop below 10%. A meta-analysis demonstrated that the pooled sensitivity and specificity of a HEART score ≤ 3 for predicting MACEs were 96.7% (94.0, 98.2%) and 47.0% (41.0, 53.5%), respectively [35]. HEART ≤3 in our cohort had a similar sensitivity (96.8%) but a lower specificity (27.0%). The sensitivity of HEART ≤2 was higher at 98.8% (97.9, 99.7%) at the cost of a lower proportion (11%) of patients identified as low risk.
In our previous report, the HEART score did not appear to add helpful risk stratification beyond usual care for discharging low-risk patients [36]. Regarding the high-risk category, HEART ≥7 did not perform as well (specificity 87.0%; PPV 53.4%) as in Reaney’s study [16].

For ruling out and ruling in MACEs, the HEART score showed a certain advantage over the GRACE (IHDth) but not over the GRACE (IHDthMI) or GRACE (IH6mDthMI). The strengths of GRACE are still noteworthy. Possible explanations are the finer grading and greater objectivity of the GRACE components compared with those of HEART and TIMI. Although the HEART score was directly developed for undifferentiated patients, each variable is assigned only one of three qualitative classes (i.e., 0, 1, 2) [21]. The classes for each component of the TIMI score are even fewer (only 0 or 1) [22]. In contrast, the GRACE scores include many more quantitative variables, such as age, SBP, pulse and creatinine, which are expected to capture subtler differences and result in more exact stratification. As highlighted by the 0 h/1 h algorithm from the ESC guideline recommendations, quantitative interpretation outperforms qualitative interpretation of high-sensitivity troponin levels for ruling out and ruling in AMI in chest pain patients, and the cutoff levels are assay specific [6, 37]. Furthermore, some “soft” variables are included in the HEART score, such as the medical history, risk factors and symptoms. It has been shown that these variables do not have a sufficient discriminatory ability to rule in or rule out ACS in the ED [38]. The combination of symptom variables as a “history” component in the HEART score has still not been clearly stated or assessed systematically [39]. The GRACE score avoids this problem because it contains no subjective variables. The popularity of handheld devices means that the complexity of GRACE is no longer a disadvantage.

The results from the assessment of mini-GRACE were mainly in accordance with those of the complete models. Although the correlation of the mini-GRACE (IHDthMI) and the calibration of the mini-GRACE (IH6mDthMI) were lower than those of the complete models, the discrimination and reclassification of these two mini scores remained excellent and significantly outperformed other models. This suggests that the differences in model performance may be due to the disparities in weights of shared variables.

Limitations

This study had several limitations. First, the performance of different GRACE scores was assessed in chest pain patients from two hospitals in China. Although urban and rural hospitals were both included, each score should be validated in broader patient populations through further studies of heterogeneous groups. In particular, the cutoff levels of the GRACE models differ between studies because of disparities in inclusion and exclusion criteria and in the incidence and definition of MACEs. The specific cutoff values need to be determined and validated in clinical practice in individual hospitals. Second, the cardiac marker used in the calculation of scores was the contemporary cTn assay arranged by emergency physicians in their daily work. The ability of scores combined with high-sensitivity cTns to stratify chest pain still needs to be evaluated in future studies. Third, all components used in the risk scores were calculated automatically through a computer algorithm. The ECG variables were based on the standard interpretation from senior cardiologists. This calculation process deviated from clinical reality. Further studies to evaluate the discrimination of scores calculated immediately by the treating physicians are needed.

Conclusions

From our evaluation and comparison of the six GRACE models in a prospective cohort of undifferentiated chest pain patients in two Chinese EDs, we found that all six GRACE models presented acceptable or excellent discriminatory capacity for predicting 30-day MACEs. In particular, the GRACE (IHDthMI) and GRACE (IH6mDthMI) were comparable to the HEART score and superior to the other GRACE models and TIMI in terms of discrimination and reclassification. At a given sensitivity and specificity, GRACE (IHDthMI) and GRACE (IH6mDthMI) could identify more patients to rule out or rule in 30-day MACEs than other models. Although cutoff levels of the GRACE models may be specific to different cohorts and validation of these levels is needed, the reasonable application of appropriate GRACE models in the evaluation of undifferentiated chest pain patients in the ED should be recommended.