Introduction

Coronary artery disease (CAD) is a common cardiovascular disease that seriously damages human health. With rapid economic development and the rising incidence of CAD in developing countries, China has seen an upsurge in patients undergoing coronary artery bypass grafting (CABG) over the last decade1. The high perioperative risk of heart surgery has gradually come to the attention of surgeons. Several risk evaluation systems, which quantify risk from patient data and predict mortality or morbidity, have been developed worldwide over the last two decades and have been well received. Of these systems, two have become predominant: EuroSCORE in Europe and the Society of Thoracic Surgeons (STS) risk evaluation system in North America2. In China, Fuwai Hospital created a national multi-center database of patients undergoing isolated CABG, known as the Chinese Coronary Artery Bypass Grafting Registry Study3,4. Based on the more than 9,000 patients in this database, the Sino System for Coronary Operative Risk Evaluation (SinoSCORE) was published in 2010 (ref. 5).

SinoSCORE, EuroSCORE II and the STS risk evaluation system were all developed from heart surgery patients in different regions and have been adopted, to varying degrees, in clinical practice. The aim of this study was to validate SinoSCORE in isolated CABG patients in East China and to compare the accuracy of the three systems in predicting mortality.

Results

Among the 1616 patients in the study, 31 died, giving a realized mortality of 1.92%. The baseline clinical characteristics of all patients are summarised in Table 1, and the baseline data of the subsets grouped by risk are shown in Table 2 and Table 3. The realized and predictive mortality rates for all patients and for the subsets are shown in Table 4. The predictive mortality was 1.74 ± 1.37% (95% CI 1.67–1.81) for EuroSCORE II, 1.35 ± 3.30% (95% CI 1.19–1.51) for SinoSCORE and 1.05 ± 1.45% (95% CI 0.98–1.12) for the STS risk evaluation system. The receiver operating characteristic (ROC) curves of the three systems for all patients and for the subgroups are shown in Fig. 1 and Fig. 2. SinoSCORE achieved excellent discrimination (AUC = 0.888), followed by the STS risk evaluation system (AUC = 0.844) and EuroSCORE II (AUC = 0.814). When patients were grouped by risk, SinoSCORE (AUC = 0.790) also achieved the best discrimination in the high-risk group, followed by the STS risk evaluation system (AUC = 0.681) and EuroSCORE II (AUC = 0.647), while SinoSCORE (AUC = 0.901) and EuroSCORE II (AUC = 0.861) performed excellently in the low-risk group (Fig. 2, Table 4).
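The confidence intervals quoted above are consistent with a normal approximation for the mean predictive mortality, mean ± 1.96 × SD/√n with n = 1616. The following Python sketch recomputes them under that assumption; the paper does not state how the intervals were obtained, and the function and variable names are illustrative only.

```python
import math


def normal_ci95(mean, sd, n):
    """Return the (lower, upper) 95% CI of a mean under a normal approximation."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


n_patients = 1616  # cohort size reported in the text

# (mean predictive mortality in %, standard deviation in %) as reported
reported = {
    "EuroSCORE II": (1.74, 1.37),
    "SinoSCORE": (1.35, 3.30),
    "STS": (1.05, 1.45),
}

intervals = {
    name: normal_ci95(mean, sd, n_patients)
    for name, (mean, sd) in reported.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.2f}-{upper:.2f}")
```

Rounded to two decimals, the recomputed bounds agree with the intervals reported for all three systems.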

Table 1 CABG patient baseline clinical characteristics.
Table 2 Baseline clinical characteristics of low risk groups.
Table 3 Baseline clinical characteristics of high risk groups.
Table 4 Realized and predictive mortality rates of the three systems.
Figure 1

The receiver operating characteristic curves of the three risk evaluation systems in all patients. The area under the receiver operating characteristic curve was 0.814 for EuroSCORE II, 0.888 for SinoSCORE and 0.844 for the STS risk evaluation system.

Figure 2

The receiver operating characteristic curves of the three risk evaluation systems in the subsets. (A) The receiver operating characteristic curves of the three risk evaluation systems in the high-risk group (AUC: EuroSCORE II 0.647, SinoSCORE 0.790 and STS risk evaluation system 0.687). (B) The receiver operating characteristic curves of the three risk evaluation systems in the low-risk group (AUC: EuroSCORE II 0.861, SinoSCORE 0.901 and STS risk evaluation system 0.777).

In terms of model calibration, SinoSCORE (H-L: P = 0.405), EuroSCORE II (H-L: P = 0.973) and the STS risk evaluation system (H-L: P = 0.934) all achieved good calibration (H-L: P > 0.05) in the overall population. When patients were divided into a high-risk group and a low-risk group, calibration was also assessed in each group by the Hosmer-Lemeshow (H-L) statistic. In the high-risk subset, SinoSCORE (H-L: P = 0.988), EuroSCORE II (H-L: P = 0.103) and the STS risk evaluation system (H-L: P = 0.898) achieved good calibration (H-L: P > 0.05), as they did in the low-risk group: SinoSCORE (H-L: P = 0.994), EuroSCORE II (H-L: P = 1.000) and the STS risk evaluation system (H-L: P = 1.000) (Table 4).
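As a rough illustration of how an H-L statistic and its P value can be obtained, the Python sketch below ranks patients by predicted risk, cuts them into groups of near-equal size and compares observed with expected deaths in each group. It is not the SPSS procedure used in this study. The number of groups (10), the function name and the closed-form chi-square tail, which is valid only for an even number of degrees of freedom, are assumptions made for illustration.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, p_value) of a Hosmer-Lemeshow goodness-of-fit test.

    y_true: 0/1 outcomes (1 = death); y_prob: predicted probabilities in (0, 1).
    The P value uses a chi-square distribution with groups - 2 degrees of
    freedom; the closed form below requires that number to be even.
    """
    pairs = sorted(zip(y_prob, y_true))  # rank patients by predicted risk
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one patient per group")
    df = groups - 2
    if df <= 0 or df % 2:
        raise ValueError("closed-form P value needs an even, positive df")

    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        # Binomial variance n_g * pbar * (1 - pbar), with pbar = expected / n_g
        variance = expected * (1 - expected / size)
        statistic += (observed - expected) ** 2 / variance

    # Chi-square upper tail for even df = 2k:
    # exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = statistic / 2
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )
    return statistic, p_value
```

With this convention, a P value above 0.05 means that the test found no significant difference between observed and expected deaths, which is the criterion for good calibration applied in this study.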

Calibration plots showed that all three risk evaluation systems deviated from the diagonal. This indicated that the three systems underestimated mortality in the total population, with SinoSCORE performing slightly better than the others (Fig. 3).

Figure 3

Calibration plots for the three risk evaluation systems. (A) Calibration plots for EuroSCORE II. (B) Calibration plots for SinoSCORE. (C) Calibration plots for STS risk evaluation system.

Decision curve analysis (DCA) was used to assess the clinical usefulness of the three risk evaluation systems for predicting operative mortality. The results are shown as a graph with the selected probability threshold (i.e., the predicted probability of postoperative mortality above which a patient would decline the operation) plotted on the abscissa and the net benefit of the risk evaluation system on the ordinate. In the entire cohort, the decision curves of EuroSCORE II and SinoSCORE were similar, with the curve of EuroSCORE II lying slightly above that of SinoSCORE for thresholds between 0 and 30%; both were always above the curve of the STS risk evaluation system regardless of the selected threshold (Fig. 4C). In the high-risk group, the net benefit of the STS risk evaluation system was lower than that of SinoSCORE and EuroSCORE II regardless of the selected threshold, and the curve of SinoSCORE lay slightly above that of EuroSCORE II for thresholds between 0 and 40% (Fig. 4A). In the low-risk group, the net benefit of SinoSCORE was always greater than that of EuroSCORE II and the STS risk evaluation system for thresholds between 0 and 20% (Fig. 4B).

Figure 4

DCA showing the clinical usefulness of EuroSCORE II, SinoSCORE and the STS risk evaluation system in predicting in-hospital mortality. The grey line represents the net benefit of providing surgery to all patients, assuming that all patients would survive. The black line represents the net benefit of providing surgery to no patients, assuming that none would survive surgery. The red, blue and green lines represent the net benefit of providing surgery to patients according to EuroSCORE II, SinoSCORE and the STS risk evaluation system, respectively. The selected probability threshold is plotted on the abscissa. (A) DCA for the high-risk group. (B) DCA for the low-risk group. (C) DCA for the entire cohort.

Discussion

In recent years, because of the rapidly increasing number of CABG patients and the demand for high-risk surgery, both patients and surgeons have become aware of risk evaluation systems. These systems play an important role in surgical decision-making and have improved the quality of medical treatment, preoperative patient education and consent, the allocation of medical resources and the standardisation of comparisons among different centers or surgeons6,7,8. Risk evaluation systems aim to provide a more accurate assessment to guide surgery for individual patients by balancing the potential risks and benefits9. A thorough risk evaluation system should be built on a large database that is representative of current clinical practice, and systematic data validation should be used to confirm its accuracy10.

Risk evaluation systems for heart surgery have been studied in developed countries for decades and are based primarily on European (EuroSCORE II) and North American (STS risk evaluation system) databases, which may lead to obvious errors when they are applied to the Chinese population11,12,13,14,15. In this context, SinoSCORE, which was built on a Chinese database, was developed in 2010. At the same time, the previously developed risk evaluation systems were under continuous revision to improve the accuracy and representativeness of their databases, through increasing numbers of research centers and cases and the modification or removal of outdated risk factors16,17,18,19,20,21. As a result, SinoSCORE, EuroSCORE II and the STS risk evaluation system have all been established for several years. The First Affiliated Hospital of Nanjing Medical University and the East Hospital affiliated to Tongji University are both regional central hospitals, located in Nanjing and Shanghai, East China. Patients from the two hospitals could represent typical East China patients. Because China's territory is vast, there are great differences among its regions. The proportions of several risk factors differed between our study database and the SinoSCORE database, including age, diabetes, hypertension, renal failure, cerebrovascular accident and previous cardiac surgery (Table 5). It is therefore meaningful to compare the three risk evaluation systems in East China patients.

Table 5 The baseline risk factors of SinoSCORE database and Local database.

Among validation studies in Chinese patients, excluding those on isolated valve surgery, only one has been published; it indicated that EuroSCORE II performed well in predicting mortality overall and in the low-to-middle risk group, but not in the high-risk group22. Although the EuroSCORE II database differed significantly from our study database in regions and populations, EuroSCORE II achieved excellent predictive value overall (AUC = 0.814) as well as in low-risk patients (AUC = 0.861). Similar to the result of Bai et al.22, the discrimination of EuroSCORE II in the high-risk group was not satisfactory. The number of patients classified as high-risk by EuroSCORE II was twice that of SinoSCORE and the STS risk evaluation system, and some low-risk patients were assigned to the high-risk group, which might explain why the discrimination of EuroSCORE II in the high-risk group was not satisfactory.

The STS risk evaluation system, which is as well known as EuroSCORE II, is composed of three parts: isolated CABG, isolated valve surgery and valve surgery plus CABG17,19,20. The validation database affirmed the clinical application value of this system19. In recent years, there have been reports that the STS risk evaluation system was well validated in British, New Zealand and Indian patients (in whom it had satisfactory calibration but poor discriminatory power) undergoing heart surgery2,23,24. In our study, the STS risk evaluation system achieved good calibration (H-L: P > 0.05) in the entire cohort and in the subsets, which was in accordance with Zhang et al.23, who reported that this system might be an appropriate choice for Chinese patients undergoing isolated CABG. However, the discrimination of the STS risk evaluation system (AUC = 0.687), like that of EuroSCORE II (AUC = 0.647), was poor in the high-risk group. One possible reason was that the preoperative parameters of patients in the high-risk group differed dramatically. Another possible reason was that EuroSCORE II and the STS risk evaluation system also predict mortality for other cardiac operations, so evaluating them only on isolated CABG mortality may undermine their performance.

SinoSCORE solved the problem that China did not have its own heart surgery risk evaluation system. Although still new, SinoSCORE has been assessed favourably in several medical centers throughout China25,26,27,28,29,30. In theory, therefore, SinoSCORE should be more relevant to Chinese patients than the other systems. In our study, SinoSCORE remained the most valuable risk evaluation system (AUC = 0.888). There are several possible reasons. First, the patients in our study database and in the SinoSCORE database were of the same ethnicity. Second, our study database and the SinoSCORE database were similar in more risk factors, such as sex, peripheral vascular disease, active endocarditis and critical preoperative state3,4, which might explain the excellent predictive power of SinoSCORE. Third, all patients used in modelling SinoSCORE underwent CABG only, whereas EuroSCORE II and the STS risk evaluation system were built on patients who underwent different kinds of cardiac operations.

For risk evaluation systems, it is more meaningful to improve the ability to predict outcomes in high-risk patients. Although the discrimination of the three risk evaluation systems was lower in the high-risk group than in the low-risk group, SinoSCORE had the best discrimination in the high-risk group. Some of the patients in this study were also included in the development of SinoSCORE, which might explain why the discrimination of SinoSCORE in the high-risk group was satisfactory. Although the three systems all had good calibration and discrimination, they appreciably underestimated mortality in the entire cohort and in the subsets. One possible reason was that, although cardiac surgery and perioperative care in China have developed rapidly in recent decades, there are still some gaps compared with developed countries. Another possible reason was that 3.87% of patients (65 cases) were excluded from the study because of incomplete data. The discrimination of the risk evaluation systems was tested by AUC, which assesses how well a system can discriminate between survivors and non-survivors. AUC is therefore considered one of the most important indicators for evaluating these systems; it is a comprehensive indicator of a system's performance and is more important than predictive accuracy alone.

This study has some limitations. First, it was a two-center, retrospective, non-randomised observational study. Second, the population size was still small compared with those of other systems, which were derived from large numbers of patients. Third, EuroSCORE II and the STS risk evaluation system are designed for a variety of cardiac operations, and the STS risk evaluation system can also predict other outcomes. Evaluating the capacity of EuroSCORE II and the STS risk evaluation system to predict only isolated CABG mortality may undermine their performance. These points might contribute to bias. Therefore, the mortality statistics may be limited to some degree.

In summary, for isolated CABG in East China patients, SinoSCORE fits the data well, with excellent discrimination and good calibration. SinoSCORE performed no worse than EuroSCORE II and the STS risk evaluation system.

Methods

The study included all patients (1681 enrolled) undergoing isolated CABG in two hospitals (the First Affiliated Hospital of Nanjing Medical University and the East Hospital affiliated to Tongji University) between January 2010 and December 2016, and was approved by the ethics committees of the two hospitals. All experiments were performed in accordance with relevant guidelines and regulations. Written informed consent was obtained before data collection. Sixty-five patients (3.87%) were excluded from the analyses because of incomplete data, leaving a total of 1616 procedures in the study database. The database included 1267 males and 349 females, with an average age of 65.21 ± 8.50 years. Each patient’s diagnosis was confirmed by coronary arteriography. Using the study database, the operative risk was predicted with the online calculators for SinoSCORE, available at http://www.cvs-china.com/sino.asp, EuroSCORE II, available at http://www.euroscore.org/calc.html, and the STS risk evaluation system, available at http://riskcalc.sts.org/STSWebRiskCalc273/de.aspx. The predictive mortality of each patient was obtained from each of the systems. Mortality was defined as post-operative in-hospital death, including deaths after discharge against medical advice.

To further explore the predictive efficacy of the three evaluation systems, the patients scored by each system were divided into two subgroups according to the realized mortality rate (1.92%, 31/1616): a high-risk group (predictive mortality ≥1.92%) and a low-risk group (predictive mortality <1.92%). The calibration and discrimination of the three systems in all patients and in each subset were assessed and compared. To make a fair comparison among the three systems, we compared the predictive and realized mortality rates in all patients and in each subset.

Statistical Analysis

Continuous baseline variables are presented as mean ± standard deviation or interquartile range and were compared with the t test; categorical variables are expressed as percentages and were compared with the χ² (chi-square) test. P < 0.05 was considered statistically significant.

Calibration and discrimination were used to assess predictive efficiency. Calibration was assessed by the Hosmer-Lemeshow (H-L) statistic. Calibration is considered good if P > 0.05, which indicates that the system predicts mortality accurately31. Discrimination was assessed by the C statistic, using the area under the receiver operating characteristic curve (AUC). Discrimination measures the capacity of the evaluated system to differentiate individuals by illness or death. AUC ranges from 0.50 to 1.00, and AUC > 0.70, > 0.75 and > 0.80 indicates that the discrimination is acceptable, good and excellent, respectively32.
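The AUC can be read as the probability that a randomly chosen non-survivor received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison and applies the thresholds given above. It is an illustration rather than the SPSS routine used in this study; the function names are hypothetical, and the label "poor" for AUC ≤ 0.70 is an assumption taken from the wording of the Discussion.

```python
def auc(y_true, y_score):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    y_true: 0/1 outcomes (1 = death); y_score: predicted mortality.
    """
    deaths = [s for s, y in zip(y_score, y_true) if y == 1]
    survivors = [s for s, y in zip(y_score, y_true) if y == 0]
    if not deaths or not survivors:
        raise ValueError("both outcome classes are required")

    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0      # non-survivor correctly ranked higher
            elif d == s:
                wins += 0.5      # tie counts as one half
    return wins / (len(deaths) * len(survivors))


def grade_discrimination(value):
    """Map an AUC to the categories used in this study."""
    if value > 0.80:
        return "excellent"
    if value > 0.75:
        return "good"
    if value > 0.70:
        return "acceptable"
    return "poor"  # assumed label for AUC <= 0.70
```

For example, `grade_discrimination(0.888)` places the overall SinoSCORE result in the "excellent" category, and `grade_discrimination(0.647)` places the high-risk EuroSCORE II result in the "poor" category.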

Calibration plots of realized versus predictive mortality rates were constructed for each of the three systems, using 20 equally sized groups ranked by predictive risk. Ideally calibrated predictions lie on the 45° line; points below or above the diagonal indicate overestimation or underestimation, respectively.
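The points of such a calibration plot can be sketched in Python as follows: patients are ranked by predictive risk, cut into groups of near-equal size, and the mean predictive mortality of each group is paired with its realized mortality rate. The function names are illustrative and the plotting step itself is omitted; this is a sketch of the binning idea, not the exact procedure used to draw Fig. 3.

```python
def calibration_points(y_true, y_prob, groups=20):
    """Return (mean predicted risk, observed death rate) for each risk group.

    y_true: 0/1 outcomes (1 = death); y_prob: predicted mortality.
    """
    pairs = sorted(zip(y_prob, y_true))  # rank patients by predicted risk
    n = len(pairs)
    points = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer patients than groups: skip empty bins
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_predicted, observed_rate))
    return points


def direction(mean_predicted, observed_rate):
    """Classify a point relative to the 45-degree line (x predicted, y observed)."""
    if observed_rate > mean_predicted:
        return "underestimation"  # point above the diagonal
    if observed_rate < mean_predicted:
        return "overestimation"   # point below the diagonal
    return "ideal"
```

Plotting the first element of each pair on the abscissa and the second on the ordinate gives the calibration curve, with the diagonal as the reference for perfect calibration.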

The net benefit of the three risk evaluation systems for predicting in-hospital mortality was assessed by decision curve analysis (DCA). The net benefit is obtained by subtracting the proportion of all patients who are false-positive from the proportion who are true-positive, weighting by the relative harm of a false-positive and a false-negative result33. The statistical analysis was performed with SPSS Version 18 (SPSS Inc., Chicago, Illinois, USA). DCA was performed with R software version 3.4.0 (The R Foundation for Statistical Computing, Vienna, Austria) with the Decision curve package.
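In the usual formulation of DCA, the weighting described above takes the form: net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the probability threshold, TP and FP are the numbers of true-positive and false-positive patients at that threshold, and n is the total number of patients. The Python sketch below implements this formula as an illustration; it is not the R package used in this study, and the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given probability threshold.

    y_true: 0/1 outcomes (1 = death); y_prob: predicted mortality;
    threshold: probability threshold, strictly between 0 and 1.
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # relative harm of a false positive
    return true_pos / n - (false_pos / n) * weight


def net_benefit_all_positive(y_true, threshold):
    """Reference curve: every patient is classified as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds for each system and plotting the results against the threshold gives the decision curves; the reference strategy of classifying no patient as positive has a net benefit of zero at every threshold.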

Data Availability

All data generated or analyzed during this study are included in this published article.