Background

Cardiovascular disease represents a considerable health problem and is a major cause of death worldwide [1]. The condition is commonly treated with percutaneous coronary intervention (PCI), which is a low-cost procedure compared with coronary artery bypass graft surgery (CABG) and requires shorter hospitalization and recovery times. Drug-eluting stents (DES) have largely replaced bare metal stents (BMS), improving long-term prognosis, mainly by reducing the rate of restenosis [2]. Identifying potential risk factors for subsequent major adverse cardiovascular events may offer additional advantages with respect to outcome [2,3,4,5]. However, this requires suitable models for determining the risk factors.

The Cox model is a widely used semi-parametric choice for analyzing censored data. This model relates the log of the hazard ratio to a linear function of the predictors. The Cox model has several limitations, such as requiring medical knowledge to model covariate interactions and complex nonlinear effects, as well as the proportional hazards assumption [6, 7]. Failing to check these assumptions, or ignoring them, can affect the validity of the results.

Random Survival Forest (RSF), an ensemble learning method, was developed to overcome the problems of the Cox model and other classical models for the analysis of survival data. The most important feature of RSF is its ability to measure the importance of variables [8]. This model is also suitable for medical research involving high-dimensional data [9,10,11]. Various studies have evaluated the performance of the RSF model in comparison with the Cox model [12].

Several studies have examined the risk factors for future adverse events following PCI with BMS and DES [5, 13]. However, only a limited number of studies describe the results of long-term follow-up after PCI, and results from long-term follow-up may not necessarily match those of short-term follow-up. Furthermore, to the best of our knowledge, the RSF model has not previously been used to identify factors affecting the occurrence of MACCE in patients undergoing angioplasty with stent deployment.

Therefore, we have conducted a long-term study to identify factors affecting the occurrence of MACCE following coronary stenting, comparing the RSF and Cox proportional-hazards models.

Methods

The current retrospective cohort study was performed on 220 patients (69 women and 151 men) who underwent coronary angioplasty from March 2009 to March 2012 at Farshchian Medical Center in Hamadan city, Iran. In this study, major adverse cardiovascular and cerebrovascular events (MACCE), including death, CABG, stroke and repeat revascularization, were selected as the event of interest for survival analysis.

Survival time (months), the response variable, was measured from the date of angioplasty to the occurrence of MACCE or the end of the follow-up period (September 2019). For patients who had not experienced MACCE, the time from the date of angioplasty to the end of follow-up was treated as a censored survival time.

To identify the factors influencing the occurrence of MACCE during the 10-year follow-up after coronary angioplasty, the performance of the Cox model and the RSF model was investigated. Also, the MACCE-free survival curve was constructed with the Kaplan–Meier method.

The restricted mean survival time (RMST) was reported as the between-group summary metric. Unlike the median survival time, it remains estimable even under heavy censoring.
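As a rough illustration of how these two summaries relate, the following Python sketch computes a Kaplan–Meier curve and then the RMST as the area under that curve up to a truncation time. The function names, the truncation time `tau`, and the toy data are illustrative assumptions and are not taken from the study, whose analyses were run in R.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = event observed (e.g. MACCE), 0 = censored.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_removed
    return curve


def rmst(curve, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area


# Hypothetical toy data: follow-up in months, 1 = event, 0 = censored.
toy_times = [2, 4, 4, 6, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_rmst = rmst(toy_curve, tau=8)  # approximately 5.4 months
```

In this toy example the curve never drops to 0.5 before month 6, yet the RMST is defined for any chosen `tau` within the follow-up window, which is why it stays usable when the median cannot be estimated.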

Cox proportional hazard model

The Cox proportional-hazards model specifies the conditional hazard function based on the vector of predictor variables. The general form of the hazard for the i-th subject with covariate profile \(x_{i}\) at time t based on the Cox model is as follows:

$${\text{h}}\left( {{\text{t}}|x_{i}} \right) = h_{0} \left( {\text{t}} \right)\exp \left(\sum_{j} \beta_{j} x_{ij}\right)$$

The Cox model consists of two components: a non-parametric component, the baseline hazard \(h_{0}(t)\), which is an unspecified non-negative function of time, and a parametric component, in which a linear function of the covariates acts multiplicatively on the baseline hazard through the exponential term [6].
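As a short worked consequence of this form, consider two subjects a and b, writing \(\beta_{j}\) for the coefficient of the j-th covariate and \(x_{aj}\), \(x_{bj}\) for their covariate values (this notation is introduced here for illustration). The baseline hazard cancels in the ratio of their hazards:

$$\frac{h\left( t|x_{a} \right)}{h\left( t|x_{b} \right)} = \frac{h_{0}\left( t \right)\exp \left( \sum_{j} \beta_{j} x_{aj} \right)}{h_{0}\left( t \right)\exp \left( \sum_{j} \beta_{j} x_{bj} \right)} = \exp \left( \sum_{j} \beta_{j} \left( x_{aj} - x_{bj} \right) \right)$$

The ratio does not depend on t, which is the proportional hazards assumption. If the two subjects differ by one unit in a single covariate k and are otherwise identical, the hazard ratio reduces to \(\exp(\beta_{k})\).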

Random survival forest model

The RSF model, a tree-based, non-parametric ensemble algorithm, can address the limitations of the Cox model and can identify and rank the most important variables affecting survival time. Ensemble learning is a supervised learning technique in which the main idea is to fit several models to a training data set and then combine (for example, average) their outputs [14].

In general, the RSF algorithm includes the following steps:

  1. B bootstrap samples are drawn from the original data. In each bootstrap sample, about one-third of the observations are left out of the bag. For example, if the original data contain 1000 observations, about 670 of them are used for training in each bootstrap sample, and the remaining out-of-bag (OOB) observations are used for testing and for estimating the prediction error.

  2. A survival tree is grown for each bootstrap sample. At each node of the tree, mtry covariates are randomly selected out of all p covariates as candidates for splitting, and the variable and split point that maximize the separation between the two resulting child nodes are chosen. Growth stops once a stopping condition is met (e.g., when the number of observations within a terminal node falls below a preset value or when the node becomes pure). By default, mtry = √p and the log-rank statistic is used as the splitting criterion.

  3. To obtain an ensemble risk prediction, information from the terminal nodes (nodes with no further split) of the B survival trees is aggregated. For each tree, the cumulative hazard function (CHF) is calculated with the Nelson–Aalen estimator in the terminal nodes, and the average of these CHFs gives the ensemble CHF.

  4. The prediction error is calculated for the ensemble CHF using the OOB data.
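The building blocks of these steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the randomForestSRC implementation: it does not grow trees, and the function names and toy inputs are hypothetical. It shows a bootstrap draw with its OOB set, the Nelson–Aalen CHF for the observations in one node, and the averaging of per-tree CHF values. The expected OOB fraction for n observations is (1 − 1/n)^n, which approaches e^(−1) ≈ 0.368, in line with the "about one-third" figure.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def nelson_aalen(times, events, t):
    """Nelson-Aalen cumulative hazard at time t for one group of subjects.

    Sums d_k / n_k over distinct event times t_k <= t, where d_k is the
    number of events at t_k and n_k the number still at risk at t_k.
    """
    event_times = sorted(
        {ti for ti, ei in zip(times, events) if ei == 1 and ti <= t}
    )
    chf = 0.0
    for tk in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ti == tk and ei == 1)
        n = sum(1 for ti in times if ti >= tk)
        chf += d / n
    return chf


def ensemble_chf(tree_chfs):
    """Average the CHF values that individual trees assign at a fixed time."""
    return sum(tree_chfs) / len(tree_chfs)


# Hypothetical usage: one bootstrap draw from 1000 observations.
rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(1000, rng)
oob_fraction = len(out_of_bag) / 1000  # close to 0.368 on average
```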

In this study, the RSF model was fitted with 2000 trees, using the log-rank statistic as the splitting criterion. The relative importance of each variable was assessed using the variable importance (VIMP) measure. The larger the VIMP value of a variable, the more important that variable is as a predictor.
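To make the splitting criterion concrete, the sketch below computes the standardized two-sample log-rank statistic for one candidate split; a tree would evaluate this for each candidate and keep the split with the largest value. The function name and the boolean `in_left` encoding of a split are assumptions made for illustration, and the code is not the randomForestSRC implementation.

```python
import math


def log_rank_statistic(times, events, in_left):
    """Absolute standardized log-rank statistic for a candidate split.

    times:   observed follow-up times
    events:  1 = event observed, 0 = censored
    in_left: True if the subject falls in the left child node
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for tk in event_times:
        # Subjects still at risk at tk, overall and in the left node.
        n = sum(1 for t in times if t >= tk)
        n_left = sum(1 for t, g in zip(times, in_left) if t >= tk and g)
        # Events occurring exactly at tk, overall and in the left node.
        d = sum(1 for t, e in zip(times, events) if t == tk and e == 1)
        d_left = sum(
            1 for t, e, g in zip(times, events, in_left)
            if t == tk and e == 1 and g
        )
        observed_minus_expected += d_left - n_left * d / n
        if n > 1:
            variance += (
                n_left * (n - n_left) * d * (n - d) / (n * n * (n - 1))
            )
    if variance == 0.0:
        return 0.0
    return abs(observed_minus_expected) / math.sqrt(variance)


# Hypothetical toy split: the two earliest events go to the left node.
toy_stat = log_rank_statistic(
    [1, 2, 3, 4], [1, 1, 1, 1], [True, True, False, False]
)
```

Larger values indicate child nodes whose survival experience differs more, which is the "separation" the tree tries to maximize.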

Evaluation of survival models

The Brier score, a measure of the performance of survival models, is the mean squared error of the predicted survival probability. Smaller values of the Brier score indicate more accurate prediction. The general form of the score is as follows:

$${\text{BS}}({\text{t}},\hat{S}) = {\text{E}}\left[ {\left( {Y_{i} \left( {\text{t}} \right) - \hat{S}\left( {{\text{t}}|X_{i} } \right)} \right)^{2} } \right]$$

where \(Y_{i} \left( {\text{t}} \right) = I\left( {T_{i} > {\text{t}}} \right)\) is the observed status of the i-th subject at time t (1 if the subject is still event-free at t and 0 otherwise), and \(\hat{S}\left( {{\text{t}}|X_{i} } \right)\) is the survival probability predicted by the model for this subject at time t [15].
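A minimal Python sketch of this score follows, taking the observed status to be 1 when a subject is still event-free at time t and 0 otherwise. It assumes that no subject is censored before t; with censored data the score is usually computed with inverse-probability-of-censoring weights, which are omitted here for brevity. The function name and the toy values are illustrative assumptions.

```python
def brier_score(times, predicted_survival, t):
    """Mean squared difference between observed status and predicted S(t | X_i).

    Observed status is 1 if the subject is still event-free at t, else 0.
    Simplifying assumption: no subject is censored before t.
    """
    total = 0.0
    for ti, s in zip(times, predicted_survival):
        y = 1.0 if ti > t else 0.0
        total += (y - s) ** 2
    return total / len(times)


# Hypothetical toy data: event times and predicted survival at t = 12.
toy_bs = brier_score([5, 10, 15, 20], [0.2, 0.4, 0.6, 0.8], t=12)
# A constant prediction of 0.5 always scores 0.25, a common reference level.
toy_reference = brier_score([5, 10, 15, 20], [0.5, 0.5, 0.5, 0.5], t=12)
```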

The integrated Brier score (IBS) and the C index, both based on OOB data, were used to compare the performance of the Cox model and the random survival forest. For computing the evaluation criteria, all variables were included in both models.
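Both criteria can be sketched in a few lines of Python. The integrated score below averages Brier scores over a time grid with the trapezoidal rule, and the concordance function follows Harrell's definition, counting a pair as comparable when the subject with the shorter time had an observed event. The function names and inputs are hypothetical, and the R packages used in the study may handle ties and censoring differently.

```python
def integrated_brier_score(grid, scores):
    """Trapezoidal average of Brier scores over an increasing time grid."""
    area = 0.0
    for k in range(1, len(grid)):
        area += 0.5 * (scores[k] + scores[k - 1]) * (grid[k] - grid[k - 1])
    return area / (grid[-1] - grid[0])


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. Ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: higher risk score should mean earlier event.
toy_c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.3, 0.1])
toy_error_rate = 1.0 - toy_c  # prediction error expressed as 1 - C
```

A C index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the error rate 1 − C runs in the opposite direction.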

Analyses were performed in R version 3.6.3 with the randomForestSRC, pec, and survival packages. The significance level was set at 5%.

Results

From March 2009 to March 2012, 220 patients, including 151 males (68.6%) and 69 females (31.4%), who underwent PCI with stent implantation were retrospectively evaluated. At the time of angioplasty, the mean age of patients was 60.11 ± 11.09 years; the mean age of the males (58.74 ± 11.07 years) was significantly lower than that of the females (62.75 ± 10.72 years) (P = 0.013). Table 1 presents descriptive information on the patients and the comparison of the median survival time based on the log-rank test for each of the variables.

Table 1 One to ten year survival rate for the patients who underwent angioplasty from March 2009 to March 2012 in Hamadan (west of Iran)

During a mean follow-up period of 96.65 months, 96 (43.7%) of the 220 patients experienced MACCE. Of these, 48 patients (21.8%) died, 16 patients (7.3%) underwent CABG, 5 patients (2.3%) had a non-fatal myocardial infarction and 27 patients (12.3%) required repeat revascularization. Most of the deaths (44 patients) were due to cardiac complications; cancer was reported as the cause of death in only 4 patients. The median survival time was 98 months. The 1- to 10-year survival rates are presented in Table 2. Figure 1 illustrates the survival function of the patients estimated with the Kaplan–Meier method: MACCE-free survival decreased from 99% in the first year of follow-up to only 39% at the end of follow-up. The mean survival time ± SE in smokers (81.25 ± 4.19 months) was shorter than in non-smokers (102.19 ± 2.52 months). Diabetic patients also had a shorter mean survival time (73.73 ± 2.85 months) than non-diabetic patients (101.74 ± 2.41 months). Patients with hypertension had a shorter mean survival time (90.74 ± 3.45 months) than patients without hypertension (100.38 ± 2.95 months). Patients with a stent length greater than 20 mm had a shorter mean survival time (89.79 ± 3.69 months) than patients with a stent length shorter than 20 mm (98.36 ± 2.64 months).

Table 2 Clinical characteristics of the patients who underwent angioplasty from March 2009 to March 2012 in Hamadan (west of Iran)
Fig. 1 Kaplan–Meier plot of MACCE-free survival

The results confirmed that the proportional hazards assumption of the Cox model generally held. The multivariable Cox model revealed that age, diabetes, smoking, and stent length had a significant effect on patient survival after angioplasty (HR = 1.03, HR = 2.17, HR = 2.41, HR = 1.74, respectively) (Table 3). Moreover, comparing the Cox model with the RSF model (log-rank splitting rule, 2000 trees) showed that the RSF model, with an IBS (time = 120) of 0.124, offered better predictive performance than the Cox model, with an IBS of 0.135. Also, the OOB error rate was 0.352 for the RSF and 0.374 for the Cox model. The C index for the RSF and the Cox model was 0.648 and 0.626, respectively. According to the RSF model, the most important variables affecting patient survival were, in order, diabetes, smoking, age, stent length, setting (presentation of coronary artery disease), and hyperlipidemia (Table 4 and Fig. 2).

Table 3 Cox regression analyses of factors associated with the occurrence of MACCE after angioplasty in a 10-year follow-up
Table 4 VIMP of random survival forest model
Fig. 2 Variable importance (VIMP) of random survival forest model

In the next step of the analysis, variables with significant unadjusted hazard ratios were included in the multivariable Cox regression, and variables with positive VIMP were included in the RSF. These two models were then compared. Under these conditions, the RSF (based on the six variables with positive VIMP) performed better than the Cox model (based on the four variables with significant unadjusted hazard ratios) (Table 5).

Table 5 Comparison of Cox regression and RSF models in different scenarios

Discussion

In this study, the long-term survival of cardiovascular patients after angioplasty was investigated in a 10-year follow-up. Comparing the predictive performance of the two models showed that the predictive performance of RSF was better than the Cox model.

The Cox model showed that older age, diabetes, smoking, and longer stent length were the most important variables affecting patient survival. Based on the RSF model, the most important factors affecting the survival of patients were, in order, diabetes, smoking, age, stent length, presentation of coronary artery disease, and hyperlipidemia.

To the best of our knowledge, no similar study has used RSF to investigate the factors influencing the occurrence of MACCE after angioplasty. Various studies have evaluated the short-term predictors of MACE (major adverse cardiac events) following PCI, but few have focused on long-term follow-up outcomes. Most of these studies have reported short-term follow-up results and compared the complications and survival of patients with DES and BMS.

We observed a MACCE incidence of 43.7%, whereas Aghajan et al. [16] reported 14.4% MACE in elderly patients (with a mean age of 70.8 ± 4.7 years) during a 10-year follow-up period. During a shorter follow-up period of 2 years, Zhou et al. [17] reported 7.4% MACE, and after 3 years Meliga et al. [18] reported 26.5% MACE in patients treated at seven European and American medical centers.

Both the RSF and the Cox regression models showed that diabetic patients had a higher risk of MACCE (HR = 2.17 in the Cox model). Similar results were reported by Aghajan and coworkers (HR = 1.33), Meliga and coworkers (HR = 2.85) and Ebrahimzadeh and coworkers (HR = 2.91) [16, 18,19,20].

As expected, the traditional risk factors (e.g. age, diabetes and smoking) increased the risk of MACCE. Each one-year increase in age increased the risk of MACCE by 3% (HR = 1.03). The hazard of MACCE in smokers was 2.41 times that in non-smokers. These results are consistent with the findings of Farshidi et al. [21] and Tsai et al. [22], which indicated a significant association of older age, smoking and diabetes at the time of PCI with mortality.

The findings of this study showed that individuals with three stents were 1.8 times more likely to experience MACCE than those with one stent. Also, the risk of MACCE increased with the number of involved vessels. Tsai et al. [22] found that triple-vessel disease and stent implantation predicted the development of MACE in Chinese PCI patients.

Also, our results showed no statistically significant effect of stent type on the survival of patients. The current study is observational and non-randomized; therefore, a comparison of the two stent types would be biased by lesion or patient characteristics, and any interpretation of the treatment effect is precluded (due to indication bias). However, in a randomized controlled trial conducted by Horst et al. in patients undergoing PCI, there were no significant differences in the composite outcome of death from any cause and nonfatal spontaneous myocardial infarction between the two types of stents after a median of 5 years of follow-up [23]. Flice et al. reported a statistically significant difference between the two types of stents in the occurrence of MACE during a 3-year follow-up (18% for DES versus 28% for BMS) in coronary patients with chronic obstructive pulmonary disease [24]. Cai et al. showed a statistically significant difference in MACE between BMS (15.9%) and DES (8.8%) during a 30-month follow-up [25]. Farshidi et al. demonstrated that the need for re-hospitalization was significantly higher in patients treated with BMS than in those treated with DES (P = 0.034). However, in the long-term follow-up, there was no significant difference in the mortality rate between the two types of stents [21]. Duggalb et al. reported a statistically significant difference in unadjusted mortality rates between BMS (5%) and DES (3.8%) [26]. Also, in the study by Dieguez et al. [27], the rate of all-cause mortality in patients treated with DES (6.5%) was significantly lower than that of patients treated with BMS (12.2%) (P = 0.049).

The study conducted by Melberg et al. showed that after a median of 10 years of follow-up, a quarter of the patients had died, and more than half of these patients died from non-cardiac causes. Also, the causes of death shifted away from MACE (MACCE) and became increasingly dominated by cancer, especially after 5 years [28]. However, in the present study, cancer was the cause of death in only four patients.

One limitation of the present study is that it may be difficult to confirm the cause of death for people who died out of hospital. Because fewer diagnostic tests may be performed in terminally ill or elderly patients, the causes listed on death certificates may be inconclusive. In addition, data of this type with composite endpoints could be analyzed from a competing-risks perspective.

Conclusion

The current study showed that the use of machine-learning prediction models such as RSF may improve long-term prediction in patients undergoing coronary stenting. Although the prediction performance of RSF based on the prediction error criteria was better than the Cox model, the most important variables identified in the two methods were similar. Our findings imply that the presentation of coronary artery disease (acute or chronic) and hyperlipidemia may also be considered as important prognostic variables in addition to diabetes, smoking, age, and stent length. The risk of complications may be modified by controlling these prognostic factors.