Background

Atrial fibrillation (AF) is one of the most common sustained heart rhythm disorders and is associated with high mortality and morbidity. There are nearly 335 million patients with AF worldwide [1], with a prevalence rate of 2.9% [2]. The incidence of AF is increasing rapidly with the aging of the population and changes in lifestyle. Treatment consists of antiarrhythmic drug therapy, catheter ablation, or both. Although the benefits of radiofrequency ablation (RFA) are generally believed to outweigh the risks in properly selected patients, RFA carries a risk of complications that increase morbidity and mortality, prolong in-hospital stay, and substantially raise healthcare costs. The risk of postoperative complications after RFA remains a cause of concern, and accurate forecasts of these complications could be useful to both physicians and patients. In patients undergoing AF ablation, a preoperative assessment of procedural risks and outcomes should be undertaken, as recommended by recent AF guidelines [3].

Although preoperative assessment of the risks of the RFA procedure has been widely studied, limitations to its clinical application remain. First, some studies have identified risk factors for complications, but their results were inconsistent [4,5,6,7,8,9,10]. Thus, the risk factors need further investigation. Second, one study [11] built a predictive model with a limited set of variables that lacked many important factors, including the type of AF, peri-procedural medication, echocardiography, and laboratory test results. Owing to the lack of these clinical factors, its reported AUC for any complication after RFA was 0.65 (95% CI = 0.63-0.67) in the derivation cohort and 0.64 (95% CI = 0.62-0.66) in the validation cohort, which was far from satisfactory.

To overcome the above-mentioned challenges, this study first collected a wide range of clinical factors and then defined in-hospital complications as the clinical outcomes. As a real-world observational study, it may also provide new evidence to help resolve the inconsistency of reported risk factors. The objective of this study was to use machine learning techniques to develop an effective risk model for predicting complications after radiofrequency ablation in patients with atrial fibrillation and to reveal important risk factors based on the developed model.

Methods

Study population

This retrospective cohort study took place at a large-scale hospital in East China (Shanghai Chest Hospital, Shanghai, China). Patients who underwent RFA after being diagnosed with AF from April 2018 to October 2021 were eligible for inclusion. Patients younger than 18 years were excluded, as were patients who underwent RFA simultaneously with cardiac valve surgery, left atrial appendage surgery, or pacemaker implantation. Patients who had more than one hospital visit for an RFA procedure during the study period were treated as multiple samples in this cohort; that is, each RFA procedure was treated as independent of the other RFA procedures in the same patient. A total of 3365 procedures were analyzed in the present study. This study was approved by the Ethics Committee of Shanghai Chest Hospital (Shanghai, China) with approval number KS (P) 22005. Since the data were collected retrospectively, consent was not required.

Data collection and definitions

Patients’ demographic characteristics, medical history, signs or symptoms at presentation, electrocardiographic features, laboratory values, and in-hospital clinical outcomes were collected from hospital information systems, laboratory information systems, and electronic health records. For variables with multiple measurements, such as heart rate and blood pressure, and for other baseline variables, such as white cell count and creatinine clearance rate, the last measurements before RFA were collected.

The primary outcome of interest was the occurrence of any complication after RFA, including pulmonary vein stenosis, phrenic nerve injury, periesophageal vagus nerve injury, arteriovenous fistulas/pseudoaneurysm, cardiogenic shock/arrest, cardiac effusion/tamponade, thromboembolic events (ischemic stroke, transient ischemic attack (TIA), peripheral embolism, or pulmonary embolism), pneumothorax, hemorrhage events, or myocardial infarction. Hemorrhage events included minor hemorrhages, such as access site hemorrhage, and major hemorrhages. Major hemorrhage events were defined as any bleeding events requiring blood transfusion. In addition, the occurrences of the two most common types of complications, cardiac effusion/tamponade and hemorrhage, were defined as secondary outcomes.

Data pre-processing

The data format was unified, duplicate or unmatched items were dropped, and outliers were replaced with null values. Q-Q plots, histograms, and Shapiro–Wilk tests were used to assess the distributions of continuous variables. Outliers were defined as values not lying within 1.5 times the interquartile range from the median. Variables with more than 30% missing values were removed from the analysis. The remaining variables, with 30% or fewer missing values, were imputed by the multivariate imputation by chained equations (MICE) method [12]. Binary variables such as gender, drug use, and medical history were encoded as 0 and 1 (0 = female/no, 1 = male/yes). The model output corresponded to postoperative complication and was represented as a binary class (0 = without complication, 1 = with complication).
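The outlier rule and the missing-value filter described above can be sketched in a few lines of dependency-free Python. This is an illustrative sketch, not the study's code: the function names `mask_outliers` and `drop_sparse_columns` are hypothetical, the quartile convention is the default of `statistics.quantiles` (an assumption, since the text does not specify one), and the MICE imputation step, which the study ran in R, is not reproduced.

```python
from statistics import median, quantiles


def mask_outliers(values, k=1.5):
    """Replace values farther than k * IQR from the median with None."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return list(values)
    # Quartile convention (default 'exclusive' method) is an assumption.
    q1, _, q3 = quantiles(observed, n=4)
    center, spread = median(observed), q3 - q1
    return [
        None if v is not None and abs(v - center) > k * spread else v
        for v in values
    ]


def drop_sparse_columns(table, max_missing=0.30):
    """Keep only columns whose share of missing (None) values is <= max_missing."""
    kept = {}
    for name, column in table.items():
        missing_rate = sum(v is None for v in column) / len(column)
        if missing_rate <= max_missing:
            kept[name] = column
    return kept
```

Masking runs before the missing-rate filter so that values nulled as outliers count toward a variable's missing rate, which matches the order in which the steps are listed above.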

Model construction

We evaluated the prediction performance of a logistic regression model as well as 4 machine learning models, namely decision tree (DT), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost), which have previously been shown to be applicable to medical data and large data sets. A total of 59 different features (Table 1) were used as inputs to the prediction models. Multivariable logistic models were fitted using backward stepwise regression, with the Akaike Information Criterion (AIC) as the criterion for selecting predictors. Moreover, known and potential risk factors such as age or gender were considered in the logistic model. For the machine learning models, we applied grid search with five-fold cross-validation to identify the optimal hyperparameters, that is, those yielding the highest AUC.
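The grid search over hyperparameters can be illustrated with a minimal, library-free sketch. The study itself used Scikit-learn; the function `grid_search` and the callback `cv_auc` below are hypothetical names introduced only for illustration, and `cv_auc` stands in for whatever routine trains a model and returns its mean five-fold cross-validated AUC.

```python
from itertools import product


def grid_search(param_grid, cv_auc):
    """Return (best_params, best_auc) over the Cartesian product of param_grid.

    cv_auc is a caller-supplied (hypothetical) function that trains a model
    with the given hyperparameters and returns its mean five-fold
    cross-validated AUC.
    """
    names = sorted(param_grid)
    best_params, best_auc = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_auc(params)
        if score > best_auc:  # ties keep the first combination encountered
            best_params, best_auc = params, score
    return best_params, best_auc
```

Every combination in the grid is evaluated once, so the cost grows with the product of the number of candidate values per hyperparameter.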

Table 1 Baseline characteristics of patients with or without complications

Model evaluation

Model performance and generalization error were assessed using 20 times repeated fivefold cross-validation. In each repetition, the data set is divided into 5 equal parts; each part in turn is used as the test set while the remaining 4 parts are used to train the model, and performance is evaluated on the test set until every part has served as the test set once. This procedure is repeated 20 times, with a different random seed for each repetition so that the partitions vary between repetitions. The final evaluation of the model is based on the average performance across all repetitions. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC). In addition, we calculated accuracy, sensitivity (recall), specificity, and F score at a cut-off point estimated by maximizing the Youden index in the training set. Model calibration was assessed by the Brier score; the smaller the Brier score, the better the calibration. For each metric, 95% confidence intervals were calculated from the 20 times repeated fivefold cross-validation. Shapley additive explanations (SHAP) were used to evaluate the importance of variables [13].
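As a minimal sketch of the evaluation scheme above, the following dependency-free Python shows how repeated fivefold splits, the AUC, a Youden-index cut-off, and the Brier score can be computed. All function names are hypothetical, and the sketch is not the study's Scikit-learn code. The splits here are not stratified by outcome, which is an assumption: the text does not say whether stratification was used, and with a rare outcome an unstratified fold may contain no events.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        random.Random(seed + repeat).shuffle(order)  # new seed per repetition
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, folds[k]


def roc_auc(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, y_score):
    """Threshold t maximizing sensitivity + specificity - 1 (positive if score >= t)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

With the defaults, `repeated_kfold` yields 5 × 20 = 100 train/test pairs. In the workflow described above, the cut-off would be estimated on each training split and then applied to the matching test split.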

Feature ranking and selection

We used all candidate features to build the initial model. For ease of interpretation and application, machine learning models with top 5, top 10, top 15, and top 20 features were constructed according to the ranked importance of the features. For each machine learning algorithm, the feature subset generating the highest AUC was selected as the optimal feature subset.
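The subset selection described above amounts to ranking features by importance, refitting on the top 5, 10, 15, and 20, and keeping the subset with the highest AUC. The sketch below assumes a hypothetical callback `auc_for_features` that refits a model on the given features and returns its cross-validated AUC; neither that name nor `best_top_k_subset` comes from the study.

```python
def best_top_k_subset(importance, auc_for_features, sizes=(5, 10, 15, 20)):
    """Pick the top-k feature subset (k in sizes) with the highest AUC.

    importance: dict mapping feature name -> importance score.
    auc_for_features: caller-supplied (hypothetical) function that refits the
    model on the given feature list and returns its cross-validated AUC.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    results = {k: auc_for_features(ranked[:k]) for k in sizes if k <= len(ranked)}
    best_k = max(results, key=results.get)  # ties keep the smallest k
    return ranked[:best_k], results
```

Returning the per-size AUCs alongside the chosen subset makes it straightforward to plot AUC against the number of features.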

Statistical analysis

Data were presented as the mean ± standard deviation (SD) for normally distributed data, or as the median and interquartile range (IQR) for non-normally distributed data. Normally distributed variables were compared using Student’s t-test and non-normally distributed variables were compared using the Mann–Whitney U test. Categorical data were expressed as numbers and percentages (%). Pearson’s χ2 test or Fisher’s exact test was used for categorical data, as appropriate. All P values were two-tailed, and a P-value of < 0.05 was considered to represent statistical significance. Statistical analysis was performed in R version 4.1.2 and Python 3.9.13. Model development, evaluation, and calibration were performed using the Scikit-learn (1.0.2) and xgboost (1.7.4) packages in Python. SHAP values were computed and visualized with the shap package (0.41.0). The imputation was performed in R using the package “mice” (3.15.0). The sample data and code are publicly available on the project GitHub website at https://github.com/awei1234/Machine-Learning-Based-Risk-Models-for-Procedural-Complications-of-RFA-for-AF-patients.

Results

Study sample and procedural complications

A total of 3365 consecutive RFA procedures on 3187 AF patients between April 2018 and October 2021 were collected. Supplementary Figure S1 is a flow chart describing the procedure for subject selection. The variables used for model construction and their missing rates are shown in supplementary Table S1. The baseline characteristics of the patients and the comparisons between the two groups with or without complications are shown in Table 1. Patients in the complication group were older than those without complications (71 years, IQR 64.5–77 years vs 66 years, IQR 59–72 years). The proportion of male patients in the complication and non-complication groups was 59.7% and 62.7%, respectively. The baseline characteristics of the patients and the comparisons between the two groups with or without cardiac effusion or hemorrhage are shown in supplementary tables S2 and S3. Table 2 displays the specific procedural complications and total complications. There were a total of 62 procedural complications, a rate of 1.84% in the entire cohort. No procedure-related death was observed. Cardiac effusion/tamponade was the most common complication, accounting for 0.84% of all procedures, followed by access site hemorrhage or hematoma (0.62%), hemorrhage requiring blood transfusion (0.27%), thromboembolic events (0.12%), cardiogenic shock/arrest (0.06%), arteriovenous fistulas/pseudoaneurysm (0.06%), pneumothorax (0.03%), pulmonary vein stenosis (0.03%), and phrenic nerve injury (0.03%).

Table 2 Complications following radiofrequency ablation in the study population

Feature selection and ranking

When features were added according to their importance, the AUC of the DT models consistently decreased (from 0.627 to 0.580 for any complication, from 0.606 to 0.513 for cardiac effusion/tamponade, and from 0.649 to 0.620 for hemorrhage). For postoperative cardiac effusion/tamponade, the GBM model was an exception, showing an increasing trend in AUC as the number of features increased. For this outcome, the other machine learning models performed better with the top 5 ranked features than with more features. For any complication or hemorrhage, the RF, GBM, and XGBoost models demonstrated good performance, especially when the top 10, 15, or 20 features were used. The corresponding AUCs are shown in Fig. 1. For any complication, the optimal numbers of features were 5, 20, 15, and 15 for DT, RF, GBM, and XGBoost, respectively. For cardiac effusion/tamponade, the optimal numbers of features were 5, 5, 15, and 5. For hemorrhage, the optimal numbers of features were 5, 15, 15, and 10. The ranges of hyper-parameters are shown in supplementary table S4. The evaluation metrics with 95% confidence intervals for each model with different features are shown in supplementary table S5.

Fig. 1

AUC of the model with different numbers of the selected features. A: any complication; B: cardiac effusion/tamponade; C: hemorrhage

Model performance and comparison

Of the considered machine learning models, the best-performing models were RF for any complication and hemorrhage, and XGBoost for cardiac effusion/tamponade. The AUCs for these models were as follows: 0.721 (95% CI = 0.713–0.729) for any complication, 0.696 (95% CI = 0.688–0.703) for cardiac effusion/tamponade, and 0.839 (95% CI = 0.832–0.845) for hemorrhage.

The receiver operating characteristic (ROC) curves and the performance metrics, including AUC, accuracy, sensitivity, specificity, F score, and Brier score, are presented in Fig. 2 and Table 3.

Fig. 2

Receiver operating characteristic curves for the DT, RF, GBM, and XGBoost models in predicting any complication, cardiac effusion, and hemorrhage

Table 3 The evaluation metrics with 95% confidence intervals for each model using 20-round fivefold cross-validation

To stratify patients into different risk groups, predicted probabilities from the RF model of 0–0.029 and > 0.029 were defined as low and high risk, respectively. To validate this stratification, the incidence rate in each risk group and the inter-group difference in the test set were compared for the RF model (Fig. 3).

Fig. 3

Postoperative complication incidence rate and the number of patients in different risk groups in the test set. Note: The number in brackets, e.g. ‘887’ in ‘low risk (887)’, represents the number of patients classified into the low-risk group. The grey dashed line represents the actual postoperative complication incidence rate in the test set
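The stratification behind Fig. 3 can be sketched as follows. Only the 0.029 cut-off is taken from the text; the function name `stratify` is hypothetical and the sketch is not the study's code.

```python
def stratify(probabilities, outcomes, cutoff=0.029):
    """Split procedures into low risk (p <= cutoff) and high risk (p > cutoff).

    Returns {group: (n_procedures, observed_incidence)}.
    """
    groups = {"low": [], "high": []}
    for p, y in zip(probabilities, outcomes):
        groups["high" if p > cutoff else "low"].append(y)
    return {
        name: (len(ys), sum(ys) / len(ys) if ys else float("nan"))
        for name, ys in groups.items()
    }
```

Comparing the observed incidence in the two groups against the overall incidence in the test set is what the grey dashed line in the figure supports visually.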

Important features associated with postoperative complications

The results of the logistic regression models for postoperative complications are shown in Table 4. Higher values of CREA and AST were associated with an increased probability of procedural complications. Higher values of AST were associated with an increased probability of procedural cardiac effusion. Persistent AF and higher values of CREA, DD, and TnI were associated with an increased probability of hemorrhage.

Table 4 Stepwise multivariable logistic regression models of different outcomes

Based on the RF, GBM, and XGBoost models, the important features for the different outcomes overlap to a high degree (Fig S2). According to the best-performing model for each outcome, the most important risk factors for any complication, cardiac effusion/tamponade, or hemorrhage are Ccr, ALB, CHA2DS2-VASc, DD, AST, NT-pro-BNP, LDH, TSH, CREA, age, UA, DBP, and LAD (Fig. 4). The SHAP summary plots (Fig S3) show the distribution of SHAP value contributions for the top-ranked features in the models predicting the different outcomes.

Fig. 4

Top-ranked features in predicting different complications. A: Top-ranked 10 features derived from the RF model in predicting any complication; B: Top-ranked 5 features derived from the XGBoost model in predicting cardiac effusion/tamponade; C: Top-ranked 10 features derived from the RF model in predicting hemorrhage; D: 13 important features in predicting any complication, cardiac effusion/tamponade or hemorrhage

Figure 5 shows the SHAP dependence plots of the 10 most important features for any complication. A higher CHA2DS2-VASc score, DD, AST, NT-pro-BNP, LDH, and age, and a lower Ccr and CREA, were related to an increased risk of any complication. An obvious U-shaped relationship exists between ALB or TSH and the risk of postoperative complication, as both too low and too high levels of ALB or TSH were associated with an increased risk.

Fig. 5

SHAP dependence plot of the RF model in predicting any complication. It shows how each of the top 10 important features affects the output of the RF model. A SHAP value above zero for a given feature value indicates an increased risk of postoperative complication

From outside to inside, the importance of the features decreases successively.
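The SHAP values reported above are Shapley values: each feature's contribution is its average marginal effect on the model output over all orderings of the features. The sketch below computes exact Shapley values by brute force for a tiny model and a single baseline sample. It is an illustration of the underlying definition only, not the algorithm used by the shap package, and the function name `shapley_values` and the single-baseline convention are assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline sample.

    predict: function mapping a list of feature values to a model output.
    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n_features, so this is only practical for tiny examples.
    """
    n = len(x)

    def value(coalition):
        return predict([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

The values sum to the difference between the prediction for `x` and the prediction for the baseline, which is why a positive value can be read as pushing the predicted risk upward.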

Discussion

The present study included 3187 patients undergoing RFA (3365 procedures) at a large center. Real-world clinical information from these procedures was used to develop a risk model for procedure-related complications.

In this study, the most common complication was cardiac effusion or tamponade (0.83%), similar to previously reported rates of 0.5% to 1.3% [8, 14,15,16,17,18,19]. For vascular complications, previous studies reported incidences from 1.1% to 2.3% [16,17,18, 20, 21]. In our study, the incidence of access site hemorrhage was 0.62%; hemorrhage requiring transfusion, 0.27%; thromboembolic events, 0.12%; arteriovenous fistulas/pseudoaneurysm, 0.06%; and pulmonary vein stenosis, 0.03%. The overall rate of procedural complications in this study was 1.84%, which is lower than previously reported rates, which range from 3.3% to 6.84% [7, 8, 15,16,17,18,19,20,21,22,23,24,25,26,27] and reach as high as 9.1% [28] in a survey of U.S. Medicare patients. Several factors may have contributed to the low incidence of postoperative complications in this study. First, we excluded patients undergoing other concomitant surgeries such as left atrial appendage closure, leading to a lower incidence of postoperative complications. Second, this study was conducted at a high-volume center, with more than 1000 RFA procedures performed annually. Complication risk is lower when the procedure is performed in high-volume hospitals, as reported previously [14, 21, 24]. Finally, the outcomes in this study were based only on in-hospital data.

Using 20 variables identified by machine learning techniques, we developed a predictive model for postoperative complications with good predictive power in AF patients undergoing RFA. According to definitions in the literature [29, 30], an AUC between 0.7 and 0.8 is acceptable. The model shows better performance (AUC = 0.721) than the model reported previously [11] (AUC = 0.64) and has the potential to be used in clinical practice, particularly for the outcome of hemorrhage, where the AUC reaches 0.839. To evaluate the clinical applicability of the model, patients were stratified into high-risk and low-risk groups according to the probability predicted by the best-performing machine learning model. The difference in the incidence of postoperative complications between the two groups was statistically significant.

This study not only developed a more accurate risk model and identified previously unrecognized important risk factors but also made the model “explainable”. Our study benefits from the use of SHAP values to open the “black box” of machine learning models; thus, our model can inform patient management even when applied to individual patients. We employed a radar plot as well as SHAP dependence plots for visualization at the feature and the individual level. Among the 10 most important features, most had an obvious cut-point at which the predicted risk abruptly changed. For example, Ccr < 50 ml/(min × 1.73 m²), ALB > 50 g/L or < 35 g/L, CHA2DS2-VASc score ≥ 4, DD > 5 mg/L, AST > 100 U/L, NT-pro-BNP > 2000 ng/L, CREA < 50 μmol/L, or age older than 80 years resulted in a significant increase in postoperative complication risk.

Ccr is accepted as the best overall measurement for assessing renal function [31], and a Ccr < 60 ml/(min × 1.73 m²) is considered to indicate compromised renal function. The SHAP dependence plot shows that a reduction in Ccr increases the risk of postoperative complications, which is consistent with previous research findings [7, 14]. In our study, ALB was another key predictor of postoperative complications. An obvious U-shaped relationship exists between ALB and the risk of postoperative complications, as levels both lower than 35 g/L and higher than 50 g/L were associated with an increased risk. Serum ALB is usually used to reflect nutritional status and the ability of the liver to synthesize protein. A decrease in ALB level is indicative of liver damage or malnutrition. Our study also disclosed several novel findings. Preoperative elevated D-dimer was an essential predictor of postoperative complications. Elevated D-dimer indicates a hypercoagulable state and secondary fibrinolysis, which may result in thrombotic disease [32, 33]. Thromboembolic events were nevertheless infrequent in this study, which could be due to the relatively short length of postoperative hospital stay: patients with postoperative complications may have been in a hypercoagulable state early after the ablation procedure without yet showing thromboembolic symptoms. Furthermore, preoperative elevated AST and NT-pro-BNP were essential predictors of postoperative complications in our study. Patients with more comorbidities are more likely to exhibit dysregulated hepatic or myocardial function and significantly higher AST or NT-pro-BNP levels.

The independent risk factors for procedural complications reported previously were female sex [11, 15, 17, 18, 24, 25], older age [11, 16, 20, 24, 25], longer procedural duration [18, 34], the complexity of the procedure [20], CHA2DS2-VASc score [8, 9], smaller left atrium dimension [34], and comorbidities such as congestive heart failure [11, 16], renal insufficiency [7, 14], coagulopathy [11], peripheral vascular disease [9, 11], chronic obstructive pulmonary disease [11], hypertension [14], mild liver disease [14], diabetes with chronic complications [14], and coronary artery disease [26]. Risk factors such as CHA2DS2-VASc score, CREA, Ccr, and older age, which are in accordance with previous studies, play an essential role in our model. The inconsistencies between our findings and previous studies may be explained primarily by the following reasons. Firstly, the differences between studies could result from differences in inclusion criteria or the number of subjects enrolled. Secondly, previous studies mostly included a limited number of variables and few laboratory indicators. Compared with comorbidities or prior diseases, laboratory indicators are more objective and sensitive for short-term outcome prediction.

To reduce the risk of postoperative complications in AF patients requiring RFA, the following measures are recommended. Firstly, a comprehensive preoperative assessment and optimal control of correctable risk factors, such as coagulation capability or renal function, should be implemented in advance to achieve better outcomes. Secondly, the patient’s vital signs and cardiac function should be closely monitored throughout the procedure. Finally, for patients at high risk after RFA, appropriate postoperative care or surveillance is necessary to detect complications early. Additionally, regular follow-up visits for discharged patients are recommended to assess the patient’s recovery and to provide cardiac rehabilitation and health education.

This study provides additional evidence that can contribute to further research in this field. In this retrospective study, we developed and evaluated different machine learning algorithms using a wide range of features to predict postoperative complications of RFA. Because any complication is a composite outcome, we also built sub-models for the two most common complications to investigate whether the predictors differed between them. Moreover, for any complication, cardiac effusion, or hemorrhage, over half of the top 10 features were laboratory features. This study suggests that laboratory features, which directly reflect a patient's physical condition and have been ignored by previous studies, may be more sensitive and more relevant to postoperative complication prediction. One advantage of the model is that it uses variables that are easily accessible within electronic medical records (EMR). As a result, the model can be integrated into a decision support system under the EMR framework. In practice, this decision support system would access the clinical information of a new patient and calculate the patient's risk of experiencing a postoperative complication.

The present study also has several limitations. Firstly, generalizability is a potential limitation because all patients were included from a single center. Although 3365 procedures were included, with data collected for patients who presented between 2018 and 2021, data from a single center cannot represent the whole population of Chinese RFA patients, and a multi-center study is needed to validate the results. Secondly, this was an in-hospital outcome prediction study based on retrospective use of electronic medical record data, so complications that are known to occur late, such as atrio-esophageal fistula, might not have been captured, and the complication rate might be underestimated. However, the majority of complications occur within a short period after the RFA procedure, so it is unlikely that a significant number of complications were missed. Finally, although we included more variables than previous studies, potential factors such as ablation duration and other intraoperative variables were not available in our database.

Conclusions

We report an overall complication rate of 1.84% in a large data set of AF radiofrequency ablation procedures. This study indicates that machine learning models based on the RF and XGBoost algorithms showed good performance in predicting different complications after RFA. The model developed in this study may assist clinicians in assessing the risk of complications for patients with AF.