Background

Machine learning (ML) is a discipline of artificial intelligence and data science that uses algorithms and models to learn from previous data and predict future events [1]. ML has advanced substantially in recent years and has been incorporated into medicine for several purposes, including predicting disease diagnosis and prognosis [2, 3]. ML methods have been used to predict surgical outcomes, and the performance of several ML methods was comparable to that of available risk stratification methods [4,5,6]. EuroSCORE is one of the most commonly used risk-scoring systems in cardiac surgery and has proven accuracy [7]. Tricuspid valve surgery carries a higher risk than other cardiac procedures, which could be attributed to the disease, the surgery, or patient characteristics [8]. The reported risk of tricuspid valve surgery varies widely in the literature because of the low volume of tricuspid valve surgery and the variability of the associated procedures [9]. EuroSCORE predicts mortality after cardiac surgery in general, and neither the Society of Thoracic Surgeons (STS) cardiac surgery risk model nor EuroSCORE II was designed specifically to predict outcomes after tricuspid valve surgery. ML could outperform traditional risk-scoring methods because ML models are tuned for out-of-sample predictive performance. Therefore, we aimed to apply several ML methods, including shrinkage methods and decision trees, to predict operative mortality after tricuspid valve surgery and to compare their predictive ability with that of EuroSCORE II.

Methods

Design

This retrospective analysis included 1161 consecutive patients who underwent tricuspid valve surgery at a single tertiary referral center from 2009 to 2021. We included all tricuspid valve procedures: repair or replacement, isolated or concomitant with other cardiac procedures, and for functional or organic tricuspid valve disease. The local ethics committee approved data collection for tricuspid valve surgery projects (approval number R21013).

Study data and definitions

The data included in this study were dichotomized to simplify the interpretation of the models. Age was dichotomized at 70 years, and the cutoff for dichotomizing body mass index (BMI) was 30 kg/m2. Abnormal bilirubin was defined as a level ≥ 21 µmol/L. Low ejection fraction (EF) was defined as an EF of 40% or lower [10], and systolic pulmonary artery pressure was dichotomized at 50 mmHg [8]. Other variables included in the model were sex, hypertension, diabetes mellitus, chronic obstructive pulmonary disease (COPD), endocarditis, previous stroke, atrial fibrillation, chronic kidney disease (CKD) grade IV, New York Heart Association (NYHA) class III–IV, history of heart failure within 1 year, moderate/severe right ventricular (RV) dysfunction, moderate/severe RV dilatation, emergency surgery, and reoperative surgery. Operative data included surgical urgency, tricuspid valve (TV) repair or replacement, beating-heart tricuspid valve surgery, and isolated tricuspid valve surgery.
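The dichotomization rules above can be written as a small coding function. This is a minimal sketch. The `Patient` record and its field names are hypothetical. The direction of each cutoff is taken from the text where it is stated (age ≥ 70, bilirubin ≥ 21, EF ≤ 40%, systolic pulmonary pressure ≥ 50). For BMI the text gives only the cutoff, so treating exactly 30 kg/m2 as obese is an assumption.

```python
from dataclasses import dataclass

# Cutoffs reported in the text. BMI >= 30 counting as obese is an ASSUMPTION.
AGE_CUTOFF_YEARS = 70
BMI_CUTOFF = 30.0          # kg/m2
BILIRUBIN_CUTOFF = 21.0    # µmol/L
LOW_EF_CUTOFF = 40.0       # percent
SPAP_CUTOFF = 50.0         # mmHg


@dataclass
class Patient:
    """Hypothetical record layout used only for this illustration."""
    age: float
    bmi: float
    bilirubin_umol_l: float
    ef_percent: float
    spap_mmhg: float


def dichotomize(p: Patient) -> dict:
    """Return 0/1 indicators for the continuous covariates."""
    return {
        "age_ge_70": int(p.age >= AGE_CUTOFF_YEARS),
        "obese": int(p.bmi >= BMI_CUTOFF),
        "abnormal_bilirubin": int(p.bilirubin_umol_l >= BILIRUBIN_CUTOFF),
        "low_ef": int(p.ef_percent <= LOW_EF_CUTOFF),
        "spap_ge_50": int(p.spap_mmhg >= SPAP_CUTOFF),
    }


print(dichotomize(Patient(age=72, bmi=27.5, bilirubin_umol_l=25, ef_percent=40, spap_mmhg=48)))
```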

CKD grade IV was defined as a creatinine clearance < 30 mL/min calculated with the Cockcroft-Gault equation [11]. RV dilatation was diagnosed if the RV basal diameter was > 42 mm or the mid-level diameter was > 35 mm. The severity of RV dysfunction was graded using tricuspid annular plane systolic excursion (TAPSE) measured in M-mode. Mild dysfunction was defined as TAPSE > 20 mm, moderate dysfunction as TAPSE 15–20 mm, and severe dysfunction as TAPSE < 15 mm [12, 13].
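The renal and RV definitions translate directly into small helper functions. This is a minimal sketch with the following assumptions:

- Serum creatinine is in mg/dL. The SI form of Cockcroft-Gault uses different constants.
- The function names are hypothetical.
- The TAPSE bands follow the text exactly.

```python
def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, female: bool) -> float:
    """Creatinine clearance (mL/min) by Cockcroft-Gault, creatinine in mg/dL."""
    clearance = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return clearance * 0.85 if female else clearance


def ckd_grade_iv(crcl_ml_min: float) -> bool:
    """CKD grade IV as defined in the text: clearance below 30 mL/min."""
    return crcl_ml_min < 30


def rv_dilated(basal_mm: float, mid_mm: float) -> bool:
    """RV dilatation: basal diameter > 42 mm or mid-level diameter > 35 mm."""
    return basal_mm > 42 or mid_mm > 35


def rv_dysfunction_grade(tapse_mm: float) -> str:
    """TAPSE bands as stated in the text: >20 mild, 15-20 moderate, <15 severe."""
    if tapse_mm > 20:
        return "mild"
    if tapse_mm >= 15:
        return "moderate"
    return "severe"


crcl = cockcroft_gault(age_years=80, weight_kg=60, serum_creatinine_mg_dl=2.0, female=False)
print(crcl, ckd_grade_iv(crcl), rv_dilated(40, 36), rv_dysfunction_grade(14))
```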

The study outcome was operative mortality (n= 112), defined as mortality occurring within 30 days of surgery or within the same hospital admission.
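The outcome definition combines two windows: death within 30 days of surgery, or death before discharge from the index admission. This is a minimal sketch with the following assumptions:

- The date fields are hypothetical.
- "Within 30 days" includes day 30.
- A missing discharge date with a recorded death means the patient died in hospital.

```python
from datetime import date
from typing import Optional


def operative_death(surgery: date, discharge: Optional[date], death: Optional[date]) -> bool:
    """True if death occurred within 30 days of surgery or during the index admission."""
    if death is None:
        return False
    within_30_days = (death - surgery).days <= 30
    # No discharge date but a recorded death: assumed to have died in hospital.
    in_hospital = discharge is None or death <= discharge
    return within_30_days or in_hospital


print(operative_death(date(2020, 1, 1), date(2020, 1, 10), date(2020, 1, 25)))
```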

Secondary tricuspid regurgitation was the indication for tricuspid valve repair, and the indications for tricuspid valve replacement were failed previous repair, infective endocarditis, and degenerative tricuspid valve disease. Tricuspid valve repair was performed with annuloplasty prostheses (n = 927) or De Vega repair (n = 61), and one patient underwent bicuspidization. The annuloplasty prostheses used for repair were the SMB50 band (Sovering MiniBand, SMB50, Sorin) (n = 547), Duran band (Medtronic, Inc.) (n = 210), MC3 (Edwards Lifesciences) (n = 106), Tri-Ad (Medtronic ATS Medical Inc.) (n = 49), Cosgrove-Edwards band (Edwards Lifesciences) (n = 10), Contour 3-D (Medtronic Inc.) (n = 4), and Simplici-T (Medtronic Inc.) (n = 1). Mechanical valves were used in 20 patients and biological valves in 139 patients. Concomitant procedures included coronary artery bypass grafting in 320 patients, aortic valve replacement in 247, and mitral valve surgery in 1040.

Data analysis

Descriptive analysis

For descriptive analysis, patients were divided by operative mortality into two groups: those who survived (n = 1049) and those who died (n = 112). Data are presented as numbers and percentages and were compared between survivors and nonsurvivors with the chi-squared test. Some covariates had a small percentage of missing data. Missingness was due to documentation issues unrelated to patient characteristics. The data were therefore considered missing completely at random and were not expected to bias the analysis. Data analysis was performed using STATA 17 (StataCorp, College Station, TX, USA).
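For a binary covariate, the comparison between survivors and nonsurvivors reduces to a 2 × 2 table. The sketch below computes Pearson's chi-squared statistic and its 1-degree-of-freedom P value using only the standard library. It assumes no continuity correction, because the text does not say whether one was applied. The counts in the example are hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Rows: covariate present/absent; columns: died/survived (hypothetical layout).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


print(pearson_chi2_2x2(10, 20, 30, 40))
```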

Machine learning analysis

The study data were randomly split into a training set (75%) and a testing set (25%). Models were developed on the training set, and their performance was evaluated on the testing set.
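A 75/25 random split can be sketched with the standard library. The seed is arbitrary and hypothetical. The text does not say whether sampling was stratified by outcome, so this sketch uses simple random sampling.

```python
import random


def train_test_split_ids(ids, train_fraction=0.75, seed=2021):
    """Randomly partition patient IDs into training and testing sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = train_test_split_ids(range(1161))
print(len(train_ids), len(test_ids))  # 871 290
```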

Random forest

All categorical variables listed in Table 1 were used to predict operative mortality. Random forest with cross-validation was used to optimize accuracy and identify the optimal number of trees, tree depth, and number of features considered at each split. After feature importance was determined, recursive feature elimination was used to reach the minimum number of variables that achieved the same accuracy. The correlation between variables was visualized using a heat plot. The model was retested after eliminating correlated variables, and the prediction accuracy was reported [14]. The analysis was performed using the Stata command c_ml_stata_cv, which integrates the scikit-learn, numpy, and pandas Python packages [15].
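The tuning and feature-elimination steps above can be approximated directly in scikit-learn, which the Stata command wraps. This is a sketch on synthetic data, not the authors' code. The parameter grid, fold count, and seeds are hypothetical. `GridSearchCV` and `RFECV` are scikit-learn's own classes, not the interface of c_ml_stata_cv.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for dichotomized covariates (NOT the study data),
# with about 10% positives to mimic a rare outcome such as operative mortality.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X = (X > 0).astype(int)  # binarize every covariate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical grid over number of trees, tree depth, and features per split.
param_grid = {"n_estimators": [50, 150], "max_depth": [2, 8], "max_features": [3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="accuracy", cv=cv)
search.fit(X_train, y_train)
best = search.best_estimator_
print("tuned:", search.best_params_)
print("test accuracy:", round(best.score(X_test, y_test), 3))

# Recursive feature elimination: refit, drop the least important feature, repeat,
# and keep the subset with the best cross-validated accuracy. The floor keeps at
# least max_features columns so the tuned forest remains valid.
rfe = RFECV(RandomForestClassifier(**search.best_params_, random_state=0),
            step=1, min_features_to_select=search.best_params_["max_features"],
            cv=cv, scoring="accuracy")
rfe.fit(X_train, y_train)
print("features kept:", rfe.n_features_, "mask:", rfe.support_)
```

Accuracy is used as the tuning criterion here only because the text reports accuracy. With a rare outcome, an AUC-based criterion (`scoring="roc_auc"`) is a common alternative.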

Table 1 Comparison of the baseline characteristics between surviving patients and those with operative mortality

Logistic regression

Multivariable logistic regression was fitted on the training set, and the final model was evaluated in the testing set using the cross-validated area under the curve (cvAUC) and k-fold receiver operating characteristic (ROC) curves. Variables were selected by stepwise forward selection, with P < 0.05 required for a variable to be retained in the final model.
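A cvAUC is commonly computed as the mean of the AUCs obtained on each held-out fold. Each fold's AUC equals the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor (the Mann-Whitney formulation). The sketch below assumes that definition. The fold data are illustrative.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: P(score_event > score_nonevent), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cv_auc(folds):
    """Cross-validated AUC as the mean of per-fold AUCs.

    `folds` is a list of (labels, predicted_risks) pairs, one per held-out fold.
    """
    fold_aucs = [auc(labels, scores) for labels, scores in folds]
    return sum(fold_aucs) / len(fold_aucs)


folds = [([0, 1], [0.2, 0.9]), ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])]
print(cv_auc(folds))  # 0.875
```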

Shrinkage methods

Least absolute shrinkage and selection operator (LASSO) and elastic net with cross-validation were used to identify the most important variables for mortality prediction. Lambda was used as the tuning parameter for LASSO, and the lambda associated with the lowest mean squared error in the testing dataset was selected (Appendix). Cross-validation was used to identify the optimal alpha and lambda for the elastic net model. The deviance ratios in the training and testing sets were calculated. The analysis was performed using Stata's lasso logit command.
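A rough scikit-learn analogue of the cross-validated LASSO and elastic net fits is sketched below on synthetic data. It is not the authors' Stata code, and the grid sizes, fold count, and seeds are hypothetical. scikit-learn's `C` is the inverse of the penalty strength (Stata's lambda), and `l1_ratio` plays the role of Stata's elastic-net alpha. The text selects lambda by the lowest mean squared error. This sketch instead selects by cross-validated log loss (`neg_log_loss`), which is proportional to the binomial deviance, and that substitution is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in data (NOT the study data), about 10% events.
X, y = make_classification(n_samples=600, n_features=15, n_informative=5,
                           weights=[0.9, 0.1], random_state=1)

# LASSO logistic regression: pure L1 penalty; a grid over C is a grid over 1/lambda.
lasso = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="saga",
                             scoring="neg_log_loss", max_iter=5000)
lasso.fit(X, y)

# Elastic net: cross-validate the L1/L2 mixing weight together with the penalty strength.
enet = LogisticRegressionCV(Cs=10, cv=5, penalty="elasticnet", solver="saga",
                            l1_ratios=[0.25, 0.5, 0.75, 1.0],
                            scoring="neg_log_loss", max_iter=5000)
enet.fit(X, y)

print("LASSO selected", int((lasso.coef_ != 0).sum()), "of", X.shape[1], "variables")
print("Elastic net l1_ratio:", enet.l1_ratio_[0],
      "selected", int((enet.coef_ != 0).sum()), "variables")
```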

Comparison with EuroSCORE

The AUCs of the random forest, logistic regression, LASSO, and elastic net models were plotted and compared with the AUC of EuroSCORE II, and P values were reported.
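The text does not name the test used to compare AUCs. A common choice for two AUCs computed on the same patients is DeLong's nonparametric test, which Stata's roccomp implements, so it is shown here as an assumption. This standard-library sketch computes the per-patient structural components, their covariance, and a two-sided z-test. The example scores are illustrative.

```python
import math


def _components(pos, neg):
    """DeLong structural components for one model's scores."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(p, n) for n in neg) / len(neg) for p in pos]  # one per event
    v01 = [sum(psi(p, n) for p in pos) / len(pos) for n in neg]  # one per non-event
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    k = len(a)
    mean_a, mean_b = sum(a) / k, sum(b) / k
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (k - 1)


def delong_test(labels, scores_1, scores_2):
    """Two-sided DeLong test for two correlated AUCs on the same patients.

    Returns (auc_1, auc_2, p_value). Needs at least 2 events and 2 non-events.
    """
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    v10_1, v01_1 = _components([scores_1[i] for i in pos_idx], [scores_1[i] for i in neg_idx])
    v10_2, v01_2 = _components([scores_2[i] for i in pos_idx], [scores_2[i] for i in neg_idx])
    auc_1 = sum(v10_1) / len(v10_1)
    auc_2 = sum(v10_2) / len(v10_2)
    var_diff = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / len(pos_idx)
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / len(neg_idx)
    )
    if var_diff <= 0:
        return auc_1, auc_2, 1.0  # degenerate case (e.g. identical rankings)
    z = (auc_1 - auc_2) / math.sqrt(var_diff)
    return auc_1, auc_2, math.erfc(abs(z) / math.sqrt(2))


labels = [0, 0, 0, 1, 1, 1]
print(delong_test(labels, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0.1, 0.8, 0.3, 0.7, 0.2, 0.9]))
```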

Results

Baseline characteristics

The operative mortality group had more patients aged ≥ 70 years and a higher prevalence of diabetes mellitus, hypertension, endocarditis, COPD, history of heart failure, chronic kidney disease grade IV, moderate/severe RV dilatation and dysfunction, reoperative surgery, isolated tricuspid valve surgery, and tricuspid valve replacement (Table 1).

The EuroSCORE was significantly higher in patients with operative mortality [8.52 (4.745–20.035) vs. 4.11 (2.29–6.995), P<0.001]. The correlation between all variables was visualized in the heat plot (Fig. 1).

Fig. 1

Heat plot showing the correlation between all variables included in the analysis

Random forest classification

The optimal number of trees was 50, the tree depth was 2, and the number of features considered at each split was 3. The model that included all variables achieved a training accuracy of 90.5% and a testing accuracy of 90.4%. The classification error was 9.7% in the training set and 8.6% in the testing set. The feature importance of all variables is presented in Fig. 2A.
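With a rare outcome, accuracy is easy to overstate. A trivial model that predicts survival for every patient is a useful reference point. The sketch below computes accuracy and misclassification rate from a confusion matrix. It applies this to the whole-cohort counts given in the text (1049 survivors, 112 deaths); test-set proportions will differ slightly. This baseline is illustrative and not part of the reported analysis.

```python
def accuracy_and_error(tp: int, fp: int, tn: int, fn: int):
    """Accuracy and misclassification rate from a confusion matrix (they sum to 1)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return accuracy, 1 - accuracy


# Trivial classifier predicting "survived" for everyone, using whole-cohort counts.
baseline_acc, baseline_err = accuracy_and_error(tp=0, fp=0, tn=1049, fn=112)
print(f"{baseline_acc:.1%} accuracy, {baseline_err:.1%} error")  # 90.4% accuracy, 9.6% error
```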

Fig. 2

A Feature importance of the random forest prediction model. B Feature importance of the random forest final prediction model

Recursive feature elimination, which progressively removed the least important features, was used to reach the minimum number of variables that achieved the same accuracy. In the final model, the number of trees was 150, the optimal depth was 8, and the number of features considered at each split was 3. The training accuracy was 93%, and the testing accuracy was 92%. The classification error rate was 9% in the training data and 4.8% in the testing data. The final model included eight variables (Fig. 2B).

Tricuspid valve replacement was correlated with chronic kidney disease (r = 0.77) (Fig. 1). When tricuspid valve replacement was removed from the model, the training accuracy was 91% and the testing accuracy was 90%. The classification error rate was 8.4% in the training set and 10% in the testing set.
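For two 0/1 variables, the Pearson correlation shown in a heat plot equals the phi coefficient. The sketch below computes it from cell counts and flags pairs at or above a cutoff. The 0.7 threshold and the variable names are hypothetical, because the text does not state the cutoff used to call variables correlated.

```python
import math


def phi_coefficient(x, y):
    """Pearson correlation between two 0/1 variables (the phi coefficient)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def correlated_pairs(columns, threshold=0.7):
    """List variable pairs whose |phi| reaches a (hypothetical) threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = phi_coefficient(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 2)))
    return pairs


cols = {"tv_replacement": [1, 1, 0, 0], "ckd_iv": [1, 1, 0, 0], "copd": [1, 0, 1, 0]}
print(correlated_pairs(cols))  # [('tv_replacement', 'ckd_iv', 1.0)]
```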

Prediction of operative mortality using logistic regression, LASSO, and elastic net

Factors identified with logistic regression were age ≥ 70 years, hypertension, COPD, nonobesity, heart failure, reoperative surgery, emergency surgery, and tricuspid valve replacement (Table 2). The model's predictive power was assessed using cvAUC and 10-fold ROC curves, with an AUC of 0.76 (Fig. 3).

Table 2 Factors affecting operative mortality by logistic regression analysis
Fig. 3

Cross-validated and k-fold receiver operating characteristic (ROC) curves for logistic regression

LASSO identified 13 variables that can predict operative mortality (Table 3). The predictive power was assessed using the cvAUC and 10-fold ROC curves (AUC= 0.78) (Fig. 4).

Table 3 LASSO identified variables and their coefficients
Fig. 4

Cross-validated and k-fold receiver operating characteristic (ROC) curves for LASSO regression

Elastic net identified 17 variables that affected operative mortality (Table 3). The predictive power was assessed using the cvAUC and 10-fold ROC curves (AUC = 0.795) (Fig. 5). For both LASSO and elastic net, the deviance ratio was better in the test set (0.129 for both models).
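The deviance ratio is the fraction of the null deviance explained by the model, 1 − D(model)/D(null). This minimal sketch assumes the null model predicts the observed event rate of the same dataset. How Stata defines the null model for out-of-sample evaluation may differ. The example outcomes and probabilities are illustrative.

```python
import math


def binomial_deviance(y, p):
    """-2 x log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return -2 * sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))


def deviance_ratio(y, p_model):
    """1 - D(model) / D(null); the null model predicts the observed event rate."""
    rate = sum(y) / len(y)
    d_null = binomial_deviance(y, [rate] * len(y))
    return 1 - binomial_deviance(y, p_model) / d_null


print(round(deviance_ratio([0, 0, 1, 1], [0.2, 0.2, 0.8, 0.8]), 4))  # 0.6781
```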

Fig. 5

Cross-validated and k-fold receiver operating characteristic (ROC) curves for elastic net

Comparison with EuroSCORE II

The observed mortality was 9.65% (95% CI 8.01–11.49). The operative mortality predicted by EuroSCORE II was 6.58% (95% CI 6.16–7.01), and the mortality predicted by the random forest in the test dataset was 9.89% (95% CI 9.57–10.21). EuroSCORE II was a significant predictor of mortality [OR 1.11 (95% CI 1.06–1.15); P < 0.001], with an AUC of 0.73. Compared with EuroSCORE II, the AUC was nonsignificantly higher with logistic regression (P = 0.25) and LASSO (P = 0.057) and significantly higher with the elastic net (P = 0.048) and random forest (P < 0.001) models (Fig. 6).
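The reported interval for observed mortality can be approximated with an exact (Clopper-Pearson) binomial interval for 112 deaths in 1161 patients. The text does not name the method, so this is an assumption. The sketch also computes the observed-to-expected (O/E) ratio against the mean EuroSCORE II prediction of 6.58%. That ratio is an illustrative summary and is not reported in the text.

```python
import math


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing terms in log space for stability."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases in p on (lo, hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, level: float = 0.95):
    """Exact binomial confidence interval for x events in n trials."""
    alpha = 1 - level
    # Lower bound solves P(X >= x | p) = alpha/2; upper bound solves P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - _binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: _binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


deaths, patients, euroscore_expected = 112, 1161, 0.0658
lower, upper = clopper_pearson(deaths, patients)
observed = deaths / patients
print(f"observed {observed:.2%} (95% CI {lower:.2%} to {upper:.2%})")
print(f"O/E ratio vs EuroSCORE II: {observed / euroscore_expected:.2f}")
```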

Fig. 6

Comparison of the AUC of logistic regression, LASSO, elastic net, random forest, and EuroSCORE

Discussion

The application of machine learning to predict outcomes after cardiac surgery is increasing [16]. Machine learning methods use training data to learn important features and make predictions on out-of-sample data. Additionally, many ML methods do not require several assumptions of conventional statistical methods, such as linearity, absence of collinearity, and a limited number of variables entered into the model. These features could make ML methods better than conventional statistical models, such as multivariable regression, for predicting outcomes after surgical interventions. Nevertheless, the debate about the performance of ML methods compared with logistic regression is ongoing, with several studies reporting conflicting results [17]. Furthermore, ML methods have several drawbacks, such as the need for a large sample size, limited interpretability, and the risk of overfitting the training data. In addition, there are many ML algorithms; their predictive ability can vary widely, and not all are suitable for every dataset. However, one study reported comparable accuracy of ML algorithms in predicting cardiac surgery outcomes with fewer patients than those used to develop the STS score [16]. Meanwhile, the risk of each cardiac surgical procedure needs to be evaluated separately, especially given the wide use of transcatheter interventions [18].

This study evaluated ML methods for predicting operative mortality after tricuspid valve surgery. Operative mortality was defined as 30-day mortality or mortality during the index hospitalization. Tricuspid valve surgery is relatively infrequent compared with other cardiac procedures, and current risk stratification with EuroSCORE or STS was not designed specifically to predict mortality after tricuspid valve surgery [7]. Additionally, several operative risk factors are not considered in these risk scores, such as valve repair vs. replacement, isolated vs. concomitant tricuspid valve surgery, and beating-heart vs. arrested-heart tricuspid valve surgery. In this analysis, we used parametric shrinkage methods (LASSO and elastic net) to identify the most relevant factors, and we used the nonparametric random forest algorithm with recursive feature elimination. The random forest model achieved the best accuracy, and its performance was better than that of EuroSCORE II. The ability of random forests to identify strong predictors could be affected by collinearity [19]. In this analysis, the performance of the random forest was not substantially affected by the correlated variables. Darst and associates found that the presence of many correlated variables decreased the importance of causal variables [14], and they concluded that random forests might not be suitable for high-dimensional data.

Factors included in the EuroSCORE II calculation are age, gender, chronic lung disease, extracardiac arteriopathy, poor mobility, previous cardiac surgery, active endocarditis, critical preoperative status, renal impairment, insulin-dependent diabetes, Canadian Cardiovascular Society (CCS) angina class 4, left ventricular function, recent myocardial infarction, pulmonary hypertension, NYHA class, surgery on the thoracic aorta, urgency of surgery, and concurrent procedures. Factors identified by LASSO were emergency surgery, COPD, age ≥ 70 years, reoperative surgery, hypertension, heart failure, moderate/severe right ventricular dysfunction, nonobesity, tricuspid valve replacement, diabetes mellitus, grade IV chronic kidney disease, endocarditis, and pulmonary artery systolic pressure ≥ 50 mmHg. In addition to these factors, elastic net identified isolated tricuspid valve surgery, beating-heart tricuspid valve surgery, NYHA class III–IV, and moderate/severe right ventricular dilatation. Eight variables were identified using the random forest method: age ≥ 70 years, heart failure, emergency surgery, chronic kidney disease grade IV, diabetes mellitus, tricuspid valve replacement, hypertension, and redo surgery.

The mortality rate in our series is comparable to that in other studies. Dreyfus and associates studied 466 patients who underwent isolated tricuspid valve surgery for severe noncongenital tricuspid regurgitation at 12 French centers between 2007 and 2017, and the in-hospital mortality rate was 10% [8]. Chen and colleagues reported a perioperative mortality of 11.8% after isolated reoperative tricuspid valve replacement [20]. The hospital mortality rate has been shown to exceed 35% in patients who undergo tricuspid valve replacement after previous tricuspid valve repair [21]. Albacker and colleagues reported 13% mortality after tricuspid valve replacement [22].

Age is the most robust risk factor for complications after cardiac surgery and the most frequent reason for declining surgery. The relationship between age and mortality is not linear and varies across surgical procedures [23]. In Chen's study, the deceased patients were on average 10 years older than the survivors, a significant difference [20]. However, Topilsky and coworkers reported that early mortality was not associated with increased age and that age should not be a factor in deciding on surgery for symptomatic patients with severe tricuspid regurgitation [24]. In the Tri-Score for predicting mortality in isolated tricuspid valve surgery, age ≥ 70 years was a significant predictor of mortality, similar to our series [8].

Heart failure is a well-established predictor of postoperative mortality across many surgical specialties. Consequently, it is included in several risk prediction tools, such as the American College of Surgeons (ACS) Surgical Risk Calculator [25]. Additionally, preoperative low ejection fraction is a risk factor for increased morbidity and mortality after cardiac surgery [10]. Subbotina and associates demonstrated that preoperative severe right ventricular dysfunction was associated with acute preoperative and postoperative decompensations and poor outcomes after tricuspid valve surgery [26]. Additionally, the risk is increased in cases of biventricular dysfunction [20]. In the Tri-Score, the number of patients who were hospitalized for heart failure within 1 year was 163, and hospital mortality occurred in 25 (out of 48 total mortalities) [8]. Furthermore, NYHA functional classes III–IV were identified as predictors of morbidity and mortality after cardiac surgery [24, 27]. Among these variables, heart failure was identified in all models, RV dysfunction in the LASSO and elastic net models, NYHA III–IV and RV dilatation in the elastic net model only, and low ejection fraction did not appear in any of the risk models.

The risk of cardiac surgery increases substantially with repeated procedures. With improved operative techniques and increased life expectancy, reoperative cardiac surgery has become more common. Reoperative cardiac surgery could be associated with catastrophic complications, such as hemorrhage, which increase morbidity and mortality [28]. The complication rate of resternotomy is almost three times that of primary sternotomy [29]. Mortality risk increases when a reoperative tricuspid valve replacement is performed [30]. Right mini-thoracotomy could be the preferred approach for reoperative surgery because of its lower complication rate compared with redo sternotomy [31]. In the Tri-Score, 33% of the mortality group had previous left-sided valve surgery, and previous surgery was a predictor of mortality [8].

Several investigators have reported that tricuspid valve replacement carries a significantly higher risk than valve repair, independent of other preoperative characteristics [22, 32]. In the Tri-Score, patients with tricuspid valve replacement had a higher mortality risk; 69% of deaths occurred in patients who underwent replacement [8]. These results are comparable to other series [22]. Tricuspid valve replacement was identified as a risk factor in all our models. Aspects unique to tricuspid valve replacement should be thoroughly investigated to determine the factors contributing to mortality and whether they are disease- or technique-related.

COPD is a risk factor for morbidity and mortality after cardiac surgery [33]. COPD is already included in EuroSCORE as a predictor of operative mortality; however, disease severity is not assessed [34]. In the Tri-Score, COPD was not a predictor of mortality, in contrast to our results [8]. COPD was a risk factor in the logistic regression, LASSO, and elastic net models but not in the random forest model. The effect of COPD may be better captured if COPD is graded by severity.

Machine learning methods are continuously improving. Despite their promising results, more research is required to validate their use in developing risk scores for predicting outcomes after cardiac surgery. ML-based prediction scores could eventually replace the current traditional scoring methods.

Study limitations

The study has several limitations. This is a single-center experience, and the outcomes of tricuspid valve surgery could vary widely among centers. The external validity of our results should be confirmed in a multicenter study. Although we included preoperative and operative variables, several unreported variables could have affected the outcomes, such as the type of cardioplegia [35] and the TV prosthesis [36]. Although all patients underwent tricuspid valve surgery, the cohort included heterogeneous subgroups (repair vs. replacement and isolated vs. concomitant tricuspid valve surgery). A large sample size is required to improve the accuracy of machine learning prediction, and future studies on more selective cohorts could yield prediction models specific to each subgroup with improved accuracy. We included abnormal bilirubin level as a covariate. However, we did not evaluate other scores of hepatic dysfunction, such as the MELD and Child-Pugh scores, which are known to be associated with increased mortality after surgical procedures. We encourage future studies that incorporate variables not routinely recorded, such as those related to hemodynamic status, laboratory results, and operative times.

Conclusions

Machine learning methods predicted operative mortality after tricuspid valve surgery with high accuracy. The random forest and elastic net models discriminated significantly better than EuroSCORE II. Incorporating machine learning methods into cardiac surgery risk scoring, with comprehensive inclusion of all relevant variables, is recommended.