Topological data analysis to identify cardiac resynchronization therapy patients exhibiting benefit from an implantable cardioverter-defibrillator

Background Current guidelines recommend considering multiple factors while deciding between cardiac resynchronization therapy with a defibrillator (CRT-D) or a pacemaker (CRT-P). Nevertheless, it is still challenging to pinpoint those candidates who will benefit from choosing a CRT-D device in terms of survival. Objective We aimed to use topological data analysis (TDA) to identify phenogroups of CRT patients in whom CRT-D is associated with better survival than CRT-P. Methods We included 2603 patients who underwent CRT-D (54%) or CRT-P (46%) implantation at Semmelweis University between 2000 and 2018. The primary endpoint was all-cause mortality. We applied TDA to create a patient similarity network using 25 clinical features. Then, we identified multiple phenogroups in the generated network and compared the groups’ clinical characteristics and survival. Results Five- and 10-year mortality were 43 (40–46)% and 71 (67–74)% in patients with CRT-D and 48 (45–50)% and 71 (68–74)% in those with CRT-P, respectively. TDA created a circular network in which we could delineate five phenogroups showing distinct patterns of clinical characteristics and outcomes. Three phenogroups (1, 2, and 3) included almost exclusively patients with non-ischemic etiology, whereas the other two phenogroups (4 and 5) predominantly comprised ischemic patients. Interestingly, only in phenogroups 2 and 5 were CRT-D associated with better survival than CRT-P (adjusted hazard ratio 0.61 [0.47–0.80], p < 0.001 and adjusted hazard ratio 0.84 [0.71–0.99], p = 0.033, respectively). Conclusions By simultaneously evaluating various clinical features, TDA may identify patients with either ischemic or non-ischemic etiology who will most likely benefit from the implantation of a CRT-D instead of a CRT-P. Graphical abstract Topological data analysis to identify phenogroups of CRT patients in whom CRT-D is associated with better survival than CRT-P. AF atrial fibrillation, CRT cardiac resynchronization therapy, CRT-D cardiac resynchronization therapy defibrillator, CRT-P cardiac resynchronization therapy pacemaker, DM diabetes mellitus, HTN hypertension, LBBB left bundle branch block, LVEF left ventricular ejection fraction, MDS multidimensional scaling, MRA mineralocorticoid receptor antagonist, NYHA New York Heart Association Supplementary Information The online version contains supplementary material available at 10.1007/s00392-023-02281-6.


Introduction
Cardiac resynchronization therapy (CRT) is a standard device therapy in a selected subgroup of patients with symptomatic heart failure, reduced left ventricular ejection fraction (LVEF), and prolonged QRS duration in sinus rhythm [1][2][3].Despite the well-defined indications for CRT implantation, it is still challenging to pinpoint those candidates who would benefit from an additional implantable cardioverterdefibrillator (ICD) [4].There is growing scientific evidence that ischemic etiology is one of the most important factors to be considered while deciding between the implantation of a CRT-defibrillator (CRT-D) or a CRT-pacemaker (CRT-P) [5,6].Nevertheless, the current guidelines of the European Society of Cardiology (ESC) recommend the simultaneous evaluation of etiology and multiple risk factors of sudden cardiac death (SCD) and all-cause mortality, such as age, presence of myocardial fibrosis, and comorbidities [4].It is also emphasized that this risk assessment should be carried out in a personalized manner, in which novel and robust data analysis techniques could have a pivotal role as they could be tailored for the integrated assessment of risk profiles.
One such technique is topological data analysis (TDA) which has gained increasing popularity within the realm of cardiovascular medicine [7][8][9].By synthesizing multiple domains of input features using the tools of topology, TDA enables the encoding of complex relationships within medical datasets in a simple and compressed format (i.e., as patient similarity networks) [10].Then, through the visual interpretation and statistical analysis of the generated networks, we can identify specific subgroups (i.e., phenogroups) of patients within monolithic disease categories that might respond differently to a given therapy.Thus, we may assume that TDA could be used to improve risk stratification and optimize device selection in CRT candidates.
Accordingly, we used TDA to identify phenogroups of CRT patients in whom CRT-D is associated with better survival than CRT-P.

Study population and data collection
A total of 2603 patients with heart failure and reduced LVEF (New York Heart Associaton [NYHA] II-IVa) and a prolonged QRS duration (QRS ≥ 130 ms) who underwent successful CRT implantation at the Heart and Vascular Center of Semmelweis University (Budapest, Hungary) between October 2000 and September 2018 were included in our retrospective database.CRT devices were implanted as per guidelines [4].Device implantation was performed using standard transvenous techniques under local anesthesia.Baseline clinical characteristics, such as demographics, medical history, cardiovascular risk factors, physical status, medications, echocardiographic, and laboratory results, were retrieved for each patient from the electronic medical record system of Semmelweis University.The study protocol complies with the Declaration of Helsinki and was approved by the Semmelweis University Regional and Institutional Committee of Science and Research Ethics (approval number: 161/2019).Obtaining informed consent was waived owing to the retrospective nature of the analysis.

Outcomes of interest
The primary endpoint was all-cause mortality.Status (i.e., dead or alive) and the date of death were obtained for all patients by querying the National Health Insurance Database of Hungary in May 2021.We also performed a sensitivity analysis using the composite of all-cause death, heart transplantation, and left ventricular assist device implantation.

Topological data analysis
TDA is an emerging technique that adopts methods from the field topology, a discipline of mathematics focusing on shape analysis, to create compact visual representations of high-dimensional datasets in an unsupervised fashion.It dwells on the concept that shape analysis techniques can be applied to connect data points (such as patients) with similar characteristics in a multidimensional space and to plot these connections as a two-dimensional topological network.The generated network consists of nodes and edges.Nodes represent collections of patients with similar characteristics, whereas edges connect two nodes if they have at least one patient in common.Networks can be color-coded based on outcomes to gain insights into the data.Before generating a topological network, the following parameters are needed to be defined: (1) a distance metric, which measures the similarity between data points, and (2) one or more lenses, which are filter functions describing the distribution of the data.For each lens, we also have to set the gain (which controls the number of nodes) and resolution (which controls the number of edges) prior to analysis.
After discarding features with 40% or more missing values (Supplemental Table 1), we used the remaining twentyfive to generate the topological network: age, sex, type of the implanted device, etiology of heart failure, NYHA functional class, history of atrial fibrillation, history of ventricular arrhythmia, history of diabetes mellitus, history of hypertension, body mass index, history of myocardial infarction (MI), history of percutaneous coronary intervention (PCI), history of coronary artery bypass graft surgery (CABG), serum creatinine, serum urea, serum sodium, hemoglobin, LVEF, left ventricular end-diastolic and endsystolic diameters, QRS morphology (as the presence of left bundle branch block [LBBB]), usage of beta-blockers, mineralocorticoid receptor antagonists (MRAs), amiodarone, and oral anticoagulants.Missing values of the features were replaced using mean imputation, which was followed by Z-score transformation.We applied normalized correlation as the distance metric in conjunction with two multidimensional scaling lenses (with a resolution of 29 and a gain of 1.6, both equalized).Patients placed into nodes disconnected from the main network (n = 244) were considered outliers and were omitted from the further steps of the analysis.
After creating the topological network, we wanted to divide it into regions with distinct clinical characteristics and all-cause mortality rates.To this end, we first performed community autogrouping using the Louvain method, which finds the best possible grouping of nodes with high intragroup but low inter-group connectivity fields [11].This step resulted in the generation of 16 autogroups.Next, the group with the most outbound connections relative to its size (i.e., the number of nodes in the group) was merged with an adjacent group to which it was connected with the most edges.This step was repeated multiple times until five densely connected groups of nodes (referred to as phenogroups) were created.Due to the inherent nature of TDA, the phenogroups overlapped partially (i.e., 16 patients belonged to two phenogroups at the same time).However, this phenomenon does not violate any assumptions or requirements of the statistical tests used for subgroup comparisons.

Statistical analysis
Statistical analysis was performed using GraphPad Prism (version 8.0, GraphPad Software, San Diego, California, USA), SPSS (version 21.0, IBM, Armonk, New York, USA), and R (version 4.1.2,R Foundation for Statistical Computing, Vienna, Austria).Continuous variables with a normal distribution are presented as mean ± standard deviation, whereas those with non-normal distribution are reported as median and interquartile range (IQR).Categorical variables are expressed as frequencies (n) and percentages (%).The clinical characteristics of the CRT-D and 1 3 CRT-P groups were compared using unpaired Student's t-test or Mann-Whitney U test for continuous variables and Chi-squared or Fisher's exact test for categorical variables, as appropriate.The event-free survival of patient subgroups was visualized on Kaplan-Meier curves, and log-rank tests were performed for comparison.Follow-up duration was estimated using the reverse Kaplan-Meier method, and mortality and absolute risk reduction were

CRT-D CRT-P C Non-ischemic patients
Fig.  1A).When sequential multivariable models were built by adding these covariates to the univariable model in a stepwise manner (Supplemental Table 5), we found both etiology and history of ventricular arrhythmia to be negative confounding variables (i.e., not adjusting for them would result in the underestimation of the true strength of the association between the type of the implanted device and all-cause mortality).As shown by absolute risk reductions calculated at each year of the follow-up, CRT-D was associated with better survival than CRT-P only in the 2-to-6-year interval after implantation (Fig. 2A, Supplemental Table 4).In the sensitivity analysis, device type was found to be a significant predictor of the composite endpoint in the multivariable (adjusted HR 0.85, 95% CI 0.76-0.95,p = 0.003) but not the univariable Cox model (unadjusted HR 0.97, 95% CI 0.88-1.07,p = 0.498) (Supplemental Fig. 1A).Among those with ischemic etiology, 882 patients reached the primary endpoint over the median follow-up duration of 7.9 (5.1-12.2) years, and 5-and 10-year mortality were 54 (51-57)% and 80 (77-82)%, respectively (Supplemental Table 4).In this subset of patients, CRT-D was associated with a lower risk of death compared to CRT-P according to both univariable (unadjusted HR 0.84, 95% CI 0.74-0.97,p = 0.014) and multivariable analysis (adjusted HR 0.86, 95% CI 0.75-0.99,p = 0.043) (Fig. 1B).Based on absolute risk reduction, CRT-D was associated with better survival than CRT-P only in the 2-to-7-year interval following implantation (Fig. 2B, Supplemental Table 4).Sensitivity analysis showed that the type of device is a significant predictor of the composite endpoint only in the univariable (unadjusted HR 0.87, 95% CI 0.76-0.99,p = 0.046) but not the multivariable Cox model (adjusted HR 0.88, 95% CI 0.76-1.01,p = 0.071) (Supplemental Fig. 1B).
In patients with non-ischemic etiology, 690 individuals died over 8.4 (5.5-12.0)years.Five-year mortality was 36 (34-39)%, whereas 10-year mortality was 61 (58-65)% (Supplemental Table 4).The survival benefit from CRT-D vs. CRT-P was confirmed by both univariable and multivariable analysis (unadjusted HR 0.82, 95% CI 0.70-0.96,p = 0.013 and adjusted HR 0.78, 95% CI 0.66-0.93,p = 0.005) (Fig. 1C).We also observed that patients with CRT-D exhibited significantly lower mortality than those with CRT-P for up to 9 years after the implantation (Fig. 2C, Supplemental Table 4).In terms Fig. 2 Absolute risk reduction associated with CRT-D vs. CRT-P at each year of the follow-up.Absolute risk reductions and their confidence intervals were calculated based on Kaplan-Meier estimates.Absolute risk reduction was considered significant if the value 0 fell outside of its confidence interval.ARR absolute risk reduction, CRT cardiac resynchronization therapy, CRT-D cardiac resynchronization therapy defibrillator, CRT-P pacemaker, pp percentage point of the composite endpoint, CRT-D was associated with better outcomes than CRT-P based on the univariable (unadjusted HR 0.86, 95% CI 0.74-0.99,p = 0.046) and the multivariable analysis as well (adjusted HR 0.81, 95% CI 0.69-0.96,p = 0.013) (Supplemental Fig. 1C).
In addition, we also assessed the association of the device type with all-cause mortality in patients with and without a history of atrial fibrillation (Supplemental Fig. 2).Multivariable Cox regression analysis revealed that CRT-D was associated with better survival than CRT-P only in the latter (adjusted HR 0.77, 95% CI 0.67-0.89,p < 0.001) but not in the former subgroup of patients (adjusted HR 0.91, 95% CI 0.77-1.07,p = 0.260).
Differences could also be observed in all-cause mortality between the phenogroups (Fig. 5, Supplemental Table 11).Although the survival of the three non-ischemic phenogroup was not significantly different from each other and the two dominantly ischemic phenogroups exhibited similar survival, all three non-ischemic phenogroups showed significantly better survival than those including dominantly ischemic patients (Fig. 5, Supplemental Table 11).In the sensitivity analysis, we also found similar results regarding the composite endpoint (Supplemental Fig. 3, Supplemental Table 12).

Discussion
To identify patients with a greater survival benefit from a CRT-D than a CRT-P, we applied TDA, an advanced data analysis approach that is aptly suited for simultaneously evaluating various clinical features.With this technique, we could delineate five distinct phenogroups: three containing almost exclusively non-ischemic and two including predominantly ischemic patients.All three non-ischemic phenogroups showed significantly better survival than those including dominantly ischemic patients.CRT-D was superior to CRT-P in terms of all-cause mortality only in one non-ischemic (phenogroup 2) and one ischemic phenogroup (phenogroup 5).Many of the differences between the characteristics of the phenogroups were subtle but still statistically significant.Importantly, these differences might remain hidden from the physicians during the preimplantation assessment but were efficiently captured by TDA, implying that this approach may have a relevant role in optimizing device selection and improving survival.
The need for an ICD in CRT candidates, especially for primary prevention, is still a subject of debate.The current guidelines recommend a complex, personalized preimplantation risk assessment evaluating several clinical features associated with the risk of SCD and cardiac and non-cardiac (i.e., comorbidity-driven) mortality [12].Nevertheless, in everyday clinical practice, this assessment is challenging and might lead to incongruent and suboptimal use of the devices.
CRT can reduce per se the risk of SCD by inducing reverse remodeling [13,14].In addition, the technological advancements and the emergence of new heart failure medications (such as angiotensin receptor-neprilysin inhibitors and sodium-glucose co-transporter 2 inhibitors) during the last years resulted in a substantial decrease in the incidence of SCD [15,16].Moreover, the patient population has changed over the past decades (e.g., also older patients are being considered for CRT implantation) as CRT indications were refined and extended, increasing the proportion of non-cardiac diseases among the underlying causes of death.At the same time, CRT-D is still associated with a higher risk of complications and inappropriate ICD therapies, highlighting the importance and challenges of choosing the optimal device [12].
To this date, the Comparison of Medical Therapy, Pacing, and Defibrillation in Heart Failure (COMPANION) is

Probability of survival
Fig. 5 Kaplan-Meier estimates of the time to death from any cause in the five phenogroups identified using topological data analysis.CRT cardiac resynchronization therapy, TDA topological data analysis the only trial that randomized patients to CRT-P or CRT-D [2], although it was designed to compare the effects of CRT with optimal medical therapy and not CRT-D vs. CRT-P.
In this trial, it was found that CRT-P was not significantly associated with a reduction in the risk of all-cause mortality, whereas CRT-D was associated with a 36% risk reduction.Nevertheless, these findings and those reported by observational studies are still insufficient to firmly prove or refute the superiority of CRT-D over CRT-P.The ongoing Re-evaluation of Optimal Re-synchronization Therapy in Patients with Chronic Heart Failure (RESET-CRT) trial, hypothesizing that CRT-P is non-inferior to CRT-D for allcause mortality, is expected to provide crucial information to address this clinically important evidence gap [17].As a prelude to the RESET-CRT randomized controlled trial, a population-based weighted cohort study including 3569  CRT patients was conducted with the same inclusion and exclusion criteria and primary endpoint [18].Similar to our results, patients without a defibrillator were significantly older, which might explain the higher rate of all-cause mortality in CRT-P vs. CRT-D patients.However, after adjusting for age and entropy balancing for baseline clinical characteristics, CRT-P was proved to be non-inferior in terms of survival.If the ongoing randomized controlled trial results in the same findings, the clinical relevance of optimal device selection on the overall survival will be confirmed, leaving us with the question of how physicians could be supported to appropriately select those CRT candidates who show an additional survival benefit from an ICD [19].
Per the current guidelines, ischemia and the presence of myocardial fibrosis or scar tissue are the primary factors that should be considered when choosing CRT-D over CRT-P [12].Multiple studies have reported that ischemic patients have a clear mortality benefit from having an ICD [5,[20][21][22][23].In our patient cohort, CRT-D was found to be superior to CRT-P in terms of all-cause mortality, showing a waning survival benefit over time (Fig. 2).These results may imply that CRT-D reduces SCD predominantly in the first 6-9 years after implantation, but with time, other cardiac or non-cardiac factors (i.e., comorbidities) become the most prevalent causes of death, resulting in decreased benefit from CRT-D vs. CRT-P.A similar trend was also observed by Doran et al. in a post hoc analysis of the COMPANION trial [13].Nevertheless, they found that CRT-D was associated with better survival than CRT-P in non-ischemic patients but not in ischemic patients.Of note, when the COMPANION trial was conducted, the effect of optimal medical therapy on reverse remodeling or reducing the risk of SCD was not as impressive as nowadays.
In non-ischemic patients, the use of primary prevention ICD is a more challenging question, especially concerning long-term all-cause mortality.The Danish Study to Assess the Efficacy of ICDs in Patients With Nonischemic Systolic Heart Failure on Mortality (DANISH), one of the most recent randomized controlled trials in the field, was designed to investigate the effect of primary-prevention ICD implantation on mortality in patients with non-ischemic heart failure, of whom 58% received CRT as well [24].Although the risk of SCD was reduced by 50% during the 5-year-long follow-up period, there was no difference in allcause mortality in the entire study cohort (irrespective of CRT status), only in patients under 68 years of age, which suggests that non-cardiovascular factors are responsible for a higher proportion of death in the elderly [24].
These incongruent findings and the complex nature of the clinical assessment call for advanced data analysis approaches (e.g., machine learning and TDA) to facilitate the identification of those CRT candidates who are most likely to experience an additional mortality benefit from an ICD.To the best of our knowledge, our study is the first that used TDA for this exact purpose.Nevertheless, phenogrouping of heart failure patients has been previously performed by Cikes et al., who applied unsupervised machine learning techniques (multiple kernel learning and k-means clustering) in a subset of MADIT-CRT (Multicenter Automatic Defibrillator Implantation Trial with Cardiac Resynchronization Therapy) patients to integrate clinical features and imaging data in order to identify those who are most likely to respond to CRT [25].Despite the apparent differences in study design, population, and methods, both their and our study highlighted the relevance of utilizing advanced data analysis techniques for identifying responders to specific therapies.Moreover, similar to their findings, we also noted the accumulation of clinical characteristics known to be predictive of volumetric response (such as female sex, non-ischemic etiology, LBBB, longer QRS duration) in some of the phenogroups (i.e., phenogroups 1, 2, and 3) (Fig. 4).
In our study, TDA could identify one ischemic and one non-ischemic phenogroup that showed mortality benefit from the implantation of a CRT-D instead of a CRT-P.These results confirm that CRT candidates form a heterogenous population of heart failure patients, and several distinct phenotypes exist within both the ischemic and non-ischemic patient subsets.Our study also showcases the capabilities of TDA to capture the subtle differences in the characteristics of these patients, suggesting that this approach may have a relevant role in optimizing device selection and improving outcomes.

Limitations
Besides its strength, our study has several limitations that should be acknowledged.First, the large dataset we analyzed in this study was derived from a single center.Thus, additional investigations must be conducted in the future to confirm our findings in external cohorts.Second, due to the retrospective nature of data collection, the proportion of missing values was relatively high in our dataset; hence we had to omit several well-established prognostic markers (e.g., N-terminal probrain natriuretic peptide) from TDA.Third, a CRT-D or a CRT-P device was implanted based on the physicians' clinical judgment and not in a randomized fashion, which may have resulted in selection bias.Fourth, the primary endpoint of our study was all-cause mortality, as cause-specific mortality data was unavailable.Moreover, we could not investigate reverse remodeling either, as follow-up echocardiographic data were available only for a limited number of patients (< 10%) in our dataset.Fifth, despite the lack of large, randomized trials proving the usefulness of CRT in patients with permanent atrial fibrillation, we did not limit our analysis to patients in sinus rhythm as we wanted to explore the data of all patients who underwent successful CRT implantation at our center.Nevertheless, the subgroup analysis revealed that CRT-D was associated with better survival than CRT-P only in patients with no history of atrial fibrillation.Last, the generated topological network cannot be used directly to classify new patients.Nonetheless, patients in the network can be labeled based on their location, and then, using this labeled data, machine learning classifiers can be trained to allocate new patients to the identified phenogroups.However, as we did not have access to any external datasets required for the thorough validation of such classifiers, we decided to postpone their development and validation until additional datasets become available.

Conclusions
In this retrospective observational study, CRT-D was found to be superior in reducing all-cause mortality compared to CRT-P in the entire cohort and both ischemic and non-ischemic patient subgroups.By simultaneously evaluating various clinical features, TDA could identify distinct phenogroups even within ischemic and non-ischemic subsets of CRT candidates and be able to pinpoint those who are more likely to show additional benefit from the implantation of a CRT-D instead of a CRT-P in terms of all-cause mortality.

Fig. 6
Fig.6 Kaplan-Meier estimates of the time to death from any cause in CRT-D and CRT-P patients in the five phenogroups identified using topological data analysis.Univariable and multivariable Cox proportional hazards models were used to compute hazard ratios with 95% confidence intervals.Each multivariable model included the follow-

1
Kaplan-Meier estimates of the time to death from any cause in the entire study cohort, ischemic patients, and non-ischemic patients.
mia.CI confidence interval, CRT-D cardiac resynchronization therapy defibrillator, CRT-P cardiac resynchronization therapy pacemaker, HR hazard ratio All-cause mortality of CRT-D and CRT-P patients Clinical characteristics of the five phenogroups identified using topological data analysis.AF atrial fibrillation, CRT-D cardiac resynchronization therapy defibrillator, DM diabetes mellitus, HTN hypertension, LBBB left bundle branch block, LVEF left ventricular ejection fraction, MRA mineralocorticoid receptor antagonist, NYHA New York Heart Association Fig.3The topological network with the five identified phenogroups of CRT patients.The topological network was created using twentyfive features (metric: normalized correlation, lenses: 2 × multidimensional scaling [resolution: 29, gain: 1.6, equalized]).Each node represents a collection of similar patients, and two nodes are connected if they have at least one patient in common.Nodes are color-coded based on all-cause mortality.The topological network was divided into five regions (i.e., phenogroups) based on all-cause mortality.CRT cardiac resynchronization therapy, CRT-D cardiac resynchronization therapy defibrillator, CRT-P cardiac resynchronization therapy pacemaker