Risk factor structure of heart failure in patients with cancer after treatment with anticancer agents’ assessment by big data from a Japanese electronic health record

As the prognosis of cancer patients has been improved, comorbidity of heart failure (HF) in cancer survivors is a serious concern, especially in the aged population. This study aimed to examine the risk factors of HF development after treatment by anticancer agents, using a machine learning-based analysis of a massive dataset obtained from the electronic health record (EHR) in Japan. This retrospective, cohort study, using a dataset from 2008 to 2017 in the Diagnosis Procedure Combination (DPC) database in Japan, enrolled 140,327 patients. The structure of risk factors was determined using multivariable analysis and classification and regression tree (CART) algorithm for time-to-event data. The mean follow-up period was 1.55 years. The prevalence of HF after anticancer agent administration were 4.0%. HF was more prevalent in the older than the younger. As the presence of cardiovascular diseases and various risk factors predicted HF, CART analysis of the risk factors revealed that the risk factor structures complicatedly differed among different age groups. The highest risk combination was hypertension, diabetes mellitus, and atrial fibrillation in the group aged  ≤ 64 years, and the presence of ischemic heart disease was a key in both groups aged 65–74 years and 75 ≤ years. The machine learning-based approach was able to develop complicated HF risk structures in cancer patients after anticancer agents in different age population, of which knowledge would be essential for realizing precision medicine to improve the prognosis of cancer patients. Supplementary Information The online version contains supplementary material available at 10.1007/s00380-023-02238-9.


Introduction
Recent improvements in cancer therapy have increased the number of survivors [1,2]. As these survivors get older, the comorbidity of cardiovascular diseases (CVDs) has become a serious concern, especially given that age is one of the most important risk factors for CVDs. Among CVDs, heart failure (HF) is a serious condition with poor prognosis [3,4]. Although multi-drug therapies have become the standard for various types of cancer and that molecular-targeted anticancer agents have been introduced in clinical practice, various anticancer agents have been shown to predispose patients to cardiovascular complications, typically in the form of HF [3][4][5]. Such unfavorable cardiovascular side effects frequently force clinicians and patients to interrupt anticancer therapy. Although a body of knowledge has been developed from a number of clinical studies, most of the previous studies regarding HF during anticancer treatment were cancer-specific, therapeutic agent-specific, or both in selected patients; however, this is far from a realistic clinical 1 3 situation in which patients have various types of cancer and complex backgrounds as well as the choice of therapeutic interventions [6][7][8][9]. The current risk stratification system for cardiotoxicity seems inadequate due to these heterogeneous factors. This limitation is partly due to the fact that hundreds of thousands of patients need to be enrolled in clinical studies to obtain knowledge regarding the relationships among their complex clinical backgrounds, types of cancers, therapeutic interventions, and HF. It has been impractical to conduct such studies because of the large effort and excessive cost. Another obstacle is the difficulty required to handle and process such a large dataset, as was used in this study. However, recent advancements in computing and data science technologies have made it possible to manage big data. To capture a realistic clinical situation, an electronic health record (EHR) provides a unique opportunity to obtain a dataset that reflects the situation in clinical practice.
To this end, we employed a machine learning-based approach to analyze medical big data extracted from a Japanese EHR stored in the Diagnosis Procedure Combination (DPC) reimbursement system. Our procedure may provide a new analytical approach for discovering a structured risk model. The present study aimed to examine the risk of HF development after the treatment by anticancer agents using medical big data.

Study design and data source
This retrospective cohort study used data from the DPC database of patients who were treated by anticancer agents. Although there is no single EHR database that covers the entire population, the DPC system is the largest hospitalbased database and covers 83% of acute-care medical institutions in Japan. We obtained the database of Medical Data Vision Co., Ltd. (Tokyo, Japan), which covered 291 acute care hospitals and 129 of the 393 designated cancer hospitals in Japan. The total number of patients included in the Medical Data Vision Co., Ltd. Database was 17.85 million. The DPC database contains dated information for diagnosis using the International Classification of Disease 10th revision (ICD-10) codes, comorbidities at admission and discharge, and complications that occurred during hospitalization. This database includes the records of both outpatients and inpatients. In addition, data were extracted for procedures, such as medications, medical devices, consumables, physiological and laboratory examinations, and in-hospital deaths as well as basic patient information, such as anonymized patient ID, age, sex, height, and weight. We classified the patients into three age groups (≤ 64 years, 65-74 years, and ≥ 75 years) according to the World Health Organization (WHO) criteria for early-stage (≥ 65 but ≤ 75 years) and late-stage (≥ 75 years) elderly patients. This study was approved by the Institutional Review Board of Kurume University (approval number 19115). The requirement for informed consent was waived because all the data were anonymized.

Definition of cancer and HF
We defined patients with cancer as those with ICD-10 codes C00-C96. As D01-09 are intraepithelial carcinomas, we excluded them from this study because they are unlikely to be clinically targeted for anticancer therapy. We enrolled patients aged 18 years or older at the initial diagnosis of cancer who received anticancer agents between April 2008 and January 2017 (Fig. 1, Supplementary Table 1). The date of enrollment was defined as the time when the anticancer agents were first administered in this study.
We also defined the ICD-10 codes I50.0, I50.1, I50.9, I11.0, and I42.7 as the HF codes (Supplementary Table 2 and 3). In this study, we identified HF patients as those who were labeled with both the HF codes and used one or more of the following therapeutic drugs for HF: angiotensin-converting enzyme inhibitors, angiotensin II receptor blockers, β-blockers, mineralocorticoid receptor antagonists, loop diuretics, tolvaptan, nitroglycerine, carperitide, dobutamine, or phosphodiesterase 3 inhibitors. We excluded patients with HF codes if they were simultaneously coded for acute myocardial infarction or one or more of the other conditions in which clinical signs and symptoms are similar to HF, including cardiopulmonary arrest, acute respiratory distress syndrome, severe pneumonia, pleuritis, or severe renal failure (Supplementary Table 2 and 3). Patients with a history of HF prior to the start of anticancer treatment were also excluded.
For other comorbidities, the diagnosis was made when the patients were coded with ICD-10 codes corresponding to those comorbidities at the time of enrolment.

Statistical analyses
All data manipulation and extraction were performed with GNU bash (version 4.3.48(1) release) including GNU Awk 4.1.3, GNU grep 2.25, and GNU sed 4.2.2 in the Linux platform (Ubuntu 16.04.6 LTS operating system) on the HPE ML350 Gen9 E5-2699v3 server (36 core CPUs, memory 768 Gb, 9.6 Tb HDD). Specific tables for data analysis were prepared using custom programming in an interactive terminal (GNOME Terminal 3.18.3) or shell scripting with bash. Basic statistics such as means and standard deviations for the summary data were obtained using R version 3.4.4 9 [10]. Classification and regression tree (CART) analysis was performed using the Stata 15 software (Stata Corp LLC, Texas, USA). The Cox proportional hazard model was used to evaluate the risk factors for HF after treatment with anticancer agents. Because cancer death may be considered as a competing risk for the development of HF, the Fine-Gray competing risk model [11] was also used for sensitivity analysis for the Cox proportional hazard model. CART is a non-parametric decision tree learning technique that partitions future space with a set of all possible combinations of a set of risk factors. A partitioned future space consists of an asymmetrical combination of risk factors that provide interpretable patient clinical profiles with various degrees of risk for clinical outcomes. This learning technique has been applied in prospective epidemiological studies and is applicable to our retrospective cohort data [12,13]. P-values less than 0.01 were accepted as statistically significant.  Table 1), 140,327 patients were enrolled in the study (Fig. 1). Table 1 shows the baseline characteristics of the patients. Among the patients in the study population, 5,093 (4.0%) were diagnosed with HF after initial administration of anticancer agents (Fig. 1). Upon comparison, patients who experienced HF had a higher average age, and 43.5% of these patients were over 75-years-old. Cardiovascular comorbidities, such as hypertension (HT), ischemic heart disease (IHD), and atrial fibrillation/flutter (AF) and cardiovascular risk factors, such as diabetes mellitus (DM) and dyslipidemia (DL), were also higher in the HF group. This indicates that many patients with cancer had cardiovascular diseases and their risk factors at baseline in this study.

Risk factors for HF after anticancer treatment
Because the burden of HF is disproportionately distributed among the elderly [14], we divided the subjects into three age groups: ≤ 64-years-old, 65-74-years-old, Fig. 1 Enrollment of patients treated for cancer and identification of incidences of HF in this study. Patients with cancer were enrolled in this study on the first day of anticancer agent administration. Patients with or without HF diagnosis were selected as depicted and ≥ 75-years-old. The impact of age on HF after the administration of anticancer agents was assessed using Kaplan-Meier analysis for the three age groups. The HF was more prevalent in the two older groups compared to the youngest age group during the observational period (Fig. 2). Next, we evaluated the risk factors of HF after treatment with anticancer agents. In the Cox's proportional hazard model after adjusting for sex and comorbidities, HF was more prevalent in the older groups than in the younger group; the hazard ratio was 1.23 (99%CI 1.14-1.33; P < 0.01) for the 65-74 years group and 1.67 (99%CI 1.55-1.80, P < 0.01) for ≥ 75 years group compared to the youngest group, respectively ( Table 2). Comorbidities of cardiovascular

Risk factors for HF development
We evaluated the risk factors of HF after treatment with anticancer agents by multivariable analysis. We investigated the impact of asymmetrical combinations of risk factors on HF in each of the three age groups and survival tree analyses based on the CART [15]. As the risk of developing HF depends on age (Fig. 2), we evaluated the structured risk factors with age stratification using the CART method (Fig. 3). Notably, the decision trees obtained by the CART method indicated that the risk factors for developing HF varied in each age group. IHD was the root node in the two older groups (Fig. 3). To better characterize the risk structures in the three age groups, we stratified the risk of HF development into three groups: low-risk (relative HR < 2 compared to the lowest HR within the corresponding age group), medium-risk (2 ≤ relative HR < 3), and high-risk (relative HR ≥ 3) (Fig. 3). In the group aged ≤ 64 years, the combinations of "HT, DM, and AF," "AF and male sex," and "DM, IHD and male sex" were associated with high risk. In the two older groups, IHD was the key factor associated with high risk. However, in the group aged ≥ 75 years, the combination of HT, DM, and CKD or hyperuricemia (HU) were also associated with high risk in the absence of IHD (Fig. 4). AF appeared in all three high-risk groups, and AF alone was sufficient to constitute a high-risk group. In this stratification of the three risk groups, each of the risk groups was constituted by various combinations of risk factors that were different among the age groups. Next, we performed a subgroup analysis of the top six cancer types with the highest numbers of patients to clarify risk factors common to all cancer types. IHD and AF were significantly associated with HF development in all cancer types (Table 3). In addition, the results for IHD and AF were similar to those from the CART analysis of all patients and belonged to the high-risk group.

Discussion
In the present study, we evaluated the prevalence and risk factors of HF in 140,327 patients with various cancers who were treated by anticancer agents in Japan using a comprehensive database. First, among patients treated for cancer, 4.0% experienced HF after anticancer therapy. Second, the presence of CVDs and their risk factors predicted HF development after treatment with anticancer agents, and the risk factors for HF formed context-dependent structures that were distinct among different age populations. The strength of our study was the large sample size used to show the prevalence and risk factors for HF after anticancer treatment.

HF prevalence and prognosis in cancer
We found that the prevalence of HF was 4.0% in the study population, which was apparently higher than that in the general population in Japan over the age of 45 (approximately 1.6%) [16] and in the United States over the age of 65 (approximately 1.0%) [17]. The apparently higher prevalence of HF in patients with cancer than in the general population is consistent with a prior study that showed an HF prevalence of 7.06% after the diagnosis of various types of cancer [18]. This may be explained by several factors. First, because both cancer and HF predominantly affect older people, a skewed proportion of older people in the study population may have contributed to the higher prevalence of HF. Second, the administration of anticancer agents and associated medical interventions such as high-volume hydration may increase the risk of HF after the administration of anticancer agents. Third, invasive surgical treatments also increase the risk of HF.

Risk stratification in our analysis
The risk factors for HF development in this study included cardiovascular comorbidities and their risk factors [19,20], which was consistent with previous reports regarding the risk factors for HF in general [21][22][23][24][25] and in patients taking anticancer agents [8,9,26,27]. Therefore, our findings suggest that patients with cancer benefit from careful monitoring for HF risk factors as well as early intervention. However, it is unknown whether all risk factors contribute equally to HF or which risk factors predominantly cause HF in patients with cancer. To address this problem, we performed a machine learning-based analysis of the structured risk factor for HF after administration of anticancer agents using the CART algorithm. Our analysis showed that the effect of a given risk factor depends on the context of other risk factors, as illustrated by the distinct decision trees among the three age groups in the CART analysis. This finding implies that better risk stratification for individual patients may be achieved when combinations of risk factor structure are considered. In particular, IHD and AF form the roots of high-risk groups, and early detection and treatment of these diseases are considered important. As the DPC data or other EHR data are continuously generated in day-to-day clinical practice, better risk stratification may be achieved as the size of the dataset and knowledge grow. Such risk stratification is essential for the realization of precision medicine. Notably, the varied and context-dependent risk structures revealed in this study would not be suitable for a simple and static scoring system. However, the EHR has several limitations in assessing various aspects of patient status in clinical practice. It is crucial to carefully design how EHRs can be converted into datasets that are suitable for analyses. Therefore, the optimization of data extraction is essential for obtaining clinically meaningful findings from large datasets, such as the DPC database. Careful and iterative design is required to extract appropriate datasets for machine learning of the structured risk factors, and a collaborative framework among clinicians, clinical scientists, data scientists, and bioinformaticians is important.

Study limitations
Several limitations of this study should be noted, including the nature of the DPC data structure and content. First, this was a retrospective study. DPC is a hospital-based, but not a patient-based, database that is primarily for recording medical procedures and costs but not clinical signs or laboratory data. Furthermore, medical records could not Fig. 3 CART analysis of risk factors for HF development. Risk factors for HF development in indicated age groups according to CART analysis. Patients were stratified according to the presence (y) or absence (n) of the corresponding risk factors. The number (N) of all patients and those who developed HF after starting the administration of anticancer agents (HF) are shown. The combinatorial risk, assessed by the relative hazard ratio (RHR), was expressed relative to the lowest hazard ratio in each age group. The combinatorial risk was stratified and color-coded into low-(1 ≤ RHR < 2, yellow), medium-(2 ≤ RHR < 3, orange), and high-risk (3 ≤ RHR, red) groups. The rightmost panels indicate RHR (red bars) and 95% confidence intervals (blue bars). *P < 0.05 and ***P < 0.001 compared to the lowest risk group (RHR = 1) in each age group ◂ Fig. 4 Graphical summary of risk structures for HF development. The risk factors for HF after treatment with anticancer agents formed context-dependent structures that were distinct among different age groups 1 3 be retrieved when patients moved from DPC institutions to other DPC or non-DPC institutions. Second, diagnoses in the DPC database were less defined or validated than those in retrospective patient record-based or prospective registry studies. Third, we enrolled only those patients treated with anticancer agents. Therefore, a comparison between patients treated with and without anticancer agents cannot be made to evaluate the true impact of a given anticancer agent. Fourth, in this retrospective study, we summarized the big data and determined the structured risk factors of HF after treatment with anticancer agents but did not perform prognostic prediction or validation studies. These are required to confirm whether an intervention to control HF risk factors can reduce the risk of HF development and improve the prognosis of patients treated for cancer in the near future. Fifth, we have no data regarding stage of cancer, which can be directly related to prognosis.

Conclusion
Using a comprehensive DPC database, the present study demonstrated that 4.0% of patients with cancer had HF after treatment with anticancer agents. The machine learning-based approach was able to develop complicated HF risk structures for these patients after age stratification. The findings obtained in the studies such as this one are essential to achieve precision medicine for better outcomes for patients with cancer.
Funding This study was partly supported by grant from the Kurume University Millennium Box Foundation for the Promotion of Science.