Characteristics of the study cohorts
The clinical information and DNA methylation data of FHS participants (Offspring Cohort Exam 8) were used to develop an HFpEF risk prediction model. After excluding samples that were censored, had unqualified DNA methylation data, or lacked medical information, a total of 968 eligible participants with complete information over a follow-up of 8 years were retained as the final samples (Fig. 1). Among them, 877 participants did not experience heart failure and 91 HFpEF events occurred. A total of 95 EHR variables (a simplified version is shown in Table 1; the full version is shown in Additional file 2: Table S1) and 402,380 CpGs were obtained for further analyses. The DNA methylation data were measured at two sites, the University of Minnesota (UMN; 738 no-CHF and 59 HFpEF) and Johns Hopkins University (JHU; 139 no-CHF and 32 HFpEF), so the two batches can be presumed to be independent datasets. The UMN batch was therefore used as the training set and the JHU batch as the testing set (Fig. 1; Table 1). Considering the limited sample size, we did not further balance the sample sizes. In the training and testing sets, the median follow-up period was 8.69 ± 1.25 years and 8.64 ± 2.05 years, the mean participant age was 64.68 ± 8.29 and 70.13 ± 8.91 years, and the proportion of male participants was 37.39% and 70.76%, respectively (Table 1).
Table 1 Demographics of participants in the training set and testing set (the simplified version)
Prediction model construction using DeepFM
After data pre-processing, we obtained 318 DMPs and 25 clinical characteristics (Additional file 2: Table S2). Next, we performed feature selection using the LASSO and XGBoost algorithms. The LASSO algorithm performs feature selection and regularization simultaneously, aiming to enhance the predictive accuracy and interpretability of a statistical model by retaining only a subset of the variables. Its key parameter, lambda, controls the feature selection. We obtained 4 sets of features according to the value of lambda (lambda.min and lambda.1se, each for the AUC and the misclassification error criteria) and took their intersection, which yielded 80 features (Fig. 2a–c). The XGBoost algorithm combines many weak classifiers through a regularized boosting technique to form a strong classifier. It took the 80 features from LASSO and further reduced them to 30 features, comprising 5 clinical variables and 25 CpG loci, which were then fed into the DeepFM model. The five clinical variables (age, diuretic use, body mass index (BMI), albuminuria, and serum creatinine) accounted for nearly 20% of the total contribution, as measured by the gain index (Fig. 2d). The CpG cg20051875 had the largest gain index, accounting for 13% of the total contribution. The 25 CpGs together accounted for 80% of the total contribution, although the contribution of each individual CpG was small.
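The two-stage selection described above can be illustrated with a minimal Python sketch. It assumes that the LASSO stage keeps the features common to all four candidate sets and that the XGBoost stage keeps the features with the largest gain; the text does not spell out the exact XGBoost cut-off rule, so the top-k rule is an assumption. All feature names and gain values are hypothetical toy values, not study data.

```python
# Hypothetical toy data: feature names and gain values are illustrative only.

def intersect_feature_sets(feature_sets):
    """Return the features present in every candidate set, sorted for stable output."""
    sets = [set(s) for s in feature_sets]
    return sorted(set.intersection(*sets))

def top_k_by_gain(gain, k):
    """Keep the k features with the largest gain, highest first (assumed rule)."""
    return sorted(gain, key=gain.get, reverse=True)[:k]

def gain_share(gain, features):
    """Fraction of the total gain attributable to the given features."""
    total = sum(gain.values())
    return sum(gain[f] for f in features) / total

# Four LASSO runs: lambda.min / lambda.1se crossed with AUC / misclassification error.
lasso_runs = [
    {"age", "bmi", "cg_a", "cg_b", "cg_c"},
    {"age", "bmi", "cg_a", "cg_b"},
    {"age", "bmi", "cg_a", "cg_b", "cg_d"},
    {"age", "bmi", "cg_a", "cg_b", "cg_c", "cg_d"},
]
shared = intersect_feature_sets(lasso_runs)

# Toy gain values standing in for a fitted boosted-tree model's gain index.
toy_gain = {"age": 10.0, "bmi": 5.0, "cg_a": 60.0, "cg_b": 25.0}
selected = top_k_by_gain({f: toy_gain[f] for f in shared}, k=3)
clinical_share = gain_share(toy_gain, ["age", "bmi"])
```

In this toy setup the clinical share of the gain is computed the same way as the "nearly 20%" figure reported in the text, that is, as the summed gain of the clinical variables divided by the total gain.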
Based on the DeepFM method, we developed the HFmeRisk model to investigate the feasibility of early-stage risk prediction for HFpEF using 25 DNA methylation sites and 5 clinical features. We also tested the performance of the DeepFM algorithm using the 5 clinical features alone or the 25 DNA methylation features alone. In the testing set, the AUCs for the HFmeRisk model, the model with EHR alone, and the model with CpGs alone were 0.90 (95% confidence interval [CI] 0.88–0.92), 0.78 (95% CI 0.73–0.82), and 0.65 (95% CI 0.62–0.67), respectively (Fig. 3a; Additional file 2: Table S3). Although the DNA methylation model alone achieved a lower AUC than the EHR model, the AUC improved when the two feature types were combined to form the HFmeRisk model. In summary, the “EHR + DNA methylation” model achieved the best AUC in most cases in the testing set.
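For readers less familiar with the AUC used to compare the three models, the following Python sketch computes it from its rank interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The function name and the toy labels and scores are hypothetical; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in the score count as one half of a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = HFpEF event, 0 = no event; scores are predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the sample size, which is acceptable for a testing set of a few hundred participants; a rank-sum formulation would be preferable for larger data.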
Calibration of the HFmeRisk model is shown in Fig. 3b. The Hosmer–Lemeshow statistic was 6.17, with P = 0.632, indicating that the HFmeRisk model is well calibrated in the testing set.
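As a rough check of the calibration result, the Python sketch below computes a Hosmer–Lemeshow statistic and its P value. It assumes the conventional 10 risk groups and therefore 8 degrees of freedom; the grouping is not stated in the text, so this is an assumption. The function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
import math

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square over bins of roughly equal size,
    formed after sorting the samples by predicted risk."""
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Reported statistic 6.17 with the assumed 8 degrees of freedom.
p_value = chi2_sf_even_df(6.17, df=8)
```

Under these assumptions the P value comes out at roughly 0.63, in line with the reported P = 0.632; the small difference is compatible with rounding of the reported statistic.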
Similarly, in the decision curve analysis (Fig. 3c), the HFmeRisk model showed a higher net benefit than the other models. The decision curve of the HFmeRisk model lies above the gray (“All”) and black (“None”) lines. Across most threshold ranges, patients would benefit more from the prediction of the HFmeRisk model than from the other schemes (the 5 EHR model and the 25 CpGs model).
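The net benefit plotted in a decision curve can be written as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below evaluates this quantity for a model and for the “All” strategy at one threshold; the “None” strategy has a net benefit of zero by definition. The function names and toy data are hypothetical and serve only to illustrate the standard formula.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'All' strategy: every participant is treated."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy data: 1 = event, 0 = no event; probabilities are predicted risks.
toy_y = [1, 0, 0, 1, 0]
toy_p = [0.9, 0.2, 0.6, 0.7, 0.1]
model_nb = net_benefit(toy_y, toy_p, 0.5)    # 2 true and 1 false positive
all_nb = net_benefit_treat_all(toy_y, 0.5)   # prevalence 0.4
```

Repeating the calculation over a grid of thresholds and plotting the results gives the decision curve; a model is useful at a threshold when its net benefit exceeds both the “All” value and zero.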
Evaluation of the HFmeRisk
We evaluated the performance of HFmeRisk in terms of the number of features, the effect of age, external data verification, comparison with other models, comparison with other omics features, and covariate shift between training and testing subjects. To evaluate the effect of the number of features on the HFmeRisk model, we selected the top 5, top 10, and top 15 features for further modeling and found that the number of features had a strong effect on the model results (Additional file 2: Table S4). These results suggest that the number of features in the model cannot be reduced further without sacrificing predictive performance.
Since age is a critical clinical characteristic in the prediction of HFpEF, it is particularly important to assess the impact of aging-related CpGs on the HFmeRisk model [27, 28]. We used the aging-related CpGs reported in 3 articles [29,30,31] to assess their predictive power and obtained AUCs of 0.655, 0.530, and 0.534 in the testing set, respectively (Additional file 1: Materials and Methods Section 3 and Additional file 2: Table S5), indicating that the 26 age-related CpGs reported by Hannum G et al. appeared to have predictive power comparable to that of the 25 CpGs in the HFmeRisk model (AUC = 0.65). However, when we combined the 26 age-related CpGs reported by Hannum G et al. with the 5 clinical features of the HFmeRisk model (age, diuretic use, BMI, albuminuria, and serum creatinine), we obtained an AUC of 0.858 in the testing set (Additional file 2: Table S5), which is lower than that of the HFmeRisk model (AUC = 0.90), indicating that the HFmeRisk model performed better in the testing set from the combined-feature perspective. A possible reason is that the 5 clinical variables we considered already include age: although the 26 age-related CpGs and the 25 CpGs in the HFmeRisk model had comparable predictive power on their own, the age-related CpGs showed no advantage when combined with the clinical characteristics (including age). In addition, the model using only clinical characteristics (age and the remaining four clinical variables) performed worse than the HFmeRisk model. We also performed a Pearson correlation analysis between the 25 CpGs and age in the training and testing sets, and the absolute value of each correlation was less than 0.24 (Additional file 2: Table S6). Finally, when we predicted HFpEF using age alone, the AUC was 0.68 (Additional file 2: Table S5), which further confirms that age has some predictive power but does not predict HFpEF well on its own.
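The correlation screen between CpG methylation and age can be sketched in Python as follows. The function names, CpG labels, and values are hypothetical toy data; the sketch only shows how a maximum absolute Pearson correlation, such as the reported bound of 0.24, would be obtained.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def max_abs_correlation(cpg_values, age):
    """Largest |r| between age and any CpG; cpg_values maps name -> values."""
    return max(abs(pearson_r(values, age)) for values in cpg_values.values())

# Toy methylation beta values for two hypothetical CpGs across four participants.
toy_age = [50, 60, 70, 80]
toy_cpgs = {
    "cg_x": [0.1, 0.2, 0.3, 0.4],   # perfectly correlated with age
    "cg_y": [0.4, 0.1, 0.4, 0.1],   # weakly, negatively correlated
}
strongest = max_abs_correlation(toy_cpgs, toy_age)
```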
To evaluate the impact of the training set sample size on the HFmeRisk model, we randomly selected 25%, 50%, 60%, and 75% of the training set participants and found that performance in the testing set remained stable regardless of the training set size, indicating that the prediction results were largely insensitive to the sample size of the training set (Additional file 2: Table S7).
Because DNA methylation data are not currently available in prospective cohort populations and the HFmeRisk model contains five clinical features, there are currently no suitable datasets in public databases that could serve as external testing sets. To further illustrate the validity of the HFmeRisk model, we evaluated it using 38 Framingham Heart Study participants who were not used to build the HFmeRisk model (36 patients who had developed HFpEF and 2 who had not after 8 years) and obtained an AUC of 0.82 (Additional file 3: Fig. S1). This evaluation of 38 samples provides supporting evidence that the predictive power of the HFmeRisk model for HFpEF is reliable.
In addition, we compared the performance of the HFmeRisk model with nine widely used benchmark machine learning models (Additional file 1: Materials and Methods Section 2). Their AUCs varied (AUC = 0.63–0.83) when using the same 30 features, and the DeepFM model still achieved the best performance (AUC = 0.90, Additional file 3: Fig. S2 and Additional file 2: Table S3). We also compared the machine learning model with the Cox regression model, a common model for disease risk prediction. Selecting variables with P < 0.05 in univariate analysis for subsequent multivariate analysis would require an enormous amount of screening on the 450 K DNA microarray data, so we directly used the 30 features obtained by dimensionality reduction for the multivariate Cox regression analysis. The performance of the models was compared using the C statistic or AUC, and the DeepFM model (AUC = 0.90) performed better than the Cox regression model (C statistic = 0.85). Calibration was also assessed by comparing predicted and observed risk (Hosmer–Lemeshow P = 0.199). The calibration curves for the 8-year risk prediction of HFpEF showed good concordance between the predicted and observed results (Additional file 3: Fig. S3).
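The C statistic reported for the Cox model is the survival-analysis counterpart of the AUC. A minimal Python sketch of Harrell's C is given below, assuming the common definition in which a pair is comparable when the participant with the shorter follow-up time had an observed event. The function name and toy data are hypothetical, and the sketch ignores tied event times for simplicity.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter time than subject j. It is concordant when subject i also has the
    higher predicted risk; ties in predicted risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy follow-up data: time in years, event flag (1 = HFpEF, 0 = censored).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.3, 0.5, 0.1]
toy_c = harrell_c(toy_times, toy_events, toy_risk)  # 4 of 5 pairs concordant
```

Because the C statistic accounts for censoring and follow-up time while the AUC treats the outcome as binary, the two values are comparable only approximately.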
To assess whether other omics data could also predict HFpEF, HFmeRisk was compared with other omics models (the “EHR + RNA” model and the “EHR + microRNA” model). For both models, we used the same feature selection and modeling approach as for the HFmeRisk model (Additional file 1: Materials and Methods Sections 4 and 5; Additional file 3: Fig. S4–S9). The AUC results show that the HFmeRisk model, which combines DNA methylation and EHR, performed best under current conditions compared with the “EHR + RNA” model (AUC = 0.784; Additional file 3: Fig. S6) and the “EHR + microRNA” model (AUC = 0.798; Additional file 3: Fig. S9), suggesting that DNA methylation is more suitable than RNA for predicting CHF risk.
To test whether the training subjects and the testing subjects are sufficiently similar in terms of clinical parameters, which is equivalent to determining whether a covariate shift has occurred, we used adversarial validation to test whether the distributions of the training and testing sets are consistent. If a covariate shift occurs in the data, a classifier can in principle distinguish the training data from the testing data with high accuracy. Here, the AUC and the Matthews correlation coefficient (MCC) were used to measure the results [32]. The MCC threshold is generally set to 0.2, with MCC > 0.2 indicating covariate shift. The MCC for the training and testing subjects was 0.105 and the AUC was 0.514 (Additional file 1: Materials and Methods Section 6; Additional file 3: Fig. S10), indicating that no covariate shift occurred and that the training set and the testing set follow similar distributions.
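The decision rule of the adversarial validation can be sketched in Python as follows. In this setup a classifier is trained to guess whether a sample comes from the training or the testing set, and the MCC of its guesses is compared with the 0.2 threshold. The classifier itself is omitted; the function names and the toy labels and guesses are hypothetical.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def covariate_shift_flag(mcc, threshold=0.2):
    """True when the adversarial classifier separates the two sets too well."""
    return mcc > threshold

# Adversarial labels: 1 = sample came from the testing set, 0 = training set.
toy_origin = [0, 0, 0, 0, 1, 1, 1, 1]
toy_guess = [0, 1, 0, 1, 1, 0, 1, 0]   # no better than chance
toy_mcc = matthews_corrcoef(toy_origin, toy_guess)
shift_in_toy = covariate_shift_flag(toy_mcc)
shift_reported = covariate_shift_flag(0.105)  # MCC value reported in the text
```

An MCC near 0 together with an AUC near 0.5 means the classifier cannot tell the two sets apart, which is the pattern reported above.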
HFmeRisk model is superior to the published CHF risk prediction models
Furthermore, we compared the performance of the HFmeRisk model with that of published CHF risk prediction models. William B. Kannel et al. proposed a 4-year risk appraisal model (using 9 EHR features) to assess the risk of CHF by gender in the FHS cohort using a mixed logistic regression algorithm [33]. Since we used the same FHS cohort to build our model, both models can be evaluated on the same participants. Due to data limitations, the reconstructed William’s model contains only 79 participants (52 males and 27 females). Detailed characteristic information is listed in Additional file 1: Materials and Methods Section 7. The AUCs for the HFmeRisk model and William’s model were 0.99 and 0.74 for males and 0.94 and 0.89 for females, respectively (Fig. 3d). In the HFmeRisk model, the numbers of male and female participants differ but the AUCs are similar, which suggests that the model is not sensitive to gender. Additionally, adding gender as a feature to the HFmeRisk model did not improve performance in the testing set (Additional file 2: Table S8). Since our data did not include the features used in other published articles, we compared our results directly with the AUC or C statistic reported in two published articles. Sadiya S. Khan et al. described 10-year risk equations for CHF (using 10 EHR features) with a C-statistic of 0.71–0.87 in the validation set, and Edward Choi et al. established an early detection model of CHF (using 58,652,000 medical codes) with an AUC < 0.88 in the testing set [10, 34]. These reported values are all lower than the AUC of HFmeRisk, suggesting an advantage of risk prediction that combines DNA methylation and clinical features.
Biological functions of CpGs involved in HFmeRisk model
Next, we investigated the biological functions of the 25 CpGs in the HFmeRisk model. Approximately 2/5 of them were located in promoter regions (TSS200, TSS1500, 5′UTR, and 1stExon). Most of the CpG loci were located in CpG islands or the “Open sea”, and in total they mapped to 17 genes and 8 intergenic regions (Table 2). Among them, the DNA methylation levels of cg10083824 and cg03233656 were significantly negatively associated with the expression of their target genes, GRM4 (R = −0.38, p = 0.0054) and SLC1A4 (R = −0.31, p = 0.025), respectively, in HFpEF participants, whereas the association among normal participants was not obvious (Fig. 3e). This implies a regulatory relationship between DNA methylation and gene expression. The corresponding genes were involved in 16 gene ontology terms (Fig. 4a; Additional file 2: Table S9) and 10 KEGG pathways (Fig. 4b; Additional file 2: Table S10). Overall, they have key functions in intercellular signaling, interaction, and energy metabolism, and are involved in pathways of the urea cycle (SLC25A2/cg05845376) [35], the synthesis of cytochrome enzymes (CYP2E1/cg21024264) [36], amino acid metabolism (MRI1/cg25755428, GRM4/cg10083824, and GRIK4/cg06344265) [37], amino acid transport (SLC1A4/cg03233656) [38], and amino acid activation (GARS/cg21429551) [39] (Fig. 4c, d; Additional file 2: Table S11–S12; Additional file 3: Fig. S11). Together, these findings provide biological evidence in support of the HFmeRisk model.
Table 2 The 25 CpGs associated with the HFmeRisk model
Furthermore, we explored the relationship between the genes in which the 25 CpGs are located and diseases or traits by intersecting them with published GWAS results. These genes were reported to be associated with risk factors for heart failure, such as BMI (GRM4, SLC25A2, and ZBTB20) [40], systolic blood pressure (SLC1A4, ZBTB20, and SLC25A2) [41], ejection fraction (SLC1A4 and DLGAP1) [42], atrial fibrillation (SLC25A2 and SLC1A4) [43], coronary artery disease (ZBTB20 and SLC25A2) [44], type 2 diabetes (ZBTB20) [45], cardiac troponin T levels (DLGAP1) [46], diastolic blood pressure (RHOBTB1) [47], and gout (CYP2E1) [48], supporting the scientific validity of the CpGs in the model for CHF risk prediction.