Background

Arterial hypertension affects over a billion people worldwide, with an estimated overall prevalence in adults of around 30–45% [1, 2]. This is one of the major risk factors for multiple cardiovascular and renal disorders and represents one of the most preventable causes of morbidity and mortality. Around 5 to 10% of arterial hypertension cases are estimated to have secondary causes, most commonly due to parenchymal renal disease, renovascular hypertension, obstructive sleep apnea, or endocrine diseases, such as Cushing’s syndrome (CS), primary aldosteronism (PA), and pheochromocytoma/paraganglioma (PPGL) [3, 4]. Diagnostic screening for secondary hypertension is complex and expensive [5] and generally restricted to patients with clinical signs, including hypertension in young adults, sudden worsening of blood pressure in normotensive subjects, and drug-resistant hypertension, among others [3, 4]. However, the prevalence of secondary causes of hypertension is often underestimated and many cases remain unrecognized [6,7,8]. Early detection of endocrine forms of hypertension is crucial to control high blood pressure and prevent hypertension-mediated organ damage and related cardiovascular complications, as well as achieve effective long-term treatment [9]. Moreover, hormone excess may increase individual risk of other consequences beyond hypertension, particularly in the case of CS or PA [10,11,12].

Currently, the diagnosis of endocrine hypertension (EHT) relies on hormonal evaluations, with specific diagnostic algorithms for each type of endocrine disorder. Hormone assays present some limitations, including (i) common borderline values, especially in mild forms of over-secretion, where the diagnosis may be missed or wrongly called; (ii) lack of direct estimation of the individual risk to hormone exposure, despite the important inter-individual variability of hormone excess consequences; (iii) recommendation for multiple hormonal tests, with screening and confirmatory strategies still debated, and varying from one center to the other [13]. New biomarkers could potentially allow to directly measure tissue exposure to hormone excess, with the potential improvement of diagnostic accuracy and prediction of individual susceptibility to different consequences.

Circulating biomarkers in the blood have the advantage of being non-invasive when utilized. Leukocyte DNA methylation is particularly convenient as a biomarker, being a chemically stable yet dynamic biological hallmark, with a key role in epigenetic regulation in both health and disease [14]. Recent epigenome-wide association studies exploring leukocyte DNA methylation in hypertensive patients versus normotensive controls identified different loci associated with blood pressure regulation [15,16,17]. However, it is not yet established whether blood methylation profiles could also discriminate patients with EHT from those with primary hypertension (PHT). Indeed, hormone excess may impact peripheral tissues at the epigenetic level, measurable as hormone-specific methylome signatures. This hypothesis is supported by a recent work, in which we were able to identify a blood methylome signature of CS [18].

In the present study, we explored leukocyte methylation profiles in hypertensive patients. Specifically, we analyzed whole blood methylome signatures in patients with PHT or EHT related to CS, PA, or PPGL.

Results

Patients

Blood samples were collected from 255 patients. Patients had been diagnosed either with primary hypertension (PHT, n = 42), or endocrine forms of hypertension (EHT, n = 213), either related to the presence of CS (n = 57), PA (n = 101), or PPGL (n = 55). Each group, except PA, showed a predominance of female patients, and the mean age was lower in patients with EHT than in patients with PHT (p < 0.01; Table 1).

Table 1 Sex and age distribution in the different hypertension types

Whole blood DNA methylome profile in hypertension

Whole-genome blood DNA methylome was determined for the 255 samples, with 731,635 informative CpG sites in all samples. Unsupervised principal component analysis showed a distribution of samples with separation depending on the type of hypertension (Fig. 1a). The main components of variability were associated with the type of hypertension, but also white blood cell count variation and, to a lower extent, with age and sex (Additional file 1: Fig. S1).

Fig. 1
figure 1

Global structure of blood DNA methylation profile in different hypertension types. a Sample projections based on the first two principle components (PC1, PC2) of unsupervised PCA performed on the whole dataset (n = 731,635 CpG sites, n = 255 samples). The center of each group is indicated by the larger white circles. b Boxplot of the most variable CpG sites (n = 48,452 with M-Value SD > 0.4) in the four hypertension types. #CS, PA and PPGL were aggregated into a single EHT type, with random sampling of 55 samples from each group. *Student’s t-test p value < 0.05

The mean methylation level of the most variable CpG sites among all samples (n = 48,452) was not significantly different between the different types of hypertension, except in the group of CS patients, where methylation was decreased (Fig. 1b).

The methylome signature of endocrine hypertension

Comparing the 731,635 informative CpG sites of each type of EHT to PHT, differentially methylated CpG sites were identified in EHT (n = 123,288), CS (n = 197,897), PA (n = 114,988), and PPGL (n = 66,976) versus PHT, respectively (Additional file 2: Table S1). Differentially methylated CpG sites were distributed all over the genome (Fig. 2c–f), and the proportion of hypo- and hyper-methylated CpG sites was not strongly related to CpG genomic density (Fig. 2a), nor to CpG proximity to genes (Fig. 2b). Gene set enrichment analysis of genes associated with the differentially methylated CpG sites in the four comparisons revealed enrichment in pathways related to blood pressure regulation mechanisms and hypertension, including MAPK, Rap1, phospholipase D and calcium signaling pathways [19,20,21,22,23] (FDR < 0.001 in all comparisons; Additional file 3: Table S2).

Fig. 2
figure 2

Characteristics of differentially methylated CpG sites in endocrine hypertension. a Distribution of differentially methylated CpG sites related to CpG genomic density. b Distribution of differentially methylated CpG sites related to genes structure. cf Manhattan plots representing the genomic distribution of differentially methylated CpG sites for each type of endocrine hypertension, in the comparisons CS vs. PHT (c), PA vs. PHT (d), PPGL vs. PHT (e), EHT vs. PHT (f). The most differentially methylated genes (mean –log10(p value) > 11, Limma) are highlighted in black

Beyond the analysis of individual CpG sites, a specific analysis of differentially methylated genes was performed (Table 2 and Additional file 4: Table S3), highlighting the implication of specific genes for each type of EHT (Fig. 2c–f). The most significantly differentially methylated genes were FKBP5 in CS and CD300LG in PA.

Table 2 Top 15 genes associated with the most differentially methylated CpG sites for each endocrine hypertension type

Prediction of endocrine hypertension

Four different machine learning methods—Lasso, Logistic Regression, Random Forest, Support Vector Machine—were used to build a prediction model for each type of endocrine hypertension on subsets of samples (training cohorts), and subsequently tested on remaining samples (validation cohorts). The prediction performance was better for individual types of EHT (CS, PA and PPGL) than for predicting EHT as a whole. Indeed, balanced accuracy against PHT reached 0.95 for CS using SVM, 0.88 for PA using Lasso, 0.83 for PPGL using RF, and 0.74 for EHT using LR (Fig. 3a; selected CpG sites in Additional file 1: Table S4). Misclassified samples (false negatives and false positives) mostly showed a positioning on the global methylome space in regions of overlapping between different hypertension types (Additional file 1: Fig. S2). In the two comparisons with sample numbers imbalance (PA vs. PTH and EHT vs. PHT), down- and up-sampling were used to further explore prediction score, with no major impact on performance (Additional file 1: Fig. S3). The selected sets of CpG sites were specific for each endocrine hypertension type (Fig. 3b). These specific CpG sites, except for just one CpG site in CS, were all present among the differentially methylated CpG sites selected for each type of endocrine hypertension (Additional file 2: Table S1).

Fig. 3
figure 3

Prediction of endocrine hypertension by blood methylome marks. a Heatmap of prediction scores in validation cohorts for each type of endocrine hypertension (CS, PA, PPGL, EHT) versus primary hypertension (PHT), obtained with different machine learning approaches. RF random forests, SVM support vector machine, LR logistic regression, Lasso Least Absolute Shrinkage and Selection Operator. b Venn diagram of top performing features selected for each comparison (prediction score of reference: balanced accuracy)

Discussion

In this work, we were able to identify a blood methylome signature discriminating different types of EHT -including CS, PA and PPGL- from PHT. These different conditions are currently diagnosed with specific hormone measurements, representing the gold standard [9]. However, hormone assays are not always conclusive, with accuracies far from perfect [24,25,26], and quite expensive. This leads to screening strategies that target hypertensive patients with the highest risk of endocrine etiology [3, 4]. These probabilistic approaches could miss cases compared to a full screening of the entire hypertensive population [27]. Specific non-hormonal biomarkers could potentially help in situations where (i) hormone levels are not applicable—for instance, exogenous glucocorticoid administration; (ii) hormone dosages are not informative, due to situations interfering with hormone assays [28]; (iii) hormone dosages are borderline, not providing a proper classification of patients. Some non-hormonal markers of endocrine hypertension have been proposed, including specifically targeted metabolomics [29], and miRNAs [30, 31]. Blood methylation marks may also be suitable for direct measurement of hormone excess on peripheral tissues. Indeed, DNA methylation is dynamic but stably preserved [14] and could potentially reflect the individual impact of hormone excess over time better than hormonal evaluation, that need to be standardized for time, feeding-status, salt and fluid intake, and co-medication, among others.

Beyond the specific benefits for diagnosis, massive molecular screenings in common diseases like hypertension are expected to improve personalized medicine, through the identification of features predicting response to treatment, or specific complications. Addressing these aims will require specific study designs, focused on follow-up. For these up-coming studies, robustness and statistical power will improve by restricting the number of candidate features to be tested. This prior feature selection will be helped by this work and others, providing lists of CpG sites associated with each type of hypertension or with each type of hypertensive treatment [32].

Of note, the methylome signatures identified here are impacted by white blood cell count. Indeed, neutrophils increase and lymphocytes decrease in case of CS and, to a lesser extent, of PA [33,34,35,36]. However, we showed that white blood cell count was contributing to a limited part of methylome variability. To consider this effect, the methylome signatures provided here were adjusted for white blood cell count. The proportion of neutrophils was used as a unique proxy for this adjustment, given the high correlation with proportions of lymphocytes and lymphocyte subtypes.

Here, we identified blood DNA methylation loci associated with EHT. The blood DNA methylome profile associated with the different types of hormonal excess was also explored, with the identification of genes differentially methylated, such as FKBP5 in CS and CD300LG in PA. FKBP5 encodes a co-chaperone regulating the glucocorticoid receptor activity [37], and its increased expression and decreased methylation in blood have already been described in CS [18, 38]. Moreover, stress-induced epigenetic up-regulation of FKBP5 has been associated with an increased risk of cardiovascular events [39]. CD300LG encodes a vascular endothelial cell adhesion molecule, which is implicated in leukocyte binding and transmigration, and whose expression seems to be differently influenced by the local environment in different tissues [40]. Aldosterone is known to induce endothelial dysfunction [41] and to promote a pro-inflammatory phenotype, with vascular infiltration of monocytes, macrophages, and lymphocytes, as demonstrated in both animal models of hypertension [42, 43] and human vascular endothelial cells [44]. Whether CD300LG possibly contributes to the effect of aldosterone on vascular endothelium and lymphocyte migration, and whether this association represents a specific trait of aldosterone-induced hypertension, remains to be established. One study explored CD300LG variants in a mouse model of hypertension, but with uncertain conclusions [45]. For a majority of other genes, the biological relevance in endocrine hypertension remains to be explored, including the impact of DNA methylation of gene expression.

The hypothesis of a common blood methylome signature of EHT was also addressed. When we compared CS-, PA- and PPGL-related signatures, a limited number of methylation marks were shared, suggesting the prominence of individual endocrine hypertension signatures over a common signature of EHT. However, sample projections based on the variability of methylation levels showed overlap between PA and PPGL, and a tendency of CS to cluster separately. Indeed, secondary aldosteronism is common in PPGL [46] and could in part explain this similarity with PA. To which extent an endocrine-related molecular signature of hypertension shared between CS, PA and PPGL, but distinct from PHT, is induced remains to be determined. By directly comparing EHT as a whole to PHT, we provide here a set of methylome marks, which may facilitate the identification of such molecular signatures.

Whether these methylation marks better reflect the risk of complications related to hormone excess also remains to be elucidated. Of note, the methylation signatures identified here were based on whole methylome analysis. Routine use of such a tool would be limited by cost and complexity. Reliable biomarkers should allow the prediction of a clinical endpoint of interest more easily, at a lower cost and over a shorter time span than the direct measurement of the clinical end point [47]. For methylation marks, this would imply a technology transfer to targeted measurements, such as pyrosequencing, methylation specific-MLPA, methylation specific-high resolution melting analysis, or nanopore sequencing [48, 49]. Setup and validation of suitable targeted assays have to be performed for the identified methylome marks of EHT.

A potential interest of selected methylome marks could be the integration in a targeted multi-omic chip for the diagnosis of endocrine hypertension (ENSAT-HT project; http://www.ensat-ht.eu). Whether these methylation marks could positively impact the prediction of a multi-modal classifier remains to be determined, especially for sensitive detection of the rarest forms of endocrine hypertension, namely CS and PPGL.

This work aimed to develop machine learning models with methylome data to predict endocrine hypertension. This approach might be seen as too simple regarding the pathophysiological complexity of these conditions, and the extensive and complex diagnosis workup currently needed to call these conditions. However, machine learning could help in leading a step forward toward a robust and accurate screening tool and could provide a unique opportunity to understand the complex relationships within an omic dataset and a complex phenotype. On the other side, this data-driven strategy may be impacted by biases, including recruitment biases—in expert centers and optimal conditions-, and representation biases—considering the rare prevalence of CS or PPGL compared to the other types of hypertension. This latter risk is mitigated here by the study design splitting samples into independent training and validation cohorts. Finally, the heterogeneity of sample sizes for the different types of endocrine hypertension may impact the machine learning process. However, the group of EHT was balanced in terms of endocrine hypertension types when analyzed as a whole, and optimized strategies were used for unbalanced comparisons, including balanced accuracy measurement, or down- and up-sampling of the training cohorts.

Conclusions

Endocrine hypertension can be identified by blood DNA methylation markers, with specific methylation signatures for each type of endocrine disorder.

Methods

Patients and samples

A total of 255 blood samples were collected in eight different centers, as part of the European ENSAT-HT study (http://www.ensat-ht.eu). The cohort included patients with PHT (n = 42) and patients with different types of EHT, namely CS (n = 57), PA (n = 101), or PPGL (n = 55) (Additional file 5: Table S5). The diagnosis was made according to the current guidelines for screening and management of each specific disease [4, 50,51,52]. Diagnosis of PHT also required the exclusion of endocrine hypertension and other secondary causes (renal disease, pharmacological cause and obstructive sleep apnea syndrome) as well as the exclusion of patients with low-renin hypertension. Patients with uncertain diagnosis, those with pregnancy, severe comorbidities (including heart failure, chronic kidney disease, active malignancy) were also excluded.

Whole-genome DNA methylation measurement

Leukocyte DNA was extracted from EDTA blood samples, using the DNA Isolation kit for Mammalian Blood (Roche, Basil, Switzerland). DNA quality was assessed on a Genomic DNA ScreenTape system (Agilent, Santa Clara, CA, US) and quantified using a Qubit 3.0 Fluorometer (Thermofisher, Waltham, MA, US). DNA was treated by bisulfite and then hybridized to the Infinium MethylationEPIC BeadChip (Illumina, San Diego, CA, US; ~ 865,000 sites), starting from 500 ng of DNA. All experiments were performed following the manufacturer’s instructions at the P3S Post-Genomic Platform of Sorbonne University (Paris, France).

Bioinformatics and statistics

All samples passed the quality controls provided by the Genome Studio software (v. 2011.1; Illumina). Data were exported in Intensity Data (IDAT) format and then processed using the minfi package (v. 1.32.0) [53] in the R software environment (v. 3.6.3) (https://cran.r-project.org/). An additional quality control of samples was performed based on probe detection quality, confirming the good quality of each sample (mean detection p value < 0.05).

Data were normalized using the stratified quantile normalization procedure implemented in the “preprocessQuantile” minfi function [54] and the methylation score for each CpG probe was extracted as a β-value. Then, the ChAMP package (v. 2.16.1) was used to filter the probes [55]. A total of 731,635 CpG sites passed the following criteria: detection p value < 0.01, presence of the targeted CpG site, absence of frequent SNPs in the probe, single hybridization hit, autosomal target. Of note, filtering the probes before normalization versus after normalization did not significantly impact the values (data not shown).

The significant components of variation in the dataset were assessed using the singular value decomposition method (SVD) for methylation data [56], and a detected batch effect (Slide) was corrected using the “ComBat” method [57], as implemented in the ChAMP package.

White blood cell count of subpopulations (neutrophils, lymphocytes B, lymphocytes T4, lymphocytes T8, lymphocytes NK, monocytes) were estimated by the reference-based “RefbaseEWAS” method [58] implemented in the ChAMP package (Additional file 6: Table S6). Since neutrophils were the most represented cell type in all samples, and since the proportions of neutrophils and lymphocytes were anti-correlated (Pearson’s r = − 0.96; Additional file 1: Fig. S1c, d), the estimated proportion of neutrophils was chosen as the unique proxy reflecting variations in white blood cell count.

M-values, used for statistical analysis, were calculated from β-values (log2 ratio of the intensities of methylated vs. unmethylated probes) using the lumi package (v. 2.36.0) [59].

The global data structure was assessed on β-values by principal component analysis (PCA), using all the CpG probes. The most variable CpG probes were selected based on their Standard Deviation (SD cutoff: 0.4, n = 48,452 probes) among all samples. Differentially methylated CpG sites were identified using the Limma package (v. 3.40.6) [60] for each of the following comparison: EHT vs. PHT, CS vs. PHT, PA vs. PHT, PPGL vs. PHT. For the EHT vs. PHT comparison, the EHT group was obtained by a random sampling of CS, PA and PPGL (n = 55 samples for each type of endocrine hypertension), to avoid imbalance between the endocrine hypertension types. The estimated neutrophil count and age were included as covariates and CpG sites were selected based on an adjusted p value < 0.05 (Benjamin-Hochberg correction procedure; Additional file 2: Table S1). Gene set enrichment analysis of genes associated with differentially methylated CpG sites was performed using the “gometh” method (KEGG chosen as collection of pathways to test) implemented in the missmethyl package (v. 1.18.0) [61], adjusting for the number of CpG sites associated to each gene [62] (Additional file 3: Table S2). For each endocrine hypertension type, differentially methylated genes were selected when they presented at least three differentially methylated CpG sites in each comparison and a − log10(p value) > 5 (Additional file 4: Table S3).

For predicting endocrine hypertension, the same four comparisons described above were considered for supervised learning (CS vs. PHT, PA vs. PHT, PPGL vs. PHT, and EHT vs. PHT). For each comparison, training (80%) and validation (20%) samples were randomly selected by maintaining the initial proportion of samples in each hypertension type in each comparison (Additional file 1: Table S7). Starting from the most variable CpG sites previously selected (n = 48,452 probes with M-value SD > 0.4), the most discriminating CpG sites for each comparison (n = 200 for CS vs. PHT, n = 141 for PA vs. PHT, n = 70 for PPGL vs. PHT, and n = 135 for EHT vs. PHT) were pre-selected from the training sets using the boruta package (v. 0.3) [63] (Additional file 1: Table S8). For each comparison, four different classifier methods were then used: (i) penalized Lasso (Least Absolute Shrinkage and Selection Operator) regression with tenfold cross-validation, using the glmnet package (v. 4.0–2) [64]; (ii) Support Vector Machine (SVM) [65]; iii) Random Forest (RF) [66]; iv) Logistic Regression (LR) [67] with Fast Correlation-Based Filter (FCBF) [68] using the orange toolbox (v. 3.30.1) [69] (Additional file 1: Table S9). The trained models were tested on the validation sets for each comparison, and the prediction performance was evaluated using balanced accuracy, sensitivity, specificity, F1, Kappa, and AUC scores. The model training was further evaluated using up-sampling and down-sampling (70).

Group comparisons were performed using Student’s t test for variables normally distributed, or chi-square test for binary categorical variables. All tests were computed in the R software environment.