Background

Primary aldosteronism (PA) is one of the most common causes of secondary hypertension. The estimated prevalence ranges from 3–6% in primary care [1, 2] to 5–11% in referred hypertensive patients [3, 4] and is even higher in patients with resistant hypertension (more than 11%) [5]. Diagnosing PA is important as targeted therapy may lower the increased risk for cardiovascular events and target organ damage [6] in these patients.

As supported by the Endocrine Society Clinical Practice Guideline, diagnosis of PA is generally made by a sequence of blood tests: an elevated aldosterone-to-renin ratio (ARR) [7], followed by confirmation of autonomous aldosterone production by non-suppressible aldosterone levels [8]. Many antihypertensive medications change renin and aldosterone levels, falsely increasing or decreasing the ARR, which leads to false positive or false negative test results [9, 10]. Therefore, aldosterone and renin levels are usually determined after cessation of antihypertensive medications that affect the ARR [8]. However, medication washout raises safety concerns in patients with severe hypertension or individuals with a recent cardiovascular event, in whom tight blood pressure (BP) control is necessary. In addition, a substantial number of patients experience symptoms during the washout period, such as (worsening of) headache, fatigue or palpitations.

Since diagnostic testing is burdensome, potentially harmful and costly, it is necessary to select carefully which patients to test further for PA. The Endocrine Society Clinical Practice Guideline recommends diagnostic testing in all patients with ‘an increased risk of having PA’. This includes patients with hypertension and either sustained BP above 150/100 mmHg, spontaneous or diuretic-induced hypokalemia, adrenal incidentaloma, obstructive sleep apnea or a family history of early onset hypertension or cerebrovascular accident, as well as patients with resistant hypertension and first-degree relatives of patients with PA [8]. The majority of hypertensive patients referred to hospitals fulfil at least one of these criteria, meaning that most referred patients should be tested for PA. With a prevalence of 4–10%, this strategy yields many negative test results and thus leads to unnecessary testing and costs. This may be one of the reasons that the Clinical Practice Guideline is poorly adopted and that many patients with hypertension are left unscreened for PA [11]. Therefore, we aimed to develop and validate a clinical decision tool to determine which patients with difficult-to-control hypertension have a low probability of PA and do not need to undergo intensive testing. In this way we aim to limit exposure to invasive testing while increasing the efficiency of testing in the remaining patients.

Methods

Study population

A total of 1125 hypertensive patients were referred to the University Medical Center Utrecht between January 2010 and October 2017. Patients with difficult-to-control hypertension, defined as persistent hypertension despite treatment according to the current guidelines and/or presence of (sub)clinical vascular disease [12], were eligible for this cross-sectional study. These patients reflect the group of patients commonly referred by general practitioners. Patients with difficult-to-control hypertension underwent an extensive, standardized diagnostic protocol to evaluate the cause of hypertension. The diagnostic protocol has been outlined in detail previously [13]. In general, patient selection for this protocol was in line with the Endocrine Society Clinical Practice Guideline, which recommends widespread screening for PA. Since participants in this study were not subject to procedures and were not required to follow rules of behaviour outside the scope of routine clinical practice, no formal consent was required [14, 15]; this was approved by the institutional ethics committee (Medisch Ethische Toetsingscommissie Utrecht, University Medical Center Utrecht, Utrecht, The Netherlands). Patients who were normotensive on 24-h ambulatory BP measurement, patients with an evident and treatable cause of hypertension such as steroid or excessive liquorice use, and patients with a recent cardiovascular event in whom withdrawal of antihypertensive medication was not safe did not undergo the extensive, standardized diagnostic work-up and were excluded from this study (n = 299).

Baseline measurements

During the first hospital visit, information on patient demographics, medical history and medication use was collected. Office BP was measured with an automated oscillometric device (Omron M7 Intelli IT, OMRON Healthcare, Hoofddorp, Netherlands; or WatchBP Office, Microlife, Widnau, Switzerland) after five min rest while the patient was seated. Three readings were recorded, simultaneously on both arms and one min apart. The third BP measurement on the arm that measured the highest BP was recorded. 24-h ambulatory BP measurement (WatchBP O3 Ambulatory, Microlife, Widnau, Switzerland) was also performed on the arm that measured the highest BP and recorded BP every 20 to 30 min, day and night. Screening for obstructive sleep apnea was performed with the Philips questionnaire. Patients at intermediate or high risk for obstructive sleep apnea as determined by this questionnaire were additionally screened by the overnight RUSleeping RTS [16]. Patients with more than 15 events per hour were considered to have probable obstructive sleep apnea. Blood tests were performed and included fasting glucose, HbA1c, sodium, potassium, creatinine, cholesterol levels, TSH and fT4. Urinary sodium, albumin and creatinine were determined in the first morning-void urine sample. Calculated albumin-to-creatinine ratios were categorized into category 1 (< 3 mg/mmol), category 2 (3–30 mg/mmol) or category 3 (≥30 mg/mmol).
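The albumin-to-creatinine categorization described above can be sketched as follows. This is a minimal illustration, not the study's own code: the function names are hypothetical, urinary albumin is assumed to be in mg/L and urinary creatinine in mmol/L, and a value of exactly 30 mg/mmol is assigned to category 3, following the stated "≥30" threshold.

```python
def albumin_creatinine_ratio(albumin_mg_per_l: float, creatinine_mmol_per_l: float) -> float:
    """Return the urinary albumin-to-creatinine ratio in mg/mmol (hypothetical helper)."""
    if creatinine_mmol_per_l <= 0:
        raise ValueError("urinary creatinine must be positive")
    return albumin_mg_per_l / creatinine_mmol_per_l


def acr_category(acr_mg_per_mmol: float) -> int:
    """Map an ACR value to category 1, 2 or 3 using the thresholds given in the text."""
    if acr_mg_per_mmol < 0:
        raise ValueError("ACR cannot be negative")
    if acr_mg_per_mmol < 3:
        return 1  # < 3 mg/mmol
    if acr_mg_per_mmol < 30:
        return 2  # 3 up to (but not including) 30 mg/mmol
    return 3      # >= 30 mg/mmol


# Example: 45 mg/L albumin and 9 mmol/L creatinine give an ACR of 5 mg/mmol.
example_category = acr_category(albumin_creatinine_ratio(45.0, 9.0))
```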

The diagnostic protocol was changed in June 2015. Before that time, the protocolised set of laboratory tests was performed during the medication washout period. Baseline laboratory tests and pre-washout 24-h ambulatory BP measurements were only performed as indicated.

Diagnosis of primary aldosteronism

Aldosterone and renin levels were measured after cessation of antihypertensive medication for 2 to 6 weeks: 2 weeks for ACE inhibitors, angiotensin II antagonists, calcium channel blockers, alpha blockers and direct vasodilators; 4 weeks for diuretics and beta blockers (the latter including a two-week tapering scheme); and 6 weeks for mineralocorticoid receptor antagonists and direct renin inhibitors. In addition, oral contraceptives and NSAIDs were discontinued for at least 6 and 2 weeks, respectively. If BP rose above the predetermined BP level or if the patient experienced hypertensive symptoms, diltiazem and/or doxazosin [9] were prescribed. Plasma aldosterone concentration (PAC; pmol/L) and plasma renin activity (PRA; fmol/L/s) were measured with the patient seated, in the early morning after patients had been up for at least 90 min and after correction of hypokalemia (potassium < 3.8 mmol/L). PAC and PRA were both measured by radioimmunoassay; the methodology has been described elsewhere [17]. The ARR was calculated by dividing PAC by PRA. The ARR cut-off value was 5 pmol/fmol/s, which has been shown to reach a sensitivity of 100% in our laboratory [17]. The diagnosis of PA was confirmed by a non-suppressible aldosterone (≥280 pmol/L) after salt loading (SLT) with two liters of intravenous saline (0.9%) infused over 4 h [8]. A CT scan with or without adrenal venous sampling (AVS) was performed to distinguish between unilateral PA (i.e. aldosterone-producing adenomas or unilateral adrenal hyperplasia) and bilateral PA (bilateral adrenal hyperplasia).
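The two-step reference standard described above (ARR screening followed by SLT confirmation) can be sketched in a few lines. This is an illustrative sketch only: the function names and the returned labels are hypothetical, and only the cut-off values (ARR > 5 pmol/fmol/s and post-SLT aldosterone ≥280 pmol/L) are taken from the text.

```python
from typing import Optional

ARR_CUTOFF = 5.0         # pmol/fmol/s, cut-off stated in the text
POST_SLT_CUTOFF = 280.0  # pmol/L, cut-off stated in the text


def aldosterone_renin_ratio(pac_pmol_per_l: float, pra_fmol_per_l_per_s: float) -> float:
    """ARR = plasma aldosterone concentration divided by plasma renin activity."""
    if pra_fmol_per_l_per_s <= 0:
        raise ValueError("PRA must be positive to compute the ARR")
    return pac_pmol_per_l / pra_fmol_per_l_per_s


def classify_pa(pac: float, pra: float,
                post_slt_aldosterone: Optional[float] = None,
                slt_cutoff: float = POST_SLT_CUTOFF) -> str:
    """Return a hypothetical label for the outcome of the two-step work-up."""
    if aldosterone_renin_ratio(pac, pra) <= ARR_CUTOFF:
        return "arr_not_elevated"          # no confirmation test needed
    if post_slt_aldosterone is None:
        return "arr_elevated_unconfirmed"  # saline infusion test still required
    if post_slt_aldosterone >= slt_cutoff:
        return "pa_confirmed"              # aldosterone not suppressed after saline
    return "pa_not_confirmed"
```

Passing `slt_cutoff=190.0` reproduces the alternative definition of PA used in the sensitivity analysis.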

Statistical analyses

The decision tool was built with the following pre-specified clinical variables: age, 24-h ambulatory systolic BP, serum potassium, potassium supplementation (yes/no), serum sodium, eGFR and HbA1c. These variables were chosen based on findings from previous studies in which they significantly differed between patients with PA and primary hypertension and/or were independently associated with PA [3, 18,19,20]. More importantly, these clinical variables are easy to obtain in the hospital setting as well as in general practice, enabling widespread use of the diagnostic model.

Patients with a missing reference test (ARR or saline infusion test result after elevated ARR) were excluded from analysis (n = 2). Baseline characteristics are given for the observed, non-imputed data. Missing values for the variables in the model were assumed to be missing at random conditional on other observed variables and/or the outcome. For further analysis, missing values (age, n = 1; 24-h ambulatory systolic BP, n = 497; serum potassium, n = 451; potassium supplementation, n = 0; serum sodium, n = 454; eGFR, n = 413; and HbA1c, n = 76) were imputed using 20-fold multiple imputation by predictive mean matching (for continuous variables) and polytomous regression (for categorical variables) (R-package MICE). A complete case analysis excluding these patients would yield loss of efficiency and would provide biased results, since missing data rarely occur completely at random and are usually dependent on the outcome [21]. The imputation model included the seven clinical characteristics for the decision tool, the outcome, and several other clinical variables collected during the outpatient visits [22] including prescribed antihypertensive medication, measures of target organ damage and laboratory values measured during medication washout. Ambulatory blood pressure and laboratory values of potassium, sodium and creatinine measured during medication washout were available for > 96 and > 99% of the patients with missing baseline values. Primarily using these follow-up values to build the decision tool would provide incorrect estimates as medication withdrawal changes the laboratory values.

Model derivation was performed by multivariable logistic regression, including the seven pre-specified clinical variables. A separate model was fit on each multiply imputed dataset. No variable selection was performed. Variables were logarithmically or quadratically transformed if this improved overall model fit, determined by Akaike’s Information Criterion. Internal validation was performed by applying a bootstrap-based shrinkage technique [23, 24]. The intercept was adjusted after recalibration.
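To make the shrinkage step concrete, the sketch below applies a given shrinkage factor to the slope coefficients and then re-estimates the intercept. It rests on assumptions that go beyond the text: shrinkage is taken to be uniform across slopes, and the intercept is chosen so that the mean predicted probability equals the observed outcome frequency (which is what refitting the intercept with the shrunken linear predictor as an offset would achieve). The function names are hypothetical, and estimation of the shrinkage factor itself by bootstrapping is not shown.

```python
import math
from typing import List, Sequence, Tuple


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shrink_and_recalibrate(slopes: Sequence[float],
                           rows: Sequence[Sequence[float]],
                           outcomes: Sequence[int],
                           shrinkage: float) -> Tuple[float, List[float]]:
    """Multiply slopes by the shrinkage factor, then solve for the intercept by bisection."""
    shrunken = [shrinkage * b for b in slopes]
    linear_part = [sum(b * x for b, x in zip(shrunken, row)) for row in rows]
    target = sum(outcomes) / len(outcomes)  # observed outcome frequency
    if not 0.0 < target < 1.0:
        raise ValueError("outcomes must contain both events and non-events")
    low, high = -50.0, 50.0
    for _ in range(200):  # the mean prediction increases monotonically with the intercept
        mid = (low + high) / 2.0
        mean_pred = sum(expit(mid + v) for v in linear_part) / len(linear_part)
        if mean_pred < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0, shrunken
```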

Model performance was assessed by discrimination and calibration. In addition, test characteristics for different cut-off values of the predicted probability were determined. Discriminative performance was estimated by pooling the c-statistics from each dataset using Rubin’s rule. Test characteristics (positive and negative predictive value (PPV and NPV), positive and negative likelihood ratio (LR+ and LR-), sensitivity and specificity) were obtained similarly. These estimates and their standard errors (except LR+) were logit transformed, pooled by using Rubin’s rule, and then back transformed [25]. The calibration plot was obtained by plotting the observed frequencies of PA against the pooled predictions of the 20 imputed datasets. The final model was presented after pooling the shrunken beta coefficients, recalibrated intercepts and standard errors through Rubin’s rule.
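The pooling procedure for proportions such as sensitivity or NPV can be illustrated as below. This is a sketch under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the standard error on the logit scale is obtained with the delta method, and the confidence interval uses a normal approximation (z = 1.96) instead of the t-distribution with Rubin's degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool per-imputation estimates: total variance = within + (1 + 1/m) * between."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)
    between = variance(estimates) if m > 1 else 0.0
    return pooled, math.sqrt(within + (1.0 + 1.0 / m) * between)


def pool_proportions(props: Sequence[float], std_errors: Sequence[float],
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Logit-transform, pool with Rubin's rules, and back-transform estimate and CI."""
    logits = [logit(p) for p in props]
    # Delta method: SE on the logit scale is SE(p) / (p * (1 - p)).
    logit_ses = [se / (p * (1.0 - p)) for p, se in zip(props, std_errors)]
    est, se = pool_rubin(logits, logit_ses)
    return expit(est), expit(est - z * se), expit(est + z * se)
```

Pooling on the logit scale keeps the back-transformed confidence limits within 0 and 1, which matters for estimates close to 1 such as the NPV. Proportions of exactly 0 or 1 cannot be logit transformed and would need separate handling.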

Since other centers may use different post-SLT aldosterone cut-off values to increase sensitivity of the confirmation test, a sensitivity analysis was performed by using the final model to predict PA defined as ARR > 5 pmol/fmol/s confirmed by post-SLT aldosterone ≥190 pmol/L. Predicted probabilities, c-statistic and test characteristics were obtained by applying the final model to the 20 imputed datasets and pooling the results in the same way as described above. All statistical analyses were performed with R, version 3.4.3 (R Development Core Team, Vienna, Austria).

Results

Baseline characteristics and prevalence of PA

Baseline characteristics for patients included in this cross-sectional, diagnostic study differed from those who were excluded: age 53.2 (±13.3) vs 58.1 (±15.9) years, family history of hypertension 66% vs 54%, 24-h ambulatory BP 144/86 (±17/11) vs 134/78 (±18/10) mmHg and probable obstructive sleep apnea 19% vs 42% (Supplementary File 1). Although the study population comprised 49% women, the proportion of women among the PA cases was only 20% (Table 1). The majority of the patients (94% in the original, non-imputed dataset) fulfilled the Endocrine Society Guideline criteria for increased risk of PA. Of 824 patients, 137 (17%) had an elevated ARR (> 5 pmol/fmol/s) and 40 (4.9%) had a confirmed diagnosis of PA after saline infusion. AVS was performed in 29 patients; 17 showed lateralization and were therefore diagnosed as having unilateral PA. In 10 patients PA subtyping was based on the CT scan, which showed a unilaterally enlarged adrenal gland in four patients. One patient did not undergo additional subtyping.

Table 1 Patient characteristics summarized for the total population and patients with and without primary aldosteronism

Development and validation of the diagnostic model

The average shrinkage factor over the 20 imputed datasets was 0.78, resulting in the following diagnostic model after pooling:

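Predicted probability = $1 / (1 + \exp(-\mathrm{LP}))$, with the linear predictor

$$\mathrm{LP} = -27.3348 + 0.2082 \cdot \text{age} - 0.0021 \cdot \text{age}^2 + 0.0138 \cdot \text{SBP}_{24h} - 0.8296 \cdot \text{potassium} + 1.5057 \cdot \text{supplementation} + 0.1593 \cdot \text{sodium} - 0.0103 \cdot \text{eGFR} - 0.0246 \cdot \text{HbA1c}$$

where age is in years, $\text{SBP}_{24h}$ is the 24-h ambulatory systolic BP in mmHg, potassium and sodium are serum concentrations in mmol/L, supplementation equals 1 if potassium is supplemented and 0 otherwise, eGFR is in mL/min/1.73 m², and HbA1c is in mmol/mol.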
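As an illustration of how the model is evaluated for a single patient, the sketch below codes the published coefficients directly. It is not a substitute for the calculator in Supplementary File 2: the function and parameter names are hypothetical, eGFR is assumed to be expressed in mL/min/1.73 m², and the example patient is invented.

```python
import math


def predicted_probability_pa(age_years: float,
                             sbp_24h_mmhg: float,
                             potassium_mmol_l: float,
                             potassium_supplemented: bool,
                             sodium_mmol_l: float,
                             egfr: float,
                             hba1c_mmol_mol: float) -> float:
    """Predicted probability of PA from the pooled, shrunken model coefficients."""
    linear_predictor = (
        -27.3348
        + 0.2082 * age_years
        - 0.0021 * age_years ** 2
        + 0.0138 * sbp_24h_mmhg
        - 0.8296 * potassium_mmol_l
        + (1.5057 if potassium_supplemented else 0.0)
        + 0.1593 * sodium_mmol_l
        - 0.0103 * egfr            # assumed unit: mL/min/1.73 m^2
        - 0.0246 * hba1c_mmol_mol
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Invented example patient: 50 years, 24-h systolic BP 140 mmHg, potassium 4.0 mmol/L
# without supplementation, sodium 140 mmol/L, eGFR 80, HbA1c 40 mmol/mol.
example_probability = predicted_probability_pa(50, 140, 4.0, False, 140, 80, 40)
```

For this invented patient the linear predictor is about −3.07, giving a predicted probability of roughly 4.4%, which would lie above any cut-off value in the 1.0 to 2.5% range.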

The predicted probability for an individual patient can be calculated with the calculator provided in Supplementary File 2. Pooled model coefficients and odds ratios are presented in Table 2. The calibration plot (Fig. 1) shows the agreement between predicted probabilities and observed frequencies of PA. The discriminative ability of the diagnostic tool was moderate to good, with a c-statistic of 0.77 (95%CI 0.70–0.83) (Fig. 2). Table 3 shows the test characteristics (sensitivity, specificity, PPV, NPV, LR+ and LR-) and the proportion of patients spared intensive testing for predicted probability cut-off values between 1.0 and 2.5%. This range was chosen because it is where clinical decisions on further testing are made. The proportion of patients spared intensive testing is the proportion of patients with a predicted probability equal to or below the cut-off value, for whom (according to our decision tool) no further testing is needed. It was estimated at 8% (95%CI 4–18%) for a cut-off value of 1.0% and 32% (95%CI 18–50%) for a cut-off value of 2.5%. These cut-off values carry a sensitivity of 0.98 (95%CI 0.96–0.99) and 0.92 (95%CI 0.83–0.97), and an NPV of 0.99 (95%CI 0.98–1.00) and 0.99 (95%CI 0.97–0.99), respectively. A sensitivity analysis predicting PA defined by a lower post-SLT aldosterone cut-off value (≥190 pmol/L) hardly changed sensitivity and NPV and demonstrated similar agreement (Supplementary Files 3 and 4).

Table 2 Model coefficients and odds ratios
Fig. 1

Calibration plot showing the agreement between predicted and observed probabilities of primary aldosteronism. Error bars represent corresponding Bootstrap-based standard errors. PA = primary aldosteronism

Fig. 2

Receiver operating characteristics (ROC) curve showing the discriminative performance of the diagnostic tool. Discriminative performance is the ability of the model to distinguish between patients with and without primary aldosteronism. The ROC curve plots the sensitivity vs specificity for different cut-off values of the tool (predicted probabilities)

Table 3 Test characteristics and proportion of patients spared intensive testing

Discussion

The main finding of this cross-sectional, diagnostic study is that a decision tool with seven easy-to-measure clinical variables can reliably select patients with difficult-to-control hypertension with a low probability of PA, sparing 8 to 32% of patients intensive diagnostic testing.

This is an important finding since satisfactory tools to exclude low-risk patients from diagnostic testing are lacking. There is a need for diagnostic tools that reduce the number of patients to be intensively tested for PA to a greater extent than the algorithm provided by the Endocrine Society does. Moreover, results from a French study that derived a diagnostic model to estimate the probability of PA in patients referred for PA screening [19] cannot be generalized to all patients referred with difficult-to-control hypertension. The investigators included a study population with a substantially higher prevalence of PA (the ARR was elevated in 45% compared to 17% in our study) and a relatively large proportion of patients on potassium supplementation. Moreover, they predicted the presence of an elevated ARR instead of PA itself.

The population of patients with difficult-to-control hypertension in the present study represents the group of hypertensive patients generally referred by general practitioners, which is reflected by the prevalence of PA: 4.9% in this study compared to 4.6–11.2% in other cohorts of referred hypertensive patients [3, 4]. More importantly, the majority of the patients in this study fulfil the Endocrine Society Clinical Practice Guideline criteria for increased risk of PA [8]. Therefore, the decision tool in the present study adds diagnostic value on top of the diagnostic algorithm presented in the Guideline. The proportion of male PA patients in this study is relatively high. Given the small number of PA cases in the present study, the observed difference may be due to chance.

The clinical value of the decision tool lies in excluding PA, sparing low-risk patients intensive diagnostic testing, rather than in confirming PA. The extent to which it reduces the proportion of patients to be tested varies with the cut-off value chosen and depends on the clinical setting in which the decision tool will be applied. In community hospitals, where the prevalence of PA is relatively low, a high cut-off value (for example 2.5%) may be chosen, sparing intensive diagnostic testing in a considerable proportion of patients, while the absolute number of missed cases remains low. In a tertiary center, where the prevalence of PA is relatively high, a lower cut-off value (for example 1.5%) may be chosen, limiting the chance of a missed case but still reducing the diagnostic burden. By reducing the proportion of hypertensive patients to be intensively tested and increasing the probability of finding a PA case among those tested, our decision tool may motivate physicians to perform a diagnostic workup for PA in patients with difficult-to-control hypertension, thereby reducing underdiagnosis.

One of the strengths of this study is that the reference test was missing in only two of the 826 patients. Additionally, the chance of missing a PA case based on a false negative ARR is low: sensitivity of the ARR (cut-off value 5 pmol/fmol/s) reached 100% (95%CI 75.9–100%) in our laboratory [17] and hypokalemia was corrected before it was determined. Since all patients were diagnosed by the same combined approach, our results do not suffer from differential verification bias. Yet, the aldosterone cut-off value of ≥280 pmol/L after saline infusion (as proposed by the Endocrine Society Clinical Practice Guideline [8]) on which PA diagnosis was based, is rather arbitrary and results in a small number of false negative test results [26, 27]. Some patients with an intermediate post-SLT aldosterone (140–280 pmol/L) do have the clinical syndrome of PA based on adrenal computed tomography, adrenal venous sampling and/or expert panel assessment [26]. However, sensitivity analysis showed that our decision tool also reliably selects patients at low risk for PA when a lower post-SLT cut-off value is used. In addition, patients with an intermediate test result are more likely to have idiopathic aldosteronism [27], limiting the chance of withholding a surgical cure for these patients, but possibly precluding them from receiving adequate medical treatment with a mineralocorticoid receptor antagonist. Other patients may have pre-stages of PA, illustrating the importance of repeated screening by the decision tool after several months or years. Another strength of this study is that clinical variables were routinely collected, resembling daily clinical practice, which is critical in a diagnostic study.

One of the limitations of this study, as in any study using multiple imputation to impute missing baseline values, is that the robustness of the method depends heavily on the validity of the missing at random assumption. Although our diagnostic protocol was changed in June 2015 and many patients had missing values before that time, one third of the patients evaluated before June 2015 did have a complete set of baseline values. Therefore, these missing values can be considered missing at random. Furthermore, although pre-specified clinical variables were used to minimize the risk of overfitting, the methodology by which these variables were selected was not systematic and is therefore vulnerable to bias. The lack of statistical significance for some of the pre-specified variables in the model can be the result of this unsystematic selection but can also be due to a lack of power. Finally, the model was not externally validated in another cohort of patients with difficult-to-control hypertension, which is required to establish generalizability and should be done before the decision tool is applied in clinical practice.

Conclusions

In conclusion, with a decision tool based on seven easy-to-measure clinical variables, patients with a low probability of PA can be reliably selected and a considerable proportion of patients with difficult-to-control hypertension can be spared further intensive diagnostic testing.

Future perspectives

By excluding patients at low risk for PA, the decision tool reduces the number of patients to be invasively tested and increases the efficiency of testing in the remaining patients, which will allow health-care providers to allocate their (financial) resources to patients with difficult-to-control hypertension at higher risk for PA. This work raises the opportunity for external validation in other patient populations (e.g. primary care) and may lead to extensive clinical application, better detection and, subsequently, earlier treatment of PA. Also, given the high prevalence of PA in drug-adherent patients as compared to non-adherent patients [28], adding a measure of adherence to the existing model may further improve the diagnostic accuracy of the decision tool. That may further decrease the number of patients who need invasive testing.