
Pheochromocytoma is a rare tumor of chromaffin cells in the adrenal medulla or sympathetic ganglia, which can present clinically as hypertension, spells (of hypertension, palpitations, headache, or other symptoms), or an incidentally discovered adrenal mass seen on imaging studies [1–5]. The annual incidence is only 1.55 to 8 cases per million [6–10], although rare genetic mutations can increase predisposition [11, 12]. As many common clinical syndromes (such as refractory hypertension) can mimic symptoms and signs of pheochromocytoma, the condition is "frequently sought and rarely found" [2, 13, 14].

The biochemical screening test used for diagnosing pheochromocytoma is institution- and laboratory-dependent with variable performance, and an "ideal" test for pheochromocytoma has been sought over the years, as false or uninterpretable results are not uncommon with some traditional tests [1, 14–47]. Recently, measurement of fractionated plasma free metanephrines by high performance liquid chromatography and electrochemical detection has been endorsed by investigators at the National Institutes of Health (NIH) as the single best test for biochemical screening for pheochromocytoma [48–61]. The objective of the current study is to systematically review the literature to determine the diagnostic efficacy of measurements of fractionated plasma metanephrines in detection of pheochromocytoma.


Study selection

We included studies of adults who underwent measurement of fractionated plasma free metanephrines for the purpose of diagnostic testing. All studies had a "methods" section and included at least 10 subjects with pheochromocytoma (or paraganglioma) and at least 10 subjects without the diagnosis. Studies in which more than a third of subjects were below the age of 18 years, or which focussed on patients with end stage renal disease, were excluded. In the case of multiple concurrent publications from the same research group, only the article describing the largest number of subjects tested was included. Updated unpublished data obtained from authors was included. The term pheochromocytoma refers to adrenal pheochromocytomas and extra-adrenal paragangliomas. The method of Lenders was used for measurement of fractionated plasma metanephrines [49]. Plasma metanephrine measurements in the setting of clonidine-suppression or glucagon-stimulation were excluded.

Data sources

The following electronic databases were searched with no language restrictions: Medline (1989–February 2003), Pre-Medline February 21, 2003, Cochrane Database for Systematic Reviews, American College of Physicians Journal Club (September 1991–October 2002), Database of Abstracts of Reviews, the Controlled Clinical Trials Database, CANCERLIT (1975–2002), Healthstar 1975–December 2002, and CINAHL 1982 to week 1 February, 2003. The search strategy used incorporated the MESH heading "metanephrine" or the textword roots of "metanephrine" or "normetanephrine", as well as the textword "plasma", and the MESH headings of "paraganglioma", "pheochromocytoma," or textword roots of "paraganglioma", "pheochromocytoma", or "phaeochromoctyoma", and the MESH headings "sensitivity and specificity", or "diagnosis", or the textword roots of "sensitiv", "specific", or "diagnos", or the textword "likelihood ratio". We also searched the Web of Science for articles citing the methodologic study of Lenders [49] and hand-searched the abstract books of the 82nd to 84th annual meetings of the Endocrine Society (2000–2002). Two endocrinologists independently screened the titles and abstracts obtained through the electronic search and all full-text articles, deemed potentially relevant by either of the reviewers, were obtained for formal review. After reviewing the full-text articles, both reviewers agreed on which articles would be included in the systematic review.

Assessment of methodologic quality and quality of reporting of included studies and data abstraction

Each of the two reviewers independently assessed the quality of methodology and reporting of the included studies, using a 25-item checklist developed by the Standards for Reporting of Diagnostic Studies Accuracy Group (STARD) [62] (Table 1). The two reviewers also independently abstracted the data from published studies in duplicate and consensus was reached on the final data presented. In the case of updated unpublished data from the Mayo Clinic, Rochester, ethics board approval and signed consent was obtained for chart review.

Table 1 STARD checklist for the reporting of studies of diagnostic accuracy (shortened item description)

Statistical analyses

A kappa statistic was calculated to measure agreement between the two reviewers in assessment of methodologic and reporting quality [63]. For sensitivities, specificities, and likelihood ratios, 95% confidence intervals (CI) were calculated using Wilson's method [64]. The Score Method was used for calculation of the 95% CI of likelihood ratios when a zero cell was noted [64]. Likelihood ratios (LRs) predicting the presence of pheochromocytoma given a positive test result (sensitivity/(1 − specificity)) and a negative test result ((1 − sensitivity)/specificity) were calculated for each included study. Of note, a positive LR above 10 and a negative LR below 0.1 have been noted to generate large changes from pre-test to post-test probability of disease, often resulting in a large change in patient management, whereas positive LRs between 5 and 10 and negative LRs between 0.1 and 0.2 are considered to generate moderate shifts in pre-test to post-test probability of disease [65–68].
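
As an illustration of the calculations described above, the following Python sketch computes sensitivity, specificity, and both likelihood ratios from a 2×2 table, together with a Wilson score interval for a proportion. The function names and the example counts are hypothetical and are not taken from any included study. The score-method interval for likelihood ratios with a zero cell is not implemented here.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def likelihood_ratios(tp, fn, fp, tn):
    """Return (sensitivity, specificity, LR+, LR-) from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if fp > 0 else math.inf
    # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
    lr_neg = (1 - sensitivity) / specificity if tn > 0 else math.inf
    return sensitivity, specificity, lr_pos, lr_neg


# Hypothetical counts, for illustration only.
sens, spec, lr_pos, lr_neg = likelihood_ratios(tp=48, fn=2, fp=15, tn=85)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.3f}")
print("95% CI for sensitivity:", wilson_interval(48, 50))
```

When every diseased subject tests positive, the Wilson lower bound reduces to n/(n + z²), so a small study can report 100% sensitivity and still have a wide interval.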

Pooling of likelihood ratios was performed if each laboratory used the assay technique of Lenders [49], with the upper limit of a population-based 95% reference range used as the basis for positivity of the test. Either a free metanephrine or a free normetanephrine fraction value had to be above the reference range cut-off for a test to be considered positive [69]. A chi-squared test of homogeneity (Q-statistic) was performed for pooled studies [70, 71]. A random effects model was used for pooling of likelihood ratios using Review Manager 4.1 [66, 70, 72]. A funnel plot was constructed to visually assess for publication bias of pooled studies [73]. A separate analysis was performed for all pheochromocytomas and for sporadic pheochromocytomas (those without known genetic predisposition to the disease).
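
The pooling step can be sketched as follows. This is a minimal illustration that assumes a DerSimonian-Laird random-effects estimator applied to log likelihood ratios, with the variance approximation usually used for a ratio of two proportions. Whether this matches the exact computation performed by Review Manager 4.1 is an assumption. The study counts are hypothetical, and zero cells (which would need a continuity correction) are not handled.

```python
import math


def log_lr_and_variance(pos_diseased, n_diseased, pos_nondiseased, n_nondiseased):
    """Log positive LR and its approximate variance (as for a risk ratio)."""
    lr = (pos_diseased / n_diseased) / (pos_nondiseased / n_nondiseased)
    variance = (1 / pos_diseased - 1 / n_diseased
                + 1 / pos_nondiseased - 1 / n_nondiseased)
    return math.log(lr), variance


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled, standard error, Q, tau squared)."""
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: compare against chi-squared with k - 1 degrees of freedom.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, q, tau2


# Hypothetical studies: (test-positive diseased, diseased, test-positive non-diseased, non-diseased)
studies = [(48, 50, 10, 100), (96, 100, 20, 200), (24, 25, 5, 50)]
effects, variances = zip(*(log_lr_and_variance(*s) for s in studies))
pooled, se, q, tau2 = pool_random_effects(effects, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled LR+ = {math.exp(pooled):.2f} (95% CI {low:.2f}, {high:.2f}), Q = {q:.2f}")
```

Pooling is done on the log scale because the sampling distribution of a ratio is closer to normal after a log transform; the pooled estimate and its interval are then exponentiated.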


Studies included in the systematic review

We retrieved 101 unique references; however, 41 of the references were excluded as they were published prior to the specific technique of Lenders being described in 1993 [49], leaving 60 references for consideration of inclusion in the systematic review [1, 4, 16–18, 22, 24, 26, 44, 49–61, 74–111]. Of these 60 articles, 36 were deemed potentially relevant by either endocrinologist/reviewer [1, 4, 16–18, 22, 24, 49–59, 61, 76, 78, 82–85, 88, 89, 97–104, 104, 111]. Thereafter, 13 studies were excluded because of overlap of patients from the same institution or group of institutions in another publication [49, 50, 52–57, 61, 82, 84, 85, 111]. Of the 23 full-text studies reviewed, 18 were excluded as they lacked new data on sensitivity and specificity of fractionated plasma metanephrine measurements in at least 10 patients with and 10 patients without pheochromocytoma or lacked a methods section [1, 4, 16–18, 22, 24, 51, 58, 59, 61, 76, 78, 82–85, 88, 89, 97, 98, 100–104], or did not cite the Lenders method [99, 100].

The remaining studies included in the systematic review were authored by Lenders et al. [51], Raber et al. [59], and Sawka et al. [18], hereafter referred to as the NIH, Vienna, and Mayo papers or studies, respectively.

Summary of methodologic and reporting quality of included studies

The methodologic and reporting quality of the three included studies was evaluated independently by two endocrinologists using a 25-item checklist developed by the STARD steering committee (Table 1) [112]. The kappa statistic for measuring agreement between the two reviewers in assessing the STARD items addressed in each study was 0.82 for the Mayo study, 0.65 for the NIH study, and 0.60 for the Vienna study.
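
For readers unfamiliar with the kappa statistic, a minimal sketch of Cohen's kappa for two reviewers rating the same checklist items as addressed (1) or not addressed (0) is given here. The ratings are invented for illustration and do not reproduce the reviewers' actual assessments. It is an assumption that the unweighted two-rater form is the one that was used.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of a 25-item checklist by two reviewers.
reviewer_a = [1] * 15 + [0] * 10
reviewer_b = [1] * 13 + [0] * 2 + [1] * 1 + [0] * 9
print(f"kappa = {cohens_kappa(reviewer_a, reviewer_b):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so two reviewers who agree on 22 of 25 items (88%) obtain a kappa noticeably below 0.88.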

Specific threats to internal validity of the studies were appraised. In all of the studies, subjects who had signs, symptoms, or imaging characteristics that warranted testing were included (as opposed to asymptomatic controls). However, blinded adjudication of test interpretation and diagnoses was not performed in any of the studies. In terms of limitation of selection bias, consecutive patient recruitment was noted only in the Mayo study [18]. Data was collected prospectively in the NIH study and retrospectively in the Mayo study, and the method of data collection was unclear in the Vienna study [18, 51, 59].

In terms of limiting verification bias, only the NIH investigators stated that the results of plasma metanephrine measurements were not used in guiding further evaluation. In the Mayo and Vienna studies, all pheochromocytoma patients had histologic confirmation, whereas in the NIH study, either histologic confirmation or evidence of inoperable metastatic pheochromocytoma on imaging was deemed adequate for diagnosis. In subjects without pheochromocytoma, different criteria were used to define a negative diagnosis in all three studies: alternative diagnosis after subspecialty evaluation in the Mayo study, alternative adrenal histology in the Vienna study, and, in the NIH study, either lack of radiological evidence of a tumor on imaging or pathologic examination of a non-pheochromocytoma adrenal mass, or patient follow-up of 2 years or more. Thus, only in the Vienna study [59], was a histologic gold standard applied to all patients, regardless of disease status.

Overall, the fewest STARD methodologic and reporting criteria were addressed in the Vienna study, which had a case-control design [59]. The Vienna study was also the smallest, comparing 17 patients with pheochromocytoma to 14 subjects without pheochromocytoma, and showed the highest diagnostic accuracy (sensitivity and specificity each 100 percent). Of note, the dichotomous nature of case-control designs may overestimate the accuracy of diagnostic tests [113].

Diagnostic efficacy of measurements of fractionated plasma metanephrines in diagnosis of pheochromocytoma

The cut-off values for positivity as well as the conditions of measurement of fractionated plasma metanephrines were slightly different in the Mayo study compared to the NIH and Vienna studies. In the NIH study, the criterion for test positivity was a metanephrine fraction of 0.3 nmol/L and/or a normetanephrine fraction of 0.6 nmol/L, based on a laboratory reference range [51]; and the same criterion was used in the Vienna study [59]. In the Mayo study, the criterion for positivity was a metanephrine fraction of 0.5 nmol/L or a normetanephrine fraction of 0.9 nmol/L, based on a 95% reference range of Mayo Medical Laboratories [18]. Acetaminophen was generally avoided prior to measurements of plasma free metanephrines in all studies. Furthermore, subjects were supine for at least 20 minutes with an indwelling intravenous cannula in both the NIH and Vienna studies, but not the Mayo study.

Demographics of patients in the included studies were examined (Table 2). A description of patients in the updated (published and unpublished) database from Mayo Clinic Rochester from January 1, 1999 to November 29, 2001 is herein provided. The updated database included the 349 subjects (including 33 patients with pheochromocytoma) that were tested between January 1, 1999 and November 27, 2000 [18], as well as another 158 subjects (including 23 with pheochromocytoma) that were recruited between November 28, 2000 and November 9, 2001. The newly added 158 subjects were consecutive patients seen at the Mayo Clinic Rochester, who did not have a known familial predisposition to pheochromocytoma, were not tested during the first series, and had complete measurements for fractionated plasma metanephrines as well as 24-hour urinary total metanephrines and catecholamines. In both series, patients without pheochromocytoma were screened in clinical practice because of one or more of the following reasons: hypertension, spells (such as episodes of anxiety, sweating, palpitations, or headache), adrenal abnormality on imaging, previous history of surgically resected pheochromocytoma, or known familial predisposition to pheochromocytoma. Upon combining the published and unpublished data, there were a total of 56 patients with pheochromocytoma (39 of whom, or 70%, were truly sporadic with no known genetic predisposition to pheochromocytoma and no previous history of pheochromocytoma) and 445 subjects without pheochromocytoma (399 of whom, or 90%, had no known genetic predisposition to pheochromocytoma) (Table 2).

Table 2 Demographic characteristics of subjects in included studies

The main difference in demographic characteristics between the included studies was that the majority of subjects without pheochromocytoma in the NIH study had a genetic predisposition to the disease, whereas the majority in the Vienna and Mayo studies did not (Table 2). Furthermore, the non-pheochromocytoma subjects in the Mayo study appeared older with a mean age above 50 years (Table 2). Also, in the Vienna study, all subjects without pheochromocytoma had a known abnormality of the adrenal, whereas this was not the case in all patients in the NIH and Mayo studies.

The diagnostic efficacy of measurements of fractionated plasma metanephrines in detection of pheochromocytoma from the three included studies (including updated unpublished data in the Mayo study) is shown in Table 3 [18, 51, 59]. For all patients, the sensitivities ranged from 96% to 100% (95% CI ranged from 82% to 100%), whereas the specificities ranged from 85% to 100% (95% CI ranged from 78% to 100%). For subjects either at risk for or with sporadic pheochromocytoma, the sensitivities ranged from 97% to 100% (95% CI ranged from 79% to 100%), whereas the specificities ranged from 82% to 100% (95% CI ranged from 79% to 100%) (Table 3). Furthermore, for all patients, the positive LRs ranged from 6.31 to 29.17 and the negative LRs ranged from 0.02 to 0.03 (Table 3). The positive LRs for all patients with or at risk for sporadic pheochromocytoma (with cured patients who have had a previous diagnosis of pheochromocytoma excluded from the Mayo study) ranged from 6.07 to 29.00 and the negative LRs ranged from 0.031 to 0.03 (Table 3).

Table 3 Sensitivity and specificity of measurements of fractionated plasma metanephrines

Upon pooling of the positive likelihood ratios for all patients (n = 287 with pheochromocytoma, n = 1103 without pheochromocytoma), significant heterogeneity was indicated by the chi-squared test (χ² = 8.20, degrees of freedom = 2, p = 0.017), suggesting that the studies may have differed in populations studied, assay technique, or reference standard (Figure 1). Although pooling of statistically heterogeneous data is of questionable value and should be considered exploratory, the pooled positive likelihood ratio was noted to be 7.86 (95% CI = 5.17, 11.94), which was significantly higher than 1 (z = 9.66, p < 0.001). The pooled estimate of negative likelihood ratios for all patients was 0.02 (95% CI = 0.01, 0.04; z = -8.60, p < 0.001 for the value being less than 1), with no evidence of statistical heterogeneity (χ² = 0.20, p = 0.91) (Figure 2). The funnel plots examining for publication bias were not interpretable, as very few studies were included in the analyses.

Figure 1

Likelihood ratios (LRs) of a positive fractionated plasma metanephrine measurement predicting pheochromocytoma in all patients (including sporadic and genetically-predisposed patients)

Figure 2

Likelihood ratios (LRs) of a negative fractionated plasma metanephrine measurement predicting pheochromocytoma in all patients (including sporadic and genetically-predisposed patients)

Next, we determined the diagnostic efficacy of fractionated plasma metanephrine measurements in patients at risk for sporadic disease. We included 191 pheochromocytoma patients and 718 non-genetically predisposed non-pheochromocytoma patients and found that the pooled estimate of the positive likelihood ratio was 5.77 (95% CI = 4.90, 6.81; z = 20.85, p < 0.001 for the value being greater than 1), with no statistically significant evidence of heterogeneity between studies (χ² = 1.84, p = 0.4) (Figure 3). The pooled estimate of negative likelihood ratios for sporadic subjects was 0.02 (95% CI = 0.01, 0.07; z = -6.31, p < 0.001 for the value being less than 1), with no evidence of statistical heterogeneity (χ² = 1.08, p = 0.58) (Figure 4).

Figure 3

Likelihood ratios (LRs) of a positive fractionated plasma metanephrine measurement predicting pheochromocytoma in patients with sporadic pheochromocytoma or at risk for sporadic pheochromocytoma

Figure 4

Likelihood ratios (LRs) of a negative fractionated plasma metanephrine measurement predicting pheochromocytoma in patients with sporadic pheochromocytoma or at risk for sporadic pheochromocytoma


Upon systematically reviewing the literature, we have determined that fractionated plasma metanephrine measurements are highly sensitive in detecting pheochromocytoma, although specificity of these measurements may be variable, particularly in testing for sporadic disease. A negative fractionated plasma metanephrine measurement is highly effective in ruling out disease. However, a positive test result only moderately increases suspicion of disease, particularly in low risk subjects being tested for sporadic pheochromocytoma.

Pooled likelihood ratios may be applied in estimation of an individual patient's probability of sporadic pheochromocytoma, given a positive biochemical test result. The pre-test probability of sporadic pheochromocytoma (prevalence) is estimated to be 0.5% among screened hypertensive patients [114], and 5.1% among incidentally discovered adrenal masses >1 cm in diameter in absence of symptoms of adrenal disease [adrenal "incidentalomas"] [3]. For a patient with positive fractionated plasma metanephrines, the post-test probability of sporadic pheochromocytoma would be 2.8% in the patient with hypertension, and 23.7% in the patient with an adrenal incidentaloma. In other words 97.2% of hypertensive subjects and 76.3% of subjects with incidentaloma would not be expected to have a pheochromocytoma, in spite of a positive test result. Similarly, we may estimate the probability of sporadic pheochromocytoma, given negative fractionated plasma metanephrine measurements, using the pooled negative likelihood ratio value of 0.02. For a patient with normal fractionated plasma metanephrine measurements, the post-test probability of sporadic pheochromocytoma would be estimated to be 0.01% in the patient with hypertension and 0.11% in the patient with an adrenal incidentaloma.
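
The post-test probabilities quoted above follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back to a probability. The short Python sketch below reproduces that arithmetic using the pooled likelihood ratios for sporadic disease (5.77 and 0.02) and the two pre-test probabilities cited in the text. The function name is illustrative only.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


POOLED_LR_POSITIVE = 5.77  # pooled positive LR for sporadic disease
POOLED_LR_NEGATIVE = 0.02  # pooled negative LR for sporadic disease

settings = {
    "screened hypertensive patient": 0.005,  # pre-test probability 0.5%
    "adrenal incidentaloma": 0.051,          # pre-test probability 5.1%
}

for label, pre_test in settings.items():
    after_positive = post_test_probability(pre_test, POOLED_LR_POSITIVE)
    after_negative = post_test_probability(pre_test, POOLED_LR_NEGATIVE)
    print(f"{label}: positive test {after_positive:.1%}, negative test {after_negative:.2%}")
```

Because the pre-test odds are so low in screened hypertensive patients, even a roughly six-fold increase in odds leaves the post-test probability below 3%.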

Our findings are limited by the fact that data from the included studies may have been subject to multiple methodologic limitations, possibly resulting in over-estimation of the diagnostic efficacy of fractionated plasma metanephrine measurements. Also, many of the patients studied had known genetic predisposition, previously surgically cured disease, or metastatic pheochromocytoma, thereby limiting the external generalizability of our summary. Furthermore, positivity cut-offs were derived somewhat differently between the studies, possibly accounting for the observed heterogeneity of positive likelihood ratios between studies. The criterion for positivity in the NIH and Vienna studies was based on an NIH laboratory reference range [51, 59], whereas a higher criterion was used in the Mayo study, based on a 95% reference range derived by Mayo Medical Laboratories [18]. The Mayo reference range has been tested in hypertensive patients who were not subject to indwelling intravenous cannulation or prolonged supine rest, possibly accounting for the slightly higher cut-offs. Indeed, a laboratory medicine tradition has been to derive normal ranges from "normal" healthy individuals, as such individuals reflect the general population and are easily accessible for study. Such ranges are reflective of "non-disease", but their use may lead to excessively high rates of false positive tests in subjects with conditions mimicking the disease in question who are likely to be tested clinically (such as patients with refractory hypertension in the case of pheochromocytoma testing). Conversely, deriving "non-disease" ranges from subjects with conditions mimicking the disease in question (such as hypertensive patients in this case) may decrease the sensitivity of testing, with the potential for missing a potentially fatal, treatable diagnosis.

It is notable that data on the efficacy of fractionated plasma metanephrine measurements in detection of pheochromocytoma is limited to only three laboratories with patients recruited from 6 clinical centres. This may be a reflection of the labor-intensive, time-consuming nature of the high performance liquid chromatography and electrochemical detection method as well as the nuisance of potential interference with acetaminophen [115]. A newer method described by Roden et al. may circumvent the acetaminophen interference issue, but is also quite labor-intensive and might not be suitable for widespread clinical laboratory use [104]. A method of measurement of fractionated plasma metanephrines using liquid chromatography with tandem mass spectrometry shows promise in terms of improved specificity and rapidity of processing of multiple samples [115]. Further clinical study is indicated to validate such newer assays in clinical patient populations.


Where does this evidence summary leave the physician who is faced with the common clinical scenario of a patient with refractory hypertension or incidentally found adrenal mass? Firstly, the clinician must assess the relative likelihood of pheochromocytoma in each clinical case and decide whether testing is warranted. Decisions for the type of test performed may be subject to clinical availability, cost, and clinical experience of the ordering physician and local laboratory. If measurement of fractionated plasma metanephrines is performed, a positive test result in a high risk setting (such as a genetically predisposed individual or an individual with a known adrenal mass characteristic of pheochromocytoma) or a negative result in a low risk setting (such as a patient with refractory hypertension) is highly predictive of confirming or refuting the diagnosis, respectively. However, a negative result in a high risk setting (such as testing of a genetically predisposed patient or a patient with a known vascular adrenal mass), or a positive result in a low risk setting (such as refractory hypertension) must be interpreted with some caution.