Background

Ovarian cancer is the leading cause of death from gynecologic cancers in the United States and the fifth leading cause of cancer death in women (Link 1). Early diagnosis is hindered mainly by its non-specific clinical manifestations [1]. Cancer antigen 125 (CA125) was the only FDA-approved biomarker for ovarian cancer before 2008. CA125 is indicated for use as an aid in the detection of residual ovarian carcinoma in patients who have undergone first-line therapy and would be considered for diagnostic second-look procedures. Although the serum CA125 level is elevated in 80% of patients with advanced-stage epithelial ovarian cancer (EOC) [2], it is elevated in only 50% of patients with stage I EOC [3]. In addition, serum CA125 levels are elevated in various benign gynecological diseases (including endometriosis) [4] and in non-gynecologic malignancies [5]. Therefore, considerable efforts are underway to identify new serum biomarkers that, alone or in combination with CA125, improve EOC detection [6, 7].

With the use of high-throughput technologies, a large number of new biomarkers have been discovered [8-10]. Human epididymis protein 4 (HE4) is among the most promising [11]. High levels of HE4 are found in the serum of patients with EOC, especially in serous and endometrioid cancers [12]. Unlike CA125, HE4 is not overexpressed in endometriosis and other benign gynecological diseases [11]. In 2008, HE4 became the first EOC biomarker after CA125 to be approved by the U.S. Food and Drug Administration (FDA), as an aid in monitoring recurrence or progressive disease in patients with epithelial ovarian cancer. However, reports conflict on the relative sensitivity of HE4 and CA125 [5, 13-16].

Moore and colleagues [17] developed a multianalyte assay named the Risk of Ovarian Malignancy Algorithm (ROMA™), which combines the results of the HE4 EIA (enzyme immunoassay), the ARCHITECT CA 125 II™ assay and menopausal status into a numerical score to predict malignancy when an ovarian mass is found clinically. Although ROMA™ received clearance from the U.S. FDA in September 2011, its diagnostic accuracy compared with CA125 and HE4 alone is still controversial [13, 16-18]. Here we try to clarify the conflicting results on the diagnostic accuracy of ROMA and on the comparative performance of ROMA, HE4 and CA125.

Methods

Data sources and search strategy

We followed the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines [19] and the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Link 2). MEDLINE (through the PubMed interface), EMBASE, Web of Science, Google Scholar, the Cochrane Library and ClinicalTrials.gov were searched (searches ended on 22 December 2011). Reference lists of the identified articles were searched manually. No restriction was placed on publication language. The search terms were based on the standardized National Library of Medicine MeSH terms and free text. The search strategies for all databases were based on the PubMed strategy (Additional file 1: Table S1).

Two authors (RXT and WPL) independently screened the search results based on titles and abstracts. The full texts of the selected articles were reviewed independently by another two authors (KC and LLY) to determine inclusion. Disagreements were resolved by referring to a third author (MC).

Inclusion criteria

Studies that investigated both serum HE4 and CA125 as diagnostic tests or calculated the ROMA algorithm were included if (1) they were cross-sectional studies; (2) the tests were performed in the same population presenting with a pelvic mass; (3) all serum specimens were collected preoperatively; (4) histological diagnostic information was available for all subjects; and (5) sufficient data were reported to reconstruct the fourfold table.

Studies were excluded if they recruited participants without a pelvic mass, contained obviously erroneous data, or included healthy persons in the ROC curve analysis. Case–control studies were also excluded, because such studies tend to overestimate or underestimate the diagnostic performance of a test [20].

Data extraction

The data extracted from each study included: author; year; country; design; recruitment; age; menopausal status; test methods (e.g. chemiluminescence immunoassay); number of patients; sensitivity; specificity and cut-off value. Fourfold tables were reconstructed. Two reviewers (FKL and RXT) independently extracted the data for each study and referred to a third reviewer (MC) when disagreements arose. When important data were not provided in the original studies, the authors were contacted by email.
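As an illustration of the fourfold-table reconstruction mentioned above, the following Python sketch rebuilds a 2 × 2 table from a reported sensitivity, specificity and the numbers of diseased and non-diseased patients, and then derives the likelihood ratios and the diagnostic odds ratio. The function names, the example counts and the 0.5 continuity correction are assumptions made for illustration; they are not taken from any included study or from the software used in this review.

```python
from dataclasses import dataclass


@dataclass
class FourfoldTable:
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # non-diseased, test positive
    tn: int  # non-diseased, test negative


def reconstruct_table(sensitivity, specificity, n_diseased, n_non_diseased):
    """Rebuild a 2x2 table from reported accuracy and group sizes."""
    tp = round(sensitivity * n_diseased)
    tn = round(specificity * n_non_diseased)
    return FourfoldTable(tp=tp, fn=n_diseased - tp,
                         fp=n_non_diseased - tn, tn=tn)


def accuracy_measures(table, correction=0.5):
    """Sensitivity, specificity, LR+, LR- and DOR for one study.

    Assumption: a continuity correction is added to every cell
    only when at least one cell is zero.
    """
    cells = [table.tp, table.fn, table.fp, table.tn]
    if 0 in cells:
        cells = [c + correction for c in cells]
    tp, fn, fp, tn = cells
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "dor": (tp * tn) / (fp * fn),
    }


# Hypothetical study: 100 cancers, 200 benign masses,
# reported sensitivity 0.90 and specificity 0.80.
table = reconstruct_table(0.90, 0.80, 100, 200)
measures = accuracy_measures(table)
```

Rounding to whole patients means the rebuilt table may differ by one patient from the true counts when the reported percentages were themselves rounded, which is one reason why contacting the original authors remains preferable.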

Index tests and reference standard

The Risk of Ovarian Malignancy Algorithm (ROMA™) is a qualitative serum test that combines the results of the HE4 EIA (enzyme immunometric assay), the ARCHITECT CA 125 II™ assay and menopausal status into a numerical score. Accordingly, the index tests for HE4 and CA125 in this meta-analysis were specified as EIAs and chemiluminescence immunoassays, respectively. The ROMA algorithm is as follows [17]:

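Premenopausal, predictive index (PI):
\[ \mathrm{PI} = -12.0 + 2.38 \times \ln(\mathrm{HE4}) + 0.0626 \times \ln(\mathrm{CA125}) \tag{1} \]
Postmenopausal:
\[ \mathrm{PI} = -8.09 + 1.04 \times \ln(\mathrm{HE4}) + 0.732 \times \ln(\mathrm{CA125}) \tag{2} \]
Predicted probability (PP):
\[ \mathrm{PP} = 100 \times \frac{\exp(\mathrm{PI})}{1 + \exp(\mathrm{PI})} \tag{3} \]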
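The ROMA equations can be sketched in a few lines of Python. This is a minimal illustration, not the commercial implementation. It assumes the negative intercepts (−12.0 and −8.09) of the published ROMA formula [17], HE4 in pmol/L and CA125 in U/mL; the function name and the example concentrations are hypothetical.

```python
import math


def roma_score(he4_pmol_l, ca125_u_ml, postmenopausal):
    """Return the ROMA predicted probability (PP) in percent.

    Assumptions: intercepts are -12.0 (premenopausal) and -8.09
    (postmenopausal); HE4 is in pmol/L and CA125 in U/mL.
    """
    if he4_pmol_l <= 0 or ca125_u_ml <= 0:
        raise ValueError("marker concentrations must be positive")
    ln_he4 = math.log(he4_pmol_l)
    ln_ca125 = math.log(ca125_u_ml)
    if postmenopausal:
        pi = -8.09 + 1.04 * ln_he4 + 0.732 * ln_ca125   # equation (2)
    else:
        pi = -12.0 + 2.38 * ln_he4 + 0.0626 * ln_ca125  # equation (1)
    return 100.0 * math.exp(pi) / (1.0 + math.exp(pi))  # equation (3)


# Hypothetical patient with HE4 = 70 pmol/L and CA125 = 35 U/mL
pre_pp = roma_score(70.0, 35.0, postmenopausal=False)
post_pp = roma_score(70.0, 35.0, postmenopausal=True)
```

The returned percentage is then compared with a cut-off specific to menopausal status to classify a patient as high or low risk; the cut-off itself is left to the caller because it varies between centers.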

The reference standard was the histopathological diagnosis. In all studies, the surgical stage of ovarian cancer was determined according to the criteria of FIGO (International Federation of Gynecology and Obstetrics) [21] (Link 3). Early stage was defined as FIGO stages I & II, and advanced stage as FIGO stages III & IV.

Methodological quality assessment

The methodological quality of each study was evaluated with the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) [22] quality items. Because overall scores are not helpful for interpreting study quality [23], they were not used in the QUADAS-2 evaluation. Doubts were resolved by discussion. The QUADAS-2 items cover blinding between the index tests and the reference test, but not blinding between the index tests themselves. Therefore, one item addressing the validity of this comparative question was added to the Risk of Bias part of Domain 2 (Index Test) in QUADAS-2 [22]: “Were the results of index tests interpreted without knowledge of each other?” The answers (Yes, No or Unclear) to this question were used to help assess the risk of bias of the included studies. According to the Concerns Regarding Applicability part of Domain 2 (Index Test) in QUADAS-2 [22], variations in test technology, execution, or interpretation might affect estimates of the diagnostic accuracy of a test. If the index test methods varied from those specified in the review question, concerns about applicability might exist.

The index tests for HE4 and CA125 in this meta-analysis were specified as EIAs and chemiluminescence immunoassays, respectively. For HE4, chemiluminescence immunoassays are more sensitive than the specified EIAs, so bias might be introduced when pooling studies. Similarly, for CA125, EIA and RIA (radioimmunoassay) are less sensitive and less stable than chemiluminescence immunoassays, so studies using either EIA or RIA were considered as High Concern Regarding Applicability. The ROMA test used the CA125 and HE4 results from within the same study; ROMA was therefore considered as High Concern Regarding Applicability when either the HE4 or the CA125 test was evaluated as High Concern Regarding Applicability.

Data analysis plan

The statistical analysis comprised the following steps: (1) qualitatively describing the findings; (2) searching for heterogeneity and a threshold effect; (3) identifying the sources of heterogeneity by subgroup analysis; (4) choosing an appropriate model and pooling the estimates statistically. Univariate [24] and bivariate [25] models were the two options for diagnostic meta-analysis. When a positive correlation existed between the true positive rate (TPR) and the false positive rate (FPR), the bivariate model was more appropriate [26].

Heterogeneity among studies was displayed with forest plots and quantified with I² estimates [27]. The main advantage of I² is that it does not inherently depend on the number of studies included in the meta-analysis. I² estimates below 25% were regarded as low heterogeneity, between 25% and 50% as moderate heterogeneity, and 50% or higher as high heterogeneity. If heterogeneity was low, a univariate meta-analysis model was used (Meta-DiSc software version 1.4 [28]). If heterogeneity was moderate to high, the Spearman correlation coefficient between Logit(TPR) and Logit(FPR) was examined (Meta-DiSc software version 1.4). A positive coefficient denoted the presence of a threshold effect, in which case a bivariate model and the HSROC (hierarchical summary receiver operating characteristic) curve were estimated and plotted; if the coefficient was negative, summary estimates were pooled without HSROC [24, 29]; and if it was zero, summary estimates were pooled in the same way as for low heterogeneity.
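The decision rule above can be sketched as follows. This is a simplified illustration, not a reproduction of Meta-DiSc: it assumes that I² is derived from Cochran's Q with inverse-variance weights on the logit scale, that the variance of a logit proportion is approximated by 1/TP + 1/FN, and that the four studies shown are hypothetical. All function names are invented for the example.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def i_squared(estimates, variances):
    """I-squared (percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    return 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def choose_model(i2_percent, rho):
    """Model choice paraphrased from the analysis plan."""
    if i2_percent < 25.0:
        return "univariate"
    if rho > 0:
        return "bivariate with HSROC"
    if rho < 0:
        return "pooled summary estimates without HSROC"
    return "univariate"


# Hypothetical (TP, FN, FP, TN) counts, not taken from any included study
studies = [(80, 20, 10, 90), (85, 15, 15, 85), (90, 10, 20, 80), (95, 5, 30, 70)]
logit_tpr = [logit(tp / (tp + fn)) for tp, fn, fp, tn in studies]
logit_fpr = [logit(fp / (fp + tn)) for tp, fn, fp, tn in studies]
var_logit_tpr = [1.0 / tp + 1.0 / fn for tp, fn, fp, tn in studies]

i2 = i_squared(logit_tpr, var_logit_tpr)
rho = spearman(logit_tpr, logit_fpr)
model = choose_model(i2, rho)
```

In this made-up data set sensitivity and the false positive rate rise together across studies, which is the pattern expected when studies use different positivity thresholds.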

Influence analysis re-estimated the meta-analysis by omitting each study in turn (STATA version 10.0) to confirm the stability of our analysis model. Publication bias was investigated with Deeks' funnel plot and the accompanying asymmetry test [30]. Subgroups were analyzed hierarchically by menopausal status, FIGO stage and concern regarding the methods of the index tests. In some studies, patients with low malignant potential (LMP) tumors or borderline (BL) tumors were classified into the EOC group; these studies were analyzed separately as the subgroup EOC (LMP/BL). Subgroups with fewer than four studies were analyzed with the univariate model, because the bivariate model requires at least 4 studies [26]. Summary estimates and 95% CIs (confidence intervals) for sensitivity, specificity, DOR (diagnostic odds ratio), LR± (positive and negative likelihood ratios) and AUC (area under the curve) were calculated (STATA version 10.0 [31, 32]). HSROC plots were shown when appropriate. Comparisons between estimates of different tests were performed with the z-test.
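The exact form of the z-test is not specified here, so the sketch below shows one common version under explicit assumptions: the two estimates are treated as independent, their standard errors are back-calculated from symmetric 95% CIs, and the p-value is two-sided. The function names are hypothetical. The example values are the AUCs of ROMA and HE4 reported in the caption of Figure 13.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2.0 * z_crit)


def z_test(est1, se1, est2, se2):
    """Two-sided z-test for the difference of two independent estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# AUCs with 95% CIs as reported for ROMA and HE4 (Figure 13 caption)
roma_auc, roma_se = 0.92, se_from_ci(0.86, 0.97)
he4_auc, he4_se = 0.95, se_from_ci(0.92, 0.98)
z, p = z_test(roma_auc, roma_se, he4_auc, he4_se)
```

Because the tests were measured in the same patients, the independence assumption is likely conservative; a paired analysis would need the covariance between estimates, which is not reported.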

Figure 3

Forest Plots of paired sensitivity and specificity for ROMA.

Table 3 Summary estimates of ROMA for EOC and OC prediction
Figure 4

Hierarchical summary receiver operating characteristic (HSROC) curves and results of bivariate analysis for ROMA to predict EOC. Results of bivariate analysis: the estimates of each study (the squares), the summary point (solid circle), the 95% confidence region (the small ellipse), the 95% prediction region (the big ellipse) and the HSROC curve (solid line) are shown. Each square represents one study in the meta-analysis, and the size of the square indicates the size of the study.

Figure 5

Influence analysis of individual studies for the performance of ROMA to predict EOC. The meta-analysis was re-estimated by omitting each study in turn. The diamonds represent the estimates from the remaining studies, and the solid lines through their centers represent their 95% confidence intervals.

Results

Search results

Of the 267 references identified from 6 databases, 11 articles [13-18, 33-37] met the inclusion criteria and were included in the meta-analysis (Figure 1).

Figure 1

Flowchart of selection of eligible studies.

The characteristics of the included studies are summarized in Table 1. In total, 7792 tests from 2878 patients presenting with a pelvic mass and at risk of ovarian cancer were retrieved. Of the 11 studies, 6 studies [15, 17, 18, 34, 36, 37] enrolling 1547 patients investigated the performance of ROMA for EOC prediction. Five studies [16, 33-36] with 883 patients compared the performance of HE4 and CA125 for OC prediction. Four studies [13, 15, 18, 36] with 715 patients compared the performance of HE4 and CA125 for EOC prediction, and 3 studies [15, 18, 36] (482 patients) compared the performance of ROMA, HE4 and CA125 for EOC prediction. In all studies, the spectrum of patients was considered representative. All enrolled participants presented with a pelvic mass of suspected ovarian origin, had not received any previous treatment and were scheduled for surgical intervention. The prevalence of proven ovarian cancer across all studies ranged from 7.86% to 63.1% (overall prevalence was 18.5% for EOC). The study of Holcomb and colleagues [14] had the lowest prevalence (7.86%) because it investigated only premenopausal women.

Table 1 Characteristics of studies included in the analysis

Methods of index tests

All 11 included studies measured serum HE4 and CA125. For HE4 measurement, 8 studies [13, 16-18, 33, 35-37] used EIA (enzyme immunoassay) and the other 3 studies [14, 15, 34] employed CMIA (chemiluminescent microparticle immunoassay). For CA125 measurement, 5 studies [14, 15, 17, 34, 37] employed CMIA, 3 studies [16, 35, 36] used EIA, and 3 studies [13, 18, 33] used RIA (radioimmunoassay), CLEIA (chemiluminescence enzyme immunoassay) and ECLIA (electrochemiluminescence immunoassay), respectively. CMIA, CLEIA and ECLIA are chemiluminescence immunoassays, which are more sensitive than EIA or RIA. According to the methodological quality assessment (see the Methods section), HE4 tests with CMIA and CA125 tests with EIA or RIA were regarded as high Concern Regarding Applicability. The ROMA tests were considered as high Concern Regarding Applicability when either the HE4 or the CA125 test was evaluated as high Concern Regarding Applicability (Figure 2).

Figure 2

Graph of QUADAS-2 quality items results. Figure 2a: proportion of studies with low, high, or unclear risk of bias. Figure 2b: proportion of studies with low, high, or unclear Concerns Regarding Applicability. The three horizontal bars represent the index tests HE4, CA125 and ROMA, respectively.

Methodological quality of all included studies

The quality of the included studies was assessed with the QUADAS-2 tool (Figure 2 & Table 2). In 9 [13, 14, 16-18, 34-37] of the 11 studies, the results of the index tests (HE4/CA125) were interpreted blind to the reference standard test (ROMA); in the other 2 studies [15, 33] this was unclear. In 5 of the 11 studies [14, 16, 34-36], the results of the index tests (HE4 and CA125) were interpreted without knowledge of each other; in the other 6 studies [13, 15, 17, 18, 33, 37] this blinding was unclear. Consequently, for the item “Could the conduct or interpretation of the index test have introduced bias?” in Domain 2 of QUADAS-2, 5 studies [14, 16, 34-36] had a low risk of bias, 1 study [13] had a high risk of bias and 5 studies [15, 17, 18, 33, 37] had an unclear risk of bias. For Patient Selection (Domain 1 of QUADAS-2), four [16-18, 34] of the 11 studies were considered to have a low risk of bias because they enrolled patients consecutively; 2 studies [14, 15] were regarded as having a high risk of bias, and in the other 5 studies [13, 33, 35-37] the risk was unclear.

Table 2 QUADAS-2 quality items results

Performance of ROMA for predicting EOC

Forest plots of the sensitivity and specificity of ROMA for EOC prediction are shown in Figure 3.

Mean estimates and their 95% CIs were: sensitivity 0.89 (0.84-0.93), specificity 0.83 (0.77-0.88) and AUC 0.93 (0.90-0.95) (Table 3). Heterogeneity was high for both sensitivity (I² = 71.6%) and specificity (I² = 80.7%).

The Spearman correlation coefficient was positive (0.657, p = 0.156), denoting a threshold effect. Thus, the bivariate model was used to pool the estimates. The HSROC plot shows the summary estimates of sensitivity and specificity as well as the confidence and prediction regions (Figure 4).

Subgroup analysis showed variability in the pooled estimates (Table 3). We compared these estimates between subgroups to investigate the performance of ROMA. Across all subgroups, the performance (AUC) of ROMA for EOC detection ranged from 0.88 to 0.97. ROMA performed better in the whole EOC population (AUC: 0.93, 95% CI 0.90-0.95) than in either the premenopausal subgroup (EOC-preM) (AUC: 0.88, 95% CI 0.85-0.91) or the postmenopausal subgroup (EOC-postM) (AUC: 0.89, 95% CI 0.86-0.92). ROMA had better performance in the EOC-advanced stage group (AUC: 0.88, 95% CI 0.85-0.91) than in both the whole EOC population and the EOC-early stage group (AUC: 0.88, 95% CI 0.83-0.93). Moreover, ROMA performed better in the EOC population than in the OC population (AUC: 0.89, 95% CI 0.87-0.92).

ROMA had lower sensitivity in the premenopausal subgroup (EOC-preM) (0.82, 95% CI 0.67-0.91) than in the postmenopausal subgroup (EOC-postM) (0.93, 95% CI 0.89-0.96). The EOC group (0.83, 95% CI 0.77-0.88) had higher specificity than both the EOC-early stage (0.76, 95% CI 0.73-0.79) and EOC-advanced stage (0.76, 95% CI 0.73-0.79) groups. ROMA had higher sensitivity in the EOC-advanced stage group (0.98, 95% CI 0.94-1.00) than in the whole EOC population (0.90, 95% CI 0.84-0.93) and the EOC-early stage group (0.81, 95% CI 0.71-0.89). In addition, in the subgroup analysis by Concern Regarding Applicability of the methods, ROMA had higher specificity in the high Concern Regarding Applicability group (EOC-methods High concern) (0.87, 95% CI 0.83-0.90) than in both the low Concern Regarding Applicability group (EOC-methods Low concern) (0.75, 95% CI 0.72-0.78) and the whole EOC population. Finally, no differences were found in the other summary estimates (except AUC between the EOC and OC groups) among the EOC, EOC (LMP/BL) and OC groups (Table 3).

The Deeks' funnel plot for ROMA on EOC detection was symmetrical (Additional file 2: Figure S1), and the funnel plot asymmetry test showed little sign of publication bias (regression coefficient −3.73; p = 0.617). When any single study was omitted, the summary estimates (sensitivity, specificity and DOR) were close to those obtained with all eligible studies (Figure 5 & Additional file 3: Table S2).

Performance comparison between HE4 and CA125

Four studies [13, 15, 18, 36] compared the performance of HE4 and CA125 for predicting EOC (Figures 6 & 7). Both groups (EOC-HE4, EOC-CA125) were analyzed with the bivariate model (Figure 8). CA125 had a higher AUC than HE4 but a lower specificity. No significant differences were found in the other paired estimates (Table 4). Five studies [16, 33-36] compared the performance of HE4 and CA125 for predicting OC (Figures 9 & 10). Both groups (OC-HE4, OC-CA125) were also analyzed with the bivariate model (Figure 11). CA125 had a higher AUC than HE4, while no significant differences were found in the other paired estimates (Table 4).

Figure 6

Forest Plots for sensitivity and specificity of HE4 to predict EOC.

Figure 7

Forest Plots for sensitivity and specificity of CA125 to predict EOC.

Figure 8

Hierarchical summary receiver operating characteristic (HSROC) curves and results of bivariate analysis for HE4 and CA125 to predict EOC. Results of bivariate analysis: the estimates of each study (the squares), the summary point (solid circle), the 95% confidence region (the ellipse) and the HSROC curve (solid line) for HE4 (black) and CA125 (red) are shown. Each square represents one study in the meta-analysis, and the size of the square indicates the size of the study.

Table 4 Performance comparison between HE4 and CA125 for EOC and OC prediction
Figure 9

Forest Plots for sensitivity and specificity of HE4 to predict OC.

Figure 10

Forest Plots for sensitivity and specificity of CA125 to predict OC.

Figure 11

Hierarchical summary receiver operating characteristic (HSROC) curves and results of bivariate analysis for HE4 and CA125 to predict OC. Results of bivariate analysis: the estimates of each study (the squares), the summary point (solid circle), the 95% confidence region (the ellipse) and the HSROC curve (solid line) for HE4 (black) and CA125 (red) are shown. Each square represents one study in the meta-analysis, and the size of the square indicates the size of the study.

The included studies also investigated the diagnostic value of HE4 and CA125 in early-stage EOC, as well as in distinguishing EOC from benign pelvic masses in premenopausal and postmenopausal women. Because each of these settings contained fewer than 3 studies, we did not pool them as subgroups but summarized their sensitivity and specificity with forest plots (Additional file 4: Figure S2).

Performance comparison among ROMA, HE4 and CA125 for EOC prediction

Three studies evaluated the performance of HE4, CA125 and ROMA for EOC detection (Figure 12). All three groups (EOC-ROMA, EOC-HE4 and EOC-CA125) were pooled with the univariate model (Figure 13 & Table 5).

Figure 12

Forest Plots for sensitivity and specificity comparison among ROMA, HE4 and CA125 to predict EOC. A: ROMA; B: CA125; C: HE4.

Figure 13

Summary receiver operating characteristic (SROC) curves for ROMA, HE4 and CA125 to predict EOC. The estimates of each study (the squares) and the SROC curves (solid lines) for ROMA (black), HE4 (red) and CA125 (green) are shown. Each square represents one study in the meta-analysis, and the size of the square indicates the size of the study. The AUCs and 95% CIs of ROMA, HE4 and CA125 are 0.92 (0.86-0.97), 0.95 (0.92-0.98) and 0.88 (0.81-0.96), respectively.

Table 5 Summary estimates of comparison among ROMA, HE4 and CA125 for EOC prediction

Among the three tests, HE4 had the highest specificity (0.94, 95% CI: 0.90-0.96) but a lower sensitivity (0.80, 95% CI: 0.73-0.85) than ROMA (0.86, 95% CI: 0.81-0.91). ROMA had a higher specificity (0.84, 95% CI: 0.79-0.88) than CA125 (0.78, 95% CI: 0.73-0.83). No differences in summary sensitivity were found between CA125 and HE4 or between CA125 and ROMA. The DOR, LR± and AUC values were similar among the three tests (Table 5).

Discussion

Summary of main results

Our results show, first, that ROMA can help distinguish EOC from benign pelvic masses with high diagnostic accuracy (AUC: 0.93). ROMA has higher sensitivity for predicting advanced-stage EOC than early-stage EOC, and in postmenopausal women than in premenopausal women. Second, although HE4 has higher specificity than CA125 for EOC monitoring, CA125 has better diagnostic accuracy (higher AUC) than HE4 for EOC or OC prediction. This is based on the results of 4 studies that compared HE4 and CA125 within the same population. Third, based on the comparison of HE4, CA125 and ROMA in the same population, the overall performance (AUC) of the three tests for EOC prediction is similar. ROMA is less specific but more sensitive than HE4, while both ROMA and HE4 are more specific than CA125 for EOC monitoring.

All included studies were subjected to close scrutiny with the QUADAS-2 tool, and quality was high across the items. Heterogeneity is common in diagnostic meta-analysis [38] and mainly results from the characteristics of the study population, variations in study design, different statistical methods, and different covariates [39]. Within-study quality was a major concern in this meta-analysis. High heterogeneity in both sensitivity and specificity was found for the ROMA test. The existence of a threshold effect might partially explain this heterogeneity. Analysis of the subgroups (EOC-methods High concern and EOC-methods Low concern) found that the EOC-methods High concern group had higher specificity than both the EOC-methods Low concern group and the EOC group.

In the current paper, only three studies evaluated the diagnostic value of ROMA in early-stage EOC. Early-stage ovarian cancer usually presents with non-specific clinical manifestations, and surgical FIGO staging often results in a low prevalence of early-stage EOC. Future clinical investigations should therefore be prospective studies that recruit enough patients with early-stage EOC.

We analyzed the predictive value of ROMA for patients with EOC, EOC (LMP/BL) and ovarian cancer. No differences were found in any summary estimate (except AUC between the EOC and OC groups) among the EOC, EOC (LMP/BL) and OC groups. Although EOC accounts for 90% of ovarian cancer, we do not think the use of ROMA can be extended to predicting ovarian cancer in general, because both HE4 and CA125 are biomarkers of epithelial ovarian cancer [2, 11].

Cut-off values were variable for HE4 (70-150 pM) and ROMA (preM: 7.4-13.1%; postM: 10.9-27.7%), but consistent for CA125 (35 U/mL) across studies. Among the included studies, only one [15] used specific HE4 cut-off values for premenopausal (70 pM) and postmenopausal (140 pM) women. Studies have found that HE4 levels in healthy subjects are associated with age [40, 41], so it would be essential to define a specific normal range and cut-off value for premenopausal and postmenopausal women, respectively. For the other two predictors, ROMA and CA125, each center would likewise need to define its own normal ranges and cut-off values.

Strengths and weaknesses

Besides a comprehensive search strategy, strict inclusion criteria and a sound analysis protocol, a strength of this paper is that only studies investigating both tests (HE4 and CA125) or all three tests (HE4, CA125 and ROMA) in the same population were included in the test comparisons. This ensures that the comparisons are made under the same or a similar population background, and thus reduces the heterogeneity between studies [42].

The main limitations are: (1) unpublished papers could not be obtained; (2) the number of studies might be small, although we believe that the reliability of a meta-analysis depends mainly on the quality of the included studies; (3) the diagnostic value of ROMA, HE4 and CA125 in early-stage EOC has not been convincingly analyzed.

Conclusions

ROMA can help distinguish EOC from benign pelvic masses. ROMA is less specific but more sensitive than HE4. Both ROMA and HE4 are more specific than CA125 for EOC prediction. CA125 has better diagnostic accuracy than HE4 for EOC and OC prediction. ROMA is a promising predictor to replace CA125, but its use requires further exploration.

Links

1 What are the key statistics about ovarian cancer? American Cancer Society Web site. [http://www.cancer.org/Cancer/OvarianCancer/DetailedGuide/ovarian-cancer-key-statistics]

2 Diagnostic Test Accuracy Working Group. Handbook for DTA Reviews. Cochrane Collaboration Web site. [http://srdta.cochrane.org/handbook-dta-reviews].

3 International Federation of Gynecology and Obstetrics. FIGO Web site. [http://www.figo.org/]