Introduction

Ovarian cancer continues to be the leading cause of death among women with gynaecological malignancies in developed nations (Parkin et al. 2005). In the USA, the prevalence of ovarian cancer in postmenopausal women is 1 in 2,500 and the lifetime risk of a woman developing ovarian cancer is 1 in 72 (1.39%). The age-adjusted incidence and death rates for ovarian cancer are 13.3 and 8.8 per 100,000, respectively. The overall 5-year survival rate is 45.5%. Five-year survival rates are inversely related to the stage of disease at first diagnosis. Early stage ovarian cancer is asymptomatic and only 19% of cases are first diagnosed as localised primary cancer (i.e. Stage I). The corresponding 5-year survival rate for Stage I disease is 92.7%. The majority of cases (67–74%), however, are diagnosed with metastatic disease (i.e. Stages III and IV), when the 5-year survival rate is only 30.6% (Ries et al. 2006).

The diagnosis of localised, primary cancer and the development of tests with better diagnostic efficiency are undoubtedly the major priorities for achieving long-term reduction of mortality due to ovarian cancer (Paley 2001). Indeed, available data support the hypothesis that ovarian cancers may be detectable up to 2–5 years prior to their clinical presentation (Jacobs et al. 1993, 1996, 1999; Zurawski et al. 1988) and that if effective screening for Stage I disease were achieved with an accuracy of 80% or more, mortality would be halved (Bast et al. 1983). The development of effective community-based screening or earlier detection tests for ovarian cancer, however, is challenging because of the low prevalence of the disease (Parkin et al. 2005).
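The effect of low prevalence on a screening test can be illustrated with Bayes' rule. The following is a minimal sketch only: the function name and the sensitivity and specificity values are illustrative assumptions, not results from this study; the prevalence of 1 in 2,500 is the figure quoted above for postmenopausal women.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    tp = sensitivity * prevalence                  # true positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false negative fraction
    tn = specificity * (1.0 - prevalence)          # true negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics (assumed for illustration only).
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.95,
                             prevalence=1 / 2500)
print(f"PPV = {ppv:.4f}, NPV = {npv:.6f}")
```

With these assumed inputs, fewer than 1 in 100 positive results would correspond to a true case, which is why very high specificity is needed at population prevalence.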

The best-characterised biomarker for ovarian cancer is CA-125 (Nossov et al. 2008). Serum concentrations of CA-125 are elevated (i.e. >35 U/ml) in more than 90% of patients with late stage disease but are elevated in only 50% of patients with Stage I disease (Nustad et al. 1996). It is becoming evident that single-biomarker approaches for the detection of early stage ovarian cancer may never realise the diagnostic efficiency required for implementation as a community-based screening test, and that alternative approaches, including the combination of multiple biomarkers, may be required (Badgwell and Bast 2007).

The quantification of multiple blood-borne biomarkers and the use of multivariate classification models represent a promising approach for improving diagnostic efficiency. Such biomarkers may represent: unique tumour-derived or over-expression products; products elaborated via aberrant neoplastic processing/modification of host proteins; and/or host response proteins elicited by the presence of the tumour and which may display profiles that vary and/or are specific for different types of tumours.

Previous studies have established proof-of-principle and reported significant improvements in the detection of ovarian cancer using such approaches. For example, Gorelik et al. (2005) measured seven serum biomarkers in 44 newly diagnosed women with early stage ovarian cancer (i.e. Stages I and II) and 45 healthy controls and employed five of the biomarkers (including CA-125) in a classification tree analysis to achieve a sensitivity of 84% at a specificity of 95% for the detection of early stage disease. More recently, Visintin et al. (2008) in a study of 156 newly diagnosed ovarian cancer patients (48 Stage I, 62 Stage II, 64 Stage III and 64 Stage IV) and 362 healthy women utilised six serum biomarkers (including CA-125) in a multivariate classification model and reported a sensitivity and specificity of 95.3 and 99.4%, respectively, for the overall detection of ovarian cancer.

To further evaluate the utility of a multimarker approach for the diagnosis of ovarian cancer, in this study a phase II biomarker trial (Pepe et al. 2001) was conducted to evaluate the performance of a panel of plasma biomarkers (CA-125, CRP, IL-6, IL-8 and SAA) to correctly classify women with ovarian cancer. The retrospective, case–control, modelling/validation study was designed to test the primary hypothesis that the area under the receiver operator characteristic curve (AUC) for the biomarker panel was significantly greater than the AUC for CA-125 alone. The data were additionally stratified to assess the performance of the biomarker panel to correctly classify women with early stage ovarian cancer (i.e. Stages I and II).

Materials and methods

Sample collection

Blood (10 ml) was collected via venepuncture into EDTA vacutainer tubes. Samples were centrifuged at 1,000×g for 10 min within 20–30 min of collection. Plasma was divided into 250–500 μl aliquots and stored at −80°C until assayed. No plasma sample analysed had been stored for more than 6 years. In a preliminary study, analyte concentrations were found not to correlate significantly with duration of storage. Additional disease and control samples were provided by the Biobank at Peter MacCallum Cancer Research Institute. Samples were randomly selected from the sample bank. Inclusion and exclusion criteria for the trial are detailed in Table 1. The distribution of samples by disease stage, tumour type and patient age is summarised in Table 2.

Table 1 Criteria for inclusion and exclusion in the Phase II biomarker trial
Table 2 Age, stage and tumour type distribution within all cases (n = 150)

Study design

A phase II biomarker trial design (Pepe et al. 2001) was employed to assess the diagnostic efficiency of a biomarker panel for the detection of ovarian cancer. The study was a retrospective, case–control design in which a multivariate classification model was developed using a modelling sample cohort, n = 179 (Table 3). The multivariate classification model (diagnostic rule) was validated using an independent sample cohort (n = 183). The primary outcome of the study was to test the hypothesis that the area under the receiver operator characteristic (ROC) curve for the biomarker panel (AUCA) was significantly greater than for CA-125 alone (AUCC). It is acknowledged that most informative biomarkers will increase the AUC by 0.05 or more, and that good risk prediction models will have an AUC greater than 0.7 (May and Wang 2008). Secondary outcomes of the study were: (1) to estimate the sensitivity and specificity of the multimarker panel; and (2) to determine the relationship between the predicted posterior probability for membership of the disease class (ρP, derived from multivariate modelling) and disease stage and type. The performance of the diagnostic rule was also evaluated using a subset of the validation cohort: all controls + Stages I and II cases only, designated the early stage cohort.

Table 3 Sample distribution between the model and validation cohorts

Sample processing

Where necessary, plasma samples were diluted appropriately for each assay according to manufacturers’ specifications using a phosphate buffer containing bovine serum albumin (Sigma, St. Louis, MO, USA). In brief, for IL-6 and IL-8 assays, plasma samples were diluted 1:4, and for SAP, SAA and CRP assays plasma samples were diluted 1:2,000. Plasma CA-125 concentrations were assayed without prior dilution.

Frozen plasma samples and dilutions were thawed on ice prior to assay. All assays were performed in accordance with manufacturers’ instructions. All assays contained supplied standard curve samples of known analyte concentration. All standards, controls and patient samples were assayed in duplicate. Upon completion of each multiplex assay, a 5-parameter fit equation was employed to generate standard curves, from which mean values for each sample were calculated.
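The back-calculation of sample concentrations from a 5-parameter standard curve can be sketched as follows. This is an illustration under stated assumptions rather than the instrument software's method: the common 5-parameter logistic form is assumed, the function and parameter names are hypothetical, the parameter values would come from fitting the supplied standards, and averaging the back-calculated concentrations of the duplicates (rather than their signals) is also an assumption.

```python
def five_pl(conc, a, b, c, d, g):
    """5-parameter logistic response.

    a: response at zero concentration, d: response at infinite concentration,
    c: mid-range location, b: slope, g: asymmetry factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b) ** g


def inverse_five_pl(signal, a, b, c, d, g):
    """Back-calculate concentration for a signal lying strictly between a and d."""
    ratio = (a - d) / (signal - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


def mean_concentration(duplicate_signals, params):
    """Mean of the concentrations back-calculated from duplicate wells."""
    concs = [inverse_five_pl(s, *params) for s in duplicate_signals]
    return sum(concs) / len(concs)


# Hypothetical fitted parameters (a, b, c, d, g), for illustration only.
params = (10.0, 1.2, 500.0, 20000.0, 0.8)
signals = [five_pl(90.0, *params), five_pl(110.0, *params)]
print(mean_concentration(signals, params))
```

Signals outside the fitted asymptotes have no valid back-calculated concentration and would need to be flagged as out of range before this step.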

Biomarker quantification

Multiplexed bead-based assays were used to measure all analytes on a Biorad Bioplex 100 platform, with the exception of CA-125, which was assayed using a Roche modular E170. Multiplexed interleukin-6 and interleukin-8 assays (BioPlex®) were obtained from Bio-Rad Laboratories, Hercules, CA, USA and data are reported as pg/ml (LD = 10 pg/ml, intra- and inter-assay CV = <15 and <30%, respectively). Multiplexed serum amyloid A (SAA, ng/ml, LD = 0.2 pg/ml, intra- and inter-assay CV = 3.8 and <19.8%, respectively) and C-reactive protein (CRP μg/ml, LD = 6 pg/ml, intra- and inter-assay CV = 8.0 and <17.5%, respectively) assays were obtained from Millipore (Billerica, MA, USA). CA-125 was quantified using Roche CA-125 Elecsys II assay (Roche, Mannheim, Germany, LD = 0.6 U/ml; intra- and inter-assay coefficients of variation CV = 3.3 and 4.3%).

Statistical analysis

Statistical analyses, model development and sample classifications were performed by an independent biostatistician (Emphron Informatic Pty Ltd., Toowong, QLD, Australia). The primary outcome of the Phase II biomarker trial was the statistical comparison (Wilcoxon statistic (Waegeman et al. 2008), see below) of the area under the curve of the ROC curves for the biomarker panel and CA-125. Two sample group comparisons of median values were assessed by Mann–Whitney tests. Multiple group comparisons were assessed by Kruskal–Wallis tests (Kruskal and Wallis 1952). Dunn’s tests (Dunn 1964) were used for post hoc two sample comparisons. A p value of <0.05 was ascribed as statistically significant.

Modelling

As previously described, all samples (n = 362) were randomly assigned to two cohorts: the first was designated as the modelling cohort (n = 179) from which a classification algorithm was generated; and the second as the validation cohort (n = 183) which was used to establish the performance of the classification algorithm. A multivariate classification model was developed, based upon biomarker data obtained from the modelling cohort, using a stochastic gradient boosting model with a logistic loss function as previously described (Friedman et al. 2000). The boosted logistic regression algorithm was implemented within the R statistical programming environment (Team 2003). The implemented classification algorithm reported a predicted posterior probability value (i.e. the likelihood that a sample came from a woman with ovarian cancer, that is ρP) for each patient sample. ρP values were used to generate ROC curves for the biomarker panel. Biomarker data obtained from the validation cohort and the early stage cohorts were submitted to the classification algorithm to establish diagnostic efficiency (i.e. the proportion of samples correctly classified by the modelling algorithm). For classification of samples based on plasma CA-125 concentrations, a threshold value of ≥35 U/ml was used. A threshold value of 0.3 was used for the classification of samples based on ρP.
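The thresholding step and the resulting performance measures can be sketched as below. This is a minimal illustration, not the code used in the study: the function names and the toy values are hypothetical, and only the thresholds (ρP of 0.3 and CA-125 of 35 U/ml) and the definition of diagnostic efficiency as the proportion of samples correctly classified are taken from the text.

```python
def classify(values, threshold):
    """Label a sample positive when its value is at or above the threshold."""
    return [v >= threshold for v in values]


def diagnostic_summary(predicted_positive, is_case):
    """Sensitivity, specificity and diagnostic efficiency from paired labels."""
    pairs = list(zip(predicted_positive, is_case))
    tp = sum(1 for p, c in pairs if p and c)
    fn = sum(1 for p, c in pairs if not p and c)
    fp = sum(1 for p, c in pairs if p and not c)
    tn = sum(1 for p, c in pairs if not p and not c)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "efficiency": (tp + tn) / len(pairs),  # proportion correctly classified
    }


# Toy data (hypothetical): posterior probabilities and true disease status.
rho_p = [0.05, 0.20, 0.35, 0.10, 0.90, 0.60, 0.25, 0.80]
is_case = [False, False, False, False, True, True, True, True]
print(diagnostic_summary(classify(rho_p, 0.3), is_case))
```

The same two functions apply to CA-125 alone by passing plasma concentrations and a threshold of 35.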

Receiver operator characteristic curve comparisons

The diagnostic performance of the biomarker panel and of CA-125 alone was assessed by comparison of the area under ROC curves (Hanley and McNeil 1982). The ROC curve for the biomarker panel was based on ρP values. The area under the ROC curve (AUC) was calculated using the Wilcoxon statistic (Waegeman et al. 2008). As the AUC for CA-125 and for the biomarker panel are not statistically independent, since they are based on the same patients, the difference in AUC between the diagnostics was statistically assessed using a bootstrap procedure (Efron 1986). The number of bootstrap samples used in this analysis was n = 100,000. The estimators considered were the area under the ROC curve as well as the difference between the AUCs, and the measures of accuracy were the 95% confidence intervals.
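The AUC calculation and the paired bootstrap comparison described above can be sketched as follows. This is an illustration under assumptions, not the study's implementation: the function names are hypothetical, resampling is stratified within cases and controls, the interval is a simple percentile interval, and the default number of resamples is far smaller than the 100,000 used in the study so that the pure-Python example runs quickly.

```python
import random


def auc_wilcoxon(case_scores, control_scores):
    """AUC as the Wilcoxon statistic: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_difference(panel, ca125, cases, controls):
    """AUC(panel) - AUC(CA-125) for the patients indexed by cases/controls."""
    auc_a = auc_wilcoxon([panel[i] for i in cases], [panel[i] for i in controls])
    auc_c = auc_wilcoxon([ca125[i] for i in cases], [ca125[i] for i in controls])
    return auc_a - auc_c


def bootstrap_auc_difference(panel, ca125, is_case, n_boot=2000, seed=0):
    """Point estimate and 95% percentile interval for the AUC difference.

    The same resampled patients are used for both diagnostics, which
    preserves the correlation between the two AUC estimates.
    """
    rng = random.Random(seed)
    cases = [i for i, c in enumerate(is_case) if c]
    controls = [i for i, c in enumerate(is_case) if not c]
    diffs = []
    for _ in range(n_boot):
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        diffs.append(auc_difference(panel, ca125, boot_cases, boot_controls))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return auc_difference(panel, ca125, cases, controls), lower, upper
```

Under this approach, an interval for the difference that excludes zero would indicate that the two AUCs differ at the 5% level.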

Results

Cohort characteristics; variation in plasma biomarker concentrations with patient age, disease state, stage, and tumour type

The age distribution of the case cohort (overall and by disease stage) is presented in Table 2. The median age (range) of the control cohort (n = 212) was 49 (20–89) years (mean ± SE, 47 ± 0.8). Within case and control cohorts, no effect of age on plasma biomarker concentration was identified (Spearman’s rank correlation using the Bonferroni correction, p > 0.05), with the exception of SAA in the control cohort (ρ = 0.243, p = 0.005, n = 212).

The variation in biomarker plasma concentrations for controls, all cases of ovarian cancer and early stage (i.e. Stages I and II) ovarian cancer are presented in Fig. 1. The median concentration of all biomarkers was significantly greater in both case cohorts (i.e. validation and early stage) than in the control cohort (p < 0.001, as assessed by Mann–Whitney tests) (Table 4). No significant differences in biomarker plasma concentrations were identified between the modelling (n = 179) and validation (n = 183) cohorts (p > 0.05). No statistically significant effects of duration of sample storage at −80°C could be identified for any analyte (data not shown).

Fig. 1

Variation in biomarker plasma concentrations presented as scatter plots and median values for controls (n = 212), all cases of ovarian cancer (n = 150) and early stage ovarian cancer (Stages I and II, n = 91). CA-125 U/ml; CRP μg/ml; SAA ng/ml; IL-6 and IL-8 pg/ml

Table 4 Biomarker plasma concentrations

CA-125 plasma concentration was at or above the cut-off value (≥35 U/ml) in 7.6% of control samples (i.e. 16 false positive cases). Within the case cohort, CA-125 plasma concentration was less than 35 U/ml in 8.7% of samples (i.e. 13 false negative cases), and 60.7% of case samples (i.e. 91 cases) were early stage disease (Stages I and II). Within early stage cases, CA-125 plasma concentration was <35 U/ml in 12.1% (i.e. 11 false negative cases). CA-125 was elevated (i.e. ≥35 U/ml) in 75% of Stage I, 93.7% of Stage II and 98.1% of Stage III/IV cases. Data were further analysed with respect to tumour type: serous (n = 99), clear cell (n = 15), endometrioid (n = 11), mucinous (n = 11) and mullerian mixed type (mixed, n = 6). Tumour typing was not available for eight samples. No significant effect of tumour type on biomarker plasma concentrations was identified (Kruskal–Wallis one-way analysis of variance, p > 0.05).

Multivariate modelling and ROC curve comparisons

A binomial classification model was developed by subjecting the observed plasma concentrations for each of the five biomarkers within the modelling cohort to stochastic gradient boosted logistic regression analysis (Friedman et al. 2000). The algorithm (diagnostic rule) generated was implemented within the R statistical environment and its diagnostic efficiency evaluated using the independent validation cohort (n = 183, controls = 115, cases = 68). A ρP was calculated for each patient’s set of biomarkers and used to generate the model ROC curve (Fig. 2). The variation in ρP with disease state is presented in Fig. 3. The primary measure of the diagnostic efficiency of the algorithm, area under the ROC (AUCA), is presented in Table 5 and compared with that of CA-125 alone (AUCC). AUCA was significantly greater than AUCC for the validation cohort (p < 0.01) and early stage cohort (p < 0.01). To assess the potential influence of age on overall multivariate analysis, age-adjusted modelling and validation were performed. The results obtained were not significantly different from those of the non-age-adjusted analysis.

Fig. 2

ROC curves for individual biomarkers and multivariate model

Fig. 3

Predicted posterior probabilities (ρP, the probability of a sample belonging to the case group) are presented for controls and cases within the validation and early stage cohorts. Median values are represented by horizontal lines. Control versus case comparisons (Mann–Whitney test) for all cohorts were significantly different (p < 0.01)

Table 5 Comparison of the area under the curve of the receiver operator characteristic curves for CA-125 alone and the multianalyte panel within the validation cohort

Secondary outcome measures

Estimates of the sensitivity and specificity for CA-125 (at a threshold value ≥35 U/ml) and biomarker panel (at threshold values of 0.3 and 0.5) are presented in Table 6 for the validation and early stage cohorts. The relationship between ρP and disease stage is presented in Fig. 4. No significant effect of tumor type on ρP was identified (p > 0.05, Kruskal–Wallis test, data not shown).

Table 6 Contingency tables for multianalyte diagnostic (threshold ρP ≥ 0.3 and ρP ≥ 0.5) and CA-125 alone (threshold ≥ 35 U/ml)
Fig. 4

Association between ovarian cancer disease stage and predicted posterior probabilities. Median values are represented by horizontal lines. Control versus case comparisons (Mann–Whitney test) for all cohorts were significantly different (p < 0.01)

Second-site confirmation of predicted posterior probability values

Study samples were re-assayed in a second laboratory by an independent operator (GB). Data obtained were subjected to multivariate modelling and the ρP values derived were compared with those obtained from the primary site. No statistically significant differences in ρP values derived from data from the two laboratories were identified by paired two-sample analysis (p > 0.05).

Discussion

The primary objective of this study was to test the hypothesis that the area under the ROC curve for a multimarker ovarian cancer panel (CA-125 and acute phase response proteins: CRP, SAA, IL-6 and IL-8) was significantly greater than that observed for CA-125 alone. A phase II biomarker trial (a retrospective, case–control study) that involved 362 patient samples was conducted to test this hypothesis.

All biomarkers were significantly elevated in association with ovarian cancer; individually, however, none of the biomarkers displayed greater diagnostic performance than CA-125. With respect to the area captured under the ROC curves by individual biomarkers, CA-125 > SAA > CRP > IL-8 > IL-6. When biomarker data were used to generate a multivariate classification model, the ρP values from the biomarker panel captured 0.988 of the area under the ROC curve, which was significantly greater than that observed for CA-125 alone (0.960, p < 0.01). When applied to early stage cases alone (i.e. Stages I and II), the performance of the biomarker panel was similar (0.985); however, CA-125 performance decreased to 0.937 (p < 0.01), data consistent with an increased diagnostic efficiency of the multimarker panel for early stage ovarian cancer.

The area under the ROC curve was used as the primary statistical endpoint as this parameter is considered less susceptible to variations in mix of true positive and negative samples within the study cohort.

A measure of the reproducibility of biomarker panel performance was obtained by the re-assay of study samples in a second laboratory by an independent operator. The ρP values obtained from both laboratories were not significantly different.

The concentrations of analytes reported in this study for controls and cases are comparable with those previously published (Geisler et al. 1996; Johnson et al. 2008; Kodama et al. 1999; Lambeck et al. 2007; Maccio et al. 1998; Moshkovskii et al. 2005a; Woolas et al. 1993). The diagnostic efficiency of CA-125 within the study cohort (all FIGO stages) was 91.1%. The diagnostic efficiency of CA-125 for ovarian cancer has been previously reported between 70 and 90% (Park et al. 1995; Saraswathi and Malait 1995; Visintin et al. 2008).

All biomarkers utilised in the panel have been previously associated with ovarian cancer. Bertenshaw et al. (2008) reported CRP and IL-8 concentrations as being amongst the most informative ovarian cancer serum biomarkers in a multianalyte profiling study. Similarly, IL-6 and IL-8 have been reported to be elevated in serum of patients with ovarian cancer (Darai et al. 2003; Lambeck et al. 2007; Lokshin et al. 2006) and utilised in multimarker panels for the detection of ovarian cancer (Gorelik et al. 2005). In the latter study, IL-6 and IL-8 were used in combination with other cytokines and CA-125 in a classification tree analysis to deliver a test with greater sensitivity and specificity than CA-125 alone. IL-6 and IL-8 are pleiotropic cytokines that have also been implicated in aspects of tumour growth, disease progression and/or treatment (Hefler et al. 2003; Wang et al. 2005, 2007; Xu and Fidler 2000).

CRP and SAA are major components of the acute phase response (Pepys and Baltz 1983). Several studies have reported elevated serum concentrations of CRP in association with ovarian cancer (Avall Lundqvist et al. 1989; Hefler et al. 2008; Kodama et al. 1999; Maccio et al. 1998; McSorley et al. 2007). Only limited data, however, are available on SAA concentrations in ovarian cancer patients (Helleman et al. 2008). Serum concentrations of CRP are correlated with IL-6 and high concentration has been reported to be a significant factor in prognosis of ovarian cancer (Kodama et al. 1999; Maccio et al. 1998). Indeed, high CRP is reportedly a risk factor for developing ovarian cancer (McSorley et al. 2007).

Moshkovskii et al. (2005b) identified two forms of SAA using SELDI-TOF mass spectrometry. The authors reported the presence of both forms in 55% of ovarian cancers compared with only 6% of healthy controls, suggesting that an N-terminal truncated form of SAA may be significant for diagnosis. SAA was further identified in a proteomic study by Helleman et al. (2008) as a potential marker for monitoring of disease progression, where, in combination with CA-125 and seven other markers, a sensitivity of 91–100% was achieved. CRP and SAA have been implicated in a range of neoplastic diseases (Weinstein et al. 1984).

The data obtained in this study are consistent with and support previous observations of the association between elevated acute phase proteins and the presence of ovarian cancer. When such biomarkers are used in combination with CA-125, diagnostic efficiency for ovarian cancer is increased overall (validation cohort) and for early stage disease (early stage cohort). At ρP values of 0.3–0.5, the biomarker panel delivers a balance between sensitivity and specificity, and displays a false positive rate of 6–8%. At this level of performance, while the biomarker panel would reduce by 30–50% the number of women misdiagnosed with cancer by CA-125, it would not be suitable as a screening modality.

The study reported herein is a retrospective, case–control design and the diagnostic performance parameters reported cannot be extended beyond the context of the study. Additional studies, therefore, are required to assess the clinical utility of such multimarker tests within both high-risk cohorts (including women with a genetic predisposition to ovarian cancer) and within the general population (where reliable estimates of positive and negative predictive values may be obtained) (Coates et al. 2008). The utility of biomarker panels for the diagnosis of ovarian cancer, such as that reported herein, may be further enhanced if used in a multimodal approach, for example, in combination with a symptom index as recently described by Andersen et al. (2008).