Introduction

Approximately 70 % of breast cancer patients are estrogen receptor-positive (ER-positive), for whom adjuvant tamoxifen monotherapy is the most common treatment option. Large-scale randomized trials have shown that administering tamoxifen for approximately 5 years can significantly reduce the relapse risk and mortality for ER-positive patients throughout the first 10 years of a > 15-year follow-up [1, 2]. Unfortunately, the usefulness of tamoxifen is limited by endocrine resistance, which leads to cancer relapse and patient death. Therefore, it is vitally important to develop predictors that can accurately predict outcomes for patients receiving tamoxifen. Although several gene expression-based predictors have been built for this purpose [3–7], their predictive power often decreases markedly in independent cohorts from different laboratories [8–11], and they are usually invalid in node-positive patients [7, 12]. For example, the 21-gene recurrence score (RS) proposed by Paik et al. [3] was not significantly associated with recurrence risk when adjusted for clinicopathologic parameters in a validation cohort [13], and it might provide little information beyond these parameters [14–16].

Most of the previously proposed predictors make decisions based on cut-offs of transformed absolute expression measurements, which are sensitive to systematic inter-laboratory biases in microarray experiments, including batch effects introduced by experimental conditions, reagent dosage, and personnel differences [17]. Although data normalization and batch effect correction algorithms are intended to eliminate such biases, they can distort real expression signals by smoothing out biological differences and by scaling all arrays from different experiments to the same distribution under the unreliable assumption that the absolute amount of total mRNA in each sample is similar across cell types or conditions [18–21]. In contrast, the relative ordering of expression measurements (ROE) of gene pairs within a sample is invariant to monotonic data transformations (normalization) and rather resistant to batch effects [22]; these features are promising for building predictors that are robust across laboratories, e.g., the top scoring pair (TSP) and k-TSP predictors [4, 22, 23]. However, because a few TSPs are typically composed of mutually correlated gene pairs, they could miss many patients who do not carry the top features. Given the heterogeneity of ER-positive breast cancer [24–26] and the multiple mechanisms involved in tamoxifen resistance [11, 27–31], it would be necessary to develop a predictor that combines gene pairs, each associated with a subset of resistant patients.
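The invariance claim can be illustrated with a minimal sketch. The gene names, intensities, and absolute cut-off below are hypothetical and do not come from this study. Each transformation is applied uniformly to every gene in one sample, standing in for normalization or a lab-wide scaling or offset; gene-specific biases are not modeled, which is why the ROE is described only as "rather resistant" to batch effects.

```python
import math

# Hypothetical raw intensities for one sample (names and values are illustrative only).
RAW = {"GENE_A": 850.0, "GENE_B": 320.0, "GENE_C": 120.0}
CUTOFF_B = 200.0  # hypothetical absolute cut-off on GENE_B, tuned on raw-scale data


def pair_reversed(sample, high, low):
    """ROE feature: 0 if sample[high] >= sample[low] (expected order), 1 if reversed."""
    return 0 if sample[high] >= sample[low] else 1


def classify_under(transform):
    """Apply a sample-wide monotonic transform, then read both kinds of features."""
    sample = {gene: transform(value) for gene, value in RAW.items()}
    return (
        pair_reversed(sample, "GENE_A", "GENE_B"),
        pair_reversed(sample, "GENE_B", "GENE_C"),
        sample["GENE_B"] >= CUTOFF_B,  # absolute-expression decision
    )


TRANSFORMS = {
    "raw": lambda v: v,
    "log2": math.log2,
    "lab scaling x0.5": lambda v: 0.5 * v,
    "lab offset +40": lambda v: v + 40.0,
}

if __name__ == "__main__":
    for name, f in TRANSFORMS.items():
        print(f"{name:>16}: {classify_under(f)}")
```

Both pair features stay at (0, 0) under every transform, whereas the absolute cut-off decision on GENE_B flips under the log2 and scaling transforms.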

As genes are widely dysregulated in ER-positive tumors [32], we assumed that many gene pairs would show reversed ROEs in a substantial proportion of ER-positive tumor samples compared with normal controls. These gene pairs could serve as tumor-associated features to characterize tumor samples at the individual level and then to identify patient subgroups with different tamoxifen responses. Based on these assumptions, using an integrated dataset consisting of 420 normal controls and 1,129 ER-positive tumor samples without treatment and outcome information, we identified the ER-positive tumor-associated gene pairs with stable ROEs in normal controls and significantly reversed ROEs in ER-positive tumor samples. Using these gene pairs, we characterized each sample from a multi-laboratory cohort of 292 ER-positive patients receiving tamoxifen and identified the gene pairs (features) whose reversal was significantly associated with the relapse risk of these patients. We extracted a feature subset to develop a predictor and validated it in 2 independent multi-laboratory cohorts of 250 and 248 tamoxifen-treated patients and in a cohort of 297 chemo-endocrine-treated patients. Finally, we compared the proposed predictor with 2 other predictors using inter-laboratory data.

Materials and methods

Data collection

An integrated dataset, including 420 normal controls and 1,129 ER-positive breast cancer samples documented in the GEO [33], ArrayExpress [34], and TCGA [26] repositories (Table S1), was used to identify ER-positive tumor-associated gene pairs.

The characteristics of the patient and sample cohorts used for predictor development and validation are summarized in Table 1 and Table S2, respectively. Notably, these sample cohorts were profiled at various laboratories. The training cohort consisted of 292 ER-positive patients from 3 centers who received tamoxifen monotherapy for 5 years. This cohort was used to assess the association of the ROE reversals of gene pairs with relapse risk and to develop an ROE-based endocrine predictor (RE predictor) of the outcomes of tamoxifen-treated ER-positive patients. The first and second validation cohorts consisted of 250 and 248 ER-positive patients, respectively, who received tamoxifen (N = 470) or letrozole (N = 28) alone for 5 years. These 2 cohorts were used to assess and compare the performance of the RE predictor, the sensitivity to endocrine therapy (SET) index [7], and the RS [3].

Table 1 Clinical characteristics of the patients at diagnosis

Two untreated cohorts of 209 and 253 ER-positive patients who did not receive any systemic therapy were compared with the tamoxifen-treated patients to assess the benefit of adjuvant tamoxifen for patients predicted to be tamoxifen insensitive or sensitive.

A chemo-endocrine cohort was studied to test the predictive power of the RE predictor for endocrine therapy after chemotherapy. This cohort consisted of 297 ER-positive breast cancer patients who received neoadjuvant chemotherapy and subsequent adjuvant endocrine therapy with tamoxifen or aromatase inhibitors (AIs) or both in sequence [35].

Feature selection and predictor construction

Figure 1 describes the processes for developing and validating the RE predictor. In the integrated dataset, all gene pairs were evaluated to extract the ER-positive tumor-associated gene pairs that had stable ROEs in more than 99 % of normal controls and reversed ROEs in more than 10 % of ER-positive tumor samples, which corresponded to a false discovery rate (FDR) <0.01 %. Gene pairs reversed in >50 % of tumor samples were removed because tamoxifen insensitivity occurs in <50 % of ER-positive patients [31, 36, 37]. Then, using the extracted n gene pairs, we characterized each sample in the training cohort as a binary vector $(x_1, x_2, \ldots, x_n)$, in which $x_i$ ($i = 1, 2, \ldots, n$) was assigned 0 if the ROE of the ith gene pair was $R_{i1} \geq R_{i2}$ (the ordering observed in normal controls), and 1 otherwise (reversed). The univariate Cox regression model was used to identify the gene pairs (features) whose reversal was significantly associated with the relapse risk of the patients. The final feature subset was obtained from these features through a wrapper method based on a simple classification rule: a patient was predicted to be tamoxifen insensitive for a given feature subset if at least 2 gene pairs in the subset were reversed, where the threshold of 2 was determined by the binomial test, as described in detail in the Supplemental Methods. Here, we used a genetic algorithm [38] to search for the final feature subset that yielded the largest product of the positive predictive value (PPV) and negative predictive value (NPV), as defined in the Statistical analysis section. The genetic algorithm was implemented with a population size of 10,000 and a crossover fraction of 0.9; it was terminated if the optimization objective of the best subset did not improve for 100 generations. The details of the genetic algorithm are given in the Supplemental Methods. The RE predictor was developed based on the classification rule with the optimized feature subset.
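The pair-selection step, the binary encoding, and the classification rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thresholds are fixed at the stated 99 %, 10 %, and 50 % values rather than derived from the FDR calculation, the univariate Cox screening and genetic-algorithm search are omitted, and the function names (`select_tumor_pairs`, `encode`, `predict_insensitive`) are hypothetical. The exhaustive pair loop is only practical for small gene lists.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]  # gene -> expression value within one sample
Pair = Tuple[str, str]     # (gene 1, gene 2), oriented so that R_i1 >= R_i2 in normals


def select_tumor_pairs(
    normals: Sequence[Sample],
    tumors: Sequence[Sample],
    genes: Sequence[str],
    min_stable: float = 0.99,
    min_reversed: float = 0.10,
    max_reversed: float = 0.50,
) -> List[Pair]:
    """Keep pairs with a stable ROE in normals that is reversed in a subset of tumors."""
    selected: List[Pair] = []
    for a, b in combinations(genes, 2):
        frac_ab = sum(s[a] >= s[b] for s in normals) / len(normals)
        frac_ba = sum(s[b] >= s[a] for s in normals) / len(normals)
        # Orient the pair by its dominant order in normal controls.
        g1, g2, stable = (a, b, frac_ab) if frac_ab >= frac_ba else (b, a, frac_ba)
        if stable <= min_stable:
            continue  # ROE not stable in > 99 % of normal controls
        reversed_frac = sum(s[g1] < s[g2] for s in tumors) / len(tumors)
        if min_reversed < reversed_frac <= max_reversed:
            selected.append((g1, g2))
    return selected


def encode(sample: Sample, pairs: Sequence[Pair]) -> List[int]:
    """Binary vector x: x_i = 0 if R_i1 >= R_i2, 1 otherwise (reversed)."""
    return [0 if sample[g1] >= sample[g2] else 1 for g1, g2 in pairs]


def predict_insensitive(x: Sequence[int], min_reversed_pairs: int = 2) -> bool:
    """Classification rule: insensitive if at least 2 pairs in the subset are reversed."""
    return sum(x) >= min_reversed_pairs
```

For example, a sample whose vector over a 15-pair subset contains two or more 1s would be called tamoxifen insensitive under this rule.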

Fig. 1

Outline of the processes for developing and validating the RE predictor. GP, gene pair

Statistical analysis

Log-rank tests were used to assess the differences between the Kaplan–Meier estimates of relapse-free survival (RFS) in the predicted groups. Multivariate analyses with the Cox proportional hazards regression model were performed to calculate hazard ratios (HRs) and their 95 % confidence intervals (CIs). The additional prognostic value of the RE predictor was assessed by comparing the complete model (clinical parameters plus the RE prediction) with the clinical model using the likelihood ratio test.

Since the HRs and P values of the Cox model do not directly measure the predictive power of a predictor [39], performance was assessed by the PPV, defined as the probability of relapse within 10 years for patients predicted to be tamoxifen insensitive; the NPV, defined as the probability of 10-year RFS for patients predicted to be tamoxifen sensitive; and the absolute risk reduction (ARR), defined as the absolute difference in RFS between the 2 predicted groups [35]. These 3 measures were estimated from the Kaplan–Meier estimates of the survival function [40]. The CIs for the NPV and PPV were calculated based on the Greenwood variance estimate, and those for the ARR were estimated by bootstrapping with 10,000 iterations [41]. All statistical computations were performed in R version 2.15.2.
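The 3 measures can be computed directly from Kaplan–Meier estimates. The sketch below is a minimal illustration under stated assumptions: follow-up is in years, `events` is coded 1 for relapse and 0 for censoring, the NPV and PPV intervals are Wald-type on the untransformed scale using the Greenwood variance (the text does not say whether a transformation such as log-log was applied), and the ARR interval is a percentile bootstrap. The function names are hypothetical, and the original analysis was done in R.

```python
import math
import random
from typing import List, Sequence, Tuple


def km_at(times: Sequence[float], events: Sequence[int], horizon: float) -> Tuple[float, float]:
    """Kaplan-Meier estimate S(horizon) and its Greenwood variance.

    times: follow-up in years; events: 1 = relapse, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, greenwood = 1.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:  # group tied follow-up times
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1.0 - d / at_risk
            if at_risk > d:
                greenwood += d / (at_risk * (at_risk - d))
        at_risk -= d + c  # relapsed and censored patients leave the risk set
    return surv, surv * surv * greenwood


def predictive_values(times, events, insensitive, horizon: float = 10.0, z: float = 1.96) -> dict:
    """NPV and PPV (with Greenwood-based Wald CIs) and ARR at `horizon`."""
    groups = {flag: ([], []) for flag in (False, True)}
    for t, e, flag in zip(times, events, insensitive):
        groups[bool(flag)][0].append(t)
        groups[bool(flag)][1].append(e)
    s_sens, v_sens = km_at(*groups[False], horizon)  # predicted sensitive
    s_ins, v_ins = km_at(*groups[True], horizon)     # predicted insensitive
    npv, ppv = s_sens, 1.0 - s_ins
    se_npv, se_ppv = math.sqrt(v_sens), math.sqrt(v_ins)  # Var(1 - S) = Var(S)
    return {
        "npv": npv, "npv_ci": (npv - z * se_npv, npv + z * se_npv),
        "ppv": ppv, "ppv_ci": (ppv - z * se_ppv, ppv + z * se_ppv),
        "arr": s_sens - s_ins,
    }


def bootstrap_arr_ci(times, events, insensitive, horizon: float = 10.0,
                     n_boot: int = 10_000, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap 95 % CI for the ARR, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(times)
    arrs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        flags = [bool(insensitive[j]) for j in idx]
        if all(flags) or not any(flags):
            continue  # a resample must contain both predicted groups
        res = predictive_values([times[j] for j in idx], [events[j] for j in idx], flags, horizon)
        arrs.append(res["arr"])
    if not arrs:
        raise ValueError("no usable bootstrap resamples")
    arrs.sort()
    return arrs[int(0.025 * (len(arrs) - 1))], arrs[int(0.975 * (len(arrs) - 1))]
```

Because the variance of 1 − Ŝ equals that of Ŝ, the same Greenwood term serves the PPV interval. The bootstrap default of 10,000 resamples mirrors the text; resamples lacking one of the predicted groups are skipped.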

Results

Development of the RE predictor

First, from the integrated dataset, we identified 264,933 ER-positive tumor-associated gene pairs with an FDR < 0.01 %. From these gene pairs, 562 gene pairs associated with relapse risk were identified using the Cox model with a 1 % FDR. Next, 15 gene pairs (Table S3) were extracted from the 562 gene pairs using a genetic algorithm [38] based on the classification rule. Finally, the RE predictor was built from the ROEs of the 15 gene pairs and the classification rule.

Performance of the RE predictor in independent validation cohorts

The RE predictor was evaluated in the 2 validation cohorts of 250 and 248 patients. In the first cohort, the 10-year RFS (NPV) of the patients predicted to be tamoxifen sensitive was 91 % (95 % CI 85–97 %), with an ARR of 34 % (95 % CI 17–51 %) (Table 2). By comparison, the 10-year point estimate of RFS for the patients predicted to be tamoxifen insensitive was 57 % (95 % CI 47–68 %), corresponding to a PPV of 43 % (95 % CI 32–53 %) and an HR of 4.99 (95 % CI 2.45–10.17, P = 9.13 × 10⁻⁷) (Fig. 2a). Similar performance was achieved in the second validation cohort (Table 2; Fig. 2b). Overall, the patients predicted to be tamoxifen sensitive had a significantly higher probability of RFS than the patients predicted to be tamoxifen insensitive.
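Because the PPV and NPV defined in the Statistical analysis section are both read off Kaplan–Meier curves at 10 years, the ARR is not an independent quantity. Writing $S_{\mathrm{sens}}(10)$ and $S_{\mathrm{ins}}(10)$ for the 10-year RFS of the 2 predicted groups, the point estimates reported for the first validation cohort can be checked as follows (up to rounding):

```latex
\begin{aligned}
\mathrm{NPV} &= S_{\mathrm{sens}}(10), \qquad \mathrm{PPV} = 1 - S_{\mathrm{ins}}(10),\\
\mathrm{ARR} &= S_{\mathrm{sens}}(10) - S_{\mathrm{ins}}(10) = \mathrm{NPV} + \mathrm{PPV} - 1\\
             &= 0.91 + 0.43 - 1 = 0.34.
\end{aligned}
```

The same identity reproduces the nodal subgroup values: 0.93 + 0.30 − 1 = 0.23 for node-negative and 0.81 + 0.63 − 1 = 0.44 for node-positive patients.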

Table 2 Performance for prediction of relapse within 10 years
Fig. 2

Kaplan–Meier estimates of relapse-free survival for the patients who received endocrine therapy only in a the first validation cohort, and b the second validation cohort, and presented separately for the subsets with c node-negative and d node-positive patients. The tamoxifen sensitive and insensitive groups were predicted by the RE predictor. Eleven patients were excluded from the analysis of nodal status-defined subsets because of missing data. The hazard ratio (HR) is a measure for the risk of relapse; P values are calculated by the log-rank tests

Considering that many patients (53 %) in the 2 cohorts had partially missing clinicopathologic parameters, we performed multivariate analyses by combining the patients of both cohorts to increase the statistical power. In a multivariate Cox model, the RE prediction was significantly associated with the relapse risk (HR = 5.26, 95 % CI 2.76–10.03, P = 4.52 × 10⁻⁷) after adjusting for standard clinicopathologic parameters, including nodal status, age, tumor size and grade (Table 3). Addition of the RE prediction to a multivariate Cox model of the clinicopathologic parameters significantly increased the predictive utility of the model (likelihood ratio of the complete model vs the clinical model, 29.4; P = 5.92 × 10⁻⁸). In the complete model, node-positive status was also independently associated with significantly greater risk of relapse (HR = 2.02, 95 % CI 1.22–3.34, P = 0.006; Table 3).
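The reported P value of the likelihood ratio test can be reproduced from the statistic, assuming 1 degree of freedom because the RE prediction enters the complete model as a single binary covariate (the degrees of freedom are not stated in the text). A minimal standard-library sketch; the function names are hypothetical:

```python
import math


def lr_statistic(loglik_clinical: float, loglik_complete: float) -> float:
    """Likelihood ratio statistic: 2 * (log-likelihood of complete model - clinical model)."""
    return 2.0 * (loglik_complete - loglik_clinical)


def chi2_sf_df1(stat: float) -> float:
    """Upper tail of a chi-square distribution with 1 degree of freedom.

    If X = Z**2 with Z standard normal, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Statistic reported for the complete vs clinical model (df = 1 assumed).
    print(f"P = {chi2_sf_df1(29.4):.3g}")
```

Running it prints a value of about 5.9 × 10⁻⁸, in line with the reported P given that the statistic 29.4 is itself rounded.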

Table 3 Multivariate Cox regression analysis of association with RFS (N = 235)

Notably, the RE predictor appeared to perform well in both node-negative and node-positive patients. The patients predicted to be tamoxifen insensitive had a significantly higher risk of relapse than the patients predicted to be tamoxifen sensitive in both the node-negative (HR = 3.57, 95 % CI 1.79–7.12, P = 0.0001; Fig. 2c) and node-positive (HR = 4.67, 95 % CI 2.62–8.33, P = 9.36 × 10⁻⁹; Fig. 2d) subsets. For the patients predicted to be tamoxifen sensitive, the 10-year point estimate of RFS (NPV) was 93 % (95 % CI 88–98 %) in node-negative patients and 81 % (95 % CI 72–91 %) in node-positive patients. Moreover, the PPV was 30 % (95 % CI 21–38 %) and 63 % (95 % CI 51–71 %) for the node-negative and node-positive patients, respectively, corresponding to ARRs of 23 % (95 % CI 13–33 %) and 44 % (95 % CI 30–57 %), respectively.

Notably, the tamoxifen-treated patients who were predicted to be tamoxifen sensitive (but not those predicted to be tamoxifen insensitive) had significantly improved RFS compared with their untreated counterparts (Fig. 3). Among the patients predicted to be tamoxifen sensitive, the 10-year point estimate of RFS was 70 % (95 % CI 62–80 %) and 69 % (95 % CI 61–77 %) in the 2 untreated cohorts, whereas the RFS in the first validation cohort (91 %) was significantly higher than that in both untreated cohorts (HR = 0.26, 95 % CI 0.12–0.55, P = 0.0002, Fig. 3a; and HR = 0.34, 95 % CI 0.17–0.70, P = 0.002, Fig. 3b). A similar improvement was observed in the second validation cohort (Fig. 3c, d). Conversely, among the patients predicted to be tamoxifen insensitive, no significant RFS improvement was observed for the 2 tamoxifen-treated cohorts compared with the 2 untreated cohorts (all P > 0.1, Figure S1).

Fig. 3

Kaplan–Meier estimates of relapse-free survival for the patients predicted to be tamoxifen sensitive in a, b the first validation cohort and c, d the second validation cohort treated with endocrine therapy (Tam-treated) versus the 2 untreated cohorts (Untreated). The hazard ratio (HR) is a measure for the risk of relapse; P values are calculated by the log-rank tests

Testing of the RE predictor in the chemo-endocrine cohort

The RE predictor was also applied to the chemo-endocrine cohort of 297 patients (Figure S2). The 2 predicted patient groups showed significantly different relapse risks after neoadjuvant chemotherapy and subsequent adjuvant endocrine therapy (HR = 2.79, 95 % CI 1.40–5.55, P = 0.002). This result suggests that the RE predictor can still predict the outcomes of endocrine therapy-treated ER-positive patients who have received neoadjuvant chemotherapy. However, the PPV and NPV could not be assessed in this cohort because its median follow-up was only 3 years [35].

Comparison of the RE predictor with other predictors

We also evaluated the performance and robustness of 2 absolute expression value-based predictors previously developed to predict the outcomes of patients receiving tamoxifen: the SET index and the RS. Although the SET index performed well in the original study [7], in the first validation cohort the RFS of the high-SET category was not significantly better than that of the intermediate- or low-SET categories (HR = 0.69, 95 % CI 0.45–1.05, P = 0.08; Figure S3). Additionally, the PPV of the SET index (32 %, 95 % CI 24–39 %) (Table 2) was not significantly larger than the baseline of 29 % (the lower 95 % confidence limit of the PPV was smaller than the baseline). For the RS, the NPV and PPV in the first validation cohort were 77 % (95 % CI 69–85 %) and 42 % (95 % CI 28–53 %), respectively (Table 2), neither of which was significantly larger than the corresponding baseline. Similar NPV and PPV were obtained for the RS in the second validation cohort. In contrast, the RE predictor showed NPV and PPV significantly larger than the baselines in both validation cohorts.
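The criterion used in this paragraph (the lower 95 % confidence limit must lie above the baseline) can be applied mechanically to the first-validation-cohort values quoted in the text. This sketch assumes a baseline NPV of 71 %, the complement of the stated 29 % baseline relapse probability; the text does not give the NPV baseline explicitly. The SET index NPV is not quoted in the text and is omitted, and the names below are hypothetical.

```python
# Point estimates and 95 % CIs (as fractions) quoted for the first validation cohort.
BASELINE_PPV = 0.29                # baseline relapse probability stated in the text
BASELINE_NPV = 1.0 - BASELINE_PPV  # assumption: complementary baseline 10-year RFS

REPORTED = {
    ("SET index", "PPV"): (0.32, 0.24, 0.39),
    ("RS", "NPV"): (0.77, 0.69, 0.85),
    ("RS", "PPV"): (0.42, 0.28, 0.53),
    ("RE predictor", "NPV"): (0.91, 0.85, 0.97),
    ("RE predictor", "PPV"): (0.43, 0.32, 0.53),
}


def exceeds_baseline(measure: str, lower: float) -> bool:
    """True if the lower 95 % confidence limit lies above the baseline for that measure."""
    baseline = BASELINE_PPV if measure == "PPV" else BASELINE_NPV
    return lower > baseline


if __name__ == "__main__":
    for (predictor, measure), (point, lower, upper) in REPORTED.items():
        verdict = "above" if exceeds_baseline(measure, lower) else "not above"
        print(f"{predictor:>12} {measure}: {point:.2f} ({lower:.2f}-{upper:.2f}) {verdict} baseline")
```

Under that assumption, only the RE predictor's NPV and PPV clear their baselines, matching the conclusion of the paragraph.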

In hierarchical clustering analyses of all genes in the expression profiles, using Euclidean distances with Ward linkage, the samples were separated according to laboratory (Figure S4), indicating extensive inter-laboratory biases between microarrays profiled at different laboratories. Because these biases could systematically shift SET and RS scores, thresholds optimized in a training cohort would be unsuitable for validation cohorts from other laboratories (Table S4). For example, among technical replicates of 16 patients profiled at 2 laboratories, 6, 2, and 1 patients were classified into discordant groups by the SET index, RS, and RE predictor, respectively (Table S5).

The above results suggested that scores of the SET index and RS tend to be greatly affected by systematic biases of expression measurements between laboratories, whereas the RE predictor is rather robust against inter-laboratory biases.

Discussion

This study developed the RE predictor and validated its predictive performance in independent multi-laboratory cohorts. The results indicated that the RE predictor could accurately predict the RFS of ER-positive patients receiving tamoxifen and of ER-positive patients receiving neoadjuvant chemotherapy followed by endocrine therapy. The RE predictor was also more robust than the absolute expression value-based predictors for samples profiled in different laboratories, as the ROE was barely affected by inter-laboratory biases. Thus, the RE predictor could be applied to patients from different centers.

In principle, the predictive effect of the RE predictor should be confirmed in a properly designed prospective randomized trial, which would enable testing for an interaction between the RE predictor and tamoxifen treatment. However, such a prospective study is not justifiable because including an untreated arm would be unethical now that tamoxifen is an established therapy [1, 2]. Thus, we compared the patients receiving tamoxifen with untreated patients from different retrospective studies. The results showed that 5-year tamoxifen treatment significantly reduced relapse risk compared with no systemic treatment for the patients predicted to be tamoxifen sensitive, but not for the patients predicted to be tamoxifen insensitive. We recognize that the distributions of age and nodal status differed between the treated and untreated cohorts and that this difference might lead to different relapse rates arising from the natural history of the disease (unrelated to tamoxifen). Nevertheless, the results suggest that the patients predicted to be tamoxifen sensitive would benefit more from tamoxifen than the patients predicted to be tamoxifen insensitive.

In the multivariate Cox analysis, the RE prediction and nodal status were 2 independent prognostic factors. Note that the node-positive patients predicted to be tamoxifen sensitive had a 19 % 10-year relapse risk with endocrine therapy alone. This is a problem common to predictors for ER-positive patients receiving endocrine therapy, owing to the poor prognosis of node-positive patients. Two recent studies have reported that approximately 20 % of the node-positive, low-risk patients identified by predictors would develop a distant relapse [7, 12]. Furthermore, these predictors performed well in node-negative patients but showed either no or only borderline statistical significance in node-positive patients when comparing predicted low-risk with predicted high-risk groups [7, 12]. For the RE predictor, the node-positive patients predicted to be tamoxifen sensitive and insensitive had significantly different RFS rates, with a large ARR of 43 % for the patients predicted to be tamoxifen sensitive compared with those predicted to be tamoxifen insensitive. This result supports the effectiveness of the RE predictor in predicting the outcomes of node-positive, ER-positive patients treated with tamoxifen. Nevertheless, the RFS rate of the predicted low-risk group might be further improved if a predictor were developed specifically in node-positive cohorts.

Our analysis showed that the scores of the SET index and RS tended to be greatly affected by systematic biases of expression measurements between laboratories. Although the RS scores in this study were not measured by the reverse transcriptase polymerase chain reaction (RT-PCR) assay used in the original study [3], scores based on RT-PCR would likely also be influenced, because systematic biases also affect RT-PCR measurements [17].

As the RE predictor could accurately predict the outcomes of tamoxifen-treated ER-positive patients and was robust across laboratories, it could be used, in combination with nodal status, to assist in clinical decision-making. For the node-negative patients predicted to be tamoxifen sensitive, a 10-year RFS rate of 93 % justifies the selection of tamoxifen monotherapy. In sharp contrast, the probability of relapse within 10 years was 63 % for the node-positive patients predicted to be tamoxifen insensitive. Although there is no randomized evidence that tamoxifen is ineffective for these patients, this probability might be high enough to warrant consideration of alternative adjuvant therapies (e.g., AIs and aggressive chemotherapies) or clinical trials of investigational treatments rather than tamoxifen alone. For the node-positive patients predicted to be tamoxifen sensitive and the node-negative patients predicted to be tamoxifen insensitive, the probability of RFS was 81 and 70 %, respectively. Thus, sequential chemo-endocrine therapy, extension of tamoxifen to 10 years [42, 43], or AIs [44–47] could be considered for these patients to further improve the likelihood of cure. The RE predictor could be combined with nodal status and with predictive tests for targeted therapies and chemotherapies [35] to guide personalized adjuvant treatment regimens for breast cancer patients.