Introduction

The last decade has seen a dramatic increase in the use of genomic assays in routine clinical practice for patients with early-stage breast cancer. These assays differ in their technological platforms, development, and analytical and clinical validation, as well as in the gene sets included in their algorithms. Furthermore, the patient cohorts included in the clinical validation studies differ substantially, ranging from small single-center cohorts of convenience samples to large cohorts from randomized clinical trials with long-term clinical outcomes and archived tissue.

The current study focused on two such genomic assays: the 21-gene Recurrence Score® assay (Oncotype DX®, Genomic Health, Inc., Redwood City, CA, USA) and the Prosigna assay (NanoString Technologies Inc., Seattle, WA, USA). The 21-gene assay is a quantitative reverse transcriptase polymerase chain reaction (qRT-PCR)-based multigene assay that has been clinically validated as a prognosticator in estrogen receptor (ER)-positive early-stage breast cancer treated with endocrine therapy as well as a predictor of the likelihood of chemotherapy benefit (i.e., patients with a high score have a greater likelihood of benefit and patients with a low score would be expected to have little to no benefit) [1–7]. The Recurrence Score result provides a quantitative estimate of the 10-year risk of distant recurrence based on the individual patient’s tumor. The 21-gene assay has been incorporated into the consensus statement of the IMPAKT (IMProving care And Knowledge through Translational research) 2012 Working Group (as having convincing evidence on analytical and clinical validity) [8] and into major international guidelines including those by the National Comprehensive Cancer Network®, the American Society of Clinical Oncology®, the European Society for Medical Oncology, and St. Gallen, and a recommendation by the National Institute for Health and Care Excellence (NICE) in the UK [9–13].

The Prosigna assay is based on PAM50, a 50-gene microarray profile originally developed for research purposes to assess the phenotypic diversity of breast tumors and corresponding diversity in gene expression [14, 15]. The Prosigna assay has been validated as a prognosticator in clinically low-risk, postmenopausal patients with ER-positive early-stage breast cancer treated with endocrine therapy [16–18]. To date, the Prosigna assay has not been shown to be predictive of chemotherapy benefit. The IMPAKT 2012 Working Group did not find the analytical/clinical evidence for this assay to be convincing [8] and the assay is currently not acknowledged in international guidelines as having data supporting prediction of chemotherapy benefit [9–13].

Increasingly, there is a misconception that all the risk-stratifying assays provide similar information that can be used interchangeably for prognostication and treatment decisions. There have now been multiple reports comparing the 21-gene assay with the 70-gene assay (MammaPrint®, Agendia, Inc., Amsterdam, The Netherlands), the five-antibody assay (Mammostrat®, Clarient, Inc., Aliso Viejo, CA, USA), and the 12-gene assay (EndoPredict®, Sividon Diagnostics, Cologne, Germany) that have clearly shown that the assays stratify patients differently [19–22].

We hypothesized that the 21-gene assay and the Prosigna assay also stratify patients differently. To test this hypothesis, we performed a prospectively designed comparison between these assays using the same formalin-fixed, paraffin-embedded (FFPE) tumor samples.

Methods

Study Design

The study was prospectively designed to analyze archival samples from patients with ER-positive early-stage breast cancer. The sample size was set at 70 samples, based on the number of samples needed to assess the concordance between the two assays in terms of risk stratification and allowing for approximately 20% dropout. The assays were run in independent laboratories, each of which was blinded to the results of the other. The primary objective of the study was to assess the agreement in risk stratification between the two assays. Secondary objectives included calculating the correlation between results of the two assays; assessing ER expression within Prosigna risk groups; and determining the distribution of the Recurrence Score and Prosigna results within luminal subtypes as defined by the Prosigna Breast Cancer Intrinsic Classifier.

The study was reviewed and approved by an institutional review board and was granted a waiver for obtaining patient consent since there were no patient outcomes included in the analysis. All samples were de-identified with respect to patient-specific information.

Patients and Samples

Consecutive FFPE breast cancer samples (all excisional specimens) were obtained from Marin Medical Laboratories (Novato, CA, USA), which serves as the tissue repository for the Marin General Hospital and provides tissue samples (from patients originally seen at the Marin clinics) for research purposes. For this study, the tissue samples made available were from patients seen between 2000 and 2001, for whom the assay results could no longer impact treatment decisions. In addition, the following inclusion criteria had to be met: postmenopausal, ER-positive (by immunohistochemistry [IHC] or RT-PCR), and HER2-negative (by IHC or fluorescence in situ hybridization [FISH]). The 21-gene assay was performed at the Genomic Health® laboratory (Redwood City, CA, USA), and the Recurrence Score result, the predefined Recurrence Score risk group (low: <18; intermediate: 18–30; high: ≥31) [1], and quantitative ER, progesterone receptor (PR), and HER2 gene scores were reported. The cut-off values used for ER and PR positivity were 6.5 and 5.5 units, respectively. For HER2, the positive cut-off was ≥11.5 units, equivocal ranged from 10.7 to 11.4 units, and the negative cut-off was <10.7 units. Samples were excluded from the analysis if they were ER-negative, if there was insufficient tumor material for testing, or if the patients were node-positive or premenopausal. FFPE tumor samples prepared per the Prosigna protocol were sent to an independent laboratory in Europe, which was blinded to the Recurrence Score results, to obtain the Prosigna score and risk group (for node-negative patients: low: 0–40; intermediate: 41–60; high: 61–100) [23] and the intrinsic subtype determination, which is not currently available in the US.
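The risk-group cut-offs quoted above can be written out as a short Python sketch. The function names are illustrative and are not taken from either assay's software, integer scores are assumed, and the example patient is hypothetical rather than a sample from this study.

```python
def recurrence_score_group(score):
    """Risk group for a Recurrence Score result (cut-offs as quoted in the text)."""
    if score < 18:
        return "low"
    if score <= 30:
        return "intermediate"
    return "high"


def prosigna_group_node_negative(score):
    """Risk group for a Prosigna score in node-negative patients (0-100 scale)."""
    if not 0 <= score <= 100:
        raise ValueError("Prosigna score is expected to lie between 0 and 100")
    if score <= 40:
        return "low"
    if score <= 60:
        return "intermediate"
    return "high"


# Hypothetical patient, for illustration only (not a sample from the study).
rs, prosigna = 12, 65
print(recurrence_score_group(rs), prosigna_group_node_negative(prosigna))
# prints: low high
```

Because the two scores sit on different scales with different cut-offs, the same tumor can fall into different risk groups, which is the kind of discordance the study set out to quantify.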

Statistical Considerations

All analyses were descriptive. Two-way frequency tables and exact 95% Clopper-Pearson confidence intervals (CIs) were used to assess the agreement between risk group classifications based on the two assays. Spearman correlation between the Recurrence Score and the Prosigna results was calculated. SAS® Enterprise Guide® version 5.1 (SAS Institute Inc., Cary, NC, USA) was used for the analysis.
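For readers unfamiliar with the exact (Clopper-Pearson) interval, the following Python sketch shows one way to compute it using only the standard library, by inverting the binomial tail probabilities with bisection. It is an illustration of the method, not the SAS procedure used in the study; the function names are assumptions, and the example input (28 concordant classifications out of 52) is the overall agreement reported in the Results.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion x/n."""

    def bisect(move_right):
        # Find the p in (0, 1) at which move_right(p) switches from True to False.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if move_right(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2
    )
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper


low, high = clopper_pearson(28, 52)
print(f"agreement 28/52 = {28 / 52:.3f}, exact 95% CI {low:.3f} to {high:.3f}")
```

With a sample of this size the interval spans roughly 40% to 68%, so the point estimate of agreement carries considerable uncertainty.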

Results

Patients and Samples

A total of 70 consecutive samples were obtained from Marin Medical Laboratories. Samples from 18 patients were excluded: four were ER-negative, six were node-positive, five were premenopausal, and three yielded insufficient tumor RNA for the 21-gene assay. The number of samples included in the final analysis cohort was 52.

More than half of the samples (55.8%) were from patients aged ≥70 years. The majority of tumors were invasive ductal carcinoma (73.1%), ≤2 cm in size (78.9%), and grade 1 or 2 (90.4%; Table 1).

Table 1 Baseline patient and tumor characteristics

Distribution of Recurrence Score and Prosigna Results

The distribution of the Recurrence Score and the Prosigna results exhibited marked differences as there were more patients classified as low risk and fewer patients classified as intermediate or high risk by the Recurrence Score result compared to the Prosigna result (Fig. 1). The median Recurrence Score result was 12 (range 0–36) with 37 (71.2%), 12 (23.1%), and 3 (5.8%) in the low, intermediate, and high Recurrence Score risk groups, respectively. The median Prosigna score was 39 (range 0–88) with 28 (53.8%), 17 (32.7%), and 7 (13.5%) samples in the low, intermediate, and high risk groups, respectively (Fig. 1a, b).

Fig. 1 Distribution of scores and correlation between assays (N = 52). a Distribution of Recurrence Score results in the cohort; b distribution of Prosigna results in the cohort; c correlation between the Recurrence Score and the Prosigna results in node-negative postmenopausal patients

Comparison of Risk Scores

Overall agreement for risk classification based on the Recurrence Score and the Prosigna score results was 53.8% (Table 2). Thirty-seven patients had a low Recurrence Score result versus 28 patients with a low Prosigna score. Twenty-two patients had a low score from both assays.

Table 2 Agreement in risk group assignment between Recurrence Score and Prosigna results in postmenopausal, node-negative, ER-positive patients (N = 52)

The correlation between the Recurrence Score and the Prosigna score results was poor (Fig. 1c; r = 0.08; 95% CI, −0.2 to 0.35; Spearman correlation). There were only three patients with a high Recurrence Score result versus seven with a high Prosigna score. Of note, 57.1% (4/7) of the patients classified as high risk by the Prosigna assay were classified as low risk by the Recurrence Score result. Of the three patients with high Recurrence Score results, only one was classified as high risk by the Prosigna assay; the other two were classified as low and intermediate risk, respectively (Fig. 1c).
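The counts given in the text are sufficient to reconstruct a 3 × 3 cross-classification, which the Python sketch below uses as an arithmetic check. The cell values are inferred from the reported marginal totals and stated cells, not copied from Table 2. The sketch also applies the Fisher z-transformation, a common approximation for a correlation CI; the text does not state how the reported CI was computed, so its use here is an assumption.

```python
from math import atanh, sqrt, tanh

# Rows: Recurrence Score group; columns: Prosigna group (low, intermediate, high).
# Cells are inferred from counts stated in the text, not taken from Table 2.
table = [
    [22, 11, 4],  # RS low: 22 low/low and 4 Prosigna-high are stated; 11 = 37 - 22 - 4
    [5, 5, 2],    # RS intermediate: filled in from the row and column totals
    [1, 1, 1],    # RS high: one each in Prosigna low, intermediate, high (stated)
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
concordant = sum(table[i][i] for i in range(3))
print(row_totals, col_totals, n, round(100 * concordant / n, 1))
# prints: [37, 12, 3] [28, 17, 7] 52 53.8


def fisher_z_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation coefficient via Fisher's z."""
    half_width = z_crit / sqrt(n - 3)
    z = atanh(r)
    return tanh(z - half_width), tanh(z + half_width)


ci_low, ci_high = fisher_z_ci(0.08, 52)
print(f"{ci_low:.3f} to {ci_high:.3f}")
```

The reconstructed margins match the reported group sizes, the diagonal gives the reported 53.8% agreement, and the approximate interval (about −0.20 to 0.35) is in line with the reported CI.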

Quantitative ER Expression

Evaluation of quantitative ER expression (by RT-PCR) showed a wide range of expression within each Prosigna score risk group (Fig. 2). All four patients classified as high risk by the Prosigna assay and low risk by the Recurrence Score result exhibited high ER expression. In addition, there were two patients whose ER expression was close to the positivity threshold with high Recurrence Score results that were low or intermediate by the Prosigna assay.

Fig. 2 Quantitative ER expression by Recurrence Score and Prosigna results (N = 52). The horizontal line at 6.5 represents the threshold for ER positivity; the dashed lines within each Prosigna risk group represent the median within each Recurrence Score group. CT threshold cycle, ER estrogen receptor

Recurrence Score and Prosigna Results Within Intrinsic Subtypes

The Prosigna assay classified 38 (73.1%) of the samples as luminal A and 12 (23.1%) as luminal B. Two samples were non-luminal (one HER2-enriched and one basal-like). As expected (since intrinsic subtype determination and Prosigna score calculations are interrelated), all samples identified as luminal A were characterized as low or intermediate risk according to the Prosigna assay, whereas all samples identified as luminal B were characterized as intermediate or high risk. However, in both the luminal A and luminal B samples there was a range of Recurrence Score results (Fig. 3). Specifically, among the 38 luminal A samples, 1 (2.6%) had a high Recurrence Score result, and among the 12 luminal B samples, 10 (83.3%) had low Recurrence Score results.

Fig. 3 Distribution of Recurrence Score and Prosigna results within luminal A and luminal B subtypes

Discussion

The current analysis, the first prospectively designed comparison between the Recurrence Score and Prosigna assays, shows that these assays classify patients differently. Specifically, the study showed a wide variation in Recurrence Score results within each Prosigna risk category and a poor correlation between the Recurrence Score and the Prosigna results, with more than half of the patients classified as high risk by the Prosigna assay being classified as low risk by the Recurrence Score result. This study is also the first formal/pre-specified analysis of the distribution of Recurrence Score results within the PAM50-defined luminal subtypes and showed that not all luminal B patients have high Recurrence Score results. Of note, the Prosigna assay was optimized to not have any high scores in luminal A subtype and no low scores in luminal B subtype. The impact of this on the distribution of the Recurrence Score results within luminal subtypes is not known.

Our findings are generally consistent with those of Dowsett et al. [17], who compared the Recurrence Score results and the Risk of Recurrence (ROR) score (the precursor to the Prosigna assay) in the TransATAC study and showed an agreement in risk group assignment (in node-negative patients; n = 739) of 58%, which is similar to the 53.8% observed here. In the TransATAC study samples, the ROR score identified fewer intermediate patients (the risk thresholds used there were <10% for low, 10–20% for intermediate, and >20% for high risk). In our analysis, where the risk groupings provided on each patient report were used, the Prosigna assay yielded more intermediate risk results (17) than the Recurrence Score assay (12) [17]. Notably, in the Dowsett et al. [17] study, the ROR assay was performed on residual RNA extracted by Genomic Health in an earlier study (i.e., samples were microdissected and the RNA was extracted using Genomic Health proprietary methods), which may be relevant, since RNA extraction methods have been shown to impact gene expression analysis [24]. Indeed, the results in the validation study conducted in TransATAC were substantially different from those reported for the validation study conducted in the ABCSG-8 trial, with the risk of recurrence being approximately twofold higher in patients with high and intermediate ROR scores [16, 17], possibly due to different RNA extraction methods, differences in patient populations, or both. Furthermore, in the Dowsett et al. [17] analysis, the cut points for the Recurrence Score risk groups were not based on the validated values defining the Recurrence Score risk groups (<18, 18–30, ≥31), but rather on percentage risk based on the ROR categories (<10%, 10–20%, >20%).

Our results are also consistent with findings from previous studies comparing Recurrence Score-based risk assessments with risk assessments based on other assays, including the 70-gene assay, the five-antibody assay, and the 12-gene assay [20–22], as well as with recent findings from the prospective OPTIMA pilot feasibility study, which compared risk assignments between the Recurrence Score assay, the 70-gene assay, the Prosigna assay, IHC4, and IHC4 AQUA® (Genoptix, Carlsbad, CA, USA) as part of assessing the feasibility of using a genomic profile assay for treatment decisions [25]. These studies revealed consistent differences in risk classification between the assays, most notably that the Recurrence Score assay consistently classifies fewer patients as high risk. These differences may be due to the different platforms (e.g., array vs. RT-PCR), different genes included in the assays, and differences in clinical validation (e.g., using legacy trials with long-term follow-up vs. convenience samples where patients were not treated uniformly) [1–6, 16–18]. To date, the Recurrence Score assay is the only one shown to predict the likelihood of chemotherapy benefit (i.e., which patients are likely to benefit from adding chemotherapy to endocrine therapy and which are unlikely to derive benefit) [2, 4].

Our study identified patients with high Prosigna score results and low Recurrence Score results. Based on the Recurrence Score validation studies [2, 4], such patients are likely to have little to no chemotherapy benefit. Furthermore, in our study, these patients had high ER expression levels and thus were likely to derive substantial benefit from endocrine therapy alone, which is of clinical relevance since the Prosigna assay was developed in untreated patients.

Of note, the 5-year outcomes for the study arm of 1626 ER-positive, node-negative patients with a Recurrence Score result <11 enrolled in the TAILORx study (the largest prospective adjuvant trial to date) were recently published in the New England Journal of Medicine and confirmed the very low distant recurrence risk for patients treated with endocrine therapy alone (rate of freedom from distant recurrence at 5 years: 99.3%) [7]. The 5-year distant recurrence rate in this contemporary cohort is even lower than expected based on the prior experience reported in the validation studies [26]. Furthermore, with a recurrence rate so low, it is unlikely that patients with such a low Recurrence Score result would derive meaningful additional benefit from the addition of chemotherapy.

Additional clinical relevance of our findings pertains to the use of intrinsic subtypes (luminal A or B) to assess recurrence risk and make treatment decisions. To date, there has not been a standardized method to determine luminal subtype, aside from the original PAM50 microarray. In addition, while the luminal subtypes differ prognostically, there is no evidence that luminal subtyping is predictive of chemotherapy benefit. In our study, 83% of the luminal B patients had low Recurrence Score results, and are therefore likely to have little to no benefit from chemotherapy. Furthermore, in our study, these patients also had high ER expression, and thus, are expected to benefit substantially from endocrine therapy alone, as noted above.

While this is the first truly blinded head-to-head comparison of the Recurrence Score and Prosigna assays, the study does have some limitations. The study was designed to compare the Recurrence Score and Prosigna assays, focusing solely on risk estimates. Discerning the impact of the differences in risk stratification on treatment decisions and clinical outcomes was beyond the scope of the current study (as outcome data were not available). In addition, it is a single-center study with a relatively small sample size. Consequently, subgroup analysis exploring the discordance between the assays in subgroups of patients based on tumor/patient characteristics such as age, grade, and tumor size was not possible.

Conclusions

The results of this comparison of the Recurrence Score and Prosigna assays are consistent with prior studies showing that different assays vary substantially in risk assignment, and together they indicate that genomic assays cannot be used interchangeably. In particular, since the correlation between the results of the Recurrence Score assay (the only assay validated to predict chemotherapy benefit) and the results of other assays is poor to modest, it cannot be assumed that the other assays are also predictive of chemotherapy benefit. As new assays become available, it will be critically important to understand the differences between the assays and their implications for clinical practice.