Introduction

Prognostic assessment for early breast cancer in the clinic is currently made from clinical and pathological parameters, which at present include three biomarkers: estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) [1–3]. Of these conventional prognostic factors, nodal status is consistently held to be the most important parameter for determining prognosis [3–5]. The widely referenced St Gallen consensus guidelines for primary therapy of early breast cancer define patients with four or more positive axillary nodes as 'high risk' irrespective of the status of any other prognostic factor [5]. From the perspective of recommendations for the use of adjuvant chemotherapy, the presence of four or more positive axillary lymph nodes places all such patients in a group offered treatment regardless of other conventional parameters aside from performance status and age [5, 6].

It has become clear that breast cancer is in fact a collection of heterogeneous disease processes, with variable biological behavior and outcome, that current models for prognostication do not completely capture [2, 7–12]. Protein or mRNA expression profiling has been shown to permit the molecular classification of breast cancers via a range of techniques including cDNA microarray, quantitative RT-PCR and tissue microarray (TMA) into consistently observable groupings [7–10, 12–19]. Each of these approaches provides prognostic information through a molecular subtype classification of breast cancer, but there is less evidence as to how these approaches compare or add to the use of conventional prognostic factors [7–10, 14, 17, 18]. The potential to use such methodologies, in the setting of axillary lymph node negative breast cancer, to inform the decisions regarding chemotherapy is currently being tested in prospective randomized trials [1, 17–21].

We hypothesized that TMA profiling of a panel of biomarkers, either proven or potentially relevant for prognostic and/or predictive assessment of breast cancer, might permit the detection of clinically relevant prognostic groups among patients with four or more positive axillary lymph nodes, beyond what is attainable from conventional factors alone. Such information might be helpful in providing treatment recommendations and prognosis, and might also be helpful in the design of clinical trials and the stratification of patients within them.

Materials and methods

Study population

The study population was derived from a TMA constructed from archival formalin-fixed paraffin-embedded specimens of 4,444 patients from the Canadian province of British Columbia. All patients had been diagnosed with invasive breast cancer without metastatic disease between 1986 and 1992, and represented 34% of patients diagnosed with breast cancer during this period [22]. Clinical and pathological information was collected prospectively through the Breast Cancer Outcomes Unit Database of the British Columbia Cancer Agency. Patients were randomly allocated into two groups of 2,222 after stratification for treatment. Inclusion criteria for this study were: female sex, known cause of death, new breast cancer diagnosis at the time of referral to the British Columbia Cancer Agency, and a known number of positive axillary lymph nodes. From this set, those patients with four or more positive axillary nodes formed the final cohorts. The 'test set' was used to define prognostic subgroups based on patterns of immunohistochemical biomarker expression. The prognostic value of the biomarker-derived subgroups was then further evaluated in the 'validation set'. The study was approved by the Clinical Research Ethics Board of the University of British Columbia.

Tissue microarray, immunohistochemistry and biomarker scoring

TMAs were constructed as described previously, requiring 17 TMA blocks [22]. TMA slides were stained for eight biomarkers by immunohistochemistry. ER (SP1, dilution 1:250), HER2 (SP3, dilution 1:100) and Ki-67 (SP6, dilution 1:200) were from Lab Vision (Fremont, CA, USA). Human epidermal growth factor receptor 1 (EGFR; PharmDx Kit, undiluted) and p53 (DO-7, dilution 1:400) were from Dako Corporation (Carpinteria, CA, USA). PR (1E2, undiluted) was from Ventana Medical Systems (Tucson, AZ, USA). Cytokeratin 5/6 (CK5/6; D5/16B4, dilution 1:100) was from Zymed Laboratories (San Francisco, CA, USA). Carbonic anhydrase IX (CA IX; M75, dilution 1:50) was a gift from Dr Stephen Chia (British Columbia Cancer Agency, BC, Canada) [23]. Biomarkers were chosen for known prognostic, and in some cases predictive, effect and relevance to biologic classification of subtypes. There were no assumptions about which would be of value for the detection of patients with good versus poor prognosis in the study cohort. Cut points to dichotomize outcome were defined prospectively as follows. ER, <1% versus ≥1% nuclei stained; PR, <1% versus ≥1% nuclei stained; EGFR, negative versus any staining; Ki-67, <10% versus ≥10% positive nuclei; p53, ≤10% versus >10% positive nuclei; CA IX, negative versus tumor and/or stroma positive; CK5/6, negative versus any staining. For HER2, TMA slides were scored by using the immunohistochemical HercepTest (Dako Corporation) scoring system. Cases with a HER2 HercepTest score of 3 were scored as positive, and those of 0 or 1 were scored as negative. Those cases with HER2 HercepTest score of 2 were re-evaluated by using fluorescence in situ hybridization (FISH) assays, and only those cases with a HER2 FISH amplification ratio of at least 2.0 were scored as HER2 positive.
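The dichotomization rules above can be written out as a short Python sketch. This is an illustration only, not the software used in the study: the function names, the dictionary layout and the representation of percentage-based markers as a number between 0 and 100 are assumptions made for the example. EGFR, CA IX and CK5/6 are scored as negative versus any staining and are therefore already binary, so they are not included.

```python
from typing import Optional

# Cut points for markers scored as percentage of positive nuclei.
# Each entry is (threshold, inclusive): inclusive=True means ">= threshold"
# is positive, inclusive=False means "> threshold" is positive.
PERCENT_CUTS = {
    "ER": (1.0, True),      # <1% versus >=1% nuclei stained
    "PR": (1.0, True),      # <1% versus >=1% nuclei stained
    "Ki-67": (10.0, True),  # <10% versus >=10% positive nuclei
    "p53": (10.0, False),   # <=10% versus >10% positive nuclei
}


def percent_marker_positive(marker: str, percent: float) -> bool:
    """Return True if a percentage-scored marker is positive at its cut point."""
    threshold, inclusive = PERCENT_CUTS[marker]
    return percent >= threshold if inclusive else percent > threshold


def her2_positive(herceptest_score: int, fish_ratio: Optional[float] = None) -> bool:
    """HER2 status from a HercepTest score, with FISH used only for score 2."""
    if herceptest_score == 3:
        return True
    if herceptest_score in (0, 1):
        return False
    if herceptest_score == 2:
        if fish_ratio is None:
            raise ValueError("a FISH amplification ratio is needed for score 2")
        return fish_ratio >= 2.0  # amplification ratio of at least 2.0
    raise ValueError(f"unexpected HercepTest score: {herceptest_score}")
```

Note the asymmetry between Ki-67 and p53: a tumor with exactly 10% positive nuclei is positive for Ki-67 but negative for p53 under the stated cut points.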

The full set of eight biomarkers was not available for all patients as a result of tissue cores falling off slides during processing, insufficient or absent tumor tissue within cores, or artefactual distortion of the tissue making interpretation impossible. Stained TMA slides were digitally scanned and linked to a relational database [22, 24]. For each biomarker, images were scored visually by two pathologists, blinded to clinical outcome. An internet website was then constructed from this database by using a WebSlide-Viewer Java applet provided by the manufacturer to view the microarray images and to permit an image-zooming functionality. This website is publicly accessible [25].

Statistical analysis and result validation

Statistical analysis was performed with SPSS software, version 13.0 (SPSS Inc, Chicago, IL, USA). Univariate analysis of relapse-free and overall survival was performed with the Kaplan–Meier method, with survival differences analyzed by log rank tests. Cox proportional-hazards models were used to determine hazard ratios in univariate and multivariate analyses. P < 0.05 was considered statistically significant. The primary outcome measure for this study was relapse-free survival (RFS); the secondary outcome measure was overall survival (OS). RFS was defined as the time from the date of diagnosis to either the first local, regional or distant recurrence or death from breast cancer before a recorded relapse. OS was calculated from the time of diagnosis to death from any cause. We used a split-sample validation technique for statistical analysis, as described previously [22]. In brief, a large data collection (n = 4,444) was randomly split into a 'test' set and a 'validation' set, each containing 2,222 observations. After exploratory analyses with the test set, selected final analyses were repeated with the validation set. Analyses with the validation set were undertaken by a different investigator from those using the test set.
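As an illustration of the split-sample idea, the following Python sketch randomly divides a cohort into two halves after stratifying on a grouping variable such as treatment. It is not the procedure used to build the study cohorts, which is described elsewhere [22]; the function name, the record layout, the fixed seed and the handling of odd-sized strata are assumptions made for the example.

```python
import random
from collections import defaultdict


def split_sample(patients, stratum_key, seed=0):
    """Randomly split patients into two equal-sized sets within each stratum.

    patients: list of records (any type).
    stratum_key: function returning the stratification value for a record.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group patients by stratum (for example, treatment received).
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_key(patient)].append(patient)

    test_set, validation_set = [], []
    for key in sorted(strata, key=str):
        members = strata[key]
        rng.shuffle(members)
        half = len(members) // 2
        test_set.extend(members[:half])
        validation_set.extend(members[half:2 * half])
        if len(members) % 2:
            # Odd-sized stratum: give the extra patient to the smaller set.
            target = test_set if len(test_set) <= len(validation_set) else validation_set
            target.append(members[-1])
    return test_set, validation_set
```

With 4,444 records and strata of even size, this yields two sets of 2,222 with identical treatment proportions. All model building would then be confined to the first set, with the second held back for confirmation.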

Determination of mean predicted relapse-free survival outcomes

Ten-year outcomes for RFS were determined by Kaplan–Meier analysis for the test set for the overall eligible cohort and prognostic subgroups defined in this study. These were compared with the means of the predicted RFS values for each patient with respect to these same subgroups provided by the online breast cancer prognostic tool Adjuvant! (version 8.0, accessed 29 December 2006) [26–28]. In determining predicted outcomes by Adjuvant! for each patient, a default option of 'average for age' was selected for the 'comorbidity' data entry point. Data for age, pathological ER status, tumor grade, tumor size, number of positive axillary nodes (four to nine versus ten or more), and type of hormonal therapy and chemotherapy used were entered from abstracted clinical and pathological details.
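For readers unfamiliar with how a 10-year Kaplan–Meier estimate is obtained, a minimal product-limit estimator in Python is sketched below. The analyses in this study were performed in SPSS; this function, its name and its arguments are illustrative assumptions and it omits the standard error and confidence interval calculations.

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier estimate of the probability of remaining event-free at `horizon`.

    times: follow-up time for each patient (for example, years from diagnosis).
    events: 1 if the event (relapse, or death for OS) occurred at that time,
            0 if the patient was censored at that time.
    horizon: time point of interest, in the same units as `times`.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every patient whose follow-up ends at exactly time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / at_risk
        at_risk -= n_leaving
    return survival
```

For five hypothetical patients with follow-up times of 2, 4, 6, 8 and 12 years, of whom those at 4 and 12 years are censored, the estimate at 10 years is 4/5 × 2/3 × 1/2 = 4/15, or about 26.7%.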

Results

In the test set, the number of positive axillary nodes was known for 2,115 patients. Of these, 325 had four or more positive axillary lymph nodes, from which 313 met the remaining eligibility criteria for inclusion. Scoring was possible for all eight of the biomarkers assessed for 227 of these 313 patients. Baseline clinical, pathological and treatment details are shown in Table 1 for the 313-patient overall test set cohort and for the 227-patient subgroup with complete biomarker scores. The 227-patient subgroup did not differ from the 313-patient overall group with respect to median RFS (5.2 years (95% confidence interval (CI) 3.6 to 6.9) and 5.2 years (3.9 to 6.5), respectively) or overall survival (6.6 years (5.2 to 8.0) and 6.7 years (5.7 to 7.8), respectively).

Table 1 Frequencies of conventional prognostic factors and adjuvant treatments in the test set

Univariate analysis of conventional prognostic markers was performed with respect to RFS in the test set (Table 2). Increasing tumor grade (grade 3 versus 1 or 2), increasing tumor size, negative baseline pathological ER status, presence of lymphovascular invasion and increasing percentage of positive axillary nodes were predictive of inferior outcome with respect to RFS. In multivariate Cox regression analysis, baseline pathological ER status (P = 0.0005) and tumor size (P = 0.03) retained prognostic significance (Table 3).

Table 2 Univariate analysis of relapse-free survival for conventional prognostic factors in the test set cohort
Table 3 Multivariate analysis for relapse-free survival in the test set cohort of baseline prognostic factors

The prognostic value of eight biomarkers determined by immunohistochemistry with TMA was assessed. In univariate analysis within the test set (Table 4), increased expression of EGFR, Ki-67, p53 and CA IX, and lower expression of ER and PR, indicated poorer prognosis with respect to RFS. Increased expression of HER2 and CK5/6 did not significantly predict outcomes. In multivariate Cox regression analysis inclusive of all eight biomarkers, PR (P = 0.006), Ki-67 (P = 0.001) and CA IX (P = 0.03) retained independent prognostic significance in the test set (Table 5).

Table 4 Univariate analysis of relapse-free survival for immunohistochemical biomarkers in the test and validation sets
Table 5 Multivariate analysis of relapse-free survival in the test set for all eight tissue microarray biomarkers

Univariate analysis of RFS outcomes was repeated for the same eight biomarkers within the validation set (Table 4). In this cohort, 289 patients had four or more positive axillary lymph nodes and met the eligibility criteria, of whom 219 had data for all eight biomarkers for analysis. Biomarkers reaching statistical significance with respect to RFS in the validation set were ER, PR, HER2, EGFR, CA IX and CK5/6.

To investigate the ability to stratify patients into prognostic groups by using these biomarkers, a scoring system based on immunohistochemical scores was created to define prognostic subgroups within the test set. Among the 227 patients with scores for all eight biomarkers in the test set, we scored the dichotomized outcome for each marker as 0 for good prognosis and 1 for poor prognosis with respect to univariate analysis of RFS outcomes (that is, 1 each if ER negative or PR negative, and 1 each if positive with respect to the other six biomarkers). Each patient was therefore assigned a score from 0 to 8. Patients were then banded by this score into three groups based on scores of 0, 1 to 4, or 5 to 8. Banding was performed without assumption regarding the relative importance of each marker or weighting to any one in particular and was defined prospectively. In considering the use of adjuvant chemotherapy for these three scoring groups, an imbalance was seen with use in 35.1%, 52.6% and 73.0% of the 0, 1 to 4, and 5 to 8 scoring groups, respectively. RFS outcomes for the three banded groups were markedly different within the test set (Figure 1 and Table 6). The subgroup scoring 0 for all eight biomarkers (38 patients, 16.7%) had 10-year RFS of 75.4% (SEM 7.1%) with a median not yet reached at a median follow-up of 11.7 years. By comparison, the groups scoring 1 to 4 (154 patients, 67.8%) and 5 to 8 (35 patients, 15.4%) had 10-year RFS rates of 35.3% (SEM 4.1%) and 19.3% (SEM 7.0%), and median RFS of 4.8 years (95% CI 3.6 to 6.1) and 1.6 years (95% CI 0.8 to 2.3), respectively. Similar differences in median and 10-year outcomes were also seen with respect to overall survival (Figure 1 and Table 6), which again determined good outcome for the group scoring 0 for all eight markers.
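The scoring and banding rule described above can be expressed as a short Python sketch. The function and constant names are assumptions introduced for illustration; the rule itself (one point each for ER-negative and PR-negative status, one point each for positivity of the other six markers, and bands of 0, 1 to 4, and 5 to 8) follows the text.

```python
# Markers contributing 1 point when NEGATIVE (low expression = poor prognosis).
POOR_IF_NEGATIVE = ("ER", "PR")
# Markers contributing 1 point when POSITIVE (high expression = poor prognosis).
POOR_IF_POSITIVE = ("HER2", "EGFR", "Ki-67", "p53", "CA IX", "CK5/6")


def biomarker_score(status):
    """Unweighted score from 0 to 8; `status` maps marker name to True/False."""
    missing = [m for m in POOR_IF_NEGATIVE + POOR_IF_POSITIVE if m not in status]
    if missing:
        # Only patients with all eight markers scored were included.
        raise ValueError(f"missing biomarker results: {missing}")
    score = sum(1 for m in POOR_IF_NEGATIVE if not status[m])
    score += sum(1 for m in POOR_IF_POSITIVE if status[m])
    return score


def score_band(score):
    """Assign the prospectively defined band for a score."""
    if score == 0:
        return "0"
    if 1 <= score <= 4:
        return "1 to 4"
    if 5 <= score <= 8:
        return "5 to 8"
    raise ValueError(f"score out of range: {score}")
```

For example, a tumor that is ER negative, PR negative and Ki-67 positive, with the remaining five markers negative, scores 3 and falls in the 1 to 4 band. Because every marker carries equal weight, the score makes no assumption about which marker matters most.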

Figure 1

Relapse-free and overall survival by banded biomarker score in the test and validation sets. For each patient, scores for eight immunohistochemical biomarkers assessed were determined; each biomarker was scored as 1 if predicting poor prognosis in univariate analysis for that patient. Patients were then banded by scores of 0, 1 to 4, and 5 to 8. P values were obtained by log rank test.

Table 6 Relapse-free and overall survival with respect to biomarker score for test and validation sets

The same analysis was repeated in the validation set with respect to this scoring system. OS and RFS by Kaplan–Meier analysis demonstrated statistically significant differences between the prognostic subgroups; however, the difference in survival outcomes was less marked between the prognostic groups compared with the test set (Figure 1). Confidence intervals overlapped for both RFS and OS for the groups scoring 0 and 1 to 4 but were non-overlapping between the groups scoring 1 to 4 and 5 to 8 (Table 6).

We then compared the actual RFS outcome of each banded group within the test set with the mean of the predictions for 10-year RFS outcomes determined by the online prognostic tool Adjuvant! [27, 28]. This program uses conventional prognostic factors of age, comorbidity, ER status, grade, tumor size and number of positive nodes and provides an estimated outcome with respect to different options for adjuvant systemic therapies. Consistent with previous validation of Adjuvant! in a large population-based cohort [26], mean predicted values for RFS at 10 years agreed closely with actual outcomes determined by Kaplan–Meier analysis in the overall 313-patient cohort (Figure 2) and additionally for the 227-patient subgroup with scores for all eight biomarkers (data not shown). In contrast, for the good-prognosis subgroup scoring zero for all eight biomarkers, the mean of predictions for percentage 10-year RFS by Adjuvant! was 36.7%, but a better actual outcome of 75.4% (SEM 7.1%) was in fact observed. Values for the 5 to 8 biomarker score group were 33.4% and 19.3% (SEM 7.0%), respectively, indicating an actual outcome that was worse in this group than predicted by Adjuvant!. By comparison, values for the intermediate group scoring 1 to 4 were similar at 34.4% and 35.3% (SEM 4.1%), respectively.
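The size of the gap between predicted and observed outcomes in each band can be tabulated directly from the figures quoted above. The short Python sketch below does only that arithmetic; the variable and function names are assumptions for illustration, and the differences are descriptive rather than a formal statistical comparison.

```python
# (score band, mean Adjuvant! predicted 10-year RFS %, observed Kaplan-Meier 10-year RFS %)
# Values are those reported in the text for the test set.
ROWS = [
    ("0", 36.7, 75.4),
    ("1 to 4", 34.4, 35.3),
    ("5 to 8", 33.4, 19.3),
]


def observed_minus_predicted(rows):
    """Observed minus predicted 10-year RFS, in percentage points, per band."""
    return {band: round(observed - predicted, 1) for band, predicted, observed in rows}
```

A positive value indicates an outcome better than predicted. The differences are +38.7 percentage points for the group scoring 0, +0.9 for the group scoring 1 to 4, and −14.1 for the group scoring 5 to 8, while the mean predictions themselves span only 33.4% to 36.7% across the three bands.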

Figure 2

Comparison of mean predictions for relapse-free survival by Adjuvant! with actual outcomes. Predicted outcome for percentage relapse-free survival at 10 years for each patient, based on their baseline clinical and pathological factors, was determined with the online prognostic tool Adjuvant!. The means of these predicted outcomes (black bars) are shown compared with the actual outcomes determined by Kaplan–Meier analysis (white bars, ± SEM) for the complete 313-patient cohort in the test set and with respect to patients subgrouped by banded biomarker score for the eight immunohistochemical biomarkers assessed in this study.

Having seen a less impressive distinction between prognostic groups with our biomarker scoring system in the validation set, we performed exploratory multivariate analysis using the combined test and validation sets inclusive of baseline prognostic factors and each TMA biomarker. Similarly to the results in the test set, tumor size, percentage of positive axillary nodes and the TMA biomarkers PR, Ki-67 and CA IX each maintained independent prognostic significance with respect to RFS (Table 7).

Table 7 Multivariate analysis of relapse-free survival in the combined 602-patient cohort

Discussion

Early breast cancer involving four or more axillary nodes carries a poor prognosis; however, a proportion of patients do well and are cured of their disease. Ten-year RFS rates of 38% seen in this study mirror those from historical series [4]. Management decisions might be improved if prognostic subgroups can be identified.

We first investigated current conventional prognostic factors for breast cancer to assess their ability to determine prognosis in such patients. Tumor size and percentage of positive axillary nodes were the most important factors, with each retaining prognostic significance in multivariate analysis, consistent with previous data for each [4, 29]. Both reflect overall tumor burden at diagnosis, increasing risk of occult metastatic disease at diagnosis and issues of surgical resectability. Additionally, we addressed the utility of eight biomarkers to determine prognosis in this group. Of these, PR, Ki-67 and CA IX retained prognostic significance after multivariate analysis that included conventional prognostic factors. The biological relevance of each in determining prognosis must remain somewhat speculative. PR might be important in prognostication in luminal-type subclasses, which remain an indistinct area of breast cancer molecular subtype classification. PR expression may show independent prognostic value (in addition to molecular markers for genomic grade) in ER-positive breast cancers, and this seems to be mirrored in our study for heavily node-positive disease [13]. Ki-67 is probably represented here as a marker of tumor proliferation relating to intrinsic phenotypic aggressiveness, risk of occult metastatic disease and as a predictive factor for responsiveness to systemic therapies. Finally, the hypoxia-inducible gene CA IX is an established and validated poor prognostic factor in breast cancer [23, 30]. Its precise function remains inadequately determined and so the underlying biological explanation for its independent prognostic value in this cohort remains to be fully explained.

Our attempt to develop a prospectively defined prognostic scoring system, based on immunohistochemical biomarkers, for this patient group resulted in marked separation in survival outcomes within the test set cohort. In the validation set cohort, the distinction in outcomes with this scoring system, although retaining statistical significance, was smaller between subgroups. This approach would therefore seem to be an imperfect method of predicting differential outcomes among those with four or more positive axillary nodes, and the scoring method described here requires refinement. Our results do, however, indicate that conventional baseline prognostic factors can be usefully augmented by the addition of information derived from molecular biomarkers in patients with heavy axillary nodal involvement, who as a group have received less attention in the age of molecular breast cancer subtyping. Options for refinement of our approach might include incorporation of other biomarkers that have been shown to predict prognosis independently of conventional prognostic factors for breast cancer, for example Bcl-2 [31]. Alternatives to immunohistochemistry for detection and expression analysis of relevant prognostic genes may also be appropriate; for example, array-comparative genomic hybridization, cDNA microarray and RT-PCR approaches have each been shown to permit prognostic classification [11, 12, 16–19, 21, 32–34]. The most appropriate methodology for subsequent application in the clinic has yet to be defined.

The finding in the test set that the Adjuvant! online prognostic tool predicted accurately for the overall group but did not discriminate those within different prognostic groups argues for the validation of molecular markers that can enhance such a mathematical model to individualize prognostic information further and be more sensitive to the heterogeneity of the disease. Such approaches are being prospectively tested in the axillary node negative setting [1, 17–21] and we believe they also hold promise in those with heavy axillary nodal involvement.

Our internal validation approach represents one option for exploratory testing and subsequent confirmation of experimental prognostic methodologies. It is widely accepted that validation in independent cohorts, from those in which a model is originally derived, is a mandatory step in the development of prognostic methods. However, no clear consensus exists on the most robust internal method of undertaking this. Others have advocated alternatives to our straightforward approach of randomization to two cohorts, such as dividing data in a non-random way (for example by time period of patient presentation) or the use of bootstrapping or 'leave one out' cross-validation approaches [35]. The gold standard remains external validation by separate investigators, but this leaves the issue of how best to first internally validate findings.

With respect to potential limitations of our study, our TMA cohort includes patients presenting between 1986 and 1992, who were treated in accordance with therapeutic strategies that have since evolved. Overall, the figure of only 50.5% receiving chemotherapy in the test set is significantly lower than would be expected for this patient group in the modern era. Furthermore, no patients received certain treatment options that are now standard, such as trastuzumab or taxanes. However, we believe our conclusions remain valid, for three reasons. First, we have found a difference in the use of chemotherapy between the three prognostic groups created by the novel scoring system developed in our study (35.1%, 52.6% and 73.0% of the 0, 1 to 4 and 5 to 8 scoring groups, respectively). If one assumes that the chemotherapy will have improved outcome in the three respective groups, then the imbalance would in fact have biased against seeing a difference in the three prognostic groups we had created. The use of RFS as an outcome might be affected by treatment imbalance. However, we have also provided data for overall survival that essentially showed similar, if less marked, findings for outcomes with respect to the three prognostic divisions created. Second, since the mid 1970s, the British Columbia Cancer Agency has periodically circulated updated consensus provincial practice guidelines to all physicians in the province. Published data from the time span of this study confirm that the degree of compliance with provincial practice guideline recommendations for radiotherapy, chemotherapy and tamoxifen was high [36]. We believe that the same excellence for achieving management standards in the heavily node-positive disease cohort considered here can be assumed.
Third, this large cohort, derived from a TMA including more than 4,400 patients and from within a single healthcare setting, comprises patients presenting during the period 1986 to 1992 and is 'population based' by nature, which is a strength of the data set. Chemotherapy should therefore be expected to be a less commonly used modality. A further question with regard to TMA-based biomarker studies is the quality of the pathological samples available and the concordance between the eligible patient cohort and those with scorable results for marker(s) of interest. In our study all eight biomarkers were scored in 227 of 313 patients in the test set cohort. Baseline pathological characteristics and survival outcomes were not significantly different in this subgroup from those in the overall group. Thus, our biomarker data are likely to be representative of the group as a whole.

Conclusion

This study demonstrates that conventional prognostic factors of tumor size and the percentage of positive axillary nodes, together with biomarkers of PR, Ki-67 and CA IX, are independent prognostic factors in breast cancer patients with four or more positive axillary lymph nodes. Our prognostic scoring system, based on the expression of eight biomarkers, identified markedly different survival outcomes in the test set, with less marked but statistically significant differences in the validation set. This study highlights the importance of validation of initial findings. Further investigation is warranted to determine how prognostic stratification can best be evolved to incorporate biomarkers to permit the development of more tailored therapeutic decision making for this patient group.