Introduction

The management of breast cancer patients is still guided by a constellation of clinicopathological features, including prognostic markers derived from careful histopathological analysis of tumours, namely tumour size, histological grade, presence of lymph node metastasis and vascular invasion [1-3]. Despite the huge amount of resources allocated to translational research endeavours, only three predictive markers are utilised to define the therapy of breast cancer patients: oestrogen receptor (ER) and progesterone receptor (PR), the predictive markers of response to endocrine therapy, and human epidermal growth factor receptor 2 (HER2), the molecular target of trastuzumab and lapatinib. These parameters are then used in conjunction, either in the form of guidelines (for example, St Gallen's consensus criteria) or included in multivariable algorithms (for example, Adjuvant!Online), for clinical decision making [1-3]. Albeit seemingly simplistic, this approach has been shown to be clinically relevant, given that predictions made with Adjuvant!Online do correlate with the actual outcome of breast cancer patients [4], and, most importantly, the use of this framework to define systemic therapy has contributed to the steady decline in breast cancer mortality [5]. Although effective, this approach is not sufficient for the potential of individualised therapy to be realised.

The promise of high-throughput technologies, and in particular of gene expression profiling with microarrays, has been of apocalyptic dimensions [6-9]. The objectivity of the methodology, coupled with the elaborate, if not mind-boggling [10], bioinformatic approaches to answer clinically relevant questions, has led some of the proponents of this technology to compare histopathology with some rituals practiced by ancient tribes [7], and some experts in the field predicted back in 2000 that microarrays would make conventional diagnostic techniques obsolete [6].

Microarrays and their derivatives have undoubtedly contributed to our understanding of breast cancer (for reviews, see [1, 2]). They have provided direct evidence to demonstrate that breast cancer is a heterogeneous disease at the molecular level [11], that ER-positive and -negative diseases are fundamentally different [11-14], that molecular subtypes of breast cancer do exist [11, 15-18], and that some special histological types of breast cancer are distinct entities at the molecular level [19-22]. Furthermore, they have led to the development of a molecular taxonomy that is currently being tested in clinical trials [16], and of prognostic 'gene signatures', some of which have already been approved by the US Food and Drug Administration [1, 2, 13, 23].

Molecular taxonomy

From a conceptual standpoint, the development of a molecular taxonomy [11, 15-18] for breast cancer has reshaped the way breast cancer is perceived. According to this classification, breast cancers can be subdivided into luminal tumours, which are ER-positive, express ER-related genes and are reported to be subclassified into A and B according to the expression level of proliferation-related genes [1, 2, 15, 16, 24]; HER2 tumours, which express HER2 and genes related to the HER2 amplicon; normal breast-like cancers, which are still poorly understood and are reported to express genes usually found in normal breast samples; and basal-like cancers, which largely lack expression of ER, PR and HER2, and express genes usually found in basal/myoepithelial cells of the breast [1, 2, 11, 15-18, 24]. The terms luminal A, luminal B, normal breast-like, basal-like and HER2 have become part of our lexicon. The approach pioneered by the Stanford group, however, has some important limitations. First, our recent re-analysis of the methods for the identification of the molecular subtypes of breast cancer (that is, single sample predictors) demonstrated that only basal-like cancers can be reliably identified [25]; the classification of samples into the other molecular subtypes depends on the method used, and the agreement rates between different methods are modest [25, 26]. In fact, even when the authors of the molecular taxonomy themselves classified the same cohort of breast cancer patients (that is, NKI-295 [23]) using two different methods [27, 28], one by Sorlie and colleagues [17, 27] and the other by Hu and colleagues [15, 28], the agreement was only moderate (kappa score = 0.527 (95% confidence interval 0.456 to 0.597)).
Second, there are several lines of evidence to suggest that normal breast-like cancers may constitute an artefact of gene expression profiling (that is, samples with a disproportionately high content of normal breast epithelial cells and stromal cells) [16, 25]. Third, given that the subdivision of luminal tumours into A and B is driven by the levels of expression of proliferation-related genes and that several studies have demonstrated that proliferation in ER-positive cancers is a continuum rather than a bi-modal distribution, this subclassification of luminal cancers is likely to be arbitrary [1, 2, 12, 14, 16, 25, 29]. Fourth, the HER2 molecular subtype does not comprise all cases classified as HER2-positive with clinically validated methods (that is, immunohistochemical analysis and chromogenic/fluorescence in situ hybridisation), and not all HER2-positive cancers by clinical methods are classified as HER2 subtype by microarrays [16, 25, 30]. Therefore, for the microarray-based molecular taxonomy of breast cancer to be incorporated into clinical practice, two requirements must be met: standardisation of the definitions and the methodologies for the identification of the molecular subtypes, and prospective clinical trials to validate the contribution of these five molecular subtypes in addition to the current clinicopathological parameters for prognosis prediction of breast cancer patients. These requirements are yet to be met.
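To make the agreement statistic quoted for the two single sample predictors more concrete, the following is a minimal sketch of how Cohen's kappa can be computed from two sets of subtype calls on the same tumours. The function name, the ten subtype calls and the variable names are hypothetical and purely illustrative; they are not data from NKI-295 or from any of the cited studies, and the sketch does not reproduce the confidence interval reported there.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long lists of categorical calls.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    n = len(calls_a)
    # Proportion of samples given the same subtype by both methods.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance, from each method's marginal frequencies.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both methods assign every sample to the same single class.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical subtype calls for ten tumours by two classification methods.
method_1 = ["basal", "basal", "lumA", "lumA", "lumA",
            "lumB", "lumB", "HER2", "HER2", "normal"]
method_2 = ["basal", "basal", "lumA", "lumB", "lumA",
            "lumB", "lumA", "HER2", "lumB", "normal"]

kappa = cohen_kappa(method_1, method_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the two methods agree on all basal-like calls and disagree mainly among the luminal and HER2 calls, which mirrors the pattern described above: raw agreement can look reasonable while kappa, which discounts agreement expected by chance, remains well below 1.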

Prognostic gene signatures

The development of microarray-based prognostic gene signatures was heralded as a major breakthrough for the management of breast cancer patients [1, 2, 8, 9, 13, 31-33]. It was thought then that these signatures would provide a more objective assessment of the risk of relapse of breast cancer patients and would be more reproducible than the methods currently used [1, 2, 8, 9, 33]. The first prognostic gene signatures (that is, the 70-gene signature also known as Mammaprint® [13], and the 76-gene signature [31]) were developed to be applied to all breast cancer patients. Their performance in the training and validation datasets demonstrated objectively that the prognostic information provided by these signatures is indeed independent of the information provided by tumour size, presence of lymph node metastasis and histological grade [1, 2, 32]. Subsequent to these initial stories of success, several groups developed their own prognostic signatures employing either bottom-up or top-down approaches (for reviews, see [1, 2]). In addition, independent groups developed microarray signatures to capture the information provided by histological grade [34, 35].

Following the initial enthusiasm with microarray-based prognostic gene signatures, re-analyses of the initial studies on cancer prognosis with microarrays have revealed that the overlap between gene signatures was negligible; that these first generation signatures were not stable in terms of their gene composition [36, 37]; and that these gene signatures were time dependent (that is, their prognostic power is substantially reduced from 5 to 10 years of follow-up) [1, 2, 36-38]. These observations have led to a wave of (over)scepticism, with an expert in the field of biomarker discovery and validation stating that '... on close scrutiny, in five of the seven largest studies on cancer prognosis, this technology performs no better than flipping a coin. The other two studies barely beat horoscopes' [39]. Fortunately, with the greater availability of microarray datasets in public repositories, meta-analyses performed by independent groups revealed that different gene signatures identify similar groups of patients as of poor outcome; that the assignment of cases as of poor outcome is based on the expression of proliferation-related genes; that these first generation signatures only have discriminatory power in ER-positive disease; and that proliferation is perhaps the strongest determinant of outcome in ER-positive disease [12, 14, 28, 40].

In parallel with the development of microarray-based gene signatures, a 21-gene signature based on quantitative real time RT-PCR was developed through a re-analysis of microarray datasets and a review of the literature [41, 42]. This signature, named OncotypeDX™ (Genomic Health, Redwood, CA, USA), was developed and validated through a retrospective analysis of formalin-fixed, paraffin-embedded material from the prospective clinical trials B-20 and B-14 [41, 42] (for reviews, see [43, 44]). OncotypeDX™ has been shown not only to be prognostic in ER-positive tumours but also to identify those patients who are likely to benefit most from chemotherapy [41-44]. Therefore, it can be used to determine which patients should receive endocrine therapy or a combination of endocrine plus chemotherapy. This test is only offered for central analysis in the Genomic Health laboratories and has been shown to be robust, so much so that it has been recommended for the management of breast cancer patients by the American Society of Clinical Oncology (ASCO) guidelines on the use of tumour markers in breast cancer, and in the National Comprehensive Cancer Network (NCCN) guidelines for breast cancer treatment, as a predictor of recurrence for ER-positive, lymph node-negative breast cancer patients. Despite the important contribution of OncotypeDX™ to the management of breast cancer patients, it should be noted that this test is meant to be used in conjunction with the clinicopathological prognostic factors [2, 41-44]. Furthermore, there is evidence to suggest that the prognostic power of OncotypeDX™, in a way akin to the other first generation signatures, largely if not exclusively stems from the quantitative analysis of the levels of expression of proliferation-related genes [1, 2, 14].

Despite the controversies above, the question that remains germane is whether molecular profiling offers more than the information provided by clinicopathological parameters and a handful of immunohistochemical markers. This was in part addressed by Dunkler and colleagues [45], who re-analysed the data from the cohort employed to validate the 70-gene signature and demonstrated that the contribution of this signature to the prognostication of breast cancer patients above and beyond that offered by the clinicopathological parameters was minimal. Furthermore, a recent comparison of the prognostic information provided by OncotypeDX™ or four immunohistochemical markers (that is, ER, PR, HER2 and Ki67, a proliferation marker) semi-quantitatively assessed in the material from the ATAC (Arimidex, Tamoxifen, Alone or in Combination) prospective trial demonstrated that these four markers would at least be equivalent to OncotypeDX™ [46].

Conclusion

Taken together, it would be fair to say that, currently, molecular profiling does provide prognostic and to some extent predictive information in addition to that offered by the clinicopathological features and immunohistochemical markers routinely used. However, this information benefits a limited number of patients, is restricted to patients with ER-positive cancers, and seems only to constitute a reproducible and quantitative analysis of tumour cell proliferation. Therefore, pathologists should strive to develop robust and reproducible methods for the assessment of proliferation (for example, a standardised Ki67 immunohistochemical protocol and scoring system). Although the enthusiasm with microarrays has waned, this technology has provided an incremental step towards the individualisation of therapy for breast cancer patients. It is probable that this goal will be achieved through the integration of different layers of high-throughput data (that is, transcriptomics, proteomics, functional genomics). Furthermore, the development of massively parallel sequencing approaches [47, 48] and their application to the study of breast cancer is likely to provide information that will constitute another quantum leap in the way we perceive this complex disease, and help develop more accurate prognostic and predictive tests for each subgroup/subtype of breast cancer.