Introduction

Different molecular subtypes of breast cancer have been described [1]. The most profound effects on gene expression profiles in breast cancer are related to estrogen receptor (ER) and proliferation status, and to a lesser extent to Human Epidermal Growth Factor Receptor 2 (HER2) status. Not surprisingly, molecular classification and current prognostic signatures mainly reflect these molecular features [2]. However, substantial clinical and molecular heterogeneity remains within current molecular subsets, particularly among ER, progesterone receptor (PgR) and HER2 negative cancers (that is, triple negative breast cancers, TNBC) [3]. Furthermore, the relationship between clinically defined TNBC and the gene expression profile-based basal-like breast cancer subtype (BLBC) [4] is not fully defined [5]. Some authors use these two terms synonymously given the substantial overlap between the two definitions [6, 7]. However, immunohistochemical and molecular profiling studies have shown that only a subset of TNBC express the combination of basal cell markers (for example, CK5 and CK14) that is required for the molecular definition of this disease [5]. The prognostic significance and therapeutic implications of molecular heterogeneity within TNBC remain to be established. From a clinical point of view, further understanding of TNBC is important because better prognostic markers and new treatments are needed [8].

The goal of this analysis was to assemble all currently available TNBC gene expression datasets generated on Affymetrix gene chips and search for molecular structures in the data to define gene expression-based subsets within TNBC. We defined metagenes as the average expression of groups of highly co-expressed genes in the data without considering any clinical outcome variable. These metagenes identified several molecular subsets within TNBC, some with good prognosis even in the absence of systemic therapy. Our results also suggest possible new therapeutic strategies for TNBC. This study represents the largest attempt to define clinically important molecular subsets within TNBC [9].

Materials and methods

All analyses were performed according to the REporting recommendations for tumour MARKer prognostic studies (REMARK) [10, 11] and the respective guidelines for microarray-based studies of clinical outcomes [12]. A diagram of the complete analytical strategy and the flow of patients through the study, including the number of patients included in each stage of the analysis, is given in Additional file 1, Supplementary Figure S1. Tissue samples of invasive breast cancer cases (dataset Frankfurt) were obtained with IRB approval and informed consent from consecutive patients undergoing surgical resection between December 1996 and July 2007 at the Department of Gynecology and Obstetrics at the Goethe-University in Frankfurt. Gene expression data have been deposited into the GEO database (accession number GSE31519).

Assembly of TNBC microarray data and definition of metagenes

In order to facilitate pooling of datasets from different laboratories we only used data from a single platform (Affymetrix U133A and U133 Plus 2.0 chips) and included only samples that were defined as triple negative based on the mRNA expression of ER, PgR, and HER2 as previously described [13-15]. To obtain a large enough sample size for discovery it was necessary to pool several datasets. A major concern during this exercise is the possible confounding effect of systematic technical differences that exist between individual datasets. These could lead to false discovery during metagene definition and could also weaken the power of validation. We applied two different strategies to minimize this problem. First, we selected only highly comparable datasets for discovery. We initially identified 579 TNBC from a total of 3,488 publicly available primary breast cancer gene expression profiles representing 28 individual datasets (Additional file 2, Supplementary Table S1). We excluded 13 datasets contributing 185 TNBC cases from the discovery cohort because they did not fulfill our criteria of comparability of the microarray data (for details see Additional file 4, Supplementary Methods Section 1 and Additional file 1, Supplementary Figure S2). The final discovery cohort to identify metagenes included 394 TNBC from 15 datasets (cohort-A). The 185 samples excluded from discovery were retained as a validation set (cohort-B) to assess correlations between various metagenes and between metagenes and clinical outcome (Additional file 1, Supplementary Figure S1). This strategy maximized the integrity of metagene discovery at the cost of possibly reducing the power of the validation study. The two cohorts did not significantly differ with respect to age, tumor size and histological grade. However, the validation cohort-B contained a larger number of lymph node positive patients and a higher proportion of fine needle aspiration (FNA) samples.
Follow-up data were available for 2,348 of the total 3,488 samples and 327 of the 579 TNBC samples. Since the number of patients with follow-up in validation cohort-B was too small (n = 30 of 185), an additional independent validation cohort-C [16] (n = 76) was included to assess the prognostic value of the metagenes (Additional file 1, Supplementary Figure S1). The patient characteristics of the discovery and validation cohorts are given in Table 1. For analysis of normal tissue a dataset of benign breast tissue was used (Additional file 2, Supplementary Table S1).

Table 1 Clinical data of TNBC patients from the finding-cohort-A and the validation cohorts-B and -C

Unsupervised analysis, without input of clinical variables, was performed to identify metagenes, which were defined as the arithmetic average expression of highly correlated genes. Gene clusters were selected with either a minimal membership of 10 genes and a minimal correlation threshold of 0.7, or a minimum of 25 genes and a correlation of 0.6 (for details see Additional file 4, Supplementary Methods Section 2). We also employed a screen to remove genes that showed dataset bias. The dependence of the expression levels of the metagene probesets on the dataset vector was analyzed using the Kruskal-Wallis statistic (Additional file 4, Supplementary Methods Section 3). Only the Stroma and Hemoglobin metagenes displayed a bias for FNA samples, which reflects the frequent contamination of this type of sample with blood and the lack of stromal elements compared to core needle or surgical biopsies (Additional file 1, Supplementary Figure S3 and Additional file 4, Supplementary Methods). Therefore, these two metagenes were analyzed only in surgical biopsies.
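The metagene definition above (the arithmetic mean of a group of highly correlated probesets, computed per sample) can be illustrated with a minimal sketch. It is not the clustering procedure of the Supplementary Methods: the seed-based selection, the function names and the toy expression values are assumptions made for illustration only, and only the correlation threshold of 0.7 is taken from the text.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (both must vary)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_cluster(expression, seed, min_corr):
    """Probesets whose correlation with the seed probeset is >= min_corr.

    Simplification: real cluster discovery would not depend on one seed.
    """
    return [
        probe for probe, values in expression.items()
        if pearson(expression[seed], values) >= min_corr
    ]


def metagene_score(expression, cluster):
    """Arithmetic mean over the cluster's probesets, one value per sample."""
    n_samples = len(next(iter(expression.values())))
    return [mean(expression[p][i] for p in cluster) for i in range(n_samples)]


# Hypothetical toy data: probeset -> expression values in four samples.
toy = {
    "probe_a": [1.0, 2.0, 3.0, 4.0],
    "probe_b": [1.1, 2.1, 2.9, 4.2],
    "probe_c": [4.0, 1.0, 3.5, 0.5],
}
cluster = correlated_cluster(toy, seed="probe_a", min_corr=0.7)
scores = metagene_score(toy, cluster)
```

In this toy example `probe_c` is negatively correlated with the seed and is left out, so the metagene is the per-sample mean of `probe_a` and `probe_b`. A minimum cluster size (10 or 25 genes in the text) would be enforced as an additional check on `len(cluster)`.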

No systematic bias was observed between the U133A and U133 Plus2.0 arrays, which differ only in the spatial feature size of the probesets (for details see Additional file 4, Supplementary Methods Section 4). Both metagene distributions and "Centroid methods" were used to classify subtypes of TNBC (Additional file 4, Supplementary Methods Sections 8 and 9).

Survival analysis

Relapse free survival (RFS) was preferentially used as the clinical endpoint for event free survival (EFS). Only where RFS was not available for a dataset was it replaced by distant metastasis free survival (DMFS). Details on the endpoints used and on Kaplan-Meier and Cox regression analysis are given in Additional file 4, Supplementary Methods Section 5. Optimized cutoffs for dichotomizing metagene scores to plot survival curves were derived from the discovery cohort and were applied without modification to the validation cohorts (Additional file 4, Supplementary Methods Section 6). All P-values are two-sided and P ≤ 0.05 was considered significant. Analyses were performed using the R software [17] and SPSS version 17.0 (SPSS Inc. Chicago, IL).
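As an illustration of the survival analysis described above, the following sketch implements the standard Kaplan-Meier product-limit estimator together with the dichotomization of a metagene score at a fixed cutoff. The analyses themselves were run in R and SPSS; the function names, the toy follow-up data and the cutoff value are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (for example, relapse) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        n_events = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
    return curve


def dichotomize(scores, cutoff):
    """Split metagene scores into 'high' and 'low' at a fixed cutoff."""
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical toy follow-up data (time in years, event indicator).
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])

# A cutoff derived in the discovery cohort would be reused unchanged
# in the validation cohorts; 0.0 here is an arbitrary placeholder.
labels = dichotomize([0.4, -0.2, 1.3], cutoff=0.0)
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a step in the curve, which is why only three time points appear for the five toy patients.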

Results

Identification of subsets of TNBC based on metagene expression profile

In our discovery cohort we identified 16 clusters of correlated genes by unsupervised methods whose expression values were averaged as metagenes (Figure 1). As expected, no cluster of genes correlated with ER, PgR, and HER2 status [4] was identified. In contrast, the identified metagenes presented in Table 2 included the basal-like phenotype [4], an apocrine/androgen receptor signaling signature [18, 19], five signatures related to different types of immune cells [4, 20-25], a stromal signature [26, 27], the claudin-CD24 signature [28, 29], markers of blood [30] and adipocytes [4], as well as an inflammatory signature [31-33] and an angiogenesis signature [23, 34]. These phenotypes corresponded to previously described gene signatures that have also been used to define subsets of TNBC in a recent smaller study [9]. The angiogenesis signature (VEGF metagene) has been described very recently as a "hypoxia signature" associated with poor outcome and expressed in distant metastases [34]. As shown in Figure 1, we observed the highest correlation between different types of immune cell metagenes. Similar relationships between the metagenes were detected in the validation cohort-B (Figure 1) and -C (Additional file 1, Supplementary Figure S4). The presence of B-lymphocytes in the tumor is the primary source of the expression of the B-Cell metagene that is largely composed of immunoglobulin genes [20, 22]. In contrast, immunohistochemical analyses of IL-8 expression and analysis of gene expression data of breast cancer cell lines indicate that carcinoma cells are the main source of the IL-8 metagene (Figure 2).

Figure 1

Principal biological phenotypes identified as metagenes among TNBC. Heatmaps of expression values of the 16 metagenes (upper panels) and the 355 individual Affymetrix probe sets (lower panels) are shown for the finding cohort (left panels, n = 394) and validation cohort (right panels, n = 185). The dendrogram at the left presents the results from hierarchical clustering of the metagenes. When the hemoglobin and stroma metagenes, which display some dataset bias (see Methods), were left out, three major clusters were observed representing (i) basal-like, apocrine, CLDN-CD24, proliferation, and adipocyte metagenes, (ii) all five immune cell metagenes, and (iii) the IL-8 and VEGF metagenes. In keeping with these three major phenotypes the samples were sorted according to (1) basal-like phenotype, (2) low vs. high B-Cell metagene, and (3) the expression value of the IL-8 metagene. (The 355 individual Affymetrix probesets and the respective metagenes are listed in Additional file 4, Supplementary Methods.)

Table 2 Principal biological phenotypes identified as metagenes among TNBC

Figure 2

Immunohistochemical analyses of the cellular source of expression of the B-Cell and IL-8 metagenes in TNBC. A) Detection of B-lymphocytes by a CD20 antibody (red staining) in a triple negative breast cancer from the Frankfurt cohort with high expression of B-Cell and IL-8 metagenes. B) An adjacent section of the same tumor as in (A) is stained with an IL-8 antibody demonstrating that carcinoma cells are the source of IL-8 expression (red staining). Note the strong IL-8 staining in rod-like structures in the carcinoma cells. Further analyses using antibodies specific for macrophages (CD68) also demonstrated that macrophages are not the cellular source of IL-8 expression in the tumor (Additional file 1, Supplementary Figure S15).

Relationship between TNBC and basal-like breast cancer (BLBC)

We observed a clear bimodal distribution of the basal-like metagene score among TNBC (Figure 3). This bimodal distribution allowed us to derive a cutoff to separate cases into high and low expression groups by fitting two normal distributions to the data (Figure 3). According to this cutoff, 72.8%, 73.0% and 69.7% of TNBC were defined as BLBC in the discovery cohort-A, validation cohort-B, and validation cohort-C, respectively. Table 3 compares the clinical characteristics of BLBC and non-BLBC triple negative cancers in the discovery cohort-A. The positive associations of BLBC with high histological grade (G3, P < 0.001) and younger age (P = 0.004) were also observed in the validation cohort-C and validation cohort-B, respectively (Additional file 2, Supplementary Table S2).

Figure 3

Distribution of the expression of the basal-like metagene among TNBC of cohort-A. The bimodal distribution of the expression of the basal-like metagene among the 394 TNBC samples in the finding cohort-A is shown. A mixture (black line) of two Gaussian distributions (blue and red lines) was fitted to these data. The intersection of the two Gaussians was used as the cutoff (0.0014) for the definition of basal-like tumors. Similar results were obtained for the validation cohorts-B and -C, as well as for all samples combined.
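The cutoff construction described in this caption, the point at which two fitted Gaussian components intersect, can be sketched as follows. Setting the two weighted densities equal and taking logarithms gives a quadratic equation in x; the root lying between the two component means is the cutoff. The mixture parameters below are hypothetical and are not the fitted values for cohort-A; fitting the mixture itself (for example by expectation-maximization) is not shown.

```python
from math import exp, log, pi, sqrt


def weighted_density(x, w, m, s):
    """Density of a Gaussian component with weight w, mean m, std dev s."""
    return w * exp(-((x - m) ** 2) / (2 * s**2)) / (s * sqrt(2 * pi))


def gaussian_intersection(w1, m1, s1, w2, m2, s2):
    """Point between the two means where the weighted densities are equal."""
    # Coefficients of a*x^2 + b*x + c = 0, from equating the log densities.
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) + log((w1 * s2) / (w2 * s1))
    if abs(a) < 1e-12:
        # Equal standard deviations: the equation is linear.
        roots = [-c / b]
    else:
        disc = sqrt(b * b - 4 * a * c)  # raises ValueError if no real root
        roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    low, high = sorted((m1, m2))
    between = [r for r in roots if low <= r <= high]
    if not between:
        raise ValueError("no intersection between the component means")
    return between[0]


# Hypothetical mixture: a smaller low-expression component and a larger
# high-expression component (values chosen for illustration only).
low_comp = (0.3, -1.0, 0.4)
high_comp = (0.7, 1.0, 0.6)
cutoff = gaussian_intersection(*low_comp, *high_comp)
```

Samples with a metagene score above `cutoff` would then be called basal-like. Because the larger component pulls the intersection toward the smaller one, the cutoff generally does not lie midway between the two means.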

Table 3 Clinical parameters of TNBC with basal-like breast cancer (BLBC) or non-BLBC phenotype

In unsupervised clustering of the metagenes the basal-like metagene clustered next to the apocrine metagene but showed a strong inverse correlation (Figure 1). To quantify the correlation between the basal-like metagene and all other metagenes from Table 2 we used quartiles of the respective metagenes. Additional file 2, Supplementary Table S3 presents the six metagenes that displayed significant correlations with the BLBC phenotype in both the discovery and validation cohorts. A positive correlation was found between the BLBC phenotype and the proliferation and angiogenesis (VEGF) metagenes. A negative correlation was observed for the apocrine/androgen receptor signaling and two immune system related metagenes (MHC-2 and T-Cell metagenes), as well as an adipocyte related signature.

Since we observed a negative correlation between the basal-like metagene and potential markers of normal breast tissue, such as the adipocyte metagene, we had to exclude the possibility that we were only distinguishing stroma-rich and stroma-poor samples. As shown in Additional file 1, Supplementary Figure S5, a comparison of the metagenes for proliferation, adipocytes and histones between BLBC, non-BLBC, and normal breast samples demonstrates that the non-BLBC subtype is distinct from normal breast tissues in the expression of several metagenes. Proliferation genes have been previously shown to be the most important determinant of cancer vs normal signatures [35]. Furthermore, the strong bimodal distribution of the basal-like metagene argues against the possibility that this metagene inversely describes the degree of contamination with normal tissue, which should rather result in a continuous distribution. The non-BLBC tumors in our TNBC dataset mainly represent samples of the "molecular apocrine" type (16.5%), whose metagene shows a bimodal distribution inverse to that of the basal-like metagene, and a relatively small group of "claudin-low" tumors (6.3%). The mutual relationship of these three metagenes is shown in Additional file 1, Supplementary Figure S6.

Prognostic value of the different biological phenotypes in TNBC

To assess the prognostic value of the metagenes, we analyzed the event free survival of patients as a function of metagene expression. The basal-like metagene had no significant effect on survival (Additional file 1, Supplementary Figure S7). In contrast, five other metagenes, the IL-8, Histone, VEGF, B-Cell, and T-Cell metagenes, showed significant prognostic value when considered as continuous variables in univariate analysis (Additional file 2, Supplementary Table S4). In a stepwise multivariate Cox regression analysis only three of these, the IL-8, Histone, and B-Cell metagenes, remained significant (Additional file 2, Supplementary Table S5). The IL-8 and Histone metagenes were positively correlated with one another in all datasets (see Figure 1). The B-cell and IL-8 metagenes were both associated with prognosis but in opposite directions. Based on these observations, we derived a B-Cell/IL-8 metagene ratio as a prognostic index for TNBC. Figure 4A demonstrates that patients with a high expression of the B-Cell and low expression of the IL8 metagene have significantly better prognosis than other TNBC patients (HR 0.37, 95% CI 0.22 to 0.61; P < 0.001). The five-year event-free survival was 84 ± 4% for the good prognosis group (n = 95) compared to 59 ± 4% for the rest of the patients. In validation cohort B (n = 30), there was a non-significant trend for better survival for patients with high B-cell low IL8 metagene expression (P = 0.3, Figure 4B). Since this cohort has limited power due to the small sample size, we also tested the prognostic value on a separate and larger (n = 75) validation cohort of TNBC samples [16]. The B-cell/IL8 metagene ratio had significant prognostic value in this second validation cohort C: the hazard ratio (HR) was 0.26 (95% CI 0.10 to 0.68) and the five-year DFS was 78 ± 9% vs. 45 ± 8% (P = 0.003) (Figure 4C).
The prognostic value was independent of histological grade; Figure 4D, E shows pooled data from all three cohorts to increase sample size, (see also Additional file 1, Supplementary Figure S8 for the individual cohorts). Moreover, the prognostic value of the B-cell/IL8 metagene ratio was observed both in BLBC and non-BLBC TNBCs (P = 0.001 and P = 0.006, respectively; Additional file 1, Supplementary Figure S9). The proportion of BLBC cases was similar in the Good and Poor prognosis groups defined by the B-cell/IL8 metagene ratio (75.2% and 71.8%, respectively; P = 0.54).
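The two-group stratification used in these analyses ("Good" for tumors with both high B-cell and low IL-8 metagene expression, "Poor" for all others) can be written as a simple rule. The sketch below is illustrative only: the cutoff values and patient scores are hypothetical placeholders, since the optimized cutoffs are given in the Supplementary Methods rather than in the text.

```python
def prognosis_group(b_cell, il8, b_cell_cutoff, il8_cutoff):
    """Return "Good" for high B-cell and low IL-8 metagene scores, else "Poor"."""
    if b_cell > b_cell_cutoff and il8 <= il8_cutoff:
        return "Good"
    return "Poor"


# Hypothetical cutoffs; in practice they would be fixed in the discovery
# cohort and applied unchanged to the validation cohorts.
B_CELL_CUTOFF = 0.0
IL8_CUTOFF = 0.0

# Hypothetical (B-cell score, IL-8 score) pairs for four patients.
patients = [(0.8, -0.5), (0.9, 0.7), (-0.3, -0.6), (-0.4, 0.2)]
groups = [
    prognosis_group(b, i, B_CELL_CUTOFF, IL8_CUTOFF) for b, i in patients
]
good_fraction = groups.count("Good") / len(groups)
```

Only the first toy patient satisfies both conditions. In the discovery cohort the corresponding "Good" group comprised 95 of the 297 patients with follow-up, or about 32%.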

Figure 4

Prognostic value of the combined B-Cell/IL-8 metagenes among TNBC. A) Kaplan Meier analysis of event free survival of 297 TNBC patients with follow up from the finding cohort A. Samples were stratified according to the prognostic predictor of the combined B-Cell/IL-8 metagenes. "Good" refers to 95 samples with both high B-Cell and low IL-8 metagene expression whereas all other samples (n = 202) are referred to as "Poor". B) Prognostic value of the B-Cell/IL8-metagene prognostic predictor in the 30 TNBC patients with follow up from the validation cohort-B. Samples were stratified as in (A). C) Prognostic value of the B-Cell/IL8-metagene prognostic predictor in the 75 TNBC patients with follow-up from the independent validation cohort-C. Samples were stratified as in (A). D) Prognostic value of the combined B-Cell/IL-8 metagenes among the subset of high grade (G3) TNBC tumors from all three cohorts -A, -B, and -C (n = 186). Samples were stratified as in (A). (Results from the individual cohorts are given in Additional file 1, Supplementary Figure S8). E) Prognostic value of the combined B-Cell/IL-8 metagenes among the subset of low to medium grade (G1 and G2) TNBC tumors from all three cohorts -A, -B, and -C (n = 77). Samples were stratified as in (A). (Results from the individual cohorts are given in Additional file 1, Supplementary Figure S8).

To assess a potential predictive value for sensitivity to systemic adjuvant chemotherapy, the patients were stratified by adjuvant treatment. In the discovery cohort, 186 patients received no adjuvant systemic treatment and 81 patients received chemotherapy (mostly cyclophosphamide, methotrexate, fluorouracil; CMF). Better prognosis was observed for the high B-cell/low IL8 group in both untreated (P = 0.001) and chemotherapy treated patients (P = 0.05; not shown). A potential predictive value of the B-cell and IL8 metagenes was also analyzed in 191 patients with TNBC who received neoadjuvant chemotherapy. We assembled this cohort of samples with information on pathological complete response (pCR) from seven datasets. As shown in Additional file 1, Supplementary Figure S10, the B-cell metagene had a modest predictive value with an area under the curve (AUC) of 0.606, consistent with our previous results [22]. The predictive value of the IL8 metagene was smaller (AUC 0.552). Combining both metagenes increased the AUC to 0.612 (95% CI 0.519 to 0.704; P = 0.018).
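For readers less familiar with the AUC values reported here, the area under the ROC curve equals the probability that a randomly chosen responder (pCR) has a higher score than a randomly chosen non-responder, with ties counted as one half. The sketch below computes it directly from that definition; the scores and response labels are hypothetical and do not come from the neoadjuvant cohort.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- marker value per patient (for example, a metagene score)
    labels -- 1 for responders (pCR), 0 for non-responders
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Each responder/non-responder pair scores 1 (correct order),
    # 0.5 (tie) or 0 (wrong order).
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four patients, two with pCR.
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, so values near 0.6 indicate a modest effect. A marker that is inversely related to response yields a value below 0.5, in which case 1 minus the AUC describes its discriminative ability.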

In multivariate Cox regression analysis, including lymph node status, age, tumor size, and histological grade, only the combined B-Cell/IL8-metagene score showed strong independent prognostic value in both the discovery cohort (HR 0.38, 95% CI 0.22 to 0.67, P = 0.001) and the second, larger validation cohort-C (HR 0.21, 95% CI 0.07 to 0.62, P = 0.005). The only other variable with borderline statistical significance (HR 0.40; 95% CI 0.17 to 0.99, P = 0.046) was lymph node status in validation cohort-C (Table 4). However, even in univariate analyses the remaining clinical variables did not show a significant prognostic value in the analyzed cohorts. This might be attributed to the fact that most TNBC are highly proliferating and grading is not as important for prognosis in this subtype as it is in ER positive disease; in addition, the power of our analysis may be too limited to detect the modest effect of age and tumor size on prognosis within this sample set. The inclusion of a term for chemotherapeutic treatment in the multivariate analysis further reduced the sample size to 213 patients in cohort-A (no treatment information was available for patients from validation cohort-B). Of these 213 patients only 37 were treated with chemotherapy. The combined B-Cell/IL8-metagene score remained significant (P = 0.001) in the corresponding multivariate analysis (Additional file 2, Supplementary Table S9A). Unexpectedly, chemotherapy treatment was associated with a worse prognosis, probably due to chance or some form of selection bias toward including higher risk patients in these public datasets (Additional file 2, Supplementary Table S9A). This selection bias is consistent with a significantly higher proportion of node positive patients in the chemotherapy group (P = 0.001) and a trend for a higher histological grade (P = 0.074; Additional file 2, Supplementary Table S9B).

Table 4 Multivariate analysis of EFS according to standard parameters and the combined B-Cell/IL8-metagene in TNBC

Relationship of the identified metagenes to known prognostic signatures

The correlation of several published prognostic gene signatures to the metagenes discovered within the pure TNBC cohort was analyzed by hierarchical clustering using the gene expression data from cohort-A (Additional file 4, Supplementary Methods Section 13). As shown in Additional file 1, Supplementary Figure S11, the "recurrence score" [36], "genomic grading index" (GGI) [37], and the "wound response signature" [38] display high correlation to the proliferation metagene. On the other hand, the "7-gene immune response (IR) signature" [39], the "stroma derived prognostic predictor" (SDPP) [40], and the "368 gene medullary breast cancer signature" [16] were all highly correlated to immune cell metagenes. The magnitude of the correlation (R² = 0.4 to approximately 0.7) between the different immune metagenes and the related signatures is at the same high level as the correlation between genes within other metagene clusters (R² = 0.5 to approximately 0.7; Table 2). We demonstrated previously [22] that even if the different immune metagenes can discriminate between distinct types of immune cells, the actual infiltration of tumors generally represents a mixture of these different immune cells. In most cases, the differences in the proportions in this mixture are smaller than the global differences in lymphocyte infiltration between individual tumors. Therefore, different immune signatures often carry redundant prognostic information and can replace each other. In contrast to the immune cell metagenes, no correlation between the IL8 metagene and other signatures was observed.

Discussion

It has been suggested that TNBC represent a group of several molecularly [3] and clinically [41, 42] distinct disease subtypes. We used gene expression data of a cohort of 394 TNBC to identify molecular subsets within this tumor type. The definition of TNBC was based on gene expression data, which is not the standard definition used in the clinic. This might be a caveat but holds the promise that samples erroneously characterized as receptor-negative by immunohistochemistry do not introduce noise into our analysis. We identified 16 metagenes associated with several distinct biological processes that showed variable expression across TNBC (Table 2). Some of the metagenes seem to point to the distinct origins of these cancers [43, 44]. These include the basal-like [4], the apocrine [18, 19], and the claudin-low [28, 29] subtypes of TNBC. Other metagenes were related to non-neoplastic cellular constituents of the tumor microenvironment including stroma [26, 27], blood cells [30] and adipocytes [4], as well as signatures for angiogenesis [23, 34] and inflammation [31-33]. Five metagenes appear to reflect the variable presence of immune cells and may contribute to the clinical behavior of the cancer [4, 20-25, 27, 45] (Table 2).

Kreike et al. [9] detected similar metagenes among 97 TNBC analysed with a different microarray platform. That study suggested that the TNBC clinical phenotype can be equated to the BLBC molecular class determined by the centroid method [46] since 95% of the TNBCs were assigned basal-like molecular class [47]. However, the centroid method is highly susceptible to the composition of the dataset that is used to define the reference centroids [48] and variants of the method can lead to different results [49]. Bertucci et al. [50] identified only 71% of their 172 TNBC cases as basal-like when using a slightly different version of the centroid method for molecular classification. When we applied different versions of the centroid method to 1,364 breast cancers, 65% to 90% of the TNBC samples (n = 172) were assigned to the basal-like class depending on the method used (Additional file 2, Supplementary Table S6). In this paper we took a different approach and first identified metagenes and used these metagenes to define molecular subsets among TNBC. One of our metagenes corresponded closely to the gene signatures that are used to define BLBC in the centroid based methods. Our results indicate that BLBC defined based on the basal-like metagene expression represent around 73% of TNBC (Table 3 and Additional file 2, Supplementary Table S2).
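The sensitivity of centroid-based classification to the reference centroids, discussed above, can be illustrated with a minimal nearest-centroid sketch. A sample is assigned to the class whose centroid it correlates with most strongly, so the same sample can receive different calls under different reference sets. All gene values, class names and both reference sets below are hypothetical, and the published centroid methods [46] involve further steps that are not reproduced here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (both must vary)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def nearest_centroid(sample, centroids):
    """Name of the centroid with the highest correlation to the sample."""
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))


# Hypothetical three-gene expression profile of one tumor.
sample = [1.0, 0.5, -0.2]

# Two hypothetical reference sets, as might arise from centroids
# derived in datasets of different composition.
reference_a = {"basal-like": [2.0, 0.0, -1.0], "other": [-1.0, 0.0, 2.0]}
reference_b = {"basal-like": [2.0, -1.0, 0.0], "other": [0.5, 1.0, -1.0]}

call_a = nearest_centroid(sample, reference_a)
call_b = nearest_centroid(sample, reference_b)
```

With these toy values the same tumor is called basal-like under `reference_a` and "other" under `reference_b`, which mirrors in miniature why the proportion of TNBC assigned to the basal-like class can vary with the version of the method used.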

The proportion of BLBC among TNBC in our study is similar to results from an immunohistochemical study by Rakha et al. [7] that defined BLBC by the expression of CK5/6, CK14, CK17 or EGFR. These authors observed a worse survival of the 165 patients with BLBC compared to the remaining 67 TNBC cases, which expressed none of these markers. However, we did not detect differences in the prognosis of BLBC and non-BLBC type triple negative cancers (Additional file 1, Supplementary Figure S7). In the study by Rakha et al. the prognostic effect was mainly confined to 103 untreated patients. Still, even when we analyzed untreated patients (n = 186) separately, we detected no prognostic value of the BLBC phenotype (not shown). Our results are also contrary to the immunohistochemical study of Cheang et al. [51], which used CK5/6 and EGFR antibodies for TNBC stratification. They also observed a worse prognosis of 336 BLBC TNBC compared to 303 non-BLBC TNBC. However, our study is not directly comparable to these prior reports because our definition of BLBC is fundamentally different from the IHC-based methods. Our results are in line with several other genomic profiling studies that reported limited prognostic value for the BLBC molecular class among clinically triple negative cancers [18, 19, 50].

We observed strong prognostic value for several of the other metagenes (Additional file 2, Supplementary Table S4). An improved prognosis was observed for patients with tumors displaying high expression of immune system related metagenes, which supports recent reports [20, 23-25, 27, 39, 40, 52, 53]. An association with decreased survival was observed for high expression of inflammation (IL-8), an angiogenesis/hypoxia signature (VEGF) [34], and histone-related metagenes (Additional file 2, Supplementary Table S4 and Figure 1). A simple combination of high B-Cell and low IL8 metagene expression identifies a subset of TNBC patients (32% of all) with a favorable prognosis and a five-year event-free survival of 84%. In multivariate analysis, only this metagene ratio and lymph node status were significant predictors of outcome in TNBC in our cohort of patients (Table 4 and Figure 4D, E). Other known prognostic factors in breast cancer, such as age, tumor size and histological grade, were not significant in our cohorts, even in univariate analysis. Most TNBC are high grade and, therefore, grade is not as important for prognosis in this subtype as it is in ER positive disease. TNBCs are also often associated with younger age but the impact of age and tumor size on prognosis within this subtype is not yet fully clear. Still, it cannot be excluded that a bias in our cohort is the reason for the lack of significance of these factors. Our analyses of neoadjuvant treated TNBC samples suggest modest predictive value of the B-cell/IL8 metagene ratio for currently used chemotherapies [22, 54] (Additional file 1, Supplementary Figure S10). We also observed a pure prognostic value in untreated patients of the finding cohort, in line with other reports on the B-cell metagene [24, 27]. Treatment information on the samples from the validation cohort was not available.

Our observation is important since every currently available genomic prognostic signature (for example, the 70-gene profile [55], Recurrence Score [36], Genomic Grading Index [37]) assigns poor prognostic risk status to all TNBC samples despite their variable outcome [56-58]. One of these signatures, the Rotterdam-76-gene prognostic signature [59], was developed in a way to allow prognostic stratification of ER-negative cancers. However, similar to other reports [9] we were not able to demonstrate a prognostic value for this signature (Additional file 1, Supplementary Figure S12).

We used an unsupervised class discovery approach to first identify the main molecular subtypes within the data and then assess the prognostic differences between the molecular subsets. Interestingly, when we performed an independent supervised analysis that compared TNBC cases with or without recurrence, we also identified IL-8 as the top ranked gene associated with poor prognosis (Additional file 1, Supplementary Figure S13 and Additional file 2, Supplementary Table S8). However, gene signatures obtained through supervised analysis were not superior to the molecular structure based prognostic predictions in validation (Additional file 1, Supplementary Figure S14). In addition, the biological interpretation of the empirically derived prognostic signature is more difficult than the interpretation of metagenes. In summary, we performed the largest unsupervised analysis of pooled gene expression data from TNBC. We describe a new prognostic signature for these cancers that identifies about one-third of TNBC as relatively low risk for recurrence. These cancers are characterized by high B-cell and low IL-8 metagene expression and have about 84% recurrence-free survival at five years. Whereas this may not be sufficiently high to forego adjuvant chemotherapy, these observations pave the way to develop a clinically useful multivariate prognostic model for TNBC. A combined prognostic score, including clinical variables, such as nodal status and perhaps tumor size, and molecular variables, such as optimized B-cell and IL-8 metagenes (measured by an RT-PCR or array-based method), may identify patients with very low risk of recurrence even with ER-, PgR- and HER2-negative breast cancer. Equally important, the prognostic importance of B-cells and the negative impact of IL-8 suggest potential novel therapeutic strategies for TNBC that can be tested in the clinic [31, 32].
It could allow the selection of those patients who could profit most from novel immune stimulating drugs like anti-CTLA-4 antibodies that have shown promise in melanoma [60, 61]. IL8 could also directly increase the survival of breast cancer stem cells after chemotherapy [62], which can be blocked with IL8 directed drugs [63]. Such an effect might explain the triple negative paradox with high relapse rates despite a good initial response to chemotherapy.

Conclusions

In the largest and most comprehensive analysis of all available gene expression data in TNBC, we first identified structures in the molecular data without considering any clinical outcome. Subsequently, these molecular phenotypes were correlated with survival in multivariate analysis, including routine clinical and pathological variables. Our most important observation is that a high B-cell presence and low IL-8 activity identifies a good prognosis group, even in the absence of systemic therapy, among TNBC. These observations directly point to therapeutic interventions, such as the inhibition of the IL-8 pathway and activation of the immune system in the tumor microenvironment that could benefit patients with this disease.