Basal-like breast cancer: molecular profiles, clinical features and survival outcomes
Basal-like constitutes an important molecular subtype of breast cancer characterised by an aggressive behaviour and a limited therapy response. The outcome of patients within this subtype is, however, divergent. Some individuals show an increased risk of dying in the first five years, and others a long-term survival of over ten years after the diagnosis. In this study, we aim at identifying markers associated with basal-like patients’ survival and characterising subgroups with distinct disease outcome.
We explored the genomic and transcriptomic profiles of 351 basal-like samples from the METABRIC and ROCK data sets. Two selection methods, labelled Differential and Survival filters, were employed to determine genes/probes that are differentially expressed in tumour and control samples, and are associated with overall survival. These probes were further used to define molecular subgroups, which vary at the microRNA level and in DNA copy number.
We identified the expression signature of 80 probes that distinguishes between two basal-like subgroups with distinct clinical features and survival outcomes. Genes included in this list have been mainly linked to cancer immune response, epithelial-mesenchymal transition and cell cycle. In particular, high levels of CXCR6, HCST, C3AR1 and FPR3 were found in Basal I; whereas HJURP, RRP12 and DNMT3B appeared over-expressed in Basal II. These genes exhibited the highest betweenness centrality and node degree values and play a key role in the basal-like breast cancer differentiation. Further molecular analysis revealed 17 miRNAs correlated to the subgroups, including hsa-miR-342-5p, -150, -155, -200c and -17. Additionally, increased percentages of gains/amplifications were detected on chromosomes 1q, 3q, 8q, 10p and 17q, and losses/deletions on 4q, 5q, 8p and X, associated with reduced survival.
The proposed signature supports the existence of at least two subgroups of basal-like breast cancers with distinct disease outcome. The identification of patients at low risk may impact clinical decision-making by reducing the prescription of high-dose chemotherapy and, consequently, avoiding adverse effects. The recognition of other aggressive features within this subtype may also be critical for improving individual care and for delineating more effective therapies for patients at high risk.
Keywords: Breast cancer; Intrinsic subtypes; Basal-like; Triple-negative; Molecular profile; Survival outcome; Gene expression; Signature; Copy number aberration; MicroRNA
Abbreviations
BLBC: Basal-like breast cancer
CNA: Copy number aberration
DAVID: Database for annotation, visualization and integrated discovery
EGA: European genome-phenome archive
HER2: Human epidermal growth factor receptor-2
HREC: Human research ethics committee
IDC: Invasive ductal carcinoma
Invasive ductal carcinoma/medullary carcinoma
ILC: Invasive lobular carcinoma
METABRIC: Molecular taxonomy of breast cancer international consortium
MST: Minimum spanning tree
NPI: Nottingham prognostic index
ROCK: Research online cancer knowledgebase
TNBCs: Triple-negative breast cancers
Approximately 15% of all breast cancer cases are of basal-like subtype, often aggressive and highly recurrent lesions [1, 2, 3]. Basal-like breast cancers (BLBCs) are defined by the lack of expression of the oestrogen (ER) and progesterone (PR) hormone receptors and of the human epidermal growth factor receptor-2 (HER2) [4, 5]. Histologically, these tumours show high grade, high mitotic indices, presence of central necrotic or fibrotic zones, pushing borders of invasion, lymphocytic infiltrate and atypical medullary features. The breast basal cell layer is also characterised by high expression of cytokeratins (CK5/6, CK14, and CK17) and epidermal growth factor receptor (EGFR), amongst other markers [7, 8, 9, 10, 11]. All these features contribute to the limited therapeutic response and therefore to the refractory nature of these tumours [12, 13]. Thus, patients diagnosed with BLBC have a poor prognosis and short disease-free and overall survival. A better understanding of the pathophysiology and molecular basis of basal-like tumours is necessary to delineate patient outcomes.
At the molecular level, basal-like tumours are considered more homogeneous than the immunohistochemically defined triple-negative breast cancers (TNBCs), even though the two terms are used interchangeably [1, 15]. Despite this relative molecular homogeneity, patients within this group still show divergent disease outcomes [12, 14, 16]: some patients show high mortality and recurrence rates within the first 3-5 years, whereas others survive over 10 years, with no recurrence, following the diagnosis [12, 14, 16]. For the latter group, the prognosis is better than that of the luminal breast cancer subtype [8, 17]. These observations suggest that BLBCs may be composed of at least two clinically distinct groups, with poor or excellent survival. The molecular characterisation of these basal-like tumours is of particular interest in medicine since it may bring new insights into the understanding and management of the disease. Identifying markers and mechanisms involved in the differentiation of BLBCs is therefore an essential step towards this end. Moreover, it would allow the development of tailored treatments with more effective individual response, leading to more personalised and conservative interventions for breast cancers.
Recent investigations of TNBCs pointed to the existence of intrinsic basal-like subtypes with distinct molecular patterns [19, 20, 21]. The stratification described by Lehmann et al. (2011) revealed the involvement of enriched cell cycle and cell division components in Basal-like 1 (BL1); growth factor signalling, glycolysis and gluconeogenesis pathways in Basal-like 2 (BL2); and immune cell processes in Immunomodulatory (IM). The authors also determined two other groups partially overlapping the basal-like subtype defined by the PAM50 classifier: Mesenchymal (M) and Mesenchymal stem-like (MSL). Alternatively, Burstein and colleagues defined the Basal-Like Immune-Suppressed (BLIS) and Basal-Like Immune-Activated (BLIA) subtypes. The former tumour type is characterised by multiple SOX family transcription factors, while the latter is described by STAT signal transduction molecules and cytokines. More recently, Jézéquel et al. (2015) pointed to two other groups: a basal-like group with low immune response and high M2-like macrophages, and a basal-enriched group with high immune response and low M2-like macrophages. All the studies described above have focused on the molecular heterogeneity of TNBCs and partially support each other.
Multi-gene models have also been applied to predict breast cancer subtype [22, 23], recurrence and survival [25, 26]. The selection of genes across samples has generally been associated with hormonal expression levels and proliferation modules. Since BLBCs and TNBCs are hormone receptor (ER and PR) negative and highly proliferative, the markers in the current models have limited power to further separate patients at risk within these groups. Clinical assays independently modelling triple-negative samples have shown superior ability in predicting outcomes of early stage tumours [28, 29]. These assays and most approaches, however, have focused on the immunohistochemically defined TNBCs [10, 30, 31]. A more robust approach for characterising BLBC outcomes is yet to be developed. Accordingly, a proper investigation of BLBCs remains essential for patients diagnosed with this subtype.
As the classification of TNBCs is not an ideal surrogate for defining BLBC entities, a characterisation of basal-like tumours at the genomic and transcriptomic levels is urgently needed. In this contribution, we aim to identify markers associated with patients’ survival using large breast cancer cohorts from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and the Research Online Cancer Knowledgebase (ROCK). Through the determination of this signature, our objective is to stratify 351 tumours into basal-like subgroups with varying clinical features and survival outcomes, and to further describe each of them. Accordingly, we explore the microarray data, including gene (mRNA) and microRNA (miRNA) expression values and copy number aberration (CNA) measurements, to expand the molecular characterisation of BLBCs, which to our knowledge has not yet been performed. The assessment of more comprehensive profiles of BLBCs is relevant for defining groups at risk in clinical settings and, more importantly, for improving therapy response.
Breast cancer data sets
The METABRIC genomic and transcriptomic data sets were downloaded from the European Genome-Phenome Archive (EGA) (http://www.ebi.ac.uk/ega), under the accession numbers EGAS00000000083 and EGAS00000000122. These publicly available collections contain genotyping (Affymetrix SNP 6.0), log2 normalised gene expression (Illumina_Human_WG-v3) and miRNA expression (Agilent ncRNA 60k) arrays for over 2000 breast tumours and 144 control (non-tumour) breast samples. The original METABRIC study was approved by the ethics Institutional Review Boards in the UK and Canada (Addenbrooke’s Hospital, Cambridge, United Kingdom; Guy’s Hospital, London; Nottingham; Vancouver; Manitoba). Further analysis of these data was approved by the Human Research Ethics Committee (HREC) at the University of Newcastle, Australia (approval number: H-2013-0277).
The METABRIC cohort has a comprehensive description of patients’ long-term clinical and pathological outcomes. Tumour samples were assigned to a breast cancer subtype (luminal A, luminal B, HER2-enriched, normal-like, or basal-like) using an ensemble learning approach, employing the set of 50 genes defined by Parker et al. (2009). This approach has previously been shown to improve sample classification and subtype assignment in the METABRIC data set, and has revealed more consistency in terms of clinical features and survival outcomes. Based on these labels, a subset of 250 basal-like tumours was selected for analysis in this study. For training and test purposes, this subset was randomly split into two sets of equal size (125) to avoid possible bias from the original cohort. The sets are hereafter referred to as the training and validation sets.
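As a minimal illustration of the random split described above, the following Python sketch shuffles a list of sample identifiers and divides it into two halves. The identifiers, the function name `split_in_half` and the fixed seed are assumptions introduced here for reproducibility of the example; they are not taken from the study.

```python
import random


def split_in_half(sample_ids, seed=0):
    """Shuffle the sample IDs and split them into two sets of equal size."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed: an assumption, used only to make the example repeatable
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical identifiers standing in for the 250 basal-like tumours
samples = [f"BL_{i:03d}" for i in range(250)]
training, validation = split_in_half(samples)
```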
For additional validation across platforms, we used the ROCK data set obtained from the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/), under data source number GSE47561 [33, 35]. This data set integrates ten different studies (GSE2034, GSE11121, GSE20194, GSE1456, GSE2603, GSE6532, GSE20437, GSE7390, GSE5847 and E-TABM-185) performed on the Affymetrix HG-U133A technology. The compiled matrix contains log2 RMA renormalised gene expression values for 1570 tumour samples, 101 of which are of basal-like subtype. The ROCK data set includes representative information for survival analysis; however, it lacks standard clinicopathological data, which have therefore not been considered in this study.
Probe selection approach
Since the first aim of our study is to identify markers driving survival among basal-like patients, we designed a filtering technique to select a representative probe signature and reduce the bias arising from the high number of probes (48,803) and low number of samples (125) in the training set. We defined two relevant criteria to select probes, which are involved in tumour initiation and/or progression, and are also correlated to survival, as detailed below.
The Differential filter was employed to select probes exhibiting distinct expression levels between tumours and controls. The underlying assumption is that probes truly correlated with breast cancer are linked to genomic changes or variations from healthy to cancerous tissue. We applied the Differential filter to each of the 48,803 probes to test their separation power between the 125 tumours and 144 controls. This filter tests for three feasible cases: the expression levels in tumours are (a) lower than, (b) higher than, or (c) lower and higher than in control samples. The last case refers to genes that are up-regulated in some tumours and down-regulated in others, while the expression levels of controls lie between these two groups. To calculate a p-value for this case, we mirrored all expression levels on one side with respect to the mean value of controls. The separation power of each probe was defined as the minimal Wilcoxon test p-value calculated for the three cases. To determine the number of probes passing the Differential filter, we plotted the ordered log10-normalised p-values against the corresponding probe ranks. The threshold was set approximately at the point of the highest curvature of this function. This threshold is based on the naturally emerging systemic behaviour and does not require an external definition. Probes passing this filter are referred to as the differential probe set.
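The three-case test can be sketched in a few lines of Python. This is only one possible reading of the description above, not the authors' implementation: it assumes a one-sided rank-sum (Mann-Whitney) test with a normal approximation and no tie or continuity correction, and it assumes that case (c) mirrors both tumour and control values above the control mean. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def p_greater(x, y):
    """One-sided rank-sum p-value (normal approximation) that x tends to exceed y."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2       # Mann-Whitney U statistic for x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))      # upper tail of the standard normal


def differential_p(tumours, controls):
    """Minimal p-value over the three cases (a), (b) and (c)."""
    c = mean(controls)

    def mirror(values):
        # Assumed reading of case (c): reflect every value onto the upper side of the control mean
        return [c + abs(v - c) for v in values]

    return min(
        p_greater(controls, tumours),                   # (a) tumours lower than controls
        p_greater(tumours, controls),                   # (b) tumours higher than controls
        p_greater(mirror(tumours), mirror(controls)),   # (c) tumours both lower and higher
    )


# Toy probe: half of the tumours are down-regulated, half up-regulated
controls = [4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5]
tumours = [1.0, 1.1, 1.2, 1.3, 1.4, 9.0, 9.1, 9.2, 9.3, 9.4]
p_value = differential_p(tumours, controls)
```

In this toy example neither one-sided test (a) nor (b) separates the groups, whereas the mirrored case (c) does, which is the situation the third case is designed to capture.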
The Survival filter was used to further identify probes for which the expression levels are associated with patients’ survival. This filter employs the Kaplan-Meier estimator to compute the survival probabilities. The stratification power of each probe is calculated using the Log-rank test applied to two groups of samples corresponding to quantiles with the lowest and the highest expression values, respectively. We defined these quantiles by ordering all samples by their expression values of a probe and selected samples in the first and last thirds (the quantile from 0 to 33% in the relatively under-expressed and from 67 to 100% in the relatively over-expressed group). This analysis was performed in R using the package survival. Since the survival information is not provided for all samples, this calculation was based on 115 basal-like tumour samples (from the total of 125) in the METABRIC training set. To determine the number of probes passing the Survival filter we used a similar threshold definition as for the Differential approach, i.e. by ordering the log10-normalised p-values that emerged from the Log-rank test. These probes are further referred to as the survival probe set.
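The tertile grouping and the Log-rank comparison can be illustrated with a short, dependency-free Python sketch. The study itself used the R package survival; the code below is a stand-in written for illustration, with hypothetical function names and toy data, and it implements the standard two-group log-rank statistic with a chi-square distribution on one degree of freedom.

```python
import math


def tertile_groups(expression):
    """Indices of the samples in the lowest and highest thirds of expression."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    k = len(expression) // 3
    return order[:k], order[-k:]


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test p-value (events: 1 = death observed, 0 = censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk in group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk in group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail, 1 degree of freedom


# Toy probe: low expression goes with early deaths, high expression with censored long follow-up
expression = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
times = [1, 2, 3, 5, 5, 5, 10, 11, 12]
events = [1, 1, 1, 0, 1, 0, 0, 0, 0]
low, high = tertile_groups(expression)
p_value = logrank_p(
    [times[i] for i in low], [events[i] for i in low],
    [times[i] for i in high], [events[i] for i in high],
)
```

Samples in the middle third are ignored, in line with the description above, so the test contrasts only the relatively under-expressed and over-expressed groups.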
Clustering basal-like tumour samples
The second aim of our study is to identify and characterise basal-like subgroups with varying disease outcomes. To this end, we performed a hierarchical clustering of samples based on the previously defined survival probe set. This procedure exploits the assumption that probes showing most variations in expression and co-expression among each other are involved in similar biological mechanisms and have a high impact on the delineation of the groups. To calculate the dissimilarity between the 115 samples from the METABRIC training set, for which the survival information is provided, we used the square root of the Jensen-Shannon divergence [38, 39, 40]. We then generated the hierarchical clustering with Ward’s criterion, which minimises the variance within clusters, using the R package stats.
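The dissimilarity measure named above can be written down compactly. The sketch below computes the square root of the Jensen-Shannon divergence between two expression profiles in plain Python. It assumes that each profile is non-negative and is rescaled to sum to one before comparison, and it uses base-2 logarithms so that the distance lies between 0 and 1; how the profiles were turned into distributions in the study is not stated, so this normalisation is an assumption. The subsequent Ward clustering is not reproduced here.

```python
import math


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two non-negative profiles."""
    total_p, total_q = sum(p), sum(q)
    p = [v / total_p for v in p]          # rescale to a probability distribution (assumption)
    q = [v / total_q for v in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; terms with a zero weight contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))   # guard against tiny negative rounding errors


# Hypothetical expression profiles of three samples over four probes
sample_a = [8.1, 6.4, 7.9, 5.2]
sample_b = [8.0, 6.5, 7.7, 5.4]
sample_c = [5.0, 9.3, 4.8, 9.9]
d_ab = js_distance(sample_a, sample_b)
d_ac = js_distance(sample_a, sample_c)
```

A matrix of such pairwise distances is what a hierarchical clustering routine would take as input.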
We further examined which probes from the survival probe set contribute the most to the separation of basal-like subgroups using the Wilcoxon test. We then ordered the log10-normalised p-values to determine the probes that significantly differentiate between the subgroups by using the same threshold criterion as for the Differential filter. The purpose of this procedure is to refine the probes that best segregate basal-like subgroups of distinct disease outcome. These probes are further referred to as the probe signature and expose striking genes and cell mechanisms involved in the subgroups differentiation.
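The threshold criterion reused here, a cut-off placed near the point of highest curvature of the ordered log10 p-value curve, was set approximately by the authors. One common way to automate such a choice is sketched below: it picks the point farthest from the straight line joining the first and last points of the curve. This chord-distance heuristic is a substitute chosen for illustration, not the procedure used in the study, and its result depends on how the two axes are scaled.

```python
import math


def elbow_index(y):
    """Index of the point farthest from the chord joining the first and last points."""
    x0, y0 = 0, y[0]
    x1, y1 = len(y) - 1, y[-1]
    chord_length = math.hypot(x1 - x0, y1 - y0)
    best_index, best_distance = 0, -1.0
    for i, yi in enumerate(y):
        # Perpendicular distance from (i, yi) to the chord
        distance = abs((y1 - y0) * (i - x0) - (x1 - x0) * (yi - y0)) / chord_length
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


# Hypothetical ordered log10 p-values: a steep rise followed by a plateau
log10_p = [-40, -30, -20, -10, -2, -1.8, -1.6, -1.4, -1.2, -1.0]
cutoff_rank = elbow_index(log10_p)
```

Probes ranked at or before `cutoff_rank` would then be the ones retained by the filter.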
Validation across data sets
The basal-like entities were first matched to the METABRIC validation set by means of centroids computed based on the previously defined probe signature. Samples in this data set were then assigned to a subgroup according to the minimal Euclidean distance to a centroid.
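Nearest-centroid assignment of this kind can be sketched as follows. The code computes one centroid per subgroup as the per-probe mean over training samples and assigns a new sample to the subgroup whose centroid is closest in Euclidean distance. The toy values and function names are hypothetical, and the centroid normalisation applied in the study is not reproduced.

```python
import math


def centroid(samples):
    """Per-probe mean over a list of expression profiles."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def assign_subgroup(sample, centroids):
    """Name of the subgroup whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(sample, centroids[name]))


# Toy training profiles over two signature probes
training = {
    "Basal I": [[2.0, 8.0], [3.0, 9.0]],
    "Basal II": [[8.0, 2.0], [9.0, 1.0]],
}
centroids = {name: centroid(rows) for name, rows in training.items()}
labels = [assign_subgroup(s, centroids) for s in [[2.5, 7.0], [7.5, 3.0]]]
```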
Following the centroids’ normalisation, an analogous transformation of Affymetrix gene expression values was necessary to enable their direct application. Thus, we applied the same formula (Eq. 1) to the ROCK data set, where the number N of total samples is 101. The assignment to subgroups was based on the minimal Euclidean distance to a standardised centroid.
To identify key players within the probe signature and their relation to each other, we generated and plotted a network graph using the Minimum Spanning Tree (MST). The distance d(x,y) between two probes x and y was defined as d(x,y)=1−|ρS(x,y)|, where ρS(x,y) is the value of the Spearman correlation between the probe expression calculated for 125 tumour samples from the training set. To quantify the network analysis, we computed the betweenness centrality and node degree of each node (probe) using the package igraph in R.
Generally, nodes with high betweenness centrality and degree values represent potential key players within the network. With regard to the centrality values, the most representative entities are highly connected to the rest of the tree; leaf-nodes have a betweenness centrality value of 0, while the most traversed nodes are assigned the highest values (normalised up to 1). Node degree, on the other hand, is indicative of the number of direct neighbours of a node. Thus, probes with high degrees are also central (representative) for local groups with a relatively strong probe co-expression.
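The network construction and the two node measures can be made concrete with a small, dependency-free Python sketch. It builds the distance 1 − |Spearman correlation| between probes, extracts a minimum spanning tree with Prim's algorithm, and computes node degree and normalised betweenness on the resulting tree. The study used igraph in R; the functions and toy data below are hypothetical stand-ins, and the code assumes that no probe is constant across samples.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def distance_matrix(probes):
    """d(x, y) = 1 - |Spearman correlation| for every pair of probes."""
    n = len(probes)
    return [[0.0 if i == j else 1 - abs(spearman(probes[i], probes[j]))
             for j in range(n)] for i in range(n)]


def minimum_spanning_tree(dist):
    """Prim's algorithm; returns the tree as a list of (i, j) edges."""
    n, in_tree, edges = len(dist), {0}, []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges


def tree_metrics(n, edges):
    """Node degree and betweenness (normalised to [0, 1]) for a tree."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    def branch_size(start, removed):
        # Number of nodes reachable from `start` once `removed` is taken out of the tree
        seen, stack, size = {start, removed}, [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return size

    pairs = (n - 1) * (n - 2) / 2
    degree = [len(adj[v]) for v in range(n)]
    betweenness = []
    for v in range(n):
        sizes = [branch_size(u, v) for u in adj[v]]
        # Paths through v connect nodes lying in different branches around v
        through = sum(a * b for k, a in enumerate(sizes) for b in sizes[k + 1:])
        betweenness.append(through / pairs if pairs else 0.0)
    return degree, betweenness


# Toy expression values: four probes measured in five samples
probes = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.3, 2.9, 4.2, 5.5],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [2.0, 1.0, 4.0, 3.0, 5.0],
]
edges = minimum_spanning_tree(distance_matrix(probes))
degree, betweenness = tree_metrics(len(probes), edges)
```

In a tree there is exactly one path between any two nodes, so betweenness reduces to counting pairs of nodes separated by the node in question; leaves therefore score 0, as noted above.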
MicroRNA differential expression
To uncover the miRNAs that best differentiate the basal-like subgroups, we applied the Wilcoxon test to the expression values of each of the 853 probes available in the METABRIC data set. We considered miRNAs with p-values smaller than 0.01 in both the training and validation sets as relevant for the separation between the subgroups. Both data sets were used due to the limited number of samples (146 in total) for which the miRNA expression profiles were provided. The miRNA probes were further investigated for possible target genes within the probe signature using R/Bioconductor (RmiR.Hs.miRNA) across five databases: miRBase, TarBase, PicTar, MirTarget2 and miRanda. For the miRNA and gene annotation we used the packages hgug4112a.db and illuminaHumanv3.db, respectively.
Copy number aberration profiles
To quantise the CNA information we employed the cytobands defined in the hg18 database, which corresponds to the METABRIC platform. Aberrations were divided into two categories: losses (originally denoted as homozygous and heterozygous deletions) and gains (gains and amplifications). For each basal-like subgroup we then calculated the occurrence rates of gains and losses per cytoband, and applied the Binomial test to examine the hypothesis that the CNA distributions were the same among patient subgroups.
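The text does not specify how the Binomial test was parameterised. One plausible reading, sketched below under that assumption, takes the occurrence rate of an aberration in one subgroup as the null probability and tests the count observed in the other subgroup with an exact two-sided binomial test. The counts in the example are invented for illustration.

```python
from math import comb


def binomial_test_two_sided(k, n, p):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)   # small tolerance for floating-point ties
    return min(1.0, sum(v for v in pmf if v <= threshold))


# Hypothetical cytoband: gains seen in 5 of 60 Basal I and 20 of 55 Basal II tumours
rate_basal_1 = 5 / 60
p_value = binomial_test_two_sided(20, 55, rate_basal_1)
```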
We further calculated the Percent Genome Altered (PGA) for each of the basal-like subgroups and applied the Wilcoxon test to these rates to obtain a significance value of the difference between them. The aim of this approach is to identify stable/unstable genome profiles associated with the patient subgroups defined by our probe signature and to statistically describe whether they are consistently diverging.
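As a minimal sketch of the PGA calculation, the function below sums the lengths of segments carrying a given call and expresses the total as a percentage of the genome length covered. The segment format (start, end, call), the call labels and the toy numbers are assumptions made for the example; the study's own segmentation and genome length are not given in the text.

```python
def percent_genome_altered(segments, genome_length, kind):
    """Percentage of the genome covered by segments with the given call ('gain' or 'loss')."""
    altered = sum(end - start for start, end, call in segments if call == kind)
    return 100.0 * altered / genome_length


# Hypothetical segments for one tumour on a genome of 1000 length units
segments = [(0, 100, "gain"), (200, 250, "loss"), (300, 400, "neutral")]
pga_gain = percent_genome_altered(segments, 1000, "gain")
pga_loss = percent_genome_altered(segments, 1000, "loss")
```

Computing the value separately for gains and losses per tumour yields the two sets of rates that are then compared between subgroups.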
Survival-related probes defining basal-like breast cancer subgroups
Applying the Differential and Survival filters to the METABRIC training set, as detailed in “Methods”, we identified 15,000 and 400 probes related to cancer initiation and/or progression and to patients’ survival, respectively. The probes in the differential probe set, with distinct expression levels between tumours and controls, showed significant p-values ranging from 2.36·10^−45 to 1.53·10^−7. The smaller number of probes in the survival probe set, related to individual survival, had significant p-values ranging from 1.11·10^−4 to 0.038. These probes, ultimately, comprise a representative signature driving the outcome of basal-like patients in the METABRIC breast cancer cohort.
[Table: The 80-probe signature related to survival]
Basal I and Basal II validated across independent data sets and microarray platforms
The quality of the 80-probe signature was evaluated using centroids calculated for the training set and applied to the METABRIC and ROCK validation sets. In ROCK, 55 annotated probes matched from Illumina to Affymetrix and were validated across the microarray platforms. The corresponding heat maps, in Fig. 1, showed the existence of two main basal-like subgroups, Basal I and Basal II, in both METABRIC and ROCK validation sets. The two subgroups are consistent with regards to the population size and mRNA expression levels (in G1, G2 and G3) and further support the quality of the 80-probe signature. The definition of more than two subgroups in the hierarchical clustering would lead to the separation of entities with highly similar molecular profiles.
Clinical features and survival outcomes supporting the basal-like subgroups
[Table: Clinicopathological information for patients in the METABRIC data set]
MicroRNAs differentially expressed between Basal I and Basal II subgroups
[Table: MicroRNAs differentiating between basal-like breast cancer subgroups]
MicroRNAs and corresponding target genes
C3AR1, CEBPA, GM2A, MIAT, SURF6, TIMP3
MXD3, PSMG3, PTCRA, PTPRC, TIMP3
C10orf2, CXCL11, KCTD15, PNPLA4, PRKCSH, RRP12, STK25
CXCL11, DSN1, FCGR2A, GPR65, IKZF3, PNPLA4
DOK2, GM2A, HSD11B1, MXD3, PNPLA4, STK25, TIMP3
C10orf2, CD24, CEBPA, EGR2, FBXL5, FPR3, HSD11B1, RASSF5, TIMP3
CD24, EGR2, PNPLA4, SH3BGRL
ASPSCR1, CASP4, IKZF1, PSRC1
CCR1, EGR2, FBXL5, MIAT
CLEC7A, DNMT3B, FCGR2A, FMO1, KCTD15, MIAT, TPX2
GARNL3, HJURP, MIS18A
CLEC7A, DNMT3B, FCGR2A, FMO1, KCTD15, MIAT, TPX2
CXCR6, FCGR2A, HSD11B1, MXD3
AIM2, BEND3, CEL, CTSK, EGR2, FBXL5, PNPLA4, PYHIN1, SNTB1, TIMP3
DOK2, HJURP, IL2RA, PSRC1, RRP12
Copy number aberration profiles further differentiating basal-like subgroups
[Table: Cytobands associated with significant CNA acquisitions]
Notably, the percentage of the genome altered in the training set for Basal I was 2.74% for gains and 0.23% for losses; in Basal II it was 9.06% and 1.03%, respectively. The Wilcoxon test showed significant heterogeneity among the subgroups for gains (p-value = 1.91·10^−6) and for losses (p-value = 9.55·10^−4). The same pattern was observed in the validation set for Basal I (3.58% for gains and 0.13% for losses) and Basal II (10.46% for gains and 2.54% for losses), also highly significant (Wilcoxon test: p-value = 1.11·10^−6 for gains and p-value = 5.37·10^−6 for losses). The increasing genome instability represented by increasing PGA, plotted in Fig. 5, occurred consistently, from Basal I to Basal II, with the decreasing rates of patients’ survival.
Survival-related probes defining the molecular signature of basal-like breast cancer subgroups
The basal-like subgroups defined in this study show distinct patterns in terms of tumour molecular profiles, clinicopathological features and patients’ survival outcomes. The characterisation of BLBCs, considering the two major entities Basal I and Basal II, is supported by the identification of the 80-probe signature, validated across Illumina and Affymetrix platforms in the METABRIC and ROCK cohorts. The importance of this signature, genes and gene-families, is defined by their functionality for each set: G1, G2 and G3. The annotated probes revealed their association with cell cycle and cell division components, immune/inflammatory regulation and metal binding, respectively, and defined Basal I (Immune Active) and Basal II (High Proliferative) subgroups. In Basal I, the over-expression of G2 probes suggests an immune activation and lymphocytic infiltration, particularly regulating tumour growth and patients’ survival. This role has been previously associated with a better prognosis and therapy response, and has the potential to stratify basal-like breast cancers. On the other hand, the over-expression of G1 cell cycle-related genes and under-expression of G3 metal binding genes in Basal II impact on cell proliferation rates and energy metabolism. In this case, the cells reproduce at a rate far beyond the common bounds of a controlled cell cycle, concomitantly with other molecular changes in metabolic processes.
The G1 genes PSMG3, HJURP, BEND3, TPX2, RRP12 and DNMT3B exhibited the highest centrality values and were over-expressed in the Basal II subgroup. HJURP, for instance, plays a central role in the maintenance of newly replicated centromeres and mitotic regulation. Increased levels of this gene in primary tumours and breast cancer cell lines have been previously correlated with decreased disease-free and overall survival. Also involved in the mitotic spindle assembly, TPX2, when over-expressed, has been associated with proliferation networks and metastasis enhancement, holding a prognostic value for breast cancer patients. Additionally, the hyperactivity of the DNA methyltransferase enzymes, or the over-expression of DNMT3B, has been further reported in BLBCs and TNBCs, where the hypermethylation events were more frequent than in other breast cancer subtypes. Hypermethylated tumours also presented decreased levels of regulatory miRNAs, including hsa-miR-29a and -29b. In particular, the under-expression of hsa-miR-29c has been marked as characteristic of BLBCs, segregating them into two subsets, which is supported by our findings. More studies, however, are required to investigate the biological role of other representative genes, such as PSMG3, BEND3 and RRP12 in G1.
A number of G2 genes are key regulators of basal-like tumorigenesis, such as CXCR6, HCST, C3AR1, GBP4, LY96, ANKRD22, FPR3 and FCGR2A. These genes show the highest betweenness centrality and node degree among tumours, and appeared over-expressed in Basal I. In other reports, CXCR6 over-expression has been linked to TNBCs, with distinct roles in autoimmunity and cancer. The co-expression of CXCR6 and CXCL16, a chemokine receptor and its ligand, has been associated with inflammatory response and cell migration [57, 58]. In addition, high levels of HCST [59, 60], C3AR1, GBP4, LY96, ANKRD22, FPR3 and FCGR2A have also been related to immune activation and/or inflammatory response in tumours; however, their roles in basal-like breast malignancies are yet to be uncovered. In our study, the increased expression levels of these probes, among other genes in the signature, have brought new insights into basal-like tumour origin and progression, and into Basal I and Basal II differentiation.
Standard clinical variables such as tumour size, histology and p53 status have also corroborated the existence of the two basal-like subgroups. Basal I showed the highest frequency of medullary type, whereas Basal II exhibited the largest average tumour size and the highest frequency of p53 mutation. The interpretation of these features, in practice, supports the better outcome of patients within the Basal I subgroup when compared to Basal II. Patients’ age, post-menopausal status, tumour grade, NPI and lymph node invasion, on the other hand, are of limited value for distinguishing the subgroups. Most of these variables reflect the overall tumour aggressiveness and the poor prognosis of the subtype.
MicroRNA expression levels differentiating Basal I from Basal II subgroup
This work is the first to include miRNA data in the analysis of basal-like subgroups, covering patients with matched genomic, transcriptomic and long-term survival data. The miRNAs have shown an important value for differentiating Basal I (15) and Basal II (4). In Basal I, hsa-miR-361-3p, -342-3p, -140-3p, -34a, -22, -142-5p, -142-3p, -155, -342-5p, -150, -29c and -29a presented increased expression relative to Basal II. Overall, hsa-miR-361-3p has been found over-expressed in TNBCs with respect to other subtypes and healthy controls, and has been used to discriminate between tumours of BRCA1/2 mutation carriers and non-carriers. Greater levels of this miRNA, however, have been associated with a protective value in tumour progression and further linked to inflammatory response. In line with our findings, these results contain additional information for the better understanding of basal-like subgroups. Additionally, high levels of hsa-miR-342-5p [72, 73] and -34a [74, 75] have been correlated with decreased breast cancer recurrence and increased survival, whereas low levels have been associated with cell death inhibition and therapy resistance. The hsa-miR-22 [76, 77] and members of the hsa-miR-29 family (-29a, -29b and -29c) [55, 78], previously identified as tumour suppressors, have also been implicated in increased survival and pointed out as promising prognostic biomarkers [77, 79].
In Basal II, hsa-miR-19b-1, -17 and -200c presented higher expression levels relative to Basal I and control samples. Tumour cells with enhanced expression of hsa-miR-19 (-19a and -19b-1) have been shown to trigger epithelial-mesenchymal transition. Notably, members of the hsa-miR-200 family have been described as major regulators of this biological process. High levels of hsa-miR-200c and -200b have been observed in circulating tumour cells from patients with metastatic breast cancers, indicating the prognostic significance of this biological marker [82, 83]. Consistent with these observations, our results demonstrated the recurrent over-expression of hsa-miR-19b-1 and -200c in Basal II, the subgroup with the worse disease outcome. Finally, high levels of hsa-miR-17 have been commonly detected in TNBCs and associated with cell migration in vitro and metastasis in vivo.
The miRNAs described above matched 50 gene targets from the 80-probe signature. In our study, hsa-miR-200c* and -29c have been associated with HJURP expression levels in G1, hsa-miR-19b-1* with CXCR6 in G2, and hsa-miR-17 with CTSK in G3, which are among the most important genes in the signature. None of these associations, however, has been reported in the literature. On the other hand, studies have demonstrated gene regulation between hsa-miR-142-5p and CD24, hsa-miR-29 and DNMT3B [87, 88], hsa-miR-142-3p and EGR2, hsa-miR-150 and EGR2, hsa-miR-34a and IKZF3, hsa-miR-150 and MIAT, hsa-miR-342-3p and PSMG3 [93, 94], and hsa-miR-17 and TIMP3. Our results further suggested an important correlation between miRNA and gene expression values in both Basal I and Basal II, identified by this in silico approach. These and other correlations are, however, highly complex and not fully understood. Additional analyses using in vitro and in vivo models are required to validate our findings.
Genomic aberrations further characterise Basal II and Basal I subgroups
Basal-like and triple-negative tumours exhibit the highest frequencies of genomic gains and losses in comparison to other breast cancer subtypes. Significant aberrations observed in this study confirmed the genomic instability among basal-like tumours and further differentiated the two subgroups. The most common aberrations delineating Basal II, with respect to Basal I, occurred on chromosomes 1, 3, 4, 5, 8, 10, 17 and X.
Gains in 1q, 3q, 8q, 10p and 17q have been identified in our analysis and previously reported in triple-negative tumours [48, 49, 50]. Overall, gains on chromosome 1q are the most frequent CNAs detected in breast carcinomas and are normally complex and discontinuous [96, 97]. Amplicons of 1q, 8p and 10p have also been described. These amplicons have contributed to the molecular understanding of this disease and, especially, of the basal-like intrinsic subtype. For instance, amplifications in 8q21 have been associated with high tumour grade, high levels of Ki67 and other proliferation markers, including MYC, MDM2 and CCND1. Gains in 10p have further differentiated triple-negative cancers, and gains in 17q25 have distinguished BRCA1-mutated tumours.
Losses in 4q, 5q, 8p, Xp and Xq have been defined as key aberrations within basal-like tumours in our analysis and in other breast cancer studies [20, 49]. Frequent losses in 4q and 5q in BRCA1-mutated tumours have distinguished them from sporadic neoplasms. In particular, the loss in 5q has impacted the expression of several BRCA1-dependent genes involved in DNA repair, such as RAD17 and RAD51. High incidence rates of gains in 5q14 have also been associated with a poor prognosis in BLBCs. Other evidence suggests that aberrations on the X chromosome are common to both BRCA1-mutated and sporadic tumours.
Overall, these aberrations yielded an additional characterisation of Basal I and Basal II. The increase in PGA, or genome instability, from one subgroup to the other complemented the 80-probe signature derived from the transcriptomic assessment, which is still considered more representative of cellular processes at the proteomic scale. Although the identified CNAs did not show a direct correlation with the expression levels of the 80 probes, they may lead to widespread disruptions beyond the proposed signature. Ultimately, the gains and losses in cytobands described above, supported by a range of distinct approaches in the literature, further corroborate the differentiation of basal-like subgroups with divergent clinical features and survival outcomes.
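To make the PGA comparison concrete, the sketch below computes one common form of this measure, assuming that PGA denotes the percentage of the genome altered: the summed length of segments called as gained or lost, divided by the total length of all profiled segments. The segment format, the log-ratio threshold of 0.2 and the toy coordinates are hypothetical choices for illustration and are not the calling rules used in this study.

```python
def percent_genome_altered(segments, threshold=0.2):
    """Percentage of profiled bases lying in altered segments.

    segments: iterable of (chromosome, start, end, log_ratio) tuples.
    A segment counts as altered when |log_ratio| >= threshold
    (the threshold value is an illustrative assumption).
    """
    total_length = 0
    altered_length = 0
    for chromosome, start, end, log_ratio in segments:
        length = end - start
        if length <= 0:
            raise ValueError(f"invalid segment on chromosome {chromosome}")
        total_length += length
        if abs(log_ratio) >= threshold:
            altered_length += length
    if total_length == 0:
        raise ValueError("no segments supplied")
    return 100.0 * altered_length / total_length


# Toy copy number segments for a single hypothetical sample.
segments = [
    ("1", 0, 100, 0.05),    # neutral
    ("1", 100, 250, 0.60),  # gain
    ("5", 0, 150, -0.45),   # loss
    ("8", 0, 100, 0.00),    # neutral
]
pga = percent_genome_altered(segments)
```

Computed per sample, such a value can then be compared between subgroups; a higher distribution in one subgroup would reflect the greater genome instability described above.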
Consensus on the analysis of basal-like breast cancer subtypes: a literature overview
In this section, we establish a consensus on the description of basal-like subgroups (Basal I and Basal II) by comparing our results with those of other studies in the literature [10, 19, 20, 21, 31], according to the focus of each study. Notably, most of them have centred on the classification of triple-negative entities, a more heterogeneous group than basal-like. For instance, among the six intrinsic TNBC subtypes defined by Lehmann et al. (2011), three were considered relevant for comparison against the proposed basal-like subgroups: the basal-like (BL1 and BL2) and the immunomodulatory (IM) subtypes. These groups were described based on genes related to cell cycle regulation, DNA damage response and immunomodulation, respectively. These genes hint at the involvement of similar mechanisms in differentiating Basal I from Basal II, indicating that the two classifications are related. Genes (G1) with high node centrality values in Basal II, such as HJURP and TPX2, have been linked to aberrant proliferation networks, cell invasion and metastasis in breast cancer, in line with the definition of BL1. In addition, genes (G2) defining the Basal I subgroup, including CXCR6, HCST, C3AR1, GBP4, LY96, ANKRD22, FPR3 and FCGR2A, are associated with immune activation and inflammatory response, and are therefore closer to IM. Major regulations involving these genes support the existence of the two subgroups, even though the pools of samples (BLBCs and TNBCs) were considerably distinct.
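The node centrality values mentioned above refer to node degree and betweenness centrality in a gene network. As a minimal, hedged sketch of what these two measures capture, the code below computes both for a small undirected, unweighted graph using Brandes' algorithm in plain Python. It is not the implementation used in this study (the reference list cites the igraph package); the function name, node labels and edges are hypothetical.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Node degree and unnormalised betweenness for an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    betweenness = dict.fromkeys(adjacency, 0.0)

    for source in adjacency:
        # Breadth-first search counting shortest paths from the source.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = dict.fromkeys(adjacency, 0)
        distance = dict.fromkeys(adjacency, -1)
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Accumulate pair dependencies from the farthest nodes inwards.
        dependency = dict.fromkeys(adjacency, 0.0)
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                betweenness[w] += dependency[w]

    # Each unordered pair was counted from both endpoints.
    for node in betweenness:
        betweenness[node] /= 2
    return degree, betweenness


# Hypothetical toy network: a hub with three neighbours, one of which
# connects to a further node.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
degree, betweenness = degree_and_betweenness(edges)
```

In this toy network the hub has both the largest degree and the largest betweenness, which is the sense in which highly central genes are treated as key nodes of a signature network.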
In the recent classification of TNBCs performed by Burstein et al. (2014), two groups were described: the basal-like immune-activated (BLIA) and the basal-like immune-suppressed (BLIS) subtypes, corresponding to the best and worst prognosis, respectively. In BLIA, tumours display an over-expression of STAT signal transduction molecules and cytokines; in BLIS, they display high levels of the immunosuppressing molecule VTCN1. The mechanisms defining BLIA follow the characteristics of Basal I, and those defining BLIS follow Basal II. For example, Basal I and BLIA contain common genes and/or genes belonging to the same family, such as CXCL9/10/11/13, GBP4/5 and CD2/24. Similarly, Jézéquel et al. (2015) identified two relevant subtypes: basal-like with low immune response and high M2-like macrophages (C2), and basal-enriched with high immune response and low M2-like macrophages (C3). The defined basal-like and basal-enriched groups shared evident similarities with Basal II and Basal I, respectively, and corroborated our study in terms of probe signature and functionality. With regard to the TNBC classification, however, Lehmann et al. (2011), Burstein et al. (2014) and Jézéquel et al. (2015) only partially support each other.
An alternative approach to differentiating two subgroups of basal-like tumours, associated with either a low or a high risk of disease relapse, was tested by Hallett et al. (2012), using a 14-gene signature. Among the genes in that signature, RPL3 and GPR27 were listed as key markers of relapse, while their family members RPL36AL and GPR65 appeared among the 80 survival-related probes. Similarly, Sabatier et al. (2011) identified a 28-kinase metagene signature, associated with disease-free survival and immune response, which was used to divide the BLBCs into two groups: ‘Immune High’ and ‘Immune Low’. This approach revealed key genes, including IL2RG/B, GBP2, CCR5/7, CXCR3/5/6 and CXCL9/13, that are related to their family members in our signature, such as IL2RA, GBP4, CCR1, CXCR6 and CXCL11. These genes appeared over-expressed in ‘Immune High’ and in the Basal I subgroup, when compared to ‘Immune Low’ and Basal II.
Integrating these observations, there is a clear consensus on the segregation of basal-like breast cancers into at least two subgroups. Basal I (Immune Active) shows molecular overlaps and phenotypic similarities with BLIA, IM and C3; Basal II (High Proliferative) matches BLIS and C2. The comprehensive genomic and transcriptomic characterisation of the two subgroups provided in this study should lead to a better understanding of the mechanisms involved in basal-like tumours and to the identification of groups of patients with distinct disease outcomes, supported by additional survival features [10, 31]. The latter is crucial for improving clinical decision-making and for helping to tailor treatments that focus on manipulating the immune system and intervening in the cell cycle pathway. In general, tumours with an activated immune response have shown a favourable prognosis and are likely to respond to chemotherapy, whereas the highly proliferative ones have shown an increased risk of metastasis and recurrence. In this context, patients at low risk should follow more conservative therapies and those at high risk should receive more effective drugs to improve individual response, moving towards a more personalised medicine.
Studies have demonstrated that the heterogeneity of BLBCs extends beyond classic immunohistochemistry. Although several clinicopathological features have been used to discriminate between low- and high-risk patients, the identification of novel biomarkers with prognostic value remains an urgent need for improving breast cancer management. The 80-probe signature defined in this study, associated with varying survival outcomes, contains putative markers of disease progression and represents a promising asset for clinical applications. The integrated assessment of miRNA expression and CNA information ultimately contributes towards the definition of more comprehensive profiles of basal-like tumours. The importance of defining risk groups among BLBCs is reflected in the impact of survival-related features in clinical settings and, more importantly, in therapy response.
The authors acknowledge Dr Luke Mathieson and Mr Shannon Fenn for proofreading the manuscript.
PM is supported by Australian Research Council (ARC) Future Fellowship FT120100060. This project is partially funded by ARC Discovery Project DP120102576, Australia.
PM and RB also acknowledge the support of Cancer Institute of New South Wales, Big Data Big Impact Grant 13/DATA/1-03 “The integration of bioinformatics, chemoinformatics, and toxicogenomics methods: a new approach for the identification of combination tailored therapies and novel drug targets in breast cancer.”
HHM gratefully acknowledges the financial support from the Jennie Thomas Medical Research Travel Grant and the Hunter Medical Research Institute (NSW, Australia).
Availability of data and material
The METABRIC data sets are hosted by the European Bioinformatics Institute (EBI) and deposited in the European Genome-Phenome Archive (EGA) at http://www.ebi.ac.uk/ega/, under accession numbers EGAS00000000083 and EGAS00000000122. Information on data access can be downloaded from http://www.compbio.group.cam.ac.uk/publications/supplementarymaterial. For our application, the “Data Access Application Form” was submitted in December 2012, with a project following the rules and procedures established in the “Data Access Agreement” and the “Guidelines and Information”, respectively. Permission to download the microarray files was granted in February 2013.
The ROCK data set is publicly available at the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/), under accession number GSE47561. This data set integrates ten microarray data sets (GSE2034, GSE11121, GSE20194, GSE1456, GSE2603, GSE6532, GSE20437, GSE7390, GSE5847 and E-TABM-185) into a matrix containing log2 RMA gene expression information that is normalised, anonymised and encoded. No application is required.
HHM, IT, CR and PM participated in the study design and data analysis. HHM accomplished the major part of data interpretation. IT provided major contributions to the methodology design and data analysis. HHM and IT drafted the manuscript. The authors (HHM, IT, CR, RB and PM) contributed at all stages and critically reviewed the content.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
METABRIC: “This study makes use of data generated by the Molecular Taxonomy of Breast Cancer International Consortium. Funding for the project was provided by Cancer Research UK and the British Columbia Cancer Agency Branch.”
Primary invasive breast cancers and normal breast tissues were obtained with appropriate ethical consent from the relevant institutional review board. The study protocol, detailing the molecular profiling methodology, was approved by the ethics committees in Cambridge and Vancouver (Addenbrooke’s Hospital, Cambridge, United Kingdom; Guy’s Hospital, London; Nottingham; Vancouver; Manitoba), the two sites responsible for the molecular analysis of the samples (Curtis et al., 2012b). The data are protected and subject to applicable international laws, which include the UK Data Protection Act 1998, the Personal Information Protection and Electronic Documents Act (Canada) (“PIPEDA”), the Freedom of Information and Protection of Privacy Act, R.S.B.C. 1996 c. 165 (“FOIPPA”) and the Personal Information Protection Act, 2003, S.B.C., c. 63 (“PIPA”).
Further ethics approval was obtained by staff and students of the University of Newcastle from the University’s Human Research Ethics Committee (HREC). According to HREC, the project entitled “An investigation on the consensus between different genomic and transcriptomic results in breast cancer” ensures compliance with regulatory and legislative requirements and policies relating to human research. The use of this data set was approved by the committee under approval number H-2013-0277.
ROCK: This data set integrates ten different studies (GSE2034, GSE11121, GSE20194, GSE1456, GSE2603, GSE6532, GSE20437, GSE7390, GSE5847 and E-TABM-185), for which ethics approval is supported individually by each study’s authors.
- 9. Badve S, Dabbs DJ, Schnitt SJ, Baehner FL, Decker T, Eusebi V, Fox SB, Ichihara S, Jacquemier J, Lakhani SR, et al. Basal-like and triple-negative breast cancers: a critical review with an emphasis on the implications for pathologists and oncologists. Mod Pathol. 2011; 24(2):157–67.
- 20. Burstein MD, Tsimelzon A, Poage GM, Covington KR, Contreras A, Fuqua S, Savage M, Osborne CK, Hilsenbeck SG, Chang JC, et al. Comprehensive genomic analysis identifies novel subtypes and targets of triple-negative breast cancer. Clin Cancer Res. 2014; 21(7):1688–98.
- 21. Jézéquel P, Loussouarn D, Guérin-Charbonnel C, Campion L, Vanier A, Gouraud W, Lasla H, Guette C, Valo I, Verrièle V, Campone M. Gene-expression molecular subtyping of triple-negative breast cancer tumours: importance of immune response. Breast Cancer Res. 2015; 17(1):43.
- 24. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004; 351(27):2817–26.
- 31. Sabatier R, Finetti P, Mamessier E, Raynaud S, Cervera N, Lambaudie E, Jacquemier J, Viens P, Birnbaum D, Bertucci F, et al. Kinome expression profiling and prognosis of basal breast cancers. Mol Cancer. 2011; 10(86):24.
- 37. Therneau T. A Package for Survival Analysis in S. version 2.38. 2015. https://CRAN.R-project.org/package=survival.
- 42. Dunning M, Lynch A, Eldridge M. illuminaHumanv3.db: Illumina HumanHT12v3 annotation data (chip illuminaHumanv3). [R package version 1.22.1].
- 43. Cormen TH. Introduction to algorithms: The MIT press (3rd edition); 2009.
- 44. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006; Complex Systems:1695.
- 45. Favero F. RmiR.Hs.miRNA: Various databases of microRNA Targets. [R package version 1.0.7].
- 46. Carlson M. hgug4112a.db: Agilent “Human Genome, Whole” annotation data (chip hgug4112a). [R package version 3.1.3].
- 48. Loo LW, Wang Y, Flynn EM, Lund MJ, Bowles EJA, Buist DS, Liff JM, Flagg EW, Coates RJ, Eley JW, et al. Genome-wide copy number alterations in subtypes of invasive breast cancers in young white and African American women. Breast Cancer Res Treat. 2011; 127(1):297–308.
- 49. Weigman VJ, Chao HH, Shabalin AA, He X, Parker JS, Nordgard SH, Grushko T, Huo D, Nwachukwu C, Nobel A, et al. Basal-like breast cancer DNA copy number losses identify genes involved in genomic instability, response to therapy, and patient survival. Breast Cancer Res Treat. 2012; 133(3):865–80.
- 57. Darash-Yahana M, Gillespie JW, Hewitt SM, Chen Y-YK, Maeda S, Stein I, Singh SP, Bedolla RB, Peled A, Troyer DA, Pikarsky E, Karin M, Farber JM. The chemokine CXCL16 and its receptor, CXCR6, as markers and promoters of inflammation-associated cancers. PLoS ONE. 2009; 4(8):6695.
- 64. Caba O, Prados J, Ortiz R, Jiménez-Luna C, Melguizo C, Álvarez PJ, Delgado JR, Irigoyen A, Rojas I, Pérez-Florido J, et al. Transcriptional profiling of peripheral blood in pancreatic adenocarcinoma patients identifies diagnostic biomarkers. Dig Dis Sci. 2014; 59(11):2714–20.
- 69. Tanic M, Yanowski K, Gómez-López G, Socorro Rodriguez-Pinilla M, Marquez-Rodas I, Osorio A, Pisano DG, Martinez-Delgado B, Benítez J. MicroRNA expression signatures for the prediction of BRCA1/2 mutation-associated hereditary breast cancer in paraffin-embedded formalin-fixed breast tumors. Int J Cancer. 2015; 136(3):593–602.
- 86. Venkatesan N, Deepa PR, Khetan V, Krishnakumar S. Computational and in vitro investigation of miRNA-gene regulations in retinoblastoma pathogenesis: miRNA mimics strategy. Bioinform Biol Insights. 2015; 9:89.
- 93. Czimmerer Z, Varga T, Kiss M, Vázquez CO, Doan-Xuan QM, Rückerl D, Tattikota SG, Yan X, Nagy ZS, Daniel B, et al. The IL-4/STAT6 signaling axis establishes a conserved microRNA signature in human and mouse macrophages regulating cell survival via miR-342-3p. Genome Med. 2016; 8(1):1.
- 97. Mesquita B, Lopes P, Rodrigues A, Pereira D, Afonso M, Leal C, Henrique R, Lind G, Jerónimo C, Lothe R, Teixeira M. Frequent copy number gains at 1q21 and 1q32 are associated with overexpression of the ETS transcription factors ETV3 and ELF3 in breast cancer irrespective of molecular subtypes. Breast Cancer Res Treat. 2013; 138(1):37–45.
- 98. Vincent-Salomon A, Gruel N, Lucchesi C, MacGrogan G, Dendale R, Sigal-Zafrani B, Longy M, Raynal V, Pierron G, de Mascarel I, Taris C, Stoppa-Lyonnet D, Pierga JY, Salmon R, Sastre-Garau X, Fourquet A, Delattre O, de Cremoux P, Aurias A. Identification of typical medullary breast carcinoma as a genomic sub-group of basal-like carcinomas, a heterogeneous new molecular entity. Breast Cancer Res. 2007; 9(2):24.
- 100. Toffoli S, Bar I, Abdel-Sater F, Delree P, Hilbert P, Cavallin F, Moreau F, Van Criekinge W, Lacroix-Triki M, Campone M, Martin AL, Roche H, Machiels JP, Carrasco J, Canon JL. Identification by array comparative genomic hybridization of a new amplicon on chromosome 17q highly recurrent in BRCA1 mutated triple negative breast cancer. Breast Cancer Res. 2014; 16(6):466.
- 101. Johannsdottir HK, Jonsson G, Johannesdottir G, Agnarsson BA, Eerola H, Arason A, Heikkila P, Egilsson V, Olsson H, Johannsson OT, et al. Chromosome 5 imbalance mapping in breast tumors from BRCA1 and BRCA2 mutation carriers and sporadic breast tumors. Int J Cancer. 2006; 119(5):1052–60.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.