Introduction

Previous gene expression profiling studies of human breast tumors have shaped our understanding that breast cancer is not one disease, but is in fact many biologically separate diseases. A classification of tumors by expression profiling into five distinct groups (Luminal A, Luminal B, HER2-enriched, Basal-like, and Claudin-low subtypes) has added prognostic and predictive value to the existing repertoire of biomarkers for breast cancer [1–6]. For many cancers, improper maintenance of genome stability is a major cause of tumorigenesis and thus, the characterization of the tumor genomic DNA landscape is an important avenue of investigation [7]. Array comparative genome hybridization (aCGH) studies of tumor copy number states have demonstrated that tumors with similar gene expression subtypes may also share similar DNA copy number aberrations (CNA) [8–12] and that CNA can be used to further sub-divide expression classes [12]. In breast cancers, genomic instability-driven tumorigenesis is most prevalent in the Basal-like subtype (also referred to as triple-negative breast cancers), where the majority of tumors exhibit many CNA [9–13]. Identifying the genes that contribute to this instability phenotype would be useful not only from a biological perspective, but also possibly as a clinical predictor of therapeutic response.

Methods

A detailed description of all methods is provided in the “Supplemental Methods” section, while here we provide an abbreviated methods section for the major new approaches.

Breast cancer patient datasets

For the genomic studies, three patient datasets were used, each containing gene expression and DNA copy number microarray data. We combined two sets into a single training set (n = 180 with expression and copy number) to increase the statistical power to detect subtype-specific CNA. The combined training set included breast tumors from the United States (“UNC”) (n = 77) and tumors from Norway (“NW”) (n = 103). The third data set (“Jonsson”) was used as a validation/testing set (n = 359) [14]. All samples were collected using IRB-approved protocols. Data are available from Gene Expression Omnibus series GSE10893. Sample information including clinical data, subtype, source, GEO Sample ID, and overlap with copy number information can be found in Supplemental Table 1.

Assessment of tumor genomic DNA copy number changes

For the 77 UNC and 103 NW samples, normal and tumor DNA samples were each assayed using the Infinium Human-1 109K BeadChip (Illumina, San Diego, CA, USA). Sample information is provided in Supplemental Table 1 and LogR (A+B signal) values can be found on GEO series GSE10893, platform GPL8139. To determine regions of copy number aberration (CNA), we developed a new analysis method that is a modification of the SupWald method [15, 16]; we created an R suite of functions called “SWITCHdna”, which can identify breakpoints in aCGH data. SWITCHdna detects transition points that maximize the F statistic and have regions on either side of the breakpoint that are larger than the user-defined range. Following detection of the transition points, a segment average value and corresponding z-score are determined, along with the number of observations used. The end result is the identification of segments of CNA, along with a quantitative value for that copy number change (i.e., loss or gain).
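The breakpoint search described above can be illustrated with a minimal Python sketch. It is not the SWITCHdna code (which is distributed as an R script) and it covers only the single-breakpoint case: every candidate split that leaves at least `min_len` probes on each side is scored with an F statistic comparing a two-mean model against a one-mean model, and the split with the largest F is returned. The function name `best_breakpoint`, the `min_len` parameter, and the toy LogR values are assumptions made for illustration.

```python
from statistics import fmean


def best_breakpoint(values, min_len=3):
    """Return (k, F) for the split values[:k] | values[k:] with the largest F.

    Both sides of the split must hold at least `min_len` observations.
    Returns (None, 0.0) when no valid split exists or the data are constant.
    """
    n = len(values)
    grand_mean = fmean(values)
    sst = sum((v - grand_mean) ** 2 for v in values)  # total sum of squares
    if sst == 0:
        return None, 0.0
    best_k, best_f = None, 0.0
    for k in range(min_len, n - min_len + 1):
        left, right = values[:k], values[k:]
        m_left, m_right = fmean(left), fmean(right)
        # Residual sum of squares when each side gets its own mean.
        sse = (sum((v - m_left) ** 2 for v in left)
               + sum((v - m_right) ** 2 for v in right))
        # F statistic for one extra mean parameter, n - 2 residual df.
        f_stat = float("inf") if sse == 0 else (sst - sse) / (sse / (n - 2))
        if f_stat > best_f:
            best_k, best_f = k, f_stat
    return best_k, best_f


# Hypothetical LogR profile: five probes near 0, then five probes near -0.6.
logr = [0.0, 0.1, -0.1, 0.05, -0.05, -0.6, -0.5, -0.7, -0.55, -0.65]
k, f_stat = best_breakpoint(logr, min_len=3)
left_mean, right_mean = fmean(logr[:k]), fmean(logr[k:])
```

In this toy profile the selected split separates the first five probes from the last five, and the two segment averages summarize the copy number level on each side. A full segmentation would repeat the search within each resulting segment and then apply a significance filter, neither of which is shown here.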

A significance filter is applied to the raw SWITCHdna-identified segments in order to reduce noise and increase the probability of identifying biologically relevant regions. All subsequent plots and tables were produced after applying this significance filter to our data. SWITCHdna is provided as a source script in R [17] and available for download at: https://genome.unc.edu/pubsup/SWITCHdna/.

Determining subtype-specific CNAs

Using the cnaGENE function of SWITCHdna, the segment output file was converted into an indicator matrix, where for each sample, each gene’s copy state was represented as −1 = loss, 0 = no change, 1 = gain. For each subtype, the counts of gains and losses were compared versus all other samples in order to identify subtype-specific CNAs. A Fisher’s exact test was performed on the subtype versus rest counts for each gene. The resulting P values were adjusted by the Benjamini-Hochberg method [18] to correct for multiple-hypothesis testing, and genes with adjusted P values <0.05 were then gathered for each subtype. Regions within the cytobands of localized CNA were determined by the significant genes found within each cytoband (Supplemental Table 2).
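The subtype-versus-rest test can be sketched in Python as follows. This is an illustrative sketch rather than the cnaGENE implementation: it handles losses only, and it assumes a 2×2 table per gene with rows subtype/rest and columns lost/not lost, which is one plausible reading of the description above. The function names and the toy data (`GENE_A`, `GENE_B`) are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def subtype_specific_losses(states, subtypes, subtype, alpha=0.05):
    """states: gene -> list of -1/0/1 per sample; subtypes: label per sample."""
    genes = list(states)
    pvals = []
    for gene in genes:
        pairs = list(zip(states[gene], subtypes))
        a = sum(1 for s, lab in pairs if lab == subtype and s == -1)
        b = sum(1 for s, lab in pairs if lab == subtype and s != -1)
        c = sum(1 for s, lab in pairs if lab != subtype and s == -1)
        d = sum(1 for s, lab in pairs if lab != subtype and s != -1)
        pvals.append(fisher_exact_two_sided(a, b, c, d))
    adjusted = benjamini_hochberg(pvals)
    return {g: q for g, q in zip(genes, adjusted) if q < alpha}


# Hypothetical toy data: 6 Basal-like and 6 other samples, two genes.
subtypes = ["Basal-like"] * 6 + ["Other"] * 6
states = {
    "GENE_A": [-1] * 6 + [0] * 6,                       # lost only in Basal-like
    "GENE_B": [0, -1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0],   # lost equally often
}
hits = subtype_specific_losses(states, subtypes, "Basal-like")
```

Gains could be tested the same way by counting states equal to 1 instead of −1.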

Supplemental methods

Numerous additional methods and more detail on SWITCHdna are provided in the Supplemental Methods section. These methods include details on the cell lines used, RNAi knockdown experiments, and other cell biology experiments performed here.

Results

Identifying subtype-specific regions of copy number aberration

To identify CNA that might be causative of Basal-like breast cancers, we assembled a dataset of 180 tumors with Agilent gene expression microarrays and Illumina 109,000 SNP marker DNA copy number microarrays (UNC-NW). We classified each tumor into one of five previously defined expression subtypes using the published intrinsic subtypes (i.e., PAM50) and Claudin-low subtype predictors [5, 6]. To identify regions of copy number gain/loss, we developed a new segmenting method called “SWITCHdna” (Sup Wald Identification of copy CHanges in dna). Specifics of the SWITCHdna method can be found in the “Supplemental Methods” and at https://genome.unc.edu/pubsup/SWITCHdna/.

SWITCHdna identified regions/segments of copy number gains and losses in each tumor, which were then aggregated by subtype to examine the frequency of each copy number event in each subtype and to identify regions specific to each subtype (Fig. 1; Supplemental Table 2). A heat map display of the copy number data is provided in Supplemental Fig. 1. A number of new findings were observed, including the first aCGH characterization of the Claudin-low subtype (Fig. 1b). Despite their high grade and similarity to Basal-like tumors [5, 6], Claudin-low tumors showed few copy number changes, which may correspond to the previously described ER-negative and copy number neutral tumor subtype reported in Chin et al. [19]. In addition, human Claudin-low cell lines, which are often called “Basal B” lines, also have a similarly flat copy number profile showing very few chromosomal abnormalities [20].

Fig. 1

Copy number frequency plots from SWITCHdna show regions of aberrations shared by members of the same subtype. Gray shading indicates regions of change with the y-axis representing frequency of aberration at each site within each subtype. Regions in black were statistically associated with a particular subtype and remained significant after Benjamini-Hochberg correction. Regions below the center (negative values) represent losses, and areas above the center (positive values) indicate gains. a Basal-like, b Claudin-low, c HER2-enriched, d Luminal A, e Luminal B, and f Normal-like. g Expanded view of the Basal-like copy number landscape. INPP4B, MAP3K8, FAM107B, and ZEB1, each in Basal-like specific regions of CNA, are marked. Also noted are BRCA1, BRCA2, PTEN, RB1, and TP53, genes/regions that were frequently, but not specifically, lost in the Basal-like subtype, and KRAS, which was frequently, but not specifically, gained in the Basal-like subtype. The dashed horizontal lines indicate 50% gain or loss. h Enlarged view of the Basal-like chromosome 5q region showing the location of RAD17, MSH3, RAD50, and RAP80. Loss frequency is indicated on the y-axis and the level of 50% loss is highlighted by the horizontal line

We next searched for CNA occurring specifically within each subtype (Fig. 1a–f, black shading). The Basal-like subtype had the most subtype-specific events (Fig. 1a, g) including the previously described amplicon at 10p containing MAP3K8, ZEB1, and FAM107B [13, 21, 22], 16q loss [23], deletion of 5q11–35 [10], and deletion of 4q. This last region contains INPP4B, which has recently been identified as a potential tumor suppressor involved in the inhibition of PI3K signaling [24] and that is selectively lost in Basal-like/Triple-negative breast cancers [25].

Basal-like tumors have previously been observed to have copy number loss and/or low expression of genes involved in BRCA1 DNA damage repair [26], and we noted that loss of 5q11–5q35 would delete several genes involved in BRCA1-dependent DNA repair including RAD17, RAD50 [27], and RAP80 (Fig. 1h). Closer examination of the pattern of loss of these genes revealed that each gene was rarely lost as an individual event, but predominantly lost as a pair or triplet (Table 1a). These doublet or triplet losses occurred at the highest rates in the Basal-like subtype, but also occurred less frequently in the HER2-enriched subtype. These paired or triplet losses were not simply due to loss of the entire chromosomal arm, as >65% of the analyzed tumors did not show a loss pattern indicative of such an event and several samples had intervening regions of normal copy number. Loss of 5q11–35 was also found to statistically co-occur with CNA of other regions including 10p amplification (~50%), INPP4B/4q31.21 loss (~40%), PTEN/10q23.31 loss (~40%), BRCA1/17q21 loss (~50%), and most frequently loss of RB1/13q14.2 (~80%) (Table 1e), which are genes/regions that have all previously been shown to be associated with Basal-like breast cancers.
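As a rough illustration of how single, paired, and triplet losses and co-occurrence percentages can be tallied from a per-gene indicator matrix (−1 = loss, 0 = no change, 1 = gain), consider the Python sketch below. The helper names and all toy values are hypothetical and do not reproduce the numbers in Table 1.

```python
from collections import Counter


def loss_multiplicity(states, genes):
    """Count samples by how many of `genes` are lost (state == -1)."""
    n_samples = len(states[genes[0]])
    return Counter(
        sum(states[g][s] == -1 for g in genes) for s in range(n_samples)
    )


def co_occurrence_rate(index_event, other_event):
    """Fraction of samples carrying the index event that also carry the other."""
    with_index = [other for idx, other in zip(index_event, other_event) if idx]
    if not with_index:
        return float("nan")
    return sum(with_index) / len(with_index)


# Hypothetical copy states for six tumors.
genes = ("RAD17", "RAD50", "RAP80")
states = {
    "RAD17": [-1, -1, 0, -1, 0, 0],
    "RAD50": [-1, -1, 0, 0, 0, 0],
    "RAP80": [-1, 0, 0, 0, 0, 0],
}
multiplicity = loss_multiplicity(states, genes)

# Treat loss of any of the three genes as the index event (an assumption).
any_5q_loss = [any(states[g][s] == -1 for g in genes) for s in range(6)]
rb1_loss = [True, True, False, False, True, False]  # hypothetical
rate = co_occurrence_rate(any_5q_loss, rb1_loss)
```

Here `multiplicity[3]`, `multiplicity[2]`, and `multiplicity[1]` give the number of tumors with triplet, paired, and single losses, and `rate` is the share of tumors with the index loss that also carry the second event.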

Table 1 Frequency of copy number alterations data for the UNC-Norway combined dataset for selected (a) deletions, (b) amplifications, (c) average number of changes, (d) % Tumor Cellularity, and (e) co-occurrences

In order to validate these subtype-specific findings observed in the UNC+NW dataset, we classified the samples in Jonsson et al. [14] according to PAM50 and Claudin-low subtype predictors and performed similar supervised analyses using their BAC-based DNA copy number data; very similar associations between CNA and subtypes were observed (Table 2). Jonsson et al. identified six unique tumor subtypes based upon CNA landscapes, which we determined were highly correlated with our expression-defined intrinsic subtypes (P value <0.001, Table 3); importantly, there was high overlap between our Basal-like subtype and their Basal-complex phenotype, both of which showed the frequent loss of 5q11–35 and amplification of 10p.

Table 2 Frequency of copy number alterations data for the Jonsson dataset [14] for selected (a) deletions, (b) amplifications, (c) average number of changes, and (d) co-occurrences
Table 3 Comparison of Jonsson et al. copy number based classifications versus intrinsic subtypes

Increased genomic instability of tumors associated with loss of specific regions/genes

To objectively assess “genomic instability”, we calculated a loss/normal/gain value for every gene using the SWITCHdna assigned copy number states, and calculated the levels of genomic instability by subtype using the average number of gains/losses per sample on a gene by gene basis. The Basal-like subtype was the most prone to aberrations, while the Claudin-low and Luminal A subtypes showed the lowest number of gene-based CNA (Table 1c). To control for a large number of genes being gained or lost by a single large genomic aberration event (e.g., whole chromosome loss), we also calculated the average number of SWITCHdna-defined segments and their length for each subtype, as more genomic breaks will result in more segments. The subtypes that had greater numbers of gene aberrations were also the ones that had more SWITCHdna segments of shorter average length (Table 1c). Thus, the increased number of aberrant gene-based events in the copy number unstable subtypes was due to more frequent aberrations in the genome, rather than to a large number of genes being gained or lost in a few large aberration events.
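The two instability summaries described above (gene-based counts and segment-based counts) can be sketched in Python as follows. The data layout, the function name `instability_by_subtype`, and the toy values are assumptions for illustration; in particular, averaging each sample's mean segment length before averaging across samples is one possible choice, and each sample is assumed to have at least one segment.

```python
from collections import defaultdict
from statistics import fmean


def instability_by_subtype(gene_states, segments, subtypes):
    """Summarize copy number instability per subtype.

    gene_states: sample -> list of -1/0/1 gene copy states
    segments:    sample -> list of (start, end) segment coordinates
    subtypes:    sample -> subtype label
    """
    rows_by_subtype = defaultdict(list)
    for sample, subtype in subtypes.items():
        n_altered = sum(1 for state in gene_states[sample] if state != 0)
        lengths = [end - start for start, end in segments[sample]]
        rows_by_subtype[subtype].append(
            (n_altered, len(lengths), fmean(lengths))
        )
    return {
        subtype: {
            "mean_altered_genes": fmean([r[0] for r in rows]),
            "mean_segments": fmean([r[1] for r in rows]),
            "mean_segment_length": fmean([r[2] for r in rows]),
        }
        for subtype, rows in rows_by_subtype.items()
    }


# Hypothetical toy input: three tumors, four genes, one 100-unit chromosome.
subtypes = {"T1": "Basal-like", "T2": "Basal-like", "T3": "Luminal A"}
gene_states = {
    "T1": [-1, -1, 1, 0],
    "T2": [1, 0, -1, 0],
    "T3": [0, 0, 0, 1],
}
segments = {
    "T1": [(0, 10), (10, 30), (30, 40), (40, 100)],
    "T2": [(0, 50), (50, 100)],
    "T3": [(0, 100)],
}
summary = instability_by_subtype(gene_states, segments, subtypes)
```

A subtype with many altered genes, many segments, and short segments matches the pattern of frequent breaks, whereas many altered genes with few long segments would point to a small number of large events.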

Tumors with loss of PTEN/10q23.31, RB1/13q14.2, or TP53/17p13.1, or amplification of the 10p region were also found to have high rates of total gene-based CNA compared to tumors without loss of these genes (Table 4). Loss of 5q11–35 was also associated with the highest numbers of CNA, with the greatest instability seen when all three DNA repair genes were lost.

Table 4 Examination of possible correlations between the specific CNA and overall genomic instability

Low expression of genes residing in Basal-like regions correlates with poor survival and predicts therapeutic response

To determine if these DNA loss events also impacted gene function, we determined whether the mRNA levels of candidate genes contained within these regions correlated with DNA loss. The expression of ten genes selected based on their associations with the Basal-like subtype, or breast cancer in general, was evaluated. Most showed significantly lower mRNA expression when the genomic DNA was lost, including RAD17, RAD50, RAP80, MSH3, RB1, PTEN, BRCA1, and INPP4B (Fig. 2); these data suggest that these losses have functional consequences (noting that only TP53 and BRCA2 did not show in cis correlation between expression and copy number). It is also of note that MSH3 (a gene involved in DNA mismatch repair), which is located within the 5q11–35 loss region (between RAD17 and RAD50, Fig. 1h), also showed reduced mRNA expression when lost and low expression within Basal-like tumors in general (Figs. 2, 3e). In addition, the mRNA expression levels of RAD17, RAD50, MSH3, RAP80, INPP4B, and PTEN were lowest in the Basal-like subtype (Fig. 3, UNC337 expression dataset [5]); thus loss of 5q11–35 likely affects multiple aspects of DNA repair.

Fig. 2

Gene expression values for RAD17, RAD50, RAP80, MSH3, BRCA1, BRCA2, PTEN, RB1, TP53, and INPP4B in the UNC-Norway dataset (n = 180) separated by copy number status (DNA copy number loss vs no loss). P values determined by ANOVA test

Fig. 3

ANOVA boxplots for individual genes that are commonly lost in Basal-like cancers according to intrinsic subtype determined using the UNC337 sample set. P values were determined by 2-way ANOVA. a RAD17, b RAD50, c RAP80, d PTEN, e MSH3, and f INPP4B

Using patient survival data from two additional data sets containing gene expression data (UNC337 [5] and NKI295 [28]), Kaplan–Meier analysis showed that the low average expression of RAD17+RAD50 was associated with worse outcomes compared to high expression (Fig. 4a). A similar trend was observed with INPP4B, mirroring previous observations (Fig. 4b) [24]. RAD17+RAD50 expression was also examined for treatment effects using the Hess et al. [29] data set, which examined T/FAC neoadjuvant chemotherapy responsiveness across 130 breast cancer patients. Low expression of RAD17+RAD50 was correlated with pathological complete response (pCR) (ANOVA P value <0.0001). This finding may be due to the association between low expression of RAD17+RAD50 and Basal-like tumors, as Basal-like tumors have also been shown to have high neoadjuvant chemotherapy pCR rates [30, 31].

Fig. 4

Survival analysis according to expression of RAD17+RAD50 and INPP4B. Patients in the UNC337 and NKI295 data sets were rank ordered by average gene expression values of a RAD17+RAD50 combined, or b INPP4B. The patients were split into thirds based upon rank order expression values and Kaplan–Meier analysis was done on the three groups to examine trends in relapse-free survival and overall survival. P values determined by log-rank test
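The grouping and survival estimation used for this figure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: patients are ranked by the average of two expression values, split into thirds by rank, and a product-limit (Kaplan–Meier) curve is computed for a set of follow-up times. The function names and toy values are hypothetical, and the log-rank test is not implemented here.

```python
def split_into_thirds(scores):
    """Label each patient 'low', 'mid', or 'high' by rank of `scores`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = ("low", "mid", "high")[rank * 3 // n]
    return labels


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient
    events: 1 if the event (e.g., relapse) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


# Hypothetical expression values for six patients.
rad17 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rad50 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [(a + b) / 2 for a, b in zip(rad17, rad50)]
groups = split_into_thirds(scores)

# Hypothetical follow-up for one group: times in years, 0 = censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In practice one curve would be computed per expression group and the groups compared with a log-rank test.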

Knockdown of RAD17±RAD50 affects sensitivity to chemotherapeutics and BRCA1 foci formation

Given the involvement of RAD17, RAD50, and RAP80 in the BRCA1-DNA repair pathway, we determined whether disruption of these genes via RNAi knockdown would lead to changes in sensitivity to drugs whose mechanism of action has already been linked to BRCA1 loss, such as carboplatin/cisplatin [32, 33] and PARP inhibitors [34, 35]. RAD17 was stably knocked down with shRNA in the HME-CC cell line (an hTERT-immortalized Human Mammary Epithelial Cell) [36] and knockdown was confirmed by Western blotting (Fig. 5a). HME-CC cells with RAD17 knockdown exhibited increased sensitivity to ABT-888 (PARPi) and carboplatin (Fig. 5c). Sensitivity to paclitaxel, which was used as a non-DNA-damaging agent control, did not differ. A RAD50 knockdown line did not exhibit any change in sensitivity to ABT-888 and had a paradoxical increase in resistance to carboplatin. We next emulated the most common in vivo co-occurring loss by generating a double knockdown of RAD17 and RAD50, which showed the greatest increase in sensitivity to ABT-888 and carboplatin (Fig. 5c). Similar results were observed when this experiment was repeated in ME16C cells, a second hTERT-immortalized human mammary epithelial cell line (Supplemental Fig. 2).

Fig. 5

RNAi knockdown experiments in an immortalized HMEC (BABE cell line). Western blot analysis showing reduction of RAD17 and RAD50 protein expression in HME-CC a single, or b double RNAi knockdown lines. (KD knockdown line, C vector control line). Tubulin staining was performed as a loading control. c Estimated IC50 with 95% CI for ABT-888, Carboplatin, and Paclitaxel based on mitochondrial dye-conversion assay. Results are based on the average of two experiments per condition, each done in triplicate. Knockdown-control pairs with significant differences in IC50 are designated with a *

In order to assess the effects of RAD17/RAD50 loss on BRCA1-dependent DNA repair, we performed a DNA repair foci formation assay on the control and RAD17+RAD50 double knockdown line. Using anti-BRCA1 protein immunofluorescence, and automated foci counting within geminin-positive cells, we observed a significant decrease in the number of BRCA1-containing DNA repair foci in the double knockdown line when treated with ionizing radiation or ABT888 versus control (Fig. 6); cells were simultaneously stained for geminin in order to control for differences in proliferation as described by Graeser et al. [37, 38]. These data suggest that loss of RAD17 and/or RAD50 may impair BRCA1 function, and could contribute to increased sensitivity to DNA-damaging agents.

Fig. 6

BRCA1-mediated DNA repair foci formation assay. a Representative images of BRCA1 foci formation in RAD17-RAD50 double knockdown cells and control cells after treatment with 2.5 Gy of ionizing irradiation and 20 min recovery (ionizing radiation), or no treatment (untreated). b Representative images of BRCA1 foci formation in RAD17-RAD50 double knockdown cells and control cells with 200 μM ABT-888 (ABT-888), or no treatment (untreated). Green channel BRCA1, Red channel Geminin, Blue channel DAPI images. All images were taken with a 63× objective and post processed to 300% of their original size. Automated BRCA1 foci counting results from each cell line for c ionizing radiation and d ABT-888 treatment. Error bars represent 95% confidence intervals (*P < 0.05 of knockdown relative to control). P values were calculated from t tests comparing foci counts in treated double knockdown cells versus treated control cells or untreated double knockdown cells versus untreated control cells

Discussion

The presence of distinct breast cancer expression subtypes suggests different underlying genetic events may be driving each subtype. To address this hypothesis, we used 180 diverse tumors and performed supervised analyses of their tumor DNA copy number landscape and identified subtype-specific copy number events. Many studies have identified numerous regions of gain and loss in human breast tumors [9, 10, 14, 23, 39]; however, most did not specifically search for regions uniquely associated with specific intrinsic subtypes. Some previous attempts were made to identify basal-like specific CNA [10, 22] and we observed a number of the same findings. We take these previous findings as validation of our identified regions, and we build and expand upon these here, along with the addition of functional studies.

Overall, we identified many subtype-specific CNA and validated these findings on a second, independent dataset. Here we have focused on the Basal-like subtype, which showed by far the greatest number of subtype-specific CNA and was the most genomically unstable as determined by the sheer number of CNA, a feature that has been observed in the past [9]. Basal-like tumors also showed consistent loss of 4q (which harbors INPP4B and FBXW7), and 5q11–35, which contains many DNA repair genes. Basal-like tumors are known to be associated with BRCA1-pathway dysfunction in that 80–90% of BRCA1 mutation carriers, if and when they develop breast cancer, develop Basal-like tumors [3, 40, 41]; however, in most sporadic Basal-like tumors, the BRCA1 gene appears normal in sequence [42]. The loss of 5q11–35 may provide an alternative means to impair BRCA1-pathway function and may explain why, despite many Basal-like patients having normal BRCA1 gene/protein, high levels of genomic instability and a “BRCAness” phenotype are observed in Basal-like tumors. Previous evidence indicates a link between genes involved in BRCA1 DNA damage control and genes that are deleted and downregulated in Basal-like cancers, lending further credence to our hypothesis [26].

In order to expand our understanding of the relationship between the Basal-like subtype and impaired BRCA1-pathway function, we pursued functional studies by RNAi-mediated knockdown of two members of the pathway, RAD17 and RAD50, in order to emulate the genomic losses observed in tumors. Besides being members of the BRCA1-pathway, others have highlighted these genes for their possible Basal-like association, but without functional studies [10, 27]. We show here that genetic ablation of these genes results in impaired DNA repair and increased drug sensitivity, and furthermore, deletion of RAD17 and RAD50 in yeast has also been shown to result in increased sensitivity to DNA-damaging agents including platinum drugs (http://fitdb.stanford.edu) [43]; these data highlight that there is an evolutionarily conserved role for these genes in DNA repair.

By building upon the discovery of the subtype association and the deletion phenotypes in yeast, we propose a role in DNA repair function for the 5q11–35 region. The drug sensitivity assays show the importance of these genes in DNA damage sensitivity and the foci formation experiments show that their function is mediated through BRCA1. In addition, from the combination of our genomic analyses and functional data, it is our hypothesis that the somatic loss of RAD17, RAD50, and/or RAP80 leads to impaired BRCA1-pathway function, impaired homologous recombination mediated DNA repair, and thus, contributes to overall genomic instability.

There are, however, two caveats to these analyses and our hypothesis. First, the 5q11–35 loss is a large region that typically involves >100 genes; therefore, we cannot definitively say that these three genes are the target of this deletion, or that they are the most important targeted genes of this region. Second, 5q11–35 loss co-occurs at high frequency with other chromosomal losses; for example, in ~80% of tumors with 5q11–35 loss, RB1/13q14.2 DNA loss also occurs (and by itself is associated with increased genomic instability). In addition, ~60% of these tumors show TP53/17p13.1 loss (Tables 1, 2). The co-occurring losses of 5q11–35, RB1, and TP53 are likely causative events in Basal-like carcinogenesis (the latter two being corroborated by mouse studies) [44–46]. Given the high co-occurrence of chromosome region losses that are not physically linked, it is impossible to say which one is the cause of the genomic instability. However, our hypothesis is that each of these regions harbors genes needed for maintenance of the genome and that the combinatorial loss of 2–3 of these regions is what results in the genomic instability phenotype seen in Basal-like breast cancers. In this article, we examine DNA losses, but do note that it is possible that loss of these same genes could also occur via methylation, altered microRNA regulation, and/or somatic mutation, although the last of these has yet to be found when searching current somatic mutation databases for RAD17/RAD50/RAP80.
Preliminary sequence analysis of RAD17 and RAD50 (data not shown), as well as evaluation of previous breast cancer sequencing efforts [47] and the COSMIC database [48], revealed few, if any, somatic variants/mutations in these two genes, which is consistent with the finding that loss of any one gene is rarely seen; thus, if loss of two or more genes is the target of this CNA, then somatic mutation of any one gene would not impart a selective tumorigenic advantage. Therefore, these data suggest that the target of 5q11–35 loss is two or more genes in this region, with loss of RAD17 and RAD50 likely contributing to genomic instability.

Conclusions

The gene expression-defined intrinsic subtypes of breast cancer are mirrored by DNA copy number changes. The Basal-like subtype has the most distinct copy number landscape, and these subtype-associated CNA have clinical implications. If 5q11–35 loss results in impaired homologous recombination mediated DNA repair, as was suggested by our in vitro studies and in vivo correlates, then the loss of this region may sensitize tumors to specific classes of DNA-damaging agents. Based upon BRCA1 studies in vitro [49, 50] and in vivo [32, 34], these drugs could include PARP inhibitors and cis/carboplatin. Loss of RAD17+RAD50 (mRNA and/or genomic DNA) may thus be a biomarker of chemotherapy responsiveness, which is supported by our finding that low RAD17+RAD50 expression was associated with pathological complete response. We hypothesize that the loss of these DNA repair genes and the 5q11–35 region contributes to genomic instability and mutability, ultimately causing high proliferation rates and aggressive behaviors. Our integrated studies of gene expression and genomic DNA copy number have identified important pathway-based determinants of Basal-like cancers and a possible therapeutic biomarker.

All relevant gene expression and copy number data new to this manuscript can be found in the GEO database under series GSE10893.