Cross-species DNA copy number analyses identifies multiple 1q21-q23 subtype-specific driver genes for breast cancer
A large number of DNA copy number alterations (CNAs) exist in human breast cancers, and thus characterizing the most frequent CNAs is key to advancing therapeutics because it is likely that these regions contain breast tumor ‘drivers’ (i.e., cancer causal genes). This study aims to characterize the genomic landscape of breast cancer CNAs and identify potential subtype-specific drivers using a large set of human breast tumors and genetically engineered mouse (GEM) mammary tumors. Using a novel method called SWITCHplus, we identified subtype-specific DNA CNAs occurring at a 15 % or greater frequency, which excluded many well-known breast cancer-related drivers such as amplification of ERBB2, and deletions of TP53 and RB1. A comparison of CNAs between mouse and human breast tumors identified regions with shared subtype-specific CNAs. Additional criteria that included gene expression-to-copy number correlation, a DawnRank network analysis, and RNA interference functional studies highlighted candidate driver genes that fulfilled these multiple criteria. Numerous regions of shared CNAs were observed between human breast tumors and GEM mammary tumor models that shared similar gene expression features. Specifically, we identified chromosome 1q21-23 as a Basal-like subtype-enriched region with multiple potential driver genes including PI4KB, SHC1, and NCSTN. This step-wise computational approach based on a cross-species comparison is applicable to any tumor type for which sufficient human and model system DNA copy number data exist, and in this instance, highlights that a single region of amplification may in fact harbor multiple driver genes.
Keywords: Copy number alterations; Intrinsic subtypes; Driver genes; Gene expression; Genetically engineered mouse; Network analysis
Breast cancer is a heterogeneous disease that is characterized by distinct histological forms, genetic alterations, and patient outcomes [1, 2, 3, 4, 5, 6]. Consistent with these observations, differential gene expression can distinguish molecular subtypes that separate breast cancer into distinct groups including Basal-like, Claudin-low, HER2-enriched, Luminal A, and Luminal B subtypes [2, 3, 4, 7, 8, 9]. These so-called “intrinsic subtypes” are predictive of relapse-free survival, overall survival, and responsiveness to treatment [7, 8, 9, 10, 11]. Previous work highlighted numerous somatic mutations and DNA copy number alterations (CNAs) that are linked to specific intrinsic subtypes, suggesting that these genetic events may be causative of these subtypes. Beyond a few well-known drivers, the genetic drivers present in many of these recurrent regions of DNA copy number change remain to be identified. Specifically, numerous CNAs are located on chromosome 1 and occur at high frequency among various cancer types including breast and liver [12, 14]. In breast cancer, copy number loss frequently occurs at 1p while copy number gains are frequent at 1q. Furthermore, copy number gains at 1q often encompass the majority of the 1q arm, which includes hundreds of genes.
To identify additional genetic drivers of breast cancer in common regions of amplification, we have taken a cross-species conservation approach based on the hypothesis that important etiological events in breast tumors will occur both in human breast cancers and mouse mammary tumor models. Through combined DNA copy number analyses of human breast tumors and multiple genetically engineered mouse (GEM) mammary tumor models, we identified 662 CNA regions conserved between these two species. Our ultimate selection strategy also incorporated gene expression data, an RNAi screen, and a network analysis to focus the list on the most likely driver genes within CNAs. Furthermore, using published functional studies, we provide new insights on the potential implications of Basal-like tumor-specific chromosome 1 drivers, some of which are therapeutically targetable.
Breast cancer tumor datasets
Table 1 Copy number array sample information of (a) human and (b) mouse tumors
(a) Number of samples
UNC: 54, TCGA: 89
UNC: 20, TCGA: 8
UNC: 16, TCGA: 55
UNC: 35, TCGA: 213
UNC: 34, TCGA: 120
(b) Expression SigClust group; Number of samples
Cross-species assessment of subtype-specific changes in genomic DNA copy number
Computational analysis of candidate driver genes within conserved CNAs
To identify putative driver alterations within regions of copy number gain or loss, we began with all conserved CNAs with a subtype segment frequency of 15 % or greater. To distinguish putative drivers from passengers, three further criteria were used. The first criterion identified genes within a CNA that demonstrate concordance between DNA copy number and RNA expression. The second criterion filtered for conserved CNAs that contained genes with a breast cell line RNAi-associated phenotype as published in the Solimini et al. 2012 RNAi screen on human mammary epithelial cells. The third criterion identified top-ranking genes when scored using DawnRank. By combining all these features, we further reduced the number of false-positive genes by filtering out genes without functional implications (Supplemental Table 3). A more extensive and detailed “Methods” section can be found as Supplemental File 1.
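The combination of criteria described above can be pictured as a simple conjunctive filter. The following is a minimal sketch, not the authors' implementation: the `Gene` record, its field names, and the toy gene symbols are hypothetical, and each boolean flag is assumed to have been computed upstream (by the concordance analysis, the published RNAi screen, and DawnRank, respectively).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical per-gene summary; all fields are illustrative assumptions."""
    symbol: str
    segment_frequency: float   # fraction of subtype tumors carrying the CNA
    dna_rna_concordant: bool   # criterion 1: expression tracks copy number
    rnai_hit: bool             # criterion 2: phenotype in the RNAi screen
    dawnrank_top: bool         # criterion 3: top-ranking DawnRank score


def candidate_drivers(genes, min_frequency=0.15):
    """Return symbols of genes that pass the frequency cutoff and all three criteria."""
    return [
        g.symbol
        for g in genes
        if g.segment_frequency >= min_frequency
        and g.dna_rna_concordant
        and g.rnai_hit
        and g.dawnrank_top
    ]


# Toy data only; these are not genes or values from the study.
toy_genes = [
    Gene("GENE_A", 0.47, True, True, True),    # passes everything
    Gene("GENE_B", 0.30, True, False, True),   # fails the RNAi criterion
    Gene("GENE_C", 0.10, True, True, True),    # below the 15 % frequency cutoff
    Gene("GENE_D", 0.20, True, True, False),   # not a top DawnRank gene
]

print(candidate_drivers(toy_genes))
```

Because the filter is a strict intersection, a gene missing from any one data source is dropped, which is consistent with the small number of genes that survive all criteria.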
Subtype-specific breast cancer copy number landscapes
In order to identify both known and novel genetic drivers of breast cancer on the DNA copy number level, we developed a multi-step and multi-platform computational strategy (Fig. 1). This strategy is predicated on using a “cross-species” comparative genomics approach where we searched for spontaneous copy number events across two different species (human and mouse). For this study, we created a new murine genomic resource of 73 mammary tumors profiled by both gene expression and DNA copy number microarray data (GSE52173); this new resource complements our human data set that contains 644 human breast tumors that have both gene expression and DNA copy number data (GSE52173 and http://tcga-data.nci.nih.gov/tcga).
We began by using gene expression data to identify subtypes, separately for human tumor samples and GEM mammary models. For clarity, we refer to the classification of mouse tumors as “groups” to distinguish them from human classes that are termed “subtypes”. Using the PAM50 algorithm and the Claudin-low predictor, we assigned each of the human tumor samples within the dataset to a specific intrinsic breast cancer subtype (Table 1). However, since there is no established expression-based classifier for mouse mammary tumors, we performed a supervised hierarchical cluster analysis of the murine mRNA expression data using the Herschkowitz et al. 2007 intrinsic mouse list of 866 genes. SigClust analysis was used to identify 7 significant mouse groups (Supplemental Fig. 1), each of which was given a unique group name based on the majority mouse model contributor in that group (i.e., Myc, Neu/PyMT, Wnt1, C3Tag, Mixed, p53null-Basal, and p53null-Luminal). The “Mixed” mouse group lacked a single dominant mouse model contributor; however, this group comprised mouse tumors that all demonstrate the previously described Claudin-low gene expression features [18, 19], and henceforth this mouse group is referred to as “ClaudinLow”.
We next analyzed the human DNA copy number landscape in the combined UNC/TCGA breast cancer dataset (Supplemental Fig. 2; Supplemental Table 5). Our results, not surprisingly, were consistent with previous publications [6, 12, 13]. For example, our analyses confirmed the previously identified copy number gain of 8q, which is common and present irrespective of breast cancer subtype, as well as a number of subtype-specific CNAs. For instance, we again identified Basal-like-specific DNA copy number losses at 4q and 5q, and gains of 10p; Luminal A-specific copy number gains at 16p; Luminal B-specific copy number gains at 17q; and a Luminal-associated (encompassing both Luminal A and Luminal B) copy number loss at 16q (Supplemental Fig. 2; Supplemental Table 5) [6, 12, 13, 20, 21]. The HER2-enriched subtype contained few subtype-specific CNAs; notably, the HER2/ERBB2 amplicon was not a HER2-enriched subtype-specific copy number gain event, as it also occurred in many Luminal tumors. Additionally, the Basal-like subtype contained the highest number of subtype-specific CNAs (Supplemental Table 5). In contrast to what was observed in the mouse groups, human tumors on average demonstrated more frequent subtype-specific regions of copy number loss compared to copy number gains (Supplemental Table 5).
Comparisons of copy number landscapes of mouse and human breast tumors
The extent to which mouse models of breast cancer recapitulate human phenotypes has been examined at the gene expression level [18, 19, 20], as well as at the copy number level, albeit only in a much smaller subset of these data. We examined sub-chromosomal events and compared human subtype-specific copy number landscape plots to mouse group-specific landscape plots, and identified shared cross-species CNA events after re-ordering the mouse chromosomal landscape into human chromosome order (see “Methods” section). We first selected for “conserved regions”, which were DNA segments/regions that were altered at high frequency (≥15 %) and in the same direction (i.e., amplified or lost) in both human and mouse copy number landscapes. Applying this selection criterion reduced the search space for potential subtype-specific drivers more than 2-fold, leaving a total of 662 conserved regions when all mouse groups and human subtypes were considered (Supplemental Fig. 3; Supplemental Table 7).
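The “conserved region” rule stated above (altered in at least 15 % of samples, in the same direction, in both species) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the published pipeline: it assumes the mouse segments have already been mapped into human coordinates so that both species share region identifiers, and the dictionary layout, function name, and region labels are hypothetical.

```python
def conserved_regions(human_freq, mouse_freq, threshold=0.15):
    """Return (region, direction) pairs altered at >= threshold in both species.

    human_freq / mouse_freq: dict mapping a region identifier to a dict with
    "gain" and "loss" alteration frequencies (fractions between 0 and 1).
    Mouse regions are assumed to be expressed in human coordinates already.
    """
    conserved = []
    for region, human in human_freq.items():
        mouse = mouse_freq.get(region)
        if mouse is None:
            continue  # no syntenic counterpart available for this region
        for direction in ("gain", "loss"):
            # Same direction must pass the cutoff in both landscapes.
            if human[direction] >= threshold and mouse[direction] >= threshold:
                conserved.append((region, direction))
    return conserved


# Toy frequencies only; not values from the study.
human_toy = {
    "region_1": {"gain": 0.40, "loss": 0.02},
    "region_2": {"gain": 0.05, "loss": 0.30},
    "region_3": {"gain": 0.20, "loss": 0.01},
    "region_4": {"gain": 0.00, "loss": 0.20},
}
mouse_toy = {
    "region_1": {"gain": 0.25, "loss": 0.00},
    "region_2": {"gain": 0.30, "loss": 0.05},  # opposite direction to human
    "region_3": {"gain": 0.10, "loss": 0.00},  # below cutoff in mouse
    "region_4": {"gain": 0.00, "loss": 0.15},
}

print(conserved_regions(human_toy, mouse_toy))
```

Requiring the same direction in both species is what discards a case such as `region_2` in the toy data, where the region is frequently lost in human but gained in mouse.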
In a comparison among subtypes, the Claudin-low subtype had the fewest conserved regions (and the fewest CNAs overall) (Supplemental Table 7). Conversely, the Basal-like subtype contained the most conserved CNAs; however, this may be because the Basal-like subtype also contained the most subtype-specific CNAs (Supplemental Table 7). Consistent with a previous publication, shared Basal-like-specific and murine p53null-Basal-specific regions of DNA copy number loss were observed spanning human 4q31-q35.2 and encompassing INPP4B, and also spanning 14q22.1-23.1 (Supplemental Table 7). By comparing shared sub-chromosomal CNAs between the human Basal-like subtype and all mouse groups, we noted that the C3Tag mouse group contained the most human Basal-like-specific copy number amplified regions, while the p53null-Basal mouse group contained the most human Basal-like-specific copy number loss regions (Supplemental Table 7). Both of these mouse models were previously shown to have Basal-like tumor gene expression phenotypes [18, 20]; therefore, for this study, we largely focused on copy number commonalities between human Basal-like tumors and these two mouse groups.
Identification of Basal-like tumor chromosome 1 amplification driver genes
To identify the driver(s) present on chromosome 1, we next applied the filtering criteria outlined in Fig. 1. Of the 120 chromosome 1 conserved CNAs, 79 contained at least one gene that showed DNA–RNA concordance (Supplemental Table 8), 25 contained at least one RNAi-identified essential gene (Supplemental Table 9), and 20 contained both genes showing DNA–RNA concordance and an RNAi-identified essential gene (Supplemental Table 10). Interestingly, all 20 CNAs were copy number gained segments, even among the 1p CNAs (Supplemental Table 10).
To further study the biology of the conserved chromosome 1 genes, we performed a cohort-based DawnRank analysis using genes from human chromosome 1. DawnRank uses gene–gene interaction networks to measure the impact of genomic alterations on the differential gene expression of downstream genes in the network. DawnRank then scores (as previously described) the level of perturbation on the gene interaction network caused by the alteration (either amplification or deletion) of the gene of interest. We selected human chromosome 1 gene blocks with shared synteny with the mouse genome for the DawnRank analysis. There were 7 such gene blocks, totaling 1509 genes (Supplemental Table 11). Using the chromosome 1 syntenic regions, we identified 44 chromosome 1 genes that represented the top 5 % of DawnRank scores (Supplemental Table 12), using DNA copy number changes as the input “mutation” features along with the gene expression for each human tumor sample. The 44 DawnRank genes mapped to 9 copy number gained segments, which also harbored genes with DNA–RNA concordance, or an RNAi-identified essential gene (Supplemental Table 10). Within the 9 CNAs, encompassing a total of 182 potential genes, only 3 genes met all four filtering criteria of (1) subtype-specific CNA, (2) DNA–RNA concordance, (3) an RNAi “GO” gene, and (4) a DawnRank hit: these genes were phosphatidylinositol 4-kinase (PI4KB), src homology 2 domain containing (SHC1), and nicastrin (NCSTN) (Fig. 3; Supplemental Table 10).
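Selecting the top 5 % of scored genes, as described above, amounts to a rank cutoff. The sketch below illustrates only that final selection step and does not reimplement DawnRank itself; the scores are assumed to be precomputed, the function name and gene labels are hypothetical, and rounding the cutoff up to a whole number of genes is an assumption rather than a documented detail of the study.

```python
import math


def top_fraction(scores, fraction=0.05):
    """Return gene names whose scores fall in the top `fraction`, best first.

    scores: dict mapping gene name -> precomputed network score
    (higher is assumed to mean a larger network perturbation).
    """
    if not scores:
        return []
    # Assumption: round the number of retained genes up, keeping at least one.
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:n_keep]]


# Toy scores only: 30 hypothetical genes with scores 0..29.
toy_scores = {f"gene_{i}": float(i) for i in range(30)}

print(top_fraction(toy_scores))
```

With 30 toy genes, 5 % corresponds to 1.5 genes, which this sketch rounds up to the two highest-scoring entries.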
The three potential chromosome 1 driver genes span 1q21-q23 and are altered with an average segment subtype frequency of 47 % (Supplemental Table 10). Interestingly, PI4KB and SHC1 both lie in 1q21, separated by less than the average Basal-like subtype segment length (Fig. 3), thus suggesting that on chromosome 1q21-23 multiple target genes lie within a single amplicon. Furthermore, SHC1 is in a subtype-specific high frequency altered segment among Basal-like tumors only (Fig. 3; Supplemental Table 5), while NCSTN and PI4KB CNAs appeared across multiple subtypes, passing the significance threshold in the Basal-like and Luminal A subtypes (Supplemental Table 5). However, NCSTN and PI4KB also passed the significance threshold for the p53null-Luminal, p53null-Basal, and C3Tag mouse groups (Supplemental Table 7), the last two of which are models linked to human Basal-like disease as determined in previous gene expression comparative studies [18, 19].
Notch pathway features in 1q21-23 amplified Basal-like breast cancers
In breast cancer, there are many copy number gains and losses, a few of which, like amplification of ERBB2, are of known clinical and biological significance. Over the years, many of these CNAs have been studied and candidate genes identified [12, 13, 27, 28, 29, 30], but there are still many regions for which the genetic drivers remain unknown. The simultaneous analysis of DNA copy number change in both human and mouse tumors, and their corresponding gene expression patterns, provides a biologically meaningful way to identify important regions of CNAs. The basic hypothesis is that a CNA found to occur spontaneously in breast cancers of two different mammalian species is being repeatedly selected and must therefore contain one or more important tumor-causing genes.
Although many studies have identified frequent CNAs within groups of human breast tumors [13, 21], most do not functionally narrow down the candidate genes within a specific segment. Beyond the mere presence of highly frequent CNAs shared across species, we took a biologically based approach to refine the list of genes within a given segment into a subset of candidate driver genes. These analyses prompted the development of a new bioinformatics tool (SWITCHplus) to identify and highlight subtype-specific DNA copy number events using a visual display in a user-friendly format. Using this tool and a systematic data-mining schema that includes identifying regions that show: (1) shared DNA CNAs cross-species, (2) concordance between mRNA expression and relative DNA copy number value, (3) functional effects in a genome-wide RNAi screen, and (4) functional effects in a network analysis (i.e., DawnRank), we identified a limited number of CNAs that harbored potential breast cancer driver genes. From these analyses, we identified human chromosome 1q21-23 as a region of amplification that is consistently present in human and mouse Basal-like tumors and that contains at least three potential driver genes (Fig. 3).
The first of these three genes, PI4KB, encodes a lipid kinase member of the phosphoinositide signaling pathway. The phosphoinositide signaling system regulates cell migration [31, 32, 33] and proliferation [31, 32, 33], and activation of this signaling pathway is observed in many aggressive tumors [33, 34, 35]. Specifically, phosphatidylinositol 4-phosphate is utilized by phosphoinositide kinases, such as PIK3CA, to signal to downstream protein kinase targets including AKT and PDK1 [33, 35, 36]. In the 2012 TCGA publication on breast cancer, it was noted that Basal-like cancers showed high activity of the PIK3CA/AKT pathway, and that these tumors tended to show few PIK3CA mutations, but frequent loss of PTEN and/or INPP4B (negative regulators of the pathway) and amplification of PIK3CA and AKT3 (positive regulators of the pathway). Here we show that yet another positive regulator of the pathway is amplified in Basal-like cancers.
SHC1 encodes a member of the Shc family of adapter proteins. SHC1 is composed of multiple protein domains that can bind to multiple transmembrane receptors, including the phosphorylated insulin-like growth factor 1 receptor and the platelet-derived growth factor receptor (PDGFR), thus potentially activating multiple pathways involved in cell proliferation and differentiation [37, 38]. Specifically, SHC1 is a key signaling mediator, and can act as a scaffold between an activated receptor and downstream signaling proteins. In addition, growth factor signaling through PDGFR is known to occur in many triple-negative breast cancers (TNBC), and thus SHC1 amplification may be contributing to these key signaling processes.
NCSTN encodes a component of the gamma-secretase complex (GSC), which is a multi-protein complex that cleaves a number of transmembrane proteins, typically activating their functions [41, 42]; the GSC targets include Notch 1–4, ERBB4, CD44, and E-cadherin [24, 41, 42]. Importantly, Hu et al. 2002 demonstrated, in Drosophila, that NCSTN provides structural support and is required for GSC cleavage of the Notch receptor. In our data, when Basal-like tumors were examined, those with copy number gains at NCSTN showed (1) perturbation/activation of the Notch pathway via the DawnRank network analysis (Fig. 4), (2) significantly higher expression of NOTCH1 and NOTCH3 (Supplemental Fig. 4c), and (3) high expression of other markers of the Notch pathway (Supplemental Fig. 4d). Further support for Notch pathway importance comes from previous mouse model experiments where genetic inactivation of a negative regulator of Notch signaling (i.e., lunatic fringe) resulted in Basal-like mammary tumors. Interestingly, Notch activity is also higher in Basal-like breast cancer cell lines compared with Luminal breast cancer cell lines. In vitro, by RNAi-mediated silencing of NCSTN in the TNBC cell line MDA-MB-231, Filipović et al. 2011 showed reduced transcription of Notch pathway targets, and a reduction in cell motility and invasion. In total, these results strongly suggest that activation of Notch pathway signaling is occurring within Basal-like/TNBC tumors, and we now provide additional evidence for a mechanistic explanation for this in vivo.
Other investigators using different computational approaches have also identified this region, but identified other genes (i.e., NIT1 and PVRL4) as potential drivers. The observed differences in potential driver genes are most likely due to the “filtering criteria”: we focused on species conservation, whereas they focused on somatic mutation targets. It is clear that a multitude of targets and drivers are present, and that 1q21-23 as a region is the target of selection, as opposed to any single gene being the target of selection. In conclusion, our work here provides an objective analysis path for identifying potential driver genes using a cross-species computational approach, which can be applied to any tumor type for which sufficient mouse and human tumor data exist.
This study was supported by Funds from the Initiative for Maximizing Student Diversity Grant 5R25GM055336, the UNC Bioinformatics and Computational Biology Diversity Fellowship, the NCI Breast SPORE Program Grant P50-CA58223-09A1, RO1-CA148761, and the Breast Cancer Research Foundation. We would like to thank the Translational Breast Cancer Research Consortium for providing some of the tumor specimens used for gene and copy number profiling, which came from TBCRC001.
Conflict of interests
C.M.P. is an equity stock holder and Board of Directors member of BioClassifier LLC. C.M.P. and J.S.P. are also listed as inventors on a patent application on the PAM50 assay.
24. Lombardo Y, Filipović A, Molyneux G et al (2012) Nicastrin regulates breast cancer stem cell properties and tumor growth in vitro and in vivo. Proc Natl Acad Sci USA 109:16558–16563. doi: 10.1073/pnas.1206268109
26. Sarajlić A, Filipović A, Janjić V et al (2014) The role of genes co-amplified with nicastrin in breast invasive carcinoma. Breast Cancer Res Treat 143:393–401. doi: 10.1007/s10549-013-2805-6
Open Access This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.