Mouse Models of Breast Cancer Share Amplification and Deletion Events with Human Breast Cancer
- Cite this article as:
- Rennhack, J., To, B., Wermuth, H. et al. J Mammary Gland Biol Neoplasia (2017) 22: 71. doi:10.1007/s10911-017-9374-y
Breast tumor heterogeneity has been well documented through the use of multiplatform –omic studies in human tumors. However, there is no integrative database to capture the heterogeneity within mouse models of breast cancer. This project identifies genomic copy number alterations (CNAs) in 600 tumors across 27 major mouse models of breast cancer through the application of a predictive algorithm to publicly available gene expression data. It was found that despite the presence of strong oncogenic drivers in most mouse models, CNAs are extremely common but heterogeneous both between models and within models. Many mouse CNA events are largely conserved in human tumors and in the mouse we show that they are associated with secondary tumor characteristics such as tumor histology, metastasis, as well as enhanced oncogenic signaling. These data serve as an important resource in guiding investigators when choosing a mouse model to understand the gene copy number changes relevant to human breast cancer.
Keywords: Copy number variation, Mouse model, Breast cancer, Gene expression, Metastasis
Genomic instability, including point mutations, translocations, and gene copy number alterations in key oncogenic signaling genes, is an underlying driver of breast cancer development and progression. Gene copy number changes, including amplifications such as HER2 and MYC or deletion events such as PTEN, are key biomarkers of tumor onset [4, 5, 6, 7], histology, metastatic potential, and treatment response [10, 11]. The amplification of HER2 in 20–30% of breast cancer patients results in significantly more aggressive tumors and is an important prognostic marker of a patient’s ability to respond to anti-HER2 therapy such as trastuzumab. Despite the success of HER2 therapy, the majority of patients with the amplification event will have primary or acquired resistance to trastuzumab, indicating potential heterogeneity in these tumors.
To investigate tumor heterogeneity, large multiplatform studies such as The Cancer Genome Atlas (TCGA) and Metabric projects have begun to integrate transcriptional data and genomic data, as well as other data platforms. Traditionally, breast cancer diversity has been classified at the transcriptional level into six basic subtypes: basal, luminal A, luminal B, HER2 positive, claudin-low, and normal like [14, 15]. The integration of gene copy number showed that there are a number of classical gene copy number alterations associated with each tumor subtype. For instance, this revealed that MYC amplification occurs in all breast cancer subtypes, but MYC is only transcriptionally active in the basal subtype. This finding underscores the importance of integrating multiple platforms to understand tumor heterogeneity.
In order to understand the function of oncogenic drivers, research has employed mouse models. There are a variety of methods to induce breast cancer in a mouse model. These range from tissue specific overexpression using promoters such as the mouse mammary tumor virus (MMTV) or whey acidic protein (WAP) promoters to drive oncogene expression [16, 17, 18, 19], to conditional knockouts of tumor suppressor genes through the use of a tissue specific Cre [20, 21, 22] or inducible system [23, 24], to carcinogen induced models such as DMBA treatment. Furthermore, models have been used to investigate particular aspects of tumor development such as metastasis via MMTV-PyMT or genomic instability with loss of p53. Given these varied methods and drivers of tumor formation, the transcriptional program in each model would be expected to be unique. Importantly, the recent advent of patient derived xenograft (PDX) models has provided a new option. These models have been shown to reflect their human counterparts at the genomic and transcriptional levels; however, other common mouse models of breast cancer have not been described in such a manner.
Recent work has captured the gene expression diversity between models and within tumors of the same model [28, 29, 30]. However, these works do not describe multiple levels of genomic diversity in mouse models as the TCGA and Metabric projects do in human tumors. This is largely due to a lack of multi-platform “-omic” studies for the mouse models. Small scale studies that integrate CNA with expression data across species have identified a unique CNA in basal-like breast cancer. The lack of such profiling on a large scale leaves researchers relatively uninformed about the genomic changes present in a mouse tumor when choosing a mouse model that is representative of a specific subtype of human breast cancer.
Here we describe a large scale investigation of copy number changes in 600 tumors across 27 mouse models of breast cancer through the use of the ACE algorithm. In short, the ACE algorithm predicts CNA from gene expression data through the use of a weighted mean of gene expression across a given genomic region. Because it relies on consistent regulation across an entire genomic region, it has been shown to accurately predict copy number variants and to give results consistent with traditional genetic predictors of gene copy number in tumors. The predicted CNAs across our dataset demonstrated wide heterogeneity across mouse models of breast cancer. Interestingly, consistent CNA changes were noted in the microacinar histological subtype of breast cancer, indicating a link between copy number changes and a tumor’s histological phenotype. Moreover, in an important observation, we noted that CNA was associated with breast cancer metastasis and enhanced oncogenic signaling in both mouse models and human breast cancer.
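The idea of scoring a genomic region by a weighted mean of the expression of neighboring genes can be illustrated with a short sketch. This is not the ACE implementation: the function name, the triangular distance weighting, and the `window` parameter are assumptions chosen only to show why a run of consistently elevated genes scores higher than a single outlier gene.

```python
def neighborhood_scores(expr, window=2):
    """Weighted mean of expression over position-ordered neighboring genes.

    expr   : per-gene values (e.g. tumor vs. normal log-ratios), already
             sorted by chromosomal position.
    window : number of neighbors on each side (hypothetical parameter).
    """
    n = len(expr)
    scores = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Triangular weights: nearer genes count more (assumed weighting).
        weights = [window + 1 - abs(j - i) for j in range(lo, hi)]
        total = sum(w * expr[j] for w, j in zip(weights, range(lo, hi)))
        scores.append(total / sum(weights))
    return scores


# Toy values: one isolated high gene versus three adjacent high genes.
isolated = neighborhood_scores([0, 0, 0, 3, 0, 0, 0])
regional = neighborhood_scores([0, 0, 3, 3, 3, 0, 0])
```

In this toy input the center of the three-gene run receives a higher score than the isolated gene, which is the property that lets a region-level score separate copy number events from single-gene regulation.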
Identification of Gene Copy Number Alteration in Mouse Models of Breast Cancer
We have also shown the translational effects of copy number gains and losses in human tumors through investigation of Reverse Phase Protein Array (RPPA) data associated with EGFR (Supplemental Figure 1A) and FOXO3 (Supplemental Figure 1B) amplification events. This analysis shows that for both EGFR and FOXO3 the protein level is directly correlated with gene copy number. This validation of the ACE software across the TCGA dataset shows that the events predicted in this manuscript are an understatement of the events in the tumor. However, the low false positive rate shows that the events called in the manuscript can be used for predictive purposes and begin to show the copy number profile in mouse models of breast cancer.
To investigate the presence of copy number alterations in mouse models we applied the ACE algorithm to gene expression data from a normal mammary gland from an FVB wildtype mouse (Fig. 1c) and an MMTV-Neu derived tumor from the same background (Fig. 1d). As expected, no CNA was identified in the control mammary gland. In contrast, the MMTV-Neu sample is characterized by a large amplification event on chromosome 3 as well as a large deletion on chromosome 4. This deletion is consistent with previously published findings of chromosome 4 loss in MMTV-Neu mouse models. In addition to these major CNA events, there are a number of smaller CNAs throughout the genome of the sample.
As a further check, we then hypothesized that unstable models would have significantly more CNAs than oncogene induced models. To investigate this hypothesis, we tested for CNA in multiple samples from unique mouse models of breast cancer. The ACE algorithm was applied to tumor samples from MMTV-Myc, MMTV-PyMT, MMTV-Neu, TAG, and DMBA treated models derived from the FVB/NJ mouse background (Fig. 1e). ACE analysis showed genomic stability in the MMTV-Neu driven mouse models. This model had significantly fewer amplification or deletion events than more classically unstable models such as the TAG (p < .05) and DMBA (p < .05) models. This is mirrored in human cancer, where certain tumors such as Basal tumors are shown to be more unstable than other subtypes, especially Luminal A. It was also noted that mice with the same oncogenic initiation event, such as PyMT, differed in copy number depending on the background of the mouse model (Supplemental Figure 2). We noted that the FVB background was the most unstable when compared to the AKXD background in PyMT driven tumors and the Balb/C background in the TAG or various p53 driven models.
Conservation of CNA Variability in Mouse Models
[Table: Composition of dataset. Column: Number of Samples. Surviving accession rows: GSE3165, GSE30864, GSE20416, GSE10193, GSE23938; GSE23938, GSE8828, GSE3165]
These data show that the vast majority of genes across the genome are amplified or deleted in less than 5% of the total samples, with a few distinct regions being amplified or deleted in a larger fraction of samples. However, we did identify regions of instability that were conserved across models. We identified a number of genes that were both amplified and deleted in greater than 10% of mouse models. Specifically, we identified that Gsn is amplified in 10.9% of samples and deleted in 11.4% of samples. Other genes that were amplified and deleted at a high level included Cct4, Hnrnpab, Cp, Cklf, Cenpo, and Dnm2. These genes are all located at regions previously described in the mouse genome to be unstable. The percentage of amplification or deletion for all genes can be found in Supplementary Table 1.
Given the extensive heterogeneity in breast cancer, we sought to test the hypothesis that heterogeneity was present at the level of gene copy number within individual mouse models of cancer. The extent of heterogeneity of CNAs within a tumor model was analyzed by examining the fraction of mice within a model with a given amplification or deletion event at a particular locus (Fig. 2b, Table S1). This analysis revealed a large degree of heterogeneity within models. Most models have the majority of loci amplified or deleted in less than 50% of the samples within a model. This is despite the fact that many tumor samples are driven by the same oncogenic driver and are biological replicates. Some of the genes that were amplified in greater than 50% of samples within a given tumor model represent key genes in tumor development, progression and metastasis, including well known genes such as Cdkn2, Mmp23, Sumo2, and Adcy33. Interestingly, conserved CNA events were not seen to span models, reinforcing the genomic diversity both within each model system and between the model systems. In addition, we noted some models with more copy number events, such as the p53 induced models.
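The within-model measure described above, the fraction of tumors in a model carrying an amplification or deletion at a given gene, can be sketched as follows. The input layout (one dictionary of calls per tumor, with +1 for amplification, -1 for deletion, 0 or absent for no event), the function name, and the toy gene names are assumptions made for illustration and are not taken from the study's data files.

```python
def event_fractions(samples):
    """Per-gene fraction of samples with an amplification or deletion call.

    samples : list of dicts mapping gene -> call
              (+1 amplified, -1 deleted, 0 or missing = no event).
    Returns {gene: (fraction_amplified, fraction_deleted)}.
    """
    n = len(samples)
    genes = set().union(*samples)  # every gene seen in any sample
    fractions = {}
    for gene in genes:
        amplified = sum(1 for s in samples if s.get(gene, 0) > 0)
        deleted = sum(1 for s in samples if s.get(gene, 0) < 0)
        fractions[gene] = (amplified / n, deleted / n)
    return fractions


# Hypothetical calls for four tumors from one model.
toy_model = [
    {"GeneA": 1, "GeneB": 0},
    {"GeneA": 0, "GeneB": -1},
    {"GeneA": 1},
    {"GeneA": -1, "GeneB": -1},
]
fractions = event_fractions(toy_model)
# Genes whose amplification is shared by more than half of the tumors.
conserved = [g for g, (amp, _) in fractions.items() if amp > 0.5]
```

A gene can be amplified in some tumors and deleted in others within the same model, so the two fractions are tracked separately rather than netted against each other.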
These CNA changes were then divided into amplification events (Supplemental Figure 3A) or deletion events (Supplemental Figure 3B) to reflect the copy number diversity in each model. This revealed that mouse tumor models largely fall into three categories. First, we observed unstable models with a high degree of amplification or deletion in a large number of genes but with low levels of conservation; this category includes many models with a p53 mutation. Second, we noted models that are relatively stable, with no amplification or deletion at the vast majority of genes, including models such as MMTV-PyMT. Lastly, there are models with a few highly conserved amplification or deletion events. These conserved events were noted in more than 25% of samples in lines such as the Erbb2 knock-in model or the WAP TAG model.
Conserved Role of CNAs in Human and Mouse Tumors
A key feature of the mouse tumors is their ability to model human cancer. To test the hypothesis that there were conserved CNAs in both species, we began by identifying the fraction of tumors within each model with an amplification or a deletion at genes prone to copy number alteration in human breast cancer such as ERBB2, MYC, PTEN, RB, and others identified in the TCGA study. Driver genes were found to be amplified or deleted in specific mouse models. Specifically, this occurred in the BRCA/p53 modified models, which have a fraction of samples with amplification in common oncogenes such as CCNE or deletion of common tumor suppressors like CMTM3.
The genes amplified in mice with a microacinar histological subtype, located on mouse chromosome 11, are also conserved in humans. Specifically, we found a region of fourteen genes which mapped to chromosome 17q25.1 in humans. These genes are shown to be amplified in a subset of breast tumors as identified by cBioPortal. An assessment of mouse (Fig. 4b) and human (Fig. 4c) tumors, from the MMTV-Myc mouse model and the TCGA breast cancer dataset respectively, showed conserved histology across species.
Human tumor samples were divided according to amplification of the microacinar specific genes through the use of a 14 gene signature (Fig. 4d). Tumors with at least two of the 14 genes amplified were considered to be in the amplified subgroup. This produced a subgroup of 28 human tumors, which was compared against 30 randomly chosen tumors in which none of the 14 microacinar associated genes were amplified. Histology for each group of samples was determined, and it was found that the microacinar associated gene amplified subgroup was enriched for tumors with a microacinar histology (Fig. 4e). This indicated a conserved role for gene copy number across mouse and human breast cancer in determining tumor histology, specifically with respect to the microacinar subtype.
To test the role of chromosome 3F in metastasis, we examined mouse tumor samples where metastasis data and pathway activity predictions were available. In particular, we used the MMTV-Myc, MMTV-Neu, and MMTV-PyMT models. As predicted, those samples with the 3F amplification had much higher predicted Ras activity and a greater number of lung metastases than those with a deletion (Fig. 5d). The 3F amplification event is also conserved in human Luminal A tumors. When tumors were split on the basis of amplification of the analogous human region, it was seen that they exhibited higher Ras pathway activity (Fig. 5e) and had worse metastasis free survival (Fig. 5f).
Given that CNA was associated with metastatic progression, we then hypothesized that CNA would also impact key cell signaling pathways. To test this hypothesis we examined the role of CNAs in major oncogenic pathways including BCAT, SRC, E2F, and others [38, 39]. This experiment used the same workflow to coordinate CNA and pathway activation status as was used to coordinate amplification and deletion events with the tumor metastasis signature (Supplemental Figure 5A). This analysis revealed amplification and deletion regions associated with each major oncogenic signature. The regulation of signaling pathways can occur through amplification of key genes within the signaling pathway. An example of this was observed when the specific amplified genes on chromosome 4 associated with high AKT activity, or the specific amplified genes on chromosome 14 associated with high E2F2 activity, were tested. When these genes are displayed in an interaction network, the vast majority can be found either upstream or downstream of their respective key signaling protein, such as RB/E2F2 (Supplemental Figure 5B). This suggests the chromosome 14 region is associated with Rb/E2F signaling.
Here we have described the copy number alterations across the genome in 27 mouse models of breast cancer. This has been completed through the use of an algorithm (ACE) to infer gene copy number profile from gene expression data. The ACE algorithm was found to have a high rate of false negatives and a relatively low rate of false positive calls. When the algorithm was run across the TCGA dataset, it was seen to have a moderate rate of concurrence with the TCGA copy number calls. Due to this, it is important to note the predictive nature of this database. While the copy number calls found in this dataset have not been validated using traditional means, the dataset begins to identify potential copy number variants in the mouse models of breast cancer. This is an important step in understanding tumorigenesis in these models, specifically from a copy number point of view, until a more robust and accurate profiling of the tumors can be completed.
The copy number profiles have been examined in a number of ways to classify inter and intra model heterogeneity as well as the similarities between copy number profiles in mouse models and human breast cancer. This study makes important contributions in understanding CNA in mouse models. Beyond this the CNAs are profiled for their contribution to tumor progression and the conservation of this role in human tumors.
Despite the presence of strong oncogenic signals, gene copy number alterations are still extremely common in mouse models of breast cancer. Common human drivers of breast cancer such as HER2, MYC or PTEN were not observed to be amplified or deleted at a high level across mouse models. This is unsurprising due to the lack of selective pressure for CNAs in these oncogenes or tumor suppressors because of the presence of a strong oncogenic signal. The exceptions to this are the p53/BRCA induced models which do not have a strong oncogenic signal but instead induce genomic instability. Amplification or deletion events in common human oncogenes are more frequent in these models.
A key finding of this manuscript is the heterogeneity of copy number alterations both within a model and between models. The within model heterogeneity is surprising due to the fact that each tumor is a biological replicate with the same driving oncogenic event. We identified that most events that occur within a given tumor model are not shared among even 50% of tumors from that same model. This finding underscores the importance of gene expression and genomic characterization of tumor studies when dealing with mouse models due to the inherent genomic variability. It further emphasizes the need for a large enough cohort to capture the heterogeneity of all tumor models.
During preparation of this manuscript, a complementary study was published examining CNA in mouse models of breast cancer. This publication uses a different algorithm to predict CNAs, one that predicts at whole chromosome resolution, while the ACE algorithm provides finer resolution. Due to the predictive nature of defining gene amplification events from gene expression data, we believe that it is important to compare their manuscript with the data herein. This comparison demonstrates that multiple algorithms applied to the same dataset yield overlapping findings, resulting in a comprehensive view of CNA in mouse model tumors. The two manuscripts agree on a number of findings including the stability of mouse models with rapid latency, the p53 KO model being the most unstable, within model heterogeneity, and the association of CNA changes with the microacinar subtype. The increased precision of the ACE method has allowed us to identify small focal events in many of the models, including PyMT, that the published paper did not uncover. Indeed, the data we present here allow one to search for a mouse model with amplification or deletion of particular genes. We have also leveraged human data through the use of the TCGA, Metabric, and KMplotter datasets to provide a comprehensive comparison of mouse models and the five main subtypes of human breast cancer tumors. In addition, we have shown the conservation of regions between the two species to predict a number of new metastasis related copy number changes.
We noted that the amplification or deletion events are associated with secondary tumor characteristics such as tumor histology, enhanced oncogenic signaling, and tumor metastasis. Specifically, we observed unique copy number profiles for the microacinar tumor histology, including the amplification of fourteen genes on chromosome 17q25.1. However, other histological subtypes did not have characteristic copy number profiles. Surprisingly, we noted that EMT tumors were stable with regard to copy number change, likely due to activation of Kras in MYC tumors [23, 41, 42]. The lack of a pattern of amplification or deletion in other histological subtypes indicates that there are other factors, such as point mutations or transcriptional changes, associated with these subtypes. This can also be said for oncogenic signaling pathways and tumor metastasis. While CNAs contribute to each of these, there are also contributions of single nucleotide variants (SNVs) and transcriptional changes. For this reason, it is important to integrate multiple platforms to understand tumor heterogeneity.
There is conservation between mouse and human subtypes with regard to tumor metastasis and oncogenic signaling. In total, 132 genes that were amplified or deleted in both mouse and human contributed to increased metastasis. Furthermore, these genes were located in the same oncogenic signaling pathways, indicating conserved mechanisms of metastasis in human and mouse.
When comparing heterogeneity of breast cancer, we found a large degree of heterogeneity both between models and within specific models. Given these findings, it is critical to understand the copy number profile when choosing a strain to model human breast cancer. For example, if one is interested in the HER2 oncogene there are a number of mouse models including the MMTV-Neu [16, 25], Erbb2 knock-in, NDL, and others with conditional activation. Each of these models has completely different CNA and transcriptional profiles, leading to different oncogenic signaling and subsequently different tumor properties.
This heterogeneity also exists in other common models such as MMTV-Myc. This strain has previously been identified to be heterogeneous from a transcriptional viewpoint, and therefore it is unsurprising that it is also heterogeneous from a copy number standpoint. Due to the heterogeneity present at the gene expression and copy number levels, investigators must take care when choosing tumor models of breast cancer to ensure that the chosen model reflects all aspects of the human breast cancer subtype they wish to model. We have also noted strain specific differences for some of the models. The FVB background was found to be more unstable when compared to tumors derived from other backgrounds. This finding emphasizes the importance of researchers understanding the background of their mouse strain when choosing mouse models for their study.
Projects such as TCGA have profiled human tumors at multiple levels. This allows researchers to stratify human tumors by gene expression, copy number profile, as well as SNVs and epigenetic markings to find a tumor population that is relevant for their study. However, there is not a mouse model equivalent to this dataset, so researchers are unable to choose mouse models which represent their specific tumor subtype at multiple levels. Recent studies such as this and others have begun to make strides in this area by profiling tumors at the CNA and expression levels. However, there is still a need to continue to profile mouse models through the use of whole genome sequencing as well as epigenetic markings. This information needs to be available to researchers in order to design studies that accurately represent the human subtypes of breast cancer.
This study clearly illustrates the importance of gene copy number alterations in tumor progression even in the presence of strong oncogenic drivers. Many mouse models contain a high degree of gene copy number alterations. These copy number alterations are highly heterogeneous both between models and within a model of breast cancer. Despite this heterogeneity, it was seen that the CNAs found in mice are conserved in humans. Conserved variants were associated with tumor progression and potentially play a role in enhanced oncogenic signaling, histological appearance, and the tumor’s metastatic potential in both human and mouse tumors.
Beyond the profiling of mouse tumors and the conserved roles of CNAs in mouse and human tumors, this study has a broader impact on the field of cancer research. When used in combination with gene expression studies, it begins to create a comprehensive molecular portrait of tumors derived from mouse models of breast cancer. These studies could be significantly enhanced if outcome, pathology, metastasis and other clinical data were included when publishing tumor data from mouse models. Nevertheless, the current study provides an essential resource for researchers to consult as they choose a model system to mimic a specific subtype of human breast cancer.
Materials and Methods
Dataset and ACE Analysis
A comprehensive mouse dataset was downloaded and assembled as previously described, including GSE15263, GSE3165, GSE37954, GSE32152, GSE10450, GSE22406, GSE42533, GSE15904, GSE8836, GSE27101, GSE30864, GSE20416, GSE10193, GSE23938, GSE15119, GSE16110, GSE25488, GSE21444, GSE8828, E-TABM-684. ACE analysis was run as previously described, comparing each individual sample to a wildtype mouse of the same strain (FVB/NJ = GSE25488, Balb/C = GSE21444, C57BL/6 = GSE14753) with a significance threshold of .05 for both p and q values and no restriction on event size.
The microarray based expression data was downloaded from the TCGA dataset. Z score was calculated for each gene in each sample and each event was classified based on the Z-score for validation analysis.
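A minimal sketch of Z-score based event classification is given below. The text does not state the cutoff used or whether the Z-score was taken across samples or across genes, so the cutoff of 2.0, the across-sample standardization for a single gene, and the label names are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def classify_by_zscore(values, cutoff=2.0):
    """Label each sample's value for one gene as gain, loss, or neutral.

    values : expression values for a single gene across samples.
    cutoff : absolute Z-score threshold (assumed; not given in the text).
    """
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation
    calls = []
    for v in values:
        z = (v - mu) / sd if sd > 0 else 0.0  # constant input -> no events
        if z >= cutoff:
            calls.append("gain")
        elif z <= -cutoff:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy data: nine samples at baseline and one strongly elevated sample.
calls = classify_by_zscore([0.0] * 9 + [10.0])
```

Calls produced this way can then be compared sample by sample against reference copy number calls to estimate false positive and false negative rates.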
Human Dataset and Analysis
The TCGA breast cancer and KMplot.com datasets were used for human copy number analysis and validation of results. Breast cancer subtype and copy number calls were taken from the TCGA dataset [12, 33]. To run ACE the gene symbols were replaced with their Affymetrix U133A_2 probe ID. This was queried using the cBioPortal visualization tool. Distant metastasis free survival results were obtained using the KMplot.com dataset. ACE analysis for the human analysis compared expression to normal HMEC gene expression (n = 10) data gathered from GSE24468.
Mouse Metastasis Dataset
Mouse and Human Gene Location Conversion
Locations of mouse and human genes were taken from the Affymetrix array annotation files from mouse 430A_2 and human U133A_2 array. These locations were merged by common gene symbol to provide a conversion table between the two species for the location of a particular gene.
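The merge by common gene symbol can be sketched as a simple join of two lookup tables. The case-insensitive matching (mouse symbols are conventionally written like `Gsn`, human symbols like `GSN`), the function name, and the placeholder gene names and coordinates below are assumptions for illustration; the annotation file layout is not described in the text.

```python
def build_conversion_table(mouse_loc, human_loc):
    """Join mouse and human gene locations on a shared gene symbol.

    mouse_loc, human_loc : dicts mapping symbol -> (chromosome, start).
    Returns rows of (mouse_symbol, mouse_location, human_symbol, human_location)
    for symbols present in both species.
    """
    # Index human genes by upper-cased symbol (assumed matching rule).
    human_by_key = {sym.upper(): (sym, loc) for sym, loc in human_loc.items()}
    table = []
    for sym, m_loc in sorted(mouse_loc.items()):
        hit = human_by_key.get(sym.upper())
        if hit is not None:
            table.append((sym, m_loc, hit[0], hit[1]))
    return table


# Placeholder symbols and coordinates, not real annotations.
mouse = {"Genea": ("2", 100), "Geneb": ("5", 200)}
human = {"GENEA": ("9", 300), "GENEC": ("1", 400)}
conversion = build_conversion_table(mouse, human)
```

Genes without a shared symbol drop out of the table, so any cross-species comparison built on it is limited to genes annotated on both arrays.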
Clustering of Human and Mouse Tumors
ACE analysis was performed as previously described on the TCGA breast cancer human dataset as well as the mouse dataset. Significant CNAs were mapped onto the mouse genome for clustering. For human to mouse comparisons, genes were filtered to those that were amplified or deleted in at least 5% of human and mouse tumors (n = 594). Unsupervised hierarchical clustering was performed using Cluster 3.0 and Java TreeView. For tumor histology the MMTV-Myc dataset with histological annotations, GSE15904, was used. To cluster this dataset we filtered the genes to 4118 genes through the use of the standard deviation of the neighborhood score. This removed all genes that were unaltered across the dataset. For all clustering analyses, Euclidean distance and complete linkage were used as the similarity metric and clustering method, respectively.
The Jaccard index between clusters was calculated using the similarity function of the R package “sets”.
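For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The short Python sketch below illustrates that definition only; it is not the R code used in the study, and the tumor identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of sample IDs."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty clusters share nothing; report 0.0 rather than dividing by zero.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical cluster memberships from two clusterings of the same tumors.
cluster_one = {"t1", "t2", "t3"}
cluster_two = {"t2", "t3", "t4"}
overlap = jaccard(cluster_one, cluster_two)  # 2 shared / 4 total
```

Values range from 0 (no shared members) to 1 (identical membership), which makes the index convenient for comparing clusters of different sizes.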
Mouse and Human Histological Comparisons
Histological annotations were analyzed from a group of MMTV-Myc mouse tumors, GSE15904, as well as the human TCGA tumors. We identified the genes within mouse chromosome 11 which were also amplified in a subset of human tumors using cBioPortal. These fourteen identified genes mapped to the 17q25.1 region in humans and are referred to as the microacinar associated event. For overrepresentation analysis, we compared the number of human tumors with the microacinar subtype in a group carrying the microacinar associated amplification event found in mice against a random set of tumors not containing that amplification event, through the use of a 2 × 2 contingency table.
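One common way to test for overrepresentation in a 2 × 2 contingency table is a one-sided Fisher's exact test, sketched below using only the standard library. The text does not name the test applied, so the choice of Fisher's exact test is an assumption, and the cell counts in the example are hypothetical (only the group sizes of 28 and 30 come from the text).

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    a : amplified group, microacinar      b : amplified group, other histology
    c : comparison group, microacinar     d : comparison group, other histology
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b           # size of the amplified group
    col1 = a + c           # total microacinar tumors
    n = a + b + c + d      # all tumors
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts: 10 of 28 amplified tumors vs. 2 of 30 comparison tumors.
p_enriched = fisher_one_sided(10, 18, 2, 28)
```

A small value of `p_enriched` would indicate that the microacinar histology is more frequent in the amplified group than expected by chance given the table margins.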
Oncogenic Signature Application
Predefined oncogenic signatures were applied to the dataset. Briefly, the training data was merged with the full dataset and batch effects removed through the use of COMBAT. These samples were then subjected to binary regression analysis with a predefined gene list and conditions for each individual signature [38, 39, 41, 49, 50].
Coordination of CNA with Oncogenic Signature
Oncogenic signatures [38, 39, 41, 50] and lung metastasis signatures were applied to mouse and human datasets as previously described. These scores were correlated with the neighborhood score through the use of a Spearman rank correlation applied in R. A significance threshold of P < .01 was applied and the results were visualized using MATLAB.
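The correlation step can be illustrated with a small Spearman rank correlation written in plain Python. This is a sketch of the statistic only, not the R code used in the study: it returns the coefficient and omits the P-value, which in practice would come from a statistics library. The function names and toy values are assumptions.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Toy data: neighborhood scores for one gene and pathway scores per tumor.
neighborhood = [0.1, 0.4, 0.5, 1.2]
pathway = [0.2, 0.3, 0.9, 0.95]
rho = spearman_rho(neighborhood, pathway)
```

Because the statistic depends only on ranks, it captures monotonic association between copy number score and pathway activity without assuming a linear relationship.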
Gene Network Interaction
Interaction networks were visualized through the use of STRING-DB. Input nodes were those genes significantly correlated with the particular pathway in a specific region as well as key signaling proteins for the pathway (Rb/E2F2). Twenty additional white nodes were added to complete the network.
We thank the members of the Andrechek laboratory for helpful discussions.
JR and EA collaborated on the study conception, design, and interpretation of results. BT provided histological coordination with CNA. HW provided the ACE analysis for many mouse models. JR performed all other experiments and drafted the manuscript. All authors have critically read, edited, and approved the final version of the manuscript.
Compliance with Ethical Standards
Conflict of Interest
The authors declare they have no conflict of interest.
This work was supported by NIH R01CA160514 to E.R.A. and 1F99CA212221–01 to J.R.