Introduction

Chromosomal amplifications are a major type of genetic aberration found in human cancers of many histologies [1]. The importance of chromosomal amplifications to tumor biology is exemplified by the ERBB2/HER-2 amplification found in 15–20 % of breast cancers [2]. This amplification was initially shown to be a poor prognostic feature. Molecular genetic analysis subsequently identified the HER-2 gene as a major oncogenic driver in the amplicon, and HER-2 has since been targeted for therapy with great success in the clinic. Chromosomal amplification of the dihydrofolate reductase (DHFR) gene was initially described as a mechanism of resistance to the chemotherapeutic drug methotrexate, and there are recent examples of acquired resistance to epidermal growth factor receptor tyrosine kinase inhibitors in lung cancer via MET amplification [3, 4]. Thus, chromosomal amplification plays a key role in both cancer development and therapeutic response.

Much of our knowledge of chromosomal amplification derives from studies in model organisms such as yeast or from experimental amplification of the endogenous DHFR locus. Stepwise increases in methotrexate concentration in culture can lead to amplification of the endogenous DHFR locus or a DHFR-linked transgene in rodent and some human cell lines [3, 5–8]. Random insertion of a mutant DHFR transgene in HCT-116 + chr3 colon cancer cells was used with methotrexate selection to amplify insertion sites and surrounding genomic loci [9].

Some recurrent amplifications in human cancers occur near so-called DNA fragile sites, which are prone to spontaneous DNA breakage under conditions of replication stress, but the mechanisms underlying recurrent amplification at many loci remain poorly understood [10, 11]. We hypothesize that susceptibility to chromosomal amplification differs by locus and by cell type, and that it would be useful to study amplification of a greater variety of loci beyond DHFR. While genetic engineering has enabled modeling of specific mutations, deletions, and chromosomal translocations found in human cancer, we are not aware of a method for engineering specific, rather than random, chromosomal amplification events. We have developed such a method and, as a proof of principle, have engineered a recurrent amplification found in human breast cancers in a human breast cancer cell line.

Materials and methods

Cell lines

MCF-7 and MDA-MB-134VI cells were obtained from ATCC. Identity of MCF-7 cells was verified by sequencing for the described PIK3CA and GATA3 mutations, as well as by identification of described chromosomal amplifications by comparative genomic hybridization. Parental MCF-7 cells and their derivatives were maintained in DMEM with 4.5 g/L glucose (Invitrogen) supplemented with 5 % FBS, 100 U/mL penicillin, and 100 μg/mL streptomycin (Invitrogen). Targeted 38C-3 cells were maintained in the above medium supplemented with 10 μM mycophenolic acid (MPA) (Sigma). Amplified subclones were selected and maintained in the above medium supplemented with 10 μM MPA and 10 μM mizoribine (Sigma). MDA-MB-134VI cells were grown in DMEM with 10 % FBS and penicillin/streptomycin as above. All cells were cultured at 37 °C in 5 % CO2.

Gene targeting of the ZNF703 locus

The SEPT targeting vector has been described [12]. The neomycin resistance cassette was replaced with an insert containing the E. coli inosine monophosphate dehydrogenase (IMPDH) cDNA, which was PCR amplified from Top10F E. coli cells (Invitrogen). The IMPDH insert was cloned downstream of the IRES sequence and upstream of the polyadenylation signal in SEPT. 5′- and 3′-homology arms targeting the ZNF703 locus were constructed by PCR using genomic DNA from MCF-7 as template. Primer sequences are shown in Supplementary Table 1. Recombinant AAV production and infection were performed as described [13, 14]. MCF-7 cells were selected in 10 μM MPA in 96-well plates. Surviving colonies were screened for homologous targeting of the cassette using a pooling strategy and PCR as described [15]. Positive colonies were re-cloned by limiting dilution and re-screened to ensure homogeneity.

Amplification drug selection

Targeted 38C-3 cells were plated at 4000 cells per well in a 96-well plate in medium containing 10 μM MPA and 10 μM mizoribine (Sigma). After approximately 4 weeks, resistant colonies were identified and expanded.

DNA and RNA extraction, cDNA synthesis, and PCR

Genomic DNA and total RNA were prepared from cells using QIAamp DNA Blood kits and RNeasy kits (Qiagen), respectively. cDNA was synthesized with First-Strand cDNA Synthesis kits (GE Biosciences). PCR amplification was performed with a GeneAmp 9700 (Applied Biosystems) and Phusion Hot Start II polymerase (NEB). qRT-PCR was performed on cDNA with forward and reverse primers located in distinct exons on an iCycler machine (Bio Rad) using Platinum Taq polymerase (Invitrogen) and SYBR Green dye (Invitrogen). Primer sequences for RT-PCR are in Supplementary Table 1.

Droplet digital PCR

TaqMan Primer/probe sets for ZNF703 (FAM-label) and the reference gene RPP30 (VIC-label) were obtained from Life Technologies. ddPCR was performed as described [16]. Genomic DNA was digested with Mse I. Eight ng of digested gDNA was mixed with ddPCR supermix (Bio Rad) and one microliter each of the ZNF703 and RPP30 primer/probe mixes. Twenty microliters of this mixture was combined with 70 microliters of droplet generator oil and emulsified in a droplet generator (Bio Rad). Thirty-nine microliters of this sample was transferred to a PCR plate and amplified using conditions as described [16], and droplets were read for fluorescence using a Bio Rad QX100 droplet reader. Results were analyzed using QuantaSoft software (Bio Rad) to normalize copy number relative to RPP30.
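The normalization to RPP30 can be illustrated with a minimal sketch. It assumes the standard ddPCR model, in which the mean number of template copies per droplet is estimated from the fraction of positive droplets by Poisson correction, and it treats RPP30 as a two-copy reference. The function names and the droplet counts are hypothetical; they are not taken from QuantaSoft or from the data in this study.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template copies per droplet.

    If a fraction p of droplets is positive, the mean copies per droplet
    is -ln(1 - p), because a droplet is negative with probability exp(-mean).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("require total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number(target_positive: int, reference_positive: int,
                total: int, reference_copies: float = 2.0) -> float:
    """Target copy number per cell, normalized to a reference locus.

    reference_copies is the assumed copy number of the reference locus
    (two for RPP30 in this sketch).
    """
    target = copies_per_droplet(target_positive, total)
    reference = copies_per_droplet(reference_positive, total)
    return reference_copies * target / reference


if __name__ == "__main__":
    # Hypothetical droplet counts for illustration only.
    estimate = copy_number(target_positive=2000, reference_positive=1000,
                           total=10000)
    print(f"Estimated target copy number: {estimate:.2f}")
```

Because both loci are read from the same droplets, the droplet volume cancels in the ratio, so only droplet counts are needed.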

Array CGH

RNase A-treated genomic DNA from parental MCF-7 cells, 38C-3 cells, and subclones E8, F3, and G5 was labeled and hybridized to Agilent 4 × 180 K arrays using normal human female genomic DNA as a control, according to the manufacturer’s instructions (Agilent Technologies, Santa Clara, CA). Data were extracted using Feature Extraction Software v9.1 (Agilent Technologies), and visualization was performed using Agilent Genomic Workbench v.7.0 with the hg19 version of the human genome as a reference. Hybridization, data acquisition, and data processing were performed at the Johns Hopkins SKCCC Microarray Core facility.
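Array CGH reports the log2 ratio of test to control signal, so a value of 0 corresponds to the copy number of the diploid control. The relationship between log2 ratio and copy number can be sketched as follows, under the idealized assumptions of a pure, homogeneous cell population and a linear array response; measured ratios are often compressed relative to these values. The function names are hypothetical and are not part of the Agilent software.

```python
import math


def log2_ratio_to_copy_number(log2_ratio: float,
                              reference_copies: float = 2.0) -> float:
    """Convert an array CGH log2 ratio to an idealized copy number.

    reference_copies is the copy number of the locus in the control DNA
    (two for an autosomal locus in normal diploid DNA).
    """
    return reference_copies * 2.0 ** log2_ratio


def copy_number_to_log2_ratio(copies: float,
                              reference_copies: float = 2.0) -> float:
    """Inverse conversion: expected log2 ratio for a given copy number."""
    if copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(copies / reference_copies)


if __name__ == "__main__":
    # Illustrative values only, not measurements from this study.
    for ratio in (-1.0, 0.0, 1.0):
        print(ratio, log2_ratio_to_copy_number(ratio))
```

Under these assumptions, single-copy loss of a diploid locus gives a log2 ratio of -1 and a gain to four copies gives +1.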

FISH

Cells were seeded in 8-well chamber slides, fixed overnight in 10 % neutral buffered formalin, and allowed to dry. Slides were then treated with 2 N HCl for 20 min and with Vysis Pretreatment Kit I (Abbott Molecular), washed with 2× SSC buffer, and incubated in pretreatment buffer at 80 °C for 30 min. Slides were rinsed with dH2O, washed with 2× SSC buffer, placed in Vysis protease buffer (Abbott Molecular) at 37 °C for 8 min, washed with 2× SSC buffer, and fixed in 10 % neutral buffered formalin for 10 min. Slides were then dehydrated in ethanol baths and kept at room temperature for up to 2 weeks. Samples were hybridized with probes at 95 °C for 5 min and incubated at 37 °C for 48 h, treated with 0.3 % NP-40 at 75 °C, counter-stained with DAPI (1:10,000), and sealed with Prolong Gold (Invitrogen). Samples were imaged using a Nikon fluorescence microscope and the NIS-Elements BR 2.30 imaging program. FISH probes were pre-labeled with fluorophores and targeted centromeric sequences of chromosome 8 (Abbott Molecular) or were BAC probes to FGFR1, ZNF703 (RPCI-11-101H15), and NRG1 (RPCI-11-15H14) (Empire Genomics).

Immunoblotting

Whole-cell protein extracts prepared in Laemmli sample buffer were resolved by SDS-PAGE using NuPage 4–12 % gels (Invitrogen), transferred to Invitrolon polyvinylidene difluoride membranes (Invitrogen), and probed with primary and horseradish peroxidase-conjugated secondary antibodies. Primary antibodies to RAB11FIP1 (#9438), ASH2L (#5019), FGFR1 (#9740), and GAPDH (#5174) were from Cell Signaling Technologies. ZNF703 antibody was from GeneTex (#107721). Blots were exposed to Kodak XAR film using chemiluminescence for detection (Perkin Elmer).

Results

A strategy to engineer site-specific chromosomal amplifications

Model systems for gene amplification have generally relied on dominantly selectable enzymes such as DHFR and CAD [3, 17]. Cells expressing the enzyme are treated with an inhibitor, such as methotrexate or PALA, respectively, selecting for a subset of surviving cells with increased expression of the enzyme. In some cases, the increase in enzyme expression is caused by increased copy number of the locus encoding the enzyme. We reasoned that targeting such an amplifiable selectable marker to a genomic locus of our choice could lead to subsequent amplification of that locus and surrounding sequences under selective pressure from an inhibitor. We re-designed a recombinant adeno-associated virus (AAV) gene targeting vector for this purpose (Fig. 1; [12]). Since we did not wish to disrupt the coding sequence of our targeted gene, we designed homology arms targeting the selection cassette to the 3′ UTR of the chosen gene, downstream of the stop codon but upstream of the endogenous polyadenylation signal.

Fig. 1

AAV gene targeting strategy for engineering chromosomal amplifications. 5′ and 3′ homology arms (orange) flank a selection cassette containing an internal ribosome entry site (IRES, blue), the E. coli IMPDH gene (green), and a polyadenylation signal (PA, black). Targeting to the 3′ UTR of the gene of interest is selected for with mycophenolic acid (MPA) and identified by PCR screening. Subsequent selection with mizoribine leads to pressure to amplify the targeted cassette and flanking genes. Both exons of ZNF703 are depicted, but ERLIN2 is shown schematically as a single exon, and other genes are not shown

To avoid first having to generate a cell line null for the enzyme we planned to employ for amplification selection, we wished to use a dominantly selectable marker, i.e., one that we could select in the presence of the endogenous cellular genes. Such a marker must differ from the endogenous enzyme in its sensitivity to available enzyme inhibitors. We chose E. coli IMPDH. IMPDH catalyzes the rate-limiting step in de novo synthesis of GTP. E. coli IMPDH has been shown previously to function as a dominant selectable marker in various human cell lines, as it is resistant to the inhibitor MPA, which effectively inhibits the endogenous human IMPDH enzymes [18]. Cells are infected with the recombinant AAV vector and selected in MPA, and resistant colonies are screened by pooling and PCR to identify correctly targeted clones as described [13–15].

Once targeted clones are identified and single-cell cloned to ensure homogeneity of the starting population, cells are plated in the presence of both MPA and a second IMPDH inhibitor, mizoribine. Mizoribine inhibits both human cellular IMPDH and E. coli IMPDH with similar potency. We reasoned that increased IMPDH expression from gene amplification might cause resistance to mizoribine, whereas the presence of MPA would continue to inhibit endogenous IMPDH, even if amplified. Much evidence points to DNA breakage as an initiating event in the amplification process. Our strategy relies on spontaneous DNA breakage somewhere near our inserted IMPDH cassette, perhaps aided by exposure of single-stranded DNA due to depletion of nucleotide pools and stalling of DNA replication. The extent of amplification of the targeted locus and surrounding loci would depend on where these breaks occur, how well they are tolerated, and other unknown factors that determine the extent of chromosome amplifications, which are often highly variable in terms of size and complexity.

Targeted amplification of 8p11-12 in human breast cancer cells

As a proof of principle, we attempted to engineer amplification of the 8p11-12 region in the human breast cancer cell line MCF-7. Amplification of 8p11-12 occurs in approximately 15 % of human breast cancers, predominantly of the estrogen receptor positive subtype, of which MCF-7 is representative [19, 20]. This amplification event harbors multiple sub-regions of amplification, and various investigators have identified oncogenic functions for nearly a dozen candidate driver genes in the region using traditional overexpression or loss of function approaches [21–26]. We reasoned that our experimental approach might serve as a method to simultaneously amplify and overexpress multiple genes in an amplicon, which is difficult to perform with traditional transgene methods. Furthermore, transgenes do not recapitulate transcriptional regulation of the genes from their endogenous promoter and enhancer elements. MCF-7 does have copy number abnormalities, which shows that at some point in its transformation to cancer, it was capable of endogenous chromosomal amplification. MCF-7 does not have amplification of chromosome 8p, however. In addition, MCF-7 cells were shown to amplify the DHFR locus in vitro in response to methotrexate selection [5]. This is important as the genetic basis for amplification remains obscure, although presumably certain deficits in DNA repair, replication, or cell cycle checkpoints are required to permit amplifications to occur.

We designed homology arms to target the E. coli IMPDH cassette to the 3′ UTR of the ZNF703 gene, which is at the telomeric end of the core 8p11-12 amplicon (Fig. 1). Multiple targeted clones were identified by PCR screening and purified to homogeneity by limiting dilution. We next plated one of the targeted clones, named 38C-3, in mizoribine. We identified three colonies resistant to 10 μM mizoribine, designated as E8, F3, and G5. We initially tested these colonies for increased copy number of the targeted ZNF703 locus by performing qPCR with primers specific to the targeting cassette and to the ZNF703 locus outside of the region of the homology arms (data not shown). Subsequently, we used droplet digital PCR to more precisely measure copy number at the ZNF703 locus using primers and a probe located near exon 1, normalized to the RPP30 gene, of which MCF-7 has two copies. As shown in Fig. 2, clones E8, F3, and G5 showed average ZNF703 copy number increases of approximately 2.5-fold relative to parental MCF-7 and the targeted clone 38C-3 before mizoribine selection. This indicates that the ZNF703 amplification occurred during mizoribine selection and was not present in the targeted 38C-3 clone prior to selection.

Fig. 2

Droplet digital PCR measurement of copy number at the ZNF703 locus at 8p12. Copy number is normalized to the two copy RPP30 locus. From left to right, parental MCF-7 cells, a targeted clone (38C-3) before mizoribine amplification selection, and three amplified subclones of the 38C-3-targeted clone (E8, F3, G5). Bars represent 95 % confidence intervals. Results are representative of three experiments

To determine the extent and pattern of amplification, we performed genome-wide array CGH on the pre-amplified cells and amplified subclones (Fig. 3 and Supplemental Figures 1–3). As expected from the ddPCR result, all three subclones showed increased copy number of the ZNF703 locus. However, all three clones showed unique patterns of copy number change at surrounding loci. Clones F3 and G5 showed broad, homogeneous amplification (much longer in extent in F3) with concomitant copy number loss telomeric to the amplification. This pattern of amplification with distal loss is frequently observed in breast cancers on 8p [19]. Clone E8 showed a different pattern of amplification involving almost the entire 8p chromosome arm. Focal regions of copy number gain were interspersed with normal copy number in a sawtooth pattern, which has also been commonly observed in human tumors. Thus, these experimentally engineered multi-gene amplifications recapitulate several of the features of amplifications from actual human tumors. Importantly, the amplified subclones did not differ from parental MCF-7 or pre-amplified 38C-3 cells on the long arm of chromosome 8 (where MCF-7 has existing copy number gains, Fig. 3) or on the other chromosomes (not shown). This indicates that the induced copy number changes are specific and that the drug treatment does not select for generalized chromosomal instability.

Fig. 3

Copy number profile of chromosome 8 by array CGH. From top to bottom, MCF-7, the targeted, non-amplified MCF-7 clone 38C-3, and the mizoribine-amplified clones E8, F3, and G5. The y-axis represents log2 ratios of copy number, with 0 representing diploid copy number. Red boxes indicate copy number gain; green boxes indicate copy number loss. The ZNF703 locus is indicated by an arrow. Copy number profiles did not differ from parental MCF-7 cells for the remaining chromosomes (not shown)

Because ddPCR and array CGH average copy number over the entire population, we performed FISH to assess copy number changes at the level of individual cells, using probes for centromeric sequences on chromosome 8 and three BAC probes located near FGFR1, at the centromeric end of the 8p11-12 amplification, ZNF703, and NRG1, which is located 5 Mb telomeric to ZNF703. Parental MCF-7, 38C-3, and amplified subclones all showed two signals for centromere 8, and MCF-7 and 38C-3 were diploid for the other loci tested (Fig. 4 and Supplemental Figure 4). Clones E8, F3, and G5 all showed increased FISH signals for ZNF703, consistent with the estimated copy number by ddPCR, and F3 and G5 showed similar increases in signals for FGFR1. Clone E8 showed low level gain of FGFR1, also consistent with the array CGH results (Supplemental Figures 1 and 4). Clone G5 showed only a single copy of NRG1 by FISH, consistent with the telomeric copy number loss observed by array CGH (Supplemental Figures 3 and 4). Clone E8 showed more heterogeneity than clones F3 and G5 at the cellular level, with significant variability of NRG1 copy number among individual cells, possibly indicating a greater degree of genomic instability in this clone (Supplemental Figure 4).

Fig. 4

FISH on parental MCF-7 cells and amplified ZNF703-targeted subclones E8, F3, and G5. Nuclei are stained with DAPI. The green probe is to chromosome 8 centromeric sequences. a The red probe is a BAC in the ZNF703 region on 8p11-12. b The red probe is a BAC in the FGFR1 region

Copy number variation is a leading cause of gene expression variation among tumors, and copy number-associated overexpression can be used as a criterion to narrow down the list of candidate driver genes in a given region. We performed qRT-PCR for genes in the core 8p11-12 amplification in our experimentally amplified clones (Fig. 5). All clones showed increased expression of ZNF703, as would be predicted; however, the clones differed in the extent and degree of copy number-associated overexpression of neighboring genes in the region. Clone G5 showed the highest relative expression in the greatest number of genes, followed by clone F3, and clone E8 exhibited more modest changes. This trend is in keeping with the broader increase of copy number for these genes in G5 and F3 versus E8 seen by array CGH. These differences may also reflect epigenetic variation among the clones. Indeed, the correlation between copy number gain and gene overexpression in cancer-associated amplifications is imperfect.

Fig. 5

Copy number-associated overexpression of co-amplified genes on 8p11-12. Quantitative real-time RT-PCR for selected genes in the 8p11-12 region in their genomic order (ZNF703, telomeric; MYST3, centromeric). Expression for each gene is normalized to a reference housekeeping gene, TBP. The expression level in the pre-amplified 38C-3 clone is set at 1. The mean and standard deviation of two experiments are represented
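The normalization described in this legend (each gene normalized to TBP, with the pre-amplified 38C-3 clone set to 1) can be sketched with the widely used comparative Ct (ΔΔCt) calculation. This is an assumption made for illustration: the text does not state which quantification method was used. The function name, the default amplification efficiency of 2 (perfect doubling per cycle), and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float,
                        efficiency: float = 2.0) -> float:
    """Fold change of a target gene versus a calibrator sample (ddCt method).

    Each target Ct is first normalized to the reference gene (here TBP) in
    the same sample; the calibrator (here 38C-3) then defines a value of 1.
    efficiency is the assumed fold amplification per PCR cycle.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return efficiency ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values, not data from this study.
    fold = relative_expression(ct_target_sample=24.0, ct_reference_sample=20.0,
                               ct_target_calibrator=25.0,
                               ct_reference_calibrator=20.0)
    print(f"Relative expression versus calibrator: {fold:.2f}")
```

A lower Ct means more template, so a target that crosses threshold one cycle earlier than in the calibrator, at equal reference Ct, corresponds to a twofold increase under the perfect-doubling assumption.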

There have been few systematic investigations of overexpression of amplified genes at the protein level, although individual candidate genes have been studied and documented, such as HER-2. We examined protein expression for several of the genes in the region, compared to parental non-amplified MCF-7 cells and the 8p11-12 amplified breast cancer cell line MDA-MB-134VI (Fig. 6). We observed protein overexpression of full length FGFR1 or its proteolytically processed C-terminal fragments in clones F3 and G5 [27]. These clones also overexpressed RAB11FIP1, and F3 additionally overexpressed ASH2L. Protein expression differences for ZNF703 were less dramatic, in keeping with the low level increase in mRNA. It should be noted that the targeted IMPDH cassette is translated from an IRES, allowing independent posttranscriptional regulation of ZNF703. Thus, experimental amplification of a targeted locus can lead to overexpression of regional genes at the protein level, even when direct selection for the activity of these proteins is not applied.

Fig. 6

Proteins in the amplified region are overexpressed. Western blot for selected proteins from the 8p11-12 region. Lane 1, MCF-7. Lane 2, MDA-MB-134VI, a human breast cancer cell line with known amplification of 8p11-12. Lane 3, the targeted, pre-amplified 38C-3 clone. Lanes 4–6, amplified subclones derived from 38C-3. ZNF703, FGFR1, RAB11FIP1, and ASH2L are encoded by genes on 8p11-12. Asterisks indicate C-terminal proteolytically processed fragments of FGFR1. Migration of molecular weight standards in kilodaltons is indicated on the right

Discussion

We have demonstrated the use of gene targeting technology to engineer site-specific chromosomal amplifications in a human breast cancer cell line. The observed amplifications represent the diversity of such events found in primary human breast tumors, including varying extent of the amplified region, broad homogeneous gain versus a sawtooth pattern of copy number change, and concurrent copy number loss of telomeric sequences. As in primary tumors, amplification led to overexpression of some, but not all, amplified genes. In some cases, amplified genes were overexpressed at the protein level as well.

We anticipate that this strategy can be applied to potentially any locus and can be extended to other cell types. Although we used AAV for gene targeting, other targeting approaches relying on homologous recombination could be used, such as CRISPR-Cas9 or TALEN technology. We also anticipate that other selection markers could be employed. For example, we have performed some preliminary experiments using the L22F DHFR mutant, which is more resistant to methotrexate and can be used as a dominantly amplifiable marker in the presence of intact cellular DHFR [9, 28]. In practice, it is unlikely that any single marker will be universally effective for all cell types, since cells will differ in their reliance on specific enzymatic pathways. Similarly, cells may become resistant to the selection drug through mechanisms other than gene amplification (as has been observed with methotrexate), so experimental approaches will have to be individualized and determined empirically.

The boundaries of the experimental amplifications are determined by the sites of DNA breakage and subsequent processes of replication, repair, and chromosome segregation. These forces are largely unknown and to some extent “random” from the experimenter’s point of view. However, we anticipate that our strategy could be modified by engineering specific, targeted double-strand breaks (for example, by incorporating an I-SceI endonuclease recognition sequence in the targeting cassette or by using CRISPR-Cas9) to attempt to control the initiation of the amplification event.

Although much has been learned about the amplification process, many unknowns remain. It would be of interest to engineer a cancer-associated amplification in a non-transformed cell to determine whether a specific amplification as a primary oncogenic event is sufficient to cause cellular transformation. We expect that it may be more difficult for “normal” cells to amplify genomic loci since they typically have few aberrations in DNA repair and cell cycle checkpoint genes that are likely important for facilitating the process [29]. Some studies have suggested that cells with intact p53 will be resistant to experimental amplification [30]. MCF-7 has wild-type p53 genes, but it has a number of other genetic aberrations which may allow amplification to occur despite intact p53. We believe that our system can provide a useful experimental platform to dissect the role of specific genes and exposures as modifiers of the amplification process at specific loci in distinct cell types.

It is not uncommon for recurrent chromosomal amplifications in common human cancers to harbor anywhere from several to dozens of genes. For some of these amplifications, including the 8p11-12 and 11q13 amplifications in breast cancer, multiple plausible driver genes remain after correlating copy number and gene expression. Some investigators have recognized the possibility that multiple oncogenes may cooperatively drive oncogenesis in such amplicons, although it is difficult to overexpress more than two or three candidate genes in a cell by traditional methods. The approach presented here is a potential experimental strategy to model such cooperativity in an isogenic cellular background.