Breast cancer is the most prevalent cancer worldwide and is the second leading cause of cancer-related deaths in women in North America [1, 2]. It is a complex disease in which multiple genetic factors can combine to drive pathogenesis [3–5]. Changes in copy numbers of genes such as ERBB2 and c-MYC have been extensively documented in breast cancer and are present in model cell lines [6–9]. Amplified (and overexpressed) genes are prime therapeutic targets; for example, the drug trastuzumab, which targets ERBB2, has been shown to improve breast cancer survival rates alone or in combination with other treatments [10–12].

Strategies to detect gene copy number alterations will facilitate the identification of novel molecular targets. Previous studies with 10-megabase (Mb) resolution conventional metaphase comparative genomic hybridization (CGH) have identified gross regions of recurrent chromosomal aberrations in multiple breast cancer cell lines including loci within chromosomes 1q, 8q, 11q13, 17q and 20q13. Many of these alterations proved to be relevant because they were also present in primary tumors investigated [13–15]. Recent advances in array CGH have greatly improved the resolution of this technology, enabling the detection of segmental copy losses and gains [16, 17]. Regional genomic arrays, providing contiguous or tiling coverage of a locus of interest, have been constructed for the fine mapping of commonly altered regions in breast cancer (such as 20q13) [18–20]. Whole chromosome arrays have been used to provide information at 500 kb intervals. For example, a chromosome 17 array was used to identify 13 regions of change present in breast cancer cell line models and primary breast cancers [21]. Similarly, a genome-wide array containing nearly 2,500 bacterial artificial chromosome (BAC) clones with a resolution of about 1.4 Mb was used to illustrate the detection of copy number alterations (CNAs) in various breast cancer cell lines [22]. Recently, a separate study using an array of 422 genomic loci detected frequent alterations at 1, 6, 7p, 9, 11q, 12q, 17, 20q and 22q in archival breast cancer specimens [23]. cDNA arrays have also detected DNA copy changes of amplicons containing ERBB2 on 17q [24–27]. More recently, a cDNA array containing 6,691 mapped human genes was used to explore the relationship between copy number alteration and gene expression changes in breast tumors and cell lines [28].
While large-insert clone megabase-interval CGH arrays and cDNA arrays provide a robust platform for the rapid survey of tumor genomes, valuable information could be overlooked as a result of their limited resolution. It is clear that a more detailed description of breast tumor genomes would require re-examination with a higher-resolution array platform.

Genetic, biochemical and pharmacologic studies of breast cancer have been greatly dependent on several commonly used model breast cancer cell lines: MCF-7, BT-474, SK-BR-3, T-47D, UACC-893, MDA-MB-231 and ZR-75-30. Indeed, a Medline search for studies involving at least one of these seven cell lines returns over 13,500 hits. These cells are known to harbor gross chromosomal aberrations; measuring the precise segmental copy number status across their entire genome may uncover novel discrete changes. In the current study we expanded the use of array CGH to survey the genomes of these breast cancer cells in unprecedented detail with a recently developed whole-genome tiling path array that covered the genome with 32,433 overlapping BAC clones [29]. Analysis at this resolution has led to the identification of novel features in these genomes and to the delineation of segmental genetic alterations that have escaped detection by conventional molecular cytogenetic techniques and previous marker-based or interval array CGH analysis.

Materials and methods

Cell line DNA

DNA from a panel of seven breast cancer-derived cell lines was obtained from the American Type Culture Collection: MCF-7, T-47D, SK-BR-3, MDA-MB-231, BT-474, UACC-893 and ZR-75-30. Pooled normal female DNA was used as the reference for all array CGH experiments (Novagen, Mississauga, ON, Canada). DNA was quantified with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

Array CGH

The seven cell lines were assayed for genetic alterations with a whole-genome tiling path BAC array in comparative genomic hybridization experiments. The submegabase-resolution tiling set (SMRT) array contains 32,433 overlapping BAC-derived DNA segments that provide tiling coverage over the human physical genome map. All clones were spotted in triplicate, resulting in 97,299 elements over two slides [29–31]. A detailed protocol is provided in Additional file 1.

Imaging and analysis

Hybridizations were scanned with an imaging system based on a charge-coupled device (Arrayworx eAuto; Applied Precision, Issaquah, WA, USA) and analyzed with SoftWoRx Tracker Spot Analysis software. Stringent criteria were applied to filter spot intensity data. Triplicate spots with a standard deviation greater than 0.075 were deemed unreliable and were excluded from our analysis [29]. Only data points with a ratio of signal intensity to background noise exceeding 15 were used in the analysis.
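The two filters can be sketched in Python as follows. This is a minimal illustration, not the SoftWoRx Tracker pipeline: the function name `filter_clone`, the input layout, the use of the sample standard deviation on log2 ratios, and the rule that at least two spots must survive the signal-to-noise filter are all assumptions. Only the 0.075 and 15 cut-offs come from the text.

```python
from statistics import mean, stdev

SD_MAX = 0.075   # maximum standard deviation across replicate spots (from the text)
SNR_MIN = 15.0   # minimum signal-to-background-noise ratio (from the text)


def filter_clone(log2_ratios, snrs, sd_max=SD_MAX, snr_min=SNR_MIN):
    """Return the mean log2 ratio of one clone, or None if it fails the filters.

    log2_ratios: log2 ratios of the replicate spots (hypothetical layout)
    snrs: signal-to-noise ratio of each spot, in the same order
    """
    # Keep only spots whose signal-to-noise ratio exceeds the cut-off.
    kept = [r for r, s in zip(log2_ratios, snrs) if s > snr_min]
    if len(kept) < 2:
        return None  # assumption: too few spots to estimate replicate spread
    # Sample standard deviation; the estimator actually used is not stated.
    if stdev(kept) > sd_max:
        return None
    return mean(kept)
```

A clone with replicate log2 ratios 0.80, 0.82 and 0.78 and adequate signal passes both filters, whereas one with ratios 0.8, 0.5 and 0.2 is rejected for poor reproducibility.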

Custom software (SeeGH) was used to visualize log2 ratios of clones with respect to location in the genome [32, 33]. Because of the complexity of the genomes of these cell lines with respect to ploidy, we set thresholds for high-level gains and losses at +0.8 and -0.7, respectively, to limit the number of regions for discussion. These thresholds capture the high-level or multi-copy changes previously reported while excluding the many low-level or single-copy changes common to these cell lines. The complete data set has been made publicly available for further inquiry. In addition, only those loci containing two or more altered overlapping clones were included in the analysis to reduce false positives, and breakpoints were confirmed with the publicly available aCGH-smooth software [34, 35].
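The calling rule can be illustrated with a short Python sketch. It is not the SeeGH or aCGH-smooth implementation. It assumes that clones are supplied in tiling-path order along one chromosome, that adjacency in that order stands in for clone overlap, and that the thresholds are inclusive. The function name `call_segments` is hypothetical.

```python
GAIN = 0.8       # log2 ratio threshold for a high-level gain (from the text)
LOSS = -0.7      # log2 ratio threshold for a high-level loss (from the text)
MIN_CLONES = 2   # at least two consecutive altered clones (from the text)


def call_segments(ratios, gain=GAIN, loss=LOSS, min_clones=MIN_CLONES):
    """Return (first_index, last_index, state) for each run of altered clones.

    ratios: log2 ratios ordered by genomic position on one chromosome.
    """
    def state(r):
        if r >= gain:      # inclusive comparison is an assumption
            return "gain"
        if r <= loss:
            return "loss"
        return None

    segments, start, current = [], 0, None
    # The trailing None closes a run that extends to the last clone.
    states = [state(r) for r in ratios] + [None]
    for i, s in enumerate(states):
        if s != current:
            if current is not None and i - start >= min_clones:
                segments.append((start, i - 1, current))
            start, current = i, s
    return segments
```

With this rule an isolated clone above +0.8 is discarded, which is the intended effect of requiring two or more altered clones.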

Fluorescence in situ hybridization

For fluorescence in situ hybridization (FISH) probe synthesis, DNA samples from BAC clones RP11-118L18, RP11-419H8, RP11-813P3 and RP11-790I13 were amplified with a modified ligation-mediated polymerase chain reaction protocol as described previously [31]. Imaging and analysis were performed as described previously [36].

Results and discussion

Whole-genome tiling path analysis of segmental alterations

SMRT array CGH technology provides a tool for assessing genomic aberrations comprehensively and in great detail. Comprehensive genomic profiles of segmental gains and losses for seven commonly used breast cancer model cell lines were revealed with this technology. Because of the large amount of data generated, we present the complete genomic profiles and frequency analysis in Additional files 2, 3, 4, 5, 6, 7 and 8 (Figure 1). The raw data of the signal intensity ratios of the 97,299 spots for each array CGH experiment have been made publicly available [33] and have also been deposited in the Gene Expression Omnibus (GEO) database at NCBI, series accession number GSE3106.

Figure 1

Comprehensive submegabase-resolution tiling set (SMRT) array comparative genomic hybridization profile of cell line UACC-893. Whole-genome SeeGH karyogram of UACC-893. Individual data points denote log2 ratios plotted against the corresponding chromosomal location. Log2 ± 0.5 scale bars are included for reference. Displacement of data points to the right and left of the centre line represents gain and loss, respectively. The inset shows a magnified view of the complex alteration on chromosome 17q.

Figure 1 demonstrates the details of a tiling path SeeGH karyogram, summarizing SMRT array CGH results for cell line UACC-893. Whole chromosomal arm gains can be seen at 1q, 5p, 7p, 8q and 10p, whereas arm losses are evident at 3p, 4p, 5q, 8p, 13q, 17p, 19p, 20p and Xp. Smaller segmental changes such as the telomeric gained region of 6p or loss at 10q are readily detected. Complex alterations indicating multiple levels of change are denoted by higher-level peaks embedded within a region of change, for example the central region of the 2p arm. The magnified display of 17q demonstrates the identification of a discrete CNA. Beginning at the centromere, we can see two regions of segmental loss separated by a high-copy-number amplicon containing the ERBB2 gene. The centromeric breakpoint of this amplicon is located between the overlapping regions of clones RP11-25P3 and RP11-592L16, whereas the telomeric breakpoint is located between clones RP11-686E5 and RP11-259G21. The second region of segmental loss at 17q21.1-q21.31 is followed by a large segmental gain and a second discrete multiple copy amplification at 17q25.1.

To establish detection sensitivity, we first examined previously reported regions of CNA. Our data indicated high-level gains at the c-MYC locus in SK-BR-3 and MCF-7 (+2.84 and +1.19 log2 ratios, respectively) corresponding to previously reported changes in copy number [7, 37, 38]. Similarly, BT-474, ZR-75-30, UACC-893 and SK-BR-3 are known to harbor a high-level amplification of the ERBB2 locus. SMRT array CGH, in addition to detecting the ERBB2 amplification, revealed several additional discrete changes on the 17q arm in these cell lines. In another example, a previously reported homozygous deletion at 3q13.31, detected by a 10K single nucleotide polymorphism (SNP) array in MCF-7, yielded a log2 ratio of -1.2 in our SMRT array CGH analysis [39]. Further comparison of SNP data and SMRT array CGH for cell line BT-474 showed that many of the alterations detected by SMRT array CGH were not clearly delineated or were not detected by the SNP platform (Additional file 9). Although SNP arrays offer the advantage of genotype data, they are only suited to the detection of large-scale changes in copy number. However, the two technologies are clearly complementary because each is designed to address a different question.
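For readers less familiar with log2 ratios, the following Python sketch converts the values quoted above into linear fold changes of sample signal over reference signal. The helper name `fold_change` is an illustrative assumption. The result is a signal ratio, not an absolute copy number, because the relation between the two depends on the ploidy of each cell line and on the dynamic range of the array.

```python
def fold_change(log2_ratio):
    """Convert a log2(sample / reference) ratio to a linear fold change."""
    return 2.0 ** log2_ratio


# log2 ratios quoted in the text
examples = [
    ("c-MYC, SK-BR-3", 2.84),
    ("c-MYC, MCF-7", 1.19),
    ("3q13.31, MCF-7", -1.2),
]

for label, value in examples:
    print(f"{label}: log2 {value:+.2f} -> {fold_change(value):.2f}x reference signal")
```

On this scale the +0.8 and -0.7 thresholds used in this study correspond to roughly 1.7-fold and 0.6-fold of the reference signal, respectively.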

Six of the seven cell lines (not MDA-MB-231) were previously profiled for genomic alterations with the use of a 6,691-gene cDNA microarray [28]. Pollack and colleagues showed numerous genomic alterations, both gains and losses, which were correlated with expression patterns on the same array platform. All the CNAs reported were detected by SMRT array CGH, along with the discovery of numerous novel alterations when re-evaluated at tiling path resolution. Known and novel CNAs for the seven genomes are summarized in Table 1. Interestingly, not all CNAs contain annotated genes, which is consistent with the fact that the annotation of coding and non-coding transcripts within the human genome sequence is a continuing process.

Table 1 High-level alterations detected by array CGH

Novel features of the genome of model cell lines

Among the seven cell lines, 75 regions of high-level (multi-copy) segmental gains and 48 regions of multi-copy loss were identified. Because these cell lines serve as model systems for investigating breast cancer biology, a detailed understanding of their genetic alterations is essential to the interpretation of studies with these cell lines. We first describe noteworthy features of the individual genomes and then compare across multiple profiles to identify common alterations.

MCF-7 genome

The MCF-7 genome harbors 21 high-level CNAs, summarized in Table 1. Remarkably, many of the previously reported regions of genetic alteration split into multiple segments upon tiling resolution analysis. The 1p13 amplification described previously [40] in fact divides into three distinct segments of high-level amplification: a 1,300 kb segment at 1p13.3, containing only two genes, those encoding arginine N-methyltransferase-6 (PRMT6) and netrin G1 (NTNG1); a 300 kb segment at 1p13.2, encompassing a single gene, that encoding potassium voltage-gated channel subfamily D member 3 (KCND3); and a 1,300 kb region at the centromeric end of 1p13.2, containing 20 genes including BCAS2, which has been shown to be amplified and overexpressed in breast cancer cell lines and tumors (Figure 2) [40–42]. Although a loss at 4p15-qter has been reported [14], we observed a 7 Mb loss at 4q34.3-q35.2. The same group also reported an 11p loss; however, our data show that this alteration represents a large 45 Mb segment at 11p15.5-p11.2 and an adjacent but distinct 2 Mb loss at 11p11.2. Similarly, amplifications at the distal end of 15q [13, 14] were fine mapped to reveal a 4.9 Mb high-level gain at 15q21.1-q21.3 encompassed by clones RP11-416B20 and 664B9 containing FGF7, CYP19A1 and MAPK6. A lower-level gain was also observed at 15q22.2-qter.

Figure 2

Magnified submegabase-resolution tiling set (SMRT) array comparative genomic hybridization profile of the 1p21.1-p11.1 region in MCF-7. Scale bars labelled ± 1 denote log2 ratio scale. Blue highlighted regions indicate the locations of independent amplicons.

BT-474 genome

BT-474 possesses the greatest number of high-level gains and complex alterations and has previously been profiled with the SMRT array CGH platform [29]. In brief, the 1q arm showed multiple rearrangements. A complex aberration at 1q21.2-q25.1 is highlighted by three peaks of high-level gain: 1q21.2-q21.3 (350 kb), 1q22-q23.1 (500 kb) and 1q24.2 (550 kb). In addition, two previously undocumented, distinct regions of gain were identified at 1q31.3 (1,650 kb) and 1q32.1 (950 kb). Figure 3b shows FISH verification of the 1q32.1 amplicon. Although a 1q42-qter gain has been previously reported for BT-474 [14], we observed four separate regions of high-level gain: 1q42.12-q42.13 (500 kb), 1q43 (450 kb), 1q43-q44 (850 kb) and 1q44 (1,700 kb). An 11q13-q14 gain was redefined by SMRT array CGH as a complex high-level amplification at 11q13.1-13.5 (19.8 Mb) containing two distinct and localized high-level peaks at 11q13.1 (700 kb) and 11q13.4 (1,050 kb).

Figure 3

Fluorescence in situ hybridization (FISH) analysis in SK-BR-3 and BT-474 cells. (a) SK-BR-3 interphase FISH. Clone RP11-118L18, labeled in spectrum green, is located within a 680 kb amplicon at 3q25.1; clone RP11-419H08, labeled in spectrum red, denotes an unchanged site at 3q11.2. (b) BT-474 interphase FISH. Clone RP11-813P03, labeled in spectrum red, is located at the peak of a 940 kb amplicon at 1q32.1; clone RP11-790I13, labeled in spectrum green, is located within an adjacent unchanged region.

In addition to fine mapping of regions previously reported, several prominent novel alterations were detected: high-level gains at 4q21.1 (2,700 kb), 9p13.3 (2,050 kb), 11q22.1-q22.2 (3,600 kb), 14q11.2-q21.1 (21 Mb) and 14q31.3-q32.12 (3,100 kb). Gains of 20q have been well documented in breast cancer [13, 20, 23, 43]. In BT-474 we observed four distinct segments with increased copy numbers: 20q11.22 (1.3 Mb), 20q13.11-q13.32 (14.8 Mb), 20q13.33 (300 kb) and 20q13.33-tel (1.4 Mb). The gene encoding prefoldin 4 (PFDN4) located within 20q13.11-13.32 has been shown to be overexpressed in those cell lines in which it is amplified, including BT-474 [18]. This chromosome arm also harbors regions of loss at 20q11.22 (650 kb) and 20q11.23-13.11 (7,150 kb) that have not previously been reported.

ZR-75-30 genome

In total, 11 high-level losses and 13 high-level gains were identified in ZR-75-30. Multiple discrete alterations were observed on chromosome arms frequently implicated in breast cancer, including 1p (four deletions), 8q (eight amplicons) and 17q (seven amplicons and four deletions). Novel segmental losses of varying sizes were detected at 4q21.1 (150 kb), 11q13.5-qter (57.6 Mb) and 21q11.2-q22.11 (16.3 Mb). The discrete high-level amplifications on 8q at 8q11.21 (700 kb), 8q13.3 (500 kb) and 8q22.1 (700 kb) encompassed interesting gene loci such as those for the following: protein kinase DNA-activated catalytic subunit (PRKDC), which might have a role in DNA repair and non-homologous DNA end joining; transient receptor potential cation channel A1 (ANKTM1), which, when overexpressed, affects normal eukaryotic cell growth; and cadherin 17 (CDH17), which shares structural features with the cadherin superfamily of calcium-dependent cell–cell adhesion proteins [44–47].

UACC-893 genome

High-level gains at 11q13-q14 have been documented in UACC-893 [14]. We also observed this alteration (11q13.3-q14.3, 24.7 Mb); however, we additionally discovered a discrete high-level gain at 11q22.1 (600 kb), which interrupts a portion of the gene locus for contactin 5 (CNTN5), a neural adhesion molecule. A novel gain at 7p21.1 (700 kb) was also detected that encompasses several gene loci, including those for anterior gradient 2 (AGR2) and breast cancer membrane protein (BCMP1). AGR2 expression has been shown to correlate positively with estrogen receptor expression and negatively with epidermal growth factor receptor expression in breast cancer tissue [48]. A loss at 16p12.1 (1,400 kb) was also observed.

SK-BR-3 genome

Amplifications at 3p22-pter in SK-BR-3 have previously been reported [13, 14]. We observed a 400 kb amplification at 3p22.2 as well as two novel regions of high-level amplification at 3q25.1 (700 kb) and 3q22.3-q23 (2,000 kb). Figure 3a shows FISH confirmation of the 3q25.1 amplification. Genetic alterations of 8q seem to be complex in SK-BR-3. We observed the three previously reported regions of gain at 8q13.2-q21.13 (10.6 Mb), 8q21.2-q21.3 (6 Mb) and 8q23.2-q24.21 (17 Mb). However, we also identified three distinct amplicons within the 6 Mb region (8q21.2 (300 kb), 8q21.3 (550 kb) and 8q21.3 (500 kb)) and four distinct high-level peaks within the 17 Mb gain described above: 8q23.3 (750 kb), 8q23.3 (350 kb), 8q24.12 (800 kb) and 8q24.21 (700 kb, contains c-MYC). We also observed four regions of deletion not previously reported on 8q: 8q21.3-q22.1 (6 Mb), 8q22.3-q23.1 (4.9 Mb), 8q24.22 (1.6 Mb) and 8q24.23-q24.3 (3.8 Mb). In addition to losses on chromosomes 3 and 8, our analysis identified novel regions of loss at 12q23.3-q24.11 (1.4 Mb) and 12q24.21-q24.31 (5.4 Mb) and further resolved a 17q12 gain into two distinct high-level gains at 17q11.1-11.2 (3.2 Mb) and 17q12-21.2 (3.4 Mb). In addition, a previously reported gain of 17q24-qter was fine mapped to a 1,550 kb amplicon at 17q25.3 [13, 14].

MDA-MB-231 genome

MDA-MB-231 possessed the fewest high-level alterations. Gains at 6p have previously been reported [14, 49]; however, two distinct regions of high-level gain were observed within this arm in our analysis: at 6p21.31-21.2 (3.5 Mb) and at 6p21.2-21.1 (3.3 Mb). We also observed a novel 670 kb gain at 7q35. Loss at 9p has also been reported; however, we were able to discern two distinct segmental losses, each containing an amplicon [49–51].

T-47D genome

T-47D was unique in that it possessed three times as many genomic losses as gains. We observed gains at 18p11.32-p11.32 (350 kb) and 18q21.1 (300 kb) that have not previously been reported [14, 38, 49, 51]. Only five genes reside within the 18q21.1 region: that encoding protein inhibitor of activated STAT2 (PIAS2), elongin genes TCEB3L2 and TCEB3L and hypothetical genes DKFZP564D1378 and HSPC039.

Common regions of copy number alteration

Gains at 8q, 17q and 20q are among the most frequently documented alterations in breast cancer. Six of the seven cell lines (MDA-MB-231 was the exception) showed high-level gains at one or more of these chromosome arms. Multiple alignment of genomic profiles delineated novel minimum altered regions (MARs) common to these cell lines.
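Conceptually, a MAR is the stretch of sequence shared by the altered segments of every cell line being compared. The Python sketch below shows that idea as a simple interval intersection. It is an illustration only: the function name `minimal_altered_region` is hypothetical, it assumes one altered segment per cell line on the same chromosome, and the coordinates in the test cases are invented. The MARs reported here were defined by aligning the tiling path profiles and their bounding BAC clones.

```python
def minimal_altered_region(intervals):
    """Return the (start, end) overlap shared by all intervals, or None.

    intervals: one (start, end) pair per cell line, in base pairs on the
    same chromosome, each describing that line's altered segment.
    Raises ValueError if intervals is empty.
    """
    start = max(s for s, _ in intervals)  # right-most segment start
    end = min(e for _, e in intervals)    # left-most segment end
    return (start, end) if start < end else None
```

A discrete amplicon in one line that lies entirely within a broader gain in another line therefore defines the MAR by itself, as in the 8q13.3 example that follows.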

Gains at 8q are arguably the most frequently documented alteration in a variety of cancers including breast and prostate cancer [5]. We have highlighted four MARs that were common to multiple cell lines (Additional file 10). First, a discrete 500 kb amplicon at 8q13.3 in ZR-75-30 is also included within the larger alteration at 8q13.2-q21.13 in SK-BR-3. Only one gene resides within this MAR: that encoding transient receptor potential cation channel subfamily A, member 1 (TRPA1). Hyman and colleagues [26] investigated 14 breast cancer cell lines including BT-474, MCF-7, SK-BR-3, T-47D and ZR-75-30 with a 13K cDNA array, identifying four independent genomic amplicons at 8q, including 8q21.11-q21.13, 8q21.3, 8q23.3-q24.14 and 8q24.22. However, the distinct amplicon at 8q13.3 in ZR-75-30 detected by SMRT array CGH was missed in that study. We observed a second, larger MAR at 8q21.2-q21.3 common to alterations in MCF-7 and SK-BR-3. About 20 genes reside in this 5 Mb region, including those encoding E2F transcription factor, exonuclease GOR and matrix metalloproteinase 16. A third MAR is located at 8q24.12-q24.21 and is common to MCF-7, ZR-75-30 and SK-BR-3, whereas lower-level gains are apparent in BT-474, UACC-893 and MDA-MB-231. Although the genes encoding zinc finger transcription factor (TRPS1) and eukaryotic translation initiation factor 3 (EIF3S3) are excluded from this MAR (c-MYC is included), some of the cell lines possess highly complex gains that extend through a much larger region of the arm and can include the TRPS1 and EIF3S3 loci. Savinainen and colleagues [37] reported 41 copies of TRPS1 and 21 copies of EIF3S3 and MYC in SK-BR-3. The fourth and most telomeric MAR, 8q24.3, has boundaries defined by a peak of high-level change within the large complex alteration 8q22.2-q24.3 found in ZR-75-30. MCF-7, BT-474 and UACC-893 share low-level gains within this region of about 10 genes.

Chromosome 17q gains have been well documented in both breast cancer cell lines and clinical cases [14, 15, 21, 39, 50, 52]. Re-examination of this chromosome arm at tiling resolution suggests that the 17q amplification is complex and involves multiple but distinct regions (Fig. 4). First, we identified a common high-level gain at 17q25.1 containing a narrow MAR of 760 kb bounded by BAC clones RP11-76G4 and RP11-552F3. The genes encoding RECQ protein-like 5 (RECQL5), H3 histone family 3B (H3F3B) and growth factor receptor-bound protein 2 (GRB2) reside within this gene-rich region, with GRB2 shown to interact with epidermal growth factor receptor (EGFR) [53]. Second, at 17q23, two separate amplicons in MCF-7 and one large amplicon in BT-474 have been described previously, although it is unclear whether these amplicons overlap and harbor the same candidate oncogene [25, 54]. Our data show the presence of a large complex alteration in MCF-7 at 17q21.32-q24.3 with a high-level amplification at 17q23.2. BT-474 contained two regions of complex alteration: one at 17q21.32-q23.2 comprising three distinct high-level peaks, and one at 17q24.1-q24.3 with a single peak. Similarly, three regions of high-level gain were observed in ZR-75-30 and one large region of lower-level gain in UACC-893. Interestingly, our alignment revealed that the high-level peaks involving the 17q23.2 region in MCF-7, BT-474 and UACC-893 do overlap, defining an 800 kb MAR from RP11-50F16 to RP11-653P10 containing candidate genes RPS6KB1, LOC51136, FLJ22087, CA4, NY-REN-60, APPBP2 and PPM1D.

Figure 4

17q SMRT aCGH profile of five cell lines sharing multiple minimum altered regions. Scale bars labelled ± 1 denote log2 ratio scale. Blue highlighted regions indicate the locations of MARs.

Another striking feature identified through our tiling resolution scan of 17q is the overlapping amplicons at 17q21.32-q21.33 present in BT-474 and ZR-75-30. The 600 kb MAR from RP11-71G24 to RP11-600O7 harbors the HOXB family (HOXB1 to HOXB9). Previously described by Hyman and colleagues [26], this amplicon is shown to be present in 10.2% of primary breast cancers, suggesting the involvement of developmental genes in breast cancer pathogenesis (Fig. 4).

Chromosome arm 20q has been shown to be frequently amplified in breast cancer, and amplification of 20q13 is associated with aggressive tumor phenotype, disease recurrence and reduced duration of survival [20]. We identified multiple copy number alterations within this cytoband and defined distinct minimal regions of alteration (Additional file 11). The detection of a 1.5 Mb MAR at 20q13.2 in MCF-7, BT-474 and SK-BR-3 (RP11-20J8 to RP11-346B3) containing the genes encoding zinc finger protein 217 (ZNF217), breast cancer-amplified sequence 1 (BCAS1), cytochrome P450 24A1 (CYP24A1), prefoldin 4 (PFDN4) and docking protein 5 (DOK5) is consistent with previous CGH studies that identified amplification of this region in breast cancer [18, 19]. Similarly, we identified a MAR at 20q13.31 from RP11-44A6 to RP11-671P16, containing the gene encoding bone morphogenetic protein 7 (BMP7), SPO11 and the gene encoding RNA export 1 (RAE1), corresponding to a previous report in MCF-7 and BT-474 [43]. A large 1.5 Mb amplification at 20q13.12 has also been reported in MCF-7 and BT-474 [43]. Our analysis identified an amplification at 20q13.12-q13.13 common to MCF-7, BT-474 and SK-BR-3. This spanned BAC clones RP11-702E3 to RP11-637D22, defining a narrow 680 kb MAR that implicates the genes encoding protein kinase C-binding protein (PRKCBP1) and nuclear receptor coactivator (NCOA3) as potential oncogenes relevant to breast cancer.

EGFR (ERBB1) and associated pathways

The EGFR and associated pathways have an important role in several aspects of mammalian cell growth such as cell survival, proliferation and differentiation [55, 56]. The receptor family is composed of four type-1 tyrosine kinases (ERBB1 to ERBB4) that dimerize after stimulation by ligand and initiate downstream signaling. Receptor ligand recognition is redundant to some extent, and affinity varies. Although ERBB2 has no known ligand, it becomes activated after heterodimerization with other ERBB family members, the most preferred and potent combination being with ERBB1, whereas the ERBB3 homodimer remains inactive [57].

The redundancy of this pathway suggests its importance, as cells have invested in mechanisms that make this regulatory pathway fail-safe. We have investigated genomic loci for about 60 genes implicated in this pathway (Table 2) [56]. Overall, gains were 2.4 times more frequent than losses, and all cell lines contained at least three loci of change. Our data revealed that, as expected, the ERBB2 locus is highly amplified in four cell lines (UACC-893, ZR-75-30, BT-474 and SK-BR-3), and overexpression has been shown in two of them [9]. Although amplification of EGFR-interacting genes RECQL5, H3F3B and GRB2 has been described above, other frequently altered loci include c-MYC, LIMK1, PRKCA, CHN2, ERBB2, PYK2, MAP2K3, MAP2K6 and PLG1. Interestingly, T-47D and the two ERBB2-overexpressing lines, BT-474 and SK-BR-3, share amplifications at five gene loci: MAP2K6, CHN2, PRKCA, LIMK1 and c-MYC.

Table 2 Components of the epidermal growth factor receptor pathway affected by copy number change


We examined the genomes of seven commonly used breast cancer cell models in unprecedented detail for segmental copy number status, cataloging the boundaries of gains and losses throughout these genomes. In addition, we demonstrated that copy number alteration of multiple genetic loci involved in the EGF family of pathways is common in these cell lines, which suggests that disruption of this frequently dysregulated pathway in breast cancer may occur at several points in the signaling cascade and that several disruptions may occur in combination.

Furthermore, because these cell lines serve as models for studying the molecular biology of breast cancer, it is essential to take into account the potential influence of genetic alterations when interpreting biological data. For example, when these lines are used to study the EGF family of pathways, multiple endogenous genetic alterations may influence biochemical and biological observations. Our work provides a comprehensive list of high-level segmental gains and losses for each genome, providing a database of copy number alterations as a resource for breast cancer research with these cell lines.