The search for differentially expressed genes in tumours has made extensive use of array technology. Most studies have involved tumour biopsy samples or established tumour-derived cell lines [1]. Differentially expressed genes may help to identify tumours with high metastatic potential and pathways that might be therapeutic targets.

As noted by Dairkee et al. [2], one drawback of using established cell lines is that the process of immortalization in vitro can bias the expression profile when compared to native tissues. Another problem is that a number of widely-used cell lines have sub-lines that differ in their karyotypes and RNA expression levels [3–5]. Biopsies from tumours, unless they are microdissected, contain heterogeneous cell types. A molecular signature (or expression profile) of the whole tumour can therefore be misleading, since it is a composite of all the cells, normal and malignant, present in it. Although tumours may consist of several populations of cells that deviate slightly from the clonal strain of origin, metastases may involve only the sub-line(s) with specific genetic changes that favour metastatic behaviour. Cultures from freshly-derived ascites or pleural effusions may therefore be relatively homogeneous before they acquire the further mutations that immortalize cell lines with unlimited growth potential.

In this communication we report which genes, within a defined set of well-known cancer-related genes, are differentially expressed in freshly-derived breast cancer cell cultures compared with primary normal, established normal and breast cancer cultures. We compared expression patterns in cultures of finite life-span cells from pleural effusions or ascites of breast cancer patients (Mount Sinai School of Medicine: MSSM) with commercially-available finite life-span normal breast epithelial cells (HMECs). To ensure comparability, cultures were used at similar passage numbers and the karyotypes were analyzed. We also compared the expression profiles of some of the most widely-used established breast cancer cell lines and three putatively normal established breast cell lines.


Cultures of finite life-span cells

a): HMECs: Different batches of normal human mammary epithelial cell cultures were obtained (Biowhittaker Molecular Applications Inc., NJ). HMEC specimens were enumerated serially and maintained with MEGM (Clonetics, MD) supplemented with 2.5% FCS (Gibco Invitrogen, CA).

b): MSSMs: Breast cancer cells obtained from ascitic fluids or pleural effusions of patients with breast cancer were seeded and maintained in our laboratory. They were designated MSSM 3 through MSSM 14 and maintained in MEGM supplemented with 2.5% of the corresponding original fluid when available and 2.5% FCS (Gibco Invitrogen, CA). The full characterization of these cultures will be published elsewhere (manuscript in preparation).

HMECs were passaged only twice or thrice in our laboratory. MSSMs were harvested at the 5th or 6th passage after initial plating.

Cultures of established cell lines

a): Normal-established (N-est.): the cell lines used were derived from normal tissues: MTSV1-7 (a gift from J. Taylor-Papadimitriou), MCF10A (from R Mira y Lopez) and MCF10F (ATCC, MD).

b): Tumour-established (T-est.): cell lines established from breast cancer pleural effusions or tumour tissues: MDA-MB231, MDA-MB453, MDA-MB468, two strains of MCF 7 (designated MCF 7N and MCF 7P), T47D, BT20 and BT474 (all from ATCC, MD). They were maintained in medium and supplements as recommended by ATCC.

All cell cultures were harvested 48–72 h after plating at about 80% confluence. Sources and markers are given in Table 1.

Table 1 Cell origin and RNA expression of some frequently used markers


The Atlas Human Cancer 1.2 cDNA expression array (Clontech, CA) is a nylon membrane printed with 200–600 bp fragments of 1176 characterized genes involved in cancer, 9 housekeeping genes and 6 negative controls. RNAs were extracted and labelled with the Atlas pure total RNA labelling system and hybridized according to the manufacturer's instructions.

All the cell lines used for the arrays (9 HMECs, 10 MSSMs, 3 N-est and 7 T-est) were probed twice in separate assays. The accuracy of the duplicates was assessed by Pearson's correlation coefficient based on the adjusted intensities of all the genes spotted on the membrane, which ranged from 0.93 to 0.99.
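The duplicate check described above can be sketched as follows. This is a minimal illustration of Pearson's correlation coefficient between two replicate hybridizations, not the procedure actually run by the authors (who used Microsoft Excel); the function name and the intensity values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical adjusted intensities for the same five genes on two
# replicate membranes of one cell line (illustrative values only).
replicate_1 = [1200.0, 560.0, 8300.0, 150.0, 2400.0]
replicate_2 = [1100.0, 610.0, 7900.0, 210.0, 2550.0]
r = pearson_r(replicate_1, replicate_2)
```

In practice the two sequences would hold the adjusted intensities of all genes spotted on the membrane, in the same gene order for both replicates.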

Hybridizations with 30 μg of total RNA were performed according to the manufacturer's instructions. The hybridized membranes were exposed to a phosphorimager screen and were read at 100 μm resolution in a Storm Phosphorimaging system (Molecular Dynamics, CA). The scanned images were transformed to TIFF files with a PC bit order and then aligned and analyzed using AtlasImage 2.01 software (Clontech, CA). To average or compare the samples, the adjusted intensity signal was normalized using the global normalization mode featured in the software. We report only (a) those genes with significant (p < 0.01) differential expression of 2-fold or more, and (b) genes that were undefined for all the cell lines belonging to one type of sample, but were detected in other types with a difference at least equivalent to one background (540 units in intensity). Undefined genes are those for which the intensity was below the signal threshold.

The AtlasImage software compares only two samples at a time. When we used it to determine the differences between cell classes, we first averaged the cell lines in the four classes (HMEC, MSSM, N-est and T-est) and then performed the comparisons as instructed by the manufacturer.
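The class-level comparison can be sketched as below: average each class, then keep genes that differ by at least 2-fold and by at least one background (540 intensity units), or that are undefined in one class but detected in the other. This is an illustration under stated assumptions, not the AtlasImage algorithm. The function names and example intensities are hypothetical, undefined genes are represented as `None`, averaging ignores undefined values, a gene undefined in one class is kept when its mean in the other class reaches one background, and the p < 0.01 test is omitted.

```python
BACKGROUND = 540.0  # one background, in adjusted-intensity units (from the text)
MIN_RATIO = 2.0     # minimum fold difference (from the text)

def class_mean(profiles):
    """Average per-gene intensities over the cell lines of one class.

    `profiles` is a list of dicts mapping gene -> adjusted intensity, with
    None for genes below the signal threshold ("undefined"). A gene that is
    undefined in every cell line of the class gets a mean of None.
    """
    genes = set().union(*profiles)
    means = {}
    for gene in genes:
        values = [p[gene] for p in profiles if p.get(gene) is not None]
        means[gene] = sum(values) / len(values) if values else None
    return means

def candidate_genes(mean_a, mean_b):
    """Genes passing the fold and intensity-difference criteria (no p-value test)."""
    selected = {}
    for gene in sorted(set(mean_a) & set(mean_b)):
        a, b = mean_a[gene], mean_b[gene]
        if a is None and b is None:
            continue
        if a is None or b is None:
            # Undefined in one whole class, detected in the other (assumption:
            # the detected mean must reach one background).
            detected = b if a is None else a
            if detected >= BACKGROUND:
                selected[gene] = "undefined in one class"
            continue
        if min(a, b) <= 0:
            continue  # a ratio is not meaningful for non-positive intensities
        ratio = max(a, b) / min(a, b)
        if ratio >= MIN_RATIO and abs(a - b) >= BACKGROUND:
            selected[gene] = f"{ratio:.2f}-fold"
    return selected

# Hypothetical intensities for two cell lines per class (illustrative only).
hmec = [{"SPARC": 300.0, "JUP": 4000.0, "FES": None, "ACTB": 9000.0},
        {"SPARC": 500.0, "JUP": 3600.0, "FES": None, "ACTB": 9400.0}]
mssm = [{"SPARC": 2100.0, "JUP": 900.0, "FES": 800.0, "ACTB": 9100.0},
        {"SPARC": 1900.0, "JUP": 1100.0, "FES": 1000.0, "ACTB": 8900.0}]
selected = candidate_genes(class_mean(hmec), class_mean(mssm))
```

Requiring both a ratio and an absolute difference prevents genes with very low signal in both classes from passing on the ratio alone.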

Statistical analyses (correlation and two-tailed p values) were performed using Microsoft Excel 2000. Further analyses were performed with Significance Analysis of Microarrays (SAM) [6], Prediction Analysis for Microarrays (PAM) (Stanford Univ., USA), FatiGO [7], Pomelo tool and SVM (Bioinformatics unit, CNIO, Spain [8]) and GoMiner [9].

Quantitative real-time PCR (Q-PCR)

To validate the results of the cDNA array experiments, some of the genes found to be differentially expressed were further examined by real-time PCR in 10 HMECs, 9 MSSMs, 3 N-est and 8 T-est cell lines. Five μg of total RNA (corresponding to about 100 ng of mRNA) were reverse-transcribed with oligo(dT) (SuperScript II system, Invitrogen, CA) in a 20 μl reaction volume, and after 125-fold dilution, 1.25 μl were used for a 40-cycle PCR on an ABI Prism 7900 thermal cycler. The reaction was carried out in a 384-well plate with a QuantiTect SYBR Green PCR kit (Qiagen Inc, CA) at an annealing temperature of 63°C and detection at 2–5°C below the Tm of the product as determined from its dissociation curve. Product size was confirmed by agarose gel electrophoresis. The efficiency of each pair of primers for amplification was determined and expression of each gene relative to G3PDH was assessed by the program Qgene [10]. Primers were designed using the program PrimerQuest or Primer3, unless otherwise stated. Primer sequences, lengths and Tms of the products are given in Additional file 4.

Samples were tested twice in triplicate. Pearson correlation coefficients for the duplicate Q-PCR results ranged between 0.89 and 0.99.
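Two calculations in this subsection can be made explicit. The sketch below first works out how much template enters each PCR well from the stated volumes, assuming complete reverse transcription and no volume change. It then shows a general efficiency-corrected expression of a target gene relative to a reference gene such as G3PDH. The formula is the common efficiency-corrected form and is offered as an assumption; the text does not state the exact computation performed by Qgene. Function names and the Ct values are hypothetical.

```python
def template_ng_per_reaction(total_rna_ug, rt_volume_ul, dilution_fold, volume_used_ul):
    """Total-RNA equivalent (ng) of cDNA going into one PCR well."""
    ng_per_ul = total_rna_ug * 1000.0 / rt_volume_ul
    return ng_per_ul / dilution_fold * volume_used_ul

def relative_expression(eff_target, ct_target, eff_ref, ct_ref):
    """Efficiency-corrected expression of a target relative to a reference gene.

    Efficiencies are amplification factors per cycle (2.0 means perfect
    doubling); Ct values are threshold cycles averaged over replicates.
    """
    return (eff_ref ** ct_ref) / (eff_target ** ct_target)

# Values from the text: 5 ug RNA in 20 ul, diluted 125-fold, 1.25 ul per well.
per_well = template_ng_per_reaction(5.0, 20.0, 125.0, 1.25)

# Hypothetical Ct values: the target crosses threshold 5 cycles after the reference.
example = relative_expression(eff_target=2.0, ct_target=25.0, eff_ref=2.0, ct_ref=20.0)
```

With the volumes given in the text, each well receives the cDNA equivalent of 2.5 ng of total RNA. With perfect efficiencies, a 5-cycle delay corresponds to a relative expression of 1/32.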


Overall gene expression and class prediction

All the MSSM cultures displayed similar growth rates and had no or minimal chromosomal changes (data not shown). Table 1 shows the main characteristics of these cells.

Each cell line was probed twice. Averages of the duplicates after normalization of the adjusted intensities (as described in the Atlas manual and briefly in Materials and Methods) were used to obtain the expression values for further analysis. The overall gene expression profile, as determined by Pearson's correlation coefficient, discriminates between HMECs and cells derived from metastasis of breast cancer (MSSMs). As seen in Table 2 (and detailed in Additional file 1) the correlation within cell classes was > 0.90; the correlation between cell classes was < 0.82.

Table 2 Correlation between cell lines.

Hybridization signals of 928 genes (78.9%) were represented in at least three cell lines. (One class, N-est, comprised three cell lines; the other three classes, HMEC, MSSM and T-est, comprised more.) We used this subset of 928 genes, plus the 9 control genes, to construct a classification model using two prediction programs: PAM and SVM. Since these algorithms do not work well when the numbers of members differ among classes, we trained the SVM program using comparable numbers of lines from three classes (8 HMECs, 9 MSSMs and 7 T-est) and treated the three N-est cell lines and the remaining HMECs and MSSMs as unknowns to be classified. The model thus generated had a classification accuracy of 100% (24/24) by the leave-one-out cross-validation method. The prediction for the three N-est lines was T-est, suggesting that these cells have an expression profile resembling those of the other established cell lines rather than the cultures of finite life-span cells. The remaining HMEC and MSSM cell cultures were correctly assigned to their respective classes (see Additional file 5).
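Leave-one-out cross-validation, as used above to assess the model, can be sketched as follows. Each training sample is held out in turn, the classifier is rebuilt on the remaining samples, and the held-out sample is predicted. To keep the example self-contained, a simple nearest-centroid classifier stands in for the SVM; it is not the classifier used in the study, and the two-gene profiles are hypothetical.

```python
from math import dist

def centroid(rows):
    """Per-gene mean of a list of expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_predict(train, sample):
    """Return the label whose class centroid is closest to `sample`.

    `train` is a list of (label, vector) pairs.
    """
    by_class = {}
    for label, vec in train:
        by_class.setdefault(label, []).append(vec)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return min(centroids, key=lambda label: dist(centroids[label], sample))

def leave_one_out_accuracy(data, predict=nearest_centroid_predict):
    """Fraction of samples predicted correctly when each is held out in turn."""
    correct = 0
    for i, (label, vec) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, vec) == label:
            correct += 1
    return correct / len(data)

# Hypothetical two-gene profiles for three classes (illustrative only).
data = [
    ("HMEC", [1.0, 5.0]), ("HMEC", [1.2, 4.8]), ("HMEC", [0.9, 5.1]),
    ("MSSM", [4.0, 1.0]), ("MSSM", [4.2, 1.1]), ("MSSM", [3.9, 0.8]),
    ("T-est", [8.0, 8.0]), ("T-est", [7.8, 8.3]), ("T-est", [8.1, 7.9]),
]
accuracy = leave_one_out_accuracy(data)
unknown_label = nearest_centroid_predict(data, [7.5, 7.0])
```

Samples held back as unknowns, like the N-est lines in the text, are classified with the model built on the full training set, as in the last line.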

The same analysis with PAM, applying a threshold of 3.5, which gives the maximum number of significant genes yielding no misclassification error upon cross-validation, predicted MTSV1-7 and MCF10F as members of the "T-est" class and MCF10A as an HMEC-class member with a probability of 1. Again, HMEC 13 and MSSM 14 were properly assigned (Fig 1 and Additional file 2). To clarify further the classification of MCF10A, which is the most widely-used putatively normal breast cell line in array analysis, we examined the expression values of KT14, KT8/18 and KT19 (Fig. 2A and 2B) and CD104. The MCF10A values were similar to those of the HMECs.
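The role of the PAM threshold can be illustrated with the soft-thresholding step of the nearest shrunken centroid method: each gene's standardized difference between a class centroid and the overall centroid is shrunk toward zero by the threshold, and genes whose difference reaches zero in every class drop out of the classifier. The sketch below shows only this step, with hypothetical standardized differences; it does not reproduce the standardization or the class probabilities computed by PAM.

```python
def soft_threshold(d, delta):
    """Shrink a standardized centroid difference d toward zero by delta."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude

def surviving_genes(differences, delta):
    """Genes whose shrunken difference is non-zero in at least one class.

    `differences` maps gene -> {class label: standardized difference}.
    """
    return sorted(
        gene for gene, per_class in differences.items()
        if any(soft_threshold(d, delta) != 0.0 for d in per_class.values())
    )

# Hypothetical standardized differences (illustrative values only).
differences = {
    "SPARC": {"HMEC": -4.1, "MSSM": 5.0, "T-est": -0.9},
    "JUP": {"HMEC": 3.9, "MSSM": -4.0, "T-est": 0.1},
    "ACTB": {"HMEC": 0.4, "MSSM": -0.2, "T-est": -0.2},
}
kept = surviving_genes(differences, delta=3.5)
```

Raising the threshold removes more genes, which is why the text selects the threshold that keeps the largest gene set without cross-validation errors.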

Figure 1

PAM (Prediction Analysis of Microarrays). This is a statistical technique for class prediction from gene expression data using nearest shrunken centroids. This method identifies the subsets of genes that best characterize each class. a) Plot of the shrunken centroids for the specified threshold with the ranked list of significant genes that are used for the predictions. b) Prediction and probabilities of the test cell lines (MTSV1-7, MCF10A, MCF10F, HMEC 13 and MSSM 14).

Figure 2

Expression of KT14, KT18 and KT19. a) Plot shows the correlation of the adjusted intensity (AI) signal of the pair KT8/KT18 for each cell line. b) AI of KT14, KT18 and KT19 of each cell line and its relative percentage.

Comparison of non-immortalized normal and cancer breast cells

Of the 1176 genes in the array, 862 (73.3%) gave hybridization signals in at least three of the finite life-span cell lines used. Of these, 123 (14.3%) showed differential expression when HMEC and MSSM cells were analyzed using the AtlasImage software (ratio >2 and difference in adjusted intensity > 540), and 101 were deemed significant genes with SAM (q-value: 0.7298, median #FDR: 0.73718) (Additional file 6).

Of these 123 from Atlas and 101 from SAM, 75 genes were differentially expressed with at least a 2-fold change according to SAM and AtlasImage, a minimum difference in intensity of 540 units and a p-value < 0.01 (Excel). These two programs use different algorithms to calculate the ratio (AtlasImage) and fold (SAM) values, so the outputs are not exactly the same. Fig. 3 shows the relative expressions of these 75 genes in all the cell lines analyzed by the arrays using the software Pomelo Tool (FDR and p-values for each gene in Additional file 7).

Figure 3

Relative expression of all the 29 cell lines tested by Atlas arrays. Output of the Pomelo tool showing the log2 of the relative expression of the selected genes in each cell line to the average expression of all the cell lines. The 75 genes shown here were considered differentially expressed by SAM and by AtlasImage, with at least a 2-fold change for both programs, when we compared the HMEC and MSSM cell classes.
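The quantity plotted in Fig 3 can be sketched as the log2 of each cell line's intensity divided by the average over all cell lines. This is an illustration only: it assumes an arithmetic mean of positive intensities, which may differ from what the Pomelo tool computes internally, and the function name and values are hypothetical.

```python
from math import log2

def log2_relative_expression(intensities):
    """Map cell line -> log2(intensity / mean intensity over all cell lines).

    All intensities must be positive.
    """
    mean = sum(intensities.values()) / len(intensities)
    return {line: log2(value / mean) for line, value in intensities.items()}

# Hypothetical adjusted intensities of one gene in four cell lines.
gene = {"HMEC 1": 500.0, "HMEC 2": 500.0, "MSSM 3": 2000.0, "MSSM 4": 1000.0}
relative = log2_relative_expression(gene)
```

On this scale a value of 1 means twice the average expression and a value of -1 means half of it, so a 2-fold difference between two cell lines appears as a gap of one unit.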

Some genes showed a broad range of expression values among cell lines belonging to the same group: one or two individual cell lines over-expressed the gene (>5 fold the average of the cell class), while others in the same class gave no signal or were barely above background. Since we were looking for genes that could enable us to differentiate among types of cell lines, and hence be useful as markers for each class, we inspected the values of these genes manually. The aim was to exclude from the 75 genes in Fig 3 any that gave extreme values in 2 or more cell lines within a class; "extreme values" were those that lay in the range of the other class, or biased the average of the class in question to generate the required 2-fold change. Twenty-seven up-regulated and 21 down-regulated genes remained, in addition to the cytokeratins, which are not included in Tables 3 and 4 but are shown in Fig 2. Their distribution according to GO (Gene Ontology) terms by FatiGO at level 3 of biological processes is shown. These changes in gene expression stress the importance in the malignant process of both a diminished capacity for cell-cell adhesion and the remodelling of the extracellular matrix (ECM). For example, genes involved in adhesion and downstream signalling (integrins α4/α6, P-cadherin (CDH3), γ-catenin (JUP)) or inhibitors of remodelling (SPINT-2) were down-regulated, and ECM remodelling enzymes (MMP11, TIMP2 and the cascade TIMP1-SPARC-TGFBI) were up-regulated, as shown in Tables 3 and 4 (SVG file and p-values of GoMiner in Additional files 3 and 8 respectively).

Table 3 Gene expression of MSSM cells in relation to HMEC cells: up-regulated genes in MSSM cells.
Table 4 Gene expression of MSSM cells in relation to HMEC cells: down-regulated genes in MSSM cells.

Comparison of immortal and non-immortal breast cancer cells

Fifty genes showed significant >2-fold (according to SAM software) and >2.5-fold (according to Atlas) differences between the MSSMs and the established breast cancer cell lines. Twenty-five were up-regulated and 25 down-regulated in the established breast cancer cell lines compared to the finite life-span cultures. Most of the differentially expressed genes can be classified under the GO terms "metabolism" and "cell communication". In particular: most of the down-regulated genes seem to be related to remodelling of the extracellular matrix, cell adhesion and receptor-linked signal transduction, while the up-regulated genes are related to general signal transduction pathways and cell proliferation (Tables 5 and 6).

Table 5 Gene expression of T-est cells in relation to MSSM cells: 25 significant up-regulated genes in T-est cells.
Table 6 Gene expression of T-est cells in relation to MSSM cells: 25 significant down-regulated genes in T-est cells.

Real-time PCR

To validate the differences seen in the arrays, some genes were tested by real-time PCR. We selected several genes whose expression was significantly increased or decreased, according to SAM and Atlas, for both the MSSM/HMEC and T-est/MSSM comparisons: 14-3-3σ (also called SFN), SPINT2, FES, SPARC, BIGH3 (or TGFBI), TIMP1, TIMP2, MMP11 and DAB2. We also analyzed some genes that were deemed significant for only one of the comparisons: NOTCH1, PLAU, CDA and SERPINB2 (or PAI2), and a few genes that were not significant but are related to some of the aforementioned genes: tPA, PAI1, uPAR, DCK.

It has been reported that correlation between Q-PCR and array data is highly variable [11]. It depends, in part, on the sensitivity of arrays in detecting genes with low expression levels or saturation due to very high ones. The genes we have tested showed correlations that ranged from 0.563 to 0.959. For the comparison between the normal (HMEC) and the tumour (MSSM) finite life-span cultures, the Q-PCR results for the manually curated genes (as explained above) supported the findings of the arrays in 17/17 (100%) of cases. In the T-est/MSSM evaluation, agreement between the two techniques was found in 15/17 (88.2%) of the cases (Fig. 4 and data available upon request).

Figure 4

Relative expression of some genes validated by Q-PCR. a) Mean normalized expression (MNE) of SPARC. b) MNE of BIGH3 (TGFBI). c) MNE of PAI-1. d) MNE of NOTCH1 in HMEC and MSSM cells. e) Expression of NOTCH1 in MSSMs and established cell lines relative to MCF10A. Note the difference between the two MCF7s. f) MNE of CYR61. g) MNE of DAB-2. For the complete list of validated genes, see Table S1.

One of the genes excluded from the HMEC/MSSM comparison was CYR61, which was tested by Q-PCR. Even though the overall up-regulation value for this gene was consistent (SAM: 3.53, Atlas: 3.48, Q-PCR: 3.29), its expression levels crossed over to the values of the other cell class in 4/17 cell lines, thus invalidating CYR61 as a reliable marker by itself.


Established breast cancer cell lines have been widely used to study signal transduction pathways, test new pharmaceuticals and determine expression profiles that might predict the metastatic capabilities of primary tumours. In many cases MCF10A has been chosen as the "normal" control, even though this cell line has been reported to possess markers for both myoepithelial and luminal phenotypes [12].

One of the controversies about the use of HMECs as controls is the probable myoepithelial origin of these cells [13] based on expression of KT14 and CD104 (ITGB4) [14, 15]. The expression levels of these genes in the MCF10 cells were equivalent to those in the HMECs. Furthermore, the prediction of PAM, based on the 58 genes that discriminate between the different classes of cells used in this study, was that MCF10A was similar to HMECs. Hence, MCF10A cells would have the same limitation as controls as the HMECs. Nevertheless, HMEC cells have the advantage of not being immortalized and pooling them combines the genetic backgrounds of a large cohort.

Both classes of non-immortalized cell lines, HMEC and MSSM, are more homogeneous than the established ones (N-est and T-est), as seen in Table 2 and Additional file 1. This is probably due to the cumulative effects of the mutations accrued individually by the established cell lines during successive passages and immortalization.

As seen in Fig 3, the expression of some genes in a particular class is clearly different from the others, e.g. up-regulation of FES, MMP11, DAB2 and down-regulation of SPINT2, SFN, JUP for the MSSM cell lines. Others are more distinctive of a "state", e.g. distribution of cytokeratins and down-regulation of certain integrins (ITGA6/ITGB4, ITGA7) in all the tumour-derived cell lines. Therefore, the expression levels of these genes can be seen as specific attributes of certain classes. Their combined expression defines a profile that can be used to construct a model similar to those built by SVM and PAM for predicting the classification of an unknown cell line accurately, as shown in Fig 1b.

We tested only two sub-lines of MCF-7, which is well known to be highly variable [3–5], and a single sub-line of each of the other established cell lines. We recognize that other sub-lines might be different.

When the MSSMs are compared with the HMECs, several genes display differential expression in a mode consistent with previous publications, where they have been shown to be significant for malignant progression or metastasis: down-regulation of the tumour suppressor and inhibitor of mitotic phase entry 14-3-3σ (SFN) [16, 17], the serine protease inhibitor SPINT2 [18] and JUP (γ-catenin) [19]; and up-regulation of FES [20] and SPARC [21]. In addition, there is the "cadherin switching" (CDH3 = P-cadherin to CDH2 = N-cadherin) and its relationship to FGFR1 and MMP9 [22, 23].

SPARC expression was detected in 17/17 human breast tumour biopsies and to a lesser extent in some established cell lines [21]. It has also been associated with malignant progression and invasive potential in breast cancer [24, 25], and together with MMP11 in colorectal [26] and oesophageal cancer [27]. Its over-expression increases motility and invasion [28] and induces growth inhibition [29] in established breast cancer cell lines. In addition, it has been shown to induce expression of BIGH3 and PAI-1 [30]. MSSM cells showed up-regulation of SPARC, BIGH3, PAI1 and MMP11, while the T-est cells showed down-regulation of SPARC, BIGH3 and PAI1 (MDA-MB231 is an exception for the latter gene) (Fig 4A-B-C).

We also found genes whose behaviour did not fully agree with previous descriptions, namely NOTCH1, CYR61 and DAB2. The many and varied functions of Notch signalling, achieved through activation or down-regulation, have been recently reviewed [31]. In the MSSM samples, NOTCH1 and its ligand JAG2 are down-regulated. In this case, this pathway is more likely to function as a tumour suppressor than an oncoprotein. This conclusion would be less clear if we had only compared established cell lines (mainly MCF10A with both MCF7s) (Fig 4D–E). CYR61 is a pro-angiogenic, secreted protein encoded by a growth factor-inducible immediate-early gene. It is over-expressed in some invasive established breast cancer cell lines and in 30–36% of primary tumours [32, 33]. In this study we found only three cell lines with a truly high over-expression (MSSM6, MDA-MB231 and BT20) and five with a moderate over-expression compared to the mean expression in HMECs (between 2 and 4 fold). The same conclusion would have applied if we had considered MCF10A as a control (Fig 4F). DAB2 is considered a tumour suppressor since its expression is down-regulated in ovarian carcinomas and in some established breast cancer cell lines [34], and up-regulated during megakaryocyte differentiation [35]. Its continued expression in tumour cells led to growth inhibition or cell death [36] unless the cells were in contact with some type of basement membrane [37]. MSSM cells showed up-regulation of this gene (Fig 4G), perhaps because they grew as an attached cell line or because of their finite life-span phenotype in vitro.

Epithelial-mesenchymal transition (EMT) is considered a mechanism for carcinoma progression and metastasis, and the expression of vimentin (VIM) is its main marker. This view has now been extended to include whole pathways and a more complex relationship with the microenvironment of the cell [38, 39]. Among the other genes regulated during EMT [40], the MSSM cells showed up-regulation of COL6A1, SPARC, CDH2 and DAB2 and down-regulation of JUP and BTG2 (Tables 3 and 4).


Using arrays, we have studied 10 finite life-span breast cancer cell lines freshly isolated from metastatic pleural or peritoneal fluids, 9 finite life-span normal breast cell lines, 7 established breast cancer cell lines, and 3 established normal breast cell lines. We tested 1176 genes considered to be related to cancer. Within each cell class there was significant homogeneity of gene expression. Two clusters of genes distinguished the MSSMs from the HMECs. These 48 genes, which were differentially expressed by at least 2-fold, concerned cell-cell interactions and remodelling of the extracellular matrix. Fifty genes that were differentially expressed at least 2-fold between MSSMs and established breast cancer cell lines are generally considered to be involved in cell communication and metabolism. Established breast cancer cell lines have been used to model biochemical and pharmacological responses in human breast cancer; the differences from freshly isolated breast cancer lines imply they are not wholly satisfactory models.