Background

About ten percent of breast cancer patients have a history of multiple breast cancer cases in their family, suggesting the inheritance of breast cancer susceptibility alleles in these families. Germline mutations in the BRCA1 and BRCA2 genes are identified in about one quarter of the families with breast cancer. Female carriers of BRCA1 and BRCA2 mutations have an estimated 50–90% lifetime risk of developing breast cancer, classifying both genes as high-risk susceptibility genes [1, 2]. Other high-risk breast cancer genes include the p53, PTEN and STK11 genes, but mutations in these genes account for only a few familial breast cancers. CHEK2 was the first moderate-risk breast cancer gene to be identified [3–5]. Germline mutations in CHEK2 are identified in up to 5% of breast cancer families, although their prevalence varies widely among populations. Female carriers of CHEK2 mutations have a moderate two- to three-fold increased risk of developing breast cancer. By now, several other moderate-risk breast cancer genes have been identified, including ATM, BRIP1 and PALB2 [6–9]. Mutations in these genes all confer two- to three-fold increased breast cancer risks, and mutations in each of these genes are identified in about 1% of the familial breast cancers.

Recently, the international Breast Cancer Association Consortium (BCAC) conducted a large genome-wide association study and identified five single nucleotide polymorphisms (SNPs) that were associated with breast cancer [10]. Four of these SNPs were within haplotype blocks that contained genes: SNP rs2981582 is located in intron 2 of the FGFR2 gene at chromosome 10q; SNP rs889312 is located near MAP3K1 at 5q; SNP rs3803662 is located between TNRC9 and the LOC643714 gene at 16q; and SNP rs3817198 is located in an intron of LSP1 at 11p. SNP rs13281615 is located at 8q24 in a region without any annotated genes. Importantly, independent genome-wide association studies have associated other SNPs in FGFR2 with breast cancer [11, 12]. As FGFR2 had already been implicated in breast cancer [13–20], the significance of the FGFR2 SNPs as susceptibility alleles seemed evident. The TNRC9 SNP had also been associated with breast cancer in another study [21]. Lastly, the 8q24 SNP was of particular interest because other SNPs at 8q24 had been associated with increased risks of prostate cancer and colorectal cancer [22–26]. BCAC estimated that each of the five identified SNPs was associated with a rather small increase in breast cancer risk, ranging from just over 1.0 to 1.3 fold, classifying them as low-risk susceptibility alleles [10]. However, these low-risk SNPs are very common and their impact is therefore still substantial, together accounting for almost 5% of the familial breast cancers.

The mechanism by which the low-risk susceptibility alleles confer breast cancer risks was obscure [10]. By analogy with the high-risk and moderate-risk breast cancer genes, it had been anticipated that the identified SNPs would be associated with disease-causing alleles in the coding sequences of nearby genes. However, extensive sequencing efforts have not identified such alleles in the SNP-associated haplotype blocks, suggesting that the SNPs themselves might be the disease-causing susceptibility alleles [10]. BCAC therefore proposed an alternative disease mechanism in which the SNPs modulate the expression of genes located in their vicinity, thereby conferring low breast cancer risks. Here, we have evaluated expression modulation in a well-characterized cohort of 40 human breast cancer cell lines, allowing us to specifically address whether this mechanism might operate in breast cancer cells.

Methods

Breast cancer cell lines

The 40 human breast cancer cell lines used in this study are listed in Table 1 and have been described in detail elsewhere [27]. Microsatellite analysis with nearly 150 polymorphic markers had shown that all cell lines are unique and monoclonal [28].

Table 1 Genotypes of five low-risk SNPs in 40 human breast cancer cell lines

Genotyping

Genotypes of five low-risk susceptibility alleles were determined: rs889312 (A>C) near the MAP3K1 gene; rs2981582 (C>T) in the FGFR2 gene; rs3803662 (C>T) near the TNRC9 gene; rs3817198 (T>C) in the LSP1 gene; and rs13281615 (A>G), which is located in a gene desert at chromosome 8q24 [10]. Genotyping was performed by direct sequencing of PCR-amplified genomic templates, using the BigDye Terminator V3.1 Cycle Sequencing Kit (Applied Biosystems) and an ABI 3130xL Genetic Analyzer. Primer sequences are available upon request.

Allele frequencies in cases and controls were inferred from the Odds Ratios reported by BCAC [10], assuming that each Odds Ratio reflects the ratio of minor allele carriers to major allele carriers among the cases divided by the ratio of minor allele carriers to major allele carriers among the controls.
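The inference described above can be written out as a short calculation. The following is a minimal sketch, assuming the Odds Ratio is the minor-to-major allele odds in cases divided by the same odds in controls, with odds = f / (1 - f) for a minor allele frequency f. The function name and the example numbers are illustrative only and are not values from the BCAC report.

```python
def case_allele_frequency(control_maf: float, odds_ratio: float) -> float:
    """Infer the minor allele frequency in cases from the control
    frequency and an odds ratio, assuming
    OR = [f_case / (1 - f_case)] / [f_ctrl / (1 - f_ctrl)]."""
    if not 0.0 < control_maf < 1.0:
        raise ValueError("control_maf must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    control_odds = control_maf / (1.0 - control_maf)
    case_odds = odds_ratio * control_odds
    # Convert the odds back to a frequency.
    return case_odds / (1.0 + case_odds)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    print(round(case_allele_frequency(control_maf=0.25, odds_ratio=1.2), 4))
```

With an Odds Ratio of 1 the function returns the control frequency unchanged, which is a quick sanity check on the assumed definition.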

Expression analysis

Transcript expression levels of four genes were determined: MAP3K1, FGFR2, TNRC9 and LSP1. Quantitative real-time PCR (qPCR) was performed on cDNA templates that had been generated with oligo-dT and random hexamer primers from total RNA isolates, using Power SYBR Green PCR Master Mix (Applied Biosystems) and an ABI Prism 7700. Ct values were normalized to the Ct values of the housekeeping genes HPRT and HMBS. Transcript expression had also been determined by Human Exon 1.0 ST microarrays (Affymetrix), as described elsewhere [29]. The exon array data have been deposited in NCBI's Gene Expression Omnibus [30] and are accessible through GEO Series accession number GSE16732.
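The exact normalization formula is not given here. One common approach, shown below as a hedged sketch, subtracts the mean Ct of the reference genes from the Ct of the target gene (a delta-Ct). The averaging of HPRT and HMBS, the function names, and the assumption of perfect amplification efficiency in the conversion to relative expression are all assumptions of this example, not a description of the procedure actually used.

```python
from statistics import mean
from typing import Sequence


def normalize_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Return delta-Ct: the target Ct minus the mean Ct of the
    reference (housekeeping) genes. Assumed scheme, see lead-in."""
    if not reference_cts:
        raise ValueError("at least one reference Ct is required")
    return target_ct - mean(reference_cts)


def relative_expression(delta_ct: float) -> float:
    """Convert delta-Ct to relative abundance, assuming the amount of
    product doubles every cycle (100% amplification efficiency)."""
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values for one cell line.
    d_ct = normalize_ct(target_ct=26.0, reference_cts=[24.0, 22.0])
    print(d_ct, relative_expression(d_ct))
```

Note that a lower Ct corresponds to a higher transcript abundance, so normalized Ct values and array intensities are expected to run in opposite directions.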

Statistical analysis

Statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS) version 11.5, with P-values below 0.05 considered significant. Fisher's exact test was used to determine association of the SNP genotypes with the breast cancer cell lines. The Kruskal-Wallis test was used to compare gene expression levels among three SNP genotype groups (major homozygotes, heterozygotes, and minor homozygotes).
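For readers who want to reproduce the three-group comparison outside SPSS, the sketch below computes the Kruskal-Wallis H statistic with tie correction in plain Python. It is an independent illustration, not the SPSS implementation. The P-value uses the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2); with small groups this is only approximate. The function names and the expression values are made up for the example.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(maj_hom, het, min_hom):
    """Return (H, P) for three genotype groups; P from chi-square, df = 2."""
    groups = [list(maj_hom), list(het), list(min_hom)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each genotype group needs at least one value")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    return h, math.exp(-h / 2.0)


if __name__ == "__main__":
    # Hypothetical normalized expression values per genotype group.
    print(kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```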

Results and discussion

Genotyping of low-risk susceptibility alleles in breast cancer cell lines

Genotypes of five low-risk susceptibility alleles [10] were determined in a cohort of 40 human breast cancer cell lines. For each SNP, frequencies of major homozygotes, heterozygotes and minor homozygotes are shown in Figure 1a and genotypes are detailed in Table 1. Frequencies of homozygote genotypes typically were higher than anticipated, likely related to allelic losses in the cell line samples (Figure 1a; [10]). For four SNPs (8q24, MAP3K1, FGFR2 and TNRC9), the minor allele frequencies among the cell lines were higher than among the 21,860 BCAC breast cancer cases and 22,578 population controls (Figure 1b; [10]). Fisher's exact testing indicated that the minor allele frequencies among the cell lines were significantly higher than among the BCAC population controls for two SNPs: MAP3K1 and TNRC9 (Figure 1b). In Tables 1 and 2, we also included previously determined phenotypic and genotypic data on the breast cancer cell lines, including data on molecular subtyping and allelotyping (Hollestelle et al. submitted for publication; [28]). Together with the SNP genotypes, these data provide a baseline for functional studies in this cohort of breast cancer cell lines.
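The comparison of minor allele frequencies can be illustrated with a two-sided Fisher's exact test on a 2×2 table of allele counts. How the counts were tabulated for the analysis above is not stated, so the sketch below simply assumes a table with minor and major allele counts in two groups. The function name and the counts are hypothetical, and the two-sided P-value sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]],
    e.g. rows = groups, columns = minor / major allele counts."""
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1.0 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical allele counts: group 1 (3 minor, 1 major),
    # group 2 (1 minor, 3 major).
    print(fisher_exact_two_sided(3, 1, 1, 3))
```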

Figure 1

Genotypes and minor allele frequencies of five low-risk breast cancer susceptibility alleles or SNPs in human breast cancer cell lines. 1a. Gray bars represent SNP genotype frequencies of 21,860 blood-derived samples from breast cancer cases reported by the Breast Cancer Association Consortium (BCAC) [10], and white bars represent genotype frequencies in 40 breast cancer cell lines. Maj H, major homozygotes; Min H, minor homozygotes; and Het, heterozygote allele carriers. The major and minor alleles of each SNP are indicated in brackets. 1b. Black and gray bars represent minor allele frequencies in 22,578 population controls and 21,860 breast cancer cases, respectively, as reported by BCAC [10]. White bars represent frequencies identified in 40 breast cancer cell lines.

Table 2 Molecular and phenotypic characterizations of 40 breast cancer cell lines

Expression levels of nearby located genes in breast cancer cell lines do not correlate with their SNP genotype

Surprisingly, BCAC had not identified disease-causing gene variants within the haplotype blocks of the five low-risk SNPs [10]. They proposed an alternative disease mechanism, in which SNP genotypes modulate expression levels of nearby genes. Such a disease mechanism was conceivable because the minor SNP alleles confer only low risks for breast cancer. Here, we have evaluated whether gene expression modulation is operative in breast cancer cell lines, by associating SNP genotypes of the breast cancer cell lines with the expression levels of nearby genes.

Gene expression data of the four genes physically nearest to the SNPs were obtained by Affymetrix Human Exon 1.0 ST microarray profiling and by qPCR analysis. Both transcript expression analysis methods revealed similar expression levels for each of the four genes, MAP3K1, FGFR2, TNRC9 and LSP1, with Spearman correlation coefficients of -0.6, -0.7, -0.8 and -0.4, respectively, among the 40 breast cancer cell lines (Table 3 and Figures 2 and 3). Because BCAC had shown that the low-risk SNPs confer breast cancer risks in a dose-dependent manner, with the highest risks for the minor homozygotes [10], association between gene expression levels and SNP genotypes was evaluated by three-group comparisons. Exon array data are shown in Figure 2, with cell lines from each genotype group depicted in a different color. Unique outliers typically represented decreased expression of one or more probe sets, such as exon 17 of MAP3K1 or exons 3–5 of TNRC9, possibly related to the presence of SNPs in probe sequences, alternative splicing or genomic deletions [29]. Expression of recurrent isoforms as reported by NCBI was detected only for the FGFR2 gene, with two cell lines expressing the isoform that lacks exon 9. Both cell lines were minor homozygotes for the FGFR2 SNP. Overall, there was no apparent association between the exon array expression level of each of the four genes and their SNP genotypes (Figure 2). The qPCR Ct values are detailed in Table 3 and the three-group comparisons are shown in Figure 3. Again, we did not detect any association between gene expression levels and SNP genotypes for the four genes. It is possible that gene expression levels are affected by allelic loss of the gene loci. We therefore also compared gene expression levels in major and minor homozygotes with allelic loss to those in cell lines without allelic loss, but gene expression levels did not correlate with allelic losses either (Table 4).

Altogether, these results strongly suggest that a putative disease mechanism by expression modulation does not operate via cancer cells. Yet, recent studies have shown that expression levels of the FGFR2, MAP3K1 and TNRC9 genes are associated with their SNP genotype in clinical breast cancer samples [31, 32]. It may be that expression modulation is operative in non-neoplastic stromal or epithelial cells and perhaps only early in carcinogenesis. Alternatively, it may be that expression modulation of these genes was operative in invasive breast cancer cells but was lost upon in vitro propagation of the cell lines. Expression analysis of carefully dissected tumor cells and non-neoplastic epithelial and stromal cells from clinical breast cancer samples should resolve this issue and may determine the precise mechanism of expression modulation by low-risk breast cancer susceptibility alleles.
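The agreement between the qPCR and exon array measurements reported above is expressed as Spearman correlation coefficients. The sketch below shows how such a coefficient can be computed as the Pearson correlation of average ranks. It is a plain-Python illustration with made-up values, not the SPSS routine used for the analysis. The negative sign of the reported coefficients is presumably a consequence of comparing Ct values, which fall as transcript abundance rises, with array intensities, which rise with abundance; that reading is an interpretation and is not stated explicitly in the text.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(list(x)), average_ranks(list(y))
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: normalized Ct values versus array intensities.
    ct_values = [1.0, 2.0, 3.0, 4.0, 5.0]
    array_intensity = [900.0, 700.0, 400.0, 250.0, 100.0]
    print(spearman_rho(ct_values, array_intensity))
```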

Figure 2

Normalized expression levels from Affymetrix Human Exon 1.0 ST microarrays of 2a. MAP3K1; 2b. FGFR2; 2c. TNRC9; and 2d. LSP1; from 40 human breast cancer cell lines. Kruskal-Wallis testing using the average expression among all probe sets for each gene did not reveal significant associations between gene expression and SNP genotypes. Each line represents a cell line, with the color-coding according to the genotype groups: green, major homozygotes; red, minor homozygotes; and blue, heterozygotes. Two cell lines with the delEx9 isoform of FGFR2 are indicated with bold red lines. Probe sets for each gene were ordered by physical location and indicated by exon; probe sets that were not unique for that gene were omitted. Probe sets with expression values less than the background of 50 were also omitted, unless more than 3 cell lines had expression levels higher than 100.

Figure 3

Correlation of gene expression levels of 3a. MAP3K1; 3b. FGFR2; 3c. TNRC9; and 3d. LSP1; with the SNP genotypes in 40 human breast cancer cell lines. Kruskal-Wallis testing did not reveal any significant association between gene expression and SNP genotypes. Maj H, major homozygotes; Min H, minor homozygotes; and Het, heterozygote allele carriers. The number of cell lines in each genotype group is indicated under the genotypes and data are detailed in Table 1.

Table 3 Gene expression analysis of MAP3K1, FGFR2, TNRC9 and LSP1 in 40 human breast cancer cell lines by quantitative RT-PCR, represented by normalized Ct values
Table 4 Gene expression of MAP3K1, FGFR2, TNRC9 and LSP1 in human breast cancer cell lines according to their allelic loss status at the gene locus

Conclusion

We present the genotypes of five low-risk susceptibility alleles or SNPs in 40 human breast cancer cell lines. Using this cell line model, we have evaluated the BCAC hypothesis that low-risk SNPs confer breast cancer risks by modulating expression levels of nearby genes. We found no evidence for expression modulation in the breast cancer cell lines, suggesting that such a disease mechanism is more likely to operate in non-neoplastic epithelial or stromal cells or has been lost during in vitro propagation of the cell lines.