Cancer stem cell theory proposes that tumor biology is driven by cancer stem and progenitor cells present in the tumor.1–5 In culture, these cells are refractory to treatment and may be responsible for the high recurrence rate observed in both estrogen receptor-negative and -positive (ER− and ER+) breast cancers.6–8 The process by which cancer stem cells arise is unknown. It is not known whether stem cells are present in all breast cancers or to what extent they contribute to the overall number of tumor cells. One possible mechanism by which cancer stem cells may form is the malignant transformation of normal stem cells through the acquisition of somatic genetic abnormalities. In this proposed model, breast stem cells would initiate breast cancer and orchestrate tumor progression. They would retain some or all stem cell functional capacities but, through mutational events, would acquire malignant capabilities as well.

Attempts to identify and isolate breast stem cells have relied upon expression of cell surface markers. Although a definitive marker profile is not yet known, research in the mouse and human mammary gland indicates that benign breast stem cells do not express cell surface markers CD31 or CD45 but do express CD49f and CD24 (lin−CD49f+CD24+ cells).9–13 In contrast, malignant breast stem cells have been distinguished from benign breast stem cells by a variance in the cell surface marker profile.14 They reportedly express CD44 but do not express CD24 (lin−CD44+CD24−). However, the CD44+CD24− expression profile may not be an absolute requirement of breast cancer stem cells.15

In this study, we sought to determine whether certain cancer-associated genetic abnormalities are also present in breast cancer stem/progenitor populations. If so, this would suggest that these abnormalities are involved in the initiation of breast cancers that progress to clinical detection. Two genes commonly disrupted in breast cancers are PIK3CA and AKT1. The frequency of PIK3CA mutations is 8–40%, with mutations residing primarily in two hotspots of the gene, exons 9 (E545K) and 20 (H1047R).16–19 A recent study observed PIK3CA mutations in matched samples of in situ and invasive tumors, suggesting that this mutation may occur early in breast cancer development.20 PIK3CA encodes the p110alpha catalytic subunit of phosphoinositide 3-kinase (PI3K).21 When mutated, it increases the catalytic activity of PI3K and the phosphorylation of AKT1, inducing oncogenic transformation.22–24 AKT1 mutations are rarer in breast cancer, with a frequency of 2–8%.25–27 Mutations in exon 2 of the AKT1 gene (E17K) are similar to the PIK3CA mutations in that they result in constitutive activation of AKT1. The biological consequences of AKT1 activation are increased cell proliferation, survival, and motility.28 Coexistent mutations in AKT1 and PIK3CA are reportedly infrequent in breast cancer.29,30

Much of what is known about human breast cancer stem cells has been learned through the in vitro study of breast cancer cell lines. This study was designed to achieve an in vivo genetic examination of uncultured stem/progenitor cells freshly obtained from surgical specimens.

Methods

Benign and Cancer Breast Tissue Procurement

This study was approved by the institutional review board. Patients with invasive ductal carcinomas were enrolled. Cancer specimens were collected at the time of the mastectomy or lumpectomy, before any adjuvant treatment. Pathologically confirmed benign breast tissue specimens were obtained from women undergoing reduction mammoplasty. Two breast cancer cell lines, HCC1954 and T47D, were included in this study.

Collection of Breast Cancer Stem/Progenitor Cells (FACS)

All specimens were minced and digested in mammary epithelial cell-specific medium containing 1× collagenase/hyaluronidase (Epicult, StemCell Technologies). Cell lines were cultured in RPMI-1640 (Roswell Park Memorial Institute-1640) medium supplemented with 10% serum and 0.05% gentamicin. Approximately 10⁶ cells were labeled with fluorochrome-conjugated monoclonal antibodies against human CD45 (FITC), CD31 (FITC), CD24 (PE), CD49f (PE-Cy5), and CD44 (PE-Cy7). Isotype control testing indicated no nonspecific binding. Subpopulations were separated based on surface antibody labeling and collected by discriminatory gating. CD31+ endothelial cells and CD45+ leukocytes were removed, leaving the lineage-negative (lin−) cells. These cells were sorted into four lin− populations: CD49f+CD24+, CD49f+CD24−, CD49f−CD24+, and CD49f−CD24−.
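The gating logic described above can be illustrated with a minimal sketch. It assumes that each cell has already been reduced to positive/negative calls for the four markers; in practice these calls come from fluorescence thresholds set against isotype controls, which are not modeled here. The function names and the example events are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify_cell(cd31, cd45, cd49f, cd24):
    """Return the lin- subpopulation label for one cell, or None if lineage-positive.

    Each argument is True when the cell is positive for that marker.
    """
    if cd31 or cd45:
        return None  # endothelial cell or leukocyte: excluded from the lin- gate
    return f"CD49f{'+' if cd49f else '-'}CD24{'+' if cd24 else '-'}"


def subpopulation_fractions(cells):
    """Fraction of all counted cells that falls in each lin- subpopulation."""
    labels = [classify_cell(**cell) for cell in cells]
    counts = Counter(label for label in labels if label is not None)
    return {label: n / len(cells) for label, n in counts.items()}


# Hypothetical events, for illustration only (not study data)
cells = [
    {"cd31": False, "cd45": False, "cd49f": True, "cd24": True},
    {"cd31": False, "cd45": True, "cd49f": True, "cd24": False},
    {"cd31": False, "cd45": False, "cd49f": False, "cd24": False},
    {"cd31": True, "cd45": False, "cd49f": False, "cd24": True},
]
fractions = subpopulation_fractions(cells)
```

Fractions here are expressed relative to all counted cells; dividing by the number of lin− cells instead gives the within-lin− frequencies.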

Mammosphere Production

Mammospheres were initiated from four breast cancer samples. Mammospheres, spherical structures enriched for stem/progenitor cells,31 were initiated from dissociated tumors or from stem/progenitor cell subpopulations isolated by FACS. In both cases, 10⁴ cells/2 ml were cultivated in Mammocult Basal medium and 10% proliferation supplement (StemCell Technologies), supplemented with 0.2% heparin and 10⁻³ M hydrocortisone, on nonadherent plates at 37°C, 5% CO2. After 7 days, mammospheres were prepared for genetic analysis as described below.

Isolation of Genomic DNA and RNA

mRNA was collected from all FACS samples by a standard spin column protocol (RNeasy Mini Kit, Qiagen). cDNA was produced by reverse transcription using random hexamers (Superscript III First Strand Kit, Invitrogen). Adequate amounts of genomic DNA were obtained from FACS samples by whole genomic DNA (wgDNA) amplification (REPLI-g, Qiagen).

Human Stem Cell Pluripotency Gene Expression

An average of 50 ng of cDNA, 15 μl of TaqMan PreAmp Master Mix (2×) (Applied Biosystems), and 7.5 μl of TaqMan Human Stem Cell Pluripotency PreAmp Pool (Applied Biosystems) were combined. cDNA was amplified for 14 cycles (95°C 10 min, 95°C 15 s, 60°C 4 min). Upon completion, 27.5 μl of preamplified cDNA was amplified as per manufacturer protocol on a TaqMan Human Stem Cell Pluripotency Array Micro Fluidic Card (384 TaqMan Low Density Array, Applied Biosystems) (see Supplemental Material 1).

Genomic Mutation Analysis with the Sequenom MassARRAY Panel

wgDNA obtained from the four FACS cell populations of benign and breast cancer samples was examined using a multiplex PCR Sequenom MassARRAY system (Sequenom Inc). Samples of 10 ng of DNA were tested for 410 mutations in 30 human oncogenes (see Supplemental Material 2).

Statistical Methods

Quantitative Sequenom MassARRAY results were obtained through MALDI-TOF mass spectrometry. The gene mutation status between groups was analyzed for statistically significant differences by Fisher’s exact test or chi-squared analysis using the Statview software (SAS Institute Inc., Cary, NC). The fold change in gene expression for a tested gene was calculated by the 2^(−ΔΔCt) method.32 The gene expression differences between groups were determined by χ² analysis.
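As an illustration of the fold-change calculation named above, the following is a minimal sketch of the standard 2^(−ΔΔCt) computation. The reference (housekeeping) gene, the Ct values, and the function names are assumptions made for the example; the text does not specify which reference gene was used. The tenfold cutoff mirrors the threshold reported in the Results.

```python
def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene by the 2^(-ddCt) method.

    ct_*_test: Ct values in the test sample (e.g. a malignant lin- subpopulation).
    ct_*_ctrl: Ct values in the comparator sample (e.g. the matched benign one).
    The reference gene is an assumed housekeeping gene, not named in the text.
    """
    delta_test = ct_target_test - ct_ref_test   # dCt in the test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # dCt in the comparator
    return 2.0 ** -(delta_test - delta_ctrl)    # 2^(-ddCt)


def is_tenfold_change(fc):
    """True when expression is at least tenfold higher or tenfold lower."""
    return fc >= 10.0 or fc <= 0.1


# Hypothetical Ct values, for illustration only
fc = fold_change(ct_target_test=24.0, ct_ref_test=18.0,
                 ct_target_ctrl=28.0, ct_ref_ctrl=18.0)
```

With these hypothetical values the target reaches threshold four cycles earlier in the test sample after normalization, which corresponds to a 16-fold increase.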

Results

Thirteen invasive ductal breast cancer and 14 benign breast specimens were examined. Patient and tissue characteristics are displayed in Table 1.

Table 1 Patient demographics, specimen characteristics, and Sequenom MassARRAY mutation results

FACS Results

Figures 1 and 2 illustrate the strategy and results of FACS analyses of benign breast tissue, malignant breast tissue, and breast cancer cell lines. Through FACS, all four lin− cell subpopulations were acquired from benign breast tissue, malignant breast tissue, and the two cell lines. There was no significant correlation between patient age and the number of lin−CD49f+/−CD24+/− cells in benign or malignant samples.

Fig. 1

Flow sort dot plots demonstrating the distribution of breast cancer cells according to cell surface markers CD49f and CD24. Isotype control testing indicated no nonspecific binding. Discriminatory gating was designed to collect discrete subpopulations by initial testing of individual cell surface markers. a HCC1954 cell line. b T-47D cell line. c Dot plot of a benign breast tissue sample. d Dot plot of an invasive ductal breast cancer sample. The benign breast tissue and the invasive ductal breast cancer dot plots are representative of all benign and malignant FACS results

Fig. 2

Percentage of FACS cells collected from benign breast tissues, malignant breast tissues, and malignant breast cell lines. a Average percentages of the lin− subpopulations collected from benign and malignant tissue specimens. b Average percentages of specific lin− subpopulations among the total lin− subpopulations collected from benign and malignant breast tissues. c Average percentage of lin− subpopulations that demonstrated coexpression of CD44. d Average percentages of lin− subpopulations in ER+ tumors, ER− tumors, as well as in the ER+ T-47D cell line and ER− HCC1954 cell line

Figure 2a demonstrates the percentage of each of the four CD49f+/−CD24+/− subpopulations that were collected from benign or malignant breast tissues. In total, cells that were neither CD31+ nor CD45+ (lin−) but were CD49f+/−CD24+/− contributed less than 20% of the cells in both benign and malignant tissues. On average, lin−CD49f+CD24+ cells constituted 1.2%, CD49f+CD24− cells 6.6%, CD49f−CD24+ cells 0.7%, and CD49f−CD24− cells 10.5% of the cells that were counted by FACS in each tumor. These frequencies were not significantly different between benign and tumor samples. Figure 2b depicts the frequencies of the individual CD49f+/−CD24+/− populations when measured specifically among the lin− cells.

CD44-positive cells were present in all four FACS subpopulations of benign and malignant breast tissues (Fig. 2c). The only differences were that CD44 coexpression was significantly lower in the lin−CD49f+CD24− malignant subpopulations compared with the benign CD49f+CD24− subpopulations, and CD44 expression was lower in the malignant lin−CD49f−CD24+ subpopulations than in any other malignant subpopulation.

Figure 2d shows that there were no significant differences in the frequencies of lin−CD49f+/−CD24+/− cells between tumors with different hormone receptor statuses. The two breast cancer cell lines examined in this study demonstrated a different frequency of subpopulations than the fresh benign and malignant tissues, but this was not dependent on hormone receptor status (Fig. 2d).

Mammosphere Production

Mammospheres were successfully cultured from four of four tumors (Fig. 3a, b). All four lin− subpopulations were collected by FACS from the pool of mammospheres produced by each tumor (Fig. 3b). The frequencies of the four lin− populations within the pool of mammospheres were similar to those observed in benign and malignant tissues.

Fig. 3

Two methods of examining lin− subpopulations in mammospheres. a All tumor lin− subpopulations developed into mammospheres and in the same ratios as the lin− subpopulations in benign and malignant tissues (see Fig. 2 for comparison). b Representative images of mammospheres produced on day 7 from the FACS lin− populations

Pluripotency/Multipotency Stem Cell Gene Expression

Figure 4 and Supplemental Table 3 indicate differences that were observed in the expression of 90 genes associated with stem cells or cell differentiation. Gene expression patterns were similar between matched malignant and benign lin− subpopulations. On average, only nine genes (10%) in each subpopulation demonstrated a significant tenfold change.

Fig. 4

Comparisons of human stem cell pluripotency array micro fluidic card gene expression profiles of the four lin− subpopulations. This panel depicts the gene expression results of malignant lin− subpopulations compared to benign breast lin− subpopulations, tumors, mammospheres, and cell lines. Green squares indicate a tenfold increase in gene expression, red squares indicate a tenfold decrease in gene expression, and yellow squares indicate significant differences in gene expression that are less than tenfold; “+” indicates an increase in gene expression, “−” indicates a decrease in gene expression. A pluripotent and multipotent-associated genes, B stem cell-associated genes. PP CD49f+CD24+, PM CD49f+CD24−, MP CD49f−CD24+, and MM CD49f−CD24−

Significant gene expression changes were observed between the four tumor lin− subpopulations and breast tumor cells in general. Each of the four subpopulations demonstrated an average of 28 (30%) tenfold gene expression changes that were statistically significant. The lin−CD49f+CD24+ and lin−CD49f+CD24− subpopulations were similar, with an average of 3.5 stem cell-associated genes over- or under-expressed compared with the tumor. Subpopulations lin−CD49f−CD24+ and lin−CD49f−CD24− both had nine multipotent genes overexpressed and three underexpressed compared with the tumor. Specifically, the lin−CD49f−CD24+ cells over-expressed GDF3 relative to the tumor. Along with lin−CD49f−CD24− cells, they over-expressed POU5F1 relative to the tumor. All four subpopulations varied in their stem cell-associated gene expression patterns, but all over-expressed FGF4, REST, and ZFP42.

Two of the four mammosphere samples were selected for genetic analyses after FACS. When the gene expression of malignant lin− subpopulations was compared with the matched FACS lin− subpopulations of mammospheres, an average of 14 (15.6%) changes per subpopulation was seen. In general, stem cell gene expression was slightly increased in the mammosphere subpopulations (Fig. 4), whereas differentiation-associated gene expression was decreased (Supplemental Table 3). Of note, GDF3 expression was increased in the lin−CD49f−CD24− mammosphere-derived cells, whereas POU5F1 was underexpressed in the mammosphere lin−CD49f−CD24+ cells.

When FACS malignant lin− subpopulations were compared with the FACS lin− subpopulations of the two cell lines, an average of 9.3 (10.3%) differences was observed in gene expression per lin− subpopulation. Of interest, there was significantly more NANOG expressed in lin−CD49f−CD24+ and lin−CD49f−CD24− subpopulations from fresh malignant samples.

Oncogene Mutation Analyses

The results of Sequenom MassARRAY gene mutation testing are shown in Table 1. No mutations were found in the 30 oncogenes tested on the Sequenom MassARRAY panel in any of the four FACS subpopulations of the 15 benign breast tissue samples. Among the tumors tested, 73% demonstrated a mutation in their lin− subpopulations. Table 1 lists the 11 of 13 breast cancers that were tested for mutation analysis in this study. FACS sorting of the other two tumors did not yield enough genetic material for both gene expression and mutation analyses; for these, gene expression studies were preferentially completed.

Table 1 specifically indicates the lin− subpopulations in which mutations were found. Similar PIK3CA mutations were detected in both cell lines in all four subpopulations. Mutations were observed in eight of 11 malignant breast specimens (73%). Mutations in PIK3CA were the most prevalent, occurring in six of the eight (75%) tumors carrying mutations. These mutations were detected in ER+PR+ and ER−PR− tumors and did not correlate with HER2 status. An AKT1 mutation in exon 2 (E17K) and a mutation in HRAS exon 2 (Q61R) also were detected in the subpopulations of one tumor each. Nearly all tumors with mutations contained them in the lin−CD49f+CD24+ and lin−CD49f−CD24+ subpopulations. Only one tumor had a mutation in the lin−CD49f+CD24− subpopulation. Only in the breast cancer cell lines were mutations found in the lin−CD49f−CD24− subpopulation.
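The benign-versus-malignant comparison of mutation status can be sketched with a two-sided Fisher's exact test, one of the tests named in the Statistical Methods. The block below is a minimal pure-Python illustration, not the Statview analysis used in the study; the function name is hypothetical, and the counts are those stated in this section (eight of 11 tumors and none of 15 benign samples carrying a mutation).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Rows: tumors, benign samples. Columns: mutated, not mutated.
p_value = fisher_exact_two_sided(8, 3, 0, 15)

# Proportions quoted in the text
share_of_tumors_mutated = 8 / 11   # about 73%
pik3ca_share_of_mutated = 6 / 8    # 75%
```

Under these counts the observed table is the most extreme one possible given its margins, so the two-sided p-value equals the probability of that single table and falls well below 0.001.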

Discussion

Fresh surgical specimens were an excellent source of the subpopulations of benign and malignant stem/progenitor cells. In most specimens, even from very small samples, sufficient numbers of cells were obtained to perform genetic analyses. Among the fresh benign and malignant breast samples there was a remarkable consistency in the percentage at which each of the subpopulations was observed. Although tumors contain increased numbers of epithelial cells and demonstrate loss of normal ductal and lobular structures, the percentages of the FACS lin− subpopulations remained similar to those in benign breast tissue. CD49f−CD24+ and CD49f+CD24+ cells were consistently the rarest, whereas CD49f−CD24− cells made up the bulk of lin− cells in both benign and malignant tissues. This suggests that there is a structured relationship with regard to the production rates of these cells. The consistency of the observation between benign and tumor samples suggests that the orderly production rate established in benign tissue is maintained after malignant transformation. It may be a requirement for successful tumor progression, as these are the tumors that came to clinical detection. In this study, technical limitations of FACS did not allow us to specifically collect CD44-positive or -negative cells with the CD49f+/−CD24+/− cells, but CD44 expression was measured in a subset of the benign and malignant FACS samples. All four lin− subpopulations in benign and malignant breast samples contained a mixture of CD44-positive and -negative cells, indicating that CD49f+/−CD24+/− cells do not segregate exclusively with CD44 expression.

The genetic results of this study support two conclusions. First, this study showed that lin− subpopulations are different from the cells that make up the bulk of the tumor from which they are derived and may be responsible for performing multipotent stem cell functions in that tumor. FACS subpopulations demonstrated an increased level of stem cell-associated gene expression. This was seen primarily in the lin−CD49f−CD24+ and lin−CD49f−CD24− cells. These two subpopulations variably over-expressed the primitive developmental genes POU5F1 and GDF3. That stem cell gene expression was increased in mammospheres developed from the lin−CD49f−CD24− subpopulation suggests that there is a subset of more primitive early stem cells within this lin− subpopulation.

The second conclusion of this study is that oncogenic abnormalities occurred in specific stem/progenitor cell populations, but only when they were obtained from tumors. Mutations in both PIK3CA and AKT1 genes were observed primarily in the lin−CD49f−CD24+ and CD49f+CD24+ cell populations. PIK3CA mutations occurred in six of the eight mutation-bearing tumors (75%; six of the 11 tumors tested), which is higher than the frequencies previously reported.16–19 AKT1 abnormalities were observed among 9% of the malignant subpopulations, similar to the reported frequency.25,26 Comparable to what has been reported in tumors, we did not observe coexistent mutations in AKT1 and PIK3CA in any of the tumor subpopulations. PIK3CA (refs. 29, 30) and AKT1 (ref. 30) mutations are reportedly more common in hormone receptor-positive and HER2+ tumors than in basal-like tumors. In this study, PIK3CA mutations were detected more frequently in ER+ HER2− tumors. The one tumor in our study with an AKT1 mutation was ER+. The HRAS mutation Q61R, which was observed in a single tumor sample in this study, is more commonly found in thyroid cancers.33,34 Its clinical significance in breast cancer is unknown.

The stem cell gene expression and mutation results obtained in this study support the model that through the acquisition of genetic mutations, breast stem cells obtain malignant capacities, while retaining stem cell-associated functions. Our results further suggest that malignant breast stem/progenitor cells can be differentiated from benign breast stem/progenitor cells by their genetic differences. There were three tumors in this study for which no mutations were detected. We suggest two explanations. A broader genomic search may be required to detect the mutations or epigenetic events responsible for tumorigenesis in these particular tumors. Alternatively, the cancer stem cell subpopulations containing the genetic abnormality may be so rare within the tumors that they were below the detection threshold of our testing.

The stem cell gene expression in benign and malignant lin− populations, as well as the mutation patterns observed among the four malignant lin− subpopulations, suggest a possible hierarchical relationship for benign stem and progenitor cells (Fig. 5). The lin−CD49f−CD24+ and lin−CD49f−CD24− subpopulations expressed the most developmentally primitive stem cell-associated genes, suggesting that they may emerge earlier in the benign breast stem cell hierarchy. Furthermore, the clustering of specific mutations within certain malignant subpopulations and their absence in others not only supports the hierarchical model shown but also indicates that breast cancers initiate within different subpopulations along the hierarchy.

Fig. 5

A model of stem/progenitor cell hierarchy and cancer initiation points. Subpopulations of cells are indicated by boxes. Large arrows indicate points along the lineage where breast cancers may initiate. The finding that one tumor had mutations in the CD49f−CD24+, CD49f+CD24+, and CD49f+CD24− subpopulations suggested a common origin (P). The finding that four tumors had mutations in the CD49f−CD24+ and CD49f+CD24+ subpopulations, but not in the CD49f+CD24− subpopulation, separated the first two populations from the third in the hierarchy. The finding of two tumors with mutations solely in the CD49f+CD24+ subpopulation indicates that it follows the CD49f−CD24+ subpopulation in the hierarchy. No mutations were found in the CD49f−CD24− subpopulation, indicating that these cells are separated from the other three in this lineage model. The stem cell gene expression data indicated that the four subpopulations were related. Because no mutation common to all four subpopulations was found, a primitive stem cell that precedes these four subpopulations is proposed (marked as stem cell)
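One way to read the reasoning in this caption is that the mutation-bearing subpopulation sets form a nested series, which is what a single lineage would be expected to produce. The sketch below encodes only the three patterns and tumor counts listed in the caption, using the PP/PM/MP abbreviations from Fig. 4, and checks that nesting. It is an illustration of the logic, not the authors' method, and the function names are hypothetical.

```python
# Patterns listed in the Fig. 5 caption; values are numbers of tumors.
# PP = CD49f+CD24+, PM = CD49f+CD24-, MP = CD49f-CD24+
patterns = {
    frozenset({"MP", "PP", "PM"}): 1,
    frozenset({"MP", "PP"}): 4,
    frozenset({"PP"}): 2,
}


def is_nested(sets):
    """True when the sets can be ordered so that each contains the previous one."""
    ordered = sorted(sets, key=len)
    return all(small <= large for small, large in zip(ordered, ordered[1:]))


def tumors_per_subpopulation(patterns):
    """Number of tumors in which each subpopulation carried a mutation."""
    counts = {}
    for subpops, n_tumors in patterns.items():
        for name in subpops:
            counts[name] = counts.get(name, 0) + n_tumors
    return counts


nested = is_nested(patterns)
counts = tumors_per_subpopulation(patterns)
```

The check only confirms that the listed patterns are compatible with one linear ordering; the direction of that ordering, and the placement of the CD49f−CD24− cells, rest on the gene expression data and the authors' interpretation.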

The loss of CD24 expression is considered a requirement of breast cancer stem and/or progenitor cells. This may not be a strict requirement of all breast cancer stem cells or may reflect the cell surface profile of stem cells found in infiltrating lobular breast carcinomas.14 The results of this study are based on an examination of infiltrating ductal carcinomas. In this tumor type, we found that mutations were predominantly in the CD24+ cells rather than the CD24− cells. We have previously demonstrated that CD24+ as well as CD24− cells from an infiltrating ductal carcinoma cell line (HCC1954) were tumorigenic when 10³ cells were injected into NOD/SCID mice.35 We therefore suggest that stem/progenitor cells in addition to those that are CD24− may be critical for breast cancer initiation and progression.

The high frequencies of PI3K/AKT pathway mutations observed in this study among breast cancer stem/progenitor cells may help elucidate the processes by which breast cancers originate. PI3K/AKT mutations provide a cell with the abilities to evade apoptosis and survive hypoxic and avascular conditions.36,37 Acquisition of these mutations in a multipotent stem/progenitor cell may be sufficient to initiate tumor development and sustain it to the point of clinical detection.

Finally, this study suggests that breast cancer stem cells may provide a regulatory structure to tumor growth that has not been previously appreciated. When breast cancer stem and progenitor cells are collected from fresh tumor specimens rather than cell lines, they maintain proportions of progenitor cells similar to those observed in benign tissue. The maintenance of this proportionality may be required for successful tumor progression and for reaching the threshold of clinical detection. It may be crucial that such mutations occur in breast stem or progenitor cells for tumors to maintain proliferative momentum. The hierarchical structure with constrained proportions suggests a therapeutic targeting strategy. Disruption of the proportions within the hierarchy, by eradicating the most primitive cancer stem cells or their immediate progenitors, may result in the collapse of tumor equilibrium and complete destruction of the tumor.