Abstract
Individual cells within a tumour can exhibit distinct genetic and molecular features. The impact of such diversification on metastatic potential is unknown. Here we identify clonal human breast cancer subpopulations that display different levels of morphological and molecular diversity. Highly variable subpopulations are more proficient at metastatic colonization and chemotherapeutic survival. Through single-cell RNA-sequencing, inter-cell transcript expression variability is identified as a defining feature of the highly variable subpopulations that leads to protein-level variation. Furthermore, we identify high variability in the spliceosomal machinery gene set. Engineered variable expression of the spliceosomal gene SNRNP40 promotes metastasis, attributable to cells with low expression. Clinically, low SNRNP40 expression is associated with metastatic relapse. Our findings reveal transcriptomic variability generation as a mechanism by which cancer subpopulations can diversify gene expression states, which may allow for enhanced fitness under changing environmental pressures encountered during cancer progression.
Introduction
A given cancer type can display tremendous variation from patient to patient, while within a patient, individual neoplastic lesions often grow at different rates and respond differentially to the same therapy. Even within a given tumour, individual cells can display substantial variation at the genetic1,2,3, epigenetic4,5 and phenotypic levels6,7. This heterogeneity might be particularly beneficial when cancers face strong selective pressures such as chemotherapy8,9 or metastatic barriers10,11,12. Notably, functional variability can be sustained over time without genetic changes8, suggesting epigenetic control or other mechanisms as paths to molecular variability generation13. Many important studies on tumour heterogeneity have provided static snapshots of genetic heterogeneity1,2; however, functional and phenotypic characterization of individual clones within a tumour population can provide insights into the molecular and cellular features that propagate heterogeneity and diversity generating capacity14.
Despite its pervasiveness in cancer, the mechanisms, aside from genetic mutations, that mediate phenotypic heterogeneity generation in driving cancer progression remain poorly understood. These mechanisms may contribute to the evolution of cancer populations, leading to heritable variation that provides fitness advantages under varying selective pressures15. Furthermore, it is not known whether phenotypic diversity among cancer cells within a population is molecularly regulated or whether it is simply an epi-phenomenon16. To generate an experimental model wherein genetic variation between cells is minimized so that non-genetic contributions to heterogeneity generation can be assessed, we have derived isogenic, clonal subpopulations from human cancer populations. Here we have discovered clonal subpopulations of cells that display high morphological variation. These subpopulations displayed variability of multiple phenotypes, and this feature was inherited by their single cell progeny. Highly variable (HV) subpopulations exhibited increased metastatic capacity and survival in the presence of chemotherapies, consistent with diversification-enabling enhanced fitness. Furthermore, in human breast cancers, nuclear morphological variation was found to associate with clinical metastasis. Molecular analyses revealed that highly variable subpopulations exhibit genetic stability, yet express enhanced cell-to-cell transcriptomic variability, which is transmitted to the protein level. Finally, gene set enrichment analysis revealed spliceosomal machinery components to display high-transcript expression variability, suggesting a means by which variation could be transmitted to a global level. Spliceosomal gene set expression variability is consistent with the increased pre-mRNA variability observed in these subpopulations. Indeed, engineered variation of the SNRNP40 spliceosomal gene’s expression among cells within a breast cancer population promoted their metastatic fitness. 
Further analysis revealed that cell populations with low SNRNP40 expression exhibited enhanced metastatic capacity, displayed gene expression changes consistent with those seen in highly variable subpopulations, and contained increased levels of unspliced pre-mRNAs. Clinically, low SNRNP40 expression was found to be associated with metastatic outcomes. These findings highlight an aspect of intra-clonal tumour heterogeneity that has not previously been addressed. The experimental model established here can be applied to various cancers to better understand non-genetic contributions to heterogeneity and to study the impact of such deregulation among cancer populations and their progeny.
Results
Isolation of clonal subpopulations with morphologic variation
To study phenotypic diversity in cancer cells, we derived nearly 200 clonal subpopulations from 2 breast cancer cell lines and assessed these subpopulations for intra-clonal heterogeneity in cell size through automated image analysis of 29,390 cells in total using CellMask stain to label entire cells and DAPI dye to label nuclei. Subpopulations, derived from the human cancer cell line MDA-MB-231 (MDA) and the minimally passaged primary CN34 breast cancer line (CN), displayed inter-clonal variation in six size parameters (Fig. 1a). To quantitatively assess intra-clonal size heterogeneity, coefficient of variation for each subpopulation was calculated for each size parameter, and principal component analysis was performed for each parental line. The majority of clonal subpopulations displayed a range of variability as assessed by using the first principal component—consistent with a single peak distribution (Fig. 1b,c). A few subpopulations demonstrated exceptionally high intra-clonal, cell-to-cell size variation without exhibiting significant differences in their population-level means (Fig. 1b–d).
Highly variable subpopulations exhibit phenotypic diversity
To determine whether these highly variable subpopulations could give rise to and maintain phenotypic diversity beyond cell size, we assessed proliferative variability as an independent functional measure by colony formation assays. We first confirmed that cell size and cell density within colonies were not contributing factors to colony area differences (Supplementary Fig. 1b). While bulk population growth in culture was similar between subpopulations (Supplementary Fig. 1a), MDA-derived subpopulations that exhibited high intra-clonal size variation also displayed high variability in proliferative capacity, which could be visualized as high variation in colony sizes (Fig. 2a). Among the CN-derived populations, only subpopulation C57 displayed a modest increase in proliferative variability (Supplementary Fig. 1c,d), suggesting that the other morphologically heterogeneous subpopulations from this parental line may not exhibit variation of additional phenotypes beyond morphology. These subpopulations, herein referred to as highly variable (HV) subpopulations, maintained high size variability in progeny clonal subpopulations when compared with lowly variable (LV) subpopulations (Fig. 2b,c). These HV subpopulations maintained enhanced size variability when measured in three dimensions (Fig. 2d,e), after passage in culture (Supplementary Fig. 2a), and after seeding at various densities (Supplementary Fig. 2b), with no consistent difference in cell cycle phasing (Supplementary Fig. 2c). In addition, to determine whether HV cells over-express markers associated with what some propose to be the ‘cancer stem cell state’, we assessed the expression levels of the breast cancer stem cell markers CD44 and CD24 by flow cytometry and observed no difference between HV and LV subpopulations for these markers (Supplementary Fig. 2d–f). In sum, these clonal subpopulations displayed an enhanced and heritable ability to generate diversity across multiple phenotypic dimensions.
Highly variable subpopulations exhibit metastatic fitness
Metastatic colonization of an end organ represents a major bottleneck during cancer evolution that would greatly benefit from diversity generation and is clinically responsible for the majority of cancer deaths17,18. Our identification of subpopulations of cells with high versus low diversification potential from the same individual patients’ cancer populations allowed us to directly test the impact of intra-clonal diversification ability on metastatic colonization capacity. HV subpopulations exhibited enhanced systemic metastatic capacity when introduced into the arterial circulation of mice as determined by total tumour burden (Fig. 3a,b). In addition, this increased tumour burden was in large part attributed to an increased number of systemic metastatic foci (Fig. 3c), indicating an increased frequency of colonization of systemic sites as opposed to an increased growth rate of formed metastases. Moreover, HV subpopulations derived from both human lines more efficiently colonized the lung on venous inoculation (Fig. 3d,e) and the liver on portal circulation injection (Fig. 3f,g), indicating that the enhanced metastatic colonization capacity is broadly applicable to multiple organs posing diverse selective barriers. In addition, portal circulation inoculation of a mixed population consisting of an equal number of HV and LV cells revealed HV cells to contribute between 81.2 and 99.6% of formed liver metastases (Fig. 3h), demonstrating that HV cells maintain an increased metastatic capacity in a mixed population. Because broad diversity generation has the potential to enhance cancer evolution under various selective pressures in addition to metastatic colonization, HV subpopulations were assessed for survival in the presence of various chemotherapeutics commonly used to treat breast cancer. HV subpopulations derived from both parental populations exhibited increased survival in the presence of doxorubicin, paclitaxel, cyclophosphamide and 5-fluorouracil (Fig. 3i,j). 
The ability of HV subpopulations to generate phenotypic diversity, metastasize more efficiently, and survive under mechanistically diverse chemotherapeutic agents is consistent with a positive role for phenotypic diversification in cancer progression.
Nuclear size variation is associated with metastatic disease
Given our observation of increased nuclear size variability in HV subpopulations (Supplementary Fig. 3a), we used this tractably quantifiable parameter as a readout of phenotypic variability in human invasive breast cancer tumour core biopsies. Mean nuclear size and mitotic index did not significantly correlate with disease stage (Fig. 4a,b). Consistent with our findings, nuclear area variability of cancer cells was significantly increased in tumours of patients with more advanced stage disease (Fig. 4c,d, Supplementary Fig. 3b). Primary tumours that progressed to lymph nodes displayed significantly higher variation in nuclear area than lymph node negative tumours. Furthermore, primary tumours that progressed to distant metastases exhibited significantly higher nuclear area variation than those tumours that did not metastasize. We also observed modestly higher, though not statistically significant, nuclear size variation in tumours with higher grade (Supplementary Fig. 3c), consistent with histopathologic observations19. These clinical correlations are consistent with our findings that cancer populations with greater diversification potential positively contribute to metastatic disease.
Genomic and transcriptomic analysis of subpopulations
These clonal subpopulations of cells derived from isogenic backgrounds offer the potential for studies into the molecular basis of tumour heterogeneity. Given the contribution of genomic instability to tumour heterogeneity20,21, HV subpopulations were assessed for mutational burden. We quantified single-nucleotide variant and insertion–deletion mutation frequencies from population exome sequencing to reconstruct phylogenetic relationships1. HV subpopulations appeared to diverge from a common HV ancestor and to exhibit genetic similarity to the parental population (Fig. 5a, Supplementary Fig. 4a,b), indicating that substantial genetic mutational changes did not accumulate as they clonally expanded. Furthermore, HV subpopulations did not significantly differ in the number of population-specific nucleic acid variants relative to LV subpopulations or the parental population (Supplementary Fig. 4c). Analysis of DNA content revealed no increase in mean DNA content or DNA content variability in HV subpopulations, arguing against aneuploidy as the source of cell-to-cell variability (Supplementary Fig. 4d,e). While it is difficult to exclude the contribution of genetic alterations to HV subpopulation variability, the comparable genetic phenotypes and relative genetic stability in the isogenic HV subpopulations suggest that enhanced genetic diversification may not be the primary source of the observed phenotypic diversity.
HV subpopulations display transcriptomic variability
To identify additional molecular mechanisms that could contribute to phenotypic diversity, single-cell RNA-sequencing from HV and LV subpopulations was performed to assess gene expression heterogeneity between cells (Fig. 5b). Assessment of single-cell RNA-sequencing fidelity demonstrated appropriate spike-in expression and no difference in spike-in variability between wells (Supplementary Fig. 5a,b). Total transcript abundance per cell showed no significant differences between HV and LV subpopulations (Supplementary Fig. 5c). Moreover, global mean transcript abundance was also not significantly different between HV and LV subpopulations (P=0.91 for HV-C57 versus LV-C92 and P=0.62 for HV-M42 versus LV-M26 by two-sided paired t-tests). Interestingly, linear and non-linear clustering22 of single-cell transcript expression profiles could not classify HV or LV subpopulations (Fig. 5c, Supplementary Fig. 5d), suggesting that averaged over-expression or repression of a gene or gene set may not be the primary cause of phenotypic variation recurrently observed in HV subpopulations23. However, global cell-to-cell transcript expression variability, as assessed by quantification of 8,218 transcripts from single MDA HV cells, was significantly elevated relative to expression variability between single LV cells (53.9% transcripts with higher CV in HV-M42; P=7e−8 by two-sided paired t-test; Fig. 5d). The same was observed for the HV-C57 subpopulation (5,826 transcripts; 55.0% transcripts with higher CV in HV-C57; P=3e−10 by two-sided paired t-test; Fig. 5e). Importantly, while an increased CV could occur by a decrease in mean expression, transcript variability observed in HV cells did not appear to be a byproduct of lowly expressed transcripts (Supplementary Fig. 5e,f).
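The per-transcript comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout (a dictionary mapping each transcript to its expression values across single cells), the function names, and the use of the sample standard deviation are assumptions made for illustration. The two-sided P value uses a normal approximation to the paired t distribution, which is reasonable only when thousands of transcripts are compared, as in the analysis reported here.

```python
import math
import statistics
from statistics import NormalDist


def per_transcript_cv(expr):
    """Map transcript -> CV (sample s.d. / mean) across single cells.

    `expr` is a hypothetical layout: {transcript: [value per cell, ...]}.
    Transcripts with a non-positive mean are skipped.
    """
    cvs = {}
    for transcript, values in expr.items():
        mean = statistics.mean(values)
        if mean > 0:
            cvs[transcript] = statistics.stdev(values) / mean
    return cvs


def compare_cv(hv_expr, lv_expr):
    """Paired comparison of transcript CVs between HV and LV single cells.

    Returns (fraction of transcripts with higher CV in HV,
             paired t statistic, approximate two-sided P value).
    Needs at least two shared transcripts whose CV differences are not
    all identical, otherwise the t statistic is undefined.
    """
    hv_cv = per_transcript_cv(hv_expr)
    lv_cv = per_transcript_cv(lv_expr)
    shared = sorted(set(hv_cv) & set(lv_cv))
    diffs = [hv_cv[t] - lv_cv[t] for t in shared]
    frac_higher = sum(d > 0 for d in diffs) / len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = statistics.mean(diffs) / se
    # Assumption: normal approximation in place of the exact t distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return frac_higher, t_stat, p_value
```

Pairing by transcript means each gene acts as its own control, so the comparison asks whether CVs shift upward across the transcriptome rather than whether any single gene differs.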
To assess the robustness of this molecular phenotype, we determined whether sampling parameters of the single-cell sequencing experiments affected the observations. While more abundant transcripts are known to be detected with higher accuracy, elevated transcriptomic variability was observed regardless of transcript abundance (Supplementary Fig. 6a). To determine whether sequencing from limited number of cells affected the outcome, random sampling of single cells from subpopulations was performed to determine whether the enhanced transcriptomic variability could be detected with fewer cells. Indeed, transcriptomic variability was observed when setting the analysis to as few as five cells per population (Fig. 5f, Supplementary Fig. 6b). In addition, to determine whether a unique cell was responsible for the molecular variability, sampling was performed to assess population transcriptomic variability following removal of each single HV cell sequenced. Transcriptomic variability was consistently detected regardless of which HV cell was excluded from the analysis (Fig. 5g, Supplementary Fig. 6c), indicative of robust and indiscriminate population-level variability. While the increased transcriptomic variability appears modest, this effect is consistently observed under different analysis parameters and in two subpopulations derived from independent breast cancer populations. Taken together, these findings indicate that phenotypically diverse metastatic cancer subpopulations maintain enhanced intra-clonal transcriptomic variability generation capacity.
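The leave-one-cell-out check described above can be sketched as follows. This is an illustrative Python outline rather than the authors' code; the data layout (transcript mapped to a list of per-cell values, with cells in the same order for every transcript) and the function names are hypothetical.

```python
import statistics


def fraction_higher_cv(hv_expr, lv_expr):
    """Fraction of shared transcripts whose CV is higher in HV than in LV cells.

    Transcripts with a non-positive mean in either population are ignored;
    at least one transcript must remain or the fraction is undefined.
    """
    higher = 0
    total = 0
    for transcript in hv_expr.keys() & lv_expr.keys():
        hv, lv = hv_expr[transcript], lv_expr[transcript]
        hv_mean, lv_mean = statistics.mean(hv), statistics.mean(lv)
        if hv_mean <= 0 or lv_mean <= 0:
            continue
        total += 1
        if statistics.stdev(hv) / hv_mean > statistics.stdev(lv) / lv_mean:
            higher += 1
    return higher / total


def leave_one_cell_out(hv_expr, lv_expr):
    """Recompute the fraction after removing each HV cell in turn.

    Assumes every transcript lists the HV cells in the same order, so that
    index i refers to the same cell throughout.
    """
    n_cells = len(next(iter(hv_expr.values())))
    results = []
    for drop in range(n_cells):
        reduced = {
            t: [v for i, v in enumerate(vals) if i != drop]
            for t, vals in hv_expr.items()
        }
        results.append(fraction_higher_cv(reduced, lv_expr))
    return results
```

If a single outlier cell were driving the signal, the fraction would fall towards 0.5 for exactly one of the returned values; a uniformly elevated list corresponds to the population-level variability reported in the text.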
Transcriptomic variability is transmitted to proteins
To validate the relevance of transcriptomic variability as it pertains to biological function, we used flow cytometry to assess protein-level variation. XPNPEP3 and UPF2, two genes that displayed high transcript-level variability in both HV subpopulations (Supplementary Table 1) and whose protein expression per cell could be readily quantified in a high-throughput manner by flow cytometry, demonstrated a consistent level of increased protein expression variability in HV subpopulations (Fig. 6a). We extended these findings to the following additional five proteins that could be readily quantified by flow cytometry: ALDOA, PABPC1, HNRNPA1, CD110, and HNRNPA0. In all HV and LV subpopulations, we observed a significant correlation between transcript-level variation and protein-level variation for the seven genes tested (Fig. 6b) with no consistent correlation in mean protein abundance (Supplementary Fig. 7a). These findings reveal that transcript variation is transmitted to the protein level and highlight high-molecular variability as a principal feature of HV subpopulations.
Spliceosomal gene transcripts are highly variable
We next sought to identify a potential mechanism that could contribute to transcriptomic variation in highly variable subpopulations. MDA-derived and CN-derived subpopulations displayed similarity in highly variable genes (Supplementary Table 1), many of which alone could contribute to global transcriptomic variability such as chromatin modifiers (SENP7, ARID1A), transcription factors (TCF7L2, SP3), and regulators of non-sense-mediated decay (UPF2). While individual genes most likely contribute to the effect seen in highly variable populations, we hypothesized that coordinated variation of a common subset of transcripts may reveal a major contribution to cell-to-cell variability. To identify regulatory networks that might be responsible for transcriptomic variation, we searched for functional gene sets that exhibited high-transcript expression variability. Transcripts were binned into four categories depending on whether relative transcript variability was increased in either or both MDA-derived and CN-derived HV subpopulations. Pathway discovery analysis (iPAGE24) revealed spliceosome machinery and myeloid cell differentiation gene transcripts as the only two gene sets to exhibit significantly higher variability in HV subpopulations derived from both patients’ cancer populations (Fig. 6c, Supplementary Fig. 7b). This raised the possibility that cell-to-cell variation in the expression levels of splicing genes and the resultant mRNA processing activity may represent a conceivable mechanism through which population-level heterogeneity of mature transcripts may be achieved at a global scale25.
Improper splicing may lead to an abundance of unspliced pre-mRNAs, which are eventually degraded via non-sense-mediated decay26. Inefficient splicing could thus reduce the expression of a large number of mature transcripts. Given the expression variation of spliceosome machinery components, we reasoned that variation in constitutive spliceosome activity in HV cells may cause enhanced variation in global unspliced pre-mRNA levels. Variation in unspliced pre-mRNA levels could then contribute to transcript abundance variation. Indeed, HV subpopulations demonstrated higher cell-to-cell unspliced pre-mRNA variability, as determined by the analysis of 1,132 matched retained introns from MDA-derived cells (54.8% retained introns with higher CV in HV-M42, P=1e−4 by one-sided paired t-test; Fig. 6d) and 1,666 matched retained introns from CN-derived cells (57.6% retained introns with higher CV in HV-C57, P=2e−11 by one-sided paired t-test; Fig. 6e). In addition, if this unspliced pre-mRNA variability as measured by retained introns is propagated, it should be apparent in further processed forms of pre-mRNAs. Indeed, exon–exon junction expression variability measured from single-cell sequencing experiments was significantly increased in HV subpopulations from MDA-derived cells (54.7% exon–exon junctions with higher CV in HV-M42, P=2e−4 by one-sided paired t-test; Fig. 6f) and CN-derived cells (65.3% exon-exon junctions with higher CV in HV-C57, P=8e−10 by one-sided paired t-test; Fig. 6g). This molecular variability could be caused by variation in splicing as well as varied decay of improperly spliced transcripts. Consistent with our findings, intron retention has been previously observed to be the most significant splicing alteration in breast cancer patient samples27. These findings reveal enhanced splicing variability as one feature of highly variable subpopulations.
Variability in SNRNP40 expression enhances metastasis
Finally, we sought to determine whether enforced modulation of a highly variable spliceosomal gene could recapitulate the metastatic capacity seen with HV subpopulations. While highly variable subpopulations displayed variability in many spliceosomal genes, focused study on a single spliceosomal gene could serve as a model by which variation of other spliceosomal genes may contribute to expression variation. However, focus on a single gene would be expected to recapitulate only a fraction of the effects seen in highly variable subpopulations. We focused on SNRNP40, a component of the U5 small nuclear RNP complex, because this gene exhibited high transcript variability in HV subpopulations (Supplementary Fig. 8a), was among the top three most variable spliceosomal gene transcripts (Supplementary Table 2), has been described to directly interact with numerous highly variable spliceosomal genes (Fig. 7a, Supplementary Table 2), and its protein expression could be readily quantified by immunofluorescence-based imaging (Supplementary Fig. 8b). Indeed, HV subpopulations displayed increased SNRNP40 protein-level variability (Fig. 7b,c). To determine whether population-level variation in SNRNP40 observed with HV cells was sufficient to enforce metastatic fitness, LV subpopulations were generated to express varying levels of SNRNP40 (Supplementary Fig. 9a). These populations were pooled to generate a mixed population with increased cell-to-cell SNRNP40 expression variation without significantly altering mean SNRNP40 expression (Fig. 7d; mean: Control=9.0, hiCV=8.9, P=0.19 by two-sided t-test; CV: Control=0.068, hiCV=0.104, P=7e−14 by Levene’s test). Functional testing of this engineered population revealed that increased cell-to-cell variability in SNRNP40 expression enhanced the ability of these cells to metastasize more efficiently and to colonize more sites systemically (Fig. 7e,f), suggesting that variation in SNRNP40 between cells could, in part, contribute to enhanced metastatic capacity observed in HV subpopulations.
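The comparison of expression spread between the control and pooled populations relies on Levene's test. As a hedged illustration, the W statistic can be computed in a few lines of Python. The text does not state whether deviations were taken from the group mean or the group median; this sketch assumes the mean-centred form, and it omits the P value, which would require the F distribution with (k − 1, N − k) degrees of freedom.

```python
import statistics


def levene_w(*groups):
    """Levene's W statistic for equality of variances across groups.

    Assumption: absolute deviations are taken from each group's mean
    (the original Levene form), not the median (Brown-Forsythe form).
    Undefined when all absolute deviations within every group are equal.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

Because W is built from absolute deviations rather than raw values, two populations with the same mean but different spread, as in the control and hiCV populations here, yield a large W even though a t-test on the means shows no difference.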
We sought to determine whether, within this cell population engineered to express higher SNRNP40 variation, the cells with high SNRNP40 or those with low SNRNP40 levels imparted the enhanced fitness to the pooled population. To do this, we generated cells that represented the tails of the variable population: those that expressed high SNRNP40 through over-expression, or those that expressed low SNRNP40 through knockdown. While increased SNRNP40 expression did not affect metastatic capacity (Fig. 8a), SNRNP40 depletion significantly promoted systemic metastasis (Fig. 8b, Supplementary Fig. 9b–e). Next, we sought to determine whether variation in SNRNP40 expression contributes to the gene expression variability seen in highly variable subpopulations. Our model is that variable levels of SNRNP40 allow for the transmission of variability to many transcripts dependent on SNRNP40 expression. Thus, we used small interfering RNA (siRNA)-mediated depletion of SNRNP40 to identify these dependent transcripts and inferred a high magnitude of expression change to be indicative of strong transmission. If variable expression of SNRNP40 contributes to gene expression variability seen in HV subpopulations, SNRNP40-dependent transcripts would be enriched among variable transcripts in HV subpopulations. To test this hypothesis, the degree of overlap between SNRNP40-dependent transcripts (absolute log fold change >1 on siRNA treatment) and highly variable transcripts (transcript CV ratio from MDA single-cell sequencing >1.5) was assessed. Indeed, transcripts with high magnitude expression change in SNRNP40 knockdown cells were significantly enriched among variable transcripts in HV single cells (Fig. 8c). This enrichment can also be demonstrated as a heat map (Supplementary Fig. 10a) and as a binned dot plot, which shows that the degree of gene expression deregulation on SNRNP40 depletion significantly correlated with transcript variability in the HV population (Supplementary Fig. 10b). 
These findings suggest that SNRNP40 contributes in part to the expression of a set of variable transcripts in breast cancer cells. In addition, SNRNP40 depletion significantly increased the fraction of unspliced pre-mRNAs (Fig. 8d), consistent with pre-mRNA variability observed in HV subpopulations. The expected modest effect is likely attributable to perturbation of only a single spliceosomal gene. Consistent with these functional findings, decreased SNRNP40 transcript-level expression in bulk primary breast cancer samples was associated with increased metastatic relapse outcomes in multiple independent data sets (Fig. 8e–g). These findings provide clinical association support for SNRNP40 expression in human breast cancer progression. While tumours can achieve reduced SNRNP40 expression through mean expression alteration, we show in our experimental model that cancer subpopulations may achieve SNRNP40 silencing through deregulation in a subset of cells that may not be apparent from averaged measurements derived from bulk tumours.
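The overlap between SNRNP40-dependent transcripts (absolute log fold change >1) and highly variable transcripts (CV ratio >1.5) can be illustrated with a short Python sketch. The statistical test used for the enrichment is not specified in this section; the one-sided hypergeometric test below is an assumption chosen for illustration, and the function names and input layout (transcript mapped to a score) are hypothetical.

```python
from math import comb


def select(scores, threshold, use_abs=False):
    """Transcripts whose score (or absolute score) exceeds a threshold."""
    return {
        t for t, s in scores.items()
        if (abs(s) if use_abs else s) > threshold
    }


def overlap_enrichment(dependent, variable, universe):
    """Overlap size and one-sided hypergeometric P value.

    Assumption: the universe is every transcript measured in both
    experiments; the P value is the probability of an overlap at least
    as large as observed under random draws without replacement.
    """
    universe = set(universe)
    dependent = set(dependent) & universe
    variable = set(variable) & universe
    n_total, n_dep, n_var = len(universe), len(dependent), len(variable)
    k = len(dependent & variable)
    tail = sum(
        comb(n_dep, i) * comb(n_total - n_dep, n_var - i)
        for i in range(k, min(n_dep, n_var) + 1)
    )
    return k, tail / comb(n_total, n_var)
```

With the thresholds from the text, the two sets would be built as `select(log_fc, 1, use_abs=True)` and `select(cv_ratio, 1.5)`, where `log_fc` and `cv_ratio` are hypothetical per-transcript dictionaries from the knockdown and single-cell experiments, respectively.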
Discussion
Previous studies have revealed the benefit of non-genetic cell-to-cell variability to fitness of cells under cytotoxic conditions4,8,28,29. We show that enhanced variation at the transcriptomic level is generated and maintained within rare clonal cancer subpopulations, leading to phenotypic diversity and enhanced metastatic capacity. These HV cells were isolated from parental populations at a low frequency of 1–3%, which could be attributed to a number of factors. First, the increased variability of HV cells that leads to increased fitness in vivo may not be beneficial in vitro where the cell culture selective pressures may be more suitable for LV cells. Similarly, phenotypic trade-offs at the cellular level might contribute to the low frequency of HV cells17; for example, if beneficial diversity were hypothetically generated through an energetically demanding and slower cell division process, proliferation rate would be sacrificed as a result. Aside from potential cell-intrinsic mechanisms, clonal interactions within the parental population are likely involved in maintaining the presence of HV cells at a low frequency. Subclonal heterogeneity can be maintained through subclonal cooperation from both minor subpopulations that enhance proliferation of neighbouring cells as well as other populations that contribute to tumour growth30. HV cells might similarly utilize non-cell-autonomous mechanisms to maintain variation under in vitro conditions with other cells in the population more suited for growth in culture.
We consistently observe molecular variability in HV subpopulations by different experimental assays and at both the molecular and phenotypic levels. Molecular and phenotypic variation within cancer populations increases the likelihood that an individual cell is able to survive and repopulate a tumour under a given selection pressure, such as metastatic colonization and chemotherapy. We propose spliceosome-associated gene expression variability as one mechanism by which clonal cancer populations could increase mature transcript expression variability of target genes. We show one example of spliceosome gene expression variability where variable cell-to-cell expression of SNRNP40 enables enhanced metastatic fitness through generation of a cell subpopulation with reduced SNRNP40 expression (Fig. 8h). This mechanism is supported by the requirement of key splicing factors for maintenance of robust transcriptomes31. Minimal deregulation of multiple splicing factors has the potential for amplified alterations of gene regulatory networks and gene expression states. In addition, intron retention has been observed to regulate expression of genes involved in nuclear shape as well as splicing factor genes26. The consistent molecular variability observed in HV subpopulations derived from independent human cancer cell populations suggests that deregulation of specific sets of downstream targets is molecularly conserved. These experimental observations can be tested in clinical correlates to characterize non-genetic contributions to tumour evolution in patients. While the upstream cause of the transcript-level variability remains to be determined, we propose deregulated population transcriptomic variability to represent one mechanism by which molecular diversity can be achieved in cancer and reveal enhanced phenotypic diversification capacity to associate with metastatic progression.
Methods
Cell culture
MDA-MB-231 cells and their derivatives were maintained in DMEM supplemented with 10% FBS, glutamine, pyruvate, penicillin, streptomycin and fungizone. CN34 cells and their derivatives were maintained in M199 supplemented with 2.5% FBS, 10 μg ml−1 insulin, 0.5 μg ml−1 hydrocortisone, 20 ng ml−1 EGF, 100 ng ml−1 cholera toxin, glutamine, pyruvate, penicillin, streptomycin and fungizone. MDA-MB-231 cell lines were originally obtained from ATCC, while CN34 cell lines were generated from pleural fluid of a breast cancer patient as previously described32. Cells in culture were routinely tested for mycoplasma contamination. Clonal populations were generated from parental populations by seeding cells sparsely, picking individual colonies, and expanding to ∼10⁵ cells, at which point cells were imaged for size measurements.
Cell size measurements and coefficient of variation analysis
3 × 10⁴ cells were seeded on coverslips and stained with HCS CellMask Red (Invitrogen) and DAPI. Numerous fields were imaged on a DeltaVision Image Restoration Microscope to capture at least 100 cells per population. Cell size parameters were measured using Cell Profiler 2.0. Debris was filtered out by generating a histogram of (cytoplasmic area − nuclear area) and applying a minimum threshold in R. To calculate the coefficient of variation (s.d./mean) for each clonal population, a sampling size was determined from the clonal population with the fewest cells analysed (96 cells in the MDA-MB-231 population and 206 cells in the CN34 population). Subpopulations with more cells imaged than the sampling size were sampled 100 times, with the average CV used in the final analysis. For each subpopulation, the coefficient of variation was calculated for each size parameter: cell area, cytoplasmic area, nucleus, perimeter, major axis length, and minor axis length. Principal component analysis was performed using all subpopulations on the coefficients of variation for all morphological parameters. The first principal component accounted for 93 and 91% of the variance in MDA and CN34 cell lines, respectively, and is used for all size variation analyses.
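The sample-size equalization step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original R code: the function names are hypothetical, subsampling is assumed to be without replacement, and the sample standard deviation is assumed for the s.d./mean ratio.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation: sample s.d. divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def subsampled_cv(values, sample_size, n_draws=100, seed=0):
    """Average CV over repeated random subsamples of a fixed size.

    Populations with no more cells than the sampling size are used whole.
    Assumption: each draw samples cells without replacement.
    """
    if len(values) <= sample_size:
        return cv(values)
    rng = random.Random(seed)
    draws = [cv(rng.sample(values, sample_size)) for _ in range(n_draws)]
    return statistics.mean(draws)
```

For example, with a sampling size of 96 (the smallest MDA-MB-231 population), a population with 300 measured cells would be reduced to 100 random subsets of 96 cells, and the mean of the 100 resulting CVs would be carried into the principal component analysis. Fixing the sample size in this way keeps the CV estimates comparable between populations imaged at different depths.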
Three-dimensional size coefficient of variation analysis
Cells were fixed in 8% paraformaldehyde, permeabilized with 0.1% Triton-X, and stained with CellMask Red. Cells were analysed by ImageStream-X (Amnis) to measure cell size, nuclear size, perimeter, major axis and minor axis. Principal component analysis was performed, and the first principal component was used for analyses.
Proliferation & colony formation assays
For proliferation assays, 5 × 10^3 cells were seeded in triplicate and assayed with WST-1 reagent (Roche) 72 h after seeding. For colony formation assays, 200 cells were seeded in 10 cm plates in triplicate and were allowed to grow for 10 days (MDA-derived cells) or 20 days (CN-derived cells). Plates were fixed in 6% glutaraldehyde with 0.5% crystal violet and scanned. Colony areas were measured using ImageJ. R was used to remove debris, equalize colony numbers by subsampling as above, and calculate the coefficient of variation.
Animal studies
Animal experiments were conducted in accordance with protocols approved by the Institutional Animal Care and Use Committee at The Rockefeller University. For in vivo experiments with MDA-derived cell lines, populations were transduced with a retroviral construct expressing a luciferase reporter (ref. 33). The CN34 parental cell line was previously labelled with a luciferase reporter. Athymic female mice aged 6 weeks (Jackson labs) were used for intracardiac injection. NOD-SCID female mice aged 6 weeks (Jackson labs) were used for tail vein injections with MDA-derived cells, while NOD-SCID gamma female mice aged 6 weeks (Jackson labs) were used for tail vein injections with CN-derived cells. Sample size in mouse experiments was chosen based on the biological variability observed with a given genotype. Animals were excluded from studies if inoculated cells did not arrive at the intended site. For portal circulation injections, cells were injected into the spleen of NOD-SCID gamma female mice followed by removal of the spleen. For portal circulation injection of the mixed population, LV-M100 cells were transduced with pLKO.1 puro (Addgene plasmid #8453), and HV-M42 cells were transduced with pLKO.1 blast (Addgene plasmid #26655). Cells were mixed at a 1:1 ratio immediately before injections. In vivo bioluminescence was monitored weekly following retro-orbital injection of luciferin (Perkin Elmer) and normalized to the bioluminescence signal immediately following cell injection.
Tissue microarray analysis
NCI CDP Breast Cancer Progression Tissue Microarray slides were obtained from the Cancer Diagnosis Program at the National Cancer Institute, US National Institutes of Health. Tumour cores were obtained from The Cooperative Breast Cancer Tissue Resource with informed consent and ethical approval as indicated (ref. 34). TMA slides were deparaffinized, rehydrated and exposed to Heat Induced Epitope Retrieval at pH 6. Slides were stained with DAPI, and tumour cores were imaged on a Leica TCS SP5 system at ×40. Images were analysed using CellProfiler 2.0 to identify and measure cancer cell nuclei by Otsu Adaptive thresholding. Image acquisition and analysis were blinded until measurements were completed for all tumour cores. Mitotic index was calculated for each tumour core as the number of mitotic cells divided by the total number of cells.
Exome sequencing and analysis
gDNA was extracted using the DNeasy kit (Qiagen). Libraries were prepared using Nextera Extended Exome sequencing kit, as per manufacturer’s instructions (Illumina), and paired-end sequenced on HiSeq 2500 (Illumina). The analysis pipeline for the exome-seq data was based on the GATK best practices. Reads were aligned to the human genome (build hg19). The paired-end reads were then fixed and filtered using Picard (v. 1.107; http://picard.sourceforge.net/). The duplicates were also removed in the same step. Using GATK (v. 2.5; ref. 35), the reads were realigned and recalibrated. mpileup (samtools; ref. 36) was used to create an input for VarScan (v. 2.3.6; ref. 37). In VarScan, the mpileup2snp and mpileup2indel commands were used to identify variants across all exome-seq samples. To study population genetic divergence, the frequencies of all variants identified (148,234) were used to generate a phylogenetic tree from Nei’s genetic distance using the neighbor-joining method in the PHYLIP package.
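For orientation, Nei's standard genetic distance between two populations can be sketched from per-variant allele frequencies as below. This is an illustrative re-implementation, not the PHYLIP code used in the study; treating every variant as a biallelic locus (variant frequency p, reference frequency 1 − p) and the function name are assumptions.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    freqs_x and freqs_y hold the variant allele frequency at each locus in
    populations X and Y; each locus is assumed to be biallelic.
    """
    jx = jy = jxy = 0.0
    for p, q in zip(freqs_x, freqs_y):
        jx += p * p + (1 - p) ** 2          # homozygosity in X
        jy += q * q + (1 - q) ** 2          # homozygosity in Y
        jxy += p * q + (1 - p) * (1 - q)    # shared identity between X and Y
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))
```

A pairwise matrix of such distances across all clonal populations is the kind of input a neighbor-joining routine takes to build a tree.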
Single cell RNA-sequencing
Single cell isolation, cDNA synthesis, amplification and processing for Illumina sequencing were performed as described (ref. 38). Reads were distributed into separate samples based on their barcodes. Cells were excluded if wells were empty or if the generated library was of low quality, as assessed by the total number of reads, the number of mapped reads, a high percentage of unmapped reads or a high percentage of spike-in reads (ref. 39). To equalize the number of cells analysed from each population, cells were randomly selected and removed from analysis. The RNA-seq pipeline described under ‘RNA-sequencing’ was then used to measure gene expression across RNA-seq data from each cell. In parallel, TopHat results were parsed to count the presence of every exon-exon junction across all the samples. Transcripts present in both cell populations, present in more than 25% of all cells and expressed above a threshold mean of all cells based on a Gaussian distribution were included for analysis. Non-linear cell clustering by t-Distributed Stochastic Neighbor Embedding was performed using the ‘tsne’ package in R. To measure retained introns, the number of reads mapping to each exon or intron was counted across the transcriptome for all samples. As a measure of intron retention (IR), for each intron, we then calculated the number of reads mapping to that intron relative to the two spanning exons (r_int_ex). For CV measurements, retained introns and exon-exon junctions were included in analysis if present in more than 25% of all cells.
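The presence filter and per-transcript CV can be illustrated with the short sketch below. It is a simplified stand-in for the authors' pipeline: the dictionary layout, the function names, the definition of "present" as expression greater than zero and the use of the sample standard deviation are assumptions, and the Gaussian-based mean-expression threshold is omitted.

```python
import statistics


def detected_fraction(values):
    """Fraction of cells in which a transcript is detected (value > 0)."""
    return sum(1 for v in values if v > 0) / len(values)


def transcript_cvs(expression, min_fraction=0.25):
    """Return {transcript: CV} for transcripts detected in > min_fraction of cells.

    expression maps each transcript name to a list of per-cell expression
    values, with one entry per cell in a fixed cell order.
    """
    cvs = {}
    for transcript, values in expression.items():
        if detected_fraction(values) > min_fraction:
            mean = statistics.mean(values)
            cvs[transcript] = statistics.stdev(values) / mean
    return cvs
```

The same function could be applied to junction or retained-intron counts, since the text applies the same 25% presence rule to them.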
Pathway analysis
iPAGE (ref. 24) pathway analysis (http://iget.c2b2.columbia.edu) was used to identify gene sets with higher transcript variability in both HV subpopulations. Transcript coefficient of variation log-ratios (HV/LV) were used to categorize transcripts into four bins: bin 1) negative log-ratio in both MDA-derived and CN-derived comparisons, bins 2 and 3) positive log-ratio in either the MDA-derived or the CN-derived comparison, and bin 4) positive log-ratio in both MDA-derived and CN-derived comparisons. GO and KEGG annotations were analysed using a maximum P-value of 0.05 and a maximum of 200 genes per category. Gene sets were only considered relevant if enrichment was highest in bin 4 and lowest in bin 1. Ingenuity Pathway Analysis (Qiagen) was used to identify spliceosomal gene interactions based on published literature.
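The four-bin categorization can be written out as a small sketch. The function names, the log base, the assignment of bin 2 to the MDA-only case and bin 3 to the CN-only case, and the treatment of a log-ratio of exactly zero as non-positive are assumptions; the text does not specify them.

```python
import math


def cv_log_ratio(cv_hv, cv_lv):
    """log2 of the HV/LV coefficient-of-variation ratio (log base assumed)."""
    return math.log2(cv_hv / cv_lv)


def assign_bin(mda_log_ratio, cn_log_ratio):
    """Map a transcript's two log-ratios to one of four bins.

    Bin 4: positive in both comparisons; bin 1: positive in neither.
    Bins 2 and 3 cover the MDA-only and CN-only cases, in an assumed order.
    """
    mda_up = mda_log_ratio > 0
    cn_up = cn_log_ratio > 0
    if mda_up and cn_up:
        return 4
    if mda_up:
        return 2
    if cn_up:
        return 3
    return 1
```

The resulting bin labels are the kind of discrete input that a pathway enrichment tool can test for over-representation of gene sets.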
Flow cytometry
For antibody staining, cells were prepared using Cytofix/Cytoperm (BD). Dead cells were excluded using LIVE/DEAD Aqua (Invitrogen). Primary antibodies used were anti-XPNPEP3 (Abcam 25D, 1:200), anti-UPF2 (LSBio LS-C160443, 1:4), anti-ALDOA (Abcam ab54770, 1:8), anti-PABP (Abcam 10E10, 1:200), anti-HNRNPA1 (Cell Signalling D21H11, 1:30), anti-CD110 (BD clone 1.6.1, 1:50), anti-HNRNPA0 (Cell Signalling D8A3, 1:50), anti-ESR1 (Thermo MA1310, 1:20), anti-MCF2 (LSBio LS-C164083, 1:10) and anti-CSF2RA (eBiosciences 4H1, 1:30), conjugated to Alexa555 or Alexa647 Zenon secondary antibodies (Invitrogen). FACS was performed on an LSRII (BD). Analysis was performed in FlowJo. CV calculations were performed using more than 25 × 10^3 cells, with an equal number of cells per sample within each experiment. Ratios were calculated using the average value of all HV populations (MDA: HV-M35, HV-M42, HV-M56; CN: HV-C57) and LV populations (MDA: LV-M26, LV-M52, LV-M100; CN: LV-C65, LV-C92, LV-C100). Cell cycle analysis was performed by flow cytometry on fixed cells stained with DAPI, with cell cycle phases determined in FlowJo.
SNRNP40 protein quantitation
Cells were fixed, permeabilized and stained with DAPI and HPA026527 (1:100, Sigma) followed by fluorescent-conjugated secondary antibodies. Imaging was performed on a Leica TCS SP5 system. SNRNP40 relative protein level was determined using CellProfiler 2.0 by measuring total nuclear SNRNP40 fluorescence intensity as demarcated by DAPI signal.
SNRNP40 cell line generation
To generate a population with high SNRNP40 CV, cell populations were transduced individually at various titres with either virus for short hairpin RNA (shRNA) expression (shRNA#1 TRCN0000074608 and shRNA#2 TRCN0000074610, Sigma) or virus for stable ORF expression (pBabe vector). Expression in individual populations was confirmed by quantitative reverse transcriptase–PCR (RT–PCR). Populations were then pooled at equal ratios to generate mixed populations. The control population was generated similarly using a non-targeting shRNA (SHC016, Sigma) and an empty expression vector.
Quantitative RT–PCR
RNA was extracted using a total RNA isolation kit (Norgen Biotek). cDNA was generated using Superscript III (Invitrogen). Fast SYBR Green Master Mix (Life Technologies) was used to analyse samples on an Applied Biosystems 7900HT. Expression was normalized to GAPDH. Primer sequences are as follows: SNRNP40 Forward: 5′-CAGTGGAGCAGTGATGGAAT-3′; SNRNP40 Reverse: 5′-CCCTCTCACCTGTTTCACTATC-3′; GAPDH Forward: 5′-AGCCACATCGCTCAGACAC-3′; GAPDH Reverse: 5′-GCCCAATACGACCAAATCC-3′; blasticidin Forward: 5′-CCTGGGATCAAAGCCATAGT-3′; blasticidin Reverse: 5′-TTAGCCCTCCCACACATAAC-3′; puromycin Forward: 5′-GTCACCGAGCTGCAAGAA-3′; puromycin Reverse: 5′-CCGATCTCGGCGAACAC-3′.
siRNA transfection
The following siRNAs were used (IDT): siSNRNP40 #1: 5′-GGAAUAGACAAUGAUAUC-3′; siSNRNP40 #2: 5′-GGAUUUGACCGACUGAUA-3′. BLOCK-iT Fluorescent Oligo (Life Technologies) and NC1 (IDT) were used as controls.
10^5 cells were seeded and transfected the next day with siRNAs using Lipofectamine 2000 (Invitrogen). Cells were extracted 48 h later for validation of knockdown by quantitative RT–PCR, RNA-sequencing or flow cytometry. For RNA-sequencing, two siRNAs were used for SNRNP40 and control, and independent siRNA replicates were averaged.
RNA-sequencing
RNA was extracted using a total RNA isolation kit (Norgen Biotek) with DNase I treatment followed by Ribo-Zero Gold rRNA removal (Epicentre). Libraries were generated using the ScriptSeq v2 RNA-seq Library Preparation Kit (Epicentre) and run on HiSeq 2500. For RNA-seq data analysis, the reads were trimmed to remove matches to linkers and low-quality bases (cutadapt v1.2.1). TopHat (v. 2.0.8; ref. 40) was then used to map the reads to the human transcriptome (build hg19). Cufflinks and cuffmerge (v.2.0.2) were then used to calculate reads per kilobase per million mapped reads and to consolidate results across the samples. Finally, cuffdiff (v.2.0.2) was used to calculate log-fold changes and the associated statistics. Enrichment analysis was performed as previously described (ref. 41). The statistical test for Venn diagram overlap was performed using phyper() in R.
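A hypergeometric overlap test of the kind phyper() provides can be sketched in plain Python as below. The function and argument names are hypothetical, and the exact arguments the authors passed to phyper() are not given in the text; a common R idiom for the same upper-tail probability is `phyper(overlap - 1, set_a, universe - set_a, set_b, lower.tail = FALSE)`.

```python
from math import comb


def hypergeom_upper_tail(overlap, set_a, set_b, universe):
    """P(X >= overlap) for the overlap between two gene sets.

    set_a and set_b are the sizes of the two sets and universe is the number
    of genes both were drawn from; X is the overlap expected by chance.
    """
    total = comb(universe, set_b)
    largest = min(set_a, set_b)
    favourable = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, largest + 1)
    )
    return favourable / total
```

The choice of universe (for example, all expressed genes rather than all annotated genes) changes the resulting P-value, so it needs to be fixed before the test is run.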
Clinical association analyses
GSE2034 (ref. 42) and GSE33926 (ref. 43) were used to generate Kaplan–Meier curves. Patients were stratified by SNRNP40 expression relative to the median. KMplotter was used to assess distant metastasis-free survival with a follow-up threshold of 8 years (ref. 44).
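The median stratification and Kaplan–Meier estimate can be illustrated with the following sketch. It is not the KMplotter implementation: the function names, the handling of ties at the median (assigned to the low group) and censoring at the follow-up threshold are assumptions.

```python
import statistics


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    median = statistics.median(expression)
    return ["high" if value > median else "low" for value in expression]


def censor_at(times, events, threshold):
    """Censor follow-up beyond threshold (e.g. 8 years); events are 0 or 1."""
    new_times = [min(t, threshold) for t in times]
    new_events = [e if t <= threshold else 0 for t, e in zip(times, events)]
    return new_times, new_events


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored cases leave the risk set
    return curve
```

Running `kaplan_meier` separately on the patients labelled high and low gives the two curves that are then compared.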
Additional information
Accession codes: Exome sequencing data have been deposited in the European Nucleotide Archive (ENA) under the accession code PRJEB12872. RNA-sequencing data have been deposited in the Gene Expression Omnibus (GEO) under the accession code GSE78527.
How to cite this article: Nguyen, A. et al. Highly variable cancer subpopulations that exhibit enhanced transcriptome variability and metastatic fitness. Nat. Commun. 7:11246 doi: 10.1038/ncomms11246 (2016).
References
Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
Park, S. Y., Gonen, M., Kim, H. J., Michor, F. & Polyak, K. Cellular and genetic diversity in the progression of in situ human breast carcinomas to an invasive phenotype. J. Clin. Invest. 120, 636–644 (2010).
Sharma, S. V. et al. A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80 (2010).
Seligson, D. B. et al. Global histone modification patterns predict risk of prostate cancer recurrence. Nature 435, 1262–1266 (2005).
Almendro, V. et al. Genetic and phenotypic diversity in breast tumor metastases. Cancer Res. 74, 1338–1348 (2014).
Talmadge, J. E. & Fidler, I. J. AACR centennial series: the biology of cancer metastasis: historical perspective. Cancer Res. 70, 5649–5669 (2010).
Kreso, A. et al. Variable clonal repopulation dynamics influence chemotherapy response in colorectal cancer. Science 339, 543–548 (2013).
Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).
Wu, X. et al. Clonal selection drives genetic divergence of metastatic medulloblastoma. Nature 482, 529–533 (2012).
Yachida, S. et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467, 1114–1117 (2010).
Minn, A. J. et al. Distinct organ-specific metastatic potential of individual breast cancer cells and primary tumors. J. Clin. Invest. 115, 44–55 (2005).
Singer, Z. S. et al. Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol. Cell 55, 319–331 (2014).
Wagenblast, E. et al. A model of breast cancer heterogeneity reveals vascular mimicry as a driver of metastasis. Nature 520, 358–362 (2015).
Merlo, L. M., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006).
Marusyk, A. & Polyak, K. Tumor heterogeneity: causes and consequences. Biochim. Biophys. Acta 1805, 105–117 (2010).
Aktipis, C. A., Boddy, A. M., Gatenby, R. A., Brown, J. S. & Maley, C. C. Life history trade-offs in cancer evolution. Nat. Rev. Cancer 13, 883–892 (2013).
Klein, C. A. Selection and adaptation during metastatic cancer progression. Nature 501, 365–372 (2013).
Rekha, T., Nandini, N. & Dhar, M. Expansion of Masood's cytologic index for breast carcinoma and its validity. J. Cytol. 30, 233–236 (2013).
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
de Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, 251–256 (2014).
Amir el, A. D. et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31, 545–552 (2013).
Altschuler, S. J. & Wu, L. F. Cellular heterogeneity: do differences make a difference? Cell 141, 559–563 (2010).
Goodarzi, H., Elemento, O. & Tavazoie, S. Revealing global regulatory perturbations across human cancers. Mol. Cell 36, 900–911 (2009).
Munsky, B., Neuert, G. & van Oudenaarden, A. Using gene expression noise to understand gene regulation. Science 336, 183–187 (2012).
Wong, J. J. et al. Orchestrated intron retention regulates normal granulocyte differentiation. Cell 154, 583–595 (2013).
Eswaran, J. et al. RNA sequencing of cancer reveals novel splicing alterations. Sci. Rep. 3, 1689 (2013).
Spencer, S. L., Gaudet, S., Albeck, J. G., Burke, J. M. & Sorger, P. K. Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature 459, 428–432 (2009).
Gupta, P. B. et al. Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells. Cell 146, 633–644 (2011).
Marusyk, A. et al. Non-cell-autonomous driving of tumour growth supports sub-clonal heterogeneity. Nature 514, 54–58 (2014).
Jangi, M. & Sharp, P. A. Building robust transcriptomes with master splicing factors. Cell 159, 487–498 (2014).
Tavazoie, S. F. et al. Endogenous human microRNAs that suppress breast cancer metastasis. Nature 451, 147–152 (2008).
Ponomarev, V. et al. A novel triple-modality reporter gene for whole-body fluorescent, bioluminescent, and nuclear noninvasive imaging. Eur. J. Nucl. Med. Mol. Imaging 31, 740–751 (2004).
Glass, A. G. et al. The Cooperative Breast Cancer Tissue Resource: archival tissue for the investigation of tumor markers. Clin. Cancer Res. 7, 1843–1849 (2001).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Islam, S. et al. Highly multiplexed and strand-specific single-cell RNA 5' end sequencing. Nat. Protoc. 7, 813–828 (2012).
Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Goodarzi, H. et al. Endogenous tRNA-derived fragments suppress breast cancer progression via YBX1 displacement. Cell 161, 790–802 (2015).
Wang, Y. et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679 (2005).
Kuo, W. H. et al. Molecular characteristics and metastasis predictor genes of triple-negative breast cancer: a clinical study of triple-negative breast carcinomas. PLoS ONE 7, e45831 (2012).
Gyorffy, B. et al. An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients. Breast Cancer Res. Treat. 123, 725–731 (2010).
Acknowledgements
We thank S. Kurdistani and Saeed Tavazoie for critical reading of previous versions of this manuscript. We thank members of our laboratory for helpful discussion. We thank the Rockefeller Genomics Resource Center (C. Zhao, Director), Bio-Imaging Resource Center (A. North, Director), and the Flow Cytometry Resource Center (S. Mazel, Director) for technical assistance and resources. We thank A. Dunn for technical assistance. A.N. is supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the National Institutes of Health under award number T32GM07739 to the Weill Cornell/Rockefeller/Sloan-Kettering Tri-Institutional MD-PhD Program. S.F.T. is a Department of Defense Era of Hope Scholar and a Rita Allen Foundation Scholar.
Author information
Contributions
S.F.T. conceived the project and supervised all research. A.N. and S.F.T. wrote the manuscript. A.N., M.Y., and S.F.T. designed the experiments. A.N. and M.Y. performed the experiments. H.G. and A.N. performed sequencing analyses.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
Supplementary Figures 1-10 and Supplementary Tables 1-2 (PDF 5212 kb)
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Nguyen, A., Yoshida, M., Goodarzi, H. et al. Highly variable cancer subpopulations that exhibit enhanced transcriptome variability and metastatic fitness. Nat Commun 7, 11246 (2016). https://doi.org/10.1038/ncomms11246