Introduction

The value of molecular methods for cancer medicine stems from the enormous breadth of information that can be obtained from a single tumor sample. Microarrays assess thousands of transcripts, or millions of single nucleotide polymorphisms (SNPs), and next-generation sequencing (NGS) can reveal copy number and genetic aberrations at base pair resolution. However, because most applications require bulk DNA or RNA from over 100,000 cells, they are limited to providing global information on the average state of the population of cells. Solid tumors are complex mixtures of cells including non-cancerous fibroblasts, endothelial cells, lymphocytes, and macrophages that often contribute more than 50% of the total DNA or RNA extracted. This admixture can mask the signal from the cancer cells and thus complicate the inter- and intra-tumor comparisons, which are the basis of molecular classification methods.

In addition, solid tumors are often composed of multiple clonal subpopulations [1–3], and this heterogeneity further confounds the analysis of clinical samples. Single-cell genomic methods have the capacity to resolve complex mixtures of cells in tumors. When multiple clones are present in a tumor, molecular assays reflect an average signal of the population, or, alternatively, only the signal from the dominant clone, which may not be the most malignant clone present in the tumor. This becomes particularly important as molecular assays are employed for directing targeted therapy, as in the use of ERBB2 (Her2-neu) gene amplification to identify patients likely to respond to Herceptin (trastuzumab) treatment in breast cancer, where 5% to 30% of all patients have been reported to exhibit such genetic heterogeneity [4–7].

Aneuploidy is another hallmark of cancer [8], and the genetic lineage of a tumor is indelibly written in its genomic profile. While whole genomic sequencing of a single cell is not possible using current technology, copy number profiling of single cells using sparse sequencing or microarrays can provide a robust measure of this genomic complexity and insight into the character of the tumor. This is evident in the progress that has been made in many studies of single-cell genomic copy number [9–14]. In principle, it should also be possible to obtain a partial representation of the transcriptome from a single cell by NGS, and a few successes have been reported for whole transcriptome analysis in blastocyst cells [15, 16]; however, as yet, this method has not been successfully applied to single cancer cells.

The clinical value of single-cell genomic methods will be in profiling scarce cancer cells in clinical samples, monitoring CTCs, and detecting rare clones that may be resistant to chemotherapy (Figure 1). These applications are likely to improve all three major themes of oncology: detection, progression, and prediction of therapeutic efficacy. In this review, we outline the current methods and those in development for isolating single cells and analyzing their genomic profile, with a particular focus on profiling genomic copy number.

Figure 1
Medical applications of single-cell sequencing. (a) Profiling of rare tumor cells in scarce clinical samples, such as fine-needle aspirates of breast lesions. (b) Isolation and profiling of circulating tumor cells in the blood. (c) Identification and profiling of rare chemoresistant cells before and after adjuvant therapy.

Background

Although genomic profiling by microarray comparative genomic hybridization (aCGH) has been in clinical use for constitutional genetic disorders for some time, its use in profiling cancers has been largely limited to basic research. Its potential for clinical utility is yet to be realized. Although specific genomic events such as Her2-neu amplification as a target for Herceptin are accepted clinical markers, genome-wide profiling for copy number has been used only in preclinical studies and has only recently been incorporated into clinical trial protocols [17]. However, in cohort studies, classes of genomic copy number profiles of patients have shown strong correlation with patient survival [18, 19]. Until the breakthrough of NGS, the highest resolution for identifying copy number variations was achieved through microarray-based methods, which could detect amplifications and deletions in cancer genomes, but could not discern copy neutral alterations such as translocations or inversions. NGS has changed the perspective on genome profiling, since DNA sequencing has the potential to identify structural changes, including gene fusions and even point mutations, in addition to copy number. However, the cost of profiling a cancer genome at base pair resolution remains out of range for routine clinical use, and calling mutations is subject to ambiguities as a result of tumor heterogeneity when DNA is obtained from bulk tumor tissue. The application of NGS to genomic profiling of single cells, developed by the Wigler group at Cold Spring Harbor Laboratory and described here, has the potential not only to acquire an even greater level of information from tumors, such as the variety of cells present, but also to obtain genetic information from the rare cells that may be the most malignant.

Isolating single cells

To study a single cell it must first be isolated from cell culture or a tissue sample in a manner that preserves biological integrity. Several methods are available to accomplish this, including micromanipulation, laser-capture microdissection (LCM) and flow cytometry (Figure 2a-c). Micromanipulation of individual cells using a transfer pipette has been used for isolating single cells from culture or liquid samples such as sperm, saliva or blood. This method is readily accessible but labor intensive, and cells are subject to mechanical shearing. LCM allows single cells to be isolated directly from tissue sections, making it desirable for clinical applications. This approach requires that tissues be sectioned, mounted and stained so that they can be visualized to guide the isolation process. LCM has the advantage of allowing single cells to be isolated directly from morphological structures, such as ducts or lobules in the breast. Furthermore, tissue sections can be stained with fluorescent or chromogenic antibodies to identify specific cell types of interest. The disadvantage of LCM for genomic profiling is that some nuclei will inevitably be sliced in the course of tissue sectioning, causing loss of chromosome segments and generating artifacts in the data.

Figure 2

Isolating single cells and techniques for genomic profiling. (a-c) Single-cell isolation methods. (d-f) Single-cell genomic profiling techniques. (a) Micromanipulation, (b) laser-capture microdissection (LCM), (c) fluorescence-activated cell sorting (FACS), (d) cytological methods to visualize chromosomes in single cells, (e) whole genome amplification (WGA) and microarray comparative genomic hybridization (CGH), (f) WGA and next-generation sequencing.

Flow cytometry using fluorescence-activated cell sorting (FACS) is by far the most efficient method for isolating large numbers of single cells or nuclei from liquid suspensions. Although it requires sophisticated and expensive instrumentation, FACS is readily available at most hospitals and research institutions, and is used routinely to sort cells from hematopoietic cancers. Several instruments such as the BD Aria II/III (BD Biosciences, San Jose, CA, USA) and the Beckman Coulter MO-FLO (Beckman Coulter, Brea, CA, USA) have been optimized for sorting single cells into 96-well plates for subcloning cell cultures. FACS has the added advantage that cells can be labeled with fluorescent antibodies or nuclear stains (4',6-diamidino-2-phenyl indole dihydrochloride (DAPI)) and sorted into different fractions for downstream analysis.

Methods for single-cell genomic profiling

Several methods have been developed to measure genome-wide information of single cells, including cytological approaches, aCGH and single-cell sequencing (Figure 2d-f). Some of the earliest methods to investigate the genetic information contained in single cells emerged in the 1970s in the fields of cytology and immunology. Cytological methods such as spectral karyotyping, fluorescence in situ hybridization (FISH) and Giemsa staining enabled the first qualitative analysis of genomic rearrangements in single tumor cells (illustrated in Figure 2d). In the 1980s, the advent of PCR enabled immunologists to investigate genomic rearrangements that occur in immunocytes, by directly amplifying and sequencing DNA from single cells [20–22]. Together, these tools provided the first insight into the remarkable genetic heterogeneity that characterizes solid tumors [23–28].

While PCR could amplify DNA from an individual locus in a single cell, it could not amplify the entire human genome in a single reaction. Progress was made using PCR-based strategies such as primer extension pre-amplification [29] to amplify the genome of a single cell; however, these strategies were limited in coverage when applied to human genomes. A major milestone occurred with the discovery of two DNA polymerases that displayed remarkable processivity for DNA synthesis: Phi29 (Φ29) isolated from the Bacillus subtilis bacteriophage, and Bst polymerase isolated from Bacillus stearothermophilus. Pioneering work in the early 2000s demonstrated that these enzymes could amplify the human genome over 1,000-fold through a mechanism called multiple displacement amplification [30, 31]. This approach, called whole genome amplification (WGA), has since been made commercially available (New England Biolabs, Ipswich, MA, USA; QIAGEN, Valencia, CA, USA; Sigma-Aldrich, St Louis, MO, USA; Rubicon Genomics, Ann Arbor, MI, USA).

Coupling WGA with array CGH enabled several groups to begin measuring genomic copy number in small populations of cells, and even single cells (Figure 2e). These studies showed that it is possible to profile copy number in single cells in various cancer types, including CTCs [9, 12, 32], colon cancer cell lines [13] and renal cancer cell lines [14]. While pioneering, these studies were challenged by limited resolution and reproducibility. In practice, probe-based approaches such as aCGH are problematic for measuring copy number in WGA products, because amplification is not uniform across the genome. WGA fragments amplified from single cells are sparsely distributed across the genome, representing no more than 10% of the unique human sequence [10]. This results in zero coverage for up to 90% of probes, ultimately leading to decreased signal-to-noise ratios and high standard deviations in copy number signal.

An alternative approach is to use NGS. This method provides a major advantage over aCGH for measuring WGA fragments because it provides a non-targeted approach to sample the genome. Instead of differential hybridization to specific probes, sequence reads are integrated over contiguous and sequential lengths of the genome and all amplified sequences are used to calculate copy number. In a recently published study, we combined NGS with FACS and WGA in a method called single-nucleus sequencing (SNS) to measure high-resolution (approximately 50 kb) copy number profiles of single cells [10]. Flow-sorting of DAPI-stained nuclei isolated from tumor or other tissue permits deposition of single nuclei into individual wells of a multiwell plate and, moreover, permits sorting of nuclei by total DNA content. This step separates normal nuclei (2N) from aneuploid tumor nuclei (not 2N), and avoids collecting degraded nuclei. We then amplify the DNA from each well by WGA using GenomePlex (Sigma-Genosys, The Woodlands, TX, USA) to yield a collection of short fragments that uniquely cover approximately 6% (mean 5.95%, SEM ± 0.229, n = 200) of the human genome [10], which are then processed for Illumina sequencing (Illumina, San Diego, CA, USA) (Figure 3a). For copy number profiling, deep sequencing is not required. Instead, the SNS method requires only sparse read depth (as few as 2 million uniquely mapped 76 bp single-end reads) evenly distributed along the genome. For this application, Illumina sequencing is preferred over other NGS platforms because it produces the highest number of short reads across the genome at the lowest cost.
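A rough calculation suggests why sparse read depth can suffice for copy number. It takes N = 2 million reads from the text and B = 50,000 bins, the number used for the 50 kb profiles, and it assumes, as an idealization that the study does not state, that read counts per bin are approximately Poisson distributed:

```latex
\lambda = \frac{N}{B} = \frac{2 \times 10^{6}}{5 \times 10^{4}} = 40 \ \text{reads per bin},
\qquad
\frac{\sigma}{\lambda} = \frac{1}{\sqrt{\lambda}} = \frac{1}{\sqrt{40}} \approx 0.16
```

Under this assumption, a single-copy gain in a diploid region raises the expected count in a bin from 40 to 60 reads, a shift of about three standard deviations in a single bin, and averaging over the many bins of a segment reduces the noise further. WGA adds variance beyond the Poisson model, so these figures should be read as an optimistic bound.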

Figure 3

Single-nucleus sequencing of breast tumors. (a) Single-nucleus sequencing involves isolating nuclei, staining with 4',6-diamidino-2-phenyl indole dihydrochloride (DAPI), flow-sorting by total DNA content, whole genome amplification (WGA), Illumina library construction, and quantifying genomic copy number using sequence read depth. (b) Phylogenetic tree constructed from single-cell copy number profiles of a monogenomic breast tumor. (c) Phylogenetic tree constructed using single-cell copy number profiles from a polygenomic breast tumor, showing three clonal subpopulations of tumor cells.

To calculate the genomic copy number of a single cell, the sequence reads are grouped into intervals or ‘bins’ across the genome, providing a measure of copy number based on read density in each of 50,000 bins, resulting in a resolution of 50 kb across the genome. In contrast to previous studies that measure copy number from sequence read depth using fixed bin intervals across the human genome [33–37], we have developed an algorithm that uses variable length bins to correct for artifacts associated with WGA and mapping. The length of each bin is adjusted in size based on a mapping simulation using random DNA sequences, depending on the expected unique read density within each interval. This corrects for regions of the genome with repetitive elements (where fewer reads map), and for introduced biases such as GC content. The variable bins are then segmented using the Kolmogorov-Smirnov (KS) statistical test [1, 38]. Alternative methods for sequence data segmentation, such as hidden Markov models, have been developed [33], but have not yet been applied to sparse single-cell data. In practice, KS segmentation algorithms work well for complex aneuploid cancer genomes that contain many variable copy number states, whereas hidden Markov models are better suited for simple cancer genomes with fewer rearrangements, and normal individuals with fewer copy number states. To determine the copy number states in sparse single-cell data, we count the reads in variable bins and segment with KS, then use a Gaussian smoothed kernel density function to sample all of the copy number states and determine the ground state interval. This interval is used to linearly transform the data, which are then rounded to the nearest integer, resulting in the absolute copy number profile of each single cell [10]. This processing allows amplification artifacts associated with WGA to be mitigated informatically, reducing biases associated with GC content [9, 14, 39, 40] and mappability of the human genome [41].
Other artifacts, such as over-replicated loci (‘pileups’), as previously reported in WGA [40, 42, 43], do occur, but they are not at recurrent locations in different cells, and are sufficiently randomly distributed and sparse so as not to affect counting over the breadth of a bin, when the mean interval size is 50 kb. While some WGA methods have reported the generation of chimeric DNA molecules in bacteria [44], these artifacts would mainly affect paired-end mappings of structural rearrangements, not single-end read copy number measurements that rely on sequence read depth. In summary, NGS provides a powerful tool to mitigate artifacts previously associated with quantifying copy number in single cells amplified by WGA, and eliminates the need for a reference genome to normalize artifacts, making it possible to calculate absolute copy number from single cells.
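The counting and integer-scaling steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published SNS implementation: the bin edges are taken as given (the mapping simulation that sizes the variable bins is not reproduced), KS segmentation is skipped and segment-level ratios are supplied directly, and the ground state interval is interpreted here as the median spacing between peaks of a Gaussian kernel density estimate. The function names, bandwidth and grid step are all hypothetical.

```python
import bisect
import math
from statistics import median


def count_reads(read_positions, bin_edges):
    """Count reads in variable-length bins.

    bin_edges is a sorted list of coordinates; bin i spans
    [bin_edges[i], bin_edges[i + 1]). Reads outside all bins are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for pos in read_positions:
        idx = bisect.bisect_right(bin_edges, pos) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts


def density_peaks(values, bandwidth=0.05, step=0.01):
    """Return positions of local maxima of a Gaussian kernel density estimate."""
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    dens = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    return [
        grid[i]
        for i in range(1, n - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    ]


def integer_copy_number(segment_ratios, bandwidth=0.05):
    """Scale segment ratios by the peak spacing and round to integers.

    Assumption: density peaks correspond to consecutive integer copy
    number states, so their median spacing equals one copy.
    """
    peaks = density_peaks(segment_ratios, bandwidth)
    if len(peaks) < 2:
        raise ValueError("need at least two copy number states to estimate the interval")
    interval = median(b - a for a, b in zip(peaks, peaks[1:]))
    return [round(r / interval) for r in segment_ratios]
```

In this sketch, `segment_ratios` would hold one value per bin, namely the read density of the segment containing that bin divided by the genome-wide average. Profiles in which only a single copy number state is present have no peak spacing to measure, which is why the function raises an error rather than guessing the scale.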

Clinical application of single-cell sequencing

While single-cell genomic methods such as SNS are feasible in a research setting, they will not be useful in the clinic until advances are made in reducing the cost and time of sequencing. Fortunately, the cost of DNA sequencing is falling precipitously as a direct result of industry competition and technological innovation. Sequencing has an additional benefit over microarrays in the potential for massive multiplexing of samples using barcoding strategies. Barcoding involves adding a specific 4 to 6 base oligonucleotide sequence to each library as it is amplified, so that samples can be pooled together in a single sequencing reaction [45, 46]. After sequencing, the reads are deconvoluted by their unique barcodes for downstream analysis. With the current throughput of the Illumina HiSeq2000, it is possible to sequence up to 25 single cells on a single flow-cell lane, thus allowing 200 single cells to be profiled in a single run. Moreover, by decreasing the genomic resolution of each single-cell copy number profile (for example, from 50 kb to 500 kb) it is possible to profile hundreds of cells in parallel on a single lane, or thousands on a run, making single-cell profiling economically feasible for clinical applications.
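The deconvolution step and the throughput arithmetic can be illustrated with a short Python sketch. It makes several assumptions that go beyond the text: the barcode is taken to be the first bases of each read and matched exactly, the flow cell is taken to have 8 lanes (the figure implied by 25 cells per lane giving 200 per run), and the number of reads needed per cell is taken to scale inversely with bin size. All function names and the example barcodes are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_cell, barcode_len=6):
    """Assign reads to cells by an exact match on a leading barcode.

    Returns (reads_by_cell, unassigned_count). The barcode is stripped
    from each assigned read before it is stored.
    """
    reads_by_cell = defaultdict(list)
    unassigned = 0
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        cell = barcode_to_cell.get(tag)
        if cell is None:
            unassigned += 1
        else:
            reads_by_cell[cell].append(insert)
    return dict(reads_by_cell), unassigned


def cells_per_run(cells_per_lane, lanes_per_flow_cell=8):
    """Cells profiled per run, assuming every lane carries the same load."""
    return cells_per_lane * lanes_per_flow_cell


def cells_per_lane_at_resolution(base_cells, base_res_kb, new_res_kb):
    """Scale lane capacity, assuming reads per cell shrink in proportion to bin count."""
    return base_cells * new_res_kb // base_res_kb


# Hypothetical 6-base barcodes for two cells.
barcodes = {"ACGTAC": "cell_1", "TTGCAA": "cell_2"}
assigned, unassigned = demultiplex(
    ["ACGTACTTTT", "TTGCAAGGGG", "NNNNNNAAAA"], barcodes
)
```

Under the proportional-scaling assumption, moving from 50 kb to 500 kb bins raises the capacity from 25 to 250 cells per lane, which is consistent with the estimate of hundreds of cells per lane and thousands per run. A production pipeline would typically also tolerate sequencing errors in the barcode, which exact matching does not.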

A major application of single-cell sequencing will be in the detection of rare tumor cells in clinical samples, where fewer than a hundred cells are typically available. These samples include body fluids such as lymph, blood, sputum, urine, or vaginal or prostate fluid, as well as clinical biopsy samples such as fine-needle aspirates (Figure 1a) or core biopsy specimens. In breast cancer, patients often undergo fine-needle aspirates, nipple aspiration, ductal lavages or core biopsies; however, genomic analysis is rarely applied to these samples because of limited DNA or RNA. Early stage breast cancers, such as low-grade ductal carcinoma in situ (DCIS) or lobular carcinoma in situ, which are detected by these methods, present a formidable challenge to oncologists, because only 5% to 10% of patients with DCIS typically progress to invasive carcinomas [47–51]. Thus, it is difficult for oncologists to determine how aggressively to treat each individual patient. Studies of DCIS using immunohistochemistry support the idea that many early stage breast cancers exhibit extensive heterogeneity [52]. Measuring tumor heterogeneity in these scarce clinical samples by genomic methods may provide important predictive information on whether these tumors will evolve and become invasive carcinomas, and may lead to better treatment decisions by oncologists.

Early detection using circulating tumor cells

Another major clinical application of single-cell sequencing will be in the genomic profiling of copy number or sequence mutations in CTCs and disseminated tumor cells (DTCs) (Figure 1b). Although whole genome sequencing of single CTCs is not yet technically feasible, with future innovations, such data may provide important information for monitoring and diagnosing cancer patients. CTCs are cells that intravasate into the circulatory system from the primary tumor, while DTCs are cells that disseminate into tissues such as the bone. Unlike other cells in the circulation, CTCs often contain epithelial surface markers (such as epithelial cell adhesion molecule (EpCAM)) that allow them to be distinguished from other blood cells. CTCs present an opportunity to obtain a non-invasive ‘fluid biopsy’ that would provide an indication of cancer activity in a patient, and also provide genetic information that could direct therapy over the course of treatment. In a recent phase II clinical study, the presence of epithelial cells (non-leukocytes) in the blood or other fluids correlated strongly with active metastasis and decreased survival in patients with breast cancer [53]. Similarly, in melanoma it was shown that counting more than two CTCs in the blood correlated strongly with a marked decrease in survival from 12 months to 2 months [54]. In breast cancer, DTCs in the bone marrow (micrometastases) have also correlated with poor overall patient survival [55]. While studies that count CTCs or DTCs clearly have prognostic value, more detailed characterization of their genomic lesions is necessary to determine whether they can help guide adjuvant or chemotherapy.

Several new methods have been developed to count the number of CTCs in blood, and to perform limited marker analysis on isolated CTCs using immunohistochemistry and FISH. These methods generally depend on antibodies against EpCAM to physically isolate a few epithelial cells from the nearly ten million non-epithelial leukocytes in a typical blood draw. CellSearch (Veridex, LLC, Raritan, NJ, USA) uses a series of immunomagnetic beads with EpCAM markers to isolate tumor cells and stain them with DAPI to visualize the nucleus. This system also uses CD45 antibodies to negatively select immune cells from the blood samples. Although CellSearch is the only instrument that is currently approved for counting CTCs in the clinic, a number of other methods are in development, and these are based on microchips [56], FACS [57, 58] or immunomagnetic beads [54] that allow CTCs to be physically isolated. However, a common drawback of all methods is that they depend on EpCAM markers that are not 100% specific (antibodies can bind to surface receptors on blood cells) and the methods for distinguishing actual tumor cells from contaminants are not dependable [56].

Investigating the diagnostic value of CTCs with single-cell sequencing has two advantages: impure mixtures can be resolved, and limited amounts of input DNA can be analyzed. Even a single CTC in an average 7.5 ml blood draw (which is often the level found in patients) can be analyzed to provide a genomic profile of copy number aberrations. By profiling multiple samples from patients, such as the primary tumor, metastasis and CTCs, it would be possible to trace an evolutionary lineage and determine the pathways of progression and site of origin.

Monitoring or detecting CTCs or DTCs in normal patients may also provide a non-invasive approach for the early detection of cancer. Recent studies have shown that many patients with non-metastatic primary tumors show evidence of CTCs [53, 59]. While the function of these cells is largely unknown, several studies have demonstrated prognostic value of CTCs using gene-specific molecular assays such as reverse transcriptase (RT)-PCR [60–62]. Single-cell sequencing could greatly improve the prognostic value of such methods [63]. Moreover, if CTCs generally share the mutational profile of the primary tumors (from which they are shed), then they could provide a powerful non-invasive approach to detecting early signs of cancer. One day, a general physician may be able to draw a blood sample during a routine check-up and profile CTCs indicating the presence of a primary tumor somewhere in the body. If these genomic profiles reveal mutations in cancer genes, then medical imaging (magnetic resonance imaging or computed tomography) could be pursued to identify the primary tumor site for biopsy and treatment. CTC monitoring would also have important applications in monitoring residual disease after adjuvant therapy to ensure that the patients remain in remission.

The analysis of scarce tumor cells may also improve the early detection of cancers. Smokers could have their sputum screened on a regular basis to identify rare tumor cells with genomic aberrations that provide an early indication of lung cancer. Sperm ejaculates contain a significant amount of prostate fluid that may contain rare prostate cancer cells. Such cells could be purified from sperm using established biomarkers such as prostate-specific antigen [64] and profiled by single-cell sequencing. Similarly, it may be possible to isolate ovarian cancer cells from vaginal fluid using established biomarkers, such as ERCC5 [65] or HE4 [66], for genomic profiling. The genomic profile of these cells may provide useful information on the lineage of the cell and from which organ it has been shed. Moreover, if the genomic copy number profiles of rare tumor cells accurately represent the genetic lesions in the primary tumor, then they may provide an opportunity for targeted therapy. Previous work has shown that classes of genomic copy number profiles correlate with survival [18], and thus the profiles of rare tumor cells may have predictive value in assessing the severity of the primary cancer from which they have been shed.

Investigating tumor heterogeneity with SNS

Tumor heterogeneity has long been reported in morphological [67–70] and genetic [26, 28, 71–76] studies of solid tumors, and more recently in genomic studies [1–3, 10, 77–81], transcriptional profiles [82, 83] and protein levels [52, 84] of cells within the same tumor (summarized in Table 1). Heterogeneous tumors present a formidable challenge to clinical diagnostics, because sampling single regions within a tumor may not represent the population as a whole. Tumor heterogeneity also confounds basic research studies that investigate the fundamental basis of tumor progression and evolution. Most current genomic methods require large quantities of input DNA, and thus their measurements represent an average signal across the population. In order to study tumor subpopulations, several studies have stratified cells using regional macrodissection [1, 2, 79, 85], DNA ploidy [1, 86], LCM [78, 87] or surface receptors [3] prior to applying genomic methods. While these approaches do increase the purity of the subpopulations, they remain admixtures. To fully resolve such complex mixtures, it is necessary to isolate and study the genomes of single cells.

Table 1 Summary of tumor heterogeneity studies

In the single-cell sequencing study described above, we applied SNS to profile hundreds of single cells from two primary breast carcinomas to investigate substructure and infer genomic evolution [10]. For each tumor we quantified the genomic copy number profile of each single cell and constructed phylogenetic trees (Figure 3). Our analysis showed that one tumor (T16) was monogenomic, consisting of cells with tightly conserved copy number profiles throughout the tumor mass, and was apparently the result of a single major clonal expansion (Figure 3b). In contrast, the second breast tumor (T10) was polygenomic (Figure 3c), displaying three major clonal subpopulations that shared a common genetic lineage. These subpopulations were organized into different regions of the tumor mass: the H subpopulation occupied the upper sectors of the tumor (S1 to S3), while the other two tumor subpopulations (AA and AB) occupied the lower regions (S4 to S6). The AB tumor subpopulation in the lower regions contained a massive amplification of the KRAS oncogene and homozygous deletions of the EFNA5 and COL4A5 tumor suppressors. When applied to clinical biopsy or tumor samples, such phylogenetic trees are likely to be useful for improving the clinical sampling of tumors for diagnostics, and may eventually aid in guiding targeted therapies for the patient.
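The distinction between monogenomic and polygenomic tumors can be illustrated by grouping single-cell integer copy number profiles according to their pairwise distances. The Python sketch below is a deliberately simple stand-in and not the tree-building method used in the study: it uses Manhattan distance and single-linkage grouping under a fixed threshold, and the function names, threshold and profiles are hypothetical.

```python
def manhattan(a, b):
    """Sum of absolute copy number differences across bins."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_profiles(profiles, max_distance):
    """Single-linkage grouping of cells by copy number profile.

    profiles maps a cell name to a list of integer copy numbers, one per
    bin. Two cells join the same group when their distance is at most
    max_distance, directly or through a chain of such neighbors.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if manhattan(profiles[a], profiles[b]) <= max_distance:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical five-bin profiles for six cells.
example = {
    "c1": [2, 2, 3, 3, 2],
    "c2": [2, 2, 3, 3, 2],
    "c3": [2, 2, 3, 4, 2],
    "c4": [2, 1, 5, 3, 2],
    "c5": [2, 1, 5, 3, 2],
    "c6": [2, 1, 5, 3, 3],
}
subpopulations = cluster_profiles(example, max_distance=1)
```

In this toy example the six cells fall into two groups, the pattern expected of a polygenomic tumor, whereas a monogenomic tumor would yield a single group. A full analysis would build a tree over all cells so that the lineage relationships between subpopulations, and not only their membership, can be read from the result.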

Response to chemotherapy

Tumor heterogeneity is likely to play an important role in the response to chemotherapy [88]. From a Darwinian perspective, tumors with the most diverse allele frequencies will have the highest probability of surviving a catastrophic selection pressure such as a cytotoxic agent or targeted therapy [89, 90]. A major question revolves around whether resistant clones are pre-existing in the primary tumor (prior to treatment) or whether they emerge in response to adjuvant therapy by acquiring de novo mutations. Another important question is whether heterogeneous tumors generally show a poorer response to adjuvant therapy. Using samples of millions of cells, recent studies in cervical cancer treated with cis-platinum [79] and ovarian carcinomas treated with chemoradiotherapy [91] have begun to investigate these questions by profiling tumors for genomic copy number before and after treatment. Both studies reported detecting some heterogeneous tumors with pre-existing resistant subpopulations that expanded further after treatment. However, since these studies are based on signals derived from populations of cells, their results are likely to underestimate the total extent of genomic heterogeneity and frequency of resistant clones in the primary tumors. These questions are better addressed using single-cell sequencing methods, because they can provide a fuller picture of the extent of genomic heterogeneity in the primary tumor. The degree of genomic heterogeneity may itself provide useful prognostic information, guiding patients who are deciding on whether to elect chemotherapy and the devastating side-effects that often accompany it. In theory, patients with monogenomic tumors will respond better and show better overall survival compared with patients with polygenomic tumors, which may have a higher probability of developing or having resistant clones, that is, more fuel for evolution. 
Single-cell sequencing can in principle also provide a higher sensitivity for detecting rare chemoresistant clones in primary tumors (Figure 1c). Such methods will enable the research community to investigate questions of whether resistant clones are pre-existing in primary tumors or arise in response to therapies. Furthermore, by multiplexing and profiling hundreds of single cells from a patient’s tumor, it will be possible to develop a more comprehensive picture of the total genomic diversity in a tumor before and after adjuvant therapy.

Future directions

Single-cell sequencing methods such as SNS provide an unprecedented view of the genomic diversity within tumors and provide the means to detect and analyze the genomes of rare cancer cells. While cancer genome studies on bulk tissue samples can provide a global spectrum of mutations that occur within a patient [81, 92], they cannot determine whether all of the tumor cells contain the full set of mutations, or alternatively whether different subpopulations contain subsets of these mutations that in combination drive tumor progression. Moreover, single-cell sequencing has the potential to greatly improve our fundamental understanding of how tumors evolve and metastasize. While single-cell sequencing methods using WGA are currently limited to low coverage of the human genome (approximately 6%), emerging third-generation sequencing technologies such as that developed by Pacific Biosciences (Menlo Park, CA, USA) [93] may greatly improve coverage through single-molecule sequencing, by requiring lower amounts of input DNA.

In summary, the future medical applications of single-cell sequencing will be in early detection, monitoring CTCs during treatment of metastatic patients, and measuring the genomic diversity of solid tumors. While pathologists can currently observe thousands of single cells from a cancer patient under the microscope, they are limited to evaluating copy number at a specific locus for which FISH probes are available. Genomic copy number profiling of single cells can provide a fuller picture of the genome, allowing thousands of potentially aberrant cancer genes to be identified, thereby providing the oncologist with more information on which to base treatment decisions. Another important medical application of single-cell sequencing will be in the profiling of CTCs for monitoring disease during the treatment of metastatic disease. While previous studies have shown value in the simple counting of epithelial cells in the blood [53, 54], copy number profiling of single CTCs may provide a fuller picture, allowing clinicians to identify genomic amplifications of oncogenes and deletions of tumor suppressors. Such methods will also allow clinicians to monitor CTCs over time following adjuvant or chemotherapy, to determine if the tumor is likely to show recurrence.

The major challenge ahead for translating single-cell methods into the clinic will be the innovation of multiplexing strategies to profile hundreds of single cells quickly and at a reasonable cost. Another important aspect is to develop these methods for paraffin-embedded tissues (rather than frozen), since many samples are routinely processed in this manner in the clinic. When future innovations allow whole genome sequencing of single tumor cells, oncologists will also be able to obtain the full spectrum of genomic sequence mutations in cancer genes from scarce clinical samples. However, this remains a major technical challenge, and is likely to be the intense focus of both academia and industry in the coming years. These methods are likely to improve all three major themes of medicine: prognostics, diagnostics and chemotherapy, ultimately improving the treatment and survival of cancer patients.