Next-generation sequencing to guide cancer therapy
As a result of multiple technological and practical advances, high-throughput sequencing, known more commonly as “next-generation” sequencing (NGS), can now be incorporated into standard clinical practice. Whereas early protocols relied on samples that were harvested outside of typical clinical pathology workflows, standard formalin-fixed, paraffin-embedded specimens can more regularly be used as starting materials for NGS. Furthermore, protocols for the analysis and interpretation of NGS data, as well as knowledge bases, are being amassed, allowing clinicians to act more easily on genomic information at the point of care for patients. In parallel, new therapies that target somatically mutated genes identified through clinical NGS are gaining US Food and Drug Administration (FDA) approval, and novel clinical trial designs are emerging in which genetic identifiers are given equal weight to histology. For clinical oncology providers, understanding the potential and the limitations of DNA sequencing will be crucial for providing genomically driven care in this era of precision medicine.
CTC: Circulating tumor cell
ctDNA: Circulating tumor DNA
FDA: Food and Drug Administration
MATCH: Molecular Analysis for Therapy Choice
MHC: Major histocompatibility complex
SNV: Single nucleotide variant
TCGA: The Cancer Genome Atlas
Many biological discoveries about cancer have been the product of a reductionist approach, which focuses on modeling phenomena with as few major actors and interactions as possible [1, 2]. This reductionist thinking led the initial theories of carcinogenesis to center on how many “hits”, or genetic mutations, were necessary for a tumor to develop. It was assumed that each type of cancer would progress through a similar, if not identical, process of genetic hits. Indeed, there are a handful of cancer types, such as chronic myelogenous leukemia, that feature a single and pathognomonic DNA mutation. Working on this assumption, early methods to explore the genomic foundations of different cancers involved targeted exploration of specific variants and genes in a low-throughput fashion. However, most cancers are genetically complex, and are better defined by the activation of signaling pathways than by a defined set of mutations. The success of the Human Genome Project inspired similar projects looking at the genome in various cancers. That success, along with the increased affordability and reliability of sequencing, has led to the integration of genome science into clinical practice. The use of these data to assist in diagnosis is generally referred to as precision medicine [6, 7].
Next-generation sequencing (NGS), also known as massively parallel sequencing, represents an effective way to capture a large amount of genomic information about a cancer. Most NGS technologies revolve around sequencing by synthesis. Each DNA fragment to be sequenced is bound to an array, and then DNA polymerase adds labeled nucleotides sequentially. A high-resolution camera captures the signal from each nucleotide becoming integrated and notes the spatial coordinates and time. The sequence at each spot can then be inferred by a computer program to generate a contiguous DNA sequence, referred to as a read.
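The inference step described above can be illustrated with a toy base caller. This is only a sketch under simplifying assumptions: one intensity value per nucleotide channel per cycle, a fixed channel order, and made-up signal values. Real instruments apply far more elaborate signal correction, and the function name call_read is hypothetical.

```python
# Toy base caller for a single spot on the array.
# Assumption: each synthesis cycle yields one intensity per channel,
# ordered as CHANNELS. The values below are invented for illustration.
CHANNELS = ("A", "C", "G", "T")

def call_read(cycles):
    """Return the read inferred from per-cycle channel intensities.

    cycles: list of 4-tuples, one per cycle, ordered as CHANNELS.
    The brightest channel in each cycle is taken as the incorporated base.
    """
    read = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)

spot_signal = [
    (0.91, 0.03, 0.04, 0.02),  # cycle 1
    (0.05, 0.02, 0.88, 0.05),  # cycle 2
    (0.02, 0.04, 0.03, 0.91),  # cycle 3
    (0.04, 0.90, 0.03, 0.03),  # cycle 4
]
print(call_read(spot_signal))  # AGTC
```

Each cycle contributes one base, so the read length equals the number of imaging cycles recorded for that spot.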
Choice of assay method
In hybrid capture, relevant DNA sequences are hybridized to probes, which are biotinylated. The biotin is bound to streptavidin beads and then non-bound DNA is washed away. This has the advantage of more reliable detection of copy number changes, although some research groups are using amplicon-based sequencing to detect copy number changes as well. The disadvantages of hybrid capture include a higher required depth of sequencing and a more advanced bioinformatics platform (see below). Hybrid capture does have the ability to detect gene fusions, as fused sequences will be pulled down with the baited DNA. Fusions are still a challenge for hybrid capture, however, because while the fusion itself may be common, the breakpoint can fall anywhere within an intron. If there is a high suspicion that a sample may contain clinically important fusions, an assay based on cDNA should be considered. These assays will show the fused exon–exon junctions, obviating the need to find the genomic breakpoint. Calling variants and DNA copy number changes can be difficult with both methods (as well as with microarray-based assays) when there is high tumor heterogeneity or low tumor purity. For example, a high copy number gain in a small number of cells may be interpreted as a widespread low copy number gain. Thus, putatively actionable copy number variations are typically validated by fluorescence in situ hybridization in clinical settings.
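The copy number ambiguity mentioned above follows from simple averaging across the cells in a bulk sample. The sketch below assumes a two-population mixture (cells with the gain and cells with a normal diploid locus); the function name and the numbers are illustrative and not taken from any particular assay.

```python
def bulk_copy_number(fraction_altered, altered_copies, normal_copies=2):
    """Average copies per cell measured in a bulk sample.

    fraction_altered: fraction of cells in the sample carrying the gain.
    altered_copies: copies of the locus in those cells.
    normal_copies: copies in all remaining cells (2 for an autosomal locus).
    """
    return fraction_altered * altered_copies + (1 - fraction_altered) * normal_copies

focal_high_gain = bulk_copy_number(0.10, 12)  # 12 copies in 10% of cells
broad_low_gain = bulk_copy_number(1.00, 3)    # 3 copies in every cell
print(round(focal_high_gain, 2), round(broad_low_gain, 2))  # 3.0 3.0
```

Both scenarios give the same bulk average of three copies, which is why a cell-level method such as in situ hybridization is useful for telling them apart.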
Choice of clinical sample
Most specimens that are examined by anatomical pathologists are fixed in formalin (4% formaldehyde) and embedded in paraffin (FFPE). The formalin introduces crosslinks that can both fragment DNA and cause chemical alterations that may alter sequencing results. Early studies demonstrated that using FFPE specimens in PCR-based sequencing led to more errors than using frozen specimens. Some projects, including The Cancer Genome Atlas (TCGA), required the use of fresh frozen tissue. There has been great progress in altering DNA extraction methods such that FFPE specimens are just as useful for NGS as fresh frozen samples. While there have been some early attempts at using FFPE specimens for other modalities besides DNA sequencing [23, 24], these tests are not yet widely used clinically, and the reliability of FFPE versus frozen samples is less well established. Clinicians should feel comfortable requesting NGS on FFPE samples, and do not necessarily have to handle the specimens differently from other diagnostic samples.
For most cancers, the standard pathological diagnosis will require a direct sample of tissue for biopsy. However, many research groups are exploring the diagnostic and therapeutic utility of “liquid biopsies”. One such source of genetic material for disease monitoring is circulating tumor cells (CTCs). These suffer from a low frequency (approximately 1 cell in 10^6–10^8 total circulating cells) and must, therefore, go through an enrichment step. A large number of CTC collection and sequencing protocols have been reported and are being evaluated prospectively [25, 26]. Alternatively, DNA released from apoptotic cells in the tumor can be assayed from the peripheral blood, and is usually referred to as circulating tumor DNA (ctDNA). Progress in utilizing ctDNA was recently reviewed, with the authors concluding that this approach shows great promise for the purpose of detecting minimal residual disease, or helping to improve diagnosis by looking for mutations specifically associated with a particular disease type. RNA is much less stable than DNA in circulating blood, but RNA species can be preserved in extracellular vesicles and information about tumor recurrence can be gleaned from them as well. However, reproducibility has plagued RNA-based studies, and RNA assays are not yet ready for clinical use.
Tumor heterogeneity is both a challenge for liquid biopsies and the reason they can be more useful than tissue biopsies. Initially, mutations with a low allele fraction owing to only being present in a subset of tumor cells may be missed by liquid biopsies, as the low amount of DNA input to the assay is compounded by the low incidence of the mutation. This makes distinguishing low allele fraction mutants from errors that are inherent to high-throughput sequencing very difficult (see below). However, the ability of minimally invasive samples to be sequenced repeatedly over time will allow for faster recognition of known resistance mutations. Sequencing artifacts should be random, but sequences that appear serially can be weighted and followed more closely. It should also be noted that errors in aligning reads to the correct locus will give what appear to be recurrent mutations, so all mutations that are used for serial tracking of tumor burden should be manually reviewed. Overall, there is much promise in sequencing tumor DNA from peripheral blood, but its use is still under investigation and clinicians should rely on other methods for tracking disease progression.
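The idea of weighting calls that recur across serial samples can be sketched as a simple count over blood draws. The variant labels, the threshold and the function name below are hypothetical and serve only to illustrate the bookkeeping; this is not a validated tracking method.

```python
from collections import Counter

def recurrent_variants(timepoints, min_occurrences=2):
    """Return variants called in at least min_occurrences serial samples.

    timepoints: list of sets, each holding the variant calls from one blood draw.
    """
    counts = Counter(v for calls in timepoints for v in calls)
    return {v: n for v, n in counts.items() if n >= min_occurrences}

# Invented calls from three serial draws; labels are placeholders.
draws = [
    {"EGFR T790M", "chr3:g.1234A>G"},
    {"EGFR T790M", "chr7:g.5678C>T"},
    {"EGFR T790M", "chr9:g.9012G>A"},
]
print(recurrent_variants(draws))  # {'EGFR T790M': 3}
```

A call that recurs is not automatically real, since systematic misalignment would also recur, so the output of such a filter is a list of candidates for manual review rather than a final answer.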
Clinical NGS data analysis
An additional area of innovation for clinical NGS involves bioinformatic analysis of raw genomic data and rapid clinical interpretation for consideration by the treating clinician. The first step in this process is to assign a genetic location to the read by mapping it on a reference genome. Some percentage of the reads will be “unmappable”, that is, the software cannot assign the sequence to a unique genomic location. An individual genome will have a number of deviations from a reference genome, referred to as single nucleotide variants (SNVs), and/or structural alterations such as insertions, deletions or translocations. Somatic mutation analysis, as is done in cancer, involves a number of additional challenges. There are robust algorithms available for identifying many clinically relevant alterations that occur as point mutations, short insertions or deletions, or copy number aberrations in clinical samples analyzed by NGS.
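The distinction between a uniquely mapped read and an “unmappable” one can be shown with a deliberately naive exact-match mapper. This is a sketch only: production aligners use indexed, mismatch-tolerant algorithms, and the reference string and function name here are invented.

```python
def map_read(read, reference):
    """Classify a read by the number of exact matches in the reference."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    if len(positions) == 1:
        return ("unique", positions)
    if positions:
        return ("ambiguous", positions)
    return ("no_match", positions)

reference = "ACGTACGTTTGACCA"  # invented toy reference
print(map_read("TTGACC", reference))  # ('unique', [8])
print(map_read("ACGT", reference))    # ('ambiguous', [0, 4])
print(map_read("GGGG", reference))    # ('no_match', [])
```

In the sense used above, both the ambiguous and the no_match reads are unmappable, because neither can be assigned to a single genomic location.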
However, as DNA mutations accumulate within a tumor, there can be considerable sequence heterogeneity even within a single primary tumor. It can be very challenging to discern whether a read of a low allele fraction represents a true mutation that exists within a subset of tumor cells or is an artifact that should be discarded. While retrospective research endeavors may not require the identification of all possible clinically actionable alterations in a cohort study, prospective clinical cancer genomics requires increased sensitivity to detect low allelic fraction alterations in impure tumor samples that may impact an individual patient’s care. These issues can be exacerbated by a low amount of tumor relative to normal tissue within the sample and mitigated by having more reads, that is, greater coverage. If a detected mutation is the result of a low allele fraction within the sample, the number of reads supporting it will rise proportionally with total reads, whereas if it is a technical artifact, the number of supporting reads should be random and the call can be eliminated from analysis. Estimating tumor percentage from a standard pathology specimen should be helpful for giving an expected allele fraction within the sample, but is prone to very high inter-observer variation.
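The effect of coverage on separating a low allele fraction mutation from an artifact can be illustrated with a binomial tail probability. The sketch below rests on stated assumptions that are not from the article: sequencing errors at a site are independent with a fixed per-base rate (0.01 here, chosen purely for illustration), and the expected allele fraction refers to a clonal heterozygous mutation in a diploid region. The function names are hypothetical.

```python
from math import comb

def expected_allele_fraction(tumor_purity):
    """Expected fraction of reads for a clonal heterozygous mutation in a
    diploid region: half of the alleles contributed by tumor cells."""
    return tumor_purity / 2

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

error_rate = 0.01  # assumed per-base error rate, for illustration only
# Same observed allele fraction (5%) at two different depths.
for depth, alt_reads in [(100, 5), (1000, 50)]:
    p_artifact = prob_at_least(alt_reads, depth, error_rate)
    print(depth, alt_reads, f"{p_artifact:.2e}")

print(expected_allele_fraction(0.30))  # 0.15
```

Under these assumptions, the probability that errors alone produce a 5% allele fraction is much smaller at 1000 reads than at 100 reads, which is the sense in which greater coverage mitigates the problem.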
A second challenge is frequent DNA fusions, which represent a significant component of the clinically actionable spectrum of alterations in oncology (for example, ALK fusions, BCR-ABL fusions). Within NGS data, these events will cause both ends of a read to be mappable, but the whole contiguous sequence is not. This is referred to as a split read, and identifying such reads can be challenging in the presence of a high number of structural rearrangements, such as in cancers with chromothripsis. Notably, since the breakpoints of most clinically relevant somatic fusions occur outside of coding regions, whole-exome sequencing assays often miss these variants, and gene panels that are not designed to cover known fusion territories will also be unable to identify these fusion products. Thus, when analyzing a clinical NGS data set, it is critical to understand the analytical limitations of a given assay as represented in the downstream data analysis.
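A split read can be illustrated with a toy search for a split point at which both parts of a read match the reference while the read as a whole does not. This is a minimal sketch using exact string matching on invented sequences; the function name, the min_anchor parameter and the two loci are hypothetical, and real fusion callers work from aligner output rather than raw strings.

```python
def find_split(read, reference, min_anchor=4):
    """Look for a split point where both parts of a read match the reference
    exactly but the whole read does not.

    Returns (left_pos, right_pos, split_index) or None.
    """
    if reference.find(read) != -1:
        return None  # the read maps contiguously; not a split read
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left_pos = reference.find(read[:split])
        right_pos = reference.find(read[split:])
        if left_pos != -1 and right_pos != -1:
            return (left_pos, right_pos, split)
    return None

# Two invented loci joined by a spacer of N's so no match spans both.
locus_a = "ACGTTGCAAC"
locus_b = "GGATCCTAGG"
reference = locus_a + "NNNNN" + locus_b
fusion_read = "TGCAAC" + "GGATCC"  # end of locus A fused to start of locus B
print(find_split(fusion_read, reference))  # (4, 15, 6)
```

The returned split index marks the putative breakpoint within the read. In a heavily rearranged genome many reads would produce candidate splits, which is part of what makes this analysis difficult.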
Clinical interpretation of NGS data
After identification of the set of alterations within a given patient’s tumor, many cases will yield a small set of clinically relevant events as well as a long list of sequencing variants of uncertain significance. An emerging body of interpretation algorithms that automate the assessment of the clinical relevance of alterations will enable more rapid clinical interpretation of cancer genomic sequencing data. For instance, one algorithm called PHIAL applies a heuristic method to rank alterations by clinical and biological relevance, followed by intra-sample pathway analysis to determine potentially druggable nodes [22, 37]. As such approaches mature, they will be better equipped to apply tumor-specific “priors” to the genomic data, along with genotype–phenotype therapeutic outcomes data, to enable probabilistic approaches to ranking tumor genomic alterations by clinical relevance.
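The general idea of heuristic ranking can be sketched as a tiered sort. To be clear, this is not the PHIAL algorithm: the tiers, the two boolean flags, the class and function names, and the example calls are all assumptions made for illustration, standing in for lookups against curated gene lists.

```python
from dataclasses import dataclass

@dataclass
class Alteration:
    gene: str
    change: str
    in_actionable_list: bool   # hypothetical flag: gene is in a curated actionable list
    in_cancer_gene_list: bool  # hypothetical flag: gene is in a broader cancer-gene list

def tier(alt):
    """Lower tier means higher priority. The tiers are illustrative assumptions."""
    if alt.in_actionable_list:
        return 1
    if alt.in_cancer_gene_list:
        return 2
    return 3  # variant of uncertain significance

def rank(alterations):
    """Sort by tier, then alphabetically by gene for a stable report order."""
    return sorted(alterations, key=lambda a: (tier(a), a.gene))

# Invented example calls; GENE_X is a placeholder, flags are set by hand.
calls = [
    Alteration("GENE_X", "p.A10T", False, False),
    Alteration("TP53", "p.R175H", False, True),
    Alteration("EGFR", "exon 19 deletion", True, True),
]
for alt in rank(calls):
    print(tier(alt), alt.gene, alt.change)
```

A probabilistic approach of the kind anticipated above would replace these fixed tiers with scores informed by tumor type and outcomes data.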
Recommended databases for interpreting somatic mutation results in cancer
Finally, for NGS technologies that require both somatic and germline testing (for example, whole-exome and whole-genome sequencing), the American College of Medical Genetics has released guidelines outlining which variants should always be reported to patients regardless of whether they are relevant to the presenting illness. Since most of these genes involve non-cancer-related syndromes, there is an increasing need for oncologists to be prepared to receive results that raise unexpected inherited genetic issues. However, the germline component of clinical oncology NGS testing may have significant diagnostic and therapeutic utility, as demonstrated by the identification of pathogenic germline alterations in men with castration-resistant prostate cancer who respond to PARP inhibition, and its role in this arena is evolving rapidly.
FDA-approved drugs with a companion diagnostic
Imatinib, Dasatinib, Nilotinib, Bosutinib
Chronic myelogenous leukemia
Indication for therapy
Chronic myelogenous leukemia
Only indicated for T315I mutations
T315I resistance mutation
Indication for therapy
Exon 19 deletions
Indication for therapy
Indication for therapy
ALK gene fusions
Indication for therapy
KRAS codon 12, 13
Contraindication to therapy
BRCA1 and BRCA2 mutations
Indication for therapy
Efforts are ongoing to determine prognostic biomarkers in clinical oncology. Many false starts have resulted from overfitting, that is, building a precise model from a small, non-representative data set and then extrapolating from it. Determining prognosis on the basis of non-druggable mutations from NGS has tended to follow from this tradition. Certain mutations, such as those in TP53, portend a poor prognosis in almost all clinical situations. Others, such as those in ASXL1, are only associated with a particular disease. Mutations in IDH1 and IDH2 indicate a better prognosis in glioma, but often show contradictory results in myeloid malignancies, although this may change as targeted agents move through clinical trials. Caution should be used when communicating prognostic information to patients.
Clinical NGS case study
While much information can be gleaned from a tumor DNA sequence, we must be mindful that DNA itself is rather inert. Better information about the functionality of a cancer can be obtained by integrating information from different modalities. RNA sequencing could give information about the relative expression of a mutated gene. Approaches in mass spectrometry are giving a clearer picture of the proteomics of cancer. TCGA data were collected using a number of different modalities, and are available for several tumor types, and while useful information can be gleaned at different levels, tying everything together remains a prodigious challenge. The methods used to predict phenotypes from integrated -omics data have been reviewed recently.
Furthermore, immunotherapies are quickly gaining prevalence for cancer therapy, especially for use in melanoma. NGS could become very important for predicting responses to immunotherapy. Neoantigens, that is, antigens that are created by somatic mutations, are correlated with the overall rate of somatic mutation and clinical response. Immune response is mediated by T-cell recognition of these neoantigens. Exome sequencing can be paired with mass spectrometry to determine which neoantigens are successfully presented by the major histocompatibility complex (MHC).
NGS is inextricably intertwined with the realization of precision medicine in oncology. While it is unlikely to obviate traditional pathologic diagnosis in its current state, it allows a more complete picture of cancer etiology than can be seen with any other modality. However, precision cancer medicine and large-scale NGS testing will require novel approaches towards ensuring evidence-based medicine. Treating each genetic abnormality as an independent variable when hundreds or thousands are queried in every patient will require new trial designs and statistical methods to ensure the utility of these approaches. Broadly, clinicians and translational researchers will need to continue to engage in direct dialog, both within and across institutions, to advance the integration of genomic information and clinical phenotypes, and enable precision cancer medicine through NGS approaches.
This work is supported by the National Institutes of Health 1K08CA188615 (EMVA), Prostate Cancer Foundation (EMVA), and the American Cancer Society (EMVA).
- 6. Roychowdhury S, Iyer MK, Robinson DR, Lonigro RJ, Wu YM, Cao X, et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci Transl Med. 2011;3:111ra121.
- 11. Jones S, Anagnostou V, Lytle K, Parpart-Li S, Nesselbush M, Riley DR, et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci Transl Med. 2015;7:283ra253.
- 23. Hedegaard J, Thorsen K, Lund MK, Hein A-M K, Hamilton-Dutoit SJ, Vang S, et al. Next-generation sequencing of RNA and DNA isolated from paired fresh-frozen and formalin-fixed paraffin-embedded samples of human cancer and normal tissue. PLoS One. 2014;9:e98187.
- 39. COSMIC. http://cancer.sanger.ac.uk/cosmic/.
- 41. cBioPortal. http://www.cbioportal.org/.
- 42. Meric-Bernstam F, Johnson A, Holla V, Bailey AM, Brusco L, Chen K, et al. A decision support framework for genomically informed investigational cancer therapy. J Natl Cancer Inst. 2015;107:djv098. doi:10.1093/jnci/djv098.
- 43. Personalized Cancer Therapy. https://pct.mdanderson.org/.
- 45. My Cancer Genome. http://www.mycancergenome.org/.
- 65. Niederst MJ, Hu H, Mulvey HE, Lockerman EL, Garcia AR, Piotrowska Z, et al. The allelic context of the C797S mutation acquired upon treatment with third generation EGFR inhibitors impacts sensitivity to subsequent treatment strategies. Clin Cancer Res. 2015. doi:10.1158/1078-0432.CCR-15-0560.
- 73. IntOGen. https://www.intogen.org/.
- 75. DGIdb. http://www.dgidb.org/.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.