The Role of New Sequencing Technology in Identifying Rare Mutations in New Susceptibility Genes for Cancer
Massively parallel sequencing (MPS) has transformed our capacity to analyze the genome. Technology now facilitates the production of hundreds of gigabases of sequencing data from single instrument runs and is flexible to study design, allowing analyses of full genomes through a range of targeted sequencing strategies involving one to thousands of samples. The search for new cancer susceptibility genes is no longer limited by sequencing technology; theoretically, MPS can be advantageous to studies searching for genetic variation responsible for cancer predisposition across the risk spectrum. Genetically uncharacterized rare syndromes are now being unraveled at a much increased rate (including rare cancer-related syndromes), yet complex diseases such as common cancers have proven to be more challenging. MPS has revealed the complexity of the human genome, and our current study designs and bioinformatic and computational approaches need to be refined to realize the full potential of MPS for it to contribute to the identification of new cancer susceptibility genes.
Keywords: Massively parallel sequencing; Common cancer; Mendelian disease; Bioinformatics; Familial cancer
Sequencing continues to be a dynamic and evolving technology. The so-called next-generation sequencing technologies have become affordable to a large number of laboratories with increasingly diverse applications. Although cost still limits inspiration and aspiration in some settings, commercial competition has rapidly made significant advances such as whole human exome sequencing, via high-throughput exome-enrichment kits [1, 2], affordable for studies of considerable scope [3, 4•, 5•, 6, 7••]. Constant increases in the amounts of sequencing data that can be generated from single instrument runs now place whole genome sequencing close to parity with whole exome sequencing from several perspectives. Successful incorporation of barcoding and multiplexing has also improved the cost-effectiveness of new sequencing technology and has allowed a broader range of research initiatives to use this technology in their study designs. Researchers aiming to identify rare mutations in new susceptibility genes for cancer have embraced this technology. Certainly, the application of this technology has revealed the human genome to be much more complex and variable than previously understood, but has the application of this technology led (or will it lead) to new discoveries?
Genetically uncharacterized rare syndromes are now being unraveled at a much increased rate (including rare cancer-related syndromes). These studies have been attractive to early exome sequencing endeavors as they can be successfully performed on a very small number of well-selected samples and have demanded less from bioinformatic variant filtering pipelines, which are still in their infancy [11, 12, 13]. Although intrinsically valuable in their own right, studies of Mendelian disorders have also acted as pilot studies or foundation studies for the application of this new technology to the study of the missing heritability of more complex diseases, such as common cancers. Now that the technology is considered routine in a large number of research-active environments, can it play a role in identifying rare mutations in new susceptibility genes for cancer?
What Is the New Technology?
Massively parallel sequencing (MPS) includes many high-throughput approaches to sequencing that share the feature of sequencing massive amounts of DNA (or RNA) in parallel. Although the detail of the chemistry used differs between platforms, all platforms work on the basis of sequencing spatially separated, clonally amplified templates (referred to as a library). MPS is also referred to as “next-generation sequencing,” but here I use the term “massively parallel sequencing” (MPS), which allows for several further, anticipated “next generations” of sequencers. At the time of writing, at least five sequencing platforms are regarded as having established themselves in the marketplace; they differ in their protocols for library preparation, sequencing chemistry, fragment read length, run times, and the amount of sequencing data that can be produced in an instrument run.
The step from the (previous) gold standard Sanger sequencing (based on the chain termination method described by Sanger et al.) to MPS (sequencing by synthesis) has not been another small step in the slow incremental improvements to sequencing technology; it is something like a technological revolution. Genetic research of diverse endeavors has been, or will be, substantially changed by this new technology, and activities are already under way to introduce this technology into the diagnostic/clinical arena [15, 16, 17].
To date, exome capture followed by MPS has been a much more feasible option compared with whole genome MPS for most applications in cancer genetics. This is a result of (1) the availability of commercially produced exome-enrichment products, (2) most of the currently identified disease-causing mutations in Mendelian disorders being in coding (or flanking intronic) regions of genes, thus making the approach arguably sound, (3) the availability of strategies to interpret exomic and proximal splice junction sequencing variation in terms of possible disease association [18, 19, 20, 21, 22, 23], and (4) cost (exome MPS had been, until 2012, approximately ten times less expensive than genome MPS despite the cost of the front-end enrichment process/products).
Very recently there has been less compelling justification for undertaking exome capture followed by MPS rather than whole genome MPS for cancer gene discovery projects. The key factors to change in a relatively short period of time are as follows: (1) the cost differential is now more moderate (around threefold to fourfold); (2) computational capacities and analytical pipelines have developed to make possible the analysis of significantly larger data files; (3) there is increasing interest in, and ability to interpret, noncoding regions of the genome; and (4) some of the technical limitations experienced with exome capture followed by MPS (associated with the capture process) are not encountered with whole genome MPS.
MPS platforms are not limited to the sequencing of genomes and exomes (DNA); transcriptome, methylome, and other targeted sequencing strategies are emerging as innovative approaches to address current research questions.
The Search for Cancer Predisposition Genes
As linkage studies failed to identify further reliable signals, the search for further cancer predisposition genes moved into a phase of candidate gene screening. This screening was supported by advances in Sanger sequencing that allowed a move away from manual chain-termination radioactive sequencing in slab gels to fluorescent capillary-based sequencing with automated base-calling. Candidate genes were predominantly selected on the basis of having some relationship with the previously identified cancer predisposition genes (same pathway and/or function), and this screening led to the identification of another small number of cancer susceptibility genes. Illustrative examples include the screening for MUTYH mutations (particularly in APC mutation negative families with a high number of somatic G:C>T:A mutations in their tumors) owing to its role in base excision repair [33, 34] and the screening for PALB2 mutations in BRCA1 and BRCA2 mutation negative affected women from multiple-case breast cancer families because the PALB2 gene product was known to interact with BRCA2, and biallelic mutations in PALB2, similar to biallelic BRCA2 mutations, caused Fanconi anemia [35, 36, 37]. Mutations in these genes are also generally very rare and are thought to be associated with, on average, more moderate cancer risk and in some instances a more moderate phenotype (e.g., biallelic MUTYH mutation associated colon cancer) [33, 38]. Further developments in high-throughput, cost-effective mutation screening methods that generally applied Sanger sequencing in a validation phase made possible considerable candidate gene sequencing and resequencing that further defined mutation frequency and associated cancer risks [39, 40, 41, 42, 43, 44, 45] (see Fig. 1).
The frequency of some specific mutations in a number of these genes has made possible mutation-specific risk estimation, which provides evidence for much higher cancer risk associated with some mutations (ATM c.7271T>G [46, 47, 48]; PALB2 c.3113G>, PALB2 c.1592delT) and evidence for alternative modes of genetic inheritance of risk (e.g., colorectal cancer risk associated with monoallelic mutations in MUTYH).
Cancer, “Missing Heritability,” and MPS
A considerable proportion of the heritable risk for common cancers remains unexplained. Current estimates of what proportion of familial risk is explained by what we currently know about genetic cancer risk are in the order of 30–48 % for common cancers. These estimates include contributions from rare mutations in genes that convey high to moderate risk of cancer (discussed earlier) and the large number of common genetic variants (individually associated with very small increased risk) identified via genome-wide association studies. For breast cancer, the current estimate of the proportion of familial risk explained by common genetic variants is approximately 28 %, and another approximately 20 % is explained by rare mutations in breast cancer susceptibility genes such as BRCA1, BRCA2, and PALB2. For prostate cancer, 70 identified common genetic variants account for approximately 30 % of the familial risk.
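The arithmetic behind these estimates can be laid out explicitly. The following is a minimal sketch only: it assumes the quoted contributions can simply be added (a simplification), it uses only the percentages stated above, and the dictionary layout and the function name `summarize` are illustrative choices. For prostate cancer only the common-variant figure is given in the text, so no rare-mutation contribution is included.

```python
# Percentages of familial risk explained, as quoted in the text.
# Integer percentages are used to avoid floating-point rounding noise.
# Assumption: contributions are treated as additive, which is a simplification.
explained_percent = {
    "breast": {
        "common variants (GWAS)": 28,
        "rare mutations (e.g., BRCA1, BRCA2, PALB2)": 20,
    },
    # Only the common-variant contribution is quoted for prostate cancer.
    "prostate": {"common variants (GWAS)": 30},
}


def summarize(cancer):
    """Return (explained, unexplained) familial risk in percent for one cancer."""
    explained = sum(explained_percent[cancer].values())
    return explained, 100 - explained


for cancer in explained_percent:
    explained, missing = summarize(cancer)
    print(f"{cancer}: ~{explained}% explained, ~{missing}% unexplained")
```

Under these assumptions the breast cancer figures sum to approximately 48 %, the upper end of the range quoted above, which leaves roughly half of the familial risk unexplained.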
The nature of the missing heritability of common cancer is currently a matter of conjecture and has received much attention. A proportion of this missing heritability is likely to be explained by rare mutations in a large number of different genes [53, 54, 55].
Can the application of MPS technology uncover more of the missing heritability of common cancer? In the context of identifying rare mutations in genes that are associated with high to moderate risk of cancer, the challenge is no longer a technical one but rather an issue of study design and data interpretation.
A fundamental part of the leap from the methodology applied in past candidate gene mutation scanning projects (see earlier) to MPS platforms is the capability to search for cancer susceptibility genes in a so-called agnostic manner. To successfully achieve an agnostic approach to find additional cancer susceptibility genes, several aspects of the study design need to be carefully considered, although most require the incorporation of several assumptions about the underlying genetic architecture [56••].
It could be well argued that the most successful application of MPS in the discovery of rare variants in new susceptibility genes for cancer is the identification of HOXB13 as a prostate cancer susceptibility gene [57••]. This is a fusion of old data and the new technology, as the investigators applied targeted capture of genes within a region of 17q21-22 that demonstrated linkage in a proportion of multiple-case families. Targeted capture followed by MPS in selected individuals from families selected for linkage to this region revealed a rare (found in four of 94 DNA samples screened) but recurrent mutation in HOXB13 (G84E) that has subsequently been extensively validated as associated with prostate cancer risk [59, 60, 61]. Similar large collections of linkage data are available for other cancer studies, and several similar initiatives are under way.
There are some recent reports from breast cancer researchers who have applied exome capture followed by MPS to multiple, affected members of multiple-case breast cancer families. This design offers researchers some capacity to use the family design in data filtering pipelines to both manage sequencing artifacts and annotate variant sharing between relatives [3, 56••, 62•].
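The idea of using the family design in a filtering pipeline can be illustrated with a minimal sketch. It is not the method of any cited pipeline: the `Variant` record, the function `shared_rare_variants`, the frequency threshold, and the toy data are all hypothetical, and the sketch assumes a reference-panel allele frequency is available for each variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str              # free-text label (hypothetical format)
    population_freq: float   # allele frequency in a reference panel (assumed available)
    carriers: frozenset      # IDs of sequenced relatives carrying the variant


def shared_rare_variants(variants, affected_ids, max_freq=0.001):
    """Keep variants that are rare and carried by every sequenced affected relative."""
    affected = frozenset(affected_ids)
    return [
        v for v in variants
        if v.population_freq <= max_freq and affected <= v.carriers
    ]


# Toy data for one hypothetical family with two sequenced affected relatives.
family_variants = [
    Variant("GENE_A", "variant_1", 0.0002, frozenset({"aff1", "aff2"})),
    Variant("GENE_B", "variant_2", 0.0500, frozenset({"aff1", "aff2"})),  # too common
    Variant("GENE_C", "variant_3", 0.0001, frozenset({"aff1"})),          # not shared
]

kept = shared_rare_variants(family_variants, ["aff1", "aff2"])
print([v.gene for v in kept])
```

Requiring a variant to be observed independently in two or more relatives also reduces the chance that it is a sequencing artifact. Requiring sharing by all affected relatives is a strict choice that would discard true susceptibility variants in families containing phenocopies, so a real pipeline might relax it.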
Some demonstration of the intrafamily exome-sequencing approach to identify breast cancer susceptibility genes has been realized. Thompson et al. [5•] have reported rare variants in FANCC and BLM, which are responsible for the autosomal recessive disorders Fanconi anemia and Bloom syndrome, respectively, and Park et al. [4•] have reported rare variants in XRCC2. These genes have been identified as strong candidates from early initiatives, with strong reliance on prior gene function knowledge, rather than application of an agnostic gene assessment algorithm. Larger-scale validation of these genes as breast cancer susceptibility genes has provided further insight into their possible role in cancer predisposition [63, 64], yet has also illustrated the challenges of conducting these studies for common cancers, with variable phenotypes, mutations that are likely to be rare (and in a number of different genes), and penetrance far from complete [3, 4•, 5•, 56••, 63].
Integration with Other MPS Datasets
Other data sets generated via the application of MPS, such as cancer genomes and transcriptomes, can provide additional information that can be useful in the search for new cancer susceptibility genes. A study of familial pancreatic cancer that applied exome capture followed by MPS of the cancer genome identified a PALB2 germline mutation and found a further three mutation carriers when PALB2 was sequenced in a further 96 highly selected pancreatic cancer families.
Transcriptome sequencing could offer a complement to, or indeed a substitute for, exome sequencing [66, 67]. There are some promising aspects to this approach, as it facilitates the analysis of variants within the coding region, bypasses the need for exome enrichment, and offers some cost advantages. Naturally, there are also some limitations, notably tissue specificity, which need to be considered in the study design and application.
Coordinated international collaborations are beginning to emerge that have the potential to advance the discovery of additional cancer susceptibility genes by increasing the likelihood of identifying functionally relevant genetic variants in the same genes in multiple families by combining MPS data (COMPLEXO; http://www.path.unimelb.edu.au/research/labs/southey/Complexo.dwt). Pooling of information about common genetic variation in case–control studies has proven to be a successful way to measure the role of common variants associated with cancer predisposition via the consortium model, such as the Breast Cancer Association Consortium (http://ccge.medschl.cam.ac.uk/consortia/bcac/index.html), the Consortium of Investigators of Modifiers of BRCA1/2 (http://ccge.medschl.cam.ac.uk/consortia/cimba/index.html), the Ovarian Cancer Association Consortium (http://ccge.medschl.cam.ac.uk/consortia/ocac/aims/aims.html), and the Collaborative Oncology Gene–Environment Study (http://www.cogseu.org). However, combining data from MPS studies has additional challenges that demand refinement of our bioinformatics analysis pipelines, both to handle the volume of data available and to bring data interpretation and filtering methods into conformity across studies. International collaboration also offers the opportunity to gather the magnitude of resources necessary to validate potential cancer susceptibility genes identified via combined MPS studies. However, even with international coordination, the rarity of mutations in some of the yet to be identified cancer susceptibility genes may mean that empirical demonstration of an association with cancer risk is not possible, and that other, potentially functionally based, assays may need to be relied on.
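One way to picture what combining MPS data across families offers is a simple tally of the genes in which candidate variants recur in independent families. The sketch below is illustrative only and does not describe the workflow of any consortium named above; the function name, the `(family_id, gene)` input format, the threshold, and the toy identifiers are all assumptions.

```python
from collections import defaultdict


def genes_seen_in_multiple_families(candidates, min_families=2):
    """Map each gene to the sorted list of families carrying a candidate
    variant in it, keeping only genes seen in at least `min_families`
    distinct families."""
    families_by_gene = defaultdict(set)
    for family_id, gene in candidates:
        families_by_gene[gene].add(family_id)
    return {
        gene: sorted(families)
        for gene, families in families_by_gene.items()
        if len(families) >= min_families
    }


# Toy (family_id, gene) pairs pooled from hypothetical contributing studies.
pooled = [
    ("study1_fam01", "GENE_A"),
    ("study1_fam02", "GENE_B"),
    ("study2_fam07", "GENE_A"),
    ("study2_fam07", "GENE_A"),  # second variant in the same family counts once
    ("study3_fam11", "GENE_C"),
]

recurrent = genes_seen_in_multiple_families(pooled)
print(recurrent)
```

A tally of this kind is only meaningful if the contributing studies have applied comparable variant calling and filtering, which is the pipeline challenge noted above.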
MPS has revealed the complexity of the human genome and has great potential to reveal more of the missing heritability of diseases such as common cancer. However, our current study designs and bioinformatic and computational approaches need to be refined to realize the full potential of MPS for it to contribute further to the identification of new cancer susceptibility genes.
The author is a Senior Research Fellow of the National Health and Medical Research Council (Australia) and a Group Leader of the Victorian Breast Cancer Research Consortium.
M.C. Southey declares no conflicts of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
- 4.• Park DJ, Lesueur F, Nguyen-Dumont T, et al. Rare mutations in XRCC2 increase the risk of breast cancer. Am J Hum Genet. 2012;90(4):734–9. An early example of the application of MPS in the setting of multiple-case breast cancer families designed to identify new breast cancer susceptibility genes.
- 5.• Thompson ER, Doyle MA, Ryland GL, et al. Exome sequencing identifies rare deleterious mutations in DNA repair genes FANCC and BLM as potential breast cancer susceptibility alleles. PLoS Genet. 2012;8(9):e1002894. An early example of the application of MPS in the setting of multiple-case breast cancer families designed to identify new breast cancer susceptibility genes.
- 6. DeRycke MS, Gunawardena SR, Middha S, et al. Identification of novel variants in colorectal cancer families by high-throughput exome sequencing. Cancer Epidemiol Biomarkers Prev. 2013 (in press).
- 7.•• Southey MC, Park DJ, Nguyen-Dumont T, et al. COMPLEXO: identifying the missing heritability of breast cancer via next generation collaboration. Breast Cancer Res. 2013 (in press). This letter describes an international collaboration that could expedite the identification of new breast cancer susceptibility genes and could provide a useful working model for research into other complex diseases.
- 17. Rattenberry E, Vialard L, Yeung A, et al. A comprehensive next generation sequencing based genetic testing strategy to improve diagnosis of inherited pheochromocytoma and paraganglioma. J Clin Endocrinol Metab. 2013. doi:10.1210/jc.2013-1319.
- 51. Michailidou K, Hall P, Gonzalez-Neira A, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet. 2013;45(4):353–61, 361e1–2.
- 52. Eeles RA, Olama AA, Benlloch S, et al. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat Genet. 2013;45(4):385–91, 391e1–2.
- 56.•• Feng BJ, Tavtigian SV, Southey MC, et al. Design considerations for massively parallel sequencing studies of complex human disease. PLoS One. 2011;6(8):e23221. Design considerations for studies applying MPS are fundamentally important to successful outcomes. This paper explores several aspects of design and associated assumptions.
- 57.•• Ewing CM, Ray AM, Lange EM, et al. Germline mutations in HOXB13 and prostate-cancer risk. N Engl J Med. 2012;366(2):141–9. Identification of HOXB13 as a prostate cancer susceptibility gene via the application of targeted MPS utilizing information from previous linkage studies conducted in multiple-case prostate cancer families.
- 62.• Pope BJ, Nguyen-Dumont T, Odefrey F, et al. FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets. BMC Bioinformatics. 2013;25(14):65. Bioinformatic and analytical pipelines for analysis of MPS data need further development to support the scientific questions being asked with MPS data; this report is an early example of a well-considered improvement.