Analytical Approaches for Exome Sequence Data

Collins, Andrew

doi:10.1007/978-94-007-5558-1_7

First Online: 27 November 2012

Part of the book series: Translational Bioinformatics (TRBIO, volume 1)

Abstract

Sequencing the roughly 1% of the genome that codes for proteins (the exome) offers a powerful and often cost-effective route to identifying the genetic mutations underlying Mendelian disease. Exome sequencing of a relatively small number of individuals with ‘extreme’ phenotypes, or with more familial subtypes of complex disease, may also prove productive. Larger-scale exome and whole genome sequencing studies offer the potential to interrogate the cumulative impact of the numerous rare variants presumed to underlie a substantial proportion of complex disease susceptibility. Exome and, particularly, whole genome sequencing studies yield enormous amounts of data and pose many analytical challenges. Aside from producing high-quality sequence reads and managing and manipulating huge databases, a major concern in the early stages of analysis is the reliable alignment of short sequence reads against a reference genome. A wide range of algorithms and software tools have been developed and implemented for this alignment step, the most critical in every analysis ‘pipeline’. A similarly rich set of platforms and analytical tools is available to support the reliable calling of DNA variants. Given the excellent resources now available, producing a well-characterised database that catalogues the novel and known variants in an individual exome is achievable. The greatest challenge, however, is teasing out causal variants from the vast amount of neutral or irrelevant variation. I review here the techniques and tools that have been developed and applied for the analysis of exome data. Exome mapping of genes involved in Mendelian disease has met with considerable success thus far, while applications to complex traits look promising provided that sufficiently large numbers of case and control exomes are analysed.
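Read alignment is singled out in the abstract as the most critical step in the pipeline. Several widely used short-read aligners, BWA and Bowtie among them, index the reference genome with the Burrows-Wheeler transform (BWT) so that exact occurrences of a read can be located by backward search. The following is a toy sketch of that core idea only, and the function names are hypothetical: it builds the BWT by naively sorting rotations and counts exact matches, whereas production aligners use suffix-array construction, compressed occurrence tables and search strategies that tolerate mismatches and gaps.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by naive sorting of all rotations (toy sizes only)."""
    if not text.endswith("$"):
        text += "$"  # '$' sorts before A, C, G and T and marks the end of the reference
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str):
    """Return the C table and the prefix occurrence table for a BWT string."""
    alphabet = sorted(set(bwt_str))
    # C[c]: number of characters in the text that sort before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    # occ[c][i]: number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def count_exact_matches(reference: str, read: str) -> int:
    """Count exact occurrences of `read` in `reference` by BWT backward search."""
    b = bwt(reference)
    C, occ = build_fm_index(b)
    lo, hi = 0, len(b)  # half-open interval of sorted rotations matching so far
    for ch in reversed(read):  # extend the match one base to the left each step
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_exact_matches("ACGTACGT", "ACG")` returns 2. Each pass of the loop narrows the interval of sorted suffixes that begin with the part of the read processed so far, which is why the read is consumed from right to left.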
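The abstract identifies separating causal variants from the mass of neutral variation as the greatest challenge. For a rare Mendelian disorder, a commonly described first pass is to keep only variants that are absent or very rare in reference populations, that are predicted to alter the protein, and that are found in the affected individuals, and then to list the genes that survive. The sketch below assumes that variants have already been called and annotated into simple records. The `Variant` fields, the consequence labels, the 1% frequency threshold and the function name are all hypothetical choices for illustration, not the chapter's own procedure.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical consequence labels treated as protein-altering.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str        # assumed annotation, e.g. "missense" or "synonymous"
    population_freq: float  # assumed allele frequency in a reference panel


def candidate_genes(exomes, max_freq=0.01, min_affected=None):
    """Return genes with a rare, protein-altering variant in at least
    `min_affected` affected individuals (all of them by default).

    `exomes` maps each affected individual's ID to the list of Variant
    records called in that person's exome.
    """
    if min_affected is None:
        min_affected = len(exomes)
    carriers = defaultdict(set)  # gene -> IDs of individuals with a qualifying variant
    for person, variants in exomes.items():
        for v in variants:
            if v.consequence in PROTEIN_ALTERING and v.population_freq <= max_freq:
                carriers[v.gene].add(person)
    return sorted(g for g, people in carriers.items() if len(people) >= min_affected)
```

Carriers are counted per gene rather than per variant, so different affected individuals may carry different mutations in the same gene. Setting `min_affected` below the number of individuals relaxes the filter for missed calls or locus heterogeneity, at the cost of more surviving candidates.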
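For complex traits, the abstract points to the cumulative impact of many rare variants. Each such variant is individually too infrequent to test on its own, so collapsing (burden) approaches pool the rare variants within a gene into one indicator per person (carries at least one rare allele, or not) and compare carrier counts between cases and controls. The sketch below is a minimal version of that idea using a two-sided Fisher's exact test written with the standard library. The genotype encoding (minor-allele counts per site), the frequency threshold, estimating frequencies from cases and controls pooled, and all function names are assumptions for illustration.

```python
from math import comb


def carrier_status(genotypes, rare_sites):
    """1 if the person has at least one minor allele at any rare site, else 0."""
    return int(any(genotypes[j] > 0 for j in rare_sites))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the observed one.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))


def burden_test(case_genotypes, control_genotypes, maf_threshold=0.01):
    """Collapse rare variants in one gene and compare carrier counts.

    Each argument is a list of per-person genotype vectors giving the
    minor-allele count (0, 1 or 2) at each variant site in the gene.
    Returns (case carriers, control carriers, p-value).
    """
    everyone = case_genotypes + control_genotypes
    n_sites = len(everyone[0])
    # Minor-allele frequency per site, estimated from all samples pooled.
    maf = [sum(g[j] for g in everyone) / (2 * len(everyone)) for j in range(n_sites)]
    rare_sites = [j for j in range(n_sites) if maf[j] <= maf_threshold]
    case_carriers = sum(carrier_status(g, rare_sites) for g in case_genotypes)
    ctrl_carriers = sum(carrier_status(g, rare_sites) for g in control_genotypes)
    p = fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        ctrl_carriers, len(control_genotypes) - ctrl_carriers,
    )
    return case_carriers, ctrl_carriers, p
```

Collapsing gains power when most pooled variants act in the same direction, but it is diluted by neutral variants included in the pool. This is one reason the choice of frequency threshold and of which variants to include matters so much in practice.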

Author information

Authors and Affiliations

Genetic Epidemiology and Bioinformatics Research Group, Faculty of Medicine, University of Southampton, Duthie Building (808), Tremona Road, Southampton, SO16 6YD, UK
Andrew Collins

Corresponding author

Correspondence to Andrew Collins.

Editor information

Editors and Affiliations

35 Convent Drive, Bethesda, 20892, USA
Yin Yao Shugart

About this chapter

Cite this chapter

Collins, A. (2012). Analytical Approaches for Exome Sequence Data. In: Shugart, Y. (ed.) Applied Computational Genomics. Translational Bioinformatics, vol 1. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5558-1_7

DOI: https://doi.org/10.1007/978-94-007-5558-1_7
Published: 27 November 2012
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5557-4
Online ISBN: 978-94-007-5558-1
eBook Packages: Biomedical and Life Sciences (R0)