Analytical Approaches for Exome Sequence Data

Part of the book series: Translational Bioinformatics (TRBIO, volume 1)

Abstract

Sequencing the roughly 1% of the genome that codes for proteins (the exome) offers a powerful and often cost-effective route to identifying genetic mutations underlying Mendelian disease. Exome sequencing of a relatively small number of individuals with ‘extreme’ phenotypes, or with more familial subtypes of complex disease, may also be productive. Larger-scale exome and whole genome sequencing studies offer the potential to interrogate the cumulative impact of the numerous rare variants presumed to underlie a substantial proportion of complex disease susceptibility. Exome and, particularly, whole genome sequencing studies yield enormous amounts of data and pose many analytical challenges. Aside from issues concerning the production of high-quality sequence reads and the management and manipulation of huge databases, a major concern in the early stages of analysis is the reliable alignment of the short sequence reads against a reference genome. A wide range of algorithms and software tools for alignment have been developed and implemented for this most critical step in every analysis ‘pipeline’. A similarly rich set of platforms and analytical tools is available to facilitate the reliable calling of DNA variants. Given the excellent resources now available, the production of a well-characterised database cataloguing novel and known variants in an individual exome is achievable. The greatest challenge, however, is teasing out causal variants from the vast amount of neutral or irrelevant variation. I review here the techniques and tools that have been developed and applied for the analysis of exome data. Exome mapping of genes involved in Mendelian disease has met with considerable success thus far, while applications to complex traits look promising provided that sufficiently large numbers of case and control exomes are analysed.

Author information

Corresponding author

Correspondence to Andrew Collins.

Copyright information

© 2012 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Collins, A. (2012). Analytical Approaches for Exome Sequence Data. In: Shugart, Y. (ed.) Applied Computational Genomics. Translational Bioinformatics, vol 1. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5558-1_7
