Whole Genome Sequencing

Part of the book series: Methods in Molecular Biology ((MIMB,volume 628))

Abstract

Whole genome sequencing provides the most comprehensive collection of an individual’s genetic variation. As the cost of sequencing technology falls, we envision a paradigm shift from microarray-based genotyping studies to whole genome sequencing. Here we review methodologies for whole genome sequencing. Short shotgun sequence reads can be assembled into longer contiguous genomic sequences in two ways. In de novo assembly, reads are compared with one another, and overlapping reads are merged to build longer contiguous sequences. In reference-based assembly, each read is mapped to a reference genome sequence. We discuss methods for identifying genetic variation (single nucleotide polymorphisms, small indels, and copy number variants) and for building haplotypes from genome assemblies, along with potential pitfalls. We expect methodologies to evolve rapidly as sequencing technologies improve and more human genomes are sequenced.
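
The two assembly approaches named in the abstract can be contrasted with a toy sketch. This is an illustration only, not the chapter's method or any published assembler: the function names (`overlap`, `greedy_assemble`, `map_read`), the minimum overlap length, and the mismatch threshold are all assumptions chosen for the example, and real tools use indexed, gap-aware, quality-aware algorithms.

```python
from typing import List, Optional, Tuple


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len` (assumed cutoff).
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """De novo toy: repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def map_read(read: str, reference: str,
             max_mismatches: int = 1) -> Optional[Tuple[int, int]]:
    """Reference-based toy: best ungapped placement as (position, mismatches)."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for x, y in zip(read, window) if x != y)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


# Hypothetical data: a 14-base "genome" and three overlapping reads drawn from it.
reference = "ACGTTGCAAGGCTA"
reads = ["ACGTTGC", "TTGCAAGG", "AAGGCTA"]

contigs = greedy_assemble(reads)
# A read carrying one substitution relative to the reference (a candidate SNP).
hit = map_read("TTGCTAGG", reference)
```

In this sketch the de novo path never consults `reference`, whereas the mapping path depends on it entirely; the single mismatch reported by `map_read` is the kind of signal that variant callers then evaluate across many overlapping reads.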

Corresponding author

Correspondence to Ewen F. Kirkness.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Ng, P.C., Kirkness, E.F. (2010). Whole Genome Sequencing. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_12

  • DOI: https://doi.org/10.1007/978-1-60327-367-1_12

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60327-366-4

  • Online ISBN: 978-1-60327-367-1

  • eBook Packages: Springer Protocols
