Whole Genome Sequencing

Part of the Methods in Molecular Biology book series (MIMB, volume 628)


Whole genome sequencing provides the most comprehensive collection of an individual’s genetic variation. As the cost of sequencing technology falls, we envision a paradigm shift from microarray-based genotyping studies to whole genome sequencing. We review methodologies for whole genome sequencing. There are two approaches for assembling short shotgun sequence reads into longer contiguous genomic sequences. In the de novo assembly approach, sequence reads are compared to each other, and overlapping reads are merged to build longer contiguous sequences. The reference-based assembly approach involves mapping each read to a reference genome sequence. We discuss methods for identifying genetic variation (single nucleotide polymorphisms, small indels, and copy number variants) and for building haplotypes from genome assemblies, and we discuss potential pitfalls. We expect methodologies to evolve rapidly as sequencing technologies improve and more human genomes are sequenced.
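The two assembly approaches described above can be contrasted with a toy sketch. This is only an illustration under strong simplifying assumptions (error-free reads for the de novo case, ungapped matching for the reference-based case, no reverse complements, no repeats); it is not the chapter's method and not how production assemblers or aligners work. The function names `overlap`, `greedy_assemble`, and `map_reads`, and the parameters `min_len` and `max_mismatches`, are hypothetical and chosen for this example only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy de novo assembly: repeatedly merge the pair with the largest overlap.

    Assumption: reads are error-free and come from one strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


def map_reads(reads, reference, max_mismatches=1):
    """Toy reference-based mapping: best ungapped placement per read.

    Returns {read: (position, mismatches)} or {read: None} when no placement
    has at most `max_mismatches` mismatches. Indels are not handled.
    """
    placements = {}
    for read in reads:
        best_pos, best_mm = None, max_mismatches + 1
        for pos in range(len(reference) - len(read) + 1):
            window = reference[pos:pos + len(read)]
            mm = sum(1 for x, y in zip(read, window) if x != y)
            if mm < best_mm:
                best_pos, best_mm = pos, mm
        placements[read] = None if best_pos is None else (best_pos, best_mm)
    return placements


reference = "ACGTTGCAAGGCTA"                       # made-up sequence
reads = ["ACGTTGC", "TTGCAAGG", "AAGGCTA"]         # made-up overlapping reads

print(greedy_assemble(reads))                      # ['ACGTTGCAAGGCTA']
print(map_reads(["TTGCTAGG"], reference))          # {'TTGCTAGG': (3, 1)}
```

In the second call, the read differs from the reference at one base, so it is placed at position 3 with one mismatch; a mismatch like this at a well-covered position is the kind of signal a single nucleotide polymorphism caller would start from. The de novo sketch needs no reference but compares every read against every other read, which is one reason the two approaches scale very differently on real data.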

Key words

Human Genome Sequencing Assembly 



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. The J. Craig Venter Institute, Rockville, USA
