Next-Generation Sequencing Technologies and Fragment Assembly Algorithms

  • Protocol
Evolutionary Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 855)

Abstract

As a classic topic in bioinformatics, the fragment assembly problem has been studied for over two decades. Fragment assembly algorithms take a set of DNA fragments as input, piece them together into sets of aligned, overlapping fragments (contigs), and output a consensus sequence for each contig. The rapid advance of massively parallel sequencing, often referred to as next-generation sequencing (NGS), has revolutionized DNA sequencing by reducing both its time and cost by several orders of magnitude in the past few years, but it has also posed new challenges for fragment assembly. NGS reads are typically shorter and have a higher error rate than reads from classic sequencing methods, although they are produced at a much higher throughput; as a result, many new approaches have been developed to assemble them. In this chapter, we review both classic and new algorithms for fragment assembly, with a focus on NGS sequences. We also discuss a few new assembly problems that emerge from the broader applications of NGS techniques and are distinct from the classic fragment assembly problem.
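To make the input/output contract described above concrete, the following is a minimal sketch of a greedy overlap-merge assembler. It is not taken from the chapter and is not any published assembler: the function names `overlap` and `greedy_assemble` and the `min_len` parameter are illustrative assumptions, and the sketch assumes error-free reads from a single strand, so each merged string stands in for the contig's consensus sequence.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` long; otherwise return 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler (hypothetical helper): repeatedly merge the pair of
    sequences with the longest suffix/prefix overlap until none remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the longest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the contigs
        # Merge the best pair, keeping the overlapping bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
    print(greedy_assemble(reads))
```

The all-pairs search is quadratic in the number of reads on every merge, and the sketch ignores sequencing errors, reverse complements, and repeats, which is exactly where the practical difficulty of assembling short, error-prone NGS reads lies.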

Author information

Corresponding author

Correspondence to Haixu Tang.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Lee, H., Tang, H. (2012). Next-Generation Sequencing Technologies and Fragment Assembly Algorithms. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 855. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-582-4_5

  • DOI: https://doi.org/10.1007/978-1-61779-582-4_5

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-581-7

  • Online ISBN: 978-1-61779-582-4

  • eBook Packages: Springer Protocols
