Next-Generation Sequencing Technologies and Fragment Assembly Algorithms

Lee, Heewook; Tang, Haixu

doi:10.1007/978-1-61779-582-4_5


Part of the book series: Methods in Molecular Biology (MIMB, volume 855)


Abstract

As a classic topic in bioinformatics, the fragment assembly problem has been studied for over two decades. Fragment assembly algorithms take a set of DNA fragments as input, piece them together into sets of aligned, overlapping fragments (i.e., contigs), and output a consensus sequence for each contig. Massively parallel sequencing, often referred to as next-generation sequencing (NGS), has advanced rapidly in the past few years. It has revolutionized DNA sequencing by reducing both its time and its cost by several orders of magnitude, but it has also posed new challenges for fragment assembly. As a result, many new approaches have been developed to assemble NGS reads, which are typically shorter and more error-prone than those produced by classic methods but are generated at much higher throughput. In this chapter, we review both classic and new algorithms for fragment assembly, with a focus on NGS sequences. We also discuss a few new assembly problems that emerge from the broader applications of NGS techniques and are distinct from the classic fragment assembly problem.
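The input/output contract described above (fragments in, contigs of aligned overlapping fragments out, one consensus per contig) can be illustrated with a minimal sketch of the classic greedy overlap-merge strategy. It is offered only as an illustration, not as the chapter's own method. The sketch assumes error-free reads from a single strand, exact suffix-prefix overlaps of at least `min_overlap` bases, and no repeats. The function names `greedy_assemble`, `suffix_prefix_overlap` and `consensus` are hypothetical. Each contig is kept as a layout of (read, offset) pairs, and its consensus is a per-column majority vote over the aligned reads.

```python
from collections import Counter
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest k >= min_len such that the last k bases of a equal the first k bases of b (0 if none)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[list[tuple[str, int]]]:
    """Toy greedy assembler (hypothetical helper): repeatedly merge the pair of contigs
    with the longest exact overlap. Assumes error-free, same-strand reads; returns one
    layout, a list of (read, offset) pairs, per contig."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Drop duplicates and reads contained in another read; they add no new layout information.
    uniq = sorted(set(reads))
    kept = [r for r in uniq if not any(r != o and r in o for o in uniq)]
    # Each contig is (sequence, layout); initially every read is its own contig at offset 0.
    contigs = [(r, [(r, 0)]) for r in kept]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            k = suffix_prefix_overlap(contigs[i][0], contigs[j][0], min_overlap)
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no remaining overlap of at least min_overlap bases
            break
        seq_a, lay_a = contigs[best_i]
        seq_b, lay_b = contigs[best_j]
        shift = len(seq_a) - best_k  # start of contig b relative to the start of contig a
        merged = (seq_a + seq_b[best_k:], lay_a + [(r, off + shift) for r, off in lay_b])
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]
    return [layout for _, layout in contigs]


def consensus(layout: list[tuple[str, int]]) -> str:
    """Majority-vote consensus over the columns of a (read, offset) layout."""
    length = max(off + len(r) for r, off in layout)
    columns = [Counter() for _ in range(length)]
    for r, off in layout:
        for p, base in enumerate(r):
            columns[off + p][base] += 1
    return "".join(col.most_common(1)[0][0] for col in columns)


if __name__ == "__main__":
    reads = ["GATTACA", "TACACGG", "ACGGTCT", "GTCTTGA"]
    for layout in greedy_assemble(reads, min_overlap=3):
        print(consensus(layout))  # prints GATTACACGGTCTTGA
```

Greedy merging is easy to follow, but it can misjoin reads drawn from repeated regions. Rescanning every pair of contigs after each merge also scales poorly to millions of short reads.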



Author information

Authors and Affiliations

School of Informatics and Computing, Indiana University, Bloomington, IN, USA
Heewook Lee & Haixu Tang


Corresponding author

Correspondence to Haixu Tang.

Editor information

Editors and Affiliations

Department of Computer Science, ETH Zürich, Universitätsstr. 6, Zürich, 8092, Switzerland
Maria Anisimova


About this protocol

Cite this protocol

Lee, H., Tang, H. (2012). Next-Generation Sequencing Technologies and Fragment Assembly Algorithms. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 855. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-582-4_5


DOI: https://doi.org/10.1007/978-1-61779-582-4_5
Published: 07 February 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-581-7
Online ISBN: 978-1-61779-582-4
eBook Packages: Springer Protocols
