Approaches and Challenges of Next-Generation Sequence Assembly Stages

Next Generation Sequencing Technologies and Challenges in Sequence Assembly

Part of the book series: SpringerBriefs in Systems Biology (BRIEFSBIOSYS, volume 7)

Abstract

Sequence assembly in the next-generation environment is broken down into five stages, all of which were introduced in Chap. 8. Here, we discuss four of these stages in detail and present the different approaches followed in each of them. We also examine the challenges facing each stage and its stage-specific implementation approaches. The fifth stage, the assessment of the assembly, is discussed separately in Chap. 10.

Copyright information

© 2014 The Authors

About this chapter

Cite this chapter

El-Metwally, S., Ouda, O.M., Helmy, M. (2014). Approaches and Challenges of Next-Generation Sequence Assembly Stages. In: Next Generation Sequencing Technologies and Challenges in Sequence Assembly. SpringerBriefs in Systems Biology, vol 7. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0715-1_9
