Localized Genome Assembly from Reads to Scaffolds: Practical Traversal of the Paired String Graph

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 6833)

Abstract

Next-generation de novo short-read assemblers typically use the following strategy: (1) assemble unpaired reads into contigs using heuristics; (2) order the contigs using paired-read information to produce scaffolds. We propose to unify these two steps by introducing localized assembly: the direct construction of scaffolds from reads. To this end, the paired string graph structure is introduced, along with a formal framework for building scaffolds as paths of reads. This framework leads to the design of a novel greedy algorithm for memory-efficient, parallel assembly of paired reads. A prototype implementation of the algorithm has been developed and applied to the assembly of simulated and experimental short reads. Our experiments show that our method yields longer scaffolds than recent assemblers and is capable of assembling diploid genomes significantly better than other greedy methods.
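
To make the idea of building a sequence as a path of reads more concrete, the following is a minimal toy sketch, not the authors' algorithm or their paired string graph. It assumes a simplified model in which reads are plain strings on one strand, an extension requires a suffix-prefix overlap of at least `min_len` characters, and a hypothetical `mates` dictionary maps a read index to the index of its mate. Insert sizes, orientations, and sequencing errors are ignored. The functions `overlap` and `greedy_paired_path` are illustrative names introduced here. At a branch, the sketch prefers a candidate whose mate is already on the path, and stops if no candidate has such support.

```python
def overlap(a, b, min_len):
    """Length of the longest proper suffix of a equal to a prefix of b.

    Returns 0 if no overlap of at least min_len characters exists.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_paired_path(reads, mates, seed, min_len=3):
    """Greedily extend a path of reads to the right, starting from `seed`.

    Toy assumption: a candidate is "supported" when its mate is already
    on the path. Real assemblers also check insert size and orientation.
    Returns (path of read indices, spelled sequence).
    """
    path = [seed]
    used = {seed}
    sequence = reads[seed]
    while True:
        current = reads[path[-1]]
        # All unused reads that overlap the end of the current read.
        candidates = []
        for j, read in enumerate(reads):
            if j in used:
                continue
            k = overlap(current, read, min_len)
            if k > 0:
                candidates.append((k, j))
        if not candidates:
            break
        supported = [(k, j) for k, j in candidates if mates.get(j) in used]
        if supported:
            k, j = max(supported)      # longest overlap among supported reads
        elif len(candidates) == 1:
            k, j = candidates[0]       # unambiguous extension
        else:
            break                      # unresolved branch: stop extending
        path.append(j)
        used.add(j)
        sequence += reads[j][k:]
    return path, sequence


# Reads 2 and 3 both extend read 1; only read 3 has its mate (read 0) on the path.
reads = ["GATTAC", "TTACAG", "ACAGTT", "ACAGCC"]
paired = greedy_paired_path(reads, {0: 3, 3: 0}, seed=0)
unpaired = greedy_paired_path(reads, {}, seed=0)
print(paired)    # ([0, 1, 3], 'GATTACAGCC')
print(unpaired)  # ([0, 1], 'GATTACAG')
```

In this toy input the pairing information lets the traversal continue through a branch where overlaps alone are ambiguous, which is the intuition behind using paired reads during extension rather than only afterwards.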

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chikhi, R., Lavenier, D. (2011). Localized Genome Assembly from Reads to Scaffolds: Practical Traversal of the Paired String Graph. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_4

  • DOI: https://doi.org/10.1007/978-3-642-23038-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23037-0

  • Online ISBN: 978-3-642-23038-7

  • eBook Packages: Computer Science (R0)