Bidirectional Variable-Order de Bruijn Graphs

  • Djamal Belazzougui
  • Travis Gagie
  • Veli Mäkinen
  • Marco Previtali
  • Simon J. Puglisi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9644)

Abstract

Implementing de Bruijn graphs compactly is an important problem because of their role in genome assembly. There are currently two main approaches, one using Bloom filters and the other using a kind of Burrows-Wheeler Transform on the edge labels of the graph. The second representation is more elegant and can even handle many graph-orders at once, but it does not cleanly support traversing edges backwards or inserting new nodes or edges. In this paper we resolve the first of these issues and partially address the second.
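To make the traversal problem concrete, here is a minimal sketch of an uncompressed de Bruijn graph. It is not the paper's data structure or either of the compact representations mentioned above; it is a plain dictionary-based illustration, and the function name `build_de_bruijn` and the example reads are hypothetical. It stores successor and predecessor sets explicitly, so edges can be followed in both directions, at the cost of the space that compact representations aim to save.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build an order-k de Bruijn graph from a list of reads.

    Nodes are k-mers. Every (k+1)-mer occurring in a read contributes
    an edge from its length-k prefix to its length-k suffix.
    Returns two dictionaries: successors and predecessors of each node.
    """
    succ = defaultdict(set)  # forward edges: node -> nodes reachable in one step
    pred = defaultdict(set)  # backward edges: node -> nodes that lead to it
    for read in reads:
        for i in range(len(read) - k):
            u = read[i:i + k]          # prefix k-mer of the (k+1)-mer
            v = read[i + 1:i + k + 1]  # suffix k-mer of the (k+1)-mer
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


# Illustrative input only; these reads are made up for the example.
succ, pred = build_de_bruijn(["ACGTAC", "CGTT"], 3)
forward_from_cgt = sorted(succ["CGT"])   # follow edges forwards
backward_from_gta = sorted(pred["GTA"])  # follow edges backwards
```

Keeping the `pred` dictionary is the naive way to support backward traversal: it roughly doubles the stored edge information, which is the kind of overhead a compact bidirectional representation tries to avoid.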

Keywords

Static Structure, Current Node, Outgoing Edge, Distribute Hash Table, Bloom Filter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Djamal Belazzougui (1)
  • Travis Gagie (2)
  • Veli Mäkinen (2)
  • Marco Previtali (3)
  • Simon J. Puglisi (2)

  1. Center for Research on Technical and Scientific Information (CERIST), Algiers, Algeria
  2. Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
  3. Department of Computer Science, University of Milano-Bicocca, Milan, Italy
