Using Cascading Bloom Filters to Improve the Memory Usage for de Bruijn Graphs

  • Kamil Salikhov
  • Gustavo Sacomoto
  • Gregory Kucherov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8126)

Abstract

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of Chikhi and Rizk (WABI, 2012), which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than their method, with insignificant impact on construction time. In our experiments, it also achieved a better query time than their method. This is, to our knowledge, the best practical representation for de Bruijn graphs.
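
The abstract does not describe the data structure itself. As a rough illustration of the general idea of backing a de Bruijn graph with a Bloom filter, the following Python sketch stores k-mers in a single filter and finds successors by membership queries. It is not the authors' cascading construction, nor the exact scheme of Chikhi and Rizk; the class `BloomFilter`, the helpers `kmers` and `successors`, the filter size, the number of hash functions, and the toy sequence are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; size and hash count here are illustrative."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from two base hashes (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every inserted item; may also be True for others
        # (false positives), never False for an inserted item.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Return all substrings of length k, in order of occurrence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def successors(bloom, kmer):
    """Return the possible next k-mers that the filter reports as present.

    The result can contain false positives, which is why exact
    representations need an additional structure on top of the filter.
    """
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


# Toy input: nodes are the 4-mers of a short hypothetical sequence.
bloom = BloomFilter(num_bits=1024, num_hashes=4)
for kmer in kmers("ACGTACGGA", 4):
    bloom.add(kmer)
```

Edges are never stored explicitly in this sketch: a node's neighbours are recovered by querying the four possible extensions. Because a Bloom filter has no false negatives, every true successor is always reported; the cost is that some reported successors may not exist in the real graph.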

References

  1. Blattner, F.R., Plunkett, G., Bloch, C.A., et al.: The complete genome sequence of Escherichia coli K-12. Science 277(5331), 1453–1462 (1997)
  2. Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de Bruijn graphs. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 225–235. Springer, Heidelberg (2012)
  3. Chikhi, R., Rizk, G.: Space-efficient and exact de Bruijn graph representation based on a Bloom filter. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 236–248. Springer, Heidelberg (2012)
  4. Conway, T.C., Bromage, A.J.: Succinct data structures for assembling large genomes. Bioinformatics 27(4), 479–486 (2011)
  5. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotech. 29(7), 644–652 (2011)
  6. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., McVean, G.: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44(2), 226–232 (2012)
  7. Kirsch, A., Mitzenmacher, M.: Less hashing, same performance: Building a better Bloom filter. Random Struct. Algorithms 33(2), 187–218 (2008)
  8. Miller, J.R., Koren, S., Sutton, G.: Assembly algorithms for next-generation sequencing data. Genomics 95(6), 315–327 (2010)
  9. Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J.M., Brown, C.T.: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc. Natl. Acad. Sci. U.S.A. 109(33), 13272–13277 (2012)
  10. Peng, Y., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L.: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 27(13), i94–i101 (2011)
  11. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98(17), 9748–9753 (2001)
  12. Porat, E.: An optimal Bloom filter replacement based on matrix solving. In: Frid, A., Morozov, A., Rybalchenko, A., Wagner, K.W. (eds.) CSR 2009. LNCS, vol. 5675, pp. 263–273. Springer, Heidelberg (2009)
  13. Rizk, G., Lavenier, D., Chikhi, R.: DSK: k-mer counting with very low memory usage. Bioinformatics (2013)
  14. Sacomoto, G., Kielbassa, J., Chikhi, R., Uricaru, R., et al.: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinformatics 13(suppl. 6), S5 (2012)
  15. Ye, C., Ma, Z., Cannon, C., Pop, M., Yu, D.: Exploiting sparseness in de novo genome assembly. BMC Bioinformatics 13(suppl. 6), S1 (2012)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kamil Salikhov (1)
  • Gustavo Sacomoto (2, 3)
  • Gregory Kucherov (4, 5)

  1. Lomonosov Moscow State University, Moscow, Russia
  2. INRIA Grenoble Rhône-Alpes, France
  3. Laboratoire Biométrie et Biologie Evolutive, Université Lyon 1, Lyon, France
  4. Department of Computer Science, Ben-Gurion University of the Negev, Be’er Sheva, Israel
  5. Laboratoire d’Informatique Gaspard Monge, Université Paris-Est & CNRS, Paris, France
