Skip to main content

Using Cascading Bloom Filters to Improve the Memory Usage for de Brujin Graphs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 8126))

Abstract

De Brujin graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of Chikhi and Rizk (WABI, 2012) that represents de Brujin graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to their method, with insignificant impact to construction time. At the same time, our experiments showed a better query time compared to their method. This is, to our knowledge, the best practical representation for de Bruijn graphs.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Blattner, F.R., Plunkett, G., Bloch, C.A., et al.: The complete genome sequence of Escherichia coli k-12. Science 277(5331), 1453–1462 (1997)

    Article  Google Scholar 

  2. Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de Bruijn graphs. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 225–235. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  3. Chikhi, R., Rizk, G.: Space-efficient and exact de bruijn graph representation based on a bloom filter. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 236–248. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  4. Conway, T.C., Bromage, A.J.: Succinct data structures for assembling large genomes. Bioinformatics 27(4), 479–486 (2011)

    Article  Google Scholar 

  5. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotech. 29(7), 644–652 (2011)

    Article  Google Scholar 

  6. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., McVean, G.: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44(2), 226–232 (2012)

    Article  Google Scholar 

  7. Kirsch, A., Mitzenmacher, M.: Less hashing, same performance: Building a better bloom filter. Random Struct. Algorithms 33(2), 187–218 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  8. Miller, J.R., Koren, S., Sutton, G.: Assembly algorithms for next-generation sequencing data. Genomics 95(6), 315–327 (2010)

    Article  Google Scholar 

  9. Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J.M., Brown, C.T.: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc. Natl. Acad. Sci. U.S.A. 109(33), 13272–13277 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  10. Peng, Y., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L.: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 27(13), i94–i101 (2011)

    Google Scholar 

  11. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98(17), 9748–9753 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  12. Porat, E.: An optimal Bloom filter replacement based on matrix solving. In: Frid, A., Morozov, A., Rybalchenko, A., Wagner, K.W. (eds.) CSR 2009. LNCS, vol. 5675, pp. 263–273. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  13. Rizk, G., Lavenier, D., Chikhi, R.: DSK: k-mer counting with very low memory usage. Bioinformatics (2013)

    Google Scholar 

  14. Sacomoto, G., Kielbassa, J., Chikhi, R., Uricaru, R., et al.: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinformatics 13(suppl. 6), S5 (2012)

    Google Scholar 

  15. Ye, C., Ma, Z., Cannon, C., Pop, M., Yu, D.: Exploiting sparseness in de novo genome assembly. BMC Bioinformatics 13(suppl. 6), S1 (2012)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Salikhov, K., Sacomoto, G., Kucherov, G. (2013). Using Cascading Bloom Filters to Improve the Memory Usage for de Brujin Graphs. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science(), vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-40453-5_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40452-8

  • Online ISBN: 978-3-642-40453-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics