Using Cascading Bloom Filters to Improve the Memory Usage for de Brujin Graphs

Salikhov, Kamil; Sacomoto, Gustavo; Kucherov, Gregory

doi:10.1007/978-3-642-40453-5_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8126)

Abstract

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Because NGS datasets are very large, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of Chikhi and Rizk (WABI, 2012), which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than their method, with an insignificant impact on construction time. At the same time, our experiments showed faster queries than with their method. This is, to our knowledge, the best practical representation for de Bruijn graphs.
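
The abstract does not spell out the construction, so what follows is only a minimal Python sketch, under our own assumptions, of how a cascade of Bloom filters can stand in for the exact table of "critical false positives" that the Chikhi and Rizk representation keeps next to its Bloom filter. It is not the authors' implementation. The class name CascadingBloomDBG, the helpers kmers_of and neighbors, and the parameters num_filters, bits_per_item and num_hashes are hypothetical. Reverse complements are ignored, and the hash positions use SHA-256 with double hashing in the spirit of Kirsch and Mitzenmacher. Queries are assumed to be restricted to true k-mers and their neighbors, which is what a graph traversal asks for.

```python
import hashlib

ALPHABET = "ACGT"


class BloomFilter:
    """Plain Bloom filter; its k bit positions come from double hashing (h1 + i*h2)."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        # No false negatives; false positives are possible.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers_of(seq, k):
    """All k-mers of a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbors(kmer):
    """The 4 candidate successors and 4 candidate predecessors of a k-mer.
    Reverse complements are ignored (a simplifying assumption)."""
    return [kmer[1:] + c for c in ALPHABET] + [c + kmer[:-1] for c in ALPHABET]


class CascadingBloomDBG:
    """Hypothetical sketch: a k-mer set S stored as Bloom filters B_1..B_L plus one exact set.

    B_1 stores T_0 = S; B_2 stores T_1, the critical false positives of B_1
    (neighbors of S that are not in S but pass B_1); for j >= 3, B_j stores
    T_{j-1}, the elements of T_{j-3} that are (false) positives of B_{j-1}.
    Sets T_0, T_2, ... hold true k-mers; T_1, T_3, ... do not.
    The last set T_L is kept exactly.
    """

    def __init__(self, kmers, num_filters=4, bits_per_item=8, num_hashes=3):
        true_kmers = set(kmers)
        levels = [true_kmers]  # levels[i] is T_i
        self.filters = []
        for i in range(num_filters):
            bf = BloomFilter(max(64, bits_per_item * len(levels[i])), num_hashes)
            for x in levels[i]:
                bf.add(x)
            self.filters.append(bf)
            if i == 0:
                nxt = {n for x in true_kmers for n in neighbors(x)
                       if n not in true_kmers and n in bf}
            else:
                # levels[i - 1] is disjoint from levels[i], so every hit is a false positive.
                nxt = {x for x in levels[i - 1] if x in bf}
            levels.append(nxt)
        self.last = levels[num_filters]
        self.last_is_true = num_filters % 2 == 0

    def __contains__(self, kmer):
        """Exact only for k-mers of S or neighbors of S (as asked during traversal)."""
        for j, bf in enumerate(self.filters, start=1):
            if kmer not in bf:
                # kmer is not in T_{j-1}; it is a true k-mer iff j is even (j == 1: absent).
                return j % 2 == 0
        # Accepted by all L filters: kmer lies in T_{L-1} or in T_L.
        return self.last_is_true if kmer in self.last else not self.last_is_true

    def successors(self, kmer):
        """Out-neighbors of a k-mer that is already known to be in the graph."""
        return [n for n in (kmer[1:] + c for c in ALPHABET) if n in self]
```

With num_filters=1 the sketch degenerates to one Bloom filter plus an exact set of critical false positives. With more filters, each set T_{i+1} is roughly the false-positive rate times the size of T_{i-1}, so the later filters and the final exact set are typically small, which is where the memory saving of a cascade comes from.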

References

Blattner, F.R., Plunkett, G., Bloch, C.A., et al.: The complete genome sequence of Escherichia coli K-12. Science 277(5331), 1453–1462 (1997)
Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de Bruijn graphs. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 225–235. Springer, Heidelberg (2012)
Chikhi, R., Rizk, G.: Space-efficient and exact de Bruijn graph representation based on a Bloom filter. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 236–248. Springer, Heidelberg (2012)
Conway, T.C., Bromage, A.J.: Succinct data structures for assembling large genomes. Bioinformatics 27(4), 479–486 (2011)
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotech. 29(7), 644–652 (2011)
Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., McVean, G.: De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44(2), 226–232 (2012)
Kirsch, A., Mitzenmacher, M.: Less hashing, same performance: Building a better Bloom filter. Random Struct. Algorithms 33(2), 187–218 (2008)
Miller, J.R., Koren, S., Sutton, G.: Assembly algorithms for next-generation sequencing data. Genomics 95(6), 315–327 (2010)
Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J.M., Brown, C.T.: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc. Natl. Acad. Sci. U.S.A. 109(33), 13272–13277 (2012)
Peng, Y., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L.: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 27(13), i94–i101 (2011)
Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A. 98(17), 9748–9753 (2001)
Porat, E.: An optimal Bloom filter replacement based on matrix solving. In: Frid, A., Morozov, A., Rybalchenko, A., Wagner, K.W. (eds.) CSR 2009. LNCS, vol. 5675, pp. 263–273. Springer, Heidelberg (2009)
Rizk, G., Lavenier, D., Chikhi, R.: DSK: k-mer counting with very low memory usage. Bioinformatics (2013)
Sacomoto, G., Kielbassa, J., Chikhi, R., Uricaru, R., et al.: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. BMC Bioinformatics 13(suppl. 6), S5 (2012)
Ye, C., Ma, Z., Cannon, C., Pop, M., Yu, D.: Exploiting sparseness in de novo genome assembly. BMC Bioinformatics 13(suppl. 6), S1 (2012)

Author information

Authors and Affiliations

Lomonosov Moscow State University, Moscow, Russia
Kamil Salikhov
INRIA Grenoble Rhône-Alpes, France
Gustavo Sacomoto
Laboratoire Biométrie et Biologie Evolutive, Université Lyon 1, Lyon, France
Gustavo Sacomoto
Department of Computer Science, Ben-Gurion University of the Negev, Be’er Sheva, Israel
Gregory Kucherov
Laboratoire d’Informatique Gaspard Monge, Université Paris-Est & CNRS, Marne-la-Vallée, Paris, France
Gregory Kucherov

Editor information

Editors and Affiliations

ithree institute, University of Technology Sydney, 2007, Ultimo, NSW, Australia
Aaron Darling
Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, Germany
Jens Stoye

About this paper

Cite this paper

Salikhov, K., Sacomoto, G., Kucherov, G. (2013). Using Cascading Bloom Filters to Improve the Memory Usage for de Brujin Graphs. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_28

DOI: https://doi.org/10.1007/978-3-642-40453-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40452-8
Online ISBN: 978-3-642-40453-5
eBook Packages: Computer Science; Computer Science (R0)