Skip to main content

Bidirectional Variable-Order de Bruijn Graphs

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 9644)

Abstract

Implementing de Bruijn graphs compactly is an important problem because of their role in genome assembly. There are currently two main approaches, one using Bloom filters and the other using a kind of Burrows-Wheeler Transform on the edge labels of the graph. The second representation is more elegant and can even handle many graph-orders at once, but it does not cleanly support traversing edges backwards or inserting new nodes or edges. In this paper we resolve the first of these issues and partially address the second.

Keywords

  • Static Structure
  • Current Node
  • Outgoing Edge
  • Distribute Hash Table
  • Bloom Filter

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supported by the Academy of Finland through grants 258308, 268324, and 284598, and Italian MIUR PRIN 2010–2011 grant Automi e Linguaggi Formali: Aspetti Matematici e Applicativi (code 2010LYA9RH). This work was done while the fourth author was visiting the University of Helsinki.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-662-49529-2_13
  • Chapter length: 15 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   84.99
Price excludes VAT (USA)
  • ISBN: 978-3-662-49529-2
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   109.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.
Fig. 3.

Notes

  1. 1.

    Note that there are at most four possible outgoing edges, corresponding to each letter of the DNA alphabet {A, C, G, T}.

  2. 2.

    In the original (single-order) BOSS representation, one maintains a sequence of characters from alphabet \(2\sigma \) and a bitvecor. The sequence is built by sorting the k-mers (nodes) in lexicographic order and for each k-mer storing the characters labelling its outgoing edges. The bitvector is used to mark the beginning of the labels of each k-mer in the sequence. We additionally use another bitvector to mark the beginning of each block of edges outgoing from a group of nodes whose labels have the same prefix of length \(k-1\). Queries, and insertions or deletions of nodes and edges can then easily be done by rank and select queries and insertions, deletions in the two bitvectors and the sequence (the reader can easily guess the details by looking at the original BOSS representation [4]). The total time for an update or query will be \(O(\log m/\log \log m)\).

References

  1. Bankevich, A., et al.: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455–477 (2012)

    MathSciNet  CrossRef  Google Scholar 

  2. Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013)

    CrossRef  Google Scholar 

  3. Boucher, C., Bowe, A., Gagie, T., Puglisi, S.J., Sadakane, K.: Variable-order de Bruijn graphs. In: Proceedings of the Data Compression Conference (DCC), pp. 383–392. IEEE (2015)

    Google Scholar 

  4. Bowe, A., Onodera, T., Sadakane, K., Shibuya, T.: Succinct de Bruijn graphs. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 225–235. Springer, Heidelberg (2012)

    CrossRef  Google Scholar 

  5. Burrows, M., Wheeler, D.J.: A block sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation (1994)

    Google Scholar 

  6. Butler, J., et al.: ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 18(5), 810–820 (2008)

    CrossRef  Google Scholar 

  7. Chikhi, R., Limasset, A., Jackman, S., Simpson, J.T., Medvedev, P.: On the representation of de Bruijn graphs. In: Sharan, R. (ed.) RECOMB 2014. LNCS, vol. 8394, pp. 35–55. Springer, Heidelberg (2014)

    CrossRef  Google Scholar 

  8. Chikhi, R., Rizk, G.: Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithm. Mol. Biol. 8(22) (2012)

    Google Scholar 

  9. Conway, T.C., Bromage, A.J.: Succinct data structures for assembling large genomes. Bioinformatics 27(4), 479–486 (2011)

    CrossRef  Google Scholar 

  10. Haussler, D., et al.: Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J. Hered. 100(6), 659–674 (2009)

    CrossRef  Google Scholar 

  11. Holley, G., Wittler, R., Stoye, J.: Bloom filter trie – a data structure for pan-genome storage. In: Pop, M., Touzet, H. (eds.) WABI 2015. LNCS, vol. 9289, pp. 217–230. Springer, Heidelberg (2015)

    CrossRef  Google Scholar 

  12. Hon, W.-K., Sadakane, K.: Space-economical algorithms for finding maximal unique matches. In: Apostolico, A., Takeda, M. (eds.) CPM 2002. LNCS, vol. 2373, pp. 144–152. Springer, Heidelberg (2002)

    CrossRef  Google Scholar 

  13. Li, R., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20(2), 265–272 (2010)

    CrossRef  Google Scholar 

  14. Li, R., Yu, C., Li, Y., Lam, T.-W., Yiu, S.-M., Kristiansen, K., Wang, J.: SOAP2. Bioinformatics 25(15), 1966–1967 (2009)

    CrossRef  Google Scholar 

  15. Munro, J.I., Nekrich, Y.: Compressed data structures for dynamic sequences. In: Bansal, N., Finocchi, I. (eds.) ESA 2015. LNCS, vol. 9294, pp. 891–902. Springer, Heidelberg (2015)

    Google Scholar 

  16. Navarro, G., Nekrich, Y.: Optimal dynamic sequence representations. SIAM J. Comput. 43(5), 1781–1806 (2014)

    MathSciNet  CrossRef  MATH  Google Scholar 

  17. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: ALENEX, pp. 60–70 (2007)

    Google Scholar 

  18. Ossowski, S., et al.: Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 18(12), 2024–2033 (2008)

    CrossRef  Google Scholar 

  19. Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J.M., Brown, C.T.: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proc. Nat. Acad. Sci. 109(33), 13272–13277 (2012)

    MathSciNet  CrossRef  MATH  Google Scholar 

  20. Peng, Y., Leung, H.C.M., Yiu, S.M., Chin, F.Y.L.: IDBA – a practical iterative de Bruijn graph de novo assembler. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 426–440. Springer, Heidelberg (2010)

    CrossRef  Google Scholar 

  21. Salikhov, K., Sacomoto, G., Kucherov, G.: Using cascading Bloom filters to improve the memory usage for de Bruijn graphs. Algorithms Mol. Biol. 9(2) (2014)

    Google Scholar 

  22. Schnattinger, T., Ohlebusch, E., Gog, S.: Bidirectional search in a string with wavelet trees. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 40–50. Springer, Heidelberg (2010)

    CrossRef  Google Scholar 

  23. Simpson, J.T., et al.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)

    CrossRef  Google Scholar 

  24. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422), 56–65 (2012)

    Google Scholar 

  25. Turnbaugh, P.J., et al.: The human microbiome project: exploring the microbial part of ourselves in a changing world. Nature 449(7164), 804–810 (2007)

    CrossRef  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Travis Gagie .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Belazzougui, D., Gagie, T., Mäkinen, V., Previtali, M., Puglisi, S.J. (2016). Bidirectional Variable-Order de Bruijn Graphs. In: Kranakis, E., Navarro, G., Chávez, E. (eds) LATIN 2016: Theoretical Informatics. LATIN 2016. Lecture Notes in Computer Science(), vol 9644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49529-2_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-49529-2_13

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49528-5

  • Online ISBN: 978-3-662-49529-2

  • eBook Packages: Computer ScienceComputer Science (R0)