A BWT-Based Algorithm for Random de Bruijn Sequence Construction

  • Conference paper
LATIN 2024: Theoretical Informatics (LATIN 2024)

Abstract

A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny fraction of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (fewer than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a laptop computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm’s potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is available (in C++ and Python) at https://github.com/lucaparmigiani/rnd_dbseq.
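
The definition at the start of the abstract can be turned directly into a small check. The following Python sketch is an illustration only and is not taken from the authors' repository; the function name `is_de_bruijn` is hypothetical. It treats the input as a circular string by appending its first k-1 characters, then tests whether all 2^k windows of length k are distinct.

```python
def is_de_bruijn(seq: str, k: int) -> bool:
    """Check whether the circular binary string seq is a dB sequence of order k.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    n = len(seq)
    # A binary dB sequence of order k has length exactly 2^k.
    if k < 1 or n != 2 ** k or set(seq) - {"0", "1"}:
        return False
    # Unroll the circular string so that windows wrapping around are covered.
    unrolled = seq + seq[: k - 1]
    windows = {unrolled[i : i + k] for i in range(n)}
    # n windows over 2^k = n possible words: all distinct means each occurs once.
    return len(windows) == n


if __name__ == "__main__":
    print(is_de_bruijn("00010111", 3))
    print(is_de_bruijn("00010110", 3))
```

Because there are exactly 2^k windows and 2^k binary words of length k, checking that the windows are pairwise distinct is enough to conclude that every word occurs exactly once.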

Notes

  1. Note that in this tradition, an algorithm which runs in time and space \(\mathcal{O}(n)\) is considered exponential, since it is exponential in k; however, if one wants to output or even store the sequence, then it is de facto optimal.

  2. The standard permutation is also called LF-mapping if s is the BWT of some string.

  3. In particular, Higgins uses the term necklace in a non-standard meaning.
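
As a small illustration of the second note, the sketch below computes the standard permutation of a string under the usual definition, which is assumed here because the notes do not restate it: position i is mapped to the rank of s[i] in a stable sort of the characters of s, so equal characters keep their left-to-right order. The function name `standard_permutation` is hypothetical and the code is not taken from the authors' repository.

```python
def standard_permutation(s: str) -> list[int]:
    """Return pi, where pi[i] is the rank of position i in a stable sort of s.

    Assumed definition: pi[i] < pi[j] iff s[i] < s[j], or s[i] == s[j] and i < j.
    """
    # Python's sort is stable, so ties between equal characters keep index order.
    order = sorted(range(len(s)), key=lambda i: s[i])
    pi = [0] * len(s)
    for rank, i in enumerate(order):
        pi[i] = rank
    return pi


if __name__ == "__main__":
    # "annb$aa" is the BWT of "banana$", so this is the LF-mapping in that case.
    print(standard_permutation("annb$aa"))  # [1, 5, 6, 4, 0, 2, 3]
```

For the input `"annb$aa"`, the sorted characters are `$aaabnn`, and the resulting permutation is the familiar LF-mapping of `banana$`.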

References

  1. Aardenne-Ehrenfest, T.v., Bruijn, N.G.d.: Circuits and trees in oriented linear graphs. Simon Stevin, Wis- en Natuurkundig Tijdschrift 28, 203–217 (1951)

  2. Aguirre, G.K., Mattar, M.G., Magis-Weinberg, L.: De Bruijn cycles for neural decoding. Neuroimage 56(3), 1293–1300 (2011)

  3. Ben-Dor, A., Karp, R., Schwikowski, B., Yakhini, Z.: Universal DNA tag systems: a combinatorial design scheme. J. Comp. Biol. 7(3/4), 503–519 (2000)

  4. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)

  5. Colbourn, C.J., Myrvold, W.J., Neufeld, E.: Two algorithms for unranking arborescences. J. Algorithms 20(2), 268–281 (1996)

  6. Cooper, J.N., Heitsch, C.E.: The discrepancy of the lex-least de Bruijn sequence. Discret. Math. 310(6–7), 1152–1159 (2010)

  7. de Bruijn, N.G.: A combinatorial problem. Proc. Sect. Sci. 49(7), 758–764 (1946)

  8. Durfee, D., Kyng, R., Peebles, J., Rao, A.B., Sachdeva, S.: Sampling random spanning trees faster than matrix multiplication. In: Hatami, H., McKenzie, P., King, V. (eds.) Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pp. 730–742. ACM (2017)

  9. Emerson, P.L., Tobias, R.D.: Computer program for quasi-random stimulus sequences with equal transition frequencies. Behav. Res. Methods Instrum. Comput. 27(1), 88–98 (1995)

  10. Fleury, P.-H.: Deux problèmes de géométrie de situation. J. Mathématiq. élément. 2, 257–261 (1883)

  11. Fredricksen, H.: A survey of full length nonlinear shift register cycle algorithms. SIAM Rev. 24(2), 195–221 (1982)

  12. Gabric, D., Sawada, J.: Investigating the discrepancy property of de Bruijn sequences. Discret. Math. 345(4), 112780 (2022)

  13. Gabric, D., Sawada, J., Williams, A., Wong, D.: A framework for constructing de Bruijn sequences via simple successor rules. Discret. Math. 341(11), 2977–2987 (2018)

  14. Giuliani, S., Lipták, Zs., Masillo, F., Rizzi, R.: When a dollar makes a BWT. Theor. Comput. Sci. 857, 123–146 (2021)

  15. Golomb, S.: Shift Register Sequences, 3rd edn. World Scientific (2016)

  16. Higgins, P.M.: Burrows-Wheeler transformations and de Bruijn words. Theor. Comput. Sci. 457, 128–136 (2012)

  17. Huang, Y.: A new algorithm for the generation of binary de Bruijn sequences. J. Algorithms 11(1), 44–51 (1990)

  18. Jansen, C.J., Boekee, D.E.: An efficient algorithm for the generation of DeBruijn cycles. IEEE Trans. Inf. Theory 37(5), 1475–1478 (1991)

  19. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press (2002)

  20. Mandal, K., Gong, G.: Cryptographically strong de Bruijn sequences with large periods. In: Knudsen, L.R., Wu, H. (eds.) Selected Areas in Cryptography: 19th International Conference, SAC 2012, pp. 104–118. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_8

  21. Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows-Wheeler Transform. Theor. Comput. Sci. 387(3), 298–312 (2007)

  22. Navarro, G.: Compact Data Structures: A Practical Approach. Cambridge University Press (2016)

  23. Perrin, D., Restivo, A.: Words. In: Bóna, M. (ed.) Enumerative Combinatorics, chapter 8, pp. 485–540. CRC Press (2015)

  24. Philippakis, A., Qureshi, A.M., Berger, M.F., Bulyk, M.L.: Design of compact, universal DNA microarrays for protein binding microarray experiments. J. Comp. Biol. 15(7), 655–665 (2008)

  25. Sawada, J.: De Bruijn sequence and universal cycle constructions. https://debruijnsequence.org

  26. Sohn, H.-S., Bricker, D.L., Simon, J.R., Hsieh, Y.-C.: Optimal sequences of trials for balancing practice and repetition effects. Behav. Res. Methods Instrum. Comput. 29(4), 574–581 (1997)

  27. Tarjan, R.E., van Leeuwen, J.: Worst-case analysis of set union algorithms. J. ACM 31(2), 245–281 (1984)

  28. Turan, M.S.: Evolutionary construction of de Bruijn sequences. In: Proceedings of ACM-AISec, pp. 81–86 (2011)

  29. Yang, B., Mandal, K., Aagaard, M.D., Gong, G.: Efficient composited de Bruijn sequence generators. IEEE Trans. Computers 66(8), 1354–1368 (2017)

  30. Zhu, Y., Chang, Z., Ezerman, M.F., Wang, Q.: An efficiently generated family of binary de Bruijn sequences. Discret. Math. 344(6), 112368 (2021)

Acknowledgements

ZsL would like to thank Joe Sawada for awakening her interest in de Bruijn sequences. We thank the anonymous reviewers for some insightful suggestions, and the participants of the Monday Meetings of the Algorithms Group of Verona University for useful discussions. This work has been supported in part by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956229 and by the MUR PRIN Project ‘PINC, Pangenome INformatiCs: from Theory to Applications’ (Grant No. 2022YRB97K).

Author information

Corresponding author

Correspondence to Zsuzsanna Lipták.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lipták, Z., Parmigiani, L. (2024). A BWT-Based Algorithm for Random de Bruijn Sequence Construction. In: Soto, J.A., Wiese, A. (eds) LATIN 2024: Theoretical Informatics. LATIN 2024. Lecture Notes in Computer Science, vol 14578. Springer, Cham. https://doi.org/10.1007/978-3-031-55598-5_9

  • DOI: https://doi.org/10.1007/978-3-031-55598-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-55597-8

  • Online ISBN: 978-3-031-55598-5

  • eBook Packages: Computer Science, Computer Science (R0)
