Abstract
A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny fraction of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (fewer than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a laptop computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm’s potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is available (in C++ and Python) at https://github.com/lucaparmigiani/rnd_dbseq.
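The definition in the first sentence of the abstract can be checked directly. The following is a minimal sketch of such a check, not the construction algorithm of the paper; the function name `is_de_bruijn` is hypothetical and does not come from the linked repository.

```python
def is_de_bruijn(seq: str, k: int) -> bool:
    """Return True if the circular binary string `seq` contains every
    binary word of length k exactly once as a substring.

    Hypothetical helper for illustration; not part of the authors' code.
    """
    n = len(seq)
    # A binary dB sequence of order k has length exactly 2^k
    # and uses only the characters '0' and '1'.
    if n != 2 ** k or set(seq) - {"0", "1"}:
        return False
    # Append the first k-1 characters so that the windows which wrap
    # around the end of the circular string can be read linearly.
    doubled = seq + seq[: k - 1]
    windows = {doubled[i : i + k] for i in range(n)}
    # There are n windows and n possible words, so all words occur
    # exactly once if and only if the n windows are pairwise distinct.
    return len(windows) == n
```

For example, `is_de_bruijn("00010111", 3)` holds, while `is_de_bruijn("0101", 2)` does not, because the words 00 and 11 never occur in 0101.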
Notes
1. Note that in this tradition, an algorithm which runs in time and space \(\mathcal{O}(n)\), where \(n = 2^k\) is the length of the dB sequence, is considered exponential, since it is exponential in k; however, if one wants to output or even store the sequence, then it is de facto optimal.
2. The standard permutation is also called LF-mapping if s is the BWT of some string.
3. In particular, Higgins uses the term necklace in a non-standard meaning.
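As an illustration of the standard permutation mentioned in note 2, here is a minimal sketch. It assumes the usual definition, which is not spelled out on this page: position i is sent to the rank it receives when the characters of s are sorted stably, with ties between equal characters broken by position. The function name `standard_permutation` is hypothetical.

```python
def standard_permutation(s: str) -> list[int]:
    """Return pi with pi[i] = rank of position i in a stable sort of s.

    Assumed definition: pi[i] < pi[j] iff s[i] < s[j],
    or s[i] == s[j] and i < j. Hypothetical helper for illustration.
    """
    # Python's sorted() is stable, so equal characters keep their
    # original relative order, which is exactly the tie-breaking rule.
    order = sorted(range(len(s)), key=lambda i: s[i])
    pi = [0] * len(s)
    for rank, i in enumerate(order):
        pi[i] = rank
    return pi
```

For s = 1100, the two 0s (positions 2 and 3) receive ranks 0 and 1, and the two 1s (positions 0 and 1) receive ranks 2 and 3, so the sketch returns `[2, 3, 0, 1]`.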
References
Aardenne-Ehrenfest, T.v., Bruijn, N.G.d.: Circuits and trees in oriented linear graphs. Simon Stevin, Wisen Natuurkundig Tijdschrift 28, 203–217 (1951)
Aguirre, G.K., Mattar, M.G., Magis-Weinberg, L.: De Bruijn cycles for neural decoding. Neuroimage 56(3), 1293–1300 (2011)
Ben-Dor, A., Karp, R., Schwikowski, B., Yakhini, Z.: Universal DNA tag systems: a combinatorial design scheme. J. Comp. Biol. 7(3/4), 503–519 (2000)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Colbourn, C.J., Myrvold, W.J., Neufeld, E.: Two algorithms for unranking arborescences. J. Algorithms 20(2), 268–281 (1996)
Cooper, J.N., Heitsch, C.E.: The discrepancy of the lex-least de Bruijn sequence. Discret. Math. 310(6–7), 1152–1159 (2010)
de Bruijn, N.G.: A combinatorial problem. Proc. Sect. Sci. 49(7), 758–764 (1946)
Durfee, D., Kyng, R., Peebles, J., Rao, A.B., Sachdeva, S.: Sampling random spanning trees faster than matrix multiplication. In: Hatami, H., McKenzie, P., King, V. (eds.) Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pp. 730–742. ACM (2017)
Emerson, P.L., Tobias, R.D.: Computer program for quasi-random stimulus sequences with equal transition frequencies. Behav. Res. Methods Instrum. Comput. 27(1), 88–98 (1995)
Fleury, P.-H.: Deux problèmes de géométrie de situation. J. Mathématiq. élément. 2, 257–261 (1883)
Fredricksen, H.: A survey of full length nonlinear shift register cycle algorithms. SIAM Rev. 24(2), 195–221 (1982)
Gabric, D., Sawada, J.: Investigating the discrepancy property of de Bruijn sequences. Discret. Math. 345(4), 112780 (2022)
Gabric, D., Sawada, J., Williams, A., Wong, D.: A framework for constructing de Bruijn sequences via simple successor rules. Discret. Math. 341(11), 2977–2987 (2018)
Giuliani, S., Lipták, Zs., Masillo, F., Rizzi, R.: When a dollar makes a BWT. Theor. Comput. Sci. 857, 123–146 (2021)
Golomb, S.: Shift Register Sequences, 3rd edn. World Scientific (2016)
Higgins, P.M.: Burrows-Wheeler transformations and de Bruijn words. Theor. Comput. Sci. 457, 128–136 (2012)
Huang, Y.: A new algorithm for the generation of binary de Bruijn sequences. J. Algorithms 11(1), 44–51 (1990)
Jansen, C.J., Boekee, D.E.: An efficient algorithm for the generation of DeBruijn cycles. IEEE Trans. Inf. Theory 37(5), 1475–1478 (1991)
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press (2002)
Mandal, K., Gong, G.: Cryptographically strong de Bruijn sequences with large periods. In: Knudsen, L.R., Wu, H. (eds.) Selected Areas in Cryptography: 19th International Conference, SAC 2012, pp. 104–118. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_8
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An extension of the Burrows-Wheeler Transform. Theor. Comput. Sci. 387(3), 298–312 (2007)
Navarro, G.: Compact Data Structures: A Practical Approach. Cambridge University Press (2016)
Perrin, D., Restivo, A.: Words. In: Bóna, M. (ed.) Enumerative Combinatorics, chapter 8, pp. 485–540. CRC Press (2015)
Philippakis, A., Qureshi, A.M., Berger, M.F., Bulyk, M.L.: Design of compact, universal DNA microarrays for protein binding microarray experiments. J. Comp. Biol. 15(7), 655–665 (2008)
Sawada, J.: De Bruijn sequence and universal cycle constructions. https://debruijnsequence.org
Sohn, H.-S., Bricker, D.L., Simon, J.R., Hsieh, Y.-C.: Optimal sequences of trials for balancing practice and repetition effects. Behav. Res. Methods Instrum. Comput. 29(4), 574–581 (1997)
Tarjan, R.E., van Leeuwen, J.: Worst-case analysis of set union algorithms. J. ACM 31(2), 245–281 (1984)
Turan, M.S.: Evolutionary construction of de Bruijn sequences. In: Proceedings of ACM-AISec, pp. 81–86 (2011)
Yang, B., Mandal, K., Aagaard, M.D., Gong, G.: Efficient composited de Bruijn sequence generators. IEEE Trans. Computers 66(8), 1354–1368 (2017)
Zhu, Y., Chang, Z., Ezerman, M.F., Wang, Q.: An efficiently generated family of binary de Bruijn sequences. Discret. Math. 344(6), 112368 (2021)
Acknowledgements
ZsL would like to thank Joe Sawada for awakening her interest in de Bruijn sequences. We thank the anonymous reviewers for some insightful suggestions, and the participants of the Monday Meetings of the Algorithms Group of Verona University for useful discussions. This work has been supported in part by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956229 and by the MUR PRIN Project ‘PINC, Pangenome INformatiCs: from Theory to Applications’ (Grant No. 2022YRB97K).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lipták, Z., Parmigiani, L. (2024). A BWT-Based Algorithm for Random de Bruijn Sequence Construction. In: Soto, J.A., Wiese, A. (eds) LATIN 2024: Theoretical Informatics. LATIN 2024. Lecture Notes in Computer Science, vol 14578. Springer, Cham. https://doi.org/10.1007/978-3-031-55598-5_9
DOI: https://doi.org/10.1007/978-3-031-55598-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55597-8
Online ISBN: 978-3-031-55598-5
eBook Packages: Computer Science (R0)