Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

Journal of Statistical Physics

Abstract

Symbolic sequences play an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences that arises naturally in the theory of dynamical systems. Aiming at analytic and numerical methods for investigating clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for studying the cluster hierarchy. The analytic power of the approach is demonstrated by deriving a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.

Acknowledgments

The author thanks S. Nechaev, P. Braun, B. Gutkin, and T. Guhr for useful discussions at the beginning of this work. The work was partially supported by the KAW foundation.

Author information

Corresponding author

Correspondence to V. Al. Osipov.

Appendix 1: Representation of the Set \(\mathcal {X}_n^{ \mathcal {A}_2}\) on a Regularly Branching Tree

Fig. 5 Graphical representation of the set \(X_7^{ \mathcal {A}_2}\) by a binary tree. Only half of the tree (the sequences starting with 1) is plotted. The representatives of the factor sets \(\mathcal {X}_n^{ \mathcal {A}_2}\) (\(n=7,6,5,4,3,2,1\)) are marked by circles on the corresponding level of the tree. The numbers inside the circles denote the period of the corresponding sequence.

In Fig. 5 we graphically represent the structure of the factor set \(\mathcal {X}_7^{ \mathcal {A}_2}\) through a special choice of its representatives from the set \(X_7^{ \mathcal {A}_2}\). One can compare it with the structure of the same set (see Fig. 2) revealed by the ultrametric distance (8).

The representatives of \(\mathcal {X}_7^{ \mathcal {A}_2}\) are chosen to maximize the number \(1+\sum _{k=1}^n a_k 2^{n-k}\), i.e. the index of the basis vector (1). For prime \(n\) one can offer a synthetic algorithm based on the following principles:

  1. Sequences are classified by \(r=0,\dots ,n\), the length of the longest substring of ones, \(L=[1,\dots ,1]\), and by the frequency \(f\) with which \(L\) appears in the sequence. The integer partition of \(n\) then has the form

    $$\begin{aligned} \sum _{j=1}^{s}\nu _j (r+m_j)=n,\qquad \sum _{j=1}^{s}\nu _j=f. \end{aligned}$$

    Let \(L_j\) be the string \(L\) followed by an arbitrary string of length \(m_j\), as in the scheme

    $$\begin{aligned} \underbrace{ \underbrace{1\dots 1}_{r} \underbrace{0\dots 0}_{m_1}}_{L_1} \underbrace{\underbrace{1\dots 1}_{r} \underbrace{0\dots 0}_{m_2}}_{L_2}\dots \end{aligned}$$

  2. Since \(n\) is prime, there are at least two non-equal \(\nu _j\) entering the integer partition of \(n\). Let \(\nu _{j_0}=\min _j\left\{ \nu _j\right\} \); then there is at least one \(\nu _{j_1}>\nu _{j_0}\) such that \(\nu _{j_1}\) is not divisible by \(\nu _{j_0}\). Therefore one can form a set of cyclic sequences built from the two letters \(L_{j_0}\) and \(L_{j_1}\) that have no periods other than \(\nu _{j_0}+\nu _{j_1}\), for instance

    $$\begin{aligned} \underbrace{L_{j_0}\dots L_{j_0}}_{\nu _{j_0}}\underbrace{L_{j_1}\dots L_{j_1}}_{\nu _{j_1}}. \end{aligned}$$

  3. To obtain the representatives, one should choose any suitable strategy for distributing the \(\nu _{j_1}\) pieces among the \(\nu _{j_0}\) places such that the resulting sequences of \(L_{j_0}\)’s and \(L_{j_1}\)’s cannot be obtained from one another by any rotation. Finally, one has to fill the still-empty places within the strings \(L_j\) in all possible ways.
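The choice of representatives and the classification by \(r\) and \(f\) can be checked numerically for small \(n\). The following is a minimal brute-force sketch, not the synthetic algorithm outlined above: it enumerates all binary sequences of length \(n\), keeps from each cyclic class the rotation with the largest index \(1+\sum _{k=1}^n a_k 2^{n-k}\), and splits that representative into blocks \(L_j\). The function names (`index`, `period`, `representatives`, `blocks`) are hypothetical. Reading \(f\) as the number of maximal runs of exactly \(r\) ones, and treating the all-zero and all-one sequences as a single block, are assumptions made here.

```python
from collections import Counter
from itertools import product


def rotations(seq):
    """All cyclic rotations of a tuple of symbols."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def index(seq):
    """1 + sum_k a_k 2^(n-k): the number maximized when choosing a representative."""
    return 1 + int("".join(map(str, seq)), 2)


def period(seq):
    """Smallest p > 0 such that a cyclic shift by p positions leaves seq unchanged."""
    return next(p for p in range(1, len(seq) + 1) if seq[p:] + seq[:p] == seq)


def representatives(n):
    """One representative per cyclic class: the rotation with the largest index."""
    reps = {max(rotations(s), key=index) for s in product((0, 1), repeat=n)}
    return sorted(reps, key=index, reverse=True)


def blocks(rep):
    """Split a representative into blocks L_j = (r ones) + (m_j further symbols).

    Returns (r, [m_1, ..., m_f]). Assumes rep is the maximal rotation, so it
    starts with a longest cyclic run of ones and ends with 0.
    """
    n = len(rep)
    if 0 not in rep or 1 not in rep:
        # Assumption: constant sequences are treated as one block.
        return sum(rep), [n - sum(rep)]
    r = rep.index(0)  # the leading run of ones is a longest run
    # Positions where a run of r ones starts right after a 0 (cyclically).
    starts = [i for i in range(n)
              if rep[i - 1] == 0 and all(rep[(i + k) % n] == 1 for k in range(r))]
    f = len(starts)
    # Cyclic distance between consecutive run starts is the block length r + m_j.
    lengths = [(starts[(j + 1) % f] - starts[j]) % n or n for j in range(f)]
    return r, [length - r for length in lengths]


if __name__ == "__main__":
    n = 7
    for rep in representatives(n):
        r, ms = blocks(rep)
        nu = Counter(ms)  # nu_j: number of blocks with tail length m_j
        # Check the integer partition: sum_j nu_j (r + m_j) = n.
        assert sum(count * (r + m) for m, count in nu.items()) == n
        print("".join(map(str, rep)), "period", period(rep),
              "r =", r, "f =", len(ms), "nu =", dict(nu))
```

For \(n=7\) this enumeration gives 20 cyclic classes: the two constant sequences have period 1 and the remaining 18 have period 7, as expected for prime \(n\). As an example of the partition, the representative 1101100 has \(r=2\), \(f=2\) and blocks 110 and 1100, so that \(1\cdot (2+1)+1\cdot (2+2)=7\).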

Cite this article

Osipov, V.A. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences. J Stat Phys 164, 142–165 (2016). https://doi.org/10.1007/s10955-016-1537-5

  • DOI: https://doi.org/10.1007/s10955-016-1537-5
