Abstract
The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences that arise naturally in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by deriving a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.
Acknowledgments
The author thanks S. Nechaev, P. Braun, B. Gutkin, and T. Guhr for useful discussions at the beginning of this work. The work was partially supported by the KAW foundation.
Appendix 1: Representation of the Set \(\mathcal {X}_n^{ \mathcal {A}_2}\) on a Regularly Branching Tree
In Fig. 5 we graphically represent the structure of the factor set \(\mathcal {X}_7^{ \mathcal {A}_2}\) by a special choice of its representatives from the set \(X_7^{ \mathcal {A}_2}\). One can compare it with the structure of the same set (see Fig. 2) revealed by the ultrametric distance (8).
The representatives of \(\mathcal {X}_7^{ \mathcal {A}_2}\) are chosen to maximize the number \(1+\sum _{k=1}^n a_k 2^{n-k}\), i.e. the index of the basis vector (1). For prime n one can propose a synthetic algorithm based on the following principles:
1. Sequences are classified by \(r=0,\dots ,n\), the length of the longest substring of ones, \(L=[1,\dots ,1]\), and by the frequency f with which L appears in the sequence. The integer partition of n then has the form
$$\begin{aligned} \sum _{j=1}^{s}\nu _j (r+m_j)=n,\qquad \sum _{j=1}^{s}\nu _j=f. \end{aligned}$$
Let \(L_j\) be the sequence L followed by an arbitrary string of length \(m_j\), as in the scheme
$$\begin{aligned} \underbrace{ \underbrace{1\dots 1}_{r} \underbrace{0\dots 0}_{m_1}}_{L_1} \underbrace{\underbrace{1\dots 1}_{r} \underbrace{0\dots 0}_{m_2}}_{L_2}\dots \end{aligned}$$
2. Since n is prime, there are at least two non-equal \(\nu _j\) entering the integer partition of n. Let \(\nu _{j_0}=\min _j\left\{ \nu _j\right\} \); then there is at least one \(\nu _{j_1}>\nu _{j_0}\) such that \(\nu _{j_1}\) is not divisible by \(\nu _{j_0}\). Therefore one can organize a set of cyclic sequences constructed from the two letters \(L_{j_0}\) and \(L_{j_1}\) such that they have no periods other than \(\nu _{j_0}+\nu _{j_1}\), for instance
$$\begin{aligned} \underbrace{L_{j_0}\dots L_{j_0}}_{\nu _{j_0}}\underbrace{L_{j_1}\dots L_{j_1}}_{\nu _{j_1}}. \end{aligned}$$
3. To obtain the representatives, one should choose any suitable strategy for distributing the \(\nu _{j_1}\) pieces among the \(\nu _{j_0}\) places such that the resulting sequences of \(L_{j_0}\)’s and \(L_{j_1}\)’s cannot be obtained from one another by any rotation. Finally, one has to fill the still-empty places within the strings \(L_j\) in all possible ways.
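For small n, the choice of representatives described in this appendix can be cross-checked by brute force. The sketch below is not the synthetic algorithm outlined above: it simply enumerates all binary sequences of length n, picks in each rotation class the rotation that maximizes \(1+\sum _{k=1}^n a_k 2^{n-k}\), and reports the pair (r, f) for each representative. All function names are hypothetical, and the convention f = 0 for the all-zero sequence is an assumption made here for convenience.

```python
from itertools import product


def index(seq):
    """Index 1 + sum_k a_k 2^(n-k) of a binary sequence (a_1, ..., a_n)."""
    n = len(seq)
    return 1 + sum(a << (n - k) for k, a in enumerate(seq, start=1))


def representative(seq):
    """Rotation of seq with the largest index (the chosen class representative)."""
    rotations = [seq[i:] + seq[:i] for i in range(len(seq))]
    return max(rotations, key=index)


def representatives(n):
    """One representative per rotation class, sorted by decreasing index."""
    reps = {representative(s) for s in product((0, 1), repeat=n)}
    return sorted(reps, key=index, reverse=True)


def cyclic_runs_of_ones(seq):
    """Lengths of the maximal runs of 1s when seq is read cyclically."""
    if all(seq):
        return [len(seq)]
    if not any(seq):
        return []
    # Rotate so that the sequence ends with a 0; then no run wraps around.
    z = seq.index(0)
    rotated = seq[z + 1:] + seq[:z + 1]
    runs, current = [], 0
    for a in rotated:
        if a:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    return runs


def classify(seq):
    """Return (r, f): longest cyclic run of 1s and how often it occurs."""
    runs = cyclic_runs_of_ones(seq)
    r = max(runs, default=0)
    f = runs.count(r) if r else 0  # assumed convention: f = 0 when r = 0
    return r, f


if __name__ == "__main__":
    for rep in representatives(7):
        print("".join(map(str, rep)), index(rep), classify(rep))
```

Because the maximal rotation always starts with a longest run of ones, each printed representative begins with the block L, which is consistent with the scheme in item 1. The enumeration costs \(O(n\,2^n)\) operations and is intended only as a check for small n.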
Cite this article
Osipov, V.A. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences. J Stat Phys 164, 142–165 (2016). https://doi.org/10.1007/s10955-016-1537-5
DOI: https://doi.org/10.1007/s10955-016-1537-5