Abstract
We study the random variable Y_n representing the number of occurrences of a given symbol in a word of length n generated at random. The stochastic model we assume is a simple non-ergodic model defined by the product of two primitive rational formal series, which form two distinct ergodic components. We obtain asymptotic evaluations for the mean and the variance of Y_n and its limit distribution. It turns out that there are two main cases: if one component is dominant and non-degenerate, we get a Gaussian limit distribution; if the two components are equipotent and have different leading terms of the mean, we get a uniform limit distribution. Other particular limit distributions are obtained in the case of a degenerate dominant component and in the equipotent case when the leading terms of the expectation values are equal.
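To make the model concrete, the following is a minimal sketch (not the authors' method) that computes the exact distribution of Y_n for small n over the alphabet {a, b}. It assumes each component is given by a linear representation (xi, A, B, eta), where A and B are the weight matrices of the letters a and b, and that a word w has probability proportional to the value of the product series on w, that is, the sum of s(x)·t(y) over all factorizations w = xy. The function names `length_polys` and `occurrence_distribution`, and the one-state components in the usage lines, are illustrative assumptions that do not come from the paper.

```python
from fractions import Fraction


def poly_add(p, q):
    """Add two polynomials given as coefficient lists (index = degree)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def length_polys(xi, A, B, eta, n):
    """Return [S_0, ..., S_n]; S_i[k] is the total weight of the words of
    length i containing exactly k occurrences of 'a' under the series with
    linear representation (xi, A, B, eta)."""
    m = len(xi)
    # row[s] is a polynomial in x whose degree counts the a's read so far.
    row = [[xi[s]] for s in range(m)]
    result = []
    for _ in range(n + 1):
        total = [0]
        for s in range(m):
            total = poly_add(total, [c * eta[s] for c in row[s]])
        result.append(total)
        # Advance by one letter: row <- row * (A x + B).
        new_row = []
        for t in range(m):
            acc = [0]
            for s in range(m):
                # Reading 'a' shifts the degree by one; reading 'b' does not.
                acc = poly_add(acc, [0] + [c * A[s][t] for c in row[s]])
                acc = poly_add(acc, [c * B[s][t] for c in row[s]])
            new_row.append(acc)
        row = new_row
    return result


def occurrence_distribution(comp1, comp2, n):
    """Distribution of Y_n under the product of the two components.
    Assumes the total weight of the words of length n is nonzero."""
    S = length_polys(*comp1, n)
    T = length_polys(*comp2, n)
    weights = [0] * (n + 1)
    for i in range(n + 1):
        # A prefix of length i from component 1, a suffix from component 2.
        weights = poly_add(weights, poly_mul(S[i], T[n - i]))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy components: one state each, both with total letter
# weight 1, but with different weights on the letter 'a'.
comp1 = ([1], [[Fraction(3, 4)]], [[Fraction(1, 4)]], [1])
comp2 = ([1], [[Fraction(1, 4)]], [[Fraction(3, 4)]], [1])
dist = occurrence_distribution(comp1, comp2, 8)
mean = sum(k * p for k, p in enumerate(dist))
print(mean)
```

In this toy instance both components carry the same total letter weight, so every split point between prefix and suffix is equally likely, and the mean of Y_n is n/2. It is meant only as an illustration of the kind of two-component model the abstract describes; the asymptotic results themselves rest on the paper's analysis, not on this enumeration.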
This work has been supported by the Project M.I.U.R. COFIN “Formal languages and automata: theory and applications”.
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
de Falco, D., Goldwurm, M., Lonati, V. (2003). Frequency of Symbol Occurrences in Simple Non-primitive Stochastic Models. In: Ésik, Z., Fülöp, Z. (eds) Developments in Language Theory. DLT 2003. Lecture Notes in Computer Science, vol 2710. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45007-6_19
DOI: https://doi.org/10.1007/3-540-45007-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40434-7
Online ISBN: 978-3-540-45007-8
eBook Packages: Springer Book Archive