Frequency of Symbol Occurrences in Simple Non-primitive Stochastic Models

  • Diego de Falco
  • Massimiliano Goldwurm
  • Violetta Lonati
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2710)

Abstract

We study the random variable Y_n representing the number of occurrences of a given symbol in a word of length n generated at random. The stochastic model we assume is a simple non-ergodic model defined by the product of two primitive rational formal series, which form two distinct ergodic components. We obtain asymptotic evaluations for the mean and the variance of Y_n and its limit distribution. It turns out that there are two main cases: if one component is dominant and non-degenerate, we get a Gaussian limit distribution; if the two components are equipotent and have different leading terms of the mean, we get a uniform limit distribution. Other particular limit distributions are obtained in the case of a degenerate dominant component and in the equipotent case when the leading terms of the expectation values are equal.
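The abstract does not spell out how a word is drawn. A common convention in this line of work, assumed here, is that a word w of length n is chosen with probability proportional to the coefficient (r, w) of a rational series r with nonnegative coefficients, given by a linear representation (xi, mu, eta) with A = mu(a) and B = mu(b). Under that assumption, the exact distribution of Y_n for a product r = s·t can be computed for small n from the Cauchy product (s·t, w) = sum over w = uv of (s, u)(t, v). The following is a brute-force numerical sketch in Python, not the authors' asymptotic method; the names `symbol_polys`, `product_distribution`, `mean_variance` and the toy components are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical encoding of a linear representation: (xi, A, B, eta),
# with A = mu(a) and B = mu(b), so that (r, w) = xi * mu(w) * eta.
Rep = Tuple[Vector, Matrix, Matrix, Vector]


def symbol_polys(rep: Rep, n: int) -> List[Vector]:
    """polys[j][k] = total weight (r, u) of words u of length j with k a's,
    i.e. the coefficient of z^k in xi (A z + B)^j eta, for j = 0..n."""
    xi, A, B, eta = rep
    m = len(xi)
    rows = [list(xi)]  # rows[k]: sum of row vectors xi * mu(u) over u with k a's
    polys = [[sum(x * e for x, e in zip(xi, eta))]]
    for _ in range(n):
        new_rows = [[0.0] * m for _ in range(len(rows) + 1)]
        for k, row in enumerate(rows):
            for i, x in enumerate(row):
                if x == 0:
                    continue
                for c in range(m):
                    new_rows[k][c] += x * B[i][c]      # append b: count unchanged
                    new_rows[k + 1][c] += x * A[i][c]  # append a: count + 1
        rows = new_rows
        polys.append([sum(v * e for v, e in zip(r, eta)) for r in rows])
    return polys


def product_distribution(s: Rep, t: Rep, n: int) -> Vector:
    """Pr{Y_n = k}, k = 0..n, assuming w of length n is drawn with probability
    proportional to (s.t, w) = sum over factorizations w = uv of (s, u)(t, v)."""
    ps, pt = symbol_polys(s, n), symbol_polys(t, n)
    h = [0.0] * (n + 1)
    for j in range(n + 1):  # j = |u|, n - j = |v|
        for k1, c1 in enumerate(ps[j]):
            if c1 == 0:
                continue
            for k2, c2 in enumerate(pt[n - j]):
                h[k1 + k2] += c1 * c2
    total = sum(h)
    return [c / total for c in h]


def mean_variance(dist: Vector) -> Tuple[float, float]:
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return mean, var


# Toy one-state components (illustrative, not taken from the paper).
ONE = [1.0]
ONLY_A = (ONE, [[1.0]], [[0.0]], ONE)  # (s, u) = 1 iff u is in a*
ONLY_B = (ONE, [[0.0]], [[1.0]], ONE)  # (t, v) = 1 iff v is in b*
BOTH = (ONE, [[1.0]], [[1.0]], ONE)    # every word has weight 1

if __name__ == "__main__":
    # Equipotent components with different means: Y_n is uniform on {0, ..., n}.
    print(mean_variance(product_distribution(ONLY_A, ONLY_B, 10)))
    # BOTH (spectral radius 2) dominates ONLY_B (spectral radius 1).
    print(mean_variance(product_distribution(BOTH, ONLY_B, 200)))
```

In the first call both components have spectral radius 1 (equipotent) and mean leading terms 1·n and 0·n, so the support is the set of words a^j b^(n-j) and Y_n is exactly uniform on {0, ..., n}, a toy instance of the uniform regime. In the second call the first factor dominates and Pr{Y_n = k} is proportional to C(n+1, k+1), a binomial-like shape consistent with the Gaussian regime. Floating point is used for simplicity; for large n the weights grow like the n-th power of the dominant eigenvalue, so `fractions.Fraction` or rescaling may be preferable.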

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Diego de Falco (1)
  • Massimiliano Goldwurm (1)
  • Violetta Lonati (1)
  1. Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Milano, Italy