Frequency of Symbol Occurrences in Simple Non-primitive Stochastic Models

  • Conference paper
Developments in Language Theory (DLT 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2710)

Abstract

We study the random variable Y_n representing the number of occurrences of a given symbol in a word of length n generated at random. The stochastic model we assume is a simple non-ergodic model defined by the product of two primitive rational formal series, which form two distinct ergodic components. We obtain asymptotic evaluations for the mean and the variance of Y_n and its limit distribution. It turns out that there are two main cases: if one component is dominant and non-degenerate, we get a Gaussian limit distribution; if the two components are equipotent and have different leading terms of the mean, we get a uniform limit distribution. Other particular limit distributions are obtained in the case of a degenerate dominant component and in the equipotent case when the leading terms of the expectation values are equal.
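
The two main cases can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions, not the construction used in the paper: each component is taken to be a hypothetical one-state weighted automaton over {a, b}, so a word x has weight weight_a^(#a in x) * weight_b^(#b in x), and the product series assigns to a word w of length n the sum, over all splits w = xy, of weight1(x) * weight2(y). A word is then drawn with probability proportional to its weight. The function names and the chosen weights are invented for the example. In this setting a component is dominant when its total weight weight_a + weight_b is larger, and the two are equipotent when the totals coincide.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_powers(base, n):
    """Return [base^0, base^1, ..., base^n] as coefficient lists."""
    powers = [[1.0]]
    for _ in range(n):
        powers.append(poly_mul(powers[-1], base))
    return powers


def symbol_count_distribution(n, comp1, comp2):
    """Exact distribution of Y_n (number of a's) in the toy product model.

    comp1 and comp2 are (weight_a, weight_b) pairs for the two assumed
    one-state components. The coefficient of x^i in
    sum_k (b1 + a1*x)^k * (b2 + a2*x)^(n-k)
    is the total weight of the words of length n containing i a's.
    """
    pw1 = poly_powers([comp1[1], comp1[0]], n)  # powers of b1 + a1*x
    pw2 = poly_powers([comp2[1], comp2[0]], n)  # powers of b2 + a2*x
    totals = [0.0] * (n + 1)
    for k in range(n + 1):  # k = length of the prefix read by component 1
        for i, c in enumerate(poly_mul(pw1[k], pw2[n - k])):
            totals[i] += c
    z = sum(totals)
    return [c / z for c in totals]


def mean_and_variance(dist):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(i * p for i, p in enumerate(dist))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(dist))
    return mean, var


n = 100
# Equipotent toy case: both components have total weight 1.
equipotent = symbol_count_distribution(n, (0.2, 0.8), (0.7, 0.3))
# Dominant toy case: the first component has total weight 1.5 > 1.
dominant = symbol_count_distribution(n, (0.5, 1.0), (0.7, 0.3))
print(mean_and_variance(equipotent))
print(mean_and_variance(dominant))
```

In the equipotent toy case the split point is uniform on 0..n, so Y_n/n spreads over the interval between the two symbol frequencies (0.2 and 0.7 here) and the variance grows like n^2, which is consistent with a uniform limit. In the dominant toy case almost all the weight sits on splits where the dominant component reads nearly the whole word, so the variance grows only linearly in n, as one would expect under a Gaussian limit.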

This work has been supported by the Project M.I.U.R. COFIN “Formal languages and automata: theory and applications”.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Falco, D., Goldwurm, M., Lonati, V. (2003). Frequency of Symbol Occurrences in Simple Non-primitive Stochastic Models. In: Ésik, Z., Fülöp, Z. (eds) Developments in Language Theory. DLT 2003. Lecture Notes in Computer Science, vol 2710. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45007-6_19

  • DOI: https://doi.org/10.1007/3-540-45007-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40434-7

  • Online ISBN: 978-3-540-45007-8

  • eBook Packages: Springer Book Archive
