
Random generation of words of context-free languages according to the frequencies of letters

  • Conference paper
Mathematics and Computer Science

Part of the book series: Trends in Mathematics (TM)

Abstract

Let $L$ be a context-free language over an alphabet $X = \{x_1, x_2, \ldots, x_k\}$ and let $n$ be a positive integer. We consider the problem of generating random words of $L$ according to a given distribution of the numbers of occurrences of the letters, and we study two variants of the problem. In the first, a vector of natural numbers $(n_1, n_2, \ldots, n_k)$ with $n_1 + n_2 + \cdots + n_k = n$ is given, and words must be generated uniformly among the words of $L$ that contain exactly $n_i$ occurrences of the letter $x_i$ ($1 \le i \le k$). In the second, given a vector $v = (v_1, \ldots, v_k)$ of positive real numbers with $v_1 + \cdots + v_k = 1$, words are generated at random among all words of $L$ of length $n$, in such a way that the expected number of occurrences of each letter $x_i$ equals $n v_i$ ($1 \le i \le k$) and any two words with the same distribution of letters have the same probability of being generated. For this purpose, we design and study two variants of the recursive method that is classically employed for the uniform generation of combinatorial structures. This type of “controlled” non-uniform generation can be applied to the statistical analysis of genomic sequences.
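The first variant rests on the recursive method with counts indexed by letter-count vectors rather than by length alone. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation. The grammar (nonempty Motzkin words over {a, b, c}) and all function names (`count`, `count_seq`, `uniform_word`, `weighted_word`, `expected_counts`) are hypothetical. The grammar is assumed to be unambiguous and free of empty productions, with every production either a single terminal or at least two symbols. For the second variant the sketch swaps in a simple two-stage scheme: it draws a letter-count vector with probability proportional to (number of words with those counts) × (product of letter weights), then draws a uniform word with those counts, so two words with the same letter distribution are equally likely. The letter weights are taken as given; choosing them so that the expected counts equal $n v_i$ is not shown.

```python
import math
import random
from functools import lru_cache
from itertools import product

# Hypothetical example grammar (not from the paper): nonempty Motzkin words over
# {a, b, c}; a/b act as matching brackets and c is a free letter. Uppercase =
# nonterminal, lowercase = terminal. Assumed: unambiguous, no empty productions,
# and every production is a single terminal or has at least two symbols.
GRAMMAR = {
    "M": [["c"], ["c", "M"], ["a", "b"], ["a", "M", "b"],
          ["a", "b", "M"], ["a", "M", "b", "M"]],
}
LETTERS = ("a", "b", "c")


def minus(v, u):
    return tuple(p - q for p, q in zip(v, u))


def proper_splits(vec):
    """Nonzero u <= vec (componentwise) with u != vec, so both parts are nonempty."""
    for u in product(*(range(m + 1) for m in vec)):
        if any(u) and u != vec:
            yield u


@lru_cache(maxsize=None)
def count(symbol, vec):
    """Number of words derived from `symbol` with letter-count vector `vec`."""
    if symbol not in GRAMMAR:  # terminal: matches only its own unit vector
        return int(vec == tuple(int(x == symbol) for x in LETTERS))
    return sum(count_seq(tuple(p), vec) for p in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def count_seq(seq, vec):
    """Number of words derived from the symbol sequence `seq` with counts `vec`."""
    if len(seq) == 1:
        return count(seq[0], vec)
    return sum(count(seq[0], u) * count_seq(seq[1:], minus(vec, u))
               for u in proper_splits(vec))


def pick(rng, options):
    """Pick an item with probability proportional to its integer weight (exact)."""
    r = rng.randrange(sum(w for w, _ in options))
    for w, item in options:
        if r < w:
            return item
        r -= w
    raise AssertionError("unreachable")


def gen(rng, symbol, vec):
    if symbol not in GRAMMAR:
        return symbol
    prods = [(count_seq(tuple(p), vec), tuple(p)) for p in GRAMMAR[symbol]]
    return gen_seq(rng, pick(rng, prods), vec)


def gen_seq(rng, seq, vec):
    if len(seq) == 1:
        return gen(rng, seq[0], vec)
    splits = [(count(seq[0], u) * count_seq(seq[1:], minus(vec, u)), u)
              for u in proper_splits(vec)]
    u = pick(rng, splits)
    return gen(rng, seq[0], u) + gen_seq(rng, seq[1:], minus(vec, u))


def uniform_word(rng: random.Random, vec, start="M"):
    """Variant 1: uniform among words of L with exactly vec[i] copies of LETTERS[i]."""
    if count(start, vec) == 0:
        raise ValueError("no word of L has these letter counts")
    return gen(rng, start, vec)


def _masses(n, weights, start):
    vecs = [v for v in product(range(n + 1), repeat=len(LETTERS))
            if sum(v) == n and count(start, v) > 0]
    return vecs, [count(start, v) * math.prod(w ** c for w, c in zip(weights, v))
                  for v in vecs]


def weighted_word(rng: random.Random, n, weights, start="M"):
    """Variant 2 (illustrative): P(word) proportional to prod(weights[i] ** |word|_i)."""
    vecs, masses = _masses(n, weights, start)
    return uniform_word(rng, rng.choices(vecs, weights=masses)[0], start)


def expected_counts(n, weights, start="M"):
    """Exact expected letter counts under weighted_word, to compare with n * v_i."""
    vecs, masses = _masses(n, weights, start)
    z = sum(masses)
    return [sum(v[i] * m for v, m in zip(vecs, masses)) / z
            for i in range(len(LETTERS))]
```

For instance, `uniform_word(random.Random(1), (2, 2, 1))` draws uniformly among the Motzkin words with two a's, two b's and one c, and `expected_counts(8, (1.0, 1.0, 0.5))` reports the exact mean letter counts that `weighted_word(rng, 8, (1.0, 1.0, 0.5))` induces. Counts are exact Python integers, so the only floating-point step is the weighted choice of a count vector. Enumerating all count vectors of length $n$ is simple but costlier than a dedicated weighted recursion.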

Copyright information

© 2000 Springer Basel AG

About this paper

Cite this paper

Denise, A., Roques, O., Termier, M. (2000). Random generation of words of context-free languages according to the frequencies of letters. In: Gardy, D., Mokkadem, A. (eds) Mathematics and Computer Science. Trends in Mathematics. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8405-1_10

  • DOI: https://doi.org/10.1007/978-3-0348-8405-1_10

  • Publisher Name: Birkhäuser, Basel

  • Print ISBN: 978-3-0348-9553-8

  • Online ISBN: 978-3-0348-8405-1

  • eBook Packages: Springer Book Archive
