Random generation of words of context-free languages according to the frequencies of letters

Denise, Alain; Roques, Olivier; Termier, Michel

doi:10.1007/978-3-0348-8405-1_10

Part of the book series: Trends in Mathematics (TM)

Abstract

Let $L$ be a context-free language on an alphabet $X = \{x_1, x_2, \dots, x_k\}$ and let $n$ be a positive integer. We consider the problem of generating random words of $L$ with respect to a given distribution of the numbers of occurrences of the letters. We consider two variants of the problem. In the first, a vector of natural numbers $(n_1, n_2, \dots, n_k)$ with $n_1 + n_2 + \dots + n_k = n$ is given, and words must be generated uniformly among the words of $L$ that contain exactly $n_i$ occurrences of the letter $x_i$ ($1 \le i \le k$). In the second, given a vector $v = (v_1, \dots, v_k)$ of positive real numbers with $v_1 + \dots + v_k = 1$, words must be generated at random among all words of $L$ of length $n$, in such a way that the expected number of occurrences of each letter $x_i$ equals $n v_i$ ($1 \le i \le k$) and any two words with the same distribution of letters have the same probability of being generated. For this purpose, we design and study two variants of the recursive method that is classically employed for the uniform generation of combinatorial structures. This type of “controlled” non-uniform generation can be applied to the statistical analysis of genomic sequences.
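The first variant can be illustrated by the recursive method with counting tables indexed by letter-count vectors $(n_1, \dots, n_k)$ instead of by length alone: count the words each symbol derives for every vector, then build a word top-down, choosing each rule and each split of the vector with probability proportional to the number of words it yields. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the grammar (non-empty Motzkin words over $\{a, b, c\}$) and every function name are hypothetical, and the grammar is assumed unambiguous, with no empty or unit rules, so that counting derivations counts words and the recursion terminates.

```python
import random
from functools import lru_cache
from itertools import product

# Hypothetical grammar for non-empty Motzkin words over {a, b, c}.
# Keys are nonterminals; any other symbol is a letter. Assumed unambiguous,
# with no empty rules and no unit rules (A -> B).
GRAMMAR = {
    "S": [["c"], ["c", "S"], ["a", "b"], ["a", "b", "S"],
          ["a", "S", "b"], ["a", "S", "b", "S"]],
}
ALPHABET = ("a", "b", "c")


@lru_cache(maxsize=None)
def count_symbol(sym, vec):
    """Number of words derived from sym with vec[i] occurrences of ALPHABET[i]."""
    if sym not in GRAMMAR:
        return int(vec == tuple(int(x == sym) for x in ALPHABET))
    return sum(count_seq(tuple(rhs), vec) for rhs in GRAMMAR[sym])


def _splits(seq, vec):
    """Yield (head, tail, count) for each way to share vec between seq[0] and seq[1:]."""
    rest = seq[1:]
    for head in product(*(range(v + 1) for v in vec)):
        tail = tuple(v - h for v, h in zip(vec, head))
        # No symbol derives the empty word, so each one needs at least one letter.
        if sum(head) < 1 or sum(tail) < len(rest):
            continue
        c = count_symbol(seq[0], head)
        if c:
            c *= count_seq(rest, tail)
        if c:
            yield head, tail, c


@lru_cache(maxsize=None)
def count_seq(seq, vec):
    """Number of words derived from the symbol sequence seq with letter counts vec."""
    if not seq:
        return int(not any(vec))
    return sum(c for _, _, c in _splits(seq, vec))


def generate_symbol(sym, vec, rng):
    if sym not in GRAMMAR:
        return sym
    # Choose a rule with probability proportional to the words it yields.
    r = rng.randrange(count_symbol(sym, vec))
    for rhs in GRAMMAR[sym]:
        c = count_seq(tuple(rhs), vec)
        if r < c:
            return generate_seq(tuple(rhs), vec, rng)
        r -= c
    raise AssertionError("unreachable")


def generate_seq(seq, vec, rng):
    if not seq:
        return ""
    # Choose how to share vec between the first symbol and the rest.
    r = rng.randrange(count_seq(seq, vec))
    for head, tail, c in _splits(seq, vec):
        if r < c:
            return generate_symbol(seq[0], head, rng) + generate_seq(seq[1:], tail, rng)
        r -= c
    raise AssertionError("unreachable")


def random_word(vec, rng=random):
    """Uniform word of L(S) with exactly vec[i] occurrences of ALPHABET[i]."""
    vec = tuple(vec)
    if count_symbol("S", vec) == 0:
        raise ValueError(f"no word has letter counts {vec}")
    return generate_symbol("S", vec, rng)


print(count_symbol("S", (1, 1, 1)))  # 3: abc, acb, cab
print(random_word((2, 2, 1)))
```

Because every split enumerates all sub-vectors of the current vector, this naive version is practical only for small $n$ and small alphabets.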
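One way to realize the second variant, assumed here for illustration, is weighted generation: give each letter $x_i$ a weight $w_i > 0$ and draw each word $u$ of length $n$ with probability proportional to $\prod_i w_i^{|u|_{x_i}}$, where $|u|_{x_i}$ is the number of occurrences of $x_i$ in $u$. Since this weight depends only on the letter counts, any two words with the same distribution of letters are equally likely, and the counting tables are indexed by length only. The Python below is a hedged sketch, not the authors' method: the grammar, the weights, and the function names are hypothetical, and choosing the weights so that the expected number of occurrences of each $x_i$ equals $n v_i$ is a separate step not shown here.

```python
import random
from functools import lru_cache

# Hypothetical grammar for non-empty Motzkin words over {a, b, c}.
# Keys are nonterminals; any other symbol is a letter. Assumed unambiguous,
# with no empty rules and no unit rules (A -> B).
GRAMMAR = {
    "S": [["c"], ["c", "S"], ["a", "b"], ["a", "b", "S"],
          ["a", "S", "b"], ["a", "S", "b", "S"]],
}
# Hypothetical positive integer letter weights.
WEIGHTS = {"a": 1, "b": 1, "c": 2}


@lru_cache(maxsize=None)
def weight_symbol(sym, n):
    """Total weight of the words of length n derived from sym."""
    if sym not in GRAMMAR:
        return WEIGHTS[sym] if n == 1 else 0
    return sum(weight_seq(tuple(rhs), n) for rhs in GRAMMAR[sym])


def _splits(seq, n):
    """Yield (k, weight) for each length k given to seq[0], the rest going to seq[1:]."""
    rest = seq[1:]
    # No symbol derives the empty word, so each remaining symbol needs a letter.
    for k in range(1, n - len(rest) + 1):
        w = weight_symbol(seq[0], k)
        if w:
            w *= weight_seq(rest, n - k)
        if w:
            yield k, w


@lru_cache(maxsize=None)
def weight_seq(seq, n):
    """Total weight of the words of length n derived from the symbol sequence seq."""
    if not seq:
        return int(n == 0)
    return sum(w for _, w in _splits(seq, n))


def generate_symbol(sym, n, rng):
    if sym not in GRAMMAR:
        return sym
    # Choose a rule with probability proportional to its total weight.
    r = rng.randrange(weight_symbol(sym, n))
    for rhs in GRAMMAR[sym]:
        w = weight_seq(tuple(rhs), n)
        if r < w:
            return generate_seq(tuple(rhs), n, rng)
        r -= w
    raise AssertionError("unreachable")


def generate_seq(seq, n, rng):
    if not seq:
        return ""
    # Choose the length of the first symbol's subword by weight.
    r = rng.randrange(weight_seq(seq, n))
    for k, w in _splits(seq, n):
        if r < w:
            return generate_symbol(seq[0], k, rng) + generate_seq(seq[1:], n - k, rng)
        r -= w
    raise AssertionError("unreachable")


def random_word(n, rng=random):
    """Word of L(S) of length n drawn with probability proportional to its weight."""
    if weight_symbol("S", n) == 0:
        raise ValueError(f"no word of length {n}")
    return generate_symbol("S", n, rng)


print(weight_symbol("S", 3))  # 14: ccc has weight 8; abc, acb, cab have weight 2 each
```

Multiplying all weights by a common factor multiplies the weight of every word of length $n$ by the same power of that factor, so integer weights lose nothing for rational choices and keep the arithmetic exact.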

Author information

Authors and Affiliations

LRI, UMR CNRS 8623, Université Paris-Sud XI, France
Alain Denise
LaBRI, UMR CNRS 5800, Université Bordeaux I, France
Olivier Roques
IGM, UMR CNRS 8621, Université Paris-Sud XI, France
Michel Termier

Editor information

Editors and Affiliations

Bâtiment Descartes, Université de Versailles-St-Quentin, PRISM, 45 avenue des Etats-Unis, 78035 Versailles Cedex, France
Danièle Gardy
Département de Mathématiques, Bâtiment Fermat, Université de Versailles-St-Quentin, 45 avenue des Etats-Unis, 78035 Versailles Cedex, France
Abdelkader Mokkadem

About this paper

Cite this paper

Denise, A., Roques, O., Termier, M. (2000). Random generation of words of context-free languages according to the frequencies of letters. In: Gardy, D., Mokkadem, A. (eds) Mathematics and Computer Science. Trends in Mathematics. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8405-1_10

DOI: https://doi.org/10.1007/978-3-0348-8405-1_10
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-9553-8
Online ISBN: 978-3-0348-8405-1
eBook Packages: Springer Book Archive
