Abstract
We propose a new method for calculating the statistical properties, e.g., the entropy, of unknown generators of symbolic sequences. The probability distribution p(k) of the elements k of a population can be approximated by the frequencies f(k) of a sample, provided the sample is long enough that each element k occurs many times. Our method yields an approximation when this precondition does not hold. For a given f(k) we recalculate the Zipf-ordered probability distribution by optimizing the parameters of a guessed distribution. We demonstrate that our method yields reliable results.
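To make the precondition concrete, the following is a minimal Python sketch of the naive estimate that the abstract contrasts against: it uses the sample frequencies f(k) in place of p(k) and computes the entropy from them. It is not the authors' method. The function names `zipf_ordered_frequencies` and `plugin_entropy`, the choice of bits as the unit, and the toy samples are all assumptions made for illustration.

```python
import math
from collections import Counter


def zipf_ordered_frequencies(sample):
    """Relative frequencies f(k) of the sample, sorted by rank (largest first)."""
    counts = Counter(sample)
    n = len(sample)
    return sorted((c / n for c in counts.values()), reverse=True)


def plugin_entropy(sample):
    """Naive entropy estimate in bits, using f(k) as a stand-in for p(k)."""
    return -sum(f * math.log2(f) for f in zipf_ordered_frequencies(sample))


# Hypothetical toy data: four equiprobable symbols have a true entropy of 2 bits.
long_sample = "abcd" * 250   # every symbol occurs many times
short_sample = "aab"         # too short: 'c' and 'd' never occur

print(plugin_entropy(long_sample))
print(plugin_entropy(short_sample))
```

For the long sample the estimate equals the assumed true value of 2 bits. For the short sample it falls below 2 bits, because symbols that never occur contribute nothing to the sum. This is the small-sample regime the abstract addresses.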
Communicated by D. Stauffer
Pöschel, T., Ebeling, W. & Rosé, H. Guessing probability distributions from small samples. J Stat Phys 80, 1443–1452 (1995). https://doi.org/10.1007/BF02179880
DOI: https://doi.org/10.1007/BF02179880