Guessing probability distributions from small samples

Journal of Statistical Physics

Abstract

We propose a new method for the calculation of the statistical properties, e.g., the entropy, of unknown generators of symbolic sequences. The probability distribution p(k) of the elements k of a population can be approximated by the frequencies f(k) of a sample, provided the sample is long enough so that each element k occurs many times. Our method yields an approximation if this precondition does not hold. For a given f(k) we recalculate the Zipf-ordered probability distribution by optimization of the parameters of a guessed distribution. We demonstrate that our method yields reliable results.
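
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the form of the guessed distribution, the error measure, or the optimizer. The sketch assumes a power-law family p(k) ∝ k^(-alpha) as the guessed distribution, a Monte Carlo estimate of the expected rank-ordered (Zipf-ordered) frequencies, a least-squares mismatch, and a plain grid search over alpha. All function names and parameter values are hypothetical.

```python
import math
import random

def zipf_probs(alpha, m):
    """Assumed guessed distribution: p(k) proportional to k**(-alpha), k = 1..m."""
    weights = [k ** -alpha for k in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def rank_ordered_freqs(sample, m):
    """Relative frequencies sorted in decreasing order, zero-padded to length m."""
    counts = {}
    for s in sample:
        counts[s] = counts.get(s, 0) + 1
    freqs = sorted((c / len(sample) for c in counts.values()), reverse=True)
    return freqs + [0.0] * (m - len(freqs))

def expected_rank_freqs(alpha, m, n, trials, rng):
    """Monte Carlo mean of the rank-ordered frequencies of n draws from the guess."""
    probs = zipf_probs(alpha, m)
    acc = [0.0] * m
    for _ in range(trials):
        sample = rng.choices(range(m), weights=probs, k=n)
        for i, f in enumerate(rank_ordered_freqs(sample, m)):
            acc[i] += f / trials
    return acc

def fit_alpha(observed, m, n, grid, trials=50, seed=0):
    """Grid search for the alpha whose simulated rank-ordered frequencies
    are closest (least squares) to the observed ones."""
    best = None
    for alpha in grid:
        rng = random.Random(seed)  # same random stream for every candidate
        model = expected_rank_freqs(alpha, m, n, trials, rng)
        err = sum((a - b) ** 2 for a, b in zip(observed, model))
        if best is None or err < best[0]:
            best = (err, alpha)
    return best[1]

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Small sample: n = 100 draws from a population of m = 200 elements,
# so most elements occur once or not at all.
m, n, true_alpha = 200, 100, 1.0
sample = random.Random(1).choices(range(m), weights=zipf_probs(true_alpha, m), k=n)
observed = rank_ordered_freqs(sample, m)
grid = [0.2 * i for i in range(11)]
alpha_hat = fit_alpha(observed, m, n, grid)

print("entropy from raw frequencies:", entropy(observed))
print("entropy from fitted guess:   ", entropy(zipf_probs(alpha_hat, m)))
print("entropy of true distribution:", entropy(zipf_probs(true_alpha, m)))
```

The point of comparing simulated rank-ordered frequencies with the observed ones, rather than fitting f(k) directly, is that sorting a small sample distorts the frequencies in a systematic way; simulating the same sample size under the guessed distribution reproduces that distortion in the model. The population size m is treated as known here, which is a further simplifying assumption.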

Additional information

Communicated by D. Stauffer

About this article

Cite this article

Pöschel, T., Ebeling, W. & Rosé, H. Guessing probability distributions from small samples. J Stat Phys 80, 1443–1452 (1995). https://doi.org/10.1007/BF02179880

  • DOI: https://doi.org/10.1007/BF02179880
