# Limiting size index distributions for ball-bin models with Zipf-type frequencies

## Abstract

We consider a random ball-bin model in which balls are thrown randomly and sequentially into a set of bins. The frequency with which bins are chosen follows a Zipf-type (power-law) distribution; that is, the probability that a ball enters the *i*th most popular bin is asymptotically proportional to 1/*i*^*α*, *α* > 0. For this model, we derive the limiting size index distributions to which the empirical distributions of size indices converge almost surely, where the size index of degree *k* at time *t* is the number of bins containing exactly *k* balls at time *t*. Whereas earlier studies treated only the case where the power *α* of the Zipf-type distribution is greater than unity, we consider the case *α* ≤ 1 as well as *α* > 1. We first investigate the limiting size index distributions for independent throw models and then extend the results to a case where bins are chosen dependently. Simulation experiments show not only that our analysis is valid but also that the derived limiting distributions approximate the empirical size index distributions well within a relatively short period.
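The independent throw model and the size indices described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' experimental setup: it assumes a finite number of bins (`num_bins`), exact weights 1/*i*^*α* rather than asymptotic proportionality, and normalization of the size indices by the number of occupied bins. The function names `zipf_weights`, `size_indices`, and `empirical_distribution` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def zipf_weights(num_bins, alpha):
    """Unnormalized Zipf-type weights 1 / i**alpha for i = 1..num_bins.

    Assumption: a finite, truncated set of bins with exact power-law weights.
    """
    return [1.0 / (i ** alpha) for i in range(1, num_bins + 1)]


def size_indices(num_balls, num_bins, alpha, seed=0):
    """Throw num_balls balls independently and return the size indices.

    The result maps k to the number of bins holding exactly k balls (k >= 1).
    """
    rng = random.Random(seed)
    weights = zipf_weights(num_bins, alpha)
    # Each ball picks a bin independently with probability proportional to its weight.
    throws = rng.choices(range(num_bins), weights=weights, k=num_balls)
    occupancy = Counter(throws)          # bin index -> number of balls in that bin
    return Counter(occupancy.values())   # k -> number of bins with exactly k balls


def empirical_distribution(indices):
    """Normalize size indices by the number of occupied bins (an assumed convention)."""
    occupied = sum(indices.values())
    return {k: indices[k] / occupied for k in sorted(indices)}


if __name__ == "__main__":
    for alpha in (0.8, 1.0, 1.5):
        idx = size_indices(num_balls=10_000, num_bins=5_000, alpha=alpha, seed=1)
        dist = empirical_distribution(idx)
        first = {k: round(p, 3) for k, p in list(dist.items())[:5]}
        print(f"alpha={alpha}: {first}")
```

Running the script prints the first few entries of the empirical size index distribution for values of *α* below, at, and above 1, which are the regimes the abstract distinguishes. The dependent-choice extension mentioned in the abstract is not covered by this sketch.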

## Keywords

- Limiting distributions
- Random ball-bin occupancy models
- Size indices
- Zipf-type distribution
