Abstract
We consider a random ball-bin model where balls are thrown randomly and sequentially into a set of bins. The frequency of choices of bins follows the Zipf-type (power-law) distribution; that is, the probability with which a ball enters the ith most popular bin is asymptotically proportional to 1/i^α, α > 0. In this model, we derive the limiting size index distributions to which the empirical distributions of size indices converge almost surely, where the size index of degree k at time t represents the number of bins containing exactly k balls at t. While earlier studies have only treated the case where the power α of the Zipf-type distribution is greater than unity, we here consider the case of α ≤ 1 as well as α > 1. We first investigate the limiting size index distributions for the independent throw models and then extend the derived results to a case where bins are chosen dependently. Simulation experiments demonstrate not only that our analysis is valid but also that the derived limiting distributions well approximate the empirical size index distributions in a relatively short period.
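As an illustration of the independent throw model and of the size index defined above, the following is a minimal Python sketch and not the authors' simulation code. It assumes a finite number of bins (`num_bins`) so that the weights 1/i^α can be normalized for any α > 0; this truncation, the function name `size_indices`, and all parameter names are assumptions introduced only for this example.

```python
import random
from collections import Counter
from itertools import accumulate


def size_indices(num_balls, num_bins, alpha, seed=0):
    """Throw balls independently into bins and return the size indices.

    Bin i (i = 1..num_bins) is chosen with probability proportional to
    1 / i**alpha. The finite num_bins is an assumption of this sketch.
    Returns a dict mapping k -> number of bins holding exactly k balls,
    for k >= 1 (empty bins are not reported).
    """
    rng = random.Random(seed)
    # Unnormalized Zipf-type weights; random.choices normalizes them.
    weights = [1.0 / (i ** alpha) for i in range(1, num_bins + 1)]
    cum_weights = list(accumulate(weights))
    throws = rng.choices(range(1, num_bins + 1),
                         cum_weights=cum_weights, k=num_balls)
    occupancy = Counter(throws)               # bin index -> number of balls
    return dict(Counter(occupancy.values()))  # k -> number of bins with k balls


if __name__ == "__main__":
    indices = size_indices(num_balls=10_000, num_bins=50_000, alpha=1.0)
    occupied = sum(indices.values())
    # Empirical size index distribution: fraction of occupied bins with k balls.
    for k in sorted(indices)[:5]:
        print(k, indices[k], indices[k] / occupied)
```

Dividing each size index by the number of occupied bins gives one natural empirical distribution to compare against a limiting distribution; whether this matches the exact normalization used in the paper is not stated in the abstract.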
Cite this article
Chida, S., Miyoshi, N. Limiting size index distributions for ball-bin models with Zipf-type frequencies. Ann Inst Stat Math 63, 745–768 (2011). https://doi.org/10.1007/s10463-010-0276-7
DOI: https://doi.org/10.1007/s10463-010-0276-7