Sankhya A

pp 1–11

Asymptotically Normal Estimators for Zipf’s Law

  • Mikhail Chebunin
  • Artyom Kovalevskii


We study an infinite urn scheme in which the urn probabilities follow a power function. The urns represent words from an infinitely large vocabulary. We propose asymptotically normal estimators of the exponent of the power function. The estimators are based on the number of different elements in the sample and a few similar statistics. If only one of the statistics is used, the asymptotics of a normalizing constant (a function of the parameter) must be known, and all the estimators are implicit in this case. If two statistics are used, the estimators are explicit, but their rates of convergence are lower than those of the estimators with a known normalizing constant.
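The setting can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a particular power law, P(X ≥ k) = k^(-(1-θ)/θ), so that the urn probabilities decay like a constant times k^(-1/θ), and it uses one natural explicit estimator built from two statistics, log2(R_n / R_{n/2}), where R_m is the number of different elements among the first m draws. This choice relies on Karlin's asymptotics R_n ~ C n^θ for such schemes; the estimators actually proposed in the paper may differ. The function names `sample_power_law_urns`, `distinct_count` and `estimate_theta` are hypothetical.

```python
import math
import random


def sample_power_law_urns(n, theta, rng):
    """Draw n urn indices X with P(X >= k) = k ** (-(1 - theta) / theta), k = 1, 2, ...

    Under this assumed law, P(X = k) decays like a constant times
    k ** (-1 / theta). It is an illustrative choice, not the paper's model.
    Values of theta very close to 1 may overflow floating point.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    exponent = -theta / (1.0 - theta)
    # 1 - rng.random() lies in (0, 1], so 0.0 is never raised to a negative power.
    return [math.floor((1.0 - rng.random()) ** exponent) for _ in range(n)]


def distinct_count(sample):
    """Number of different elements (occupied urns) in the sample."""
    return len(set(sample))


def estimate_theta(sample):
    """Explicit two-statistic estimate log2(R_n / R_{n/2}) (illustrative assumption)."""
    half = len(sample) // 2
    if half == 0:
        raise ValueError("need at least two observations")
    r_full = distinct_count(sample)
    r_half = distinct_count(sample[:half])
    return math.log2(r_full / r_half)


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed, only for reproducibility
    draws = sample_power_law_urns(200_000, theta=0.5, rng=rng)
    print("different elements:", distinct_count(draws))
    print("estimated theta:", estimate_theta(draws))
```

The estimator needs no normalizing constant because the constant C cancels in the ratio R_n / R_{n/2}, which mirrors the trade-off described in the abstract: explicit form at the price of a slower rate of convergence.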

Keywords and phrases.

Infinite urn scheme, Zipf’s law, asymptotic normality.

AMS (2000) subject classification.

Primary 62F10; Secondary 62F12 




Our research was partially supported by RFBR grant 17-01-00683 and by the program of fundamental scientific research of the SB RAS No. I.1.3, project No. 0314-2016-0008.



Copyright information

© Indian Statistical Institute 2018

Authors and Affiliations

  1. Sobolev Institute of Mathematics, Novosibirsk, Russia
  2. Novosibirsk State University, Novosibirsk, Russia
  3. Novosibirsk State Technical University, Novosibirsk, Russia
