Statistical Inference for Rényi Entropy Functionals

  • David Källberg
  • Nikolaj Leonenko
  • Oleg Seleznjev
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7260)


Numerous entropy-type characteristics (functionals) generalizing Rényi entropy are widely used in mathematical statistics, physics, information theory, and signal processing to characterize uncertainty in probability distributions and in distribution identification problems. We consider estimators of some entropy (integral) functionals for discrete and continuous distributions, based on the number of ε-close vector records in the corresponding independent and identically distributed samples from two distributions. The proposed estimators are generalized U-statistics. We establish asymptotic properties of these estimators (e.g., consistency and asymptotic normality). The results can be applied to various problems in computer science and mathematical statistics (e.g., approximate matching for random databases, record linkage, image matching).
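The abstract does not state the estimators explicitly, so the following Python sketch is only an illustration of the general idea under stated assumptions: continuous distributions on R^d, Euclidean distance, and the simplest functional of this type, the integral of p(x)q(x) for two densities p and q. The fraction of cross-sample pairs that lie within ε of each other (a two-sample U-statistic) is divided by the volume of the ε-ball. The function names are hypothetical and are not taken from the paper.

```python
import math

import numpy as np


def ball_volume(eps: float, dim: int) -> float:
    """Volume of a Euclidean ball of radius eps in R^dim."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * eps ** dim


def close_pair_fraction(x: np.ndarray, y: np.ndarray, eps: float) -> float:
    """Fraction of cross-sample pairs (x_i, y_j) with ||x_i - y_j|| <= eps.

    x has shape (n1, dim) and y has shape (n2, dim).
    """
    diffs = x[:, None, :] - y[None, :, :]        # shape (n1, n2, dim)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # shape (n1, n2)
    return float((dists <= eps).mean())


def estimate_cross_integral(x: np.ndarray, y: np.ndarray, eps: float) -> float:
    """Assumed estimator of the integral of p(x) q(x) dx.

    Illustrative only: close-pair fraction normalized by the eps-ball volume.
    """
    dim = x.shape[1]
    return close_pair_fraction(x, y, eps) / ball_volume(eps, dim)


# Usage: two independent standard normal samples in one dimension.
# For p = q = N(0, 1) the exact value of the integral is 1 / (2 * sqrt(pi)).
rng = np.random.default_rng(0)
x = rng.standard_normal((1500, 1))
y = rng.standard_normal((1500, 1))
estimate = estimate_cross_integral(x, y, eps=0.1)
exact = 1 / (2 * math.sqrt(math.pi))
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

The pairwise distance matrix costs O(n1 · n2) memory, which is acceptable for a sketch; for large samples a spatial index would be the more practical choice. The choice of ε trades bias (large ε) against variance (small ε), and the conditions under which such estimators are consistent and asymptotically normal are the subject of the paper itself.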

AMS 2000 subject classification: 94A15, 62G20


Keywords: entropy estimation, Rényi entropy, U-statistics, approximate matching, asymptotic normality





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • David Källberg (1)
  • Nikolaj Leonenko (2)
  • Oleg Seleznjev (1)
  1. Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden
  2. School of Mathematics, Cardiff University, Cardiff, UK
