The Generalized Cross Entropy Method, with Applications to Probability Density Estimation

Botev, Zdravko I.; Kroese, Dirk P.

doi:10.1007/s11009-009-9133-7


Published: 16 May 2009

Methodology and Computing in Applied Probability, Volume 13, pages 1–27 (2011)



Abstract

Nonparametric density estimation aims to find the sparsest model that explains a given set of empirical data while using as few assumptions as possible. Many existing methods do not provide a sparse solution and rely on asymptotic approximations. In this paper we describe a framework for density estimation that uses information-theoretic measures of model complexity to construct a sparse density estimator that does not rely on large-sample approximations. The effectiveness of the approach is demonstrated on some well-known density estimation test cases.
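The abstract does not spell out how the generalized cross-entropy estimator is computed. To make the idea of a sparse kernel-mixture density estimator concrete, here is a minimal sketch of a different, well-known technique. It is not the authors' method. The sketch places one Gaussian kernel on each observation, fits the kernel weights on the probability simplex by maximizing a leave-one-out log-likelihood (a likelihood, i.e. cross-entropy-type, criterion) with the EM multiplicative update, and then prunes small weights. The function names, the fixed bandwidth from the normal-reference rule of thumb, and the pruning threshold `prune_tol` are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def gaussian_kernel(x, center, h):
    """Gaussian kernel with bandwidth h centred at `center`."""
    z = (x - center) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))


def loo_log_likelihood(data, w, h):
    """Leave-one-out log-likelihood: sum_i log sum_{j != i} w_j K_h(x_i - x_j)."""
    n = len(data)
    return sum(
        math.log(sum(w[j] * gaussian_kernel(data[i], data[j], h)
                     for j in range(n) if j != i))
        for i in range(n)
    )


def fit_sparse_weights(data, h, iterations=100, prune_tol=1e-4):
    """Fit simplex weights for kernels centred at the data points (hypothetical helper).

    Uses the EM (multiplicative) fixed-point update, which does not decrease
    the leave-one-out log-likelihood. Weights below the illustrative
    threshold `prune_tol` are then zeroed and the rest renormalized.
    """
    n = len(data)
    # a[i][j] = K_h(x_i - x_j) with the diagonal zeroed (leave-one-out).
    a = [[0.0 if i == j else gaussian_kernel(data[i], data[j], h)
          for j in range(n)] for i in range(n)]
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Value of the leave-one-out mixture at each data point.
        f = [sum(w[j] * a[i][j] for j in range(n)) for i in range(n)]
        # EM update; the new weights still sum to one.
        w = [w[j] * sum(a[i][j] / f[i] for i in range(n)) / n
             for j in range(n)]
    w = [wj if wj >= prune_tol else 0.0 for wj in w]
    total = sum(w)
    return [wj / total for wj in w]


def density(x, data, w, h):
    """Evaluate the weighted kernel mixture at x, skipping zero weights."""
    return sum(wj * gaussian_kernel(x, xj, h)
               for wj, xj in zip(w, data) if wj > 0.0)


# Usage on a simulated two-component sample (illustrative data only).
random.seed(0)
sample = ([random.gauss(0.0, 1.0) for _ in range(70)]
          + [random.gauss(4.0, 0.5) for _ in range(30)])
# Normal-reference rule-of-thumb bandwidth (an assumption, not from the paper).
bandwidth = 1.06 * statistics.pstdev(sample) * len(sample) ** (-0.2)
weights = fit_sparse_weights(sample, bandwidth)
kept = sum(1 for wj in weights if wj > 0.0)
print(f"{kept} of {len(sample)} kernels kept; density at 0: "
      f"{density(0.0, sample, weights, bandwidth):.4f}")
```

Zeroing the diagonal of the kernel matrix is what makes the criterion leave-one-out, so no point is explained by its own kernel. How many kernels the pruning step removes depends on the data, the bandwidth and the number of iterations, so the sparsity obtained here is only indicative.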



Author information

Authors and Affiliations

Department of Mathematics, The University of Queensland, Brisbane, 4072, Australia
Zdravko I. Botev & Dirk P. Kroese


Corresponding author

Correspondence to Dirk P. Kroese.

Additional information

Supported by the Australian Research Council, under grant number DP0985177.


About this article

Cite this article

Botev, Z.I., Kroese, D.P. The Generalized Cross Entropy Method, with Applications to Probability Density Estimation. Methodol Comput Appl Probab 13, 1–27 (2011). https://doi.org/10.1007/s11009-009-9133-7


Received: 18 October 2007
Revised: 04 May 2009
Accepted: 04 May 2009
Issue Date: March 2011
DOI: https://doi.org/10.1007/s11009-009-9133-7
