Derivation of mixture distributions and weighted likelihood function as minimizers of KL-divergence subject to constraints

Wang, Xiaogang; Zidek, James V.

doi:10.1007/BF02915433

Information-Theoretic Approach
Published: December 2005

Volume 57, pages 687–701 (2005)
Annals of the Institute of Statistical Mathematics

Abstract

In this article, mixture distributions and weighted likelihoods are derived within an information-theoretic framework and shown to be closely related. This surprising relationship holds in spite of the arithmetic form of the former and the geometric form of the latter. Mixture distributions are shown to be optima that minimize the entropy loss under certain constraints. The same framework implies the weighted likelihood when the distributions in the mixture are unknown and information from independent samples generated by them has to be used instead. Thus the likelihood weights trade bias for precision and yield inferential procedures, such as estimates, that can be more reliable than their classical counterparts.
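
The contrast between the arithmetic and geometric forms can be illustrated numerically. The following is a minimal sketch, not the authors' derivation: the exact constraints used in the article are not stated in the abstract, so the sketch assumes the simplest Lagrangian-style objective, a weighted sum of KL divergences from each component to a candidate distribution, with weights that sum to one. All function names, the discrete distributions, the weights, and the unit-variance normal model are hypothetical choices made for illustration.

```python
import math
import random


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_kl(components, weights, g):
    """Assumed objective: sum_i w_i * KL(f_i || g)."""
    return sum(w * kl(f, g) for f, w in zip(components, weights))


def mixture(components, weights):
    """Arithmetic form: the mixture sum_i w_i * f_i."""
    size = len(components[0])
    return [sum(w * f[k] for f, w in zip(components, weights))
            for k in range(size)]


def weighted_mle_mean(samples, weights):
    """Geometric form: maximizer over theta of the weighted log-likelihood
    sum_i w_i * sum_j log N(x_ij; theta, 1), i.e. the log of
    prod_i prod_j N(x_ij; theta, 1) ** w_i (unit variance is an assumption)."""
    numerator = sum(w * sum(xs) for xs, w in zip(samples, weights))
    denominator = sum(w * len(xs) for xs, w in zip(samples, weights))
    return numerator / denominator


# Illustrative component distributions and weights (not from the article).
components = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
weights = [0.8, 0.2]

m = mixture(components, weights)
best = weighted_kl(components, weights, m)

# Compare the mixture against randomly drawn candidate distributions.
rng = random.Random(0)


def random_distribution(size):
    raw = [rng.random() + 1e-9 for _ in range(size)]
    total = sum(raw)
    return [r / total for r in raw]


candidates = [random_distribution(3) for _ in range(1000)]
no_candidate_beats_mixture = all(
    weighted_kl(components, weights, g) >= best - 1e-12 for g in candidates
)

# Weighted likelihood: a small sample of direct interest plus a larger,
# possibly biased, sample that is down-weighted rather than discarded.
samples = [[1.0, 2.0, 3.0], [10.0, 12.0]]
theta_hat = weighted_mle_mean(samples, weights)

print("mixture:", m)
print("no random candidate beats the mixture:", no_candidate_beats_mixture)
print("weighted MLE of the mean:", theta_hat)
```

Under these assumptions the mixture attains the smallest weighted divergence among the candidates tried, which follows from Gibbs' inequality, and the weighted estimate of the mean lies between the two sample means. This is the sense in which borrowing from a second sample accepts some bias in exchange for precision.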

Author information

Authors and Affiliations

Department of Mathematics and Statistics, York University, 4700 Keele Street, M3J 1P3, ON, Canada
Xiaogang Wang
Department of Statistics, University of British Columbia, 33-6356 Agriculture Road, V6T 1Z2, BC, Canada
James V. Zidek

About this article

Cite this article

Wang, X., Zidek, J.V. Derivation of mixture distributions and weighted likelihood function as minimizers of KL-divergence subject to constraints. Ann Inst Stat Math 57, 687–701 (2005). https://doi.org/10.1007/BF02915433

Received: 26 May 2003
Revised: 07 July 2004
Issue Date: December 2005
DOI: https://doi.org/10.1007/BF02915433
