Derivation of mixture distributions and weighted likelihood function as minimizers of KL-divergence subject to constraints

  • Information-Theoretic Approach
Annals of the Institute of Statistical Mathematics

Abstract

In this article, mixture distributions and weighted likelihoods are derived within an information-theoretic framework and shown to be closely related. This surprising relationship holds despite the arithmetic form of the former and the geometric form of the latter. Mixture distributions are shown to be optima that minimize the entropy loss under certain constraints. The same framework implies the weighted likelihood when the distributions in the mixture are unknown and information from independent samples generated by them has to be used instead. Thus the likelihood weights trade bias for precision and yield inferential procedures, such as estimates, that can be more reliable than their classical counterparts.
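
The contrast between the arithmetic and geometric forms can be illustrated numerically with a standard fact about the Kullback-Leibler divergence. This is a minimal sketch, not the article's own derivation: the constrained problem the authors solve is not shown in this preview, so the sketch uses the unconstrained weighted-sum version as an assumption. For fixed distributions f_i and weights w_i summing to one, the weighted sum of KL(f_i || g) is minimized over g by the arithmetic mixture, while the weighted sum of KL(g || f_i) is minimized by the normalized geometric mixture, which has the same product-of-powers form as a weighted likelihood. The distributions, weights, and function names below are hypothetical and chosen only for illustration.

```python
import math
import random


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def arithmetic_mixture(dists, weights):
    """Pointwise weighted average: sum_i w_i * f_i."""
    size = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]


def geometric_mixture(dists, weights):
    """Normalized pointwise weighted product: prod_i f_i ** w_i, rescaled to sum to 1."""
    size = len(dists[0])
    unnorm = [
        math.exp(sum(w * math.log(d[k]) for w, d in zip(weights, dists)))
        for k in range(size)
    ]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def forward_loss(g, dists, weights):
    """sum_i w_i * KL(f_i || g)."""
    return sum(w * kl(d, g) for w, d in zip(weights, dists))


def reverse_loss(g, dists, weights):
    """sum_i w_i * KL(g || f_i)."""
    return sum(w * kl(g, d) for w, d in zip(weights, dists))


def random_dist(rng, size):
    """Random strictly positive probability vector, used as a competing candidate."""
    raw = [rng.random() + 0.05 for _ in range(size)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical component distributions and weights (illustrative values only).
dists = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
weights = [0.5, 0.3, 0.2]

arith = arithmetic_mixture(dists, weights)
geo = geometric_mixture(dists, weights)

# Compare each closed-form candidate against many random alternatives.
rng = random.Random(0)
candidates = [random_dist(rng, 3) for _ in range(2000)]
arith_is_best = all(
    forward_loss(arith, dists, weights) <= forward_loss(c, dists, weights) + 1e-12
    for c in candidates
)
geo_is_best = all(
    reverse_loss(geo, dists, weights) <= reverse_loss(c, dists, weights) + 1e-12
    for c in candidates
)

print("arithmetic mixture:", arith)
print("geometric mixture: ", geo)
print("arithmetic mixture minimizes forward loss:", arith_is_best)
print("geometric mixture minimizes reverse loss: ", geo_is_best)
```

The random search is only a sanity check; both optimality claims follow from decomposing each loss into a term that does not depend on g plus a single non-negative KL term that vanishes at the stated minimizer.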



Cite this article

Wang, X., Zidek, J.V. Derivation of mixture distributions and weighted likelihood function as minimizers of KL-divergence subject to constraints. Ann Inst Stat Math 57, 687–701 (2005). https://doi.org/10.1007/BF02915433

  • DOI: https://doi.org/10.1007/BF02915433
