Abstract
In this article, mixture distributions and weighted likelihoods are derived within an information-theoretic framework and shown to be closely related. This surprising relationship holds despite the arithmetic form of the former and the geometric form of the latter. Mixture distributions are shown to be optima that minimize the entropy loss under certain constraints. The same framework implies the weighted likelihood when the distributions in the mixture are unknown and information from independent samples generated by them has to be used instead. The likelihood weights thus trade bias for precision and yield inferential procedures, such as estimates, that can be more reliable than their classical counterparts.
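The arithmetic and geometric forms mentioned above can be illustrated with a small numerical sketch. It is not the article's derivation: it assumes discrete distributions, weights that sum to one, and an unconstrained weighted sum of Kullback-Leibler divergences as the objective, for which the arithmetic mixture is a well-known minimizer. The second part assumes normal samples with unit variance, for which the weighted log-likelihood (the logarithm of a product of densities raised to the weights) is maximized by a weighted average of the observations. All function names and the toy data are hypothetical.

```python
import math
import random


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_kl(fs, weights, g):
    """Weighted sum of KL(f_i || g) over the component distributions f_i."""
    return sum(w * kl(f, g) for f, w in zip(fs, weights))


def arithmetic_mixture(fs, weights):
    """Arithmetic mixture sum_i w_i * f_i (weights assumed to sum to one)."""
    return [sum(w * f[k] for f, w in zip(fs, weights)) for k in range(len(fs[0]))]


def weighted_mle_normal_mean(samples, weights):
    """Maximizer over theta of sum_i w_i * sum_j log N(x_ij; theta, 1).

    Setting the derivative to zero gives a weighted average of all observations.
    """
    num = sum(w * sum(xs) for xs, w in zip(samples, weights))
    den = sum(w * len(xs) for xs, w in zip(samples, weights))
    return num / den


# Toy components and weights (assumed values, for illustration only).
fs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
weights = [0.8, 0.2]
mix = arithmetic_mixture(fs, weights)
loss_at_mix = weighted_kl(fs, weights, mix)

# No randomly drawn candidate g should beat the mixture on the weighted KL loss.
rng = random.Random(0)
beaten = False
for _ in range(1000):
    raw = [rng.random() + 1e-9 for _ in mix]
    total = sum(raw)
    g = [r / total for r in raw]
    if weighted_kl(fs, weights, g) < loss_at_mix - 1e-12:
        beaten = True

# Weighted likelihood: a small sample of interest plus one less relevant sample.
samples = [[1.0, 2.0, 3.0], [10.0]]
theta_hat = weighted_mle_normal_mean(samples, weights)
```

In the second part, the estimate is pulled from the first sample's mean toward the second sample's data. This is the bias-for-precision trade the abstract describes, shown here only for this assumed normal model.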
About this article
Cite this article
Wang, X., Zidek, J.V. Derivation of mixture distributions and weighted likelihood function as minimizers of KL-divergence subject to constraints. Ann Inst Stat Math 57, 687–701 (2005). https://doi.org/10.1007/BF02915433
DOI: https://doi.org/10.1007/BF02915433