Abstract
We investigate the problem of estimating the proportion vector that maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms, such as gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The squared distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EGη). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EMη which, for η = 1, gives the usual EM update. Experimentally, both the EMη update and the EGη update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EGη algorithm.
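To make the two updates in the abstract concrete, here is a minimal NumPy sketch, under the assumption that the component densities are fixed and precomputed as a matrix `P` with `P[j, i]` equal to the density of component `i` at sample `j`. The function and variable names are illustrative, not from the paper; the EM line is the standard proportion update, and `eg_update` follows the multiplicative form of the exponentiated gradient update described above.

```python
import numpy as np

def em_update(w, P):
    """One EM step on the mixture proportions w (components' densities fixed).

    P[j, i] = p_i(x_j). The posterior responsibility of component i for
    sample j is w_i * p_i(x_j) / sum_k w_k * p_k(x_j); EM sets the new
    proportions to the average responsibility over the sample.
    """
    R = w * P / (P @ w)[:, None]   # responsibilities, rows sum to 1
    return R.mean(axis=0)

def eg_update(w, P, eta=1.0):
    """One EGη step: multiply each proportion by exp(eta * gradient), renormalize.

    The gradient of the average log-likelihood with respect to w_i is
    (1/n) * sum_j p_i(x_j) / sum_k w_k * p_k(x_j).
    """
    g = (P / (P @ w)[:, None]).mean(axis=0)
    v = w * np.exp(eta * g)
    return v / v.sum()
```

Both updates keep the iterate on the probability simplex: EM because responsibilities average to a distribution, EGη because of the explicit renormalization.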
Cite this article
Helmbold, D.P., Schapire, R.E., Singer, Y. et al. A Comparison of New and Old Algorithms for a Mixture Estimation Problem. Machine Learning 27, 97–119 (1997). https://doi.org/10.1023/A:1007301011561