Abstract
We investigate the problem of estimating the proportion vector that maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms, such as gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The squared distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EGη). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EMη which, for η = 1, gives the usual EM update. Experimentally, both the EMη update and the EGη update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EGη algorithm.
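To make the two updates in the abstract concrete, here is a minimal NumPy sketch, under the assumption that the component densities are fixed and precomputed as a matrix `P` with `P[j, i]` equal to the density of component `i` at sample `j`. The function and variable names are illustrative, not from the paper; the EM line is the standard proportion update, and `eg_update` follows the multiplicative form of the exponentiated gradient update described above.

```python
import numpy as np

def em_update(w, P):
    """One EM step on the mixture proportions w (components' densities fixed).

    P[j, i] = p_i(x_j). The posterior responsibility of component i for
    sample j is w_i * p_i(x_j) / sum_k w_k * p_k(x_j); EM sets the new
    proportions to the average responsibility over the sample.
    """
    R = w * P / (P @ w)[:, None]   # responsibilities, rows sum to 1
    return R.mean(axis=0)

def eg_update(w, P, eta=1.0):
    """One EGη step: multiply each proportion by exp(eta * gradient), renormalize.

    The gradient of the average log-likelihood with respect to w_i is
    (1/n) * sum_j p_i(x_j) / sum_k w_k * p_k(x_j).
    """
    g = (P / (P @ w)[:, None]).mean(axis=0)
    v = w * np.exp(eta * g)
    return v / v.sum()
```

Both updates keep the iterate on the probability simplex: EM because responsibilities average to a distribution, EGη because of the explicit renormalization.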
Cite this article
Helmbold, D.P., Schapire, R.E., Singer, Y. et al. A Comparison of New and Old Algorithms for a Mixture Estimation Problem. Machine Learning 27, 97–119 (1997). https://doi.org/10.1023/A:1007301011561