The EM Algorithm, Its Randomized Implementation and Global Optimization: Some Challenges and Opportunities for Operations Research

  • Wolfgang Jank

Summary

The EM algorithm is a powerful optimization method that has become popular in many fields. Unfortunately, EM is only a local optimization method and can get stuck in sub-optimal solutions. While more and more contemporary data/model combinations yield multiple local optima, there have been only a few attempts at making EM suitable for global optimization. In this paper we review the basic EM algorithm, its properties and challenges, focusing in particular on its randomized implementation. The randomized EM implementation promises to address some of these contemporary data/model challenges, and it is particularly well-suited for combination with global optimization ideas, since most global optimization paradigms are themselves based on the principle of randomization. We review some of the challenges of the randomized EM implementation and present a new algorithm that combines the principles of EM with those of the genetic algorithm. While this new algorithm shows promising results for clustering an online auction database of functional objects, the primary goal of this work is to bridge the gap between the field of statistics, which is home to extensive research on the EM algorithm, and the field of operations research, where work on global optimization thrives, and thereby to stimulate new ideas for joint research between the two.
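Although the full paper is not reproduced here, the summary's central ideas lend themselves to a small illustration. Below is a minimal sketch, not the algorithm proposed in the paper, of an EM/genetic-algorithm hybrid for a univariate Gaussian mixture: each EM iteration alternates an E-step (computing posterior responsibilities under the current parameters) with an M-step (re-estimating the parameters from those responsibilities), and a population loop with selection and mutation steers the search toward better local optima. All names here (ga_em, em_step, the population parameters) are illustrative assumptions, not from the source.

import numpy as np

def mixture_density(x, w, mu, sigma):
    # per-point, per-component densities: shape (n, K)
    return w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(x, w, mu, sigma):
    # observed-data log-likelihood of the Gaussian mixture (w, mu, sigma)
    return np.sum(np.log(mixture_density(x, w, mu, sigma).sum(axis=1)))

def em_step(x, w, mu, sigma):
    # E-step: responsibilities r[i, k] = P(component k | x_i, current parameters)
    dens = mixture_density(x, w, mu, sigma)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and standard deviations
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, np.maximum(sigma, 1e-6)  # guard against collapsing components

def ga_em(x, K=2, pop_size=10, generations=20, em_steps=5, seed=None):
    # toy hybrid (illustrative only): short EM bursts + GA-style selection and mutation
    rng = np.random.default_rng(seed)
    pop = [(np.full(K, 1.0 / K), rng.choice(x, K), np.full(K, x.std()))
           for _ in range(pop_size)]
    for _ in range(generations):
        # local refinement: a few (monotone) EM iterations per candidate
        refined = []
        for w, mu, sigma in pop:
            for _ in range(em_steps):
                w, mu, sigma = em_step(x, w, mu, sigma)
            refined.append((w, mu, sigma))
        # selection: keep the fittest half, ranked by log-likelihood
        refined.sort(key=lambda p: -log_likelihood(x, *p))
        elite = refined[: pop_size // 2]
        # mutation: perturb the elite means to form the next generation
        children = [(w.copy(), mu + rng.normal(0.0, x.std() / 2, K), sigma.copy())
                    for w, mu, sigma in elite]
        pop = elite + children
    return max(pop, key=lambda p: log_likelihood(x, *p))

# usage: recover a two-component mixture from simulated data
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(4.0, 0.5, 200)])
w, mu, sigma = ga_em(x, K=2, seed=1)
print("weights:", w.round(2), "means:", mu.round(2), "sds:", sigma.round(2))

Because every candidate is refined by monotone EM iterations before selection, the population tends to concentrate on the basins of the better local optima rather than on whichever solution a single random start happens to reach.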

Key words

Monte Carlo EM, stochastic optimization, mixture model, clustering, global optimization, online auctions, functional objects

Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  • Wolfgang Jank
    Robert H. Smith School of Business, Department of Decision and Information Technologies, University of Maryland, College Park
