The MM Algorithm

  • Kenneth Lange
Part of the Statistics and Computing book series (SCO)


Most practical optimization problems defy exact solution. In the current chapter we discuss an optimization method that relies heavily on convexity arguments and is particularly useful in high-dimensional problems such as image reconstruction [27]. This iterative method is called the MM algorithm. One of the virtues of the MM acronym is that it does double duty. In minimization problems, the first M stands for majorize and the second M for minimize. In maximization problems, the first M stands for minorize and the second M for maximize. When it is successful, the MM algorithm substitutes a simple optimization problem for a difficult one. Simplicity can be attained by (a) avoiding large matrix inversions, (b) linearizing an optimization problem, (c) separating the variables of an optimization problem, (d) dealing with equality and inequality constraints gracefully, and (e) turning a nondifferentiable problem into a smooth problem. In simplifying the original problem, we pay the price of iteration, or of iteration with a slower rate of convergence.
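The majorize-then-minimize step can be made concrete with a classic textbook example (a sketch, not taken from this chapter): minimizing the nondifferentiable objective f(x) = Σ|x − yᵢ|, whose minimizer is the sample median. The standard quadratic majorization |r| ≤ r²/(2|rₖ|) + |rₖ|/2, which touches |r| at r = rₖ, turns each iteration into a weighted least squares problem, illustrating points (b) and (e) above. The function name `mm_median` and the `eps` safeguard are illustrative choices, not notation from the text.

```python
def mm_median(y, iters=100, eps=1e-8):
    """MM sketch: minimize f(x) = sum_i |x - y_i| (minimized by the median).

    At the current iterate x_k, each term |x - y_i| is majorized by the
    quadratic (x - y_i)^2 / (2 |x_k - y_i|) + |x_k - y_i| / 2, which equals
    |x - y_i| at x = x_k and dominates it elsewhere.  Minimizing the
    surrogate gives a weighted-mean update; eps guards against division
    by zero when an iterate lands on a data point.
    """
    x = sum(y) / len(y)                         # start at the sample mean
    history = [sum(abs(x - yi) for yi in y)]    # track the true objective
    for _ in range(iters):
        w = [1.0 / (abs(x - yi) + eps) for yi in y]
        x = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        history.append(sum(abs(x - yi) for yi in y))
    return x, history
```

Because each surrogate lies above f and agrees with it at the current iterate, every update drives f downhill; the `history` list should be (numerically) nonincreasing, which is the descent property that makes MM algorithms stable.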






  1. Becker MP, Yang I, Lange K (1997) EM algorithms without missing data. Stat Methods Med Res 6:37-53
  2. Böhning D, Lindsay BG (1988) Monotonicity of quadratic approximation algorithms. Ann Inst Stat Math 40:641-663
  3. Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324-345
  4. De Leeuw J (1994) Block relaxation algorithms in statistics. In: Bock HH, Lenski W, Richter MM (eds) Information Systems and Data Analysis. Springer, New York, pp 308-325
  5. De Leeuw J (2006) Some majorization techniques. Preprint series, UCLA Department of Statistics
  6. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39:1-38
  7. Dempster AP, Laird NM, Rubin DB (1980) Iteratively reweighted least squares for linear regression when the errors are normal/independent distributed. In: Krishnaiah PR (ed) Multivariate Analysis V. North Holland, Amsterdam, pp 35-57
  8. De Pierro AR (1993) On the relation between the ISRA and EM algorithm for positron emission tomography. IEEE Trans Med Imaging 12:328-333
  9. Geman S, McClure D (1985) Bayesian image analysis: an application to single photon emission tomography. Proc Stat Comput Sec, Amer Stat Assoc, Washington, DC, pp 12-18
  10. Green P (1990) Bayesian reconstruction for emission tomography data using a modified EM algorithm. IEEE Trans Med Imaging 9:84-94
  11. Grimmett GR, Stirzaker DR (1992) Probability and Random Processes, 2nd ed. Oxford University Press, Oxford
  12. Heiser WJ (1995) Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski WJ (ed) Recent Advances in Descriptive Multivariate Analysis. Clarendon Press, Oxford, pp 157-189
  13. Herman GT (1980) Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. Springer, New York
  14. Hoel PG, Port SC, Stone CJ (1971) Introduction to Probability Theory. Houghton Mifflin, Boston
  15. Huber PJ (1981) Robust Statistics. Wiley, New York
  16. Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Annals Stat 32:386-408
  17. Hunter DR, Lange K (2004) A tutorial on MM algorithms. Amer Statistician 58:30-37
  18. Karlin S, Taylor HM (1975) A First Course in Stochastic Processes, 2nd ed. Academic Press, New York
  19. Keener JP (1993) The Perron-Frobenius theorem and the ranking of football teams. SIAM Review 35:80-93
  20. Kent JT, Tyler DE, Vardi Y (1994) A curious likelihood identity for the multivariate t-distribution. Comm Stat Simulation 23:441-453
  21. Kingman JFC (1993) Poisson Processes. Oxford University Press, Oxford
  22. Lange K (1995) A gradient algorithm locally equivalent to the EM algorithm. J Roy Stat Soc B 57:425-437
  23. Lange K (2002) Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. Springer, New York
  24. Lange K (2004) Optimization. Springer, New York
  25. Lange K, Carson R (1984) EM reconstruction algorithms for emission and transmission tomography. J Computer Assist Tomography 8:306-316
  26. Lange K, Fessler JA (1995) Globally convergent algorithms for maximum a posteriori transmission tomography. IEEE Trans Image Processing 4:1430-1438
  27. Lange K, Hunter D, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Computational Graphical Stat 9:1-59
  28. Lange K, Little RJA, Taylor JMG (1989) Robust statistical modeling using the t distribution. J Amer Stat Assoc 84:881-896
  29. Lange K, Sinsheimer JS (1993) Normal/independent distributions and their applications in robust regression. J Comp Graph Stat 2:175-198
  30. Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis. Wiley, New York
  31. Luce RD (1977) The choice axiom after twenty years. J Math Psychology 15:215-233
  32. McLachlan GJ, Krishnan T (2008) The EM Algorithm and Extensions, 2nd ed. Wiley, New York
  33. Merle G, Spath H (1974) Computational experiences with discrete Lp approximation. Computing 12:315-321
  34. Rao CR (1973) Linear Statistical Inference and its Applications, 2nd ed. Wiley, New York
  35. Sabatti C, Lange K (2002) Genomewide motif identification using a dictionary model. Proceedings IEEE 90:1803-1810
  36. Schlossmacher EJ (1973) An iterative technique for absolute deviations curve fitting. J Amer Stat Assoc 68:857-859
  37. Sha F, Saul LK, Lee DD (2003) Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Becker S, Thrun S, Obermayer K (eds) Advances in Neural Information Processing Systems 15. MIT Press, Cambridge, MA, pp 1065-1073
  38. Steele JM (2004) The Cauchy-Schwarz Master Class: An Introduction to the Art of Inequalities. Cambridge University Press and the Mathematical Association of America, Cambridge
  39. van Ruitenburg J (2005) Algorithms for parameter estimation in the Rasch model. Measurement and Research Department Reports 2005-4, CITO, Arnhem, Netherlands
  40. Wu TT, Lange K (2009) The MM alternative to EM. Stat Sci (in press)

Copyright information

© Springer New York 2010

Authors and Affiliations

  1. Departments of Biomathematics, Human Genetics, and Statistics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, USA
