Optimization, pp 185–219

The MM Algorithm

  • Kenneth Lange
Part of the Springer Texts in Statistics book series (STS, volume 95)


Most practical optimization problems defy exact solution. In the current chapter we discuss an optimization method that relies heavily on convexity arguments and is particularly useful in high-dimensional problems such as image reconstruction [171]. This iterative method is called the MM algorithm. One of the virtues of this acronym is that it does double duty. In minimization problems, the first M of MM stands for majorize and the second M for minimize. In maximization problems, the first M stands for minorize and the second M for maximize. When it is successful, the MM algorithm substitutes a simple optimization problem for a difficult optimization problem. Simplicity can be attained by: (a) separating the variables of an optimization problem, (b) avoiding large matrix inversions, (c) linearizing an optimization problem, (d) restoring symmetry, (e) dealing with equality and inequality constraints gracefully, and (f) turning a nondifferentiable problem into a smooth problem. In simplifying the original problem, we must pay the price of iteration or iteration with a slower rate of convergence.
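As an illustration of the majorize-then-minimize pattern described above, the following minimal sketch applies it to a toy problem that is not taken from the abstract: finding a sample median by minimizing the sum of absolute deviations. It assumes the standard quadratic majorizer |r| ≤ r²/(2|r_n|) + |r_n|/2, which touches |r| at the current residual r_n and turns each step into a weighted mean. The function name `mm_median`, the guard `eps`, and the data are hypothetical choices made for this example only.

```python
def mm_median(xs, theta0, tol=1e-10, max_iter=500, eps=1e-12):
    """Minimize sum(|x - theta|) with an MM iteration (illustrative sketch)."""

    def objective(t):
        return sum(abs(x - t) for x in xs)

    theta = float(theta0)
    history = [objective(theta)]  # objective value at every iterate
    for _ in range(max_iter):
        # Majorize: replace each |x - theta| by a quadratic that lies above it
        # and agrees with it at the current iterate. The quadratic for x has
        # weight 1 / |x - theta_n|; eps (an assumption) guards against
        # division by zero when an iterate lands on a data point.
        weights = [1.0 / max(abs(x - theta), eps) for x in xs]
        # Minimize: the surrogate is a weighted least squares problem whose
        # minimizer is the weighted mean of the data.
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        history.append(objective(new_theta))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, history


data = [1.0, 2.0, 3.0, 4.0, 100.0]
theta_hat, values = mm_median(data, theta0=sum(data) / len(data))
```

Starting from the sample mean, the iterates move toward the median of the data, and the recorded objective values never increase, which is the descent property that majorization is designed to deliver. The simple surrogate is bought at the cost of iteration, in line with the trade-off noted in the abstract.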


Projection Line, Multinomial Distribution, Surrogate Function, Cyclic Coordinate Descent, Difficult Optimization Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Acosta E, Delgado C (1994) Fréchet versus Carathéodory. Am Math Mon 101:332–338
  16. Böhning D, Lindsay BG (1988) Monotonicity of quadratic approximation algorithms. Ann Inst Stat Math 40:641–663
  20. Boyd S, Kim SJ, Vandenberghe L, Hassibi A (2007) A tutorial on geometric programming. Optim Eng 8:67–127
  23. Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324–345
  50. Clarke CA, Price Evans DA, McConnell RB, Sheppard PM (1959) Secretion of blood group antigens and peptic ulcers. Br Med J 1:603–607
  59. de Leeuw J (1994) Block relaxation algorithms in statistics. In: Bock HH, Lenski W, Richter MM (eds) Information systems and data analysis. Springer, New York, pp 308–325
  60. de Leeuw J (2006) Some majorization techniques. Preprint series, UCLA Department of Statistics
  65. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39:1–38
  67. De Pierro AR (1993) On the relation between the ISRA and EM algorithm for positron emission tomography. IEEE Trans Med Imag 12:328–333
  103. Geman S, McClure D (1985) Bayesian image analysis: an application to single photon emission tomography. In: Proceedings of the statistical computing section. American Statistical Association, Washington, DC, pp 12–18
  111. Green PJ (1990) Bayesian reconstruction for emission tomography data using a modified EM algorithm. IEEE Trans Med Imag 9:84–94
  113. Grimmett GR, Stirzaker DR (1992) Probability and random processes, 2nd edn. Oxford University Press, Oxford
  121. Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356
  122. Heiser WJ (1995) Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski WJ (ed) Recent advances in descriptive multivariate analysis. Clarendon, Oxford, pp 157–189
  124. Herman GT (1980) Image reconstruction from projections: the fundamentals of computerized tomography. Springer, New York
  133. Hoel PG, Port SC, Stone CJ (1971) Introduction to probability theory. Houghton Mifflin, Boston
  140. Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Ann Stat 32:386–408
  142. Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58:30–37
  148. Karlin S, Taylor HM (1975) A first course in stochastic processes, 2nd edn. Academic, New York
  150. Keener JP (1993) The Perron-Frobenius theorem and the ranking of football teams. SIAM Rev 35:80–93
  153. Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266
  154. Kingman JFC (1993) Poisson processes. Oxford University Press, Oxford
  166. Lange K (2010) Numerical analysis for statisticians, 2nd edn. Springer, New York
  167. Lange K, Carson R (1984) EM reconstruction algorithms for emission and transmission tomography. J Comput Assist Tomogr 8:306–316
  168. Lange K, Fessler JA (1995) Globally convergent algorithms for maximum a posteriori transmission tomography. IEEE Trans Image Process 4:1430–1438
  170. Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A, DOI 10.1007/s10107-012-0612-1
  171. Lange K, Hunter D, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Comput Graph Stat 9:1–59
  181. Luce RD (1959) Individual choice behavior: a theoretical analysis. Wiley, Hoboken
  182. Luce RD (1977) The choice axiom after twenty years. J Math Psychol 15:215–233
  191. McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
  217. Ranola JM, Ahn S, Sehl ME, Smith DJ, Lange K (2010) A Poisson model for random multigraphs. Bioinformatics 26:2004–2011
  227. Sabatti C, Lange K (2002) Genomewide motif identification using a dictionary model. Proc IEEE 90:1803–1810
  235. Sha F, Saul LK, Lee DD (2003) Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge, pp 1065–1073
  239. Smith CAB (1957) Counting methods in genetical statistics. Ann Hum Genet 21:254–276
  242. Srebro N, Jaakkola T (2003) Weighted low-rank approximations. In: Machine learning international workshop conference 2003. AAAI Press, 20:720–727
  263. Van Ruitenburg J (2005) Algorithms for parameter estimation in the Rasch model. Measurement and Research Department Reports 2005–4. CITO, Arnhem
  265. Vardi Y, Shepp LA, Kaufman L (1985) A statistical model for positron emission tomography. J Am Stat Assoc 80:8–37
  271. Weiszfeld E (1937) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167:7–41 (Translated from the French original [Tohoku Math J 43:335–386 (1937)] and annotated by Frank Plastria)
  282. Zhou H, Lange K (2010) MM algorithms for some discrete multivariate distributions. J Comput Graph Stat 19:645–665

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Kenneth Lange (1)
  1. Biomathematics, Human Genetics, Statistics, University of California, Los Angeles, USA
