Optimization, pp 185–219

# The MM Algorithm

• Kenneth Lange
Part of the Springer Texts in Statistics book series (STS, volume 95)

## Abstract

Most practical optimization problems defy exact solution. In the current chapter we discuss an optimization method that relies heavily on convexity arguments and is particularly useful in high-dimensional problems such as image reconstruction [171]. This iterative method is called the MM algorithm. One of the virtues of this acronym is that it does double duty. In minimization problems, the first M of MM stands for majorize and the second M for minimize. In maximization problems, the first M stands for minorize and the second M for maximize. When it is successful, the MM algorithm substitutes a simple optimization problem for a difficult optimization problem. Simplicity can be attained by: (a) separating the variables of an optimization problem, (b) avoiding large matrix inversions, (c) linearizing an optimization problem, (d) restoring symmetry, (e) dealing with equality and inequality constraints gracefully, and (f) turning a nondifferentiable problem into a smooth problem. In simplifying the original problem, we must pay the price of iteration or iteration with a slower rate of convergence.
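As an illustration of the majorize-then-minimize pattern described in the abstract, the following minimal Python sketch computes a sample median by minimizing f(m) = Σ|x_i − m|. The example is not taken from the chapter text shown here. It assumes the standard quadratic bound |r| ≤ r²/(2c) + c/2, which holds for any c > 0 with equality at |r| = c. The names `abs_loss` and `mm_median` and the guard `eps` are hypothetical choices made for this sketch.

```python
def abs_loss(xs, m):
    """Objective f(m) = sum_i |x_i - m|, minimized by any sample median."""
    return sum(abs(x - m) for x in xs)


def mm_median(xs, m0=None, eps=1e-12, tol=1e-10, max_iter=200):
    """MM iteration for a sample median; returns (estimate, objective history)."""
    # Start from the sample mean unless a starting point is supplied.
    m = sum(xs) / len(xs) if m0 is None else m0
    history = [abs_loss(xs, m)]
    for _ in range(max_iter):
        # Majorize: |r| <= r**2 / (2*c) + c/2 with c = |x_i - m|, so the
        # quadratic surrogate lies above f and touches it at the current m.
        # eps guards against division by zero (an assumption of this sketch).
        weights = [1.0 / max(abs(x - m), eps) for x in xs]
        # Minimize: the surrogate is a weighted least-squares criterion,
        # so its minimizer is the weighted mean of the data.
        m_new = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        history.append(abs_loss(xs, m_new))
        converged = abs(m_new - m) < tol
        m = m_new
        if converged:
            break
    return m, history


data = [1.0, 2.0, 3.0, 4.0, 100.0]
estimate, history = mm_median(data)
print(round(estimate, 6))
print(len(history) - 1, "iterations")
```

On this data the iterates should approach the sample median 3, and `history` should be non-increasing up to rounding and the `eps` guard, which is the descent property that makes MM iterations stable. The sketch also shows the trade-off stated in the abstract: a nondifferentiable problem becomes a sequence of smooth weighted least-squares problems, at the price of iteration.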

## Keywords

Projection Line; Multinomial Distribution; Surrogate Function; Cyclic Coordinate Descent; Difficult Optimization Problem
These keywords were added by machine and not by the authors.

## References

[1] Acosta E, Delgado C (1994) Fréchet versus Carathéodory. Am Math Mon 101:332–338
[16] Böhning D, Lindsay BG (1988) Monotonicity of quadratic approximation algorithms. Ann Inst Stat Math 40:641–663
[20] Boyd S, Kim SJ, Vandenberghe L, Hassibi A (2007) A tutorial on geometric programming. Optim Eng 8:67–127
[23] Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324–345
[50] Clarke CA, Price Evans DA, McConnell RB, Sheppard PM (1959) Secretion of blood group antigens and peptic ulcers. Br Med J 1:603–607
[59] de Leeuw J (1994) Block relaxation algorithms in statistics. In: Bock HH, Lenski W, Richter MM (eds) Information systems and data analysis. Springer, New York, pp 308–325
[60] de Leeuw J (2006) Some majorization techniques. Preprint series, UCLA Department of Statistics
[65] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39:1–38
[67] De Pierro AR (1993) On the relation between the ISRA and EM algorithm for positron emission tomography. IEEE Trans Med Imag 12:328–333
[103] Geman S, McClure D (1985) Bayesian image analysis: an application to single photon emission tomography. In: Proceedings of the statistical computing section. American Statistical Association, Washington, DC, pp 12–18
[111] Green PJ (1990) Bayesian reconstruction for emission tomography data using a modified EM algorithm. IEEE Trans Med Imag 9:84–94
[113] Grimmett GR, Stirzaker DR (1992) Probability and random processes, 2nd edn. Oxford University Press, Oxford
[121] Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356
[122] Heiser WJ (1995) Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski WJ (ed) Recent advances in descriptive multivariate analysis. Clarendon, Oxford, pp 157–189
[124] Herman GT (1980) Image reconstruction from projections: the fundamentals of computerized tomography. Springer, New York
[133] Hoel PG, Port SC, Stone CJ (1971) Introduction to probability theory. Houghton Mifflin, Boston
[140] Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Ann Stat 32:386–408
[142] Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58:30–37
[148] Karlin S, Taylor HM (1975) A first course in stochastic processes, 2nd edn. Academic, New York
[150] Keener JP (1993) The Perron-Frobenius theorem and the ranking of football teams. SIAM Rev 35:80–93
[153] Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266
[154] Kingman JFC (1993) Poisson processes. Oxford University Press, Oxford
[166] Lange K (2010) Numerical analysis for statisticians, 2nd edn. Springer, New York
[167] Lange K, Carson R (1984) EM reconstruction algorithms for emission and transmission tomography. J Comput Assist Tomogr 8:306–316
[168] Lange K, Fessler JA (1995) Globally convergent algorithms for maximum a posteriori transmission tomography. IEEE Trans Image Process 4:1430–1438
[170] Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A, DOI 10.1007/s10107-012-0612-1
[171] Lange K, Hunter D, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Comput Graph Stat 9:1–59
[181] Luce RD (1959) Individual choice behavior: a theoretical analysis. Wiley, Hoboken
[182] Luce RD (1977) The choice axiom after twenty years. J Math Psychol 15:215–233
[191] McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
[217] Ranola JM, Ahn S, Sehl ME, Smith DJ, Lange K (2010) A Poisson model for random multigraphs. Bioinformatics 26:2004–2011
[227] Sabatti C, Lange K (2002) Genomewide motif identification using a dictionary model. Proc IEEE 90:1803–1810
[235] Sha F, Saul LK, Lee DD (2003) Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge, pp 1065–1073
[239] Smith CAB (1957) Counting methods in genetical statistics. Ann Hum Genet 21:254–276
[242] Srebro N, Jaakkola T (2003) Weighted low-rank approximations. In: Machine learning international workshop conference 2003. AAAI Press, 20:720–727
[263] Van Ruitenburg J (2005) Algorithms for parameter estimation in the Rasch model. Measurement and Research Department Reports 2005–4. CITO, Arnhem
[265] Vardi Y, Shepp LA, Kaufman L (1985) A statistical model for positron emission tomography. J Am Stat Assoc 80:8–37
[271] Weiszfeld E (1937) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167:7–41 (Translated from the French original [Tohoku Math J 43:335–386 (1937)] and annotated by Frank Plastria)
[282] Zhou H, Lange K (2010) MM algorithms for some discrete multivariate distributions. J Comput Graph Stat 19:645–665