Mathematical Programming, Volume 146, Issue 1–2, pp 409–436

Distance majorization and its applications

  • Eric C. Chi
  • Hua Zhou
  • Kenneth Lange
Full Length Paper, Series A

Abstract
The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when it is easy to project onto each separate set, but nontrivial to project onto their intersection. Algorithms based on Newton’s method such as the interior point method are viable for small to medium-scale problems. However, modern applications in statistics, engineering, and machine learning are posing problems with potentially tens of thousands of parameters or more. We revisit this convex programming problem and propose an algorithm that scales well with dimensionality. Our proposal is an instance of a sequential unconstrained minimization technique and revolves around three ideas: the majorization-minimization principle, the classical penalty method for constrained optimization, and quasi-Newton acceleration of fixed-point algorithms. The performance of our distance majorization algorithms is illustrated in several applications.
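The surrogate at the heart of distance majorization is easy to state: the squared distance dist(x, C_i)^2 is majorized at the current iterate x_k by ||x − P_i(x_k)||^2, where P_i is the projection onto C_i, so each penalized subproblem reduces to an unconstrained quadratic when the loss is quadratic. The sketch below illustrates this update for a toy problem with a fixed penalty constant rho and two illustrative sets (a Euclidean ball and the nonnegative orthant); the choices of sets, rho, and iteration count are our own for illustration, and the full method also sends the penalty to infinity and accelerates the fixed-point iteration.

```python
import numpy as np

def proj_ball(x, radius=1.0):
    # Euclidean projection onto the ball of the given radius.
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def proj_orthant(x):
    # Euclidean projection onto the nonnegative orthant.
    return np.maximum(x, 0.0)

def distance_majorization(y, projections, rho=100.0, iters=2000):
    """Minimize 0.5*||x - y||^2 + (rho/2) * sum_i dist(x, C_i)^2.

    Majorizing each dist(x, C_i)^2 by ||x - P_i(x_k)||^2 at the current
    iterate x_k yields a quadratic surrogate whose minimizer is closed form,
    so each MM step costs one projection per set."""
    x = y.copy()
    m = len(projections)
    for _ in range(iters):
        anchors = sum(P(x) for P in projections)  # projections of x_k
        x = (y + rho * anchors) / (1.0 + rho * m)  # exact surrogate minimizer
    return x

# Approximate the projection of y onto (unit ball) ∩ (nonnegative orthant).
y = np.array([2.0, -1.0])
x = distance_majorization(y, [proj_ball, proj_orthant])
# x lands near the true projection (1, 0); the finite penalty leaves an
# O(1/rho) feasibility gap, removed in practice by annealing rho upward.
```

Because each projection is 1-Lipschitz, the update map has Lipschitz constant at most m*rho/(1 + m*rho) < 1, so for fixed rho the iteration is a contraction with a unique fixed point.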


Keywords: Constrained optimization · Majorization-minimization (MM) · Sequential unconstrained minimization · Projection

Mathematics Subject Classification (2000)

65K05 · 90C25 · 90C30 · 62J02



Acknowledgments

We thank Janet Sinsheimer for helpful feedback in the course of this work. We also thank the anonymous referees and associate editor for their constructive suggestions. In particular, we appreciate the detailed comments bringing sequential unconstrained minimization and SUMMA to our attention and highlighting their connection to the MM algorithm. This research was partially supported by United States Public Health Service Grants GM53275 and HG006139 and NSF Grant DMS-1310319.



Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2013

Authors and Affiliations

  1. Department of Human Genetics, University of California, Los Angeles, USA
  2. Department of Statistics, North Carolina State University, Raleigh, USA
  3. Departments of Biomathematics, Human Genetics, and Statistics, University of California, Los Angeles, USA
