Computational Optimization and Applications, Volume 61, Issue 3, pp. 609–634

# Path following in the exact penalty method of convex programming

## Abstract

Classical penalty methods solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to $$\infty$$, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. In practice, the kinks in the penalty and the unknown magnitude of the penalty constant prevent wide application of the exact penalty method in nonlinear programming. In this article, we examine a strategy of path following consistent with the exact penalty method. Instead of performing optimization at a single penalty constant, we trace the solution as a continuous function of the penalty constant. Thus, path following starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. For quadratic programming, the solution path is piecewise linear and takes large jumps from constraint to constraint. For a general convex program, the solution path is piecewise smooth, and path following operates by numerically solving an ordinary differential equation segment by segment. Our diverse applications to (a) projection onto a convex set, (b) nonnegative least squares, (c) quadratically constrained quadratic programming, (d) geometric programming, and (e) semidefinite programming illustrate the mechanics and potential of path following. The final detour to image denoising demonstrates the relevance of path following to regularized estimation in inverse problems. In regularized estimation, one follows the solution path as the penalty constant decreases from a large value.
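The exactness property described above can be seen in a tiny worked example. The following sketch (a hypothetical one-dimensional problem, not taken from the article) minimizes $f(x) = (x-2)^2$ subject to $x \le 1$ via the exact penalty surrogate $F_\rho(x) = f(x) + \rho \max(0,\, x - 1)$, and traces the minimizer as the penalty constant $\rho$ grows. The path is piecewise linear and reaches the constrained solution $x = 1$ already at the finite value $\rho = 2$; the crude grid minimization stands in for the analytic path following the article develops.

```python
# Minimal sketch of the exact penalty path on a 1-D example (assumed
# problem for illustration): minimize (x - 2)^2 subject to x <= 1.
# Exact penalty surrogate: F_rho(x) = (x - 2)^2 + rho * max(0, x - 1).

def penalized_objective(x, rho):
    """Exact (absolute-value) penalty surrogate of the constrained problem."""
    return (x - 2.0) ** 2 + rho * max(0.0, x - 1.0)

def path_point(rho, lo=-1.0, hi=4.0, n=50001):
    """Unconstrained minimizer of F_rho by grid search (step 1e-4).

    A real implementation would follow the piecewise-linear path
    analytically, or solve an ODE segment by segment as in the article;
    grid search keeps this sketch self-contained.
    """
    best_x, best_f = lo, float("inf")
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        f = penalized_objective(x, rho)
        if f < best_f:
            best_x, best_f = x, f
    return best_x

# Trace the solution path as rho increases from 0 (unconstrained solution
# x = 2) until it hits and stays on the constraint boundary x = 1.
path = {rho: path_point(rho) for rho in (0.0, 1.0, 2.0, 3.0)}
```

For $\rho < 2$ the minimizer is $x(\rho) = 2 - \rho/2$, a linear segment; for all $\rho \ge 2$ it sits exactly at the constrained solution $x = 1$, illustrating recovery at a finite penalty constant rather than only in the limit.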

### Keywords

Constrained convex optimization · Exact penalty · Geometric programming · Ordinary differential equation · Quadratically constrained quadratic programming · Regularization · Semidefinite programming

Mathematics Subject Classification: 65K05, 90C25

## Notes

### Acknowledgments

Research supported in part by National Science Foundation Grant DMS-1310319 and National Institutes of Health Grants GM53275, MH59490, HG006139, and GM105785.

## Supplementary material

Supplementary material 1: 10589_2015_9732_MOESM1_ESM.png (181 KB)
