Block Coordinate Descent Methods for Semidefinite Programming

  • Zaiwen Wen
  • Donald Goldfarb
  • Katya Scheinberg
Chapter
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 166)

Abstract

In this chapter we consider block coordinate descent (BCD) methods for solving semidefinite programming (SDP) problems. These methods are based on sequentially minimizing the SDP problem’s objective function over blocks of variables corresponding to the elements of a single row (and column) of the positive semidefinite matrix X; hence, we will also refer to these methods as row-by-row (RBR) methods. Using properties of the (generalized) Schur complement with respect to the remaining fixed (n − 1)-dimensional principal submatrix of X, the positive semidefiniteness constraint on X reduces to a simple second-order cone constraint. It is well known that without certain safeguards, BCD methods cannot be guaranteed to converge in the presence of general constraints. Hence, to handle linear equality constraints, the methods that we describe here use an augmented Lagrangian approach. Since BCD methods are first-order methods, they are likely to work well only if each subproblem minimization can be performed very efficiently. Fortunately, this is the case for several important SDP problems, including the maxcut SDP relaxation and the minimum nuclear norm matrix completion problem, since closed-form solutions for the BCD subproblems that arise in these cases are available. We also describe how BCD can be applied to solve the sparse inverse covariance estimation problem by considering a dual formulation of this problem. The BCD approach is further generalized by using a rank-two update so that the coordinates can be changed in more than one row and column at each iteration. Finally, numerical results on the maxcut SDP relaxation and matrix completion problems are presented to demonstrate the robustness and efficiency of the BCD approach, especially if only moderately accurate solutions are desired.
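To make the row-by-row idea concrete, consider the pure RBR method applied to the maxcut SDP relaxation min ⟨C, X⟩ subject to diag(X) = e, X ⪰ 0. Fixing all entries of X except row and column i and permuting that row/column last gives X = [B, y; yᵀ, 1], where B is the fixed (n − 1)-dimensional principal submatrix. By the Schur complement, X ⪰ 0 reduces to y lying in the range of B with yᵀB†y ≤ 1, and the subproblem min 2cᵀy over this set (c being the i-th column of C with its diagonal entry removed) has the closed-form solution y = −√(ν/γ) Bc with γ = cᵀBc for any ν ≤ 1. The Python sketch below is a minimal illustration built from these formulas, not the authors' implementation: the function name rbr_maxcut, the slack sigma (which keeps ν = 1 − σ strictly below 1 so that X remains positive definite across sweeps), and the stopping rule are our own assumptions.

    import numpy as np

    def rbr_maxcut(C, max_sweeps=100, sigma=1e-2, tol=1e-6):
        # Pure RBR sweeps for  min <C, X>  s.t.  diag(X) = e,  X PSD.
        # Illustrative sketch only; parameter names are not from the chapter.
        n = C.shape[0]
        X = np.eye(n)                      # feasible positive definite start
        obj_prev = np.inf
        for _ in range(max_sweeps):
            for i in range(n):
                mask = np.arange(n) != i   # indices of the fixed block
                B = X[np.ix_(mask, mask)]  # (n-1)x(n-1) principal submatrix
                c = C[mask, i]             # i-th column of C, C_ii removed
                Bc = B @ c
                gamma = c @ Bc             # gamma = c^T B c >= 0 since B is PSD
                if gamma > 0:
                    # Minimizer of 2 c^T y over {y in range(B) : y^T B† y <= 1 - sigma},
                    # the second-order cone constraint from the Schur complement:
                    y = -np.sqrt((1.0 - sigma) / gamma) * Bc
                else:
                    y = np.zeros(n - 1)    # B c = 0: objective is flat, keep y = 0
                X[mask, i] = y             # write the new row/column back into X
                X[i, mask] = y
            obj = float(np.sum(C * X))     # current objective <C, X>
            if abs(obj_prev - obj) <= tol * max(1.0, abs(obj_prev)):
                break
            obj_prev = obj
        return X

As written, each row update forms Bc at O(n²) cost, so one full sweep costs O(n³), roughly the cost of a single Cholesky factorization. This matches the regime in which the RBR approach is reported to be effective, namely when only moderately accurate solutions are required.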

Keywords

Cholesky Factorization · Augmented Lagrangian Method · Matrix Completion · Augmented Lagrangian Function · Coordinate Descent Method

Acknowledgements

The research presented in this chapter was supported in part by NSF Grants DMS-0439872, DMS-0606712, and DMS-1016571; ONR Grant N00014-08-1-1118; and DOE Grant DE-FG02-08ER58562. The authors would like to thank Shiqian Ma for his help in writing and testing the codes, especially for the matrix completion problem, and the editors and two anonymous referees for their valuable comments and suggestions.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Mathematics and Institute of Natural Sciences, Shanghai Jiaotong University, Shanghai, China
  2. Department of Industrial Engineering and Operations Research, Columbia University, New York, USA
  3. Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, USA
