Mathematical Programming

Volume 144, Issue 1–2, pp 1–38

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

Full Length Paper · Series A

Abstract

In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an \(\varepsilon\)-accurate solution with probability at least \(1-\rho\) in at most \(O((n/\varepsilon) \log (1/\rho))\) iterations, where \(n\) is the number of blocks. This extends recent results of Nesterov (SIAM J Optim 22(2): 341–362, 2012), which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing \(\varepsilon\) from the logarithmic term. More importantly, in contrast with the aforementioned work, in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus obtaining the first true iteration complexity bounds. For strongly convex functions the method converges linearly. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale \(\ell_1\)-regularized least squares problems with a billion variables.
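To make the setting concrete, here is a minimal sketch of randomized coordinate descent applied to the \(\ell_1\)-regularized least squares (LASSO) problem mentioned above, with one block per coordinate and uniform sampling; the function names and parameters are illustrative assumptions, not the paper's exact algorithm or notation.

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t*|.|: shrink v toward zero by t.
        return np.sign(v) * max(abs(v) - t, 0.0)

    def rcdm_lasso(A, b, lam, n_iters, seed=0):
        # Hypothetical sketch: randomized coordinate descent for
        # F(x) = 0.5*||A x - b||^2 + lam*||x||_1.
        # Assumes A is a dense array with no zero columns.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.zeros(n)
        r = -b.astype(float)               # residual r = A x - b (since x = 0)
        L = (A ** 2).sum(axis=0)           # coordinate Lipschitz constants ||a_i||^2
        for _ in range(n_iters):
            i = rng.integers(n)            # sample a coordinate uniformly at random
            g = A[:, i] @ r                # partial derivative of the smooth part
            x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
            r += (x_new - x[i]) * A[:, i]  # keep the residual consistent in O(m)
            x[i] = x_new
        return x

Each iteration touches a single column of A and updates the residual incrementally, so its cost is O(m) (or proportional to the number of nonzeros in that column for sparse data); this per-iteration cheapness is what makes experiments at the billion-variable scale feasible.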

Keywords

Block coordinate descent · Huge-scale optimization · Composite minimization · Iteration complexity · Convex optimization · LASSO · Sparse regression · Gradient descent · Coordinate relaxation · Gauss–Seidel method

Mathematics Subject Classification (2000)

65K05 · 90C05 · 90C06 · 90C25

References

  1. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific (1999)
  2. Canutescu, A.A., Dunbrack, R.L.: Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci. 12, 963–972 (2003)
  3. Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: Coordinate descent method for large-scale \(l_2\)-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)
  4. Friedman, J., Hastie, T., Tibshirani, R.: A Note on the Group Lasso and a Sparse Group Lasso. Technical report (2010)
  5. Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning (ICML), pp. 408–415 (2008)
  6. Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res. 35(3), 641–654 (2010)
  7. Lewis, A.S., Wright, S.J.: A Proximal Method for Composite Minimization. Technical report (2008)
  8. Li, Y., Osher, S.: Coordinate descent optimization for \(l_1\) minimization with application to compressed sensing: a greedy algorithm. Inverse Probl. Imaging 3, 487–503 (2009)
  9. Luo, Z.-Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)
  10. Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
  11. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Springer, Netherlands (2004)
  12. Nesterov, Y.: Gradient Methods for Minimizing Composite Objective Function. CORE Discussion Paper #2007/76, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)
  13. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
  14. Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient Block-Coordinate Descent Algorithms for the Group Lasso. Technical report (2010)
  15. Richtárik, P., Takáč, M.: Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function. In: 4th Workshop on Signal Processing with Adaptive Sparse Structured Representations (2011)
  16. Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent method for huge-scale truss topology design. In: Operations Research Proceedings 2011, pp. 27–32. Springer (2012)
  17. Saha, A., Tewari, A.: On the Finite Time Convergence of Cyclic Coordinate Descent Methods. CoRR abs/1005.2146 (2010)
  18. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(l_1\) regularized loss minimization. In: Proceedings of the 26th International Conference on Machine Learning (2009)
  19. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)
  20. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
  21. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
  22. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513–535 (2009). doi:10.1007/s10957-008-9458-3
  23. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)
  24. Wen, Z., Goldfarb, D., Scheinberg, K.: Block coordinate descent methods for semidefinite programming. In: Anjos, M.F., Lasserre, J.B. (eds.) Handbook on Semidefinite, Conic and Polynomial Optimization: Theory, Algorithms, Software and Applications. Springer (forthcoming)
  25. Wright, S.J.: Accelerated Block-Coordinate Relaxation for Regularized Optimization. Technical report, University of Wisconsin (2010)
  26. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
  27. Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)
  28. Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale \(l_1\)-regularized linear classification. J. Mach. Learn. Res. 11, 3183–3234 (2010)
  29. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent Advances of Large-Scale Linear Classification. Technical report (2012)
  30. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)
  31. Yun, S., Toh, K.-C.: A coordinate gradient descent method for \(l_1\)-regularized convex minimization. Comput. Optim. Appl. 48, 273–307 (2011)
  32. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2012

Authors and Affiliations

School of Mathematics, University of Edinburgh, Edinburgh, UK
