Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an \(\varepsilon\)-accurate solution with probability at least \(1-\rho\) in at most \(O((n/\varepsilon) \log (1/\rho))\) iterations, where \(n\) is the number of blocks. This extends recent results of Nesterov (SIAM J Optim 22(2): 341–362, 2012), which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing \(\varepsilon\) from the logarithmic term. More importantly, in contrast with the aforementioned work, in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving the first true iteration complexity bounds. For strongly convex functions the method converges linearly. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale \(\ell_1\)-regularized least squares problems with a billion variables.
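
To make the setting concrete, here is a minimal sketch of randomized coordinate descent applied to \(\ell_1\)-regularized least squares, \(\min_x \tfrac{1}{2}\Vert Ax-b\Vert^2 + \lambda \Vert x\Vert_1\). It is an illustration only, not the authors' implementation. It assumes blocks of size one, uniform sampling of coordinates, and step sizes taken from the squared column norms of \(A\); the function names (`soft_threshold`, `lasso_objective`, `randomized_cd_lasso`) are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(A, b, x, lam):
    """Evaluate 0.5*||Ax - b||^2 + lam*||x||_1 for dense lists."""
    residual = [sum(a * v for a, v in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def randomized_cd_lasso(A, b, lam, iters, seed=0):
    """Randomized coordinate descent sketch (assumed: uniform sampling,
    one coordinate per block, Lipschitz constant L_j = ||A[:, j]||^2)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    lipschitz = [sum(v * v for v in col) for col in cols]
    x = [0.0] * n
    residual = [-b_i for b_i in b]  # equals Ax - b at x = 0
    for _ in range(iters):
        j = rng.randrange(n)  # pick a coordinate uniformly at random
        if lipschitz[j] == 0.0:
            continue  # zero column: the smooth part ignores x[j]
        # Partial derivative of the smooth part with respect to x[j].
        grad_j = sum(c * r for c, r in zip(cols[j], residual))
        # Gradient step on coordinate j followed by the l1 prox.
        new_xj = soft_threshold(x[j] - grad_j / lipschitz[j],
                                lam / lipschitz[j])
        delta = new_xj - x[j]
        if delta != 0.0:
            # Keep the residual Ax - b in sync in O(m) time.
            for i in range(m):
                residual[i] += delta * cols[j][i]
            x[j] = new_xj
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [3.0, -0.5]
    x = randomized_cd_lasso(A, b, lam=1.0, iters=50)
    print(x, lasso_objective(A, b, x, 1.0))
```

Each iteration touches a single column of \(A\) and updates the residual incrementally, which is what keeps the per-iteration cost low when \(A\) is sparse. The dense list-of-lists storage here is chosen only for readability.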

Notes

  1. A function \(F: \mathbb{R}^N\rightarrow \mathbb{R}\) is isotone if \(x\ge y\) implies \(F(x)\ge F(y)\).

  2. Note that in [12] Nesterov considered the composite setting and developed standard and accelerated gradient methods with iteration complexity guarantees for minimizing composite objective functions. These can be viewed as block coordinate descent methods with a single block.

  3. This will not be the case for certain types of matrices, such as those arising from wavelet bases or FFT.

  4. There are various theoretical results on the identification of active manifolds explaining numerical observations of this type; see [7] and the references therein. See also [28].

  5. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.

  6. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.

References

  1. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific (1999)

  2. Canutescu, A.A., Dunbrack, R.L.: Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci. 12, 963–972 (2003)

  3. Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: Coordinate descent method for large-scale \(l_2\)-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)

  4. Friedman, J., Hastie, T., Tibshirani, R.: A Note on the Group Lasso and a Sparse Group Lasso. Technical report (2010)

  5. Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Sathiya Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: ICML 2008, pp. 408–415 (2008)

  6. Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res. 35(3), 641–654 (2010)

  7. Lewis, A.S., Wright, S.J.: A Proximal Method for Composite Minimization. Technical report (2008)

  8. Li, Y., Osher, S.: Coordinate descent optimization for \(l_1\) minimization with application to compressed sensing: a greedy algorithm. Inverse Probl. Imaging 3, 487–503 (2009)

  9. Luo, Z.Q., Tseng, P.: A coordinate gradient descent method for nonsmooth separable minimization. J. Optim. Theory Appl. 72(1) (2002)

  10. Meier, L., Van De Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)

  11. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Springer, Netherlands (2004)

  12. Nesterov, Y.: Gradient Methods for Minimizing Composite Objective Function. Core discussion paper \(\#\) 2007/76, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)

  13. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)

  14. Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient Block-Coordinate Descent Algorithms for the Group Lasso. Technical report (2010)

  15. Richtárik, P., Takáč, M.: Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function. In: 4th Workshop on Signal Processing with Adaptive Sparse Structured Representations (2011)

  16. Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent method for huge-scale truss topology design. In: Operations Research Proceedings 2011, pp. 27–32. Springer (2012)

  17. Saha, A., Tewari, A.: On the Finite Time Convergence of Cyclic Coordinate Descent Methods. CoRR, abs/1005.2146 (2010)

  18. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(l_1\) regularized loss minimization. In: Proceedings of the 26th International Conference on Machine Learning (2009)

  19. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)

  20. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 268–288 (1996)

  21. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)

  22. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513–535 (2009). doi:10.1007/s10957-008-9458-3

  23. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)

  24. Wen, Z., Goldfarb, D., Scheinberg, K.: Block coordinate descent methods for semidefinite programming. In: Anjos, M.F., Lasserre, J.B. (eds.) Handbook on Semidefinite, Cone and Polynomial Optimization: Theory, Algorithms, Software and Applications. Springer (forthcoming)

  25. Wright, S.J.: Accelerated Block-Coordinate Relaxation for Regularized Optimization. Technical report, University of Wisconsin (2010)

  26. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  27. Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)

  28. Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale \(l_1\)-regularized linear classification. J. Mach. Learn. Res. 11(1), 3183–3234 (2010)

  29. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent Advances of Large-Scale Linear Classification. Technical report (2012)

  30. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)

  31. Yun, S., Toh, K.-C.: A coordinate gradient descent method for \(l_1\)-regularized convex minimization. Comput. Optim. Appl. 48, 273–307 (2011)

  32. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)

Acknowledgments

We thank anonymous referees and Hui Zhang (National University of Defense Technology, China) for useful comments that helped to improve the manuscript.

Author information

Corresponding author

Correspondence to Peter Richtárik.

Additional information

An extended abstract of a preliminary version of this paper appeared in [15]. The work of the first author was supported in part by EPSRC grant EP/I017127/1 “Mathematics for vast digital resources”. The second author was supported in part by the Centre for Numerical Algorithms and Intelligent Software (funded by EPSRC grant EP/G036136/1 and the Scottish Funding Council).

About this article

Cite this article

Richtárik, P., Takáč, M. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144, 1–38 (2014). https://doi.org/10.1007/s10107-012-0614-z

  • DOI: https://doi.org/10.1007/s10107-012-0614-z
