Abstract
In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an \(\varepsilon\)-accurate solution with probability at least \(1-\rho\) in at most \(O((n/\varepsilon) \log (1/\rho))\) iterations, where \(n\) is the number of blocks. This extends recent results of Nesterov (SIAM J Optim 22(2):341–362, 2012), which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing \(\varepsilon\) from the logarithmic term. More importantly, in contrast with the aforementioned work, in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving the first true iteration complexity bounds. For strongly convex functions the method converges linearly. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale \(\ell_1\)-regularized least squares problems with a billion variables.
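To make the setting of the abstract concrete, the following is a minimal sketch, our illustration rather than the authors' implementation or the code used in the experiments, of randomized coordinate descent with a proximal (soft-thresholding) step for the \(\ell_1\)-regularized least squares problem \(\min_x \frac{1}{2}\Vert Ax-b\Vert_2^2 + \lambda \Vert x\Vert_1\). It assumes blocks of size one, uniform sampling probabilities, the standard Euclidean norm, and a dense NumPy matrix; the names `rcdm_lasso` and `soft_threshold` are ours, not from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rcdm_lasso(A, b, lam, n_iters=10000, seed=0):
    """Sketch of randomized coordinate descent for
    (1/2)||Ax - b||^2 + lam * ||x||_1 with blocks of size one."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    residual = -b.astype(float).copy()        # residual = A @ x - b (x starts at 0)
    col_norms = (A ** 2).sum(axis=0)          # per-coordinate Lipschitz constants L_i
    for _ in range(n_iters):
        i = rng.integers(n)                   # pick a coordinate uniformly at random
        if col_norms[i] == 0.0:
            continue
        grad_i = A[:, i] @ residual           # i-th partial derivative of the smooth part
        x_new = soft_threshold(x[i] - grad_i / col_norms[i], lam / col_norms[i])
        if x_new != x[i]:
            residual += (x_new - x[i]) * A[:, i]   # cheap in-place residual update
            x[i] = x_new
    return x
```

Maintaining the residual \(Ax-b\) explicitly keeps the cost of one iteration proportional to the length of a single column of \(A\) (or, for sparse data, to its number of nonzeros), which is what makes coordinate descent viable at the billion-variable scale reported in the paper.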
Notes
A function \(F: \mathbb{R}^N \rightarrow \mathbb{R}\) is isotone if \(x \ge y\) implies \(F(x) \ge F(y)\).
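For example (a simple illustration added here, not taken from the paper), if \(c \in \mathbb{R}^N\) has nonnegative entries, then \(F(x) = c^T x\) is isotone: \(x \ge y\) implies \(F(x) - F(y) = c^T (x - y) \ge 0\).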
Note that in [12] Nesterov considered the composite setting and developed standard and accelerated gradient methods with iteration complexity guarantees for minimizing composite objective functions. These can be viewed as block coordinate descent methods with a single block.
This will not be the case for certain types of matrices, such as those arising from wavelet bases or FFT.
References
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific (1999)
Canutescu, A.A., Dunbrack, R.L.: Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci. 12, 963–972 (2003)
Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: Coordinate descent method for large-scale \(l_2\)-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: A Note on the Group Lasso and a Sparse Group Lasso. Technical report (2010)
Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: ICML 2008, pp. 408–415 (2008)
Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res. 35(3), 641–654 (2010)
Lewis, A.S., Wright, S.J.: A Proximal Method for Composite Minimization. Technical report (2008)
Li, Y., Osher, S.: Coordinate descent optimization for \(l_1\) minimization with application to compressed sensing: a greedy algorithm. Inverse Probl. Imaging 3, 487–503 (2009)
Luo, Z.-Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)
Meier, L., Van De Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Springer, Netherlands (2004)
Nesterov, Y.: Gradient Methods for Minimizing Composite Objective Function. CORE Discussion Paper 2007/76, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)
Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient Block-Coordinate Descent Algorithms for the Group Lasso. Technical report (2010)
Richtárik, P., Takáč, M.: Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function. In: 4th Workshop on Signal Processing with Adaptive Sparse Structured Representations (2011)
Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent method for huge-scale truss topology design. In: Operations Research Proceedings 2011, pp. 27–32. Springer (2012)
Saha, A., Tewari, A.: On the Finite Time Convergence of Cyclic Coordinate Descent Methods. CoRR, abs/1005.2146 (2010)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(l_1\) regularized loss minimization. In: Proceedings of the 26th International Conference on Machine Learning (2009)
Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Tseng, P., Yun, S.: A block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513–535 (2009). doi:10.1007/s10957-008-9458-3
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B, 117, 387–423 (2009)
Wen, Z., Goldfarb, D., Scheinberg, K.: Block coordinate descent methods for semidefinite programming. In: Anjos, M.F., Lasserre, J.B. (eds.) Handbook on Semidefinite, Conic and Polynomial Optimization: Theory, Algorithms, Software and Applications. Springer (forthcoming)
Wright, S.J.: Accelerated Block-Coordinate Relaxation for Regularized Optimization. Technical report, University of Wisconsin (2010)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)
Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale \(l_1\)-regularized linear classification. J. Mach. Learn. Res. 11(1), 3183–3234 (2010)
Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent Advances of Large-Scale Linear Classification. Technical report (2012)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)
Yun, S., Toh, K.-C.: A coordinate gradient descent method for \(l_1\)-regularized convex minimization. Comput. Optim. Appl. 48, 273–307 (2011)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)
Acknowledgments
We thank the anonymous referees and Hui Zhang (National University of Defense Technology, China) for useful comments that helped to improve the manuscript.
Additional information
An extended abstract of a preliminary version of this paper appeared in [15]. The work of the first author was supported in part by EPSRC grant EP/I017127/1 "Mathematics for vast digital resources". The second author was supported in part by the Centre for Numerical Algorithms and Intelligent Software (funded by EPSRC grant EP/G036136/1 and the Scottish Funding Council).
About this article
Cite this article
Richtárik, P., Takáč, M. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144, 1–38 (2014). https://doi.org/10.1007/s10107-012-0614-z
Keywords
- Block coordinate descent
- Huge-scale optimization
- Composite minimization
- Iteration complexity
- Convex optimization
- LASSO
- Sparse regression
- Gradient descent
- Coordinate relaxation
- Gauss–Seidel method