Abstract
Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.
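To make the basic mechanism concrete, here is a minimal illustrative sketch (not taken from the paper) of randomized coordinate descent applied to the least-squares problem \(\min_x \tfrac{1}{2}\|Ax-b\|^2\): at each iteration one coordinate is chosen at random and updated by exact minimization along that coordinate direction. The function name and problem instance are chosen for illustration only.

```python
import numpy as np

def coordinate_descent_lsq(A, b, iters=1000, seed=0):
    """Randomized coordinate descent for min_x 0.5 * ||Ax - b||^2.

    Each step picks one coordinate i at random and minimizes the
    objective exactly along that coordinate (a 1-D quadratic),
    maintaining the residual r = Ax - b incrementally so that each
    update costs only O(m).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                     # residual, updated incrementally
    col_sq = np.sum(A**2, axis=0)     # per-coordinate curvature ||A_i||^2
    for _ in range(iters):
        i = rng.integers(n)
        g = A[:, i] @ r               # partial derivative of the objective w.r.t. x_i
        step = -g / col_sq[i]         # exact minimizer along coordinate i
        x[i] += step
        r += step * A[:, i]           # O(m) residual update
    return x

# Small overdetermined example whose least-squares solution is (1, 1).
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])
x = coordinate_descent_lsq(A, b)
```

The incremental residual update is what makes coordinate descent cheap per iteration relative to a full gradient step; the accelerated and parallel variants surveyed in the paper build on this same per-coordinate structure.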
Acknowledgments
I thank Ji Liu for the pleasure of collaborating with him on this topic over the past two years. I am grateful to the editors and referees of the paper, whose expert and constructive comments led to numerous improvements.
The author was supported by NSF Awards DMS-1216318 and IIS-1447449, ONR Award N00014-13-1-0129, AFOSR Award FA9550-13-1-0138, and Subcontract 3F-30222 from Argonne National Laboratory.
Wright, S.J. Coordinate descent algorithms. Math. Program. 151, 3–34 (2015). https://doi.org/10.1007/s10107-015-0892-3