# A flexible coordinate descent method

## Abstract

We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block, and the model is minimized approximately (inexactly) to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and the acceptance of large step sizes. We present several high-probability iteration complexity results showing that convergence of FCD is theoretically guaranteed. Finally, we present numerical results on large-scale problems to demonstrate the practical performance of the method.
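
Schematically, one FCD-style iteration follows a sample → model → inexact solve → line search structure. The sketch below is a minimal illustration only, assuming a composite objective `F` consisting of a smooth part (with gradient `grad_f`) plus a blockwise-separable nonsmooth part handled through a user-supplied `prox_g_block`; the uniform block sampling, the diagonal curvature placeholder, and the simple halving line search are illustrative assumptions rather than the exact rules analysed in the paper.

```python
import numpy as np

def fcd_sketch(grad_f, prox_g_block, F, x0, n_iters=100, block_size=10, rho=1e-3):
    """Schematic FCD-style loop; every ingredient is a simplified stand-in."""
    x = x0.copy()
    N = x0.size
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # 1. Sample a block S of coordinates at random.
        S = rng.choice(N, size=min(block_size, N), replace=False)
        # 2. Quadratic model over the block: block gradient plus a damped
        #    diagonal curvature estimate h_S (a placeholder for H_k^S + rho*I).
        g_S = grad_f(x)[S]
        h_S = np.ones(S.size) + rho
        # 3. Minimize the model approximately: a single proximal step on the
        #    block stands in for the inexact subproblem solve that yields t_S.
        t_S = prox_g_block(x[S] - g_S / h_S, 1.0 / h_S, S) - x[S]
        # 4. Inexpensive backtracking line search: try alpha = 1 first so that
        #    large steps can be accepted; halve alpha until F decreases.
        alpha = 1.0
        x_trial = x.copy()
        x_trial[S] = x[S] + alpha * t_S
        while F(x_trial) > F(x) and alpha > 1e-10:
            alpha *= 0.5
            x_trial[S] = x[S] + alpha * t_S
        if F(x_trial) < F(x):  # accept only if the decrease is monotonic
            x = x_trial
    return x
```

In the paper, the curvature block, the inexactness tolerance for the model subproblem, and the line-search acceptance rule are all specified so that the high-probability complexity bounds hold; the sketch above only preserves the overall structure of the iteration.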


## Notes

1. See [31, Equation (2)] or [37, Definition 13] for a precise definition of ‘partial separability’.

2. An earlier version of this work, which, to the best of our knowledge, was the first to propose varying random block selection, is cited by [29].

3. Notice that, at any particular iteration of FCD, either of the two options inside the ‘min’ in (56) could be the smaller. However, $$\varLambda _{\max }\mathcal {R}^2(x_0)$$ is fixed throughout the iterations, while $$F(x_k) - F^*$$ decreases as the iterations progress by Lemma 1. Therefore, eventually the first term inside the ‘min’ in (56) will be the smaller, i.e., $$F(x_k) - F^* \le \varLambda _{\max }\mathcal {R}^2(x_0)$$.

4. We could simply have set $$\zeta = 1$$ initially, so that $$\mu _f \le \varLambda _{\max }\zeta$$ is satisfied by Assumption 8. However, we obtain a better complexity result by taking a smaller $$\zeta$$.

5. Using the notation of [32], we have $$n=N$$. Further, although UCDC allows arbitrary probabilities in the smooth case, we restrict our attention to uniform probabilities only, so N appears in the C-S and SC-S results in Table 2.

6. Clearly, there is a trade-off when selecting $$\rho$$: the larger $$\rho$$ is, the smaller the condition number of $$H_k$$, so that (19) can be solved quickly using an iterative solver. However, if $$\rho$$ is too large then $$H_k^S \approx \rho I$$, so that essential second-order information from $$\nabla ^2 f(x_k)$$ may be lost; a small numerical illustration is given below.
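
A minimal numerical illustration of this trade-off follows (the matrix `H` below is a random rank-deficient stand-in for a curvature block, not data from the paper): increasing $$\rho$$ drives the condition number of $$H + \rho I$$ towards 1, while simultaneously making the regularized matrix closer to a multiple of the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
H = A @ A.T  # symmetric PSD and rank deficient, so H itself is singular

for rho in [1e-4, 1e-2, 1e0, 1e2]:
    H_reg = H + rho * np.eye(50)
    # Larger rho => smaller condition number, so an iterative solver applied to
    # the regularized system converges in fewer iterations; but for very large
    # rho, H_reg ~ rho*I and the curvature information carried by H is lost.
    print(f"rho = {rho:>8.4g}   cond(H + rho*I) = {np.linalg.cond(H_reg):.3e}")
```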

## References

1. Alart, P., Maisonneuve, O., Rockafellar, R.T.: Nonsmooth Mechanics and Analysis: Theoretical and Numerical Advances. Springer, New York (2006)
2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)
3. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for convex L-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)
4. Candès, E.: Compressive sampling. In: International Congress of Mathematicians, vol. 3, pp. 1433–1452, Madrid, Spain (2006)
5. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
6. Cassioli, A., Di Lorenzo, D., Sciandrone, M.: On the convergence of inexact block coordinate descent methods for constrained optimization. Eur. J. Oper. Res. 231(2), 274–281 (2013)
7. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
8. Daneshmand, A., Facchinei, F., Kungurtsev, V., Scutari, G.: Hybrid random/deterministic parallel algorithms for convex and nonconvex big data optimization. IEEE Trans. Signal Process. 63(15), 3914–3929 (2015)
9. De Santis, M., Lucidi, S., Rinaldi, F.: A fast active set block coordinate descent algorithm for $$\ell _1$$-regularized least squares. SIAM J. Optim. 26(1), 781–809 (2016)
10. Devolder, O., Glineur, F., Nesterov, Yu.: Intermediate gradient methods for smooth convex problems with inexact oracle. Technical report, CORE-2013017 (2013)
11. Devolder, O., Glineur, F., Nesterov, Yu.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)
12. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
13. Facchinei, F., Sagratella, S., Scutari, G.: Flexible parallel algorithms for big data optimization. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7208–7212 (2014)
14. Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex big data optimization. IEEE Trans. Signal Process. 63(7), 1874–1889 (2015)
15. Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)
16. Fountoulakis, K., Gondzio, J.: Performance of first- and second-order methods for $$\ell _1$$-regularized least squares problems. Comput. Optim. Appl. 65(3), 605–635 (2016)
17. Fountoulakis, K., Gondzio, J.: A second-order method for strongly convex $$\ell _1$$-regularization problems. Math. Program. 156(1), 189–219 (2016)
18. Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University (2003)
19. Karimi, S., Vavasis, S.: IMRO: a proximal quasi-Newton method for solving $$l_1$$-regularized least square problem. SIAM J. Optim. 27(2), 583–615 (2017)
20. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for convex optimization. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 827–835. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4740-proximal-newton-type-methods-for-convex-optimization.pdf
21. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)
22. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1), 615–642 (2014)
23. Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)
24. Nesterov, Yu.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)
25. Nesterov, Yu.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
26. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research. Springer, New York (1999)
27. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)
28. Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput. 5(2), 143–169 (2013)
29. Qu, Z., Richtárik, P., Takáč, M., Fercoq, O.: SDNA: stochastic dual Newton ascent for empirical risk minimization. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), vol. 48, pp. 1823–1832 (2016)
30. Razaviyayn, M., Hong, M., Luo, Z.-Q., Pang, J.-S.: Parallel successive convex approximation for nonsmooth nonconvex optimization. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 1440–1448. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5609-parallel-successive-convex-approximation-for-nonsmooth-nonconvex-optimization.pdf
31. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1), 433–484 (2016). https://doi.org/10.1007/s10107-015-0901-6
32. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
33. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1–2), 495–529 (2016)
34. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)
35. Simon, N., Tibshirani, R.: Standardization and the group lasso penalty. Stat. Sin. 22(3), 983 (2012)
36. Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144–176 (2016)
37. Tappenden, R., Takáč, M., Richtárik, P.: On the complexity of parallel coordinate descent. Optim. Methods Softw. 33(2), 372–395 (2018)
38. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
39. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)
40. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140(3), 513–535 (2009)
41. Tseng, P., Yun, S.: A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl. 47(2), 179–206 (2010)
42. Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)
43. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent advances of large-scale linear classification. Proc. IEEE 100(9), 2584–2603 (2012)

## Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments and suggestions, which led to improvements of an earlier version of this paper.

## Author information

### Authors

K. Fountoulakis and R. Tappenden

### Corresponding author

Correspondence to Rachael Tappenden.

## Cite this article

Fountoulakis, K., Tappenden, R. A flexible coordinate descent method. Comput Optim Appl 70, 351–394 (2018). https://doi.org/10.1007/s10589-018-9984-3


### Keywords

• Large scale optimization
• Second-order methods
• Curvature information
• Block coordinate descent
• Nonsmooth problems
• Iteration complexity
• Randomized

### Mathematics Subject Classification

• 49M15
• 49M37
• 65K05
• 90C06
• 90C25
• 90C53