A flexible coordinate descent method

Abstract

We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block, and the model is minimized approximately (inexactly) to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and the acceptance of large step sizes. We present several high probability iteration complexity results showing that convergence of FCD is guaranteed theoretically. Finally, we present numerical results on large-scale problems to demonstrate the practical performance of the method.
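
To make the iteration just described concrete, here is a minimal Python sketch of a single FCD-style step. It is an illustration under simplifying assumptions rather than the authors' implementation: F, grad_block, block_hessian and model_solver are hypothetical placeholders for the composite objective, the partial gradient, the (approximate) partial curvature matrix and the inexact block-model solver, and the line search is a generic Armijo-type backtracking rather than the specific rule analysed in the paper.

```python
import numpy as np

def fcd_iteration(x, F, grad_block, block_hessian, model_solver, blocks, rng,
                  alpha0=1.0, backtrack=0.5, c=1e-4):
    """One FCD-style step (sketch only; the oracle names are hypothetical)."""
    S = blocks[rng.integers(len(blocks))]        # randomly sample a block of coordinates
    g_S = grad_block(x, S)                       # partial gradient over the block
    H_S = block_hessian(x, S)                    # (approximate) partial curvature matrix
    d_S = model_solver(x, S, g_S, H_S)           # approximate minimizer of the block model

    # Simple Armijo-type backtracking to enforce a monotonic decrease in F.
    F_x, alpha = F(x), alpha0
    while alpha > 1e-12:
        x_trial = x.copy()
        x_trial[S] = x_trial[S] + alpha * d_S
        if F(x_trial) <= F_x + c * alpha * (g_S @ d_S):
            return x_trial                       # accept the (possibly large) step
        alpha *= backtrack
    return x                                     # no acceptable step found; keep the iterate

# Example setup (hypothetical): blocks as index arrays and a NumPy random generator.
# rng = np.random.default_rng(0)
# blocks = [np.arange(i, i + 10) for i in range(0, 100, 10)]
```

The structure mirrors the abstract: randomness enters only through the block sampling, curvature information is used only on the sampled block, and a trial step is accepted only if the full objective decreases monotonically.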

Notes

  1.

    See [31, Equation (2)] or [37, Definition 13] for a precise definition of ‘partial separability’.

  2.

    An earlier version of this work, which, to the best of our knowledge, was the first to propose varying random block selection, is cited by [29].

  3.

    Notice that, at any particular iteration of FCD, either of the two options inside the ‘min’ in (56) could be the smaller one. However, \(\varLambda _{\max }\mathcal {R}^2(x_0)\) is fixed throughout the iterations, while \(F(x_k) - F^*\) decreases as the iterations progress (Lemma 1). Therefore, the first term inside the ‘min’ in (56) will eventually be the smaller one, i.e., eventually \(F(x_k) - F^* \le \varLambda _{\max }\mathcal {R}^2(x_0)\).

  4.

    We could simply have set \(\zeta = 1\) initially, so that \(\mu _f \le \varLambda _{\max }\zeta \) is satisfied by Assumption 8. However, we obtain a better complexity result by taking a smaller \(\zeta \).

  5.

    Using the notation of [32], we have \(n=N\). Further, although UCDC allows arbitrary probabilities in the smooth case, we restrict our attention to uniform probabilities only, so N appears in the C-S and SC-S results in Table 2.

  6.

    Clearly, there is a trade-off when selecting \(\rho \). The larger \(\rho \) is, the smaller the condition number of \(H_k\), so that (19) can be solved quickly using an iterative solver. However, if \(\rho \) is too large then \(H_k^S \approx \rho I \), so that essential second-order information from \(\nabla ^2 f(x_k)\) may be lost.
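
    To illustrate this trade-off numerically, the snippet below is a standalone sketch: H is a randomly generated stand-in for the block matrix \(H_k^S\), not data from the paper. As \(\rho \) grows, the condition number of \(H + \rho I\) shrinks, but the regularized matrix is increasingly dominated by the \(\rho I\) term, i.e., the curvature information is washed out.

```python
import numpy as np

# Hypothetical stand-in for a block curvature matrix H_k^S (not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
H = A.T @ A                                  # a positive semidefinite Gram matrix

for rho in [0.0, 1e-2, 1e0, 1e2]:
    H_reg = H + rho * np.eye(H.shape[0])     # the regularized block matrix
    cond = np.linalg.cond(H_reg)             # smaller for larger rho: easier iterative solves
    share = np.linalg.norm(H) / np.linalg.norm(H_reg)   # fraction of H_reg that is curvature
    print(f"rho = {rho:8.2g}   cond(H + rho I) = {cond:10.3e}   "
          f"||H|| / ||H + rho I|| = {share:.3f}")
```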

References

  1. Alart, P., Maisonneuve, O., Rockafellar, R.T.: Nonsmooth Mechanics and Analysis: Theoretical and Numerical Advances. Springer, New York (2006)

  2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)

  3. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for convex L-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)

  4. Candès, E.: Compressive sampling. In: Proceedings of the International Congress of Mathematicians, vol. 3, pp. 1433–1452, Madrid, Spain (2006)

  5. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  6. Cassioli, A., Di Lorenzo, D., Sciandrone, M.: On the convergence of inexact block coordinate descent methods for constrained optimization. Eur. J. Oper. Res. 231(2), 274–281 (2013)

  7. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  8. Daneshmand, A., Facchinei, F., Kungurtsev, V., Scutari, G.: Hybrid random/deterministic parallel algorithms for convex and nonconvex big data optimization. IEEE Trans. Signal Process. 63(15), 3914–3929 (2015)

  9. De Santis, M., Lucidi, S., Rinaldi, F.: A fast active set block coordinate descent algorithm for \(\ell _1\)-regularized least squares. SIAM J. Optim. 26(1), 781–809 (2016)

  10. Devolder, O., Glineur, F., Nesterov, Yu.: Intermediate gradient methods for smooth convex problems with inexact oracle. Technical report, CORE-2013017 (2013)

  11. Devolder, O., Glineur, F., Nesterov, Yu.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)

  12. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  13. Facchinei, F., Sagratella, S., Scutari, G.: Flexible parallel algorithms for big data optimization. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7208–7212 (2014)

  14. Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex big data optimization. IEEE Trans. Signal Process. 63(7), 1874–1889 (2015)

  15. Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

  16. Fountoulakis, K., Gondzio, J.: Performance of first- and second-order methods for \(\ell _1\)-regularized least squares problems. Comput. Optim. Appl. 65(3), 605–635 (2016)

  17. Fountoulakis, K., Gondzio, J.: A second-order method for strongly convex \(\ell _1\)-regularization problems. Math. Program. 156(1), 189–219 (2016)

  18. Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University (2003)

  19. Karimi, S., Vavasis, S.: IMRO: a proximal quasi-Newton method for solving \(l_1\)-regularized least square problem. SIAM J. Optim. 27(2), 583–615 (2017)

  20. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for convex optimization. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 827–835. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4740-proximal-newton-type-methods-for-convex-optimization.pdf

  21. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)

  22. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1), 615–642 (2014)

  23. Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)

  24. Nesterov, Yu.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)

  25. Nesterov, Yu.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  26. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research. Springer, New York (1999)

  27. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)

  28. Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput. 5(2), 143–169 (2013)

  29. Qu, Z., Richtárik, P., Takáč, M., Fercoq, O.: SDNA: stochastic dual Newton ascent for empirical risk minimization. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), vol. 48, pp. 1823–1832 (2016)

  30. Razaviyayn, M., Hong, M., Luo, Z.-Q., Pang, J.-S.: Parallel successive convex approximation for nonsmooth nonconvex optimization. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 1440–1448. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5609-parallel-successive-convex-approximation-for-nonsmooth-nonconvex-optimization.pdf

  31. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1), 433–484 (2016). https://doi.org/10.1007/s10107-015-0901-6

  32. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  33. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1–2), 495–529 (2016)

  34. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)

  35. Simon, N., Tibshirani, R.: Standardization and the group lasso penalty. Stat. Sin. 22(3), 983 (2012)

  36. Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144–176 (2016)

  37. Tappenden, R., Takáč, M., Richtárik, P.: On the complexity of parallel coordinate descent. Optim. Methods Softw. 33(2), 372–395 (2018)

  38. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)

  39. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)

  40. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140(3), 513–535 (2009)

  41. Tseng, P., Yun, S.: A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl. 47(2), 179–206 (2010)

  42. Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)

  43. Yuan, G.X., Ho, C.H., Lin, C.J.: Recent advances of large-scale linear classification. Proc. IEEE 100(9), 2584–2603 (2012)

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments and suggestions, which led to improvements of an earlier version of this paper.

Author information

Corresponding author

Correspondence to Rachael Tappenden.

About this article

Cite this article

Fountoulakis, K., Tappenden, R.: A flexible coordinate descent method. Comput. Optim. Appl. 70, 351–394 (2018). https://doi.org/10.1007/s10589-018-9984-3

Keywords

  • Large scale optimization
  • Second-order methods
  • Curvature information
  • Block coordinate descent
  • Nonsmooth problems
  • Iteration complexity
  • Randomized

Mathematics Subject Classification

  • 49M15
  • 49M37
  • 65K05
  • 90C06
  • 90C25
  • 90C53