
Computational Optimization and Applications, Volume 70, Issue 2, pp 351–394

A flexible coordinate descent method

  • Kimon Fountoulakis
  • Rachael Tappenden

Abstract

We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block, and the model is minimized approximately (inexactly) to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and acceptance of large step sizes. We present several high-probability iteration complexity results to show that convergence of FCD is guaranteed theoretically. Finally, we present numerical results on large-scale problems to demonstrate the practical performance of the method.
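To make the iteration described above concrete, here is a minimal sketch of an FCD-style loop for the composite problem min_x f(x) + ψ(x) with f(x) = ½‖Ax − b‖² and ψ(x) = λ‖x‖₁. The fixed block partition, the diagonal curvature model, and all function names are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fcd_sketch(A, b, lam, n_blocks=10, iters=500, seed=0):
    """Illustrative FCD-style loop for 0.5*||Ax - b||^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    blocks = np.array_split(np.arange(n), n_blocks)
    diag_H = np.sum(A * A, axis=0) + 1e-10   # diagonal curvature estimates
    r = A @ x - b                            # residual, kept up to date
    for _ in range(iters):
        S = blocks[rng.integers(n_blocks)]   # sample a block of coordinates
        g = A[:, S].T @ r                    # partial gradient on the block
        H = diag_H[S]                        # (approximate) partial curvature
        # Minimize the block quadratic model plus the l1 term; with a
        # diagonal model this has a closed-form soft-thresholding solution
        # (FCD itself only requires an approximate/inexact minimizer).
        d = soft_threshold(x[S] - g / H, lam / H) - x[S]
        # Backtracking line search to enforce a monotone decrease in f + psi.
        F = 0.5 * (r @ r) + lam * np.abs(x).sum()
        alpha = 1.0
        while alpha > 1e-8:
            step = alpha * d
            r_new = r + A[:, S] @ step
            F_new = 0.5 * (r_new @ r_new) + lam * (
                np.abs(x).sum() - np.abs(x[S]).sum()
                + np.abs(x[S] + step).sum())
            if F_new < F:                    # accept the trial step
                x[S] += step
                r = r_new
                break
            alpha *= 0.5
    return x
```

Because the model in this sketch is diagonal, the block subproblem is solved exactly; the method analyzed in the paper allows richer curvature matrices, in which case the subproblem is only solved approximately.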

Keywords

Large-scale optimization · Second-order methods · Curvature information · Block coordinate descent · Nonsmooth problems · Iteration complexity · Randomized

Mathematics Subject Classification

49M15 · 49M37 · 65K05 · 90C06 · 90C25 · 90C53

Notes

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments and suggestions, which led to improvements of an earlier version of this paper.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics and International Computer Science Institute, University of California, Berkeley, Berkeley, USA
  2. School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
