
Markov chain block coordinate descent

Published in Computational Optimization and Applications

Abstract

Block coordinate gradient descent (BCD) is a powerful method for large-scale optimization. This paper considers a BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision processes, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex, we establish a sublinear convergence rate, and when they are strongly convex, a linear rate. We also present a Markov chain inertial BCD method. Finally, we discuss potential applications.
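To make the selection rule concrete, the following is a minimal Python sketch of what a Markov chain BCD iteration could look like for a least-squares objective f(x) = 0.5*||Ax - b||^2. It is an illustration under assumptions of our own, not the authors' implementation: the function name markov_chain_bcd, the partition into four coordinate blocks, the lazy random walk on a ring, and the conservative step size 1/||A||_2^2 are all choices made for this example.

import numpy as np

def markov_chain_bcd(A, b, P, blocks, num_iters=1000, step=None, seed=0):
    """Minimize 0.5*||Ax - b||^2, updating one coordinate block per iteration;
    the block index follows the Markov chain with transition matrix P."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    if step is None:
        # conservative step: 1 / (Lipschitz constant of the full gradient)
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    j = rng.integers(len(blocks))                 # initial state of the chain
    for _ in range(num_iters):
        idx = blocks[j]                           # coordinates in the current block
        grad_block = A[:, idx].T @ (A @ x - b)    # partial gradient w.r.t. that block
        x[idx] -= step * grad_block               # block gradient step
        j = rng.choice(len(blocks), p=P[j])       # next block drawn from row j of P
    return x

# Example: 4 blocks on a ring; a lazy walk keeps the chain irreducible and aperiodic.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((40, 20)), rng.standard_normal(40)
    blocks = np.array_split(np.arange(20), 4)
    P = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])
    x = markov_chain_bcd(A, b, P, blocks, num_iters=5000)
    print("least-squares residual:", np.linalg.norm(A @ x - b))

Because each block can hand off only to itself or its ring neighbors, the sequence of selected blocks is neither i.i.d. uniform nor cyclic, which is the selection regime the paper analyzes.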


Notes

  1. The time-homogeneous, irreducible, and aperiodic Markov chain is widely used; however, in practical problems, the Markov chain may not satisfy the time-homogeneity assumption. For example, in a mobile network whose connectivity structure changes over time, the set of neighbors of an agent is time-varying [9].

  2. The authors of [2, Corollary 3.8] state this result in terms of epochs, whereas here we state the rate in terms of iterations. Their result is therefore multiplied by N for the comparison, as illustrated below.
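For instance (using a hypothetical constant C rather than the exact expression in [2]), if an epoch-based bound has the form

$$ f(x^{t}) - f^{\star} \le \frac{C}{t}, $$

where t counts epochs and each epoch performs N block updates, then writing k = Nt for the iteration count gives

$$ f(x^{k}) - f^{\star} \le \frac{CN}{k}, $$

which is the per-iteration form used for the comparison.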

References

  1. Allen-Zhu, Z., Qu, Z., Richtárik, P., Yuan, Y.: Even faster accelerated coordinate descent using non-uniform sampling. In: International Conference on Machine Learning (ICML), pp. 1110–1119 (2016)

  2. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)

  3. Bradley, R.C.: Basic properties of strong mixing conditions—a survey and some open questions. Probab. Surv. 2, 107–144 (2005)

  4. Brucker, P., Drexl, A., Möhring, R., Neumann, K., Pesch, E.: Resource-constrained project scheduling: notation, classification, models, and methods. Eur. J. Oper. Res. 112(1), 3–41 (1999)

  5. Chow, Y.T., Wu, T., Yin, W.: Cyclic coordinate-update algorithms for fixed-point problems: analysis and applications. SIAM J. Sci. Comput. 39(4), A1280–A1300 (2017)

  6. Dang, C., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)

  7. Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)

  8. Hannah, R., Feng, F., Yin, W.: A2BCD: asynchronous acceleration with optimal complexity. In: International Conference on Learning Representations (ICLR), New Orleans, LA (2019)

  9. Johansson, B., Rabi, M., Johansson, M.: A simple peer-to-peer algorithm for distributed optimization in sensor networks. In: 2007 46th IEEE Conference on Decision and Control, pp. 4705–4710. IEEE (2007)

  10. Lee, Y.T., Sidford, A.: Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, pp. 147–156. IEEE (2013)

  11. Li, Y., Osher, S.: Coordinate descent optimization for \(\ell ^1\) minimization with application to compressed sensing; a greedy algorithm. Inverse Probl. Imaging 3(3), 487–503 (2009)

  12. Li, Z., Uschmajew, A., Zhang, S.: On convergence of the maximum block improvement method. SIAM J. Optim. 25(1), 210–233 (2015)

  13. Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351–376 (2015)

  14. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, Berlin (2012)

  15. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  16. Nutini, J., Schmidt, M., Laradji, I.H., Friedlander, M., Koepke, H.: Coordinate descent converges faster with the Gauss–Southwell rule than random selection. In: International Conference on Machine Learning (ICML), pp. 1632–1641 (2015)

  17. Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)

  18. Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)

  19. Peng, Z., Xu, Y., Yan, M., Yin, W.: On the convergence of asynchronous parallel iteration with arbitrary delays. J. Oper. Res. Soc. China 1(1), 5–42 (2019)

  20. Peng, Z., Yan, M., Yin, W.: Parallel and distributed sparse optimization. In: 2013 Asilomar Conference On Signals, Systems and Computers, pp. 659–646. IEEE (2013)

  21. Ram, S.S., Nedić, A., Veeravalli, V.V.: Incremental stochastic subgradient algorithms for convex optimization. SIAM J. Optim. 20(2), 691–717 (2009)

  22. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1–2), 433–484 (2016)

  23. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer, Berlin (2009)

  24. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1-regularized loss minimization. J. Mach. Learn. Res. 12(Jun), 1865–1892 (2011)

  25. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14(Feb), 567–599 (2013)

  26. Sun, R., Hong, M.: Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. In: Advances in Neural Information Processing Systems, pp. 1306–1314 (2015)

  27. Sun, T., Hannah, R., Yin, W.: Asynchronous coordinate descent under more realistic assumptions. In: Advances in Neural Information Processing Systems, pp. 6183–6191 (2017)

  28. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  29. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1–2), 387–423 (2009)

  30. Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)

  31. Xu, Y., Yin, W.: Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J. Optim. 25(3), 1686–1716 (2015)

  32. Yin, W., Mao, X., Yuan, K., Gu, Y., Sayed, A.H.: A communication-efficient random-walk algorithm for decentralized optimization. arXiv preprint arXiv:1804.06568 (2018)

Author information

Corresponding author

Correspondence to Wotao Yin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work by Y. Sun and W. Yin was supported in part by NSF DMS-1720237 and ONR N0001417121. The work by Y. Xu was supported in part by NSF DMS-1719549 and an IBM Grant.

About this article

Cite this article

Sun, T., Sun, Y., Xu, Y. et al. Markov chain block coordinate descent. Comput Optim Appl 75, 35–61 (2020). https://doi.org/10.1007/s10589-019-00140-7
