Abstract
Block coordinate gradient descent (BCD) is a powerful method for large-scale optimization. This paper considers a BCD method that successively updates a series of blocks selected according to a Markov chain. This block selection is neither i.i.d. random nor cyclic; rather, it is a natural choice for some applications in distributed optimization and Markov decision processes, where i.i.d. random and cyclic selections are either infeasible or very expensive. By exploiting mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz-differentiable functions, which may be nonconvex. When the functions are convex or strongly convex, we establish sublinear or linear convergence rates, respectively. We also present a Markov chain inertial BCD method. Finally, we discuss potential applications.
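To make the idea concrete, here is a minimal sketch of Markov chain BCD on a toy quadratic objective. The function name mc_bcd, the quadratic f(x) = 0.5*||Ax - b||^2, and the lazy ring-walk chain are our own illustrative choices, not the paper's implementation.

```python
# Minimal sketch of Markov chain BCD (MC-BCD), assuming a toy quadratic
# objective f(x) = 0.5*||Ax - b||^2 partitioned into equal-size blocks.
# Names and parameter choices are illustrative, not from the paper.
import numpy as np

def mc_bcd(A, b, P, n_blocks, n_iters=5000, seed=0):
    """Minimize 0.5*||Ax - b||^2 by updating one block per iteration,
    with blocks chosen by a Markov chain with row-stochastic transition
    matrix P -- selection that is neither i.i.d. random nor cyclic."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    size = n // n_blocks                      # assume n divides evenly
    # conservative step size: 1 / (largest block Lipschitz constant)
    step = 1.0 / max(np.linalg.norm(A[:, k*size:(k+1)*size], 2)**2
                     for k in range(n_blocks))
    x = np.zeros(n)
    j = rng.integers(n_blocks)                # initial state of the chain
    for _ in range(n_iters):
        blk = slice(j*size, (j+1)*size)
        x[blk] -= step * (A[:, blk].T @ (A @ x - b))  # partial gradient step
        j = rng.choice(n_blocks, p=P[j])      # next block: one chain step
    return x

# Example chain: lazy random walk on a ring of 4 blocks; it is irreducible
# and aperiodic, the mixing property the convergence analysis relies on.
P = np.array([[0.50 if k == j else 0.25 if k in ((j-1) % 4, (j+1) % 4) else 0.0
               for k in range(4)] for j in range(4)])
rng = np.random.default_rng(1)
A, b = rng.standard_normal((40, 20)), rng.standard_normal(40)
x_hat = mc_bcd(A, b, P, n_blocks=4)
```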
Notes
Time-homogeneous, irreducible, and aperiodic Markov chains are widely used; however, in practical problems, the Markov chain may not satisfy the time-homogeneity assumption. For example, in a mobile network whose connectivity structure changes over time, the set of neighbors of an agent is time-varying [9].
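As a hypothetical illustration of this time-inhomogeneous setting (the helper below and its names are ours, not from the paper), block selection simply replaces the fixed transition matrix P with a time-indexed one P_t:

```python
# Hypothetical sketch: block selection driven by a time-inhomogeneous chain.
# P_of_t(t) returns the row-stochastic transition matrix at step t, which may
# change as the network connectivity changes (cf. the mobile-network example [9]).
import numpy as np

def inhomogeneous_walk(P_of_t, j0, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    j, path = j0, [j0]
    for t in range(n_steps):
        P_t = P_of_t(t)                       # connectivity at time t
        j = rng.choice(P_t.shape[0], p=P_t[j])
        path.append(j)
    return path                               # sequence of blocks to update
```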
The authors of [2, Corollary 3.8] state this result in terms of epochs, whereas here we state the rate in terms of iterations. Their result is therefore multiplied by N for comparison.
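To make the conversion concrete, a sketch in our own notation (the constant \(C\) and counters below are illustrative, not taken from [2]): if an epoch consists of N block updates and the epoch-based bound after k epochs is
\[
f(x^k) - f^\star \le \frac{C}{k},
\]
then in terms of the iteration counter \(T = kN\) the same bound reads
\[
f(x^{T/N}) - f^\star \le \frac{CN}{T},
\]
so the epoch-based constant picks up a factor of N when expressed per iteration.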
References
Allen-Zhu, Z., Qu, Z., Richtárik, P., Yuan, Y.: Even faster accelerated coordinate descent using non-uniform sampling. In: International Conference on Machine Learning (ICML), pp. 1110–1119 (2016)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Bradley, R.C.: Basic properties of strong mixing conditions—a survey and some open questions. Probab. Surv. 2, 107–144 (2005)
Brucker, P., Drexl, A., Möhring, R., Neumann, K., Pesch, E.: Resource-constrained project scheduling: notation, classification, models, and methods. Eur. J. Oper. Res. 112(1), 3–41 (1999)
Chow, Y.T., Wu, T., Yin, W.: Cyclic coordinate-update algorithms for fixed-point problems: analysis and applications. SIAM J. Sci. Comput. 39(4), A1280–A1300 (2017)
Dang, C., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)
Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)
Hannah, R., Feng, F., Yin, W.: A2BCD: asynchronous acceleration with optimal complexity. In: International Conference on Learning Representations (ICLR), New Orleans, LA (2019)
Johansson, B., Rabi, M., Johansson, M.: A simple peer-to-peer algorithm for distributed optimization in sensor networks. In: 2007 46th IEEE Conference on Decision and Control, pp. 4705–4710. IEEE (2007)
Lee, Y.T., Sidford, A.: Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, pp. 147–156. IEEE (2013)
Li, Y., Osher, S.: Coordinate descent optimization for \(\ell^1\) minimization with application to compressed sensing; a greedy algorithm. Inverse Probl. Imaging 3(3), 487–503 (2009)
Li, Z., Uschmajew, A., Zhang, S.: On convergence of the maximum block improvement method. SIAM J. Optim. 25(1), 210–233 (2015)
Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351–376 (2015)
Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, Berlin (2012)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Nutini, J., Schmidt, M., Laradji, I.H., Friedlander, M., Koepke, H.: Coordinate descent converges faster with the Gauss–Southwell rule than random selection. In: International Conference on Machine Learning (ICML), pp. 1632–1641 (2015)
Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)
Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)
Peng, Z., Xu, Y., Yan, M., Yin, W.: On the convergence of asynchronous parallel iteration with unbounded delays. J. Oper. Res. Soc. China 7(1), 5–42 (2019)
Peng, Z., Yan, M., Yin, W.: Parallel and distributed sparse optimization. In: 2013 Asilomar Conference on Signals, Systems and Computers, pp. 659–646. IEEE (2013)
Ram, S.S., Nedić, A., Veeravalli, V.V.: Incremental stochastic subgradient algorithms for convex optimization. SIAM J. Optim. 20(2), 691–717 (2009)
Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1–2), 433–484 (2016)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1-regularized loss minimization. J. Mach. Learn. Res. 12(Jun), 1865–1892 (2011)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14(Feb), 567–599 (2013)
Sun, R., Hong, M.: Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. In: Advances in Neural Information Processing Systems, pp. 1306–1314 (2015)
Sun, T., Hannah, R., Yin, W.: Asynchronous coordinate descent under more realistic assumptions. In: Advances in Neural Information Processing Systems, pp. 6183–6191 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1–2), 387–423 (2009)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Xu, Y., Yin, W.: Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J. Optim. 25(3), 1686–1716 (2015)
Yin, W., Mao, X., Yuan, K., Gu, Y., Sayed, A.H.: A communication-efficient random-walk algorithm for decentralized optimization. arXiv preprint arXiv:1804.06568 (2018)
The work by Y. Sun and W. Yin was supported in part by NSF DMS-1720237 and ONR N0001417121. The work by Y. Xu was supported in part by NSF DMS-1719549 and an IBM Grant.
Cite this article
Sun, T., Sun, Y., Xu, Y. et al. Markov chain block coordinate descent. Comput Optim Appl 75, 35–61 (2020). https://doi.org/10.1007/s10589-019-00140-7
Keywords
- Block coordinate gradient descent
- Markov chain
- Markov chain Monte Carlo
- Markov decision process
- Decentralized optimization