
A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update


Abstract

Nonconvex optimization arises in many areas of computational science and engineering. However, most nonconvex optimization algorithms are only known to have local convergence or subsequence convergence properties. In this paper, we propose an algorithm for nonconvex optimization and establish its global convergence (of the whole sequence) to a critical point. In addition, we give its asymptotic convergence rate and numerically demonstrate its efficiency. In our algorithm, the variables of the underlying problem are either treated as one block or multiple disjoint blocks. It is assumed that each non-differentiable component of the objective function, or each constraint, applies only to one block of variables. The differentiable components of the objective function, however, can involve multiple blocks of variables together. Our algorithm updates one block of variables at a time by minimizing a certain prox-linear surrogate, along with an extrapolation to accelerate its convergence. The order of update can be either deterministically cyclic or randomly shuffled for each cycle. In fact, our convergence analysis only needs that each block be updated at least once in every fixed number of iterations. We show its global convergence (of the whole sequence) to a critical point under fairly loose conditions including, in particular, the Kurdyka–Łojasiewicz condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. These results, of course, remain valid when the underlying problem is convex. We apply our convergence results to the coordinate descent iteration for non-convex regularized linear regression, as well as a modified rank-one residue iteration for nonnegative matrix factorization. We show that both applications have global convergence. Numerically, we tested our algorithm on nonnegative matrix and tensor factorization problems, where random shuffling clearly improves the chance to avoid low-quality local solutions.
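
The block update just described — a prox-linear (proximal gradient) step on one block at a time, with extrapolation and a deterministically cyclic or randomly shuffled order — can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the block gradients `grad_f`, proximal maps `prox_r`, Lipschitz estimates `L`, the fixed extrapolation weight `omega`, the step size \(1/L_i\), and the stopping rule are all simplifying placeholders to be supplied by the user.

```python
import random

def block_prox_linear(x, grad_f, prox_r, L, n_cycles=100, omega=0.5, shuffle=True):
    """Minimal sketch of a block prox-linear method with extrapolation.

    x      : list of blocks (e.g., NumPy arrays)
    grad_f : grad_f(blocks, i) -> partial gradient of the smooth term w.r.t. block i
    prox_r : prox_r(i, v, alpha) -> argmin_u r_i(u) + (1/(2*alpha)) * ||u - v||^2
    L      : L(blocks, i) -> Lipschitz constant of the block-i partial gradient
    omega  : fixed extrapolation weight (0 disables extrapolation)
    """
    s = len(x)
    x_prev = [xi.copy() for xi in x]        # value of each block before its previous update
    for _ in range(n_cycles):
        order = list(range(s))
        if shuffle:                          # reshuffle the update order once per cycle
            random.shuffle(order)
        for i in order:
            Li = L(x, i)
            x_hat = x[i] + omega * (x[i] - x_prev[i])       # extrapolated point
            x_prev[i] = x[i].copy()
            g = grad_f(x[:i] + [x_hat] + x[i+1:], i)        # partial gradient at x_hat
            x[i] = prox_r(i, x_hat - g / Li, 1.0 / Li)      # prox-linear step on block i
    return x
```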


Notes

  1. A function f is proximable if it is easy to obtain the minimizer of \(f(x)+\frac{1}{2\gamma }\Vert x-y\Vert ^2\) for any input y and \(\gamma >0\) (see the soft-thresholding example following these notes).

  2. A function F on \(\mathbb {R}^n\) is differentiable at point \({\mathbf {x}}\) if there exists a vector \({\mathbf {g}}\) such that \(\lim _{{\mathbf {h}}\rightarrow 0}\frac{|F({\mathbf {x}}+{\mathbf {h}})-F({\mathbf {x}})-{\mathbf {g}}^\top {\mathbf {h}}|}{\Vert {\mathbf {h}}\Vert }=0\).

  3. Note that from Remark 2, for convex problems, we can take a larger extrapolation weight but require it to be uniformly less than one. Hence, although our algorithm framework includes FISTA as a special case, our whole-sequence convergence result does not imply that of FISTA.

  4. Another restarting option, based on gradient information, was also tested.

  5. It is stated in [14] that the sequence generated by (42) converges to a coordinate-wise minimizer of (38). However, the result is obtained directly from [55], which only guarantees subsequence convergence.
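
To illustrate the notion of "proximable" in footnote 1: for the \(\ell _1\) norm \(f(x)=\lambda \Vert x\Vert _1\), the minimizer of \(f(x)+\frac{1}{2\gamma }\Vert x-y\Vert ^2\) is given componentwise by soft-thresholding. A minimal sketch (not from the paper; `lam` and the sample input are placeholders):

```python
import numpy as np

def prox_l1(y, gamma, lam=1.0):
    """Minimizer of lam*||x||_1 + (1/(2*gamma))*||x - y||^2 (componentwise soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - gamma * lam, 0.0)

# prox_l1(np.array([3.0, -0.2, 1.0]), gamma=0.5)  ->  array([2.5, 0. , 0.5])
```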

References

  1. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)

  2. Allen, G.: Sparse higher-order principal components analysis. In: International Conference on Artificial Intelligence and Statistics, pp. 27–36. (2012)

  3. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)

  4. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  5. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  6. Bagirov, A.M., Jin, L., Karmitsa, N., Al Nuaimat, A., Sultanova, N.: Subgradient method for nonconvex nonsmooth optimization. J. Optim. Theory Appl. 157(2), 416–435 (2013)

  7. Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  8. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)

  9. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  10. Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)

  11. Bolte, J., Daniilidis, A., Lewis, A.: The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)

  12. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

  13. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  14. Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)

  15. Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)

  16. Chang, K.W., Hsieh, C.J., Lin, C.J.: Coordinate descent method for large-scale l2-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)

  17. Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, pp. 3869–3872. IEEE (2008)

  18. Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134(1), 71–99 (2012)

  19. Donoho, D., Stodden, V.: When does non-negative matrix factorization give a correct decomposition into parts? In: Advances in Neural Information Processing Systems, vol. 16. (2003)

  20. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)

  21. Fuduli, A., Gaudioso, M., Giallombardo, G.: Minimizing nonconvex nonsmooth functions via cutting planes and proximity control. SIAM J. Optim. 14(3), 743–756 (2004)

  22. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1), 59–99 (2016)

  23. Grippo, L., Sciandrone, M.: Globally convergent block-coordinate techniques for unconstrained optimization. Optim. Methods Softw. 10(4), 587–637 (1999)

  24. Hildreth, C.: A quadratic programming procedure. Naval Res. Logist. Q. 4(1), 79–85 (1957)

  25. Ho, N., Van Dooren, P., Blondel, V.: Descent methods for nonnegative matrix factorization. In: Numerical Linear Algebra in Signals, Systems and Control, pp. 251–293. Springer, Netherlands (2011)

  26. Hong, M., Wang, X., Razaviyayn, M., Luo, Z.Q.: Iteration complexity analysis of block coordinate descent methods. arXiv preprint arXiv:1310.6957 (2013)

  27. Hoyer, P.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)

  28. Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J. Global Optim. 58(2), 285–319 (2014)

  29. Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Rev. 51(3), 455 (2009)

  30. Kruger, A.Y.: On Fréchet subdifferentials. J. Math. Sci. 116(3), 3325–3358 (2003)

  31. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier. 48(3), 769–783 (1998)

  32. Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)

  33. Lee, D., Seung, H.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)

  34. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Lojasiewicz inequality and its applications to linear convergence of first-order methods. arXiv preprint arXiv:1602.02915 (2016)

  35. Ling, Q., Xu, Y., Yin, W., Wen, Z.: Decentralized low-rank matrix completion. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 2925–2928. IEEE (2012)

  36. Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier (Grenoble) 43(5), 1575–1595 (1993)

  37. Lu, Z., Xiao, L.: Randomized block coordinate non-monotone gradient method for a class of nonlinear programming. arXiv preprint arXiv:1306.5918 (2013)

  38. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1–2), 615–642 (2015)

  39. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)

  40. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 689–696. ACM (2009)

  41. Mohan, K., Fazel, M.: Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 13(1), 3441–3473 (2012)

  42. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)

  43. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  44. Nesterov, Y.: Introductory lectures on convex optimization: a basic course, vol. 87. Springer Science & Business Media, Berlin (2013)

  45. Nocedal, J., Wright, S.J.: Numerical Optimization, Springer Series in Operations Research and Financial Engineering., 2nd edn. Springer, New York (2006)

  46. O’Donoghue, B., Candes, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2013)

  47. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)

  48. Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)

  49. Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)

  50. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  51. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1), 1–38 (2014)

  52. Rockafellar, R., Wets, R.: Variational Analysis, vol. 317. Springer, Berlin (2009)

  53. Saha, A., Tewari, A.: On the nonasymptotic convergence of cyclic coordinate descent methods. SIAM J. Optim. 23(1), 576–601 (2013)

  54. Shi, H.J.M., Tu, S., Xu, Y., Yin, W.: A primer on coordinate descent algorithms. arXiv preprint arXiv:1610.00040 (2016)

  55. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  56. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)

  57. Welling, M., Weber, M.: Positive tensor factorization. Pattern Recogn. Lett. 22(12), 1255–1261 (2001)

  58. Wen, Z., Yin, W., Zhang, Y.: Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 4(4), 333–361 (2012)

  59. Xu, Y.: Alternating proximal gradient method for sparse nonnegative tucker decomposition. Math. Program. Comput. 7(1), 39–70 (2015)

  60. Xu, Y., Akrotirianakis, I., Chakraborty, A.: Proximal gradient method for huberized support vector machine. Pattern Anal. Appl. 19(4), 989–1005 (2016)

  61. Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)

  62. Xu, Y., Yin, W.: A fast patch-dictionary method for whole image recovery. Inverse Probl. Imaging 10(2), 563–583 (2016)

  63. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)


Acknowledgements

Funding was provided in part by National Science Foundation (Grant No. DMS-1317602 and EECS-1462397) and Office of Naval Research (Grant No. N000141712162).

Corresponding author

Correspondence to Yangyang Xu.


Appendices

Appendix 1: Proofs of Key Lemmas

In this section, we give the proofs of the lemmas and propositions used in the paper.

1.1 Proof of Lemma 1

We prove the general case where \(\alpha _k=\frac{1}{\gamma L_k},\forall k\), and \(\tilde{\omega }_i^j\le \frac{\delta (\gamma -1)}{2(\gamma +1)}\sqrt{\tilde{L}_i^{j-1}/\tilde{L}_i^{j}},\,\forall i,j\). Assume \(b_k=i\). From the Lipschitz continuity of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},{\mathbf {x}}_i)\) with respect to \({\mathbf {x}}_i\), it holds that (e.g., see Lemma 2.1 in [61])

$$\begin{aligned} f({\mathbf {x}}^{k})\le f({\mathbf {x}}^{k-1})+\langle \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1}),{\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\rangle +\frac{L_k}{2}\Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2. \end{aligned}$$
(53)

Since \({\mathbf {x}}_i^{k}\) is the minimizer of (2), we have

$$\begin{aligned}&\langle \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},\hat{{\mathbf {x}}}_i^k),{\mathbf {x}}_i^{k}-\hat{{\mathbf {x}}}_i^k\rangle +\frac{1}{2\alpha _k}\Vert {\mathbf {x}}_i^{k}-\hat{{\mathbf {x}}}_i^k\Vert ^2+ r_i({\mathbf {x}}_i^{k})\nonumber \\&\qquad \le \langle \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},\hat{{\mathbf {x}}}_i^k),{\mathbf {x}}_i^{k-1}-\hat{{\mathbf {x}}}_i^k\rangle + \frac{1}{2\alpha _k}\Vert {\mathbf {x}}_i^{k-1}-\hat{{\mathbf {x}}}_i^k\Vert ^2+r_i({\mathbf {x}}_i^{k-1}). \end{aligned}$$
(54)

Summing (53) and (54) and noting that \({\mathbf {x}}_j^{k}={\mathbf {x}}_j^{k-1},\forall j\ne i\), we have

$$\begin{aligned}&F({\mathbf {x}}^{k-1})-F({\mathbf {x}}^{k})\\&\quad = f({\mathbf {x}}^{k-1})+r_i({\mathbf {x}}_i^{k-1})-f({\mathbf {x}}^k)-r_i({\mathbf {x}}_i^k)\\&\quad \ge \langle \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},\hat{{\mathbf {x}}}_i^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1}),{\mathbf {x}}_i^{k}-{{\mathbf {x}}}_i^{k-1}\rangle +\frac{1}{2\alpha _k}\Vert {\mathbf {x}}_i^{k}-\hat{{\mathbf {x}}}_i^k\Vert ^2\\&\qquad -\frac{1}{2\alpha _k}\Vert {\mathbf {x}}_i^{k-1}-\hat{{\mathbf {x}}}_i^k\Vert ^2-\frac{L_k}{2}\Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2\\&\quad = \langle \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},\hat{{\mathbf {x}}}_i^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1}),{\mathbf {x}}_i^{k}-{{\mathbf {x}}}_i^{k-1}\rangle +\frac{1}{\alpha _k}\langle {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1},{\mathbf {x}}_i^{k-1}-\hat{{\mathbf {x}}}_i^k\rangle \\&\qquad +\left( \frac{1}{2\alpha _k}-\frac{L_k}{2}\right) \Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2\\&\quad \ge -\Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert \left( \Vert \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},\hat{{\mathbf {x}}}_i^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1})\Vert +\frac{1}{\alpha _k}\Vert {\mathbf {x}}_i^{k-1}-\hat{{\mathbf {x}}}_i^k\Vert \right) \\&\qquad +\left( \frac{1}{2\alpha _k}-\frac{L_k}{2}\right) \Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2\\&\quad \ge -\left( \frac{1}{\alpha _k}+L_k\right) \Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert \cdot \Vert {\mathbf {x}}_i^{k-1}-\hat{{\mathbf {x}}}_i^k\Vert +\left( \frac{1}{2\alpha _k}-\frac{L_k}{2}\right) \Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2\\&\qquad \quad \overset{(6)}{=} -\left( \frac{1}{\alpha _k}+L_k\right) \omega _k\Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert \cdot \Vert {\mathbf {x}}_i^{k-1}-\tilde{{\mathbf {x}}}_i^{d_i^{k-1}-1}\Vert +\left( \frac{1}{2\alpha _k}-\frac{L_k}{2}\right) \Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2\\&\quad \ge \frac{1}{4}\left( \frac{1}{\alpha _k}-L_k\right) \Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2-\frac{\left( 1/\alpha _k+L_k\right) ^2}{1/\alpha _k-L_k}\omega _k^2\Vert {\mathbf {x}}_i^{k-1}-\tilde{{\mathbf {x}}}_i^{d_i^{k-1}-1}\Vert ^2\\&\quad =\frac{\left( \gamma -1\right) L_k}{4}\Vert {\mathbf {x}}_i^{k}-{\mathbf {x}}_i^{k-1}\Vert ^2-\frac{\left( \gamma +1\right) ^2}{\gamma -1}L_k\omega _k^2\Vert {\mathbf {x}}_i^{k-1}-\tilde{{\mathbf {x}}}_i^{d_i^{k-1}-1}\Vert ^2. \end{aligned}$$

Here, we have used the Cauchy–Schwarz inequality in the second inequality, the Lipschitz continuity of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},{\mathbf {x}}_i)\) in the third one, Young's inequality in the fourth one, the fact \({\mathbf {x}}_i^{k-1}=\tilde{{\mathbf {x}}}_i^{d_i^k-1}\) to obtain the third equality, and \(\alpha _k=\frac{1}{\gamma L_k}\) to obtain the last equality. Substituting \(\tilde{\omega }_i^j\le \frac{\delta (\gamma -1)}{2(\gamma +1)}\sqrt{\tilde{L}_i^{j-1}/\tilde{L}_i^{j}}\) and recalling (8) completes the proof.
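
As a quick numerical sanity check of the final inequality in the chain above (an aside, not part of the proof), the following sketch instantiates a single-block case \(f({\mathbf {x}})=\frac{1}{2}\Vert A{\mathbf {x}}-{\mathbf {b}}\Vert ^2\), \(r=\lambda \Vert \cdot \Vert _1\), with the exact Lipschitz constant, \(\alpha _k=\frac{1}{\gamma L_k}\), and a small extrapolation weight, and verifies the claimed lower bound on \(F({\mathbf {x}}^{k-1})-F({\mathbf {x}}^{k})\); the problem data and parameter values are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, gamma, omega = 30, 20, 0.1, 1.5, 0.1        # placeholder data and parameters
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
L = np.linalg.norm(A.T @ A, 2)                          # Lipschitz constant of grad f
alpha = 1.0 / (gamma * L)

F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x_km2, x_km1 = rng.standard_normal(n), rng.standard_normal(n)       # x^{k-2}, x^{k-1}
x_hat = x_km1 + omega * (x_km1 - x_km2)                             # extrapolated point (6)
x_k = soft(x_hat - alpha * A.T @ (A @ x_hat - b), alpha * lam)      # prox-linear step (2)

lhs = F(x_km1) - F(x_k)
rhs = (gamma - 1) * L / 4 * np.sum((x_k - x_km1) ** 2) \
      - (gamma + 1) ** 2 / (gamma - 1) * L * omega ** 2 * np.sum((x_km1 - x_km2) ** 2)
assert lhs >= rhs - 1e-10      # the inequality derived above holds for this step
```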

1.2 Proof of the Claim in Remark 2

Assume \(b_k=i\) and \(\alpha _k=\frac{1}{L_k}\). When f is block multi-convex and \(r_i\) is convex, from Lemma 2.1 of [61], it follows that

$$\begin{aligned}&F({\mathbf {x}}^{k-1})-F({\mathbf {x}}^k)\\&\qquad \ge \frac{L_k}{2}\Vert {\mathbf {x}}_i^k-\hat{{\mathbf {x}}}_i^k\Vert ^2+L_k\langle \hat{{\mathbf {x}}}_i^k-{\mathbf {x}}_i^{k-1},{\mathbf {x}}_i^k-\hat{{\mathbf {x}}}_i^k\rangle \\&\qquad \quad \overset{(6)}{=} \frac{L_k}{2}\Vert {\mathbf {x}}_i^k-{{\mathbf {x}}}_i^{k-1}-\omega _k({{\mathbf {x}}}_i^{k-1}-{{\mathbf {x}}}_i^{d_i^{k-1}-1})\Vert ^2\\&\qquad +L_k\omega _k\left\langle {{\mathbf {x}}}_i^{k-1}-{{\mathbf {x}}}_i^{d_i^{k-1}-1}, {\mathbf {x}}_i^k-{{\mathbf {x}}}_i^{k-1}-\omega _k\left( {{\mathbf {x}}}_i^{k-1}-{{\mathbf {x}}}_i^{d_i^{k-1}-1}\right) \right\rangle \\&\quad = \frac{L_k}{2}\Vert {\mathbf {x}}_i^k-{{\mathbf {x}}}_i^{k-1}\Vert ^2-\frac{L_k\omega _k^2}{2}\Vert {\mathbf {x}}_i^{k-1}-{{\mathbf {x}}}_i^{d_i^{k-1}-1}\Vert ^2. \end{aligned}$$

Hence, if \(\omega _k\le \delta \sqrt{\tilde{L}_i^{j-1}/\tilde{L}_i^j}\), we have the desired result.

1.3 Proof of Proposition 1

Summing (14) over k from 1 to K gives

$$\begin{aligned} F(\mathbf{x}^0)-F(\mathbf{x}^K)\ge&\ \sum _{i=1}^s\sum _{k=1}^K\sum _{j=d_i^{k-1}+1}^{d_i^k}\left( \frac{\tilde{L}_i^j}{4}\Vert \tilde{\mathbf{x}}_i^{j-1}-\tilde{\mathbf{x}}_i^j\Vert ^2-\frac{\tilde{L}_i^{j-1}\delta ^2}{4}\Vert \tilde{\mathbf{x}}_i^{j-2}-\tilde{\mathbf{x}}_i^{j-1}\Vert ^2\right) \\ =&\ \sum _{i=1}^s\sum _{j=1}^{d_i^{K}}\left( \frac{\tilde{L}_i^j}{4}\Vert \tilde{\mathbf{x}}_i^{j-1}-\tilde{\mathbf{x}}_i^j\Vert ^2-\frac{\tilde{L}_i^{j-1}\delta ^2}{4}\Vert \tilde{\mathbf{x}}_i^{j-2}-\tilde{\mathbf{x}}_i^{j-1}\Vert ^2\right) \\ \ge&\ \sum _{i=1}^s\sum _{j=1}^{d_i^K}\frac{\tilde{L}_i^j(1-\delta ^2)}{4}\Vert \tilde{\mathbf{x}}_i^{j-1}-\tilde{\mathbf{x}}_i^j\Vert ^2\\ \ge&\ \sum _{i=1}^s\sum _{j=1}^{d_i^K}\frac{\ell (1-\delta ^2)}{4}\Vert \tilde{\mathbf{x}}_i^{j-1}-\tilde{\mathbf{x}}_i^j\Vert ^2, \end{aligned}$$

where we have used the fact \(d_i^0=0,\forall i\) in the equality, \(\tilde{\mathbf{x}}_i^{-1}=\tilde{\mathbf{x}}_i^0,\forall i\) to obtain the second inequality, and \(\tilde{L}_i^j\ge \ell , \forall i,j\) in the last inequality. Letting \(K\rightarrow \infty \) and noting \(d_i^K\rightarrow \infty \) for all i by Assumption 3, we conclude from the above inequality and the lower boundedness of F in Assumption 1 that

$$\begin{aligned} \sum _{i=1}^s\sum _{j=1}^\infty \Vert \tilde{{\mathbf {x}}}_i^{j-1}-\tilde{{\mathbf {x}}}_i^j\Vert ^2<\infty , \end{aligned}$$

which implies (15).

1.4 Proof of Proposition 2

From Corollary 5.20 and Example 5.23 of [52], we have that if \({\mathbf {prox}}_{\alpha _kr_i}\) is single valued near \({\mathbf {x}}_i^{k-1}-\alpha _k\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1})\), then \({\mathbf {prox}}_{\alpha _kr_i}\) is continuous at \({\mathbf {x}}_i^{k-1}-\alpha _k\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1})\). Let \(\hat{{\mathbf {x}}}^k_i(\omega )\) explicitly denote the extrapolated point with weight \(\omega \), namely, we take \(\hat{{\mathbf {x}}}^k_i(\omega _k)\) in (6). In addition, let \({\mathbf {x}}^k_i(\omega )={\mathbf {prox}}_{\alpha _kr_i}\big (\hat{{\mathbf {x}}}_i^k(\omega )-\alpha _k\nabla _{{\mathbf {x}}_i}f(\mathbf{x}_{\ne i}^{k-1},\hat{\mathbf{x}}_i^k(\omega ))\big )\). Note that (14) implies

$$\begin{aligned} F({\mathbf {x}}^{k-1})-F({\mathbf {x}}^k(0))\ge \Vert {\mathbf {x}}^{k-1}-{\mathbf {x}}^k(0)\Vert ^2\overset{(19)}{>}0. \end{aligned}$$
(55)

From the optimality of \({\mathbf {x}}^k_i(\omega )\), it holds that

$$\begin{aligned}&\langle \nabla _{{\mathbf {x}}_i} f(\mathbf{x}_{\ne i}^{k-1},\hat{\mathbf{x}}_i^k(\omega )), \mathbf{x}_i^k(\omega )-\hat{\mathbf{x}}_i^k(\omega )\rangle +\frac{1}{2\alpha _k}\Vert \mathbf{x}_i^k(\omega )-\hat{\mathbf{x}}_i^k(\omega )\Vert ^2+r_i(\mathbf{x}_i^k(\omega ))\\&\qquad \le \langle \nabla _{{\mathbf {x}}_i} f(\mathbf{x}_{\ne i}^{k-1},\hat{\mathbf{x}}_i^k(\omega )), {\mathbf{x}}_i-\hat{\mathbf{x}}_i^k(\omega )\rangle +\frac{1}{2\alpha _k}\Vert {\mathbf{x}}_i-\hat{\mathbf{x}}_i^k(\omega )\Vert ^2+r_i({\mathbf{x}}_i),\,\forall {\mathbf {x}}_i. \end{aligned}$$

Letting \(\omega \rightarrow 0^+\) and taking the limit superior on both sides of the above inequality, we have

$$\begin{aligned}&\langle \nabla _{{\mathbf {x}}_i} f(\mathbf{x}^{k-1}), \mathbf{x}_i^k(0)-{\mathbf{x}}_i^{k-1}\rangle +\frac{1}{2\alpha _k}\Vert \mathbf{x}_i^k(0)-{\mathbf{x}}_i^{k-1}\Vert ^2+\limsup _{\omega \rightarrow 0^+}r_i(\mathbf{x}_i^k(\omega ))\\&\qquad \le \langle \nabla _{{\mathbf {x}}_i} f(\mathbf{x}^{k-1}), {\mathbf{x}}_i-{\mathbf{x}}_i^{k-1}\rangle +\frac{1}{2\alpha _k}\Vert {\mathbf{x}}_i-{\mathbf{x}}_i^{k-1}\Vert ^2+r_i({\mathbf{x}}_i),\,\forall {\mathbf {x}}_i, \end{aligned}$$

which implies \(\underset{\omega \rightarrow 0^+}{\limsup }\, r_i(\mathbf{x}_i^k(\omega ))\le r_i(\mathbf{x}_i^k(0))\). Since \(r_i\) is lower semicontinuous, \(\underset{\omega \rightarrow 0^+}{\liminf }\, r_i(\mathbf{x}_i^k(\omega ))\ge r_i(\mathbf{x}_i^k(0))\). Hence, \(\underset{\omega \rightarrow 0^+}{\lim }r_i(\mathbf{x}_i^k(\omega ))= r_i(\mathbf{x}_i^k(0))\), and thus \(\underset{\omega \rightarrow 0^+}{\lim }F(\mathbf{x}^k(\omega ))= F(\mathbf{x}^k(0))\). Together with (55), we conclude that there exists \(\bar{\omega }_k>0\) such that \(F({\mathbf {x}}^{k-1})-F(\mathbf{x}^k(\omega ))\ge 0,\,\forall \omega \in [0,\bar{\omega }_k]\). This completes the proof.

1.5 Proof of Lemma 2

Let \({\mathbf {a}}_m\) and \({\mathbf {u}}_m\) be the vectors with their ith entries

$$\begin{aligned} ({\mathbf {a}}_m)_i=\sqrt{\alpha _{i,n_{i,m}}},\quad ({\mathbf {u}}_m)_i=A_{i,n_{i,m}}. \end{aligned}$$

Then (21) can be written as

$$\begin{aligned}&\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert ^2+(1-\beta ^2)\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}\alpha _{i,j}A_{i,j}^2\nonumber \\&\quad \le \beta ^2\Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert ^2+B_m\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}}A_{i,j}. \end{aligned}$$
(56)

Recall

$$\begin{aligned} \underline{\alpha }=\inf _{i,j}\alpha _{i,j},\quad \overline{\alpha }=\sup _{i,j}\alpha _{i,j}. \end{aligned}$$

Then it follows from (56) that

$$\begin{aligned} \Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert ^2+\underline{\alpha }(1-\beta ^2)\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}^2 \le \beta ^2\Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert ^2+B_m\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}}A_{i,j}. \end{aligned}$$
(57)

By the Cauchy–Schwarz inequality and noting \(n_{i,m+1}-n_{i,m}\le N,\forall i,m\), we have

$$\begin{aligned} \left( \sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}\right) ^2\le sN\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}^2 \end{aligned}$$
(58)

and for any positive \(C_1\),

$$\begin{aligned}&(1+\beta )C_1\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert \left( \sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}\right) \nonumber \\&\qquad \le \sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}\left( \frac{4-(1+\beta )^2}{4sN}\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert ^2+\frac{(1+\beta )^2C_1^2sN}{4-(1+\beta )^2}A_{i,j}^2\right) \nonumber \\&\qquad \le \frac{4-(1+\beta )^2}{4}\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert ^2 + \frac{(1+\beta )^2C_1^2sN}{4-(1+\beta )^2}\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}^2. \end{aligned}$$
(59)

Taking

$$\begin{aligned} C_1 \le \sqrt{\frac{\underline{\alpha }(1-\beta ^2)(4-(1+\beta )^2)}{4sN}}, \end{aligned}$$
(60)

we have from (58) and (59) that

$$\begin{aligned}&\frac{1+\beta }{2}\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert +C_1\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}\nonumber \\&\qquad \le \sqrt{\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert ^2+\underline{\alpha }(1-\beta ^2)\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}^2}. \end{aligned}$$
(61)

For any \(C_2>0\), it holds

$$\begin{aligned}&\sqrt{\beta ^2\Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert ^2+B_m\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}}A_{i,j}}\nonumber \\&\qquad \le \beta \Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert +\sqrt{B_m\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}}A_{i,j}}\nonumber \\&\qquad \le \beta \Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert +C_2B_m+\frac{1}{4C_2}\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}}A_{i,j}\nonumber \\&\qquad \le \beta \Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert +C_2B_m+\frac{1}{4C_2}\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}-1}A_{i,j}+\frac{\sqrt{s}}{4C_2}\Vert {\mathbf {u}}_m\Vert . \end{aligned}$$
(62)

Combining (57), (61), and (62), we have

$$\begin{aligned}&\frac{1+\beta }{2}\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert +C_1\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}\\&\qquad \le \beta \Vert {\mathbf {a}}_{m}\odot {\mathbf {u}}_{m}\Vert +C_2B_m+\frac{1}{4C_2}\sum _{i=1}^s\sum _{j=n_{i,m-1}+1}^{n_{i,m}-1}A_{i,j}+\frac{\sqrt{s}}{4C_2}\Vert {\mathbf {u}}_m\Vert . \end{aligned}$$

Summing the above inequality over m from \(M_1\) through \(M_2\le M\) and rearranging terms gives

$$\begin{aligned}&\sum _{m=M_1}^{M_2}\left( \frac{1-\beta }{2}\Vert {\mathbf {a}}_{m+1}\odot {\mathbf {u}}_{m+1}\Vert -\frac{\sqrt{s}}{4C_2}\Vert {\mathbf {u}}_{m+1}\Vert \right) +\left( C_1-\frac{1}{4C_2}\right) \sum _{m=M_1}^{M_2}\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}\nonumber \\&\qquad \le \beta \Vert {\mathbf {a}}_{M_1}\odot {\mathbf {u}}_{M_1}\Vert +C_2\sum _{m=M_1}^{M_2} B_m +\frac{1}{4C_2}\sum _{i=1}^s\sum _{j=n_{i,M_1-1}+1}^{n_{i,M_1}-1}A_{i,j}+\frac{\sqrt{s}}{4C_2}\Vert {\mathbf {u}}_{M_1}\Vert \end{aligned}$$
(63)

Take

$$\begin{aligned} C_2=\max \left( \frac{1}{2C_1},\ \frac{\sqrt{s}}{\sqrt{\underline{\alpha }}(1-\beta )}\right) . \end{aligned}$$
(64)

Then (63) implies

$$\begin{aligned}&\frac{\sqrt{\underline{\alpha }}(1-\beta )}{4}\sum _{m=M_1}^{M_2}\Vert {\mathbf {u}}_{m+1}\Vert +\frac{C_1}{2}\sum _{m=M_1}^{M_2}\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}-1}A_{i,j}\nonumber \\&\qquad \le \beta \sqrt{\overline{\alpha }}\Vert {\mathbf {u}}_{M_1}\Vert +C_2\sum _{m=M_1}^{M_2} B_m+\frac{1}{4C_2}\sum _{i=1}^s\sum _{j=n_{i,M_1-1}+1}^{n_{i,M_1}-1}A_{i,j}+\frac{\sqrt{s}}{4C_2}\Vert {\mathbf {u}}_{M_1}\Vert , \end{aligned}$$
(65)

which together with \(\sum _{i=1}^sA_{i,n_{i,m+1}}\le \sqrt{s}\Vert {\mathbf {u}}_{m+1}\Vert \) gives

$$\begin{aligned}&C_3\sum _{i=1}^s\sum _{j=n_{i,M_1}+1}^{n_{i,M_2+1}}A_{i,j}\nonumber \\&\quad = C_3\sum _{m=M_1}^{M_2}\sum _{i=1}^s\sum _{j=n_{i,m}+1}^{n_{i,m+1}}A_{i,j}\nonumber \\&\quad \le \beta \sqrt{\overline{\alpha }}\Vert {\mathbf {u}}_{M_1}\Vert +C_2\sum _{m=M_1}^{M_2} B_m+\frac{1}{4C_2}\sum _{i=1}^s\sum _{j=n_{i,M_1-1}+1}^{n_{i,M_1}-1}A_{i,j}+\frac{\sqrt{s}}{4C_2}\Vert {\mathbf {u}}_{M_1}\Vert ,\nonumber \\&\quad \le C_2\sum _{m=M_1}^{M_2} B_m+C_4\sum _{i=1}^s\sum _{j=n_{i,M_1-1}+1}^{n_{i,M_1}}A_{i,j}, \end{aligned}$$
(66)

where we have used \(\Vert {\mathbf {u}}_{M_1}\Vert \le \sum _{i=1}^sA_{i,n_{i,M_1}}\), and

$$\begin{aligned} C_3=\min \left( \frac{\sqrt{\underline{\alpha }}(1-\beta )}{4\sqrt{s}},\frac{C_1}{2}\right) ,\quad C_4=\beta \sqrt{\overline{\alpha }}+\frac{\sqrt{s}}{4C_2}. \end{aligned}$$
(67)

From (60), (64), and (67), we can take

$$\begin{aligned} C_1=\frac{\sqrt{\underline{\alpha }}(1-\beta )}{2\sqrt{sN}}\le \min \left\{ \sqrt{\frac{\underline{\alpha }(1-\beta ^2)(4-(1+\beta )^2)}{4sN}},\ \frac{\sqrt{\underline{\alpha }}(1-\beta )}{2\sqrt{s}}\right\} , \end{aligned}$$

where the inequality can be verified by noting \((1-\beta ^2)(4-(1+\beta )^2)-(1-\beta )^2\) is decreasing with respect to \(\beta \) in [0, 1]. Thus from (64) and (67), we have \(C_2=\frac{1}{2C_1},\, C_3=\frac{C_1}{2},\, C_4=\beta \sqrt{\overline{\alpha }}+\frac{\sqrt{s}C_1}{2}\). Hence, from (66), we complete the proof of (22).

If \(\lim _{m\rightarrow \infty }n_{i,m}=\infty ,\forall i\), \(\sum _{m=1}^\infty B_m<\infty \), and (21) holds for all m, then letting \(M_1=1\) and \(M_2\rightarrow \infty \) in (66) gives (23).
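
The monotonicity claim used above to justify the choice of \(C_1\) can also be checked numerically. The short sketch below (an aside, not part of the proof) evaluates \(h(\beta )=(1-\beta ^2)(4-(1+\beta )^2)-(1-\beta )^2\) on a grid and confirms that it is nonnegative and nonincreasing on [0, 1], which is exactly what the verification of (60) requires.

```python
import numpy as np

beta = np.linspace(0.0, 1.0, 10001)
h = (1 - beta**2) * (4 - (1 + beta)**2) - (1 - beta)**2
assert np.all(h >= -1e-12)           # h >= 0 on [0, 1], so the chosen C_1 satisfies (60)
assert np.all(np.diff(h) <= 1e-12)   # h is (numerically) nonincreasing on [0, 1]
```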

1.6 Proof of Proposition 3

For any i, assume that while updating the ith block to \({\mathbf {x}}_i^k\), the value of the jth block (\(j\ne i\)) is \({\mathbf {y}}_j^{(i)}\), the extrapolated point of the ith block is \({\mathbf {z}}_i\), and the Lipschitz constant of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {x}}_i)\) with respect to \({\mathbf {x}}_i\) is \(\tilde{L}_i\), namely,

$$\begin{aligned} {\mathbf {x}}_i^k\in {{\mathrm{\hbox {arg min}}}}_{{\mathbf {x}}_i}\langle \nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i),{\mathbf {x}}_i-{\mathbf {z}}_i\rangle +\tilde{L}_i\Vert {\mathbf {x}}_i-{\mathbf {z}}_i\Vert ^2+r_i({\mathbf {x}}_i). \end{aligned}$$

Hence, \(\mathbf {0}\in \nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)+2\tilde{L}_i({\mathbf {x}}_i^k-{\mathbf {z}}_i)+\partial r_i({\mathbf {x}}_i^k),\) or equivalently,

$$\begin{aligned} \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)-2\tilde{L}_i({\mathbf {x}}_i^k-{\mathbf {z}}_i)\in \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^k)+\partial r_i({\mathbf {x}}_i^k),\,\forall i. \end{aligned}$$
(68)

Note that \({\mathbf {x}}_i\) may be updated to \({\mathbf {x}}_i^k\) not at the kth iteration but at some earlier one, which must be between \(k-T\) and k by Assumption 3. In addition, for each pair (ij), there must be some \(\kappa _{i,j}\) between \(k-2T\) and k such that

$$\begin{aligned} {\mathbf {y}}_j^{(i)}={\mathbf {x}}_j^{\kappa _{i,j}}, \end{aligned}$$
(69)

and for each i, there are \(k-3T\le \kappa _1^i<\kappa _2^i\le k\) and extrapolation weight \(\tilde{\omega }_i\le 1\) such that

$$\begin{aligned} {\mathbf {z}}_i={\mathbf {x}}_i^{\kappa _2^i}+\tilde{\omega }_i({\mathbf {x}}_i^{\kappa _2^i}- {\mathbf {x}}_i^{\kappa _1^i}). \end{aligned}$$
(70)

By the triangle inequality, \(({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)\in B_{4\rho }(\bar{{\mathbf {x}}})\) for all i. Therefore, it follows from (10) and (68) that

$$\begin{aligned} \mathrm {dist}(\mathbf {0},\partial F({\mathbf {x}}^k))\overset{(68)}{\le }&\sqrt{\sum _{i=1}^s\Vert \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)-2\tilde{L}_i({\mathbf {x}}_i^k-{\mathbf {z}}_i)\Vert ^2}\nonumber \\ \le&\sum _{i=1}^s\Vert \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)-2\tilde{L}_i({\mathbf {x}}_i^k-{\mathbf {z}}_i)\Vert \nonumber \\ \le&\sum _{i=1}^s\left( \Vert \nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^k)-\nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)\Vert +2\tilde{L}_i\Vert {\mathbf {x}}_i^k-{\mathbf {z}}_i\Vert \right) \nonumber \\ \le&\sum _{i=1}^s\left( L_G\Vert {\mathbf {x}}^k-({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)\Vert +2\tilde{L}_i\Vert {\mathbf {x}}_i^k-{\mathbf {z}}_i\Vert \right) \nonumber \\ \le&\sum _{i=1}^s\left( (L_G+2L)\Vert {\mathbf {x}}_i^k-{\mathbf {z}}_i\Vert +L_G\sum _{j\ne i}\Vert {\mathbf {x}}_j^k-{\mathbf {y}}_j^{(i)}\Vert \right) , \end{aligned}$$
(71)

where in the fourth inequality, we have used the Lipschitz continuity of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {x}})\) with respect to \({\mathbf {x}}\), and the last inequality uses \(\tilde{L}_i\le L\). Combining (71) with (69), (70), and the triangle inequality gives the desired result.

1.7 Proof of Lemma 3

The proof follows that of Theorem 2 of [3]. When \(\gamma \ge 1\), since \(0\le A_{k-1}-A_k\le 1,\forall k\ge K\), we have \((A_{k-1}-A_k)^\gamma \le A_{k-1}-A_k\), and thus (33) implies that for all \(k\ge K\), it holds that \(A_k\le (\alpha +\beta )(A_{k-1}-A_k)\), from which item 1 immediately follows.

When \(\gamma <1\), we have \((A_{k-1}-A_k)^\gamma \ge A_{k-1}-A_k\), and thus (33) implies that for all \(k\ge K\), it holds that \(A_k\le (\alpha +\beta )(A_{k-1}-A_k)^\gamma \). Letting \(h(x)=x^{-1/\gamma }\), we have for \(k\ge K\),

$$\begin{aligned} 1&\le (\alpha +\beta )^{1/\gamma }(A_{k-1}-A_k)A_k^{-1/\gamma }\\&= (\alpha +\beta )^{1/\gamma }\left( \frac{A_{k-1}}{A_k}\right) ^{1/\gamma }(A_{k-1}-A_k)A_{k-1}^{-1/\gamma }\\&\le (\alpha +\beta )^{1/\gamma }\left( \frac{A_{k-1}}{A_k}\right) ^{1/\gamma }\int _{A_k}^{A_{k-1}}h(x)dx\\&=\frac{(\alpha +\beta )^{1/\gamma }}{1-1/\gamma }\left( \frac{A_{k-1}}{A_k}\right) ^{1/\gamma }\left( A_{k-1}^{1-1/\gamma }-A_k^{1-1/\gamma }\right) , \end{aligned}$$

where the second inequality uses the fact that h is nonincreasing. Hence,

$$\begin{aligned} A_{k}^{1-1/\gamma }-A_{k-1}^{1-1/\gamma }\ge \frac{1/\gamma -1}{(\alpha +\beta )^{1/\gamma }}\left( \frac{A_{k}}{A_{k-1}}\right) ^{1/\gamma }. \end{aligned}$$
(72)

Let \(\mu \) be the positive constant such that

$$\begin{aligned} \frac{1/\gamma -1}{(\alpha +\beta )^{1/\gamma }}\mu =\mu ^{\gamma -1}-1. \end{aligned}$$
(73)

Note that the above equation has a unique solution \(0<\mu <1\). We claim that

$$\begin{aligned} A_{k}^{1-1/\gamma }-A_{k-1}^{1-1/\gamma }\ge \mu ^{\gamma -1}-1,\ \forall k\ge K. \end{aligned}$$
(74)

It obviously holds from (72) and (73) if \(\big (\frac{A_{k}}{A_{k-1}}\big )^{1/\gamma }\ge \mu \). It also holds if \(\big (\frac{A_{k}}{A_{k-1}}\big )^{1/\gamma }\le \mu \) from the arguments

$$\begin{aligned} \left( \frac{A_{k}}{A_{k-1}}\right) ^{1/\gamma }\le \mu \Rightarrow&A_k\le \mu ^\gamma A_{k-1}\Rightarrow A_k^{1-1/\gamma }\ge \mu ^{\gamma -1}A_{k-1}^{1-1/\gamma }\\ \Rightarrow&A_{k}^{1-1/\gamma }-A_{k-1}^{1-1/\gamma }\ge (\mu ^{\gamma -1}-1)A_{k-1}^{1-1/\gamma } \ge \mu ^{\gamma -1}-1, \end{aligned}$$

where the last inequality is from \(A_{k-1}^{1-1/\gamma }\ge 1\). Hence, (74) holds, and summing it over k gives

$$\begin{aligned} A_k^{1-1/\gamma }\ge A_k^{1-1/\gamma }-A_K^{1-1/\gamma }\ge (\mu ^{\gamma -1}-1)(k-K), \end{aligned}$$

which immediately gives item 2 by letting \(\nu =(\mu ^{\gamma -1}-1)^{\frac{\gamma }{\gamma -1}}\).
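
To illustrate the two regimes of Lemma 3 (an aside, not part of the proof), the sketch below iterates the relation \(A_k=(\alpha +\beta )(A_{k-1}-A_k)^\gamma \) with equality: for \(\gamma =1\) it reproduces the geometric decay of item 1, and for \(\gamma =1/2\) it exhibits the \(O\big (k^{\gamma /(\gamma -1)}\big )=O(1/k)\) decay of item 2. The constant \(c\), standing in for \(\alpha +\beta \), is a placeholder.

```python
import numpy as np

c = 2.0                                  # plays the role of alpha + beta (placeholder)

# gamma = 1:  A_k = c*(A_{k-1} - A_k)  =>  A_k = rho*A_{k-1}  with rho = c/(1+c)   (item 1)
A, rho = 1.0, c / (1.0 + c)
for _ in range(50):
    A = rho * A
print("gamma = 1   :", A, "vs", rho**50, "(geometric decay)")

# gamma = 1/2:  A_k = c*sqrt(A_{k-1} - A_k); solving the quadratic for A_k gives
#               A_k = (-c**2 + sqrt(c**4 + 4*c**2*A_{k-1})) / 2,
#               and item 2 predicts A_k = O(k**(gamma/(gamma-1))) = O(1/k).
A = 1.0
for k in range(1, 10001):
    A = (-c**2 + np.sqrt(c**4 + 4 * c**2 * A)) / 2
print("gamma = 1/2 :", A, "with k*A_k ~", 10000 * A, "(stays bounded, roughly c**2)")
```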

Appendix 2: Solutions of (46)

In this section, we give closed-form solutions to both updates in (46). First, it is not difficult to obtain the solution of (46b):

$$\begin{aligned} {\mathbf {y}}_{\pi _i}^{k+1}=\max \left( 0,\big ({\mathbf {M}}-{\mathbf {X}}_{\pi _{<i}}^{k+1}({\mathbf {Y}}_{\pi _{<i}}^{k+1})^\top -{\mathbf {X}}_{\pi _{>i}}^{k}({\mathbf {Y}}_{\pi _{>i}}^{k})^\top \big )^\top {\mathbf {x}}_{\pi _i}^{k+1}\right) . \end{aligned}$$

Secondly, since \(L_{\pi _i}^k>0\), it is easy to write (46a) in the form

$$\begin{aligned} \min _{{\mathbf {x}}\ge 0,\,\Vert {\mathbf {x}}\Vert =1}\frac{1}{2}\Vert {\mathbf {x}}-{\mathbf {a}}\Vert ^2+{\mathbf {b}}^\top {\mathbf {x}}+C, \end{aligned}$$

which is clearly equivalent to

$$\begin{aligned} \max _{{\mathbf {x}}\ge 0,\,\Vert {\mathbf {x}}\Vert =1} {\mathbf {c}}^\top {\mathbf {x}}, \end{aligned}$$
(75)

where \({\mathbf {c}}={\mathbf {a}}-{\mathbf {b}}\). Next, we give the solution to (75) in three different cases.

Case 1 \({\mathbf {c}}<0\). Let \(i_0={{\mathrm{\hbox {arg max}}}}_i c_i\) and \(c_{\max }=c_{i_0}<0\). If more than one component equals \(c_{\max }\), choose any one of them. Then the solution to (75) is given by \(x_{i_0}=1\) and \(x_i=0,\forall i\ne i_0\), because for any \({\mathbf {x}}\ge 0\) with \(\Vert {\mathbf {x}}\Vert =1\), it holds that

$$\begin{aligned} {\mathbf {c}}^\top {\mathbf {x}}\le c_{\max }\Vert {\mathbf {x}}\Vert _1\le c_{\max }\Vert {\mathbf {x}}\Vert =c_{\max }. \end{aligned}$$

Case 2 \({\mathbf {c}}\le 0\) and \({\mathbf {c}}\not <0\). Let \({\mathbf {c}}=({\mathbf {c}}_{I_0},{\mathbf {c}}_{I_-})\), where \({\mathbf {c}}_{I_0}=\mathbf {0}\) and \({\mathbf {c}}_{I_-}<0\). Then the solution to (75) is given by \({\mathbf {x}}_{I_-}=\mathbf {0}\) and \({\mathbf {x}}_{I_0}\) being any vector satisfying \({\mathbf {x}}_{I_0}\ge 0\) and \(\Vert {\mathbf {x}}_{I_0}\Vert =1\), because \({\mathbf {c}}^\top {\mathbf {x}}\le 0\) for any \({\mathbf {x}}\ge 0\) and any such \({\mathbf {x}}\) attains the maximum value zero.

Case 3 \({\mathbf {c}}\not \le 0\). Let \({\mathbf {c}}=({\mathbf {c}}_{I_+},{\mathbf {c}}_{I_+^c})\), where \({\mathbf {c}}_{I_+}>0\) and \({\mathbf {c}}_{I_+^c}\le 0\). Then (75) has a unique solution given by \({\mathbf {x}}_{I_+}=\frac{{\mathbf {c}}_{I_+}}{\Vert {\mathbf {c}}_{I_+}\Vert }\) and \({\mathbf {x}}_{I_+^c}=\mathbf {0}\), because for any \({\mathbf {x}}\ge 0\) with \(\Vert {\mathbf {x}}\Vert =1\), it holds that

$$\begin{aligned} {\mathbf {c}}^\top {\mathbf {x}}\le {\mathbf {c}}_{I_+}^\top {\mathbf {x}}_{I_+}\le \Vert {\mathbf {c}}_{I_+}\Vert \cdot \Vert {\mathbf {x}}_{I_+}\Vert \le \Vert {\mathbf {c}}_{I_+}\Vert \cdot \Vert {\mathbf {x}}\Vert =\Vert {\mathbf {c}}_{I_+}\Vert , \end{aligned}$$

where the second inequality holds with equality if and only if \({\mathbf {x}}_{I_+}\) is collinear with \({\mathbf {c}}_{I_+}\), and the third inequality holds with equality if and only if \({\mathbf {x}}_{I_+^c}=\mathbf {0}\).
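
The three cases translate directly into a small routine. The following sketch (not the authors' code) returns one maximizer of (75), i.e., of \(\max \{{\mathbf {c}}^\top {\mathbf {x}}: {\mathbf {x}}\ge 0,\ \Vert {\mathbf {x}}\Vert =1\}\), choosing a particular solution in the non-unique cases:

```python
import numpy as np

def max_on_nonneg_sphere(c):
    """One maximizer of  max { c^T x : x >= 0, ||x||_2 = 1 }  (problem (75))."""
    c = np.asarray(c, dtype=float)
    x = np.zeros_like(c)
    pos = c > 0
    if pos.any():                    # Case 3: keep the positive part of c and normalize it
        x[pos] = c[pos] / np.linalg.norm(c[pos])
    elif (c == 0).any():             # Case 2: any unit vector supported on {i : c_i = 0}
        x[np.argmax(c == 0)] = 1.0
    else:                            # Case 1: all weight on the largest (least negative) entry
        x[np.argmax(c)] = 1.0
    return x

# max_on_nonneg_sphere([-1.0, 2.0, 0.5])  ->  array([0., 0.970..., 0.242...])
```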

Appendix 3: Proofs of Convergence of Some Examples

In this section, we give the proofs of the theorems in Sect. 3.

1.1 Proof of Theorem 6

By checking the assumptions of Theorem 2, we see that we only need to verify the boundedness of \(\{{\mathbf {Y}}^k\}\) to prove Theorem 6. Let \({\mathbf {E}}^k={\mathbf {X}}^k({\mathbf {Y}}^k)^\top -{\mathbf {M}}\). Since every iteration decreases the objective, it is easy to see that \(\{{\mathbf {E}}^k\}\) is bounded. Hence, \(\{{\mathbf {E}}^k+{\mathbf {M}}\}\) is bounded, and

$$\begin{aligned} a=\sup _k\max _{i,j}({\mathbf {E}}^k+{\mathbf {M}})_{ij}<\infty . \end{aligned}$$

Let \(y_{ij}^k\) be the (ij)th entry of \({\mathbf {Y}}^k\). Thus the columns of \({\mathbf {E}}^k+{\mathbf {M}}\) satisfy

$$\begin{aligned} a\ge {\mathbf {e}}_i^k+{\mathbf {m}}_i=\sum _{j=1}^p y_{ij}^k {\mathbf {x}}_j^k,\,\forall i, \end{aligned}$$
(76)

where \({\mathbf {x}}_j^k\) is the jth column of \({\mathbf {X}}^k\). Since \(\Vert {\mathbf {x}}_j^k\Vert =1\), we have \(\Vert {\mathbf {x}}_j^k\Vert _\infty \ge 1/\sqrt{m},\,\forall j\). Note that (76) implies that each component of \(\sum _{j=1}^p y_{ij}^k {\mathbf {x}}_j^k\) is no greater than a. Hence, from the nonnegativity of \({\mathbf {X}}^k\) and \({\mathbf {Y}}^k\) and the fact that at least one entry of \({\mathbf {x}}_j^k\) is no less than \(1/\sqrt{m}\), we have \(y_{ij}^k\le a\sqrt{m}\) for all i, j, and k. This completes the proof.
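
For concreteness, the two closed-form updates of Appendix 2 can be combined into one sweep of the iteration analyzed above. The sketch below (a simplification, not the authors' implementation) drops extrapolation, uses `max_on_nonneg_sphere` from the sketch in Appendix 2, and takes step size \(1/L_{\pi _i}^k\) with \(L_{\pi _i}^k=\Vert {\mathbf {y}}_{\pi _i}\Vert ^2\):

```python
import numpy as np

def rank_one_residue_sweep(M, X, Y):
    """One extrapolation-free sweep of the block updates for
       min 0.5*||X Y^T - M||_F^2   s.t.  X >= 0 with unit-norm columns,  Y >= 0.
    A simplified sketch built on the closed forms of Appendix 2."""
    for i in range(X.shape[1]):
        x, y = X[:, i], Y[:, i]
        R = X @ Y.T - np.outer(x, y)                    # residual without the i-th rank-one term
        L = max(float(y @ y), 1e-12)                    # Lipschitz constant of the block-x gradient
        g = (np.outer(x, y) + R - M) @ y                # partial gradient w.r.t. x
        X[:, i] = max_on_nonneg_sphere(x - g / L)       # update (46a), solved via problem (75)
        Y[:, i] = np.maximum(0.0, (M - R).T @ X[:, i])  # update (46b), closed form above
    return X, Y
```

Repeated sweeps, with a shuffled column order and extrapolation added back, give a bare-bones variant of the factorization iteration tested in the numerical experiments.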


Cite this article

Xu, Y., Yin, W. A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update. J Sci Comput 72, 700–734 (2017). https://doi.org/10.1007/s10915-017-0376-0
