Abstract
Nonconvex optimization arises in many areas of computational science and engineering. However, most nonconvex optimization algorithms are only known to have local convergence or subsequence convergence properties. In this paper, we propose an algorithm for nonconvex optimization and establish its global convergence (of the whole sequence) to a critical point. In addition, we give its asymptotic convergence rate and numerically demonstrate its efficiency. In our algorithm, the variables of the underlying problem are treated either as one block or as multiple disjoint blocks. It is assumed that each non-differentiable component of the objective function, or each constraint, applies only to one block of variables. The differentiable components of the objective function, however, can involve multiple blocks of variables together. Our algorithm updates one block of variables at a time by minimizing a certain prox-linear surrogate, along with an extrapolation to accelerate its convergence. The order of update can be either deterministically cyclic or randomly shuffled for each cycle. In fact, our convergence analysis requires only that each block be updated at least once in every fixed number of iterations. We show its global convergence (of the whole sequence) to a critical point under fairly loose conditions including, in particular, the Kurdyka–Łojasiewicz condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. These results, of course, remain valid when the underlying problem is convex. We apply our convergence results to the coordinate descent iteration for nonconvex regularized linear regression, as well as a modified rank-one residue iteration for nonnegative matrix factorization. We show that both applications have global convergence. Numerically, we test our algorithm on nonnegative matrix and tensor factorization problems, where random shuffling clearly improves the chance of avoiding low-quality local solutions.
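To make the update rule described above concrete, the following minimal sketch applies block prox-linear updates with extrapolation and a randomly shuffled cycle order to a toy two-block \(\ell _1\)-regularized least-squares problem. The problem instance, the step size \(1/L_i\), and the fixed extrapolation weight are illustrative assumptions of ours, not the paper's exact parameters (the paper ties the weights to ratios of successive block Lipschitz constants).

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||.||_1, i.e., argmin_x t*||x||_1 + 0.5*||x - z||^2."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_prox_linear(A, b, lam, n_blocks=2, omega=0.5, cycles=200, seed=0):
    """Block prox-linear updates with extrapolation and shuffled cycles,
    applied to min_x 0.5*||A x - b||^2 + lam*||x||_1 (a toy instance)."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(A.shape[1]), n_blocks)
    x = np.zeros(A.shape[1])
    x_prev = x.copy()
    for _ in range(cycles):
        for i in rng.permutation(n_blocks):        # randomly shuffled order
            idx = blocks[i]
            L_i = np.linalg.norm(A[:, idx], 2)**2  # block Lipschitz constant
            x_hat = x[idx] + omega * (x[idx] - x_prev[idx])  # extrapolation
            x_prev[idx] = x[idx]
            z = x.copy()
            z[idx] = x_hat                         # gradient evaluated at x_hat
            grad_i = A[:, idx].T @ (A @ z - b)     # block partial gradient
            x[idx] = soft_threshold(x_hat - grad_i / L_i, lam / L_i)
    return x
```

Setting `omega=0` recovers plain block proximal coordinate descent; the shuffled inner loop corresponds to the random-shuffle variant discussed in the abstract.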
Notes
A function f is proximable if it is easy to obtain the minimizer of \(f(x)+\frac{1}{2\gamma }\Vert x-y\Vert ^2\) for any input y and \(\gamma >0\).
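For instance, the \(\ell _1\) norm and the indicator function of the nonnegative orthant are both proximable, with closed-form minimizers (these two examples are our illustration, not taken from the paper):

```python
import numpy as np

def prox_l1(y, gamma):
    """argmin_x ||x||_1 + (1/(2*gamma))*||x - y||^2: soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)

def prox_nonneg(y, gamma):
    """argmin_{x>=0} (1/(2*gamma))*||x - y||^2: projection (gamma plays no role)."""
    return np.maximum(y, 0.0)
```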
A function F on \(\mathbb {R}^n\) is differentiable at a point \({\mathbf {x}}\) if there exists a vector \({\mathbf {g}}\) such that \(\lim _{{\mathbf {h}}\rightarrow 0}\frac{|F({\mathbf {x}}+{\mathbf {h}})-F({\mathbf {x}})-{\mathbf {g}}^\top {\mathbf {h}}|}{\Vert {\mathbf {h}}\Vert }=0\).
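As a quick toy illustration of this definition (ours, not the paper's): for \(F({\mathbf {x}})=\Vert {\mathbf {x}}\Vert ^2\) and the candidate \({\mathbf {g}}=2{\mathbf {x}}\), the ratio equals exactly \(\Vert {\mathbf {h}}\Vert \) and hence vanishes as \({\mathbf {h}}\rightarrow 0\):

```python
import numpy as np

# For F(x) = ||x||^2, F(x+h) - F(x) - (2x)^T h = ||h||^2, so the
# ratio below equals ||h|| and shrinks proportionally with h.
F = lambda x: float(np.dot(x, x))
x = np.array([1.0, -2.0, 0.5])
g = 2 * x
for scale in (1e-1, 1e-3, 1e-5):
    h = scale * np.array([0.3, -0.7, 0.6])
    ratio = abs(F(x + h) - F(x) - g @ h) / np.linalg.norm(h)
    print(scale, ratio)  # decreases to 0 with the scale of h
```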
Note that from Remark 2, for convex problems, we can take a larger extrapolation weight but require it to be uniformly less than one. Hence, although our algorithmic framework includes FISTA as a special case, our whole-sequence convergence result does not imply that of FISTA.
Another restarting option, based on gradient information, is also tested.
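A sketch of such a test, in the spirit of the gradient-based restart of [55] (the function name and the use of the proximal step \({\mathbf {y}}\rightarrow {\mathbf {x}}^+\) as a gradient surrogate are our assumptions):

```python
import numpy as np

def should_restart(y_prev, x_new, x_old):
    """Restart (reset the extrapolation weight) when the momentum direction
    x_new - x_old opposes the last proximal-gradient step y_prev - x_new,
    a composite-setting analogue of the gradient test in [55]."""
    return float(np.dot(y_prev - x_new, x_new - x_old)) > 0.0
```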
References
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)
Allen, G.: Sparse higher-order principal components analysis. In: International Conference on Artificial Intelligence and Statistics, pp. 27–36 (2012)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Bagirov, A.M., Jin, L., Karmitsa, N., Al Nuaimat, A., Sultanova, N.: Subgradient method for nonconvex nonsmooth optimization. J. Optim. Theory Appl. 157(2), 416–435 (2013)
Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)
Bolte, J., Daniilidis, A., Lewis, A.: The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)
Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)
Chang, K.W., Hsieh, C.J., Lin, C.J.: Coordinate descent method for large-scale l2-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), pp. 3869–3872. IEEE (2008)
Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134(1), 71–99 (2012)
Donoho, D., Stodden, V.: When does non-negative matrix factorization give a correct decomposition into parts? In: Advances in Neural Information Processing Systems, vol. 16 (2003)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Fuduli, A., Gaudioso, M., Giallombardo, G.: Minimizing nonconvex nonsmooth functions via cutting planes and proximity control. SIAM J. Optim. 14(3), 743–756 (2004)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1), 59–99 (2016)
Grippo, L., Sciandrone, M.: Globally convergent block-coordinate techniques for unconstrained optimization. Optim. Methods Softw. 10(4), 587–637 (1999)
Hildreth, C.: A quadratic programming procedure. Naval Res. Logist. Q. 4(1), 79–85 (1957)
Ho, N., Van Dooren, P., Blondel, V.: Descent methods for nonnegative matrix factorization. In: Numerical Linear Algebra in Signals, Systems and Control, pp. 251–293. Springer, Netherlands (2011)
Hong, M., Wang, X., Razaviyayn, M., Luo, Z.Q.: Iteration complexity analysis of block coordinate descent methods. arXiv preprint arXiv:1310.6957 (2013)
Hoyer, P.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J. Global Optim. 58(2), 285–319 (2014)
Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
Kruger, A.Y.: On Fréchet subdifferentials. J. Math. Sci. 116(3), 3325–3358 (2003)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier. 48(3), 769–783 (1998)
Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)
Lee, D., Seung, H.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Lojasiewicz inequality and its applications to linear convergence of first-order methods. arXiv preprint arXiv:1602.02915 (2016)
Ling, Q., Xu, Y., Yin, W., Wen, Z.: Decentralized low-rank matrix completion. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 2925–2928. IEEE (2012)
Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier (Grenoble) 43(5), 1575–1595 (1993)
Lu, Z., Xiao, L.: Randomized block coordinate non-monotone gradient method for a class of nonlinear programming. arXiv preprint arXiv:1306.5918 (2013)
Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1–2), 615–642 (2015)
Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)
Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 689–696. ACM (2009)
Mohan, K., Fazel, M.: Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 13(1), 3441–3473 (2012)
Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer Science & Business Media, Berlin (2013)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)
Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)
Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)
Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1), 1–38 (2014)
Rockafellar, R., Wets, R.: Variational Analysis, vol. 317. Springer, Berlin (2009)
Saha, A., Tewari, A.: On the nonasymptotic convergence of cyclic coordinate descent methods. SIAM J. Optim. 23(1), 576–601 (2013)
Shi, H.J.M., Tu, S., Xu, Y., Yin, W.: A primer on coordinate descent algorithms. arXiv preprint arXiv:1610.00040 (2016)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)
Welling, M., Weber, M.: Positive tensor factorization. Pattern Recogn. Lett. 22(12), 1255–1261 (2001)
Wen, Z., Yin, W., Zhang, Y.: Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 4(4), 333–361 (2012)
Xu, Y.: Alternating proximal gradient method for sparse nonnegative tucker decomposition. Math. Program. Comput. 7(1), 39–70 (2015)
Xu, Y., Akrotirianakis, I., Chakraborty, A.: Proximal gradient method for huberized support vector machine. Pattern Anal. Appl. 19(4), 989–1005 (2016)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Xu, Y., Yin, W.: A fast patch-dictionary method for whole image recovery. Inverse Probl. Imaging 10(2), 563–583 (2016)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)
Acknowledgements
Funding was provided in part by the National Science Foundation (Grant Nos. DMS-1317602 and EECS-1462397) and the Office of Naval Research (Grant No. N000141712162).
Appendices
Appendix 1: Proofs of Key Lemmas
In this section, we give proofs of the lemmas and propositions used in the paper.
1.1 Proof of Lemma 1
We prove the general case where \(\alpha _k=\frac{1}{\gamma L_k},\forall k\) and \(\tilde{\omega }_i^j\le \frac{\delta (\gamma -1)}{2(\gamma +1)}\sqrt{\tilde{L}_i^{j-1}/\tilde{L}_i^{j}},\,\forall i,j\). Assume \(b_k=i\). From the Lipschitz continuity of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},{\mathbf {x}}_i)\) about \({\mathbf {x}}_i\), it holds that (e.g., see Lemma 2.1 in [61])
Since \({\mathbf {x}}_i^{k}\) is the minimizer of (2), we have
Summing (53) and (54) and noting that \({\mathbf {x}}_j^{k+1}={\mathbf {x}}_j^k,\forall j\ne i\), we have
Here, we have used the Cauchy–Schwarz inequality in the second inequality, the Lipschitz continuity of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}_{\ne i}^{k-1},{\mathbf {x}}_i)\) in the third one, Young's inequality in the fourth one, the fact \({\mathbf {x}}_i^{k-1}=\tilde{{\mathbf {x}}}_i^{d_i^k-1}\) to obtain the third equality, and \(\alpha _k=\frac{1}{\gamma L_k}\) to obtain the last equality. Substituting \(\tilde{\omega }_i^j\le \frac{\delta (\gamma -1)}{2(\gamma +1)}\sqrt{\tilde{L}_i^{j-1}/\tilde{L}_i^{j}}\) and recalling (8) completes the proof.
1.2 Proof of the Claim in Remark 2
Assume \(b_k=i\) and \(\alpha _k=\frac{1}{L_k}\). When f is block multi-convex and \(r_i\) is convex, from Lemma 2.1 of [61], it follows that
Hence, if \(\omega _k\le \delta \sqrt{\tilde{L}_i^{j-1}/\tilde{L}_i^j}\), we have the desired result.
1.3 Proof of Proposition 1
Summing (14) over k from 1 to K gives
where we have used the fact \(d_i^0=0,\forall i\) in the first equality, \(\tilde{\mathbf{x}}_i^{-1}=\tilde{\mathbf{x}}_i^0,\forall i\) to obtain the second inequality, and \(\tilde{L}_i^j\ge \ell , \forall i,j\) in the last inequality. Letting \(K\rightarrow \infty \) and noting \(d_i^K\rightarrow \infty \) for all i by Assumption 3, we conclude from the above inequality and the lower boundedness of F in Assumption 1 that
which implies (15).
1.4 Proof of Proposition 2
From Corollary 5.20 and Example 5.23 of [52], we have that if \({\mathbf {prox}}_{\alpha _kr_i}\) is single-valued near \({\mathbf {x}}_i^{k-1}-\alpha _k\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1})\), then \({\mathbf {prox}}_{\alpha _kr_i}\) is continuous at \({\mathbf {x}}_i^{k-1}-\alpha _k\nabla _{{\mathbf {x}}_i}f({\mathbf {x}}^{k-1})\). Let \(\hat{{\mathbf {x}}}^k_i(\omega )\) explicitly denote the extrapolated point with weight \(\omega \); namely, we take \(\hat{{\mathbf {x}}}^k_i(\omega _k)\) in (6). In addition, let \({\mathbf {x}}^k_i(\omega )={\mathbf {prox}}_{\alpha _kr_i}\big (\hat{{\mathbf {x}}}_i^k(\omega )-\alpha _k\nabla _{{\mathbf {x}}_i}f(\mathbf{x}_{\ne i}^{k-1},\hat{\mathbf{x}}_i^k(\omega ))\big )\). Note that (14) implies
From the optimality of \({\mathbf {x}}^k_i(\omega )\), it holds that
Taking limit superior on both sides of the above inequality, we have
which implies \(\underset{\omega \rightarrow 0^+}{\limsup }\, r_i(\mathbf{x}_i^k(\omega ))\le r_i(\mathbf{x}_i^k(0))\). Since \(r_i\) is lower semicontinuous, \(\underset{\omega \rightarrow 0^+}{\liminf }\, r_i(\mathbf{x}_i^k(\omega ))\ge r_i(\mathbf{x}_i^k(0))\). Hence, \(\underset{\omega \rightarrow 0^+}{\lim }r_i(\mathbf{x}_i^k(\omega ))= r_i(\mathbf{x}_i^k(0))\), and thus \(\underset{\omega \rightarrow 0^+}{\lim }F(\mathbf{x}^k(\omega ))= F(\mathbf{x}^k(0))\). Together with (55), we conclude that there exists \(\bar{\omega }_k>0\) such that \(F({\mathbf {x}}^{k-1})-F(\mathbf{x}^k(\omega ))\ge 0,\,\forall \omega \in [0,\bar{\omega }_k]\). This completes the proof.
1.5 Proof of Lemma 2
Let \({\mathbf {a}}_m\) and \({\mathbf {u}}_m\) be the vectors with their ith entries
Then (21) can be written as
Recall
Then it follows from (56) that
By the Cauchy–Schwarz inequality and noting \(n_{i,m+1}-n_{i,m}\le N,\forall i,m\), we have
and for any positive \(C_1\),
Taking
we have from (58) and (59) that
For any \(C_2>0\), it holds
Combining (57), (61), and (62), we have
Summing the above inequality over m from \(M_1\) through \(M_2\le M\) and rearranging terms gives
Take
Then (63) implies
which together with \(\sum _{i=1}^sA_{i,n_{i,m+1}}\le \sqrt{s}\Vert {\mathbf {u}}_{m+1}\Vert \) gives
where we have used \(\Vert {\mathbf {u}}_{M_1}\Vert \le \sum _{i=1}^sA_{i,n_{i,M_1}}\), and
From (60), (64), and (67), we can take
where the inequality can be verified by noting \((1-\beta ^2)(4-(1+\beta )^2)-(1-\beta )^2\) is decreasing with respect to \(\beta \) in [0, 1]. Thus from (64) and (67), we have \(C_2=\frac{1}{2C_1},\, C_3=\frac{C_1}{2},\, C_4=\beta \sqrt{\overline{\alpha }}+\frac{\sqrt{s}C_1}{2}\). Hence, from (66), we complete the proof of (22).
If \(\lim _{m\rightarrow \infty }n_{i,m}=\infty ,\forall i\), \(\sum _{m=1}^\infty B_m<\infty \), and (21) holds for all m, letting \(M_1=1\) and \(M_2\rightarrow \infty \), we have (23) from (66).
1.6 Proof of Proposition 3
For any i, assume that while updating the ith block to \({\mathbf {x}}_i^k\), the value of the jth block (\(j\ne i\)) is \({\mathbf {y}}_j^{(i)}\), the extrapolated point of the ith block is \({\mathbf {z}}_i\), and the Lipschitz constant of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {x}}_i)\) with respect to \({\mathbf {x}}_i\) is \(\tilde{L}_i\), namely,
Hence, \(\mathbf {0}\in \nabla _{{\mathbf {x}}_i}f({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)+2\tilde{L}_i({\mathbf {x}}_i^k-{\mathbf {z}}_i)+\partial r_i({\mathbf {x}}_i^k),\) or equivalently,
Note that \({\mathbf {x}}_i\) may be updated to \({\mathbf {x}}_i^k\) not at the kth iteration but at some earlier one, which must be between \(k-T\) and k by Assumption 3. In addition, for each pair (i, j), there must be some \(\kappa _{i,j}\) between \(k-2T\) and k such that
and for each i, there are \(k-3T\le \kappa _1^i<\kappa _2^i\le k\) and extrapolation weight \(\tilde{\omega }_i\le 1\) such that
By the triangle inequality, \(({\mathbf {y}}_{\ne i}^{(i)},{\mathbf {z}}_i)\in B_{4\rho }(\bar{{\mathbf {x}}})\) for all i. Therefore, it follows from (10) and (68) that
where in the fourth inequality, we have used the Lipschitz continuity of \(\nabla _{{\mathbf {x}}_i}f({\mathbf {x}})\) with respect to \({\mathbf {x}}\), and the last inequality uses \(\tilde{L}_i\le L\). Now use (71), (69), (70), and the triangle inequality to obtain the desired result.
1.7 Proof of Lemma 3
The proof follows that of Theorem 2 of [3]. When \(\gamma \ge 1\), since \(0\le A_{k-1}-A_k\le 1,\forall k\ge K\), we have \((A_{k-1}-A_k)^\gamma \le A_{k-1}-A_k\), and thus (33) implies that for all \(k\ge K\), it holds that \(A_k\le (\alpha +\beta )(A_{k-1}-A_k)\), from which item 1 immediately follows.
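Indeed, rearranging \(A_k\le (\alpha +\beta )(A_{k-1}-A_k)\) gives \((1+\alpha +\beta )A_k\le (\alpha +\beta )A_{k-1}\), i.e., \(A_k\le \frac{\alpha +\beta }{1+\alpha +\beta }A_{k-1}\), so \(A_k\le \big (\frac{\alpha +\beta }{1+\alpha +\beta }\big )^{k-K}A_K\) for all \(k\ge K\), a geometric decay.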
When \(\gamma <1\), we have \((A_{k-1}-A_k)^\gamma \ge A_{k-1}-A_k\), and thus (33) implies that for all \(k\ge K\), it holds that \(A_k\le (\alpha +\beta )(A_{k-1}-A_k)^\gamma \). Letting \(h(x)=x^{-1/\gamma }\), we have for \(k\ge K\),
where we have used the nonincreasing monotonicity of h in the second inequality. Hence,
Let \(\mu \) be the positive constant such that
Note that the above equation has a unique solution \(0<\mu <1\). We claim that
The claim follows immediately from (72) and (73) if \(\big (\frac{A_{k}}{A_{k-1}}\big )^{1/\gamma }\ge \mu \). It also holds if \(\big (\frac{A_{k}}{A_{k-1}}\big )^{1/\gamma }\le \mu \), by the arguments
where the last inequality is from \(A_{k-1}^{1-1/\gamma }\ge 1\). Hence, (74) holds, and summing it over k gives
which immediately gives item 2 by letting \(\nu =(\mu ^{\gamma -1}-1)^{\frac{\gamma }{\gamma -1}}\).
Appendix 2: Solutions of (46)
In this section, we give closed-form solutions to both updates in (46). First, it is not difficult to derive the solution of (46b):
Second, since \(L_{\pi _i}^k>0\), it is easy to write (46a) in the form of
which is clearly equivalent to
where \({\mathbf {c}}={\mathbf {a}}-{\mathbf {b}}\). Next we give the solution to (75) in three different cases.
Case 1 \({\mathbf {c}}<0\). Let \(i_0={{\mathrm{\hbox {arg max}}}}_i c_i\) and \(c_{\max }=c_{i_0}<0\). If more than one component equals \(c_{\max }\), one can choose any of them. Then the solution to (75) is given by \(x_{i_0}=1\) and \(x_i=0,\forall i\ne i_0\) because for any \({\mathbf {x}}\ge 0\) and \(\Vert {\mathbf {x}}\Vert =1\), it holds that
Case 2 \({\mathbf {c}}\le 0\) and \({\mathbf {c}}\not <0\). Let \({\mathbf {c}}=({\mathbf {c}}_{I_0},{\mathbf {c}}_{I_-})\) where \({\mathbf {c}}_{I_0}=\mathbf {0}\) and \({\mathbf {c}}_{I_-}<0\). Then the solution to (75) is given by \({\mathbf {x}}_{I_-}=\mathbf {0}\) and \({\mathbf {x}}_{I_0}\) being any vector that satisfies \({\mathbf {x}}_{I_0}\ge 0\) and \(\Vert {\mathbf {x}}_{I_0}\Vert =1\) because \({\mathbf {c}}^\top {\mathbf {x}}\le 0\) for any \({\mathbf {x}}\ge 0\).
Case 3 \({\mathbf {c}}\not \le 0\). Let \({\mathbf {c}}=({\mathbf {c}}_{I_+},{\mathbf {c}}_{I_+^c})\) where \({\mathbf {c}}_{I_+}>0\) and \({\mathbf {c}}_{I_+^c}\le 0\). Then (75) has a unique solution given by \({\mathbf {x}}_{I_+}=\frac{{\mathbf {c}}_{I_+}}{\Vert {\mathbf {c}}_{I_+}\Vert }\) and \({\mathbf {x}}_{I_+^c}=\mathbf {0}\) because for any \({\mathbf {x}}\ge 0\) and \(\Vert {\mathbf {x}}\Vert =1\), it holds that
where the second inequality holds with equality if and only if \({\mathbf {x}}_{I_+}\) is collinear with \({\mathbf {c}}_{I_+}\), and the third inequality holds with equality if and only if \({\mathbf {x}}_{I_+^c}=\mathbf {0}\).
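For concreteness, the three cases combine into the following routine. This is a sketch assuming, as the case analysis above indicates, that (75) maximizes \({\mathbf {c}}^\top {\mathbf {x}}\) over \(\{{\mathbf {x}}\ge 0,\ \Vert {\mathbf {x}}\Vert =1\}\); ties in Cases 1 and 2 are broken arbitrarily.

```python
import numpy as np

def solve_75(c):
    """Closed-form solution of max c^T x s.t. x >= 0, ||x|| = 1,
    following the three cases above."""
    c = np.asarray(c, dtype=float)
    x = np.zeros_like(c)
    pos = c > 0
    if pos.any():                      # Case 3: c has a positive entry
        x[pos] = c[pos] / np.linalg.norm(c[pos])
    elif (c == 0).any():               # Case 2: c <= 0 with some zero entries
        zero = np.flatnonzero(c == 0)  # any unit vector supported on I_0 works
        x[zero[0]] = 1.0
    else:                              # Case 1: c < 0 componentwise
        x[np.argmax(c)] = 1.0          # put all weight on the largest entry
    return x
```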
Appendix 3: Proofs of Convergence of Some Examples
In this section, we give the proofs of the theorems in Sect. 3.
1.1 Proof of Theorem 6
By checking the assumptions of Theorem 2, we see that to prove Theorem 6, we only need to verify the boundedness of \(\{{\mathbf {Y}}^k\}\). Let \({\mathbf {E}}^k={\mathbf {X}}^k({\mathbf {Y}}^k)^\top -{\mathbf {M}}\). Since every iteration decreases the objective, it is easy to see that \(\{{\mathbf {E}}^k\}\) is bounded. Hence, \(\{{\mathbf {E}}^k+{\mathbf {M}}\}\) is bounded, and
Let \(y_{ij}^k\) be the (i, j)th entry of \({\mathbf {Y}}^k\). Then the columns of \({\mathbf {E}}^k+{\mathbf {M}}\) satisfy
where \({\mathbf {x}}_j^k\) is the jth column of \({\mathbf {X}}^k\). Since \(\Vert {\mathbf {x}}_j^k\Vert =1\), we have \(\Vert {\mathbf {x}}_j^k\Vert _\infty \ge 1/\sqrt{m},\,\forall j\). Note that (76) implies each component of \(\sum _{j=1}^p y_{ij}^k {\mathbf {x}}_j^k\) is no greater than a. Hence from nonnegativity of \({\mathbf {X}}^k\) and \({\mathbf {Y}}^k\) and noting that at least one entry of \({\mathbf {x}}_j^k\) is no less than \(1/\sqrt{m}\), we have \(y_{ij}^k\le a\sqrt{m}\) for all i, j and k. This completes the proof.