Abstract
In this paper, we establish the convergence of the proximal alternating direction method of multipliers (ADMM) and block coordinate descent (BCD) method for nonseparable minimization models with quadratic coupling terms. The new convergence results presented here answer several open questions that have been the subject of considerable discussion. We first extend the 2-block proximal ADMM to linearly constrained convex optimization with a coupled quadratic objective function, a setting where theoretical understanding is currently lacking, and prove that the sequence generated by the proximal ADMM converges pointwise to a primal-dual solution pair. Moreover, we apply randomly permuted ADMM (RPADMM) to nonseparable multi-block convex optimization and prove its expected convergence for a class of nonseparable quadratic programming problems. When the linear constraint vanishes, the 2-block proximal ADMM and RPADMM reduce to the 2-block cyclic proximal BCD method and randomly permuted BCD (RPBCD), respectively. Our study provides the first iterate convergence result for 2-block cyclic proximal BCD without assuming boundedness of the iterates. We also establish the expected iterate convergence of multi-block RPBCD for convex quadratic optimization. In addition, we demonstrate that RPBCD may have a worse convergence rate than cyclic proximal BCD for 2-block convex quadratic minimization problems. Although the results on RPADMM and RPBCD are restricted to quadratic minimization models, they provide some interesting insights: (1) random permutation makes ADMM and BCD more robust for multi-block convex minimization problems; (2) cyclic BCD may outperform RPBCD for “nice” problems, so RPBCD should be applied with caution to general convex optimization problems, especially those with only a few blocks.
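As a schematic of the problem class and the 2-block proximal ADMM discussed above (an illustrative sketch in generic notation; the paper's precise assumptions on \(H\), the proximal terms and the dual step size are given in the body), consider

$$\begin{aligned} \min _{x_1,x_2}\ \theta _1(x_1)+\theta _2(x_2)+\frac{1}{2} \left[ \begin{array}{c} x_1\\ x_2 \end{array} \right] ^{\top } H \left[ \begin{array}{c} x_1\\ x_2 \end{array} \right] \quad \text{ s.t. }\quad A_1x_1+A_2x_2=b, \end{aligned}$$

where the off-diagonal blocks of \(H\succeq 0\) couple \(x_1\) and \(x_2\) (the quadratic coupling term), together with the iteration

$$\begin{aligned} x_1^{k+1}&= \mathop {\mathrm {argmin}}_{x_1}\ \mathcal {L}_\beta (x_1,x_2^{k},\lambda ^{k})+\frac{1}{2}\Vert x_1-x_1^{k}\Vert _{D_1}^2,\\ x_2^{k+1}&= \mathop {\mathrm {argmin}}_{x_2}\ \mathcal {L}_\beta (x_1^{k+1},x_2,\lambda ^{k})+\frac{1}{2}\Vert x_2-x_2^{k}\Vert _{D_2}^2,\\ \lambda ^{k+1}&= \lambda ^{k}-\beta (A_1x_1^{k+1}+A_2x_2^{k+1}-b), \end{aligned}$$

where \(\mathcal {L}_\beta \) denotes the augmented Lagrangian and \(D_1,D_2\succeq 0\) are proximal weight matrices.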
References
Agarwal, A., Negahban, S., Wainwright, M.J.: Noisy matrix decomposition via convex relaxation: optimal rates in high dimensions. Ann. Stat. 40(2), 1171–1197 (2012)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Beck, A.: On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim. 25(1), 185–209 (2015)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(2), 2037–2060 (2013)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena-Scientific, Belmont (1999)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Athena Scientific, Belmont (1997)
Bolte, J., Sabach, S.Y., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Cai, X., Han, D., Yuan, X.: On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1), 57–79 (2016)
Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstr. Appl. Anal. 2013, Article ID 183961 (2013)
Chen, L., Sun, D., Toh, K.-C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66(2), 327–343 (2017)
Chen, L., Sun, D., Toh, K.-C.: An efficient inexact symmetric Gauss-Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161(1), 237–270 (2017)
Cui, Y., Li, X., Sun, D., Toh, K.-C.: On the convergence properties of a majorized alternating direction method of multipliers for linearly constrained convex optimization problems with coupled objective functions. J. Optim. Theory Appl. 169(3), 1013–1041 (2016)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. UCLA CAM Report 14-51 (2014)
Deng, W., Lai, M., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. J. Sci. Comput. 71(2), 712–736 (2017)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992)
Feng, C., Xu, H., Li, B.C.: An alternating direction method approach to cloud traffic management. IEEE Trans. Parallel Distrib. Syst. 28(8), 2145–2158 (2017)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)
Gao, X., Zhang, S.: First-order algorithms for convex optimization with nonseparable objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)
Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer, New York (1984)
Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 9, 41–76 (1975)
Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)
Han, D., Yuan, X., Zhang, W., Cai, X.: An ADM-based splitting method for separable convex programming. Comput. Optim. Appl. 54, 343–369 (2013)
He, B., Tao, M., Yuan, X.: A splitting method for separable convex programming. IMA J. Numer. Anal. 35, 394–426 (2015)
He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)
He, B., Yuan, X.: On the O\((1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Hong, M., Chang, T., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization. arXiv:1401.7079 (2014)
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162, 165–199 (2017)
Hong, M., Luo, Z., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Hong, M., Wang, X., Razaviyayn, M., Luo, Z.: Iteration complexity analysis of block coordinate descent methods. arXiv:1310.6957v2 (2014)
Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia Pac. J Oper. Res. 32, 1550024 (2015)
Li, M., Sun, D., Toh, K.-C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)
Li, X., Sun, D., Toh, K.-C.: A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions. Math. Program. 155(1), 333–373 (2016)
Lin, T., Ma, S., Zhang, S.: Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. J. Sci. Comput. 69(1), 52–81 (2016)
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25, 1478–1497 (2015)
Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152, 615–642 (2015)
Monteiro, R., Svaiter, B.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Mota, J.F.C., Xavier, J.M.F., Aguiar, P.M.F., Püschel, M.: Distributed optimization with local domains: applications in MPC and network flows. IEEE Trans. Autom. Control 60(7), 2004–2009 (2015)
Peng, Y.G., Ganesh, A., Wright, J., Xu, W.L., Ma, Y.: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233–2246 (2012)
Razaviyayn, M., Hong, M., Luo, Z.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(2), 1–38 (2014)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14, 567–599 (2013)
Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)
Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-block constraints. SIAM J. Optim. 25(2), 882–915 (2015)
Sun, R., Luo, Z., Ye, Y.: On the expected convergence of randomly permuted ADMM. arXiv:1503.06387v1 (2015)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
Wright, S.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)
Zhang, Y.: Convergence of a class of stationary iterative methods for saddle point problems. Technical Report TR10-24, Rice University (2010)
Acknowledgements
Caihua Chen was supported by the National Natural Science Foundation of China [Grant No. 11401300, 71732003, 71673130]. Min Li was supported by the National Natural Science Foundation of China [Grant No.11771078, 71390335, 71661147004]. Xin Liu was supported by the National Natural Science Foundation of China [Grant No. 11622112, 11471325, 91530204, 11331012, 11461161005, 11688101], the National Center for Mathematics and Interdisciplinary Sciences, CAS, the Youth Innovation Promotion Association, CAS, and Key Research Program of Frontier Sciences, CAS. Yinyu Ye was supported by the AFOSR Grant [Grant No. FA9550-12-1-0396]. The authors would like to thank Dr. Ji Liu from University of Rochester and Dr. Ruoyu Sun from Stanford University for the helpful discussions on the block coordinate descent method. The authors would also like to thank the associate editor and two anonymous referees for their detailed and valuable comments and suggestions.
Appendices
Appendix A
The proof of Lemma 2 is similar to, but not exactly the same as, that of [48, Lemma 2]. Since S is allowed to be singular here, we also need to show the positive definiteness of Q by mathematical induction. For completeness, we provide a concise proof here. Interested readers are referred to [48] for the motivation and further details of this proof.
Lemma 2 actually reveals a linear algebra property and is essentially unrelated to H, A and \(\beta \) if we define \(L_\sigma \) directly by S. For brevity, we restate the main assertion to be proved as follows:
where \(S\in \mathbb {R}^{d\times d}\) is positive semidefinite, \(S_{ii}\in \mathbb {R}^{d_i\times d_i}\) (\(i=1,\ldots ,n\)) is positive definite,
and \(\Gamma \) is a set consisting of all permutations of \((1,\ldots ,n)\).
For brevity of notation, we define the block permutation matrix \(P_k\) as follows:
It can be easily verified that \(P_k^{\top }= P_k^{-1}\) and \(P_n=I_d\). For \(k\in \{1,\ldots ,n\}\), we define \(\Gamma _k:=\{\sigma '\mid \sigma '\ \text{ is } \text{ a } \text{ permutation } \text{ of }\, \{1,\ldots ,k-1,k+1,\ldots ,n\}\}\). For any \(\sigma '\in \Gamma _k\), we define \(L_{\sigma '}\in \mathbb {R}^{(d-d_k)\times (d-d_k)}\) as follows:
We define \(\hat{Q}_k\in \mathbb {R}^{(d-d_k)\times (d-d_k)}\) by
and \(W_k\) as the k-th block-column of S excluding the block \(S_{kk}\), i.e.
Due to the positive semi-definiteness of S, and by a slight abuse of the notation A, there exists \(A\in \mathbb {R}^{d\times d}\) satisfying
Let \(A_i\in \mathbb {R}^{d\times d_i}\) (\(i=1,\ldots ,n\)) be the column blocks of A, and it is clear that \(S_{ij} = A_i^{\top }A_j\) for all \(1\le i,j\le n\). For convenience, we define
Then we have \(AP_k = [\hat{A}_k,A_k]\).
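As an aside (an illustration with hypothetical block sizes, not part of the paper), the following Python sketch constructs such a block permutation matrix, assuming \(P_k\) moves the k-th column block to the last position while preserving the order of the others, and checks \(P_k^{\top }=P_k^{-1}\), \(P_n=I_d\) and \(AP_k=[\hat{A}_k,A_k]\):

```python
import numpy as np

d_blocks = [2, 3, 1, 2]                 # hypothetical block sizes d_1, ..., d_n
n, d = len(d_blocks), sum(d_blocks)
starts = np.cumsum([0] + d_blocks)      # column offsets of the blocks

def block_permutation(k):
    """P_k: sends the k-th column block (0-based) to the last position,
    keeping the remaining blocks in their original order."""
    cols = [np.arange(starts[i], starts[i + 1]) for i in range(n)]
    order = [i for i in range(n) if i != k] + [k]
    return np.eye(d)[:, np.concatenate([cols[i] for i in order])]

rng = np.random.default_rng(1)
A = rng.standard_normal((d, d))
k = 1                                   # block moved to the end
P = block_permutation(k)

assert np.allclose(P.T, np.linalg.inv(P))                 # P_k^T = P_k^{-1}
assert np.allclose(block_permutation(n - 1), np.eye(d))   # P_n = I_d
A_k = A[:, starts[k]:starts[k + 1]]                       # k-th column block
A_hat = np.delete(A, np.arange(starts[k], starts[k + 1]), axis=1)
assert np.allclose(A @ P, np.hstack([A_hat, A_k]))        # A P_k = [A_hat_k, A_k]
```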
For clarity of the proof structure, we introduce the following two lemmas.
Lemma 7
Let \(S\in \mathbb {R}^{d\times d}\) be a positive semidefinite matrix, and let \(L_\sigma \), Q, \(\hat{Q}_k\) and \(P_k\) be defined by (48), (50), (78) and (76), respectively. It holds that
where
Proof
Let \(\sigma '\in \Gamma _k\). We can partition \(L_{\sigma '}\) as follows:
Here the sizes of \(Z_{11}\) and \(Z_{22}\) are \((d_1+\cdots + d_{k-1})\times (d_1+\cdots + d_{k-1})\) and \((d_{k+1}+\cdots + d_{n})\times (d_{k+1}+\cdots + d_{n})\), respectively. The sizes of \(Z_{12}\) and \(Z_{21}\) can be determined accordingly. We denote
which implies
It is then easy to verify that
Left and right multiplying both sides of the above relationship by \(P_k^{\top }\) and \(P_k\), respectively, we obtain
Taking the inverse of both sides of (86), we obtain
Summing up (87) for all \(\sigma '\in \Gamma _k\) and dividing by \(|\Gamma _k|\), we get
Here, the last equality follows from (78). By the definition of \(L_\sigma \), it is easy to verify that \(L_{\sigma }^{\top }= L_{\bar{\sigma }}\), where \(\bar{\sigma }\) is a “reverse permutation” of \(\sigma \) that satisfies \(\bar{\sigma }(i)=\sigma (n+1-i)\) (\(i=1,\ldots ,n\)). Thus we have \(L_{(\sigma ',k)}=L_{(k,\bar{\sigma }')}^{\top }\), where \(\bar{\sigma }'\) is a reverse permutation of \(\sigma '\). Summing over all \(\sigma '\), we get
where the last equality follows from the fact that summing over \(\bar{\sigma }'\) is the same as summing over \(\sigma '\). Thus, we have
Here, the last equality uses the symmetry of \(\hat{Q}_k\). Combining the above relation, (88) and the definition of \(Q_k\), we have
Using the definition of \(P_k\) and the fact that \(|\Gamma _k|=(n-1)!\), we can rewrite (89) as
Summing up the above relation for \(k=1,\ldots ,n\) and then dividing by n, we immediately arrive at (82).\(\square \)
Lemma 8
Let Q, \(\hat{Q}_k\), \(Q_k\), A, \(\hat{A}_n\) and \(W_n\) be defined by (50), (78), (83), (80), (81) and (79). Suppose \(\hat{Q}_n\succ 0\) and
It holds that
Proof
For simplicity, we use W, \(\hat{Q}\) and \(\hat{A}\) in place of \(W_n\), \(\hat{Q}_n\) and \(\hat{A}_n\), respectively.
The assumptions \(\hat{Q}\succ 0\) and (90) imply that \(\Theta :=W^{\top }\hat{Q}W \succeq 0\). Recalling that \(S_{nn}=A_n^{\top }A_n = I_{d_n}\), we have
Hence, we obtain
Recalling the definition (83), we have
where \(J:= \left[ \begin{array}{cc} I_{d-d_n} &{} 0\\ -\frac{1}{2}W^{\top }&{} I_{d_n} \end{array} \right] \) and \(C:=I_{d_n} - \frac{1}{4}W^{\top }\hat{Q}W\). By (93), we have \(C\succ 0\). Together with \(\hat{Q}\succ 0\), this implies \(Q_n\succ 0\). Thus, we directly obtain \( \mathrm {eig}(AQ_nA^{\top }) \subset \left[ 0,\infty \right) \). It remains to show
Denote \(\hat{B}:=\hat{A}^{\top }\hat{A}\); then we can write S as
We can reformulate \(\rho (AQ_nA^{\top })\) as follows:
It is easy to verify that
Thus,
According to (96), it suffices to prove \(\rho (Z)<\frac{4}{3}\). Suppose \(\lambda \) is an arbitrary eigenvalue of Z, and let \(v\in \mathbb {R}^d\) be an associated eigenvector. In the rest of the proof, we only need to show that
holds. Then, by the arbitrariness of \(\lambda \), we have \(\rho (Z)<\frac{4}{3}\), which implies (95), and then (91) holds.
Partition v into \(v= \left[ \begin{array}{c} v_1\\ v_0 \end{array} \right] \), where \(v_1\in \mathbb {R}^{d-d_n}\), \(v_0\in \mathbb {R}^{d_n}\). Then, \(Zv=\lambda v\) implies that
If \(\lambda I_{d_n}-C\) is singular, i.e., \(\lambda \) is an eigenvalue of C, then by the definition of C and (93) we have \(\frac{2}{3}I_{d_n}\prec C=I_{d_n}-\frac{1}{4}\Theta \preceq I_{d_n}\), which implies \(\lambda \le 1\); thus inequality (98) holds. In the following, we assume that \(\lambda I_{d_n}-C\) is nonsingular. An immediate consequence is \(v_1\ne 0\).
By (100), we obtain \(v_0=\frac{1}{2}(\lambda I_{d_n}-C)^{-1}CW^{\top }v_1\). Substituting this explicit formula into (99), we obtain
where \( \Phi := -I_{d_n} +\lambda [(4\lambda -4) I_{d_n} +\Theta ]^{-1}\). Since \(\Theta \) is a symmetric matrix, \(\Phi \) is also symmetric.
Suppose \(\lambda _{\max }(\Phi )>0\). The definition of \(\Phi \) gives us
Together with \(\lambda _{\max }(\Phi )>0\), there exists \(\theta \in \mathrm {eig}(\Theta )\) such that \(-1+\frac{\lambda }{(4\lambda -4)+\theta }>0\). If \(\lambda \le 1\), (98) already holds. Otherwise, \(\lambda >1\), which implies \(1<\frac{\lambda }{(4\lambda -4)+\theta }\le \frac{\lambda }{4\lambda -4}\), and then (98) holds.
Now we assume \(\lambda _{\max }(\Phi )\le 0\), i.e., \(\Phi \preceq 0\). By the induction hypothesis, we have \(\hat{\lambda }:=\rho (\hat{Q}\hat{B})= \rho (\hat{Q}\hat{A}^{\top }\hat{A}) \in \left[ 0,\frac{4}{3}\right) \). Due to the positive definiteness of \(\hat{Q}\), there exists a nonsingular \(U\in \mathbb {R}^{(d-d_n)\times (d-d_n)}\) such that \(\hat{Q}=U^{\top }U\). Let \(Y:=UW\Phi W^{\top }U^{\top }\in \mathbb {R}^{(d-d_n)\times (d-d_n)}\).
For any \(u\in \mathbb {R}^{d-d_n}\), we have \(u^{\top }Yu =u^{\top }UW\Phi W^{\top }U^{\top }u =(W^{\top }U^{\top }u)^{\top }\Phi (W^{\top }U^{\top }u) \le 0\), where the last inequality follows from \(\Phi \preceq 0\). Thus, \(Y\preceq 0\). Pick an arbitrary g satisfying \(g>\rho (Y)\). Then, it holds that
From (101), we can conclude that \((g+\lambda ) v_1 = (\hat{Q}\hat{B}+\hat{Q}W\Phi W^{\top }+ g I_{d-d_n})v_1\). Consequently,
which implies
where the last inequality follows from (102). The relation (103) directly gives us that \(\lambda \le \hat{\lambda }<\frac{4}{3}\). Namely, (98) also holds in this case.
We have completed the proof.\(\square \)
Now we are ready to present the main proof of Lemma 2.
Proof of Lemma 2
Without loss of generality, we assume \(S_{ii}=I_{d_i}\) (\(i=1,\ldots ,n\)). Otherwise, we denote \(D:=\mathrm {Diag}\left( S_{11}^{-1/2},\ldots ,S_{nn}^{-1/2}\right) \).
It is easy to verify that if \(\tilde{S} =DSD\), and \(\tilde{L}_{\sigma }\) and \(\tilde{Q}\) are defined by (75) with \(\tilde{S}\), then \(\tilde{Q} = D^{-1}QD^{-1}\). It holds that \(\mathrm {eig}(\tilde{Q}\tilde{S})=\mathrm {eig}(D^{-1}QSD)=\mathrm {eig}(QS)\)
and \(\tilde{S}_{ii}=I_{d_i}\) (\(i=1,\ldots ,n\)).
It follows from the definition of A in (80) that \(\mathrm {eig}(QS)=\mathrm {eig}(AQA^{\top })\). Now we use mathematical induction to prove this lemma. First, the assertion (74) and \(Q\succ 0\) hold when \(n=1\), since \(QS=I\) in this case. Next, we prove the lemma for any \(n\ge 2\), given that the assertion (74) and \(Q\succ 0\) hold for \(n-1\).
By using Lemma 7, it directly follows from (82) that \(AQA^{\top }= \frac{1}{n}\sum \limits _{k=1}^n AP_k Q_kP_k^{\top }A^{\top }\). Consequently,
By the induction assumptions and Lemma 8, we obtain the relationship (91). Together with the similarity among the blocks, the relationship (91) implies
Substituting (105) into (104), we prove the assertion (74) for n, and hence complete the proof of Lemma 2.
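To make the conclusion of Lemma 2 concrete, here is a small numerical sketch (an illustration under stated assumptions, not part of the paper): assuming \(L_\sigma \) is the block lower-triangular part of S taken in the update order \(\sigma \), as in the randomly permuted ADMM literature [52], and taking scalar blocks \(d_i=1\), it enumerates all permutations of a random instance and checks both the symmetry of Q and \(\mathrm {eig}(QS)\subset \left[ 0,\frac{4}{3}\right) \):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # number of blocks, scalar case d_i = 1
A = rng.standard_normal((n, n))
S = A.T @ A                              # positive semidefinite
Dinv = np.diag(1.0 / np.sqrt(np.diag(S)))
S = Dinv @ S @ Dinv                      # normalize so S_ii = 1, as in the proof

Q = np.zeros((n, n))
perms = list(itertools.permutations(range(n)))
for sigma in perms:
    L = np.zeros((n, n))
    for i in range(n):                   # block updated at sweep position i
        for j in range(i + 1):           # blocks already updated (j <= i)
            L[sigma[i], sigma[j]] = S[sigma[i], sigma[j]]
    Q += np.linalg.inv(L)                # L_sigma is invertible: unit diagonal
Q /= len(perms)                          # Q = E_sigma[ L_sigma^{-1} ]

assert np.allclose(Q, Q.T)               # symmetry of Q (cf. L_sigma^T = L_sigma_bar)
eigs = np.linalg.eigvals(Q @ S).real
print(f"eig(QS) in [{eigs.min():.4f}, {eigs.max():.4f}]")
assert eigs.min() > -1e-10 and eigs.max() < 4.0 / 3.0
```

In the spirit of [52], this bound means that the expected iteration matrix \(I-QS\) has spectrum contained in \(\left( -\frac{1}{3},1\right] \), which underlies the expected convergence results.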
Appendix B
Proof of Lemma 3
For convenience, we use the notation
We prove this lemma by mathematical induction on the dimension d. When \(d=1\), it is easily seen that
which means that Lemma 3 holds in this case. Suppose this lemma is valid for \(d\le k-1\). Consider the case where \(d =k\).
- Case 1: \(S\succ 0\). In this case, \(\mathrm{Rank}(S) = \mathrm{Rank}(S+T)=k\) and hence \(l = 0\). Because
$$\begin{aligned} g(\lambda ;S,T)=(\lambda -1)^{l} g(\lambda ;S,T) \qquad \mathrm{and}\qquad g(1;S,T) = \mathrm{det}(S) >0, \end{aligned}$$Lemma 3 holds in this case.
- Case 2: \(S\succeq 0\) but not positive definite. Let S admit the following eigenvalue decomposition
$$\begin{aligned} P^\top SP = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 0 &{} &{} &{} &{} &{}\\ &{}\ddots &{} &{}&{}&{}\\ &{} &{} 0 &{} &{}&{}\\ &{} &{} &{} s_1 &{} &{} \\ &{}&{}&{}&{}\ddots &{}\\ &{}&{}&{}&{}&{} s_t \end{array} \right] :=D, \end{aligned}$$where P is an orthogonal matrix and \(s_i>0\). If we let \(W = P^\top TP\succeq 0\), then
$$\begin{aligned} g(\lambda ;S,T) = g(\lambda ;D,W). \end{aligned}$$The proof proceeds by considering the following two subcases.
- Case 2.1: \(W_{11}=0\). Since W is positive semidefinite, \(W_{1i}= W_{i1}=0\) for \(i=1,2,\ldots ,k\). Note that
$$\begin{aligned} g(\lambda ;D,W) = (\lambda -1)^2 g(\lambda ;D',W') \end{aligned}$$where \(D'\) and \(W'\) are the submatrices of D and W obtained by deleting the first row and column. As we have assumed that Lemma 3 holds for \(d =k-1\), there exists a polynomial p(x) such that
$$\begin{aligned} g(\lambda ;D,W) = (\lambda -1)^2(\lambda -1)^{2k-2-\mathrm{Rank}{D'} -\mathrm{Rank}{(D'+W')}}p(\lambda ). \end{aligned}$$Note that \(\mathrm{Rank}(D')=\mathrm{Rank}(D)= \mathrm{Rank}(S)\) and \(\mathrm{Rank}(D'+W')=\mathrm{Rank}(D+W)= \mathrm{Rank}(S+T)\). Thus, we have
$$\begin{aligned} g(\lambda ;S,T) = (\lambda -1)^{2k- \mathrm{Rank}(S)-\mathrm{Rank}(S+T)}p(\lambda ), \end{aligned}$$which implies that Lemma 3 is true for \(d=k\) in this subcase.
- Case 2.2: \(W_{11}\ne 0\). Without loss of generality, assume \(W_{11}=1\). Let \(w^\top =[W_{12},\ldots ,W_{1k}]\). By direct calculation, we obtain
$$\begin{aligned} g(\lambda ;D,W) = (\lambda -1)^2 g(\lambda ;D',W') + (\lambda -1)g(\lambda ;D',W'- ww^\top ). \end{aligned}$$Since \(\mathrm{Rank}(D'+W')\le \mathrm{Rank}(D+W) =\mathrm{Rank}(S+T)\), there exists a polynomial \(p_1(x)\) such that
$$\begin{aligned} g(\lambda ;D',W') = (\lambda -1)^{2k-2-\mathrm{Rank}(S) - \mathrm{Rank}(S+T)}p_1(\lambda ), \end{aligned}$$where \(p_1(1)\ge 0\). On the other hand, since \(\mathrm{Rank}(D'+W'-ww^\top ) = \mathrm{Rank}(D+W)-1 =\mathrm{Rank}(S+T)-1\), there exists a polynomial \(p_2(x)\) such that
$$\begin{aligned} g(\lambda ;D', W'- ww^\top )= (\lambda -1)^{2k-1-\mathrm{Rank}(S)-\mathrm{Rank}(S+T)}p_2(\lambda ), \end{aligned}$$where \(p_2(1)>0\). Therefore,
$$\begin{aligned} g(\lambda ;S,T) =(\lambda -1)^{2k-\mathrm{Rank}(S)-\mathrm{Rank}(S+T)}(p_1(\lambda )+p_2(\lambda )) \end{aligned}$$and then Lemma 3 holds for this subcase.
This completes the proof.\(\square \)
Appendix C
Proof of Lemma 4
It is easily seen that
and therefore we need only prove that
Indeed, consider the following linear system
which is equivalent to
It then holds that
and therefore \(Sx =0\) and \(A^\top \mu =0\), because \(S=H+\beta A^\top A\) is positive semidefinite. This means that
On the other hand, it is not difficult to verify that any solution of (108) is also a solution of (107); in other words, the linear systems (107) and (108) are equivalent. As a result, the rank equality (106) holds, which completes the proof. \(\square \)
Keywords
- Nonseparable convex minimization
- Alternating direction method of multipliers
- Block coordinate descent method
- Iterate convergence
- Random permutation