Abstract
The alternating direction method of multipliers (ADMM) is widely used for solving structured convex optimization problems because of its excellent practical performance. On the theoretical side, however, a counterexample in Chen et al. (Math Program 155(1):57–79, 2016) shows that the multi-block ADMM for minimizing the sum of N \((N\ge 3)\) convex functions with N block variables linked by linear constraints may diverge. It is therefore of great interest to find further sufficient conditions on the problem data that guarantee convergence of the multi-block ADMM. Existing results typically require strong convexity of part of the objective. In this paper, we provide two approaches based on the multi-block ADMM that find an \(\epsilon \)-optimal solution without requiring strong convexity of the objective function. Specifically, we prove the following two results: (1) the multi-block ADMM returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon ^2)\) iterations when applied to a suitable perturbation of the original problem; this case can be seen as using the multi-block ADMM to solve a modified problem; (2) the multi-block ADMM returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon )\) iterations when it is applied to a certain sharing problem, under the condition that the augmented Lagrangian function satisfies the Kurdyka–Łojasiewicz property, which covers most convex optimization models except for some pathological cases; this case can be seen as applying the multi-block ADMM to a special class of problems.
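To make the scheme under discussion concrete, the following is a minimal sketch of the direct three-block extension of ADMM on a toy strongly convex instance. All data here are hypothetical illustrations, not taken from the paper; on this particular instance the direct extension happens to converge, which does not contradict the divergence counterexample of Chen et al. for general convex data.

```python
# Direct 3-block ADMM on a toy instance (illustration only; data are hypothetical):
#   minimize 0.5*(x1-1)^2 + 0.5*(x2-2)^2 + 0.5*(x3-3)^2  s.t.  x1 + x2 + x3 = 3
# KKT conditions give x_i = a_i + lam and x1 + x2 + x3 = b,
# so lam* = -1 and x* = (0, 1, 2).
a1, a2, a3, b = 1.0, 2.0, 3.0, 3.0
gamma = 1.0                  # penalty parameter of the augmented Lagrangian
x1 = x2 = x3 = lam = 0.0
for _ in range(500):
    # Gauss-Seidel sweep: each block minimizes the augmented Lagrangian
    # with the other blocks fixed at their most recent values.
    x1 = (a1 + lam - gamma * (x2 + x3 - b)) / (1.0 + gamma)
    x2 = (a2 + lam - gamma * (x1 + x3 - b)) / (1.0 + gamma)
    x3 = (a3 + lam - gamma * (x1 + x2 - b)) / (1.0 + gamma)
    lam -= gamma * (x1 + x2 + x3 - b)    # dual update on the residual
residual = x1 + x2 + x3 - b              # primal feasibility at the last iterate
```

The closed-form block updates follow from setting the gradient of the augmented Lagrangian in each scalar block to zero; for non-quadratic \(f_i\) each update would itself be a small convex subproblem.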
References
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Lojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearization minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Cai, X., Han, D., Yuan, X.: The direct extension of ADMM for three-block separable convex minimization models is convergent when one function is strongly convex. Preprint http://www.optimization-online.org/DB_FILE/2014/11/4644.pdf (2014)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1), 57–79 (2016)
Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstr. Appl. Anal. 2013, Article ID 183961 (2013)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Technical report, UCLA CAM Report 15-13 (2015)
Deng, W., Lai, M., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. Technical report, UCLA CAM 13-64 (2013)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Douglas, J., Rachford, H.H.: On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Eckstein, J., Yao, W.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Pac. J. Optim. 11(4), 619–644 (2015)
Fortin, M., Glowinski, R.: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland Pub. Co., Amsterdam (1983)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems. North-Holland, Amsterdam (1983)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989)
Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)
He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)
He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Preprint http://www.optimization-online.org/DB_FILE/2012/09/3611.pdf (2012)
He, B., Yuan, X.: On the \({O}(1/n)\) convergence rate of Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
He, B., Yuan, X.: On nonergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)
Hong, M., Chang, T.-H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.-Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization. Preprint arXiv:1401.7079 (2014)
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. Preprint arXiv:1208.3922 (2012)
Hong, M., Luo, Z.-Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization with one strongly convex block. Asia Pac. J. Oper. Res. 32, 1550024 (2015)
Lin, T., Ma, S., Zhang, S.: Global convergence of unmodified 3-block ADMM for a class of convex minimization problems. Preprint arXiv:1505.04252 (2015)
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23, 475–507 (2013)
Peaceman, D.H., Rachford, H.H.: The numerical solution of parabolic elliptic differential equations. SIAM J. Appl. Math. 3, 28–41 (1955)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type of constraints. SIAM J. Optim. 25(2), 882–915 (2015)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
Wang, X., Hong, M., Ma, S., Luo, Z.-Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. Pac. J. Optim. 11(4), 645–667 (2015)
Acknowledgments
The authors are grateful to the associate editor and two anonymous referees for their insightful comments, which have greatly improved the presentation of this paper.
Additional information
Shiqian Ma: Research of this author was supported in part by the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513). Shuzhong Zhang: Research of this author was supported in part by the National Science Foundation under Grant Number CMMI-1462408.
Appendix: Proof of Theorem 4.3
We first prove the following lemma.
Lemma 4.8
The following results hold under the conditions in Scenario 2.
1.
The gap between two successive dual iterates can be bounded by the gap between successive primal iterates, i.e.,
$$\begin{aligned} \Vert \lambda ^{k+1} - \lambda ^{k} \Vert ^2 \le L^2 \Vert x_N^{k+1} - x_N^k \Vert ^2, \end{aligned}$$(4.17)
where \(L\) is the Lipschitz constant of \(\nabla f_N\).
2.
The augmented Lagrangian \(L_\gamma \) has a sufficient decrease in each iteration, i.e.,
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k},\ldots , x_{N}^{k};\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{N}^{k+1};\lambda ^{k+1}\right) \nonumber \\&\quad \ge \frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) .\nonumber \\ \end{aligned}$$(4.18)
3.
The augmented Lagrangian \({\mathcal {L}}_\gamma (w^k)\) is uniformly lower bounded, and it holds true that
$$\begin{aligned}&\sum \limits _{k=0}^\infty \left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^{k+1} - A_i x_i^k \right\| ^2 + \left\| x_N^{k+1} - x_N^k \right\| ^2 + \left\| \lambda ^{k+1}-\lambda ^k \right\| ^2\right) \nonumber \\&\quad \le \frac{2\gamma (1+L^2)}{\gamma ^2-2L^2}\left( {\mathcal {L}}_\gamma (w^0) - L^*\right) \end{aligned}$$(4.19)
where \(L^*\) is the uniform lower bound of \({\mathcal {L}}_\gamma (w^k)\), and hence
$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \left( \sum _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2 \right) = 0. \end{aligned}$$(4.20)
Moreover, \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_N^k, \lambda ^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence.
4.
There exists an upper bound for a subgradient of the augmented Lagrangian \({\mathcal {L}}_\gamma \) at each iteration. In fact, we define
$$\begin{aligned} R_i^{k+1}:= & {} \gamma A_i^\top \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) \\&- \gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}\left( x_{j}^{k}-x_{j}^{k+1}\right) +\left( x_{N}^{k}-x_{N}^{k+1}\right) \right) \end{aligned}$$and
$$\begin{aligned} R_N^{k+1} := \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) , \quad R_{\lambda }^{k+1} := b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} \end{aligned}$$for each positive integer k, and \(i = 1,2,\ldots , N\). Then \(\left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \in \partial {\mathcal {L}}_\gamma (w^{k+1})\). Moreover, it holds that
$$\begin{aligned}&\left\| \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \right\| \le \sum \limits _{i=1}^N \left\| R_i^{k+1} \right\| + \left\| R_\lambda ^{k+1} \right\| \nonumber \\&\quad \le M\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1} \right\| + \left\| \lambda ^k - \lambda ^{k+1} \right\| \right) , \quad \forall k\ge 0,\nonumber \\ \end{aligned}$$(4.21)
where \(M\) is a constant defined in (4.8).
Proof of Lemma 4.8
1.
2.
From (4.1), by invoking the convexity of \(f_i\), we have for \(i=1,\ldots ,N-1\):
$$\begin{aligned} 0&= \left( x_i^k - x_i^{k+1}\right) ^\top \left[ g_i\left( x_{i}^{k+1}\right) -A_{i}^{\top }\lambda ^{k}+\gamma A_{i}^{\top }\left( \sum _{j=1}^iA_{j}x_{j}^{k+1}+\sum _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_{N}^k -b\right) \right] \nonumber \\&\le f_i\left( x_i^k\right) - f_i\left( x_i^{k+1}\right) - \left( A_i x_i^k - A_i x_i^{k+1}\right) ^\top \lambda ^k \nonumber \\&\qquad +\, \gamma \left( A_i x_i^k - A_i x_i^{k+1}\right) ^\top \left( \sum _{j=1}^iA_{j}x_{j}^{k+1}+\sum _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_{N}^k -b \right) \nonumber \\&= {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1},x_i^k,\ldots ,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_i^{k+1},x_{i+1}^k,\ldots ,\lambda ^k\right) \nonumber \\&\qquad -\, \frac{\gamma }{2}\left\| A_i x_i^k - A_i x_i^{k+1}\right\| ^2. \end{aligned}$$(4.22)
Similarly, from (4.2) we can prove that
$$\begin{aligned} {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{N-1}^{k+1},x_N^k;\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_N^{k+1};\lambda ^k\right) \ge \frac{\gamma }{2}\left\| x_N^k - x_N^{k+1}\right\| ^2.\nonumber \\ \end{aligned}$$(4.23)
Summing (4.22) over \(i=1,\ldots ,N-1\) and adding (4.23), we have
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^k,\ldots ,x_N^k,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^k\right) \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma }{2}\left\| x_N^k - x_N^{k+1} \right\| ^2. \end{aligned}$$(4.24)
On the other hand, it follows from (4.2) and (4.17) that
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad = -\frac{1}{\gamma }\left\| \lambda ^k - \lambda ^{k+1} \right\| ^2 \ge - \frac{L^2}{\gamma }\left\| x_N^k - x_N^{k+1} \right\| ^2. \end{aligned}$$(4.25)
Combining (4.24) and (4.25) yields
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma ^2 - 2L^2}{2\gamma } \left\| x_N^k - x_N^{k+1} \right\| ^2 \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma ^2 - 2L^2}{2\gamma (1+L^2)}\left( \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) \nonumber \\&\quad \ge \frac{\gamma ^2 - 2L^2}{2\gamma (1+L^2)}\left( \sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) .\nonumber \\ \end{aligned}$$(4.26)
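The middle inequality in the chain above is a consequence of part 1 of the lemma; assuming the bound \(\Vert \lambda ^k - \lambda ^{k+1}\Vert ^2 \le L^2 \Vert x_N^k - x_N^{k+1}\Vert ^2\), the step can be spelled out as follows.

```latex
% Since \|\lambda^k-\lambda^{k+1}\|^2 \le L^2\|x_N^k-x_N^{k+1}\|^2, we have
\begin{aligned}
\|x_N^k - x_N^{k+1}\|^2 + \|\lambda^k - \lambda^{k+1}\|^2
  &\le (1+L^2)\,\|x_N^k - x_N^{k+1}\|^2, \\
\text{and hence}\quad
\frac{\gamma^2 - 2L^2}{2\gamma}\,\|x_N^k - x_N^{k+1}\|^2
  &\ge \frac{\gamma^2 - 2L^2}{2\gamma(1+L^2)}
       \left(\|x_N^k - x_N^{k+1}\|^2 + \|\lambda^k - \lambda^{k+1}\|^2\right),
\end{aligned}
```

where the last step is valid because \(\gamma > \sqrt{2}L\) makes the common factor \(\gamma ^2 - 2L^2\) positive.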
3.
It follows from (4.2) and the fact that \(\nabla f_N\) is Lipschitz continuous with constant L that,
$$\begin{aligned}&f_N\left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1}\right) \\&\quad \le f_N\left( x_N^{k+1}\right) + \left\langle \nabla f_N\left( x_N^{k+1}\right) , \left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1}\right) \right\rangle \\&\qquad +\, \frac{L}{2}\left\| b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} \right\| ^2 \\&\quad = f_N\left( x_N^{k+1}\right) - \left\langle \lambda ^{k+1}, \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\rangle + \frac{L}{2} \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2. \end{aligned}$$This implies that there exists \(L^*>-\infty \), such that
$$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad \ge \sum _{i=1}^{N-1} f_i(x_i^{k+1}) + f_N\left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1}\right) + \frac{\gamma -L}{2} \left\| \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} -b\right\| ^2 \nonumber \\&\quad > L^*, \end{aligned}$$(4.27)
where the last inequality holds since \(\gamma >L\) and \(\inf _{{\mathcal {X}}_i}f_i \ge f_i^* > -\infty \) for \(i=1,2,\ldots ,N\).
Therefore, it directly follows from (4.18) and \(\gamma >\sqrt{2}L\) that,
$$\begin{aligned}&\frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\sum \limits _{k=0}^K \left( \sum \limits _{i=1}^{N-1} \Vert A_i x_i^{k+1} - A_i x_i^k \Vert ^2 + \Vert x_N^{k+1} - x_N^k \Vert ^2 + \Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right) \\&\quad \le {\mathcal {L}}_\gamma (w^0) - L^*. \end{aligned}$$Letting \(K\rightarrow \infty \) gives (4.19) and (4.20).
It also follows from (4.27), (4.18) and \(\gamma >\sqrt{2}L\) that \({\mathcal {L}}_\gamma (w^0) - f_N^* \ge \sum _{i=1}^{N-1} f_i(x_i^{k+1})\). This implies that \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_{N-1}^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence by using the coerciveness of \(f_i+\mathbf 1 _{{\mathcal {X}}_i}, i=1,2,\ldots ,N-1\). The boundedness of \(\left( x_N^k, \lambda ^k\right) \) can be obtained by using (4.3), (4.17) and (4.20).
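The sufficient-decrease estimate (4.18) and the resulting summability can also be checked numerically. The sketch below uses a hypothetical toy instance, \(f_i(x)=\tfrac{1}{2}(x-a_i)^2\) with \(A_i = I\), so that \(\nabla f_N\) has Lipschitz constant \(L=1\), and takes \(\gamma = 2 > \sqrt{2}L\); it asserts the per-iteration decrease of the augmented Lagrangian against the constant \((\gamma ^2-2L^2)/(2\gamma (1+L^2))\).

```python
# Numeric sanity check of the sufficient-decrease estimate (4.18) on a toy
# instance (hypothetical data): f_i(x) = 0.5*(x - a_i)^2, constraint
# x1 + x2 + x3 = b, A_i = I, L = 1, and gamma = 2 > sqrt(2)*L.
a, b, gamma, L = (1.0, 2.0, 3.0), 3.0, 2.0, 1.0
c = (gamma**2 - 2 * L**2) / (2 * gamma * (1 + L**2))  # decrease constant in (4.18)

def aug_lag(x, lam):
    """Augmented Lagrangian L_gamma(x; lam) for the toy instance."""
    r = sum(x) - b
    return sum(0.5 * (xi - ai) ** 2 for xi, ai in zip(x, a)) - lam * r + 0.5 * gamma * r * r

x, lam = [0.0, 0.0, 0.0], 0.0
prev_val = aug_lag(x, lam)
for _ in range(100):
    old_x, old_lam = list(x), lam
    for i in range(3):                     # Gauss-Seidel block updates
        others = sum(x) - x[i]
        x[i] = (a[i] + lam - gamma * (others - b)) / (1.0 + gamma)
    lam -= gamma * (sum(x) - b)            # dual update
    val = aug_lag(x, lam)
    diff = sum((xi - oi) ** 2 for xi, oi in zip(x, old_x)) + (lam - old_lam) ** 2
    # (4.18): the decrease dominates c times the squared successive differences.
    assert prev_val - val >= c * diff - 1e-9
    prev_val = val
```

Since \({\mathcal {L}}_\gamma \) is bounded below on this instance, telescoping the asserted inequality reproduces the summability bound (4.19) numerically.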
4.
From the definition of \({\mathcal {L}}_\gamma \), it is clear that for \(i=1,\ldots ,N-1\),
$$\begin{aligned} g_i\left( x_i^{k+1}\right) - A_i^\top \lambda ^{k+1} + \gamma A_i^\top \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) \in \partial _{x_i} {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$and
$$\begin{aligned} \nabla f_N\left( x_N^{k+1}\right) - \lambda ^{k+1} + \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) = \nabla _{x_N} {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$and
$$\begin{aligned} b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} = \nabla _{\lambda } {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$where \(g_i\in \partial \left( f_i + \mathbf {1}_{{\mathcal {X}}_i}\right) \) for \(i=1,2,\ldots ,N-1\).
Combining these relations with (4.4) and (4.5) yields that
$$\begin{aligned}&R_i^{k+1} := \gamma A_i^\top \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) \\&\quad - \gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}(x_{j}^{k}-x_{j}^{k+1})+(x_{N}^{k}-x_{N}^{k+1})\right) \in \partial _{x_i} {\mathcal {L}}_\gamma (w^{k+1}), \\&R_N^{k+1} := \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) = \nabla _{x_N} {\mathcal {L}}_\gamma (w^{k+1}), \\&R_{\lambda }^{k+1} := b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} = \nabla _{\lambda } {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$for \(i=1,2,\ldots ,N-1\). Therefore, \(\left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \in \partial {\mathcal {L}}_\gamma (w^{k+1})\). We now need to bound the norms of \(R_i^{k+1}\), \(i=1,\ldots ,N-1\), \(R_N^{k+1}\) and \(R_\lambda ^{k+1}\). It holds that
$$\begin{aligned} \left\| R_i^{k+1} \right\|\le & {} \gamma \left\| A_i^\top \right\| \left( \sum \limits _{j=i+1}^{N-1} \left\| A_j x_j^{k} - A_j x_j^{k+1}\right\| + \left\| x_N^{k} - x_N^{k+1}\right\| \right) \\&+\, \gamma \left\| A_i^\top \right\| \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| \\\le & {} \gamma \left\| A_i^\top \right\| \left( \sum \limits _{j=1}^{N-1} \left\| A_j x_j^{k} - A_j x_j^{k+1}\right\| + \left\| x_N^{k} - x_N^{k+1}\right\| \right) + \left\| A_i^\top \right\| \left\| \lambda ^k - \lambda ^{k+1}\right\| \end{aligned}$$and
$$\begin{aligned} \left\| R_N^{k+1} \right\| = \gamma \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| = \left\| \lambda ^k - \lambda ^{k+1}\right\| , \quad \Vert R_\lambda ^{k+1} \Vert = \frac{1}{\gamma } \left\| \lambda ^k - \lambda ^{k+1} \right\| . \end{aligned}$$These relations immediately imply (4.21). \(\Box \)
Proof of Theorem 4.3
1.
It has been proven in Lemma 4.8 that \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_N^k, \lambda ^k\right) :\right. \left. k=0,1,\ldots \right\} \) is a bounded sequence. Therefore, we conclude that \(\Omega (w^0)\) is non-empty by the Bolzano-Weierstrass Theorem. Let \(w^* = \left( x_1^*,\ldots , x_N^*,\lambda ^*\right) \in \Omega (w^0)\) be a limit point of \(\{w^k = \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) :k=0,1,\ldots \}\). Then there exists a subsequence \(\left\{ w^{k_q} = \left( x_1^{k_q},\ldots , x_N^{k_q},\lambda ^{k_q}\right) :q=0,1,\ldots \right\} \) such that \(w^{k_q}\rightarrow w^*\) as \(q\rightarrow \infty \). Since \(f_i, i=1,\ldots ,N-1\), are lower semi-continuous, we obtain that
$$\begin{aligned} \liminf \limits _{q\rightarrow \infty } f_i(x_i^{k_q}) \ge f_i(x_i^*), \quad i=1,2,\ldots , N. \end{aligned}$$(4.28)
From (1.2), we have for any integer k and any \(i=1,\ldots ,N-1\),
$$\begin{aligned} x_i^{k+1} := \mathop {\mathrm{argmin}}\limits _{x_i\in {\mathcal {X}}_i} \ {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1}, x_i, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) . \end{aligned}$$Letting \(x_i = x_i^*\) in the above, we get
$$\begin{aligned} {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_i^{k+1}, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) \le {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1}, x_i^{*}, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) , \end{aligned}$$i.e.,
$$\begin{aligned}&f_i\left( x_i^{k+1}\right) - \left\langle \lambda ^k, A_i x_i^{k+1}\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{j=1}^i A_j x_j^{k+1} + \sum \limits _{j=i+1}^{N-1} A_j x_j^{k} + x_N^k - b \right\| ^2 \\&\quad \le f_i\left( x_i^*\right) - \left\langle \lambda ^k, A_i x_i^*\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i^* + \sum \limits _{j=i+1}^{N-1} A_j x_j^{k} + x_N^k - b \right\| ^2. \end{aligned}$$Choosing \(k=k_q-1\) in the above inequality and letting q go to \(+\infty \), we obtain
$$\begin{aligned} \limsup \limits _{q\rightarrow +\infty }f_i\left( x_i^{k_q}\right) \le \limsup \limits _{q\rightarrow +\infty } \left( \frac{\gamma }{2}\left\| A_i x_i^{k_q} - A_i x_i^* \right\| ^2 + \left\langle \lambda ^{k_q-1}, A_i x_i^{k_q}- A_i x_i^*\right\rangle \right) + f_i\left( x_i^*\right) ,\nonumber \\ \end{aligned}$$(4.29)
for \(i=1,2,\ldots ,N-1\). Here we have used the facts that the sequence \(\{w^k:k=0,1,\ldots \}\) is bounded, that \(\gamma \) is finite, that the distance between two successive iterates tends to zero by (4.20), and that
$$\begin{aligned}&\sum \limits _{j=1}^i A_j x_j^{k+1} + \sum \limits _{j=i+1}^{N-1} A_j x_j^k + x_N^k - b = \sum \limits _{j=i+1}^{N-1} \left( A_j x_j^k - A_j x_j^{k+1}\right) + \left( x_N^k - x_N^{k+1}\right) \\&\quad +\,\frac{1}{\gamma } \left( \lambda ^k - \lambda ^{k+1}\right) . \end{aligned}$$From (4.20) we also have \(x_i^{k_q-1}\rightarrow x_i^*\) as \(q\rightarrow \infty \), hence (4.29) reduces to \(\limsup \limits _{q\rightarrow \infty }f_i(x_i^{k_q})\le f_i(x_i^*)\). Therefore, combining with (4.28), \(f_i(x_i^{k_q})\) tends to \(f_i(x_i^*)\) as \(q\rightarrow \infty \). Hence, we can conclude that
$$\begin{aligned} \lim \limits _{q\rightarrow \infty }{\mathcal {L}}_\gamma (w^{k_q})= & {} \lim \limits _{q\rightarrow \infty }\left( \sum \limits _{i=1}^N f_i\left( x_i^{k_q}\right) - \left\langle \lambda ^{k_q}, \sum \limits _{i=1}^{N-1} A_i x_i^{k_q} + x_N^{k_q} -b\right\rangle \right. \\&\left. +\, \frac{\gamma }{2}\left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k_q} + x_N^{k_q} -b \right\| ^2\right) \\= & {} \sum \limits _{i=1}^N f_i\left( x_i^{*}\right) - \left\langle \lambda ^{*}, \sum \limits _{i=1}^{N-1} A_i x_i^{*}+x_N^{*}-b\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{i=1}^{N-1} A_i x_i^{*}+x_N^{*}-b \right\| ^2 \\= & {} {\mathcal {L}}_\gamma (w^{*}). \end{aligned}$$On the other hand, it follows from (4.20) and (4.21) that
$$\begin{aligned} \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right)\in & {} \partial {\mathcal {L}}_\gamma (w^{k+1}) \end{aligned}$$(4.30)
$$\begin{aligned} \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right)\rightarrow & {} (0,\ldots ,0), \quad k\rightarrow \infty . \end{aligned}$$(4.31)
This implies that \((0,\ldots ,0)\in \partial {\mathcal {L}}_\gamma (x_1^*,\ldots ,x_N^*,\lambda ^*)\) due to the closedness of \(\partial {\mathcal {L}}_\gamma \). Therefore, \(w^* = \left( x_1^*,\ldots ,x_N^*,\lambda ^*\right) \) is a critical point of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\).
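The vanishing-subgradient argument can be watched numerically. In the toy sketch below (hypothetical data, \(A_i = I\)), the surrogate \(R_N^{k+1} = \gamma (\sum _i A_i x_i^{k+1} + x_N^{k+1} - b)\) coincides with \(\lambda ^k - \lambda ^{k+1}\) by the dual update and shrinks to zero along the iterates, consistent with (4.30)–(4.31).

```python
# Toy check (hypothetical data) that the subgradient surrogates R_N and
# R_lambda from Lemma 4.8 vanish along the ADMM iterates, so that limit
# points of the sequence are critical points of the augmented Lagrangian.
a, b, gamma = (1.0, 2.0, 3.0), 3.0, 2.0
x, lam = [0.0, 0.0, 0.0], 0.0
for _ in range(200):
    old_lam = lam
    for i in range(3):                 # Gauss-Seidel block updates
        others = sum(x) - x[i]
        x[i] = (a[i] + lam - gamma * (others - b)) / (1.0 + gamma)
    r = sum(x) - b                     # primal residual after the sweep
    lam -= gamma * r                   # dual update
    R_N = gamma * r                    # equals old_lam - lam by construction
    R_lam = -r
    assert abs(R_N - (old_lam - lam)) < 1e-12
final_R_N = gamma * (sum(x) - b)       # essentially zero at the last iterate
```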
2.
The proof of this assertion follows directly from Lemma 5 and Remark 5 of [4]; we omit it here for succinctness.
3.
Let \(\tilde{L}\) be the finite limit of \({\mathcal {L}}_\gamma (x_1^k,\ldots ,x_N^k,\lambda ^k)\) as \(k\) goes to infinity, i.e.,
$$\begin{aligned} \tilde{L} = \lim \limits _{k\rightarrow \infty } {\mathcal {L}}_\gamma \left( x_1^k,\ldots ,x_N^k,\lambda ^k\right) . \end{aligned}$$Take \(w^*\in \Omega (w^0)\). There exists a subsequence \(w^{k_q}\) converging to \(w^*\) as q goes to infinity. Since we have proven that
$$\begin{aligned} \lim \limits _{q\rightarrow \infty }{\mathcal {L}}_\gamma (w^{k_q}) = {\mathcal {L}}_\gamma (w^{*}), \end{aligned}$$and \({\mathcal {L}}_\gamma (w^{k})\) is a non-increasing sequence, we conclude that \({\mathcal {L}}_\gamma (w^{*}) = \tilde{L}\), hence the restriction of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\) to \(\Omega (w^0)\) equals \(\tilde{L}\).
\(\square \)
Cite this article
Lin, T., Ma, S. & Zhang, S. Iteration Complexity Analysis of Multi-block ADMM for a Family of Convex Minimization Without Strong Convexity. J Sci Comput 69, 52–81 (2016). https://doi.org/10.1007/s10915-016-0182-0
Keywords
- Alternating direction method of multipliers (ADMM)
- Convergence rate
- Regularization
- Kurdyka–Łojasiewicz property
- Convex optimization