Iteration Complexity Analysis of Multi-block ADMM for a Family of Convex Minimization Without Strong Convexity

Abstract

The alternating direction method of multipliers (ADMM) is widely used for solving structured convex optimization problems because of its superior practical performance. On the theoretical side, however, a counterexample given in Chen et al. (Math. Program. 155(1):57–79, 2016) shows that the multi-block ADMM for minimizing the sum of N \((N\ge 3)\) convex functions, with N block variables linked by linear constraints, may diverge. It is therefore of great interest to identify further sufficient conditions on the problem data that guarantee convergence of the multi-block ADMM. Existing results typically require strong convexity of part of the objective. In this paper, we present two approaches, both based on multi-block ADMM, that find an \(\epsilon \)-optimal solution without requiring strong convexity of the objective function. Specifically, we prove the following two results: (1) the multi-block ADMM returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon ^2)\) iterations by solving an associated perturbation of the original problem; this case can be viewed as applying multi-block ADMM to a modified problem; (2) the multi-block ADMM returns an \(\epsilon \)-optimal solution within \(O(1/\epsilon )\) iterations when it is applied to a certain sharing problem, under the condition that the augmented Lagrangian function satisfies the Kurdyka–Łojasiewicz property, which covers most convex optimization models apart from some pathological cases; this case can be viewed as applying multi-block ADMM to a special class of problems.
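The sharing model analyzed in result (2) admits a simple illustration. The following is a minimal, hypothetical sketch (not the paper's algorithm statement or experiments) of the cyclic Gauss–Seidel updates of multi-block ADMM on a three-block scalar sharing problem with quadratic objectives and identity coupling, so every subproblem has a closed form; all names and parameter values here are illustrative assumptions:

```python
# Toy sharing problem: minimize sum_i 0.5*(x_i - c_i)^2  s.t.  x_1 + x_2 + x_3 = b,
# i.e. the coupling maps A_i are identities and every block is a scalar.

def multiblock_admm(c, b, gamma=0.5, iters=500):
    n = len(c)
    x = [0.0] * n
    lam = 0.0
    for _ in range(iters):
        for i in range(n):
            # Gauss-Seidel pass: minimize the augmented Lagrangian in x_i,
            #   0.5*(x_i - c_i)^2 - lam*x_i + (gamma/2)*(x_i + s - b)^2,
            # where s is the current sum of the other blocks; the minimizer
            # is available in closed form for this quadratic.
            s = sum(x) - x[i]
            x[i] = (c[i] + lam - gamma * (s - b)) / (1.0 + gamma)
        lam -= gamma * (sum(x) - b)  # dual update on the residual
    return x, lam

x, lam = multiblock_admm([1.0, 2.0, 3.0], 3.0)
# KKT conditions give x_i = c_i + mu with 3*mu = b - sum(c),
# i.e. the optimum is x = [0, 1, 2] with multiplier mu = -1.
```

This instance is strongly convex in every block, so the cyclic updates converge; the counterexample of Chen et al. shows that such convergence cannot be taken for granted for general multi-block problems.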



References

  1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
  2. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
  3. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
  4. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
  5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
  6. Cai, X., Han, D., Yuan, X.: The direct extension of ADMM for three-block separable convex minimization models is convergent when one function is strongly convex. Preprint http://www.optimization-online.org/DB_FILE/2014/11/4644.pdf (2014)
  7. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1), 57–79 (2016)
  8. Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstr. Appl. Anal. 2013, Article ID 183961 (2013)
  9. Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Technical report, UCLA CAM Report 15-13 (2015)
  10. Deng, W., Lai, M., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. Technical report, UCLA CAM Report 13-64 (2013)
  11. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
  12. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
  13. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)
  14. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
  15. Eckstein, J., Yao, W.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Pac. J. Optim. 11(4), 619–644 (2015)
  16. Fortin, M., Glowinski, R.: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)
  17. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary Value Problems. North-Holland, Amsterdam (1983)
  18. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989)
  19. Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)
  20. He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
  21. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)
  22. He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Preprint http://www.optimization-online.org/DB_FILE/2012/09/3611.pdf (2012)
  23. He, B., Yuan, X.: On the \({O}(1/n)\) convergence rate of Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
  24. He, B., Yuan, X.: On nonergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)
  25. Hong, M., Chang, T.-H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.-Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization. Preprint arXiv:1401.7079 (2014)
  26. Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. Preprint arXiv:1208.3922 (2012)
  27. Hong, M., Luo, Z.-Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
  28. Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization with one strongly convex block. Asia Pac. J. Oper. Res. 32, 1550024 (2015)
  29. Lin, T., Ma, S., Zhang, S.: Global convergence of unmodified 3-block ADMM for a class of convex minimization problems. Preprint arXiv:1505.04252 (2015)
  30. Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
  31. Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
  32. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
  33. Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23, 475–507 (2013)
  34. Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. SIAM J. Appl. Math. 3, 28–41 (1955)
  35. Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
  36. Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM J. Optim. 25(2), 882–915 (2015)
  37. Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
  38. Wang, X., Hong, M., Ma, S., Luo, Z.-Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. Pac. J. Optim. 11(4), 645–667 (2015)


Acknowledgments

The authors are grateful to the associate editor and two anonymous referees for their insightful comments that have improved the presentation of this paper greatly.

Author information


Corresponding author

Correspondence to Shiqian Ma.

Additional information

Shiqian Ma: Research of this author was supported in part by the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513). Shuzhong Zhang: Research of this author was supported in part by the National Science Foundation under Grant Number CMMI-1462408.

Appendix: Proof of Theorem 4.3


We first prove the following lemma.

Lemma 4.8

The following results hold under the conditions in Scenario 2.

  1.

    The gap between successive dual iterates is bounded by the corresponding primal gap, i.e.,

    $$\begin{aligned} \Vert \lambda ^{k+1} - \lambda ^{k} \Vert ^2 \le L^2 \Vert x_N^{k+1} - x_N^k \Vert ^2, \end{aligned}$$
    (4.17)

    where L is the Lipschitz constant for \(\nabla f_N\).

  2.

    The augmented Lagrangian \({\mathcal {L}}_\gamma \) decreases sufficiently at each iteration, i.e.,

    $$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k},\ldots , x_{N}^{k};\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{N}^{k+1};\lambda ^{k+1}\right) \nonumber \\&\quad \ge \frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) .\nonumber \\ \end{aligned}$$
    (4.18)
  3.

    The augmented Lagrangian \({\mathcal {L}}_\gamma (w^k)\) is uniformly lower bounded, and it holds true that

    $$\begin{aligned}&\sum \limits _{k=0}^\infty \left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^{k+1} - A_i x_i^k \right\| ^2 + \left\| x_N^{k+1} - x_N^k \right\| ^2 + \left\| \lambda ^{k+1}-\lambda ^k \right\| ^2\right) \nonumber \\&\quad \le \frac{2\gamma (1+L^2)}{\gamma ^2-2L^2}\left( {\mathcal {L}}_\gamma (w^0) - L^*\right) \end{aligned}$$
    (4.19)

    where \(L^*\) is the uniform lower bound of \({\mathcal {L}}_\gamma (w^k)\), and hence

    $$\begin{aligned} \lim \limits _{k\rightarrow \infty } \left( \sum _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2 \right) = 0. \end{aligned}$$
    (4.20)

    Moreover, \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_N^k, \lambda ^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence.

  4.

    The subgradients of the augmented Lagrangian \({\mathcal {L}}_\gamma \) admit an explicit bound at each iteration. Specifically, define

    $$\begin{aligned} R_i^{k+1}:= & {} \gamma A_i^\top \left( \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right) \\&- \gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}\left( x_{j}^{k}-x_{j}^{k+1}\right) +\left( x_{N}^{k}-x_{N}^{k+1}\right) \right) \end{aligned}$$

    and

    $$\begin{aligned} R_N^{k+1} := \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) , \quad R_{\lambda }^{k+1} := b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} \end{aligned}$$

    for each integer \(k\ge 0\) and \(i = 1,2,\ldots , N-1\). Then \(\left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \in \partial {\mathcal {L}}_\gamma (w^{k+1})\). Moreover, it holds that

    $$\begin{aligned}&\left\| \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \right\| \le \sum \limits _{i=1}^N \left\| R_i^{k+1} \right\| + \left\| R_\lambda ^{k+1} \right\| \nonumber \\&\quad \le M\left( \sum \limits _{i=1}^{N-1} \left\| A_i x_i^k - A_i x_i^{k+1}\right\| + \left\| x_N^k - x_N^{k+1} \right\| + \left\| \lambda ^k - \lambda ^{k+1} \right\| \right) , \quad \forall k\ge 0,\nonumber \\ \end{aligned}$$
    (4.21)

    where M is a constant defined in (4.8).

Proof of Lemma 4.8

  1.

    (4.17) follows directly from (4.5).
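Relation (4.5) is not reproduced in this excerpt; a plausible reading, standard for sharing problems where the last block is unconstrained and \(f_N\) is smooth, is that the optimality condition of the \(x_N\)-subproblem combined with the \(\lambda \)-update yields \(\lambda ^{k+1} = \nabla f_N(x_N^{k+1})\). Under that assumption the bound is immediate from the Lipschitz continuity of \(\nabla f_N\):

```latex
\left\| \lambda^{k+1} - \lambda^{k} \right\|
  = \left\| \nabla f_N\big(x_N^{k+1}\big) - \nabla f_N\big(x_N^{k}\big) \right\|
  \le L \left\| x_N^{k+1} - x_N^{k} \right\|,
```

and squaring both sides gives the claimed bound.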

  2.

    From (4.1), by invoking the convexity of \(f_i\), we have for \(i=1,\ldots ,N-1\):

    $$\begin{aligned} 0&= \left( x_i^k - x_i^{k+1}\right) ^\top \left[ g_i\left( x_{i}^{k+1}\right) -A_{i}^{\top }\lambda ^{k}+\gamma A_{i}^{\top }\left( \sum _{j=1}^iA_{j}x_{j}^{k+1}+\sum _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_{N}^k -b\right) \right] \nonumber \\&\le f_i\left( x_i^k\right) - f_i\left( x_i^{k+1}\right) - \left( A_i x_i^k - A_i x_i^{k+1}\right) ^\top \lambda ^k \nonumber \\&\qquad +\, \gamma \left( A_i x_i^k - A_i x_i^{k+1}\right) ^\top \left( \sum _{j=1}^iA_{j}x_{j}^{k+1}+\sum _{j=i+1}^{N-1} A_{j}x_{j}^{k} + x_{N}^k -b \right) \nonumber \\&= {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1},x_i^k,x_{i+1}^k,\ldots ,x_N^k;\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_i^{k+1},x_{i+1}^k,\ldots ,x_N^k;\lambda ^k\right) \nonumber \\&\qquad -\, \frac{\gamma }{2}\left\| A_i x_i^k - A_i x_i^{k+1}\right\| ^2. \end{aligned}$$
    (4.22)

    Similarly, from (4.2) we can prove that

    $$\begin{aligned} {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{N-1}^{k+1},x_N^k;\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots ,x_N^{k+1};\lambda ^k\right) \ge \frac{\gamma }{2}\left\| x_N^k - x_N^{k+1}\right\| ^2.\nonumber \\ \end{aligned}$$
    (4.23)

    Summing (4.22) over \(i=1,\ldots ,N-1\) and (4.23), we have

    $$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^k,\ldots ,x_N^k,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^k\right) \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma }{2}\left\| x_N^k - x_N^{k+1} \right\| ^2. \end{aligned}$$
    (4.24)

    On the other hand, it follows from (4.2) and (4.17) that

    $$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad = -\frac{1}{\gamma }\left\| \lambda ^k - \lambda ^{k+1} \right\| ^2 \ge - \frac{L^2}{\gamma }\left\| x_N^k - x_N^{k+1} \right\| ^2. \end{aligned}$$
    (4.25)

    Combining (4.24) and (4.25) yields

    $$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) - {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma ^2 - 2L^2}{2\gamma } \left\| x_N^k - x_N^{k+1} \right\| ^2 \nonumber \\&\quad \ge \frac{\gamma }{2}\sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \frac{\gamma ^2 - 2L^2}{2\gamma (1+L^2)}\left( \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) \nonumber \\&\quad \ge \frac{\gamma ^2 - 2L^2}{2\gamma (1+L^2)}\left( \sum \limits _{i=1}^{N-1}\left\| A_i x_i^k - A_i x_i^{k+1} \right\| ^2 + \left\| x_N^k - x_N^{k+1} \right\| ^2 + \left\| \lambda ^k - \lambda ^{k+1} \right\| ^2\right) .\nonumber \\ \end{aligned}$$
    (4.26)
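The middle inequality in the chain above is where (4.17) enters: since \(\Vert \lambda ^k - \lambda ^{k+1} \Vert ^2 \le L^2 \Vert x_N^k - x_N^{k+1} \Vert ^2\),

```latex
\left\| x_N^k - x_N^{k+1} \right\|^2
  \;\ge\; \frac{1}{1+L^2}\left( \left\| x_N^k - x_N^{k+1} \right\|^2
        + \left\| \lambda^k - \lambda^{k+1} \right\|^2 \right),
```

so the coefficient \(\frac{\gamma ^2-2L^2}{2\gamma }\) on the \(x_N\)-term can be traded for \(\frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\) on the sum of both terms; note \(\gamma ^2-2L^2>0\) since \(\gamma >\sqrt{2}L\).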
  3.

    It follows from (4.2) and the fact that \(\nabla f_N\) is Lipschitz continuous with constant L that

    $$\begin{aligned}&f_N\left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1}\right) \\&\quad \le f_N\left( x_N^{k+1}\right) + \left\langle \nabla f_N\left( x_N^{k+1}\right) , \left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1}\right) \right\rangle \\&\qquad +\, \frac{L}{2}\left\| b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} \right\| ^2 \\&\quad = f_N\left( x_N^{k+1}\right) - \left\langle \lambda ^{k+1}, \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\rangle + \frac{L}{2} \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| ^2. \end{aligned}$$

    This implies that there exists \(L^*>-\infty \), such that

    $$\begin{aligned}&{\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_N^{k+1},\lambda ^{k+1}\right) \nonumber \\&\quad \ge \sum _{i=1}^{N-1} f_i(x_i^{k+1}) + f_N\left( b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1}\right) + \frac{\gamma -L}{2} \left\| \sum _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} -b\right\| ^2 \nonumber \\&\quad > L^*, \end{aligned}$$
    (4.27)

    where the last inequality holds since \(\gamma >L\) and \(\inf _{{\mathcal {X}}_i}f_i \ge f_i^* > -\infty \) for \(i=1,2,\ldots ,N\).

    Therefore, it directly follows from (4.18) and \(\gamma >\sqrt{2}L\) that,

    $$\begin{aligned}&\frac{\gamma ^2-2L^2}{2\gamma (1+L^2)}\sum \limits _{k=0}^K \left( \sum \limits _{i=1}^{N-1} \Vert A_i x_i^{k+1} - A_i x_i^k \Vert ^2 + \Vert x_N^{k+1} - x_N^k \Vert ^2 + \Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right) \\&\quad \le {\mathcal {L}}_\gamma (w^0) - L^*. \end{aligned}$$

    Letting \(K\rightarrow \infty \) gives (4.19) and (4.20).

    It also follows from (4.27), (4.18) and \(\gamma >\sqrt{2}L\) that \({\mathcal {L}}_\gamma (w^0) - f_N^* \ge \sum _{i=1}^{N-1} f_i(x_i^{k+1})\). This implies that \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_{N-1}^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence, by the coercivity of \(f_i+\mathbf {1}_{{\mathcal {X}}_i}\), \(i=1,2,\ldots ,N-1\). The boundedness of \(\left( x_N^k, \lambda ^k\right) \) then follows from (4.3), (4.17) and (4.20).

  4.

    From the definition of \({\mathcal {L}}_\gamma \), it is clear that for \(i=1,\ldots ,N-1\),

    $$\begin{aligned} g_i\left( x_i^{k+1}\right) - A_i^\top \lambda ^{k+1} + \gamma A_i^\top \left( \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right) \in \partial _{x_i} {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$

    and

    $$\begin{aligned} \nabla f_N\left( x_N^{k+1}\right) - \lambda ^{k+1} + \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) = \nabla _{x_N} {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$

    and

    $$\begin{aligned} b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} = \nabla _{\lambda } {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$

    where \(g_i\in \partial \left( f_i + \mathbf {1}_{{\mathcal {X}}_i}\right) \) for \(i=1,2,\ldots ,N-1\).

    Combining these relations with (4.4) and (4.5) yields that

    $$\begin{aligned}&R_i^{k+1} := \gamma A_i^\top \left( \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right) \\&\quad - \gamma A_{i}^{\top }\left( \sum \limits _{j=i+1}^{N-1} A_{j}(x_{j}^{k}-x_{j}^{k+1})+(x_{N}^{k}-x_{N}^{k+1})\right) \in \partial _{x_i} {\mathcal {L}}_\gamma (w^{k+1}), \\&R_N^{k+1} := \gamma \left( \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right) = \nabla _{x_N} {\mathcal {L}}_\gamma (w^{k+1}), \\&R_{\lambda }^{k+1} := b - \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} - x_N^{k+1} = \nabla _{\lambda } {\mathcal {L}}_\gamma (w^{k+1}), \end{aligned}$$

    for \(i=1,2,\ldots ,N-1\). Therefore, \(\left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right) \in \partial {\mathcal {L}}_\gamma (w^{k+1})\). We now bound the norms of \(R_i^{k+1}\), \(i=1,\ldots ,N-1\), \(R_N^{k+1}\) and \(R_\lambda ^{k+1}\). It holds that

    $$\begin{aligned} \left\| R_i^{k+1} \right\|\le & {} \gamma \left\| A_i^\top \right\| \left( \sum \limits _{j=i+1}^{N-1} \left\| A_j x_j^{k} - A_j x_j^{k+1}\right\| + \left\| x_N^{k} - x_N^{k+1}\right\| \right) \\&+\, \gamma \left\| A_i^\top \right\| \left\| \sum \limits _{j=1}^{N-1} A_j x_j^{k+1} + x_N^{k+1} - b\right\| \\\le & {} \gamma \left\| A_i^\top \right\| \left( \sum \limits _{j=1}^{N-1} \left\| A_j x_j^{k} - A_j x_j^{k+1}\right\| + \left\| x_N^{k} - x_N^{k+1}\right\| \right) + \left\| A_i^\top \right\| \left\| \lambda ^k - \lambda ^{k+1}\right\| \end{aligned}$$

    and

    $$\begin{aligned} \left\| R_N^{k+1} \right\| = \gamma \left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k+1} + x_N^{k+1} - b\right\| = \left\| \lambda ^k - \lambda ^{k+1}\right\| , \quad \Vert R_\lambda ^{k+1} \Vert = \frac{1}{\gamma } \left\| \lambda ^k - \lambda ^{k+1} \right\| . \end{aligned}$$

    These relations immediately imply (4.21). \(\Box \)
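The sufficient decrease property (4.18) can also be observed numerically on benign instances. Below is a hypothetical Python check on a three-block scalar sharing problem with \(f_i(x_i)=\frac{1}{2}(x_i-c_i)^2\) and identity coupling (so \(L=1\)), using \(\gamma =2>\sqrt{2}L\); all names and values are illustrative, not from the paper's experiments:

```python
# Track the augmented Lagrangian along cyclic multi-block ADMM iterations on
#   minimize sum_i 0.5*(x_i - c_i)^2  s.t.  x_1 + x_2 + x_3 = b,
# and check that it is monotonically non-increasing, as (4.18) predicts
# for gamma > sqrt(2)*L.

def aug_lagrangian(x, lam, c, b, gamma):
    # L_gamma(x; lam) = sum_i f_i(x_i) - <lam, r> + (gamma/2)*||r||^2, r = sum(x) - b
    r = sum(x) - b
    return sum(0.5 * (xi - ci) ** 2 for xi, ci in zip(x, c)) - lam * r + 0.5 * gamma * r * r

def admm_lagrangian_trace(c, b, gamma, iters):
    n = len(c)
    x = [0.0] * n
    lam = 0.0
    vals = []
    for _ in range(iters):
        for i in range(n):
            # closed-form Gauss-Seidel minimization of L_gamma in x_i
            s = sum(x) - x[i]
            x[i] = (c[i] + lam - gamma * (s - b)) / (1.0 + gamma)
        lam -= gamma * (sum(x) - b)
        vals.append(aug_lagrangian(x, lam, c, b, gamma))
    return vals

vals = admm_lagrangian_trace([1.0, 2.0, 3.0], 3.0, gamma=2.0, iters=50)
```

Since the last block is unconstrained and smooth, \(\lambda ^{k+1}=\nabla f_3(x_3^{k+1})\) holds here, so the recorded values decrease monotonically toward the primal optimal value (1.5 for this instance).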

Proof of Theorem 4.3

  1.

    It has been proven in Lemma 4.8 that \(\left\{ \left( x_1^k, x_2^k, \ldots ,x_N^k, \lambda ^k\right) : k=0,1,\ldots \right\} \) is a bounded sequence. Therefore, \(\Omega (w^0)\) is non-empty by the Bolzano–Weierstrass theorem. Let \(w^* = \left( x_1^*,\ldots , x_N^*,\lambda ^*\right) \in \Omega (w^0)\) be a limit point of \(\{w^k = \left( x_1^k,\ldots , x_N^k,\lambda ^k\right) :k=0,1,\ldots \}\). Then there exists a subsequence \(\left\{ w^{k_q} = \left( x_1^{k_q},\ldots , x_N^{k_q},\lambda ^{k_q}\right) :q=0,1,\ldots \right\} \) such that \(w^{k_q}\rightarrow w^*\) as \(q\rightarrow \infty \). Since \(f_i\), \(i=1,\ldots ,N-1\), are lower semi-continuous and \(f_N\) is continuous, we obtain that

    $$\begin{aligned} \liminf \limits _{q\rightarrow \infty } f_i(x_i^{k_q}) \ge f_i(x_i^*), \quad i=1,2,\ldots , N. \end{aligned}$$
    (4.28)

    From (1.2), we have for any integer k and any \(i=1,\ldots ,N-1\),

    $$\begin{aligned} x_i^{k+1} := \mathop {\mathrm{argmin}}\limits _{x_i\in {\mathcal {X}}_i} \ {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1}, x_i, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) . \end{aligned}$$

    Letting \(x_i = x_i^*\) in the above, we get

    $$\begin{aligned} {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_i^{k+1}, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) \le {\mathcal {L}}_\gamma \left( x_1^{k+1},\ldots , x_{i-1}^{k+1}, x_i^{*}, x_{i+1}^k,\ldots , x_N^k;\lambda ^k\right) , \end{aligned}$$

    i.e.,

    $$\begin{aligned}&f_i\left( x_i^{k+1}\right) - \left\langle \lambda ^k, A_i x_i^{k+1}\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{j=1}^i A_j x_j^{k+1} + \sum \limits _{j=i+1}^{N-1} A_j x_j^{k} + x_N^k - b \right\| ^2 \\&\quad \le f_i\left( x_i^*\right) - \left\langle \lambda ^k, A_i x_i^*\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i^* + \sum \limits _{j=i+1}^{N-1} A_j x_j^{k} + x_N^k - b \right\| ^2. \end{aligned}$$

    Choosing \(k=k_q-1\) in the above inequality and letting q go to \(+\infty \), we obtain

    $$\begin{aligned} \limsup \limits _{q\rightarrow +\infty }f_i\left( x_i^{k_q}\right) \le \limsup \limits _{q\rightarrow +\infty } \left( \frac{\gamma }{2}\left\| A_i x_i^{k_q} - A_i x_i^* \right\| ^2 + \left\langle \lambda ^{k_q-1}, A_i x_i^{k_q}- A_i x_i^*\right\rangle \right) + f_i\left( x_i^*\right) ,\nonumber \\ \end{aligned}$$
    (4.29)

    for \(i=1,2,\ldots ,N-1\). Here we used the boundedness of the sequence \(\{w^k:k=0,1,\ldots \}\), the finiteness of \(\gamma \), the fact that the distance between two successive iterates tends to zero (see (4.20)), and the identity

    $$\begin{aligned}&\sum \limits _{j=1}^i A_j x_j^{k+1} + \sum \limits _{j=i+1}^{N-1} A_j x_j^k + x_N^k - b = \sum \limits _{j=i+1}^{N-1} \left( A_j x_j^k - A_j x_j^{k+1}\right) + \left( x_N^k - x_N^{k+1}\right) \\&\quad +\,\frac{1}{\gamma } \left( \lambda ^k - \lambda ^{k+1}\right) . \end{aligned}$$

    From (4.20) we also have \(x_i^{k_q-1}\rightarrow x_i^*\) as \(q\rightarrow \infty \); hence (4.29) reduces to \(\limsup \limits _{q\rightarrow \infty }f_i(x_i^{k_q})\le f_i(x_i^*)\). Combined with (4.28), this shows that \(f_i(x_i^{k_q})\) tends to \(f_i(x_i^*)\) as \(q\rightarrow \infty \). Hence, we conclude that

    $$\begin{aligned} \lim \limits _{q\rightarrow \infty }{\mathcal {L}}_\gamma (w^{k_q})= & {} \lim \limits _{q\rightarrow \infty }\left( \sum \limits _{i=1}^N f_i\left( x_i^{k_q}\right) - \left\langle \lambda ^{k_q}, \sum \limits _{i=1}^{N-1} A_i x_i^{k_q} + x_N^{k_q} -b\right\rangle \right. \\&\left. +\, \frac{\gamma }{2}\left\| \sum \limits _{i=1}^{N-1} A_i x_i^{k_q} + x_N^{k_q} -b \right\| ^2\right) \\= & {} \sum \limits _{i=1}^N f_i\left( x_i^{*}\right) - \left\langle \lambda ^{*}, \sum \limits _{i=1}^{N-1} A_i x_i^{*}+x_N^{*}-b\right\rangle + \frac{\gamma }{2}\left\| \sum \limits _{i=1}^{N-1} A_i x_i^{*}+x_N^{*}-b \right\| ^2 \\= & {} {\mathcal {L}}_\gamma (w^{*}). \end{aligned}$$

    On the other hand, it follows from (4.20) and (4.21) that

    $$\begin{aligned} \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right)\in & {} \partial {\mathcal {L}}_\gamma (w^{k+1}) \end{aligned}$$
    (4.30)
    $$\begin{aligned} \left( R_1^{k+1}, \ldots , R_N^{k+1}, R_\lambda ^{k+1}\right)\rightarrow & {} (0,\ldots ,0), \quad k\rightarrow \infty . \end{aligned}$$
    (4.31)

    This implies that \((0,\ldots ,0)\in \partial {\mathcal {L}}_\gamma (x_1^*,\ldots ,x_N^*,\lambda ^*)\) by the closedness of the graph of \(\partial {\mathcal {L}}_\gamma \). Therefore, \(w^* = \left( x_1^*,\ldots ,x_N^*,\lambda ^*\right) \) is a critical point of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\).

  2.

    This assertion follows directly from Lemma 5 and Remark 5 of [4]; we omit the proof for succinctness.

  3.

    Let \(\tilde{L}\) denote the finite limit of \({\mathcal {L}}_\gamma (x_1^k,\ldots ,x_N^k,\lambda ^k)\) as k goes to infinity, i.e.,

    $$\begin{aligned} \tilde{L} = \lim \limits _{k\rightarrow \infty } {\mathcal {L}}_\gamma \left( x_1^k,\ldots ,x_N^k,\lambda ^k\right) . \end{aligned}$$

    Take \(w^*\in \Omega (w^0)\). There exists a subsequence \(w^{k_q}\) converging to \(w^*\) as q goes to infinity. Since we have proven that

    $$\begin{aligned} \lim \limits _{q\rightarrow \infty }{\mathcal {L}}_\gamma (w^{k_q}) = {\mathcal {L}}_\gamma (w^{*}), \end{aligned}$$

    and \({\mathcal {L}}_\gamma (w^{k})\) is a non-increasing sequence, we conclude that \({\mathcal {L}}_\gamma (w^{*}) = \tilde{L}\), hence the restriction of \({\mathcal {L}}_\gamma (x_1,\ldots ,x_N,\lambda )\) to \(\Omega (w^0)\) equals \(\tilde{L}\).

\(\square \)


Cite this article

Lin, T., Ma, S. & Zhang, S. Iteration Complexity Analysis of Multi-block ADMM for a Family of Convex Minimization Without Strong Convexity. J Sci Comput 69, 52–81 (2016). https://doi.org/10.1007/s10915-016-0182-0


Keywords

  • Alternating direction method of multipliers (ADMM)
  • Convergence rate
  • Regularization
  • Kurdyka–Łojasiewicz property
  • Convex optimization

Mathematics Subject Classification

  • 90C25
  • 90C30