Alternating Direction Method for Separable Variables Under Pair-Wise Constraints

Communications in Mathematics and Statistics

Abstract

While the convergence of the alternating direction method (ADM) for two separable variables has been established for years, the validity of its direct generalization to more than two blocks is still under study. In this paper, we propose an additional requirement on the constraints, namely that the linear constraints be pair-wise, and establish the convergence of ADM for more than two blocks under this requirement. We then apply our approach to two kinds of optimization problems. We also present several numerical experiments to verify the validity of the proposed algorithm.

References

  1. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  2. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)

  3. Deng, W., Lai, M., Peng, Z., Yin, W.: Parallel multi-block ADMM with o(1/k) convergence. arXiv preprint arXiv:1312.3040 (2013)

  4. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. Rice University CAAM Technical Report TR12-14 (2012)

  5. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)

  6. He, B., Tao, M., Yuan, X.: A splitting method for separate convex programming with linking linear constraints. Optimization Online (2010)

  7. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22(2), 313–340 (2012)

  8. He, B., Tao, M., Yuan, X.: A splitting method for separable convex programming. IMA J. Numer. Anal. 35(1), 394–426 (2015)

  9. He, B., Yuan, X.: Block-wise alternating direction method of multipliers for multiple-block convex programming and beyond. SMAI J. Comput. Math. 1, 145–175 (2015)

  10. Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. arXiv preprint arXiv:1208.3922 (2012)

  11. Malioutov, D., Çetin, M., Willsky, A.S.: A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 53(8), 3010–3022 (2005)

  12. Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)

  13. Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)

  14. Yan, M., Yin, W.: Self equivalence of the alternating direction method of multipliers. arXiv preprint arXiv:1407.7400 (2014)

  15. Yao, H., Gerstoft, P., Shearer, P.M., Mecklenbräuker, C.: Compressive sensing of the Tohoku-Oki Mw 9.0 earthquake: Frequency-dependent rupture modes. Geophys. Res. Lett. 38(20), L20310 (2011)

  16. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68(1), 49–67 (2006)

Acknowledgements

We would like to thank the anonymous reviewers for their comments and suggestions. The work is supported by the NSF of China (No. 11626253), and the Fundamental Research Funds for the Central Universities.

Author information

Correspondence to Zhouwang Yang.

Appendix

1.1 Proof of Theorem 2.1

The three pair-wise constraints can be rewritten in compact form as \(Ax+By+Cz-b=0\), where \( A=\left( \begin{array}{c} A_1\\ A_2 \\ 0 \end{array}\right) \), \( B=\left( \begin{array}{c} B_1\\ 0 \\ B_3 \end{array}\right) \), and \( C=\left( \begin{array}{c} 0 \\ C_2\\ C_3 \end{array}\right) \). Letting \( \lambda =\left( \begin{array}{c} \lambda _1 \\ \lambda _2\\ \lambda _3 \end{array}\right) \), the augmented Lagrangian function of (2.3) is:

$$\begin{aligned} L(x,y,z,\lambda ,\rho )= & {} F(x)+G(y)+H(z)+\lambda ^T(Ax+By+Cz-b)\nonumber \\&+\,\frac{\rho }{2} \Vert Ax+By+Cz-b\Vert _2^2. \end{aligned}$$
(4.1)
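
To make the compact form concrete, the following NumPy sketch stacks pair-wise blocks into A, B and C and evaluates (4.1). The block sizes, the random data, and the quadratic stand-ins for F, G, H are illustrative assumptions, not taken from the paper. It also computes the squared residual in two ways, to show that it splits into the three pair-wise residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nz, m1, m2, m3 = 3, 4, 2, 2, 3, 2  # illustrative sizes (assumption)

# Pair-wise blocks: A1 x + B1 y = b1, A2 x + C2 z = b2, B3 y + C3 z = b3.
A1, B1 = rng.standard_normal((m1, nx)), rng.standard_normal((m1, ny))
A2, C2 = rng.standard_normal((m2, nx)), rng.standard_normal((m2, nz))
B3, C3 = rng.standard_normal((m3, ny)), rng.standard_normal((m3, nz))
b1, b2, b3 = rng.standard_normal(m1), rng.standard_normal(m2), rng.standard_normal(m3)

# Stacked matrices of the compact form A x + B y + C z - b = 0.
A = np.vstack([A1, A2, np.zeros((m3, nx))])
B = np.vstack([B1, np.zeros((m2, ny)), B3])
C = np.vstack([np.zeros((m1, nz)), C2, C3])
b = np.concatenate([b1, b2, b3])

def augmented_lagrangian(x, y, z, lam, rho, F, G, H):
    """Evaluate (4.1) for given callables F, G, H."""
    r = A @ x + B @ y + C @ z - b
    return F(x) + G(y) + H(z) + lam @ r + 0.5 * rho * (r @ r)

# Hypothetical objective, used only to have something to evaluate.
sq = lambda v: 0.5 * (v @ v)
x, y, z = rng.standard_normal(nx), rng.standard_normal(ny), rng.standard_normal(nz)
lam = rng.standard_normal(m1 + m2 + m3)

r = A @ x + B @ y + C @ z - b
pairwise = (np.sum((A1 @ x + B1 @ y - b1) ** 2)
            + np.sum((A2 @ x + C2 @ z - b2) ** 2)
            + np.sum((B3 @ y + C3 @ z - b3) ** 2))
value = augmented_lagrangian(x, y, z, lam, 1.0, sq, sq, sq)
```

Here `r @ r` and `pairwise` agree up to rounding, because each block row of the stacked system involves exactly two of the three variables.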

Since the augmented Lagrangian function always has a saddle point in the finite-dimensional case, there exists \((\hat{x},\hat{y},\hat{z},\hat{\lambda })\) such that \(L(\hat{x},\hat{y},\hat{z},\lambda ,\rho ) \le L(\hat{x},\hat{y},\hat{z},\hat{\lambda },\rho ) \le L(x,y,z,\hat{\lambda },\rho )\) for any \((x,y,z,\lambda )\).

The left inequality shows that \(A\hat{x}+B\hat{y}+C\hat{z}-b=0\), which, combined with the right inequality, shows that:

$$\begin{aligned} F(\hat{x})+G(\hat{y})+H(\hat{z})\le & {} F(x)+G(y)+H(z)+\langle \hat{\lambda },Ax+By+Cz-b\rangle \nonumber \\&+\,\frac{\rho }{2} \Vert Ax+By+Cz-b\Vert _2^2. \end{aligned}$$
(4.2)

Let \(x=\hat{x}+t(w-\hat{x})\) with \(0<t<1\). By the convexity of F, we have \(F(x)-F(\hat{x}) \le t[F(w)-F(\hat{x})] \). Since \((x,y,z)\) in (4.2) is arbitrary, we may take \((x,y,z)=(x,\hat{y},\hat{z})\), for which \(Ax+B\hat{y}+C\hat{z}-b=tA(w-\hat{x})\), and we get:

$$\begin{aligned} t[F(w)-F(\hat{x})]+t\langle \hat{\lambda },A(w-\hat{x})\rangle +\frac{t^2\rho }{2}\Vert A(w-\hat{x})\Vert _2^2 \ge 0. \end{aligned}$$
(4.3)

Dividing both sides of the inequality by t and letting \(t \rightarrow 0\), we get:

$$\begin{aligned} F(w)-F(\hat{x})+\langle \hat{\lambda },A(w-\hat{x})\rangle \ge 0. \end{aligned}$$
(4.4)

Letting \(w=x^{(n)}\) gives:

$$\begin{aligned} F(x^{(n)})-F(\hat{x})+\langle \hat{\lambda },A(x^{(n)}-\hat{x})\rangle \ge 0. \end{aligned}$$
(4.5)

Similarly, we can get:

$$\begin{aligned} G(y^{(n)})-G(\hat{y})+\langle \hat{\lambda },B(y^{(n)}-\hat{y})\rangle\ge & {} 0, \end{aligned}$$
(4.6)
$$\begin{aligned} H(z^{(n)})-H(\hat{z})+\langle \hat{\lambda },C(z^{(n)}-\hat{z})\rangle\ge & {} 0. \end{aligned}$$
(4.7)

Adding the three inequalities above gives:

$$\begin{aligned}&F(x^{(n)})+G(y^{(n)})+H(z^{(n)})-F(\hat{x})-G(\hat{y})-H(\hat{z})\nonumber \\&\quad +\,\langle \hat{\lambda },A(x^{(n)}-\hat{x})+B(y^{(n)}-\hat{y})+C(z^{(n)}-\hat{z})\rangle \ge 0. \end{aligned}$$
(4.8)

From the update of x in Algorithm 1, we get:

$$\begin{aligned}&F(x)+\frac{\rho }{2}\Vert Ax+By^{(n-1)}+Cz^{(n-1)}-b\Vert _2^2\nonumber \\&\quad +\,\langle \lambda ^{(n)},Ax+By^{(n-1)}+Cz^{(n-1)}-b\rangle \nonumber \\&\ge F(x^{(n)})+\frac{\rho }{2}\Vert Ax^{(n)}+By^{(n-1)}+Cz^{(n-1)}-b\Vert _2^2\nonumber \\&\quad +\,\langle \lambda ^{(n)},Ax^{(n)}+By^{(n-1)}+Cz^{(n-1)}-b\rangle \end{aligned}$$
(4.9)

for any x. Expanding the squared norms and rearranging shows that:

$$\begin{aligned}&F(x)-F(x^{(n)})+\langle A(x-x^{(n)}),\lambda ^{(n)}+\rho (Ax^{(n)}+By^{(n-1)}+Cz^{(n-1)}-b)\rangle \nonumber \\&\quad +\,\frac{\rho }{2}\Vert A(x-x^{(n)})\Vert _2^2 \ge 0. \end{aligned}$$
(4.10)

Here we use the same technique as before, i.e., let \(x=x^{(n)}+t(w-x^{(n)})\) with \(0<t<1\), use the convexity of F, divide by t, and let \( t \rightarrow 0\):

$$\begin{aligned}&F(w)-F(x^{(n)})+\langle A(w-x^{(n)}),\lambda ^{(n)}\nonumber \\&\quad +\,\rho (Ax^{(n)}+By^{(n-1)}+Cz^{(n-1)}-b)\rangle \ge 0. \end{aligned}$$
(4.11)

Letting \(w=\hat{x}\), we obtain:

$$\begin{aligned} F(\hat{x})-F(x^{(n)})+\langle A(\hat{x}-x^{(n)}),\lambda ^{(n)}+\rho (Ax^{(n)}+By^{(n-1)}+Cz^{(n-1)}-b)\rangle \ge 0. \end{aligned}$$
(4.12)
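
As an illustration of (4.11), the sketch below solves one x-subproblem in closed form for a hypothetical smooth objective \(F(x)=\frac{1}{2}\Vert x-p\Vert _2^2\) and evaluates the left-hand side of (4.11) at random trial points w. The matrix sizes, the data, and the choice of F are assumptions made only for this example; the vector `c` stands for \(By^{(n-1)}+Cz^{(n-1)}-b\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 6                       # illustrative sizes (assumption)
A = rng.standard_normal((m, n))
c = rng.standard_normal(m)        # stands for B y^(n-1) + C z^(n-1) - b
lam = rng.standard_normal(m)      # current multiplier lambda^(n)
p = rng.standard_normal(n)
rho = 1.5

F = lambda x: 0.5 * np.sum((x - p) ** 2)   # hypothetical smooth convex F

# x-update: minimize F(x) + <lam, A x + c> + (rho/2) ||A x + c||^2.
# Setting the gradient to zero gives (I + rho A^T A) x = p - A^T (lam + rho c).
x_new = np.linalg.solve(np.eye(n) + rho * A.T @ A, p - A.T @ (lam + rho * c))

def lhs_4_11(w):
    """Left-hand side of (4.11) at a trial point w."""
    return F(w) - F(x_new) + (A @ (w - x_new)) @ (lam + rho * (A @ x_new + c))

trials = rng.standard_normal((100, n))
values = np.array([lhs_4_11(w) for w in trials])
```

For this particular quadratic F, the left-hand side of (4.11) reduces to \(\frac{1}{2}\Vert w-x^{(n)}\Vert _2^2\), so every entry of `values` is nonnegative, as (4.11) requires.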

Similarly, following the order of subproblems in Algorithm 1 we can get:

$$\begin{aligned}&G(\hat{y})-G(y^{(n)})+\langle B(\hat{y}-y^{(n)}),\lambda ^{(n)}+\rho (Ax^{(n)}+By^{(n)}+Cz^{(n-1)}-b)\rangle \ge 0, \quad \nonumber \\ \end{aligned}$$
(4.13)
$$\begin{aligned}&H(\hat{z})-H(z^{(n)})+\langle C(\hat{z}-z^{(n)}),\lambda ^{(n)}+\rho (Ax^{(n)}+By^{(n)}+Cz^{(n)}-b)\rangle \ge 0.\quad \nonumber \\ \end{aligned}$$
(4.14)

Adding the three inequalities and using the notation \(\bar{x}^{(n)}=x^{(n)}-\hat{x}, \bar{y}^{(n)}=y^{(n)}-\hat{y}, \bar{z}^{(n)}=z^{(n)}-\hat{z}\) together with \(A\hat{x}+B\hat{y}+C\hat{z}=b\), we get:

$$\begin{aligned}&F(\hat{x})+G(\hat{y})+H(\hat{z})-F(x^{(n)})-G(y^{(n)})-H(z^{(n)})\nonumber \\&\quad -\,\langle \lambda ^{(n)},A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\rangle -\rho \Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2\nonumber \\&\quad +\,\rho \langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)}+C\bar{z}^{(n)}-C\bar{z}^{(n-1)},A\bar{x}^{(n)} \rangle \nonumber \\&\quad +\,\rho \langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},B\bar{y}^{(n)} \rangle \ge 0. \end{aligned}$$
(4.15)

Adding (4.15) to (4.8) and denoting \(\bar{\lambda }^{(n)}=\lambda ^{(n)}-\hat{\lambda }\), we obtain:

$$\begin{aligned}&-\langle \bar{\lambda }^{(n)},A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)} \rangle -\rho \Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2\nonumber \\&+\,\rho \langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)}+C\bar{z}^{(n)}-C\bar{z}^{(n-1)},A\bar{x}^{(n)} \rangle +\rho \langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},B\bar{y}^{(n)} \rangle \ge 0.\nonumber \\ \end{aligned}$$
(4.16)

According to the update of \(\lambda \) in Algorithm 1, we know that:

$$\begin{aligned} \bar{\lambda }^{(n+1)}-\bar{\lambda }^{(n)}= & {} \lambda ^{(n+1)}-\lambda ^{(n)} \nonumber \\= & {} \rho (Ax^{(n)}+By^{(n)}+Cz^{(n)}-b)\nonumber \\= & {} \rho (A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}). \end{aligned}$$
(4.17)

Therefore,

$$\begin{aligned} |\bar{\lambda }^{(n)}|^2-|\bar{\lambda }^{(n+1)}|^2= & {} \langle \bar{\lambda }^{(n)}-\bar{\lambda }^{(n+1)},\bar{\lambda }^{(n)}+\bar{\lambda }^{(n+1)} \rangle \nonumber \\= & {} -\rho \langle A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}, 2\bar{\lambda }^{(n)}+\rho (A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}) \rangle \nonumber \\= & {} -\rho ^2\Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2\nonumber \\&-\,2\rho \langle \bar{\lambda }^{(n)},A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)} \rangle \nonumber \\\ge & {} \rho ^2\Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2 \nonumber \\&-\,2\rho ^2 \langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)}+C\bar{z}^{(n)}-C\bar{z}^{(n-1)},A\bar{x}^{(n)} \rangle \nonumber \\&-\,2\rho ^2 \langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},B\bar{y}^{(n)} \rangle , \end{aligned}$$
(4.18)
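
The first three equalities in (4.18) use only the multiplier update (4.17). The following sanity check uses random vectors standing in for \(\bar{\lambda }^{(n)}\) and for the residual \(A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\); the dimension and the value of \(\rho \) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.7
lam_bar = rng.standard_normal(5)      # stands for lambda^(n) - lambda_hat
r = rng.standard_normal(5)            # stands for A xbar + B ybar + C zbar at step n
lam_bar_next = lam_bar + rho * r      # multiplier update, cf. (4.17)

# Difference of squared norms, computed directly and via the expansion in (4.18).
lhs = lam_bar @ lam_bar - lam_bar_next @ lam_bar_next
rhs = -rho**2 * (r @ r) - 2 * rho * (lam_bar @ r)
```

The two numbers `lhs` and `rhs` coincide up to rounding; the inequality in (4.18) then comes from bounding the inner-product term with (4.16).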

where the last inequality follows from (4.16).

Since the analogue of (4.11) for the z-subproblem holds for any trial point, (4.14) remains valid when \(\hat{z}\) is replaced with \(z^{(n-1)}\):

$$\begin{aligned} H(z^{(n-1)})-H(z^{(n)})+\langle C(z^{(n-1)}-z^{(n)}),\lambda ^{(n)}+\rho (Ax^{(n)}+By^{(n)}+Cz^{(n)}-b) \rangle \ge 0. \end{aligned}$$
(4.19)

On the other hand, take the \((n-1)\)th iteration of the inequality (4.14) and replace \(\hat{z}\) with \(z^{(n)}\):

$$\begin{aligned}&H(z^{(n)})-H(z^{(n-1)})+\langle C(z^{(n)}-z^{(n-1)}),\lambda ^{(n-1)}+\rho (Ax^{(n-1)}\nonumber \\&\quad +By^{(n-1)}+Cz^{(n-1)}-b) \rangle \ge 0. \end{aligned}$$
(4.20)

Adding the above two inequalities and using the multiplier update (4.17), we see that:

$$\begin{aligned}&-\rho \langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},A\bar{x}^{(n)}+B\bar{y}^{(n)} \rangle \nonumber \\&\ge \rho \Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2+\rho \langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},C\bar{z}^{(n-1)} \rangle . \end{aligned}$$
(4.21)

Substituting (4.21) into (4.18) gives:

$$\begin{aligned} |\bar{\lambda }^{(n)}|^2-|\bar{\lambda }^{(n+1)}|^2\ge & {} \rho ^2\Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2 -2\rho ^2 \langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)},A\bar{x}^{(n)} \rangle \nonumber \\&+\,2\rho ^2\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2+2\rho ^2 \langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},C\bar{z}^{(n-1)} \rangle .\nonumber \\ \end{aligned}$$
(4.22)

Since

$$\begin{aligned}&2\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2+2\langle C\bar{z}^{(n)}-C\bar{z}^{(n-1)},C\bar{z}^{(n-1)}\rangle =\nonumber \\&\Vert C\bar{z}^{(n)}\Vert _2^2-\Vert C\bar{z}^{(n-1)}\Vert _2^2+\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2 \end{aligned}$$
(4.23)

we obtain:

$$\begin{aligned} |\bar{\lambda }^{(n)}|^2-|\bar{\lambda }^{(n+1)}|^2\ge & {} \rho ^2\Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2\nonumber \\&+\,\rho ^2(\Vert C\bar{z}^{(n)}\Vert _2^2-\Vert C\bar{z}^{(n-1)}\Vert _2^2)\nonumber \\&+\,\rho ^2\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2\nonumber \\&-\,2\rho ^2\langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)},A\bar{x}^{(n)} \rangle . \end{aligned}$$
(4.24)
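
The identity (4.23) used above is an instance of an elementary fact about inner products. With the shorthand \(u=C\bar{z}^{(n)}\) and \(v=C\bar{z}^{(n-1)}\) (introduced only for this remark), it can be written out as:

$$\begin{aligned} 2\Vert u-v\Vert _2^2+2\langle u-v,v\rangle= & {} 2\langle u-v,u\rangle \nonumber \\= & {} \Vert u\Vert _2^2-\Vert v\Vert _2^2+\Vert u-v\Vert _2^2, \end{aligned}$$

where the last step expands \(\Vert u-v\Vert _2^2=\Vert u\Vert _2^2-2\langle u,v\rangle +\Vert v\Vert _2^2\).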

Rearranging gives:

$$\begin{aligned}&(|\bar{\lambda }^{(n)}|^2+\rho ^2\Vert C\bar{z}^{(n-1)}\Vert _2^2)-(|\bar{\lambda }^{(n+1)}|^2+\rho ^2\Vert C\bar{z}^{(n)}\Vert _2^2) \nonumber \\&\ge \rho ^2\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2 +\rho ^2\Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2 \nonumber \\&\quad -\,2\rho ^2\langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)},A\bar{x}^{(n)}\rangle . \end{aligned}$$
(4.25)

Because the constraints are pair-wise, the squared norm on the left splits over the three block rows, and only the first block row couples x and y, so it is easy to see that:

$$\begin{aligned}&\Vert A\bar{x}^{(n)}+B\bar{y}^{(n)}+C\bar{z}^{(n)}\Vert _2^2-2\langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)},A\bar{x}^{(n)}\rangle \nonumber \\ \ge&\Vert A_1\bar{x}^{(n)}+B_1\bar{y}^{(n)}\Vert _2^2-2\langle B_1\bar{y}^{(n)}-B_1\bar{y}^{(n-1)},A_1\bar{x}^{(n)}\rangle \nonumber \\ =&\Vert A_1\bar{x}^{(n)}+B_1\bar{y}^{(n-1)}\Vert _2^2+\Vert B_1\bar{y}^{(n)}\Vert _2^2-\Vert B_1\bar{y}^{(n-1)}\Vert _2^2. \end{aligned}$$
(4.26)
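
The chain (4.26) can be checked numerically on random data. In the sketch below the block sizes and random matrices are illustrative assumptions; `x`, `y`, `y_prev`, `z` play the roles of \(\bar{x}^{(n)}\), \(\bar{y}^{(n)}\), \(\bar{y}^{(n-1)}\), \(\bar{z}^{(n)}\).

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny, nz, m1, m2, m3 = 3, 4, 2, 2, 3, 2   # illustrative sizes (assumption)
A1, B1 = rng.standard_normal((m1, nx)), rng.standard_normal((m1, ny))
A2, C2 = rng.standard_normal((m2, nx)), rng.standard_normal((m2, nz))
B3, C3 = rng.standard_normal((m3, ny)), rng.standard_normal((m3, nz))
A = np.vstack([A1, A2, np.zeros((m3, nx))])
B = np.vstack([B1, np.zeros((m2, ny)), B3])
C = np.vstack([np.zeros((m1, nz)), C2, C3])

def chain_4_26(x, y, y_prev, z):
    """Return the three expressions appearing in the chain (4.26)."""
    first = np.sum((A @ x + B @ y + C @ z) ** 2) - 2 * (B @ (y - y_prev)) @ (A @ x)
    second = np.sum((A1 @ x + B1 @ y) ** 2) - 2 * (B1 @ (y - y_prev)) @ (A1 @ x)
    third = (np.sum((A1 @ x + B1 @ y_prev) ** 2)
             + np.sum((B1 @ y) ** 2) - np.sum((B1 @ y_prev) ** 2))
    return first, second, third

samples = [chain_4_26(rng.standard_normal(nx), rng.standard_normal(ny),
                      rng.standard_normal(ny), rng.standard_normal(nz))
           for _ in range(200)]
```

The gap between the first and second expressions is \(\Vert A_2\bar{x}^{(n)}+C_2\bar{z}^{(n)}\Vert _2^2+\Vert B_3\bar{y}^{(n)}+C_3\bar{z}^{(n)}\Vert _2^2\), which is why the first relation is an inequality, while the second and third expressions agree exactly.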

Combining (4.26) with (4.25), we get:

$$\begin{aligned}&\left( |\bar{\lambda }^{(n)}|^2+\rho ^2\Vert C\bar{z}^{(n-1)}\Vert _2^2+\rho ^2\Vert B_1\bar{y}^{(n-1)}\Vert _2^2\right) \nonumber \\&\quad -\,\left( |\bar{\lambda }^{(n+1)}|^2+\rho ^2\Vert C\bar{z}^{(n)}\Vert _2^2+\rho ^2\Vert B_1\bar{y}^{(n)}\Vert _2^2\right) \nonumber \\ \ge&\rho ^2\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2+\rho ^2\Vert A_1\bar{x}^{(n)}+B_1\bar{y}^{(n-1)}\Vert _2^2. \end{aligned}$$
(4.27)
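
The inequality (4.27) can be observed on a small synthetic instance. The sketch below is a minimal illustration, not the paper's Algorithm 1 verbatim: it assumes quadratic objectives \(F(x)=\frac{1}{2}\Vert x-p\Vert _2^2\) (and likewise for G, H), random pair-wise blocks, \(\rho =1\), and the update order x, y, z, \(\lambda \) with the sign convention of (4.1) and (4.17). The data are constructed so that a saddle point is known in advance, which lets the bracketed quantity in (4.27) be evaluated at every iteration.

```python
import numpy as np

rng = np.random.default_rng(4)
nx, ny, nz, m1, m2, m3 = 3, 3, 3, 2, 2, 2   # illustrative sizes (assumption)
A1, B1 = rng.standard_normal((m1, nx)), rng.standard_normal((m1, ny))
A2, C2 = rng.standard_normal((m2, nx)), rng.standard_normal((m2, nz))
B3, C3 = rng.standard_normal((m3, ny)), rng.standard_normal((m3, nz))
A = np.vstack([A1, A2, np.zeros((m3, nx))])
B = np.vstack([B1, np.zeros((m2, ny)), B3])
C = np.vstack([np.zeros((m1, nz)), C2, C3])
rho = 1.0

# Build a problem with a known saddle point (x_hat, y_hat, z_hat, lam_hat):
# F(x) = 0.5||x - p||^2 with p chosen so that grad F(x_hat) + A^T lam_hat = 0.
x_hat, y_hat, z_hat = (rng.standard_normal(k) for k in (nx, ny, nz))
lam_hat = rng.standard_normal(m1 + m2 + m3)
p, q, s = x_hat + A.T @ lam_hat, y_hat + B.T @ lam_hat, z_hat + C.T @ lam_hat
b = A @ x_hat + B @ y_hat + C @ z_hat

def block_update(M, center, lam, rest):
    """argmin_v 0.5||v - center||^2 + <lam, M v + rest> + (rho/2)||M v + rest||^2."""
    n = M.shape[1]
    return np.linalg.solve(np.eye(n) + rho * M.T @ M, center - M.T @ (lam + rho * rest))

def lyapunov(lam, y, z):
    """Bracketed quantity on the left-hand side of (4.27)."""
    return (np.sum((lam - lam_hat) ** 2)
            + rho**2 * np.sum((C @ (z - z_hat)) ** 2)
            + rho**2 * np.sum((B1 @ (y - y_hat)) ** 2))

x, y, z, lam = np.zeros(nx), np.zeros(ny), np.zeros(nz), np.zeros(m1 + m2 + m3)
history = []
for _ in range(200):
    history.append(lyapunov(lam, y, z))            # uses lam^(n), y^(n-1), z^(n-1)
    x = block_update(A, p, lam, B @ y + C @ z - b)  # x-subproblem
    y = block_update(B, q, lam, A @ x + C @ z - b)  # y-subproblem with the new x
    z = block_update(C, s, lam, A @ x + B @ y - b)  # z-subproblem with the new x, y
    lam = lam + rho * (A @ x + B @ y + C @ z - b)   # multiplier step, cf. (4.17)
history = np.array(history)
```

Because (4.20) refers to the previous z-subproblem, the monotonicity is only guaranteed from the second iteration on, so the first entry of `history` (which depends on the arbitrary starting point) is excluded when checking that the sequence does not increase.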

The right-hand side of (4.27) is nonnegative, so the nonnegative sequence \( \left\{ |\bar{\lambda }^{(n)}|^2+\rho ^2\Vert C\bar{z}^{(n-1)}\Vert _2^2+\rho ^2\Vert B_1\bar{y}^{(n-1)}\Vert _2^2\right\} \) is nonincreasing and therefore converges. Consequently, the nonnegative sequences \(\{\Vert C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\Vert _2^2\} \) and \(\{\Vert A_1\bar{x}^{(n)}+B_1\bar{y}^{(n-1)}\Vert _2^2\}\) converge to zero as n goes to infinity. Therefore, we have:

$$\begin{aligned} \lim _{n\rightarrow \infty } \left( C\bar{z}^{(n)}-C\bar{z}^{(n-1)}\right) =\lim _{n\rightarrow \infty } \left( A_1\bar{x}^{(n)}+B_1\bar{y}^{(n-1)}\right) =0. \end{aligned}$$
(4.28)

The last two limits yield that:

$$\begin{aligned} \lim _{n\rightarrow \infty } B_1\bar{y}^{(n)}-B_1\bar{y}^{(n-1)}=\lim _{n\rightarrow \infty } A_1\bar{x}^{(n)}+B_1\bar{y}^{(n-1)}=0 \end{aligned}$$
(4.29)

which, combined with the pair-wise structure of the constraints, shows that:

$$\begin{aligned} \lim _{n\rightarrow \infty } \langle B\bar{y}^{(n)}-B\bar{y}^{(n-1)},A\bar{x}^{(n)}\rangle = \lim _{n\rightarrow \infty } \langle B_1\bar{y}^{(n)}-B_1\bar{y}^{(n-1)},A_1\bar{x}^{(n)}\rangle = 0. \end{aligned}$$
(4.30)

Substituting (4.28) and (4.30) into (4.8) and (4.15) and taking limits, we get:

$$\begin{aligned}&\limsup _{n\rightarrow \infty }F(x^{(n)})+G(y^{(n)})+H(z^{(n)}) \le F(\hat{x})+G(\hat{y})+H(\hat{z})\nonumber \\&\le \liminf _{n\rightarrow \infty }F(x^{(n)})+G(y^{(n)})+H(z^{(n)}) \end{aligned}$$
(4.31)

that is,

$$\begin{aligned} \lim _{n\rightarrow \infty }F(x^{(n)})+G(y^{(n)})+H(z^{(n)}) = F(\hat{x})+G(\hat{y})+H(\hat{z}). \end{aligned}$$
(4.32)

Then \(\{(x^{(n)},y^{(n)},z^{(n)})\}\) is a minimizing sequence of the objective function, and it is convergent. Next, we need to prove that the sequence \(\{(x^{(n)},y^{(n)},z^{(n)},\lambda ^{(n)})\}\) converges to a KKT point.

From the above equation, the sequence \(\{(x^{(n)},y^{(n)},z^{(n)},\lambda ^{(n)})\}\) has a convergent subsequence \(\{(x^{(n_k)},y^{(n_k)},z^{(n_k)},\lambda ^{(n_k)})\}\). Let

$$\begin{aligned} \lim _{k\rightarrow \infty }\{(x^{(n_k)},y^{(n_k)},z^{(n_k)},\lambda ^{(n_k)})\} = (\tilde{x},\tilde{y},\tilde{z},\tilde{\lambda }). \end{aligned}$$
(4.33)

Then \((\tilde{x},\tilde{y},\tilde{z},\tilde{\lambda })\) is an optimal solution of the problem, and \(\tilde{x},\tilde{y}\) and \(\tilde{z}\) satisfy the constraint of problem (2.3):

$$\begin{aligned} A\tilde{x}+B\tilde{y}+C\tilde{z}=b. \end{aligned}$$
(4.34)

Consider the following x-subproblem:

$$\begin{aligned} x^{(n_k+1)}:=\arg \min \left\{ F(x)+\frac{\rho }{2}\Vert Ax+By^{(n_k)}+Cz^{(n_k)}-b-\frac{\lambda ^{(n_k)}}{\rho }\Vert _2^2\right\} .\qquad \end{aligned}$$
(4.35)

Its optimality condition is given by

$$\begin{aligned}&A^T(\lambda ^{(n_k)}-\rho (Ax^{(n_k+1)}+By^{(n_k+1)}+Cz^{(n_k+1)}-b)) \nonumber \\&\quad +\,\rho A^T(B(y^{(n_k+1)}-y^{(n_k)})+C(z^{(n_k+1)}-z^{(n_k)}))\in \partial F(x^{(n_k+1)}). \end{aligned}$$
(4.36)

Because \(\lambda ^{(n_k+1)}=\lambda ^{(n_k)}-\rho (Ax^{(n_k+1)}+By^{(n_k+1)}+Cz^{(n_k+1)}-b)\), (4.36) can be rewritten as

$$\begin{aligned} A^T\lambda ^{(n_k+1)}+\rho (A_1^TB_1(y^{(n_k+1)}-y^{(n_k)})+A_2^TC_2(z^{(n_k+1)}-z^{(n_k)}))\in \partial F(x^{(n_k+1)}). \end{aligned}$$
(4.37)
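
The cross terms in (4.37) come directly from the zero blocks of A, B and C defined at the start of the proof:

$$\begin{aligned} A^TB=\left( \begin{array}{ccc} A_1^T&A_2^T&0 \end{array}\right) \left( \begin{array}{c} B_1\\ 0 \\ B_3 \end{array}\right) =A_1^TB_1, \qquad A^TC=\left( \begin{array}{ccc} A_1^T&A_2^T&0 \end{array}\right) \left( \begin{array}{c} 0 \\ C_2\\ C_3 \end{array}\right) =A_2^TC_2. \end{aligned}$$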

By (4.28) and (4.29), taking the limit \(k\rightarrow \infty \) on both sides, we obtain

$$\begin{aligned} A^T\tilde{\lambda }\in \partial F(\tilde{x}). \end{aligned}$$
(4.38)

Applying the same argument to the y-subproblem and the z-subproblem, we get:

$$\begin{aligned}&\displaystyle B^T\lambda ^{(n_k+1)}+\rho B_3^TC_3(z^{(n_k+1)}-z^{(n_k)})\in \partial G(y^{(n_k+1)}), \end{aligned}$$
(4.39)
$$\begin{aligned}&\displaystyle C^T\lambda ^{(n_k+1)}\in \partial H(z^{(n_k+1)}). \end{aligned}$$
(4.40)

Taking limits, we see that \(\tilde{x},\tilde{y},\tilde{z}\) and \(\tilde{\lambda }\) satisfy the KKT conditions of the original problem (2.3):

$$\begin{aligned} \left\{ \begin{array}{l} A^T\tilde{\lambda }\in \partial F(\tilde{x}),\\ B^T\tilde{\lambda }\in \partial G(\tilde{y}),\\ C^T\tilde{\lambda }\in \partial H(\tilde{z}),\\ A\tilde{x}+B\tilde{y}+C\tilde{z}=b. \end{array}\right. \end{aligned}$$
(4.41)

Therefore, the sequence generated by Algorithm 1 converges to the KKT point \((\tilde{x},\tilde{y},\tilde{z},\tilde{\lambda })\). The proof is done. \(\square \)
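
For differentiable objectives, the KKT system (4.41) can be turned into a simple residual check. The sketch below is an illustration under stated assumptions: `kkt_residuals` is a hypothetical helper name, the objectives are the quadratics \(F(x)=\frac{1}{2}\Vert x-p\Vert _2^2\) (and likewise for G, H), and the sign convention is the one written in (4.41), i.e., \(A^T\tilde{\lambda }=\nabla F(\tilde{x})\).

```python
import numpy as np

def kkt_residuals(x, y, z, lam, A, B, C, b, grad_F, grad_G, grad_H):
    """Norms of the four KKT residuals in (4.41) for differentiable F, G, H."""
    return (np.linalg.norm(A.T @ lam - grad_F(x)),
            np.linalg.norm(B.T @ lam - grad_G(y)),
            np.linalg.norm(C.T @ lam - grad_H(z)),
            np.linalg.norm(A @ x + B @ y + C @ z - b))

rng = np.random.default_rng(5)
A, B, C = (rng.standard_normal((6, k)) for k in (3, 4, 2))  # generic matrices
x, y, z = (rng.standard_normal(k) for k in (3, 4, 2))
lam = rng.standard_normal(6)

# Hypothetical quadratics with centers p, q, s, and a right-hand side b,
# chosen so that (x, y, z, lam) satisfies (4.41) by construction.
p, q, s = x - A.T @ lam, y - B.T @ lam, z - C.T @ lam
b = A @ x + B @ y + C @ z
grads = (lambda v: v - p, lambda v: v - q, lambda v: v - s)

res = kkt_residuals(x, y, z, lam, A, B, C, b, *grads)
res_perturbed = kkt_residuals(x + 0.1, y, z, lam, A, B, C, b, *grads)
```

All four entries of `res` vanish up to rounding, while moving x away from the constructed point makes the first residual in `res_perturbed` clearly nonzero. In practice such residuals are a natural stopping criterion for an iterate.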

1.2 Lemma 1

Lemma 1

The pair-wise linear constraints in (2.1) or (2.2) have the following property:

$$\begin{aligned}&\left\| \sum _{i=1}^mA_i\bar{x}_i^{(n)}\right\| _2^2-2\sum _{i=1}^{m-2} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^{m-1} (A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)}) \rangle \nonumber \\&\ge \sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \left( \Vert A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2+\Vert A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2-\Vert A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2\right) .\nonumber \\ \end{aligned}$$
(4.42)

Proof

By the pair-wise structure of the constraints, the squared L2-norm separates over the pairs, and each inner product involves only the blocks of the corresponding pair:

$$\begin{aligned}&\Vert \sum _{i=1}^mA_i\bar{x}_i^{(n)}\Vert _2^2 = \sum _{i=1}^{m-1}\sum _{j=i+1}^{m} \Vert A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2 \nonumber \\&\ge \sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \Vert A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2, \nonumber \\&\langle A_i\bar{x}_i^{(n)},A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)} \rangle =\langle A_i^{(i,j)}\bar{x}_i^{(n)},A_j^{(i,j)}\bar{x}_j^{(n)}-A_j^{(i,j)}\bar{x}_j^{(n-1)} \rangle . \end{aligned}$$
(4.43)

For each pair \((i,j)\), we have:

$$\begin{aligned}&\Vert A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2 -2\langle A_i^{(i,j)}\bar{x}_i^{(n)},A_j^{(i,j)}\bar{x}_j^{(n)}-A_j^{(i,j)}\bar{x}_j^{(n-1)} \rangle \nonumber \\&=\Vert A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2+\Vert A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2-\Vert A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2. \end{aligned}$$
(4.44)

Combining the last two equations will lead to the inequality we want. \(\square \)
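
The two ingredients of this proof can be checked numerically for a small number of blocks. The sketch below assumes m = 4 blocks of equal size and one random row group per pair (all illustrative choices), with `blocks[(i, j)]` holding \((A_i^{(i,j)}, A_j^{(i,j)})\). It tests the separation of the squared norm over pairs, and the per-pair identity whose right-hand side contains \(\Vert A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2-\Vert A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2\), matching the right-hand side of (4.42).

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
m, dim, rows = 4, 3, 2            # blocks, block size, rows per pair (assumptions)
pairs = list(itertools.combinations(range(m), 2))

# blocks[(i, j)] = (A_i^{(i,j)}, A_j^{(i,j)}): the only nonzero blocks in that row group.
blocks = {pair: (rng.standard_normal((rows, dim)), rng.standard_normal((rows, dim)))
          for pair in pairs}

def stacked(i):
    """Full A_i: one row group per pair, zero unless block i belongs to the pair."""
    groups = []
    for pair in pairs:
        if i in pair:
            groups.append(blocks[pair][pair.index(i)])
        else:
            groups.append(np.zeros((rows, dim)))
    return np.vstack(groups)

A = [stacked(i) for i in range(m)]
xb = rng.standard_normal((m, dim))        # plays the role of xbar^(n)
xb_prev = rng.standard_normal((m, dim))   # plays the role of xbar^(n-1)

# Separation of the squared norm of the stacked residual over all pairs.
total = np.sum(sum(A[i] @ xb[i] for i in range(m)) ** 2)
split = sum(np.sum((Ai @ xb[i] + Aj @ xb[j]) ** 2)
            for (i, j), (Ai, Aj) in blocks.items())

def pair_gap(i, j):
    """Left-hand side minus right-hand side of the per-pair identity."""
    Ai, Aj = blocks[(i, j)]
    lhs = (np.sum((Ai @ xb[i] + Aj @ xb[j]) ** 2)
           - 2 * (Ai @ xb[i]) @ (Aj @ (xb[j] - xb_prev[j])))
    rhs = (np.sum((Ai @ xb[i] + Aj @ xb_prev[j]) ** 2)
           + np.sum((Aj @ xb[j]) ** 2) - np.sum((Aj @ xb_prev[j]) ** 2))
    return lhs - rhs

gaps = [pair_gap(i, j) for (i, j) in pairs]
```

Both checks hold up to rounding: `total` equals `split`, and every entry of `gaps` is numerically zero.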

1.3 Proof of Theorem 2.2

As in the three-variable case, the augmented Lagrangian function of problem (2.9) always has a saddle point \((\bar{x}_1,\bar{x}_2,\ldots ,\bar{x}_m,\bar{\lambda })\) such that, for any \((x_1,x_2,\ldots ,x_m,\lambda )\),

$$\begin{aligned} L(\bar{x}_1,\bar{x}_2,\ldots ,\bar{x}_m,\lambda ) \le L(\bar{x}_1,\bar{x}_2,\ldots ,\bar{x}_m,\bar{\lambda }) \le L(x_1,x_2,\ldots ,x_m,\bar{\lambda }). \end{aligned}$$
(4.45)

The same derivation as in the proof of Theorem 2.1 shows that:

$$\begin{aligned}&\sum _{i=1}^m A_i\bar{x}_i-b=0, \end{aligned}$$
(4.46)
$$\begin{aligned}&\sum _{i=1}^m F_i(x_i^{(n)})- \sum _{i=1}^m F_i(\bar{x}_i)+\langle \bar{\lambda },\sum _{i=1}^m A_i\bar{x}_i^{(n)}\rangle \ge 0, \end{aligned}$$
(4.47)

where \(\bar{x}_i^{(n)}=x_i^{(n)}-\bar{x}_i, i=1,\ldots ,m\).

On the other hand, according to the order of the subproblems, for any \(x_i\) we have:

$$\begin{aligned}&F_i(x_i)-F_i(x_i^{(n)}) + \langle A_i(x_i-x_i^{(n)}), \lambda ^{(n)}+\rho (A_1x_1^{(n)}+ \nonumber \\&\cdots +\, A_ix_i^{(n)}+A_{i+1}x_{i+1}^{(n-1)}+\cdots +A_mx_m^{(n-1)}-b)\rangle \ge 0, \ \ i=1,\ldots ,m.\qquad \qquad \end{aligned}$$
(4.48)

Letting \(x_i=\bar{x}_i\), adding all m inequalities and simplifying, we have:

$$\begin{aligned}&\sum _{i=1}^m F_i(\bar{x}_i)- \sum _{i=1}^m F_i(x_i^{(n)})-\langle \lambda ^{(n)},\sum _{i=1}^m A_i\bar{x}_i^{(n)}\rangle -\rho \left\| \sum _{i=1}^m A_i\bar{x}_i^{(n)}\right\| _2^2 \nonumber \\&\quad +\,\rho \sum _{i=1}^{m-1} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^m (A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)}) \rangle \ge 0. \end{aligned}$$
(4.49)

Adding (4.49) to (4.47) and using the notation \(\bar{\lambda }^{(n)}=\lambda ^{(n)}-\bar{\lambda }\), we see that:

$$\begin{aligned}&-\langle \bar{\lambda }^{(n)},\sum _{i=1}^m A_i\bar{x}_i^{(n)}\rangle -\rho \left\| \sum _{i=1}^m A_i\bar{x}_i^{(n)}\right\| _2^2\nonumber \\&+\,\rho \sum _{i=1}^{m-1} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^m (A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)}) \rangle \ge 0. \end{aligned}$$
(4.50)

According to the update of the Lagrange multiplier, we have:

$$\begin{aligned} \bar{\lambda }^{(n+1)}-\bar{\lambda }^{(n)}=\lambda ^{(n+1)}-\lambda ^{(n)} =\rho \left( \sum _{i=1}^m A_ix_i^{(n)}-b\right) =\rho \sum _{i=1}^m A_i\bar{x}_i^{(n)}. \end{aligned}$$
(4.51)

Therefore,

$$\begin{aligned} |\bar{\lambda }^{(n)}|^2-|\bar{\lambda }^{(n+1)}|^2= & {} \langle \bar{\lambda }^{(n)}-\bar{\lambda }^{(n+1)},\bar{\lambda }^{(n)}+\bar{\lambda }^{(n+1)}\rangle \nonumber \\= & {} -\rho \langle \sum _{i=1}^m A_i\bar{x}_i^{(n)},2\bar{\lambda }^{(n)}+\rho \sum _{i=1}^m A_i\bar{x}_i^{(n)}\rangle \nonumber \\= & {} -\rho ^2\left\| \sum _{i=1}^m A_i\bar{x}_i^{(n)}\right\| _2^2-2\rho \langle \bar{\lambda }^{(n)},\sum _{i=1}^m A_i\bar{x}_i^{(n)}\rangle \nonumber \\\ge & {} \rho ^2\left\| \sum _{i=1}^m A_i\bar{x}_i^{(n)}\right\| _2^2\nonumber \\&-2\rho ^2\sum _{i=1}^{m-1} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^m (A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)})\rangle . \end{aligned}$$
(4.52)

Taking \(i=m\) and \(x_m=\bar{x}_m\) in (4.48), we have:

$$\begin{aligned} F_m(\bar{x}_m)-F_m(x_m^{(n)})-\langle A_m\bar{x}_m^{(n)},\lambda ^{(n)}+\rho \sum _{i=1}^mA_i\bar{x}_i^{(n)}\rangle \ge 0. \end{aligned}$$
(4.53)

Replacing \(\bar{x}_m\) with \(x_m^{(n-1)}\) in (4.53), and replacing \(\bar{x}_m\) with \(x_m^{(n)}\) in the \((n-1)\)th iteration of (4.53), we have:

$$\begin{aligned}&F_m(x_m^{(n-1)})-F_m(x_m^{(n)}) -\langle A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)},\lambda ^{(n)}+\rho \sum _{i=1}^mA_i\bar{x}_i^{(n)}\rangle \ge 0, \end{aligned}$$
(4.54)
$$\begin{aligned}&F_m(x_m^{(n)})-F_m(x_m^{(n-1)}) -\langle A_m\bar{x}_m^{(n-1)}-A_m\bar{x}_m^{(n)},\lambda ^{(n-1)}+\rho \sum _{i=1}^mA_i\bar{x}_i^{(n-1)}\rangle \ge 0.\nonumber \\ \end{aligned}$$
(4.55)

Adding the two inequalities together, we see:

$$\begin{aligned}&-\langle \lambda ^{(n)}-\lambda ^{(n-1)},A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)}\rangle \nonumber \\&-\,\rho \langle A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)},\sum _{i=1}^mA_i\bar{x}_i^{(n)}-\sum _{i=1}^mA_i\bar{x}_i^{(n-1)}\rangle \ge 0. \end{aligned}$$
(4.56)

Combining with the update of \(\lambda \) yields that:

$$\begin{aligned}&-\,\rho \langle \sum _{i=1}^m A_i\bar{x}_i^{(n-1)},A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)}\rangle \nonumber \\&-\,\rho \langle A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)},\sum _{i=1}^mA_i\bar{x}_i^{(n)}-\sum _{i=1}^mA_i\bar{x}_i^{(n-1)}\rangle \ge 0. \end{aligned}$$
(4.57)

Together with (4.52), this gives:

$$\begin{aligned}&|\bar{\lambda }^{(n)}|^2-|\bar{\lambda }^{(n+1)}|^2 \ge \rho ^2\Vert \sum _{i=1}^m A_i\bar{x}_i^{(n)}\Vert _2^2 +2\rho ^2\langle A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)},A_m\bar{x}_m^{(n)}\rangle \nonumber \\&\quad -\,2\rho ^2\sum _{i=1}^{m-2} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^{m-1} (A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)})\rangle . \end{aligned}$$
(4.58)
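
The passage from (4.52) to (4.58) splits off the terms involving the last block and then applies (4.57). With the shorthand \(\Delta _j^{(n)}=A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)}\) (introduced only for this remark), the double sum in (4.52) decomposes as

$$\begin{aligned} \sum _{i=1}^{m-1} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^{m} \Delta _j^{(n)}\rangle = \sum _{i=1}^{m-2} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^{m-1} \Delta _j^{(n)}\rangle +\langle \sum _{i=1}^{m-1} A_i\bar{x}_i^{(n)},\Delta _m^{(n)}\rangle , \end{aligned}$$

and, since (4.57) simplifies to \(\langle \sum _{i=1}^{m} A_i\bar{x}_i^{(n)},\Delta _m^{(n)}\rangle \le 0\),

$$\begin{aligned} \langle \sum _{i=1}^{m-1} A_i\bar{x}_i^{(n)},\Delta _m^{(n)}\rangle = \langle \sum _{i=1}^{m} A_i\bar{x}_i^{(n)},\Delta _m^{(n)}\rangle -\langle A_m\bar{x}_m^{(n)},\Delta _m^{(n)}\rangle \le -\langle A_m\bar{x}_m^{(n)},\Delta _m^{(n)}\rangle . \end{aligned}$$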

Substituting the identity

$$\begin{aligned}&2\langle A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)},A_m\bar{x}_m^{(n)}\rangle \nonumber \\&=\Vert A_m\bar{x}_m^{(n)}\Vert _2^2-\Vert A_m\bar{x}_m^{(n-1)}\Vert _2^2+\Vert A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)}\Vert _2^2 \end{aligned}$$
(4.59)

into (4.58):

$$\begin{aligned}&(|\bar{\lambda }^{(n)}|^2 +\rho ^2\Vert A_m\bar{x}_m^{(n-1)}\Vert _2^2) -(|\bar{\lambda }^{(n+1)}|^2 +\rho ^2\Vert A_m\bar{x}_m^{(n)}\Vert _2^2) \nonumber \\&\ge \rho ^2\Vert A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)}\Vert _2^2+\rho ^2\Vert \sum _{i=1}^mA_i\bar{x}_i^{(n)}\Vert _2^2 \nonumber \\&\quad -\,2\rho ^2\sum _{i=1}^{m-2} \langle A_i\bar{x}_i^{(n)},\sum _{j=i+1}^{m-1} (A_j\bar{x}_j^{(n)}-A_j\bar{x}_j^{(n-1)}) \rangle . \end{aligned}$$
(4.60)

According to Lemma 1, we can get:

$$\begin{aligned}&(|\bar{\lambda }^{(n)}|^2+\rho ^2\Vert A_m\bar{x}_m^{(n-1)}\Vert _2^2+\rho ^2\sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \Vert A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2) \nonumber \\&\quad -\,(|\bar{\lambda }^{(n+1)}|^2+\rho ^2\Vert A_m\bar{x}_m^{(n)}\Vert _2^2+\rho ^2\sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \Vert A_j^{(i,j)}\bar{x}_j^{(n)}\Vert _2^2) \nonumber \\&\ge \rho ^2\Vert A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)}\Vert _2^2 +\rho ^2\sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \Vert A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2.\nonumber \\ \end{aligned}$$
(4.61)

Then the nonnegative sequence

$$\begin{aligned} \left( |\bar{\lambda }^{(n)}|^2 +\rho ^2\Vert A_m\bar{x}_m^{(n-1)}\Vert _2^2+\rho ^2\sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \Vert A_j^{(i,j)}\bar{x}_j^{(n-1)}\Vert _2^2\right) \end{aligned}$$
(4.62)

is nonincreasing and bounded below, so it converges. Therefore, the nonnegative terms on the right-hand side of (4.61) tend to 0, which yields:

$$\begin{aligned}&\lim _{n\rightarrow \infty } A_m\bar{x}_m^{(n)}-A_m\bar{x}_m^{(n-1)} =\lim _{n\rightarrow \infty } \sum _{i=1}^mA_i\bar{x}_i^{(n)} =0, \nonumber \\&\lim _{n\rightarrow \infty } A_i^{(i,j)}\bar{x}_i^{(n)}+A_j^{(i,j)}\bar{x}_j^{(n-1)}=0 \ (1 \le i<j \le m-1). \end{aligned}$$
(4.63)

Based on the structure of the pair-wise constraints and the last two limits, we get:

$$\begin{aligned} \lim _{n\rightarrow \infty } A_j^{(i,j)}\bar{x}_j^{(n)}-A_j^{(i,j)}\bar{x}_j^{(n-1)}=0, \ 1 \le i<j \le m-1 \end{aligned}$$
(4.64)

which means:

$$\begin{aligned} \lim _{n\rightarrow \infty } \sum _{i=1}^{m-2}\sum _{j=i+1}^{m-1} \langle A_i^{(i,j)}\bar{x}_i^{(n)},A_j^{(i,j)}\bar{x}_j^{(n)}-A_j^{(i,j)}\bar{x}_j^{(n-1)} \rangle =0. \end{aligned}$$
(4.65)

Substituting (4.65) and the first two limits in (4.63) into (4.49) and taking the limit superior gives:

$$\begin{aligned} \sum _{i=1}^m F_i(\bar{x}_i) \ge \limsup _{n\rightarrow \infty }\sum _{i=1}^m F_i(x_i^{(n)}). \end{aligned}$$
(4.66)

On the other hand, substituting the second limit in (4.63) into (4.47) and taking the limit inferior gives:

$$\begin{aligned} \sum _{i=1}^m F_i(\bar{x}_i) \le \liminf _{n\rightarrow \infty }\sum _{i=1}^m F_i(x_i^{(n)}). \end{aligned}$$
(4.67)

Hence the objective values generated by Algorithm 2 converge to the minimum:

$$\begin{aligned} \lim _{n\rightarrow \infty }\sum _{i=1}^m F_i(x_i^{(n)}) = \sum _{i=1}^m F_i(\bar{x}_i). \end{aligned}$$
(4.68)

Then \(\{(x_1^{(n)},x_2^{(n)},\ldots ,x_m^{(n)})\}\) is a minimizing sequence of the objective function, and it is convergent. Next, we need to prove that the sequence \(\{(x_1^{(n)},x_2^{(n)},\ldots ,x_m^{(n)},\lambda ^{(n)})\}\) converges to a KKT point.

From the above equation, the sequence \(\{(x_1^{(n)},x_2^{(n)},\ldots ,x_m^{(n)},\lambda ^{(n)})\}\) has a convergent subsequence \(\{(x_1^{(n_k)},x_2^{(n_k)},\ldots ,x_m^{(n_k)},\lambda ^{(n_k)})\}\). Let

$$\begin{aligned} \lim _{k\rightarrow \infty }\{(x_1^{(n_k)},x_2^{(n_k)},\ldots ,x_m^{(n_k)},\lambda ^{(n_k)})\} = (\tilde{x}_1,\tilde{x}_2,\ldots ,\tilde{x}_m,\tilde{\lambda }). \end{aligned}$$
(4.69)

Then \((\tilde{x}_1,\tilde{x}_2,\ldots ,\tilde{x}_m,\tilde{\lambda })\) is an optimal solution of the problem, and \(\tilde{x}_i\ (i=1,2,\ldots ,m)\) satisfy the constraint of problem (2.9):

$$\begin{aligned} \sum _{i=1}^mA_i\tilde{x}_i=b. \end{aligned}$$
(4.70)

Consider the following \(x_i\)-subproblem:

$$\begin{aligned} x_i^{(n_k+1)}:=\arg \min _{x_i} \{F_i(x_i)+\frac{\rho }{2}\Vert A_ix_i+\sum _{j=1,j\ne i}^m A_jx_j^{(n_k)}-b-\frac{\lambda ^{(n_k)}}{\rho }\Vert _2^2\}.\qquad \end{aligned}$$
(4.71)

Its optimality condition is given by

$$\begin{aligned}&A_i^T\left( \lambda ^{(n_k)}-\rho \left( \sum _{l=1}^mA_lx_l^{(n_k+1)}-b\right) \right) \nonumber \\&\quad +\,\rho A_i^T\left( \sum _{j=1,j\ne i}^mA_j(x_j^{(n_k+1)}-x_j^{(n_k)})\right) \in \partial F_i(x_i^{(n_k+1)}).\qquad \end{aligned}$$
(4.72)

Because \(\lambda ^{(n_k+1)}=\lambda ^{(n_k)}-\rho (\sum _{i=1}^mA_ix_i^{(n_k+1)}-b)\), (4.72) can be rewritten as

$$\begin{aligned} A_i^T\lambda ^{(n_k+1)}+\rho A_i^T\left( \sum _{j=1,j\ne i}^mA_j\left( x_j^{(n_k+1)}-x_j^{(n_k)}\right) \right) \in \partial F_i(x_i^{(n_k+1)}). \end{aligned}$$
(4.73)

By (4.63) and (4.64), taking the limit \(k\rightarrow \infty \) on both sides, we obtain

$$\begin{aligned} A_i^T\tilde{\lambda }\in \partial F_i(\tilde{x}_i). \end{aligned}$$
(4.74)

Using the same method, we can get

$$\begin{aligned} A_j^T\tilde{\lambda }\in \partial F_j(\tilde{x}_j),(j=1,2,\ldots ,m). \end{aligned}$$
(4.75)

Hence \(\tilde{x}_i\ (i=1,\ldots ,m)\) and \(\tilde{\lambda }\) satisfy the KKT conditions of the original problem (2.9):

$$\begin{aligned} \left\{ \begin{array}{l} A_i^T\tilde{\lambda }\in \partial F_i(\tilde{x}_i),(i=1,2,\ldots ,m),\\ \sum _{i=1}^mA_i\tilde{x}_i=b. \end{array}\right. \end{aligned}$$
(4.76)

Then \((\tilde{x}_1,\tilde{x}_2,\ldots ,\tilde{x}_m,\tilde{\lambda })\) is the KKT point, which completes the proof. \(\square \)

Cite this article

Yang, J., Li, Y., Xie, X. et al. Alternating Direction Method for Separable Variables Under Pair-Wise Constraints. Commun. Math. Stat. 5, 59–82 (2017). https://doi.org/10.1007/s40304-017-0100-2

  • DOI: https://doi.org/10.1007/s40304-017-0100-2