First-Order Algorithms for Convex Optimization with Nonseparable Objective and Coupled Constraints

Gao, Xiang; Zhang, Shu-Zhong

doi:10.1007/s40305-016-0131-5

First-Order Algorithms for Convex Optimization with Nonseparable Objective and Coupled Constraints

Published: 18 June 2016

Volume 5, pages 131–159, (2017)
Cite this article

Journal of the Operations Research Society of China Aims and scope Submit manuscript

Xiang Gao¹ &
Shu-Zhong Zhang¹

1019 Accesses
28 Citations
Explore all metrics

Abstract

In this paper, we consider a block-structured convex optimization model, where in the objective the block variables are nonseparable and they are further linearly coupled in the constraint. For the 2-block case, we propose a number of first-order algorithms to solve this model. First, the alternating direction method of multipliers (ADMM) is extended, assuming that it is easy to optimize the augmented Lagrangian function with one block of variables at each time while fixing the other block. We prove that iteration complexity bound holds under suitable conditions, where t is the number of iterations. If the subroutines of the ADMM cannot be implemented, then we propose new alternative algorithms to be called alternating proximal gradient method of multipliers, alternating gradient projection method of multipliers, and the hybrids thereof. Under suitable conditions, the iteration complexity bound is shown to hold for all the newly proposed algorithms. Finally, we extend the analysis for the ADMM to the general multi-block case.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Bregman-style Partially Symmetric Alternating Direction Method of Multipliers for Nonconvex Multi-block Optimization

Article 19 April 2023

An inexact proximal generalized alternating direction method of multipliers

Article 12 May 2020

The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent

Article 17 October 2014

References

James, G.M., Paulson, C., Rusmevichientong, P.: The constrained lasso. Technical report, University of Southern California (2013)
Alizadeh, M., Li, X., Wang, Z., Scaglione, A., Melton, R.: Demand-side management in the smart grid: information processing for the power switch. IEEE Sig. Process. Mag. 29(5), 55–67 (2012)
Article Google Scholar
Chang, T.-H., Alizadeh, M., Scaglione A.: Coordinated home energy management for real-time power balancing. In: IEEE Power and Energy Society General Meeting, pp. 1–8 (2012)
Li, N., Chen, L., Low, S.H.: Optimal demand response based on utility maximization in power networks. In: IEEE Power and Energy Society General Meeting, pp. 1–8 (2011)
Paatero, J.V., Lund, P.D.: A model for generating household electricity load profiles. Int. J. Ener. Res. 30(5), 273–290 (2006)
Article Google Scholar
Cui, Y., Li, X., Sun, D., Toh, K.-C.: On the convergence properties of a majorized ADMM for linearly constrained convex optimization problems with coupled objective functions. arXiv:1502.00098 (2015)
Hong, M., Chang, T.-H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.-Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization. arXiv:1401.7079 (2014)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, vol. 23. Prentice Hall, Englewood Cliffs (1989)
MATH Google Scholar
Douglas, J., Rachford, H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Article MathSciNet MATH Google Scholar
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. PhD dissertation, Massachusetts Institute of Technology (1989)
Eckstein, J., Bertsekas, D.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
Article MathSciNet MATH Google Scholar
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends. Mach. Learn. 3(1), 1–122 (2011)
Article MATH Google Scholar
Feng, C., Xu, H., Li, B.: An alternating direction method approach to cloud traffic management. arXiv:1407.8309 (2014)
Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: Advances in neural information processing systems, pp. 2101–2109 (2010)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for $l_1$-minimization with applications to compressed sensing. SIAM J. Imag. Sci. 1(1), 143–168 (2008)
Article MathSciNet MATH Google Scholar
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, vol. 9. SIAM, Philadelphia (1989)
Book MATH Google Scholar
He, B., Yuan, X.: On the $O(1/n)$ convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Article MathSciNet MATH Google Scholar
Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Article MathSciNet MATH Google Scholar
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Article MathSciNet MATH Google Scholar
Deng, W., and Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 1–28 (2012)
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. arXiv:1208.3922 (2012)
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. arXiv:1408.4266 (2014)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of admm for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Article MathSciNet MATH Google Scholar
Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block admm with o (1/k) convergence. arXiv:1312.3040 (2013)
He, B., Hou, L., Yuan, X.: On full jacobian decomposition of the augmented lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
Article MathSciNet MATH Google Scholar
He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Optimization Online (2012)
Lin, T., Ma, S., Zhang, S.: Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. J. Sci. Comput. pp. 1–30 (2016)
Hong, M., Luo, Z.-Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3836–3840 (2015)
Ortega, J., Rheinboldt, W.: Iterative Solution of Nonlinear Equations in Several Variables. Classics in applied mathematics, vol. 30. SIAM, Philadelphia (2000)
Book MATH Google Scholar
Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015)
Article MathSciNet Google Scholar
Gao, X., Jiang, B., Zhang, S.: On the information-adaptive variants of the ADMM: an iteration complexity perspective. Optimization Online (2014)
Liu, J., Chen, J., Ye, J.: Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 547–556. ACM (2009)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Article MathSciNet MATH Google Scholar
Lin, T., Ma, S., Zhang, S.: An extragradient-based alternating direction method for convex minimization. Foundations of Computational Mathematics, pp. 1–25 (2015)
Robinson, D.P., Tappenden, R.E.: A flexible admm algorithm for big data applications. arXiv:1502.04391 (2015)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
Xiang Gao & Shu-Zhong Zhang

Authors

Xiang Gao
View author publications
You can also search for this author in PubMed Google Scholar
Shu-Zhong Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shu-Zhong Zhang.

Additional information

This paper is dedicated to Professor Lian-Sheng Zhang in celebration of his 80th birthday.

This paper was supported in part by National Science Foundation (No. CMMI-1462408).

Appendix: Proofs of the Convergence Theorems

1.1 Appendix 1: Proof of Theorem 3.1

We have $F(w)=\left( \begin{array}{lll} 0 &{}\quad 0 &{}\quad -A^\mathrm{T} \\ 0 &{}\quad 0 &{}\quad -B^\mathrm{T} \\ A &{}\quad B&{}\quad 0 \\ \end{array} \right) \left( \begin{array}{l} x \\ y \\ \lambda \\ \end{array} \right) - \left( \begin{array}{l} 0 \\ 0 \\ b \\ \end{array} \right) ,$ for any $w_1$ and $w_2$, and so

$$\begin{aligned} (w_1-w_2)^\mathrm{T}(F(w_1)-F(w_2))=0. \end{aligned}$$

Expanding on this identity, we have for any $w^0,w^1,\cdots ,w^{t-1}$ and ${\bar{w}} = \frac{1}{t} \sum \limits _{k=0}^{t-1} w^k$, that

$$\begin{aligned} ({\bar{w}}-w)^\mathrm{T}F({\bar{w}})=\frac{1}{t} \sum \limits _{k=0}^{t-1}({w}^k-w)^\mathrm{T}F({w}^k). \end{aligned}$$

(7.1)

We begin our analysis with the following property of the ADMM algorithm.

Proposition 7.1

Suppose $h_2$ is strongly convex with parameter $\sigma >0$. Let $\{{\tilde{w}}^k\}$ be defined by (2.6), and the matrices Q, M, P be given in (2.5). First of all, for any $w\in \Omega $, we have

(7.2)

Furthermore,

(7.3)

Proof

By the optimality condition of the two subproblems in ADMM, we have

$$\begin{aligned}&(x-x^{k+1})^\mathrm{T}\left[ \nabla _x f(x^{k+1},y^k)+h_1'(x^{k+1}) -A^\mathrm{T}(\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b))\right. \\&\quad \left. +\,G(x^{k+1}-x^k)\right] \geqslant 0,\quad \forall x\in \mathcal {X}, \end{aligned}$$

where $h'_1(x^{k+1})\in \partial h_1(x^{k+1})$, and

$$\begin{aligned}&(y-y^{k+1})^{\mathrm{{T}}}\left[ \nabla _y f(x^{k+1},y^{k+1})+h'_2(y^{k+1})\right. \\&\quad \left. -B^{\mathrm{{T}}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k+1}-b))+H(y^{k+1}-y^k)\right] \geqslant 0,\quad \forall y\in \mathcal {Y}\end{aligned}$$

where $h'_2(x^{k+1})\in \partial h_2(x^{k+1})$.

Note that ${\tilde{\lambda }}^k=\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b)$. The above two inequalities can be rewritten as

$$\begin{aligned} (x-{\tilde{x}}^{k})^{\mathrm{T}}\left[ \nabla _x f({\tilde{x}}^k,y^k)+ h'_1({\tilde{x}}^k)-A^\mathrm{T}{\tilde{\lambda }}^k+G({\tilde{x}}^k -x^k)\right] \geqslant 0,\quad \forall x\in \mathcal {X}, \end{aligned}$$

(7.4)

and

$$\begin{aligned}&\quad (y-{\tilde{y}}^{k})^\mathrm{T}\left[ \nabla _y f({\tilde{x}}^k,{\tilde{y}}^k)+ h'_2({\tilde{y}}^k)-B^{\mathrm{T}}{\tilde{\lambda }}^k+\gamma B^\mathrm{T}B({\tilde{y}}^k-y^{k})+H({\tilde{y}}^k-y^k)\right] \nonumber \\&\geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

(7.5)

Observe the following chain of inequalities

(7.6)

Since

$$\begin{aligned} (A{\tilde{x}}^k+B{\tilde{y}}^k-b)-B({\tilde{y}}^k-y^k) -\frac{1}{\gamma }(\lambda ^k-{\tilde{\lambda }}^k)=0, \end{aligned}$$

we have

(7.7)

By the strong convexity of the function $h_2(y)$, we have

$$\begin{aligned} (y-{\tilde{y}}^{k})^\mathrm{T}h'_2({\tilde{y}}^k)\leqslant h_2(y)-h_2({\tilde{y}}^k)-\frac{\sigma }{2}\Vert y-{\tilde{y}}^{k}\Vert ^2. \end{aligned}$$

(7.8)

Because of the convexity of $h_1(x)$ and combining (7.8), (7.7), (7.6), (7.5), and (7.4), we have

$$\begin{aligned}&{h(u)-h(\tilde{u}^k)+\left( \frac{L}{2} +\frac{L^2}{2\sigma }\right) \Vert y^k-{\tilde{y}}^k\Vert ^2+(y -{\tilde{y}}^k)^\mathrm{T}H({\tilde{y}}^k-y^k)}\nonumber \\&\quad + \left( \begin{array}{c} x-{\tilde{x}}^k\\ y-{\tilde{y}}^k\\ \lambda -{\tilde{\lambda }}^k\\ \end{array}\right) ^\mathrm{T}\left[ \left( \begin{array}{c} -A^\mathrm{T}{\tilde{\lambda }}^k\\ -B^\mathrm{T}{\tilde{\lambda }}^k\\ A{\tilde{x}}^k+B{\tilde{y}}^k-b\\ \end{array}\right) -\left( \begin{array}{c} G(x^k-{\tilde{x}}^k)\\ \gamma B^\mathrm{T}B(y^k-{\tilde{y}}^k)\\ -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma }(\lambda ^k-{\tilde{\lambda }}^k)\\ \end{array}\right) \right] \geqslant 0 \end{aligned}$$

for any $w\in \Omega $ and ${\tilde{w}}^k$.

By definition of Q, (7.2) of Proposition 7.1 follows. For (7.3), due to the similarity, we refer to Lemma 3.2 in [17] (noting the matrices Q, P, and M).

The following theorem exhibits an important relationship between two consecutive iterates $w^k$ and $w^{k+1}$ from which the convergence would follow.

Proposition 7.2

Let ${w^k}$ be the sequence generated by the ADMM, ${{\tilde{w}}^k}$ be defined as in (2.6) and H satisfies $H_s:=H-\left( L+\frac{L^2}{\sigma }\right) I_{q}\succeq 0$. Then the following holds:

$$\begin{aligned} \frac{1}{2}\left( \Vert w^*-w^{k}\Vert _{\hat{M}}^2-\Vert w^*-w^{k+1} \Vert _{\hat{M}}^2\right) -\frac{1}{2}\Vert w^k-{\tilde{w}}^k\Vert _{H_d}^2\geqslant 0, \end{aligned}$$

(7.9)

where

$$\begin{aligned} {\hat{H}}=\gamma B^\mathrm{T}B+H, \, \begin{array}{cc} {\hat{M}}=\left( \begin{array}{lll} G &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad \hat{H} &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \frac{1}{\gamma }I_m \\ \end{array} \right) \end{array} \text{ and } H_d=\left( \begin{array}{lll} G &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad H_s &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \frac{1}{\gamma }I_m \\ \end{array} \right) .\nonumber \\ \end{aligned}$$

(7.10)

Proof

It follows from Proposition 7.1 that

(7.11)

Note that $H_{s}:=H-(L+\frac{L^2}{\sigma })I_{q}\succeq 0$, we have the following

$$\begin{aligned}&\quad \left( \frac{L}{2}+\frac{L^2}{2\sigma }\right) \Vert y^k -{\tilde{y}}^k\Vert ^2+(y-{\tilde{y}}^k)^\mathrm{T}H({\tilde{y}}^k-y^k)\nonumber \\&=\left( \frac{L}{2}+\frac{L^2}{2\sigma }\right) \Vert y^k -{\tilde{y}}^k\Vert ^2+\frac{1}{2}\left( \Vert y-y^k\Vert _H^2-\Vert y -{\tilde{y}}^k\Vert _H^2-\Vert y^k-{\tilde{y}}^k\Vert _H^2\right) \nonumber \\&=\frac{1}{2}\left( \Vert y-y^k\Vert _H^2-\Vert y-{\tilde{y}}^k\Vert _H^2\right) -\frac{1}{2}\Vert y^k-{\tilde{y}}^k\Vert _{H_{s}}^2. \end{aligned}$$

(7.12)

Thus, combining (7.11) and (7.12), we have

(7.13)

By the definition of $\hat{M}$ and $H_d$ according to (7.10), it follows from (7.13) that

(7.14)

Letting $w=w^*$ in (7.14), we have

(7.15)

By the monotonicity of F and using the optimality of $w^*$, we have

$$\begin{aligned}&\quad \frac{1}{2}\left( \Vert w^*-w^{k}\Vert _{\hat{M}}^2-\Vert w^* -w^{k+1}\Vert _{\hat{M}}^2\right) -\frac{1}{2}\Vert w^k -{\tilde{w}}^k\Vert _{H_d}^2 \\&\geqslant h(\tilde{u}^k)-h(u^*)+({\tilde{w}}^k-w^*)^\mathrm{T}F({\tilde{w}}^k) \\&\geqslant h(\tilde{u}^k)-h(u^*)+({\tilde{w}}^k-w^*)^\mathrm{T}F(w^*) \\&\geqslant 0, \end{aligned}$$

which completes the proof.

1.2 Appendix 2: Proof of Theorem 3.1.

Proof

First, according to (7.9), it holds that $\{w^k\}$ is bounded and

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Vert w^k-{\tilde{w}}^k\Vert _{H_d}=0. \end{aligned}$$

(7.16)

Thus, those two sequences have the same cluster points: For any $w^{k_n}\rightarrow w^\infty $, by (7.16) we also have ${\tilde{w}}^{k_n}\rightarrow w^\infty $. Applying inequality (7.2) to $\{w^{k_n}\},\{{\tilde{w}}^{k_n}\}$ and taking the limit, it yields that

$$\begin{aligned} h(u)-h(u^\infty )+(w-w^\infty )^\mathrm{T} F(w^\infty )\geqslant 0. \end{aligned}$$

(7.17)

Consequently, the cluster point $w^\infty $ is an optimal solution. Since (7.9) is true for any optimal solution $w^*$, it also holds for $w^\infty $, and that implies ${w^k}$ will converge to $w^\infty $.

Recall (7.2) and (7.3) in Proposition 7.1, those would imply that

(7.18)

Furthermore, since $H-\left( L+\frac{L^2}{\sigma }\right) I_{q}\succeq 0$, we have

$$\begin{aligned}&\quad \left( \frac{L}{2}+\frac{L^2}{2\sigma }\right) \Vert y^k -{\tilde{y}}^k\Vert ^2+(y-{\tilde{y}}^k)^\mathrm{T}H({\tilde{y}}^k-y^k)\nonumber \\&= \left( \frac{L}{2}+\frac{L^2}{2\sigma }\right) \Vert y^k -{\tilde{y}}^k\Vert ^2+\frac{1}{2}\left( \Vert y-y^k\Vert _H^2-\Vert y -{\tilde{y}}^k\Vert _H^2-\Vert y^k-{\tilde{y}}^k\Vert _H^2\right) \nonumber \\&\leqslant \frac{1}{2}\left( \Vert y-y^k\Vert _H^2 -\Vert y-{\tilde{y}}^k\Vert _H^2\right) . \end{aligned}$$

(7.19)

Thus, combining (7.18) and (7.19) leads to

(7.20)

By the definition of M in (2.5) and denoting $\hat{H}=\gamma B^{\top }B+H$, (7.20) leads to

(7.21)

Before proceeding, let us introduce ${\bar{w}}_n:=\frac{1}{n}\sum \limits _{k=0}^{n-1} {\tilde{w}}^k$. Moreover, recall the definition of ${\bar{u}}_n$ in (3.2), we have

$$\begin{aligned} {\bar{u}}_n=\frac{1}{n}\sum \limits _{k=1}^{n} u^k =\frac{1}{n}\sum \limits _{k=0}^{n-1} \tilde{u}^k.\\ \end{aligned}$$

Now, summing the inequality (7.21) over $k=0,1,\cdots ,t-1$ yields

(7.22)

where the first inequality is due to the convexity of h and (7.1).

Note the above inequality is true for all $x\in \mathcal {X}, y\in \mathcal {Y}$, and $\lambda \in \mathbb R^m$, hence it is also true for any optimal solution $x^*, y^*$, and ${\mathcal {B}}_\rho =\{\lambda : \Vert \lambda \Vert \leqslant \rho \}$. As a result,

$$\begin{aligned}&\quad \,\, \sup \limits _{\lambda \in {\mathcal {B}}_\rho }\left\{ h({\bar{u}}_t)-h(u^*)+({\bar{w}}_t-w^*)^\mathrm{T}F({\bar{w}}_t)\right\} \nonumber \\&= \sup \limits _{\lambda \in {\mathcal {B}}_\rho }\left\{ h({\bar{u}}_t)-h(u^*)+({\bar{x}}_t-x^*)^\mathrm{T}(-A^{\mathrm{T}}\bar{\lambda }_t)\right. \nonumber \\&\quad \,\,\left. +\,({\bar{y}}_t-y^*)^{\mathrm{T}}(-B^{\mathrm{T}}\bar{\lambda }_t)+(\bar{\lambda }_t-\lambda )^\mathrm{T}(A{\bar{x}}_t+B{\bar{y}}_t-b)\right\} \nonumber \\&= \sup \limits _{\lambda \in {\mathcal {B}}_\rho }\left\{ h({\bar{u}}_t)-h(u^*)+\bar{\lambda }_t^\mathrm{T}(Ax^*+By^*-b)-\lambda ^{\mathrm{T}}(A{\bar{x}}_t+B{\bar{y}}_t-b)\right\} \nonumber \\&= \sup \limits _{\lambda \in {\mathcal {B}}_\rho }\left\{ h({\bar{u}}_t)-h(u^*)-\lambda ^\mathrm{T}(A{\bar{x}}_t+B{\bar{y}}_t-b)\right\} \nonumber \\&= h({\bar{u}}_t)-h(u^*)+\rho \Vert A{\bar{x}}_t+B{\bar{y}}_t-b\Vert , \end{aligned}$$

(7.23)

which, combined with (7.22), implies that

and so by optimizing over $(x^*,y^*)\in \mathcal {X}^* \times \mathcal {Y}^*$, we have

(7.24)

This completes the proof.

1.3 Appendix 3: Proof of Theorem 4.1

Similar to the analysis for ADMM, we need the following proposition in the analysis of APGMM.

Proposition 7.3

Let $\{{\tilde{w}}^k\}$ be defined by (2.6), and the matrices Q, M, P be given as in (2.5). For any $w\in \Omega $, we have

$$\begin{aligned}&\quad \,\,h(u)-h(\tilde{u}^k)+(w-{\tilde{w}}^k)^\mathrm{T} F({\tilde{w}}^k)\nonumber \\&\geqslant (w-{\tilde{w}}^k)^{\mathrm{T}} Q(w^k-{\tilde{w}}^k)-\left( \frac{L}{2}(\Vert x^k-{\tilde{x}}^k\Vert ^2+\Vert y^k-{\tilde{y}}^k\Vert ^2)+(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k)\right) \qquad \qquad \quad \end{aligned}$$

(7.25)

$$\begin{aligned}&= \frac{1}{2}\left( \Vert w-w^{k+1}\Vert _{M}^2-\Vert w-w^k\Vert _{M}^2\right) +\frac{1}{2}\Vert x^k-{\tilde{x}}^k\Vert ^2_G+\frac{1}{2\gamma }\Vert \lambda ^k-{\tilde{\lambda }}^k\Vert ^2 \nonumber \\&\quad \,\, -\left( \frac{L}{2}\left( \Vert x^k-{\tilde{x}}^k\Vert ^2+\Vert y^k-{\tilde{y}}^k\Vert ^2\right) +(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k)\right) . \end{aligned}$$

(7.26)

Proof

First, by the optimality condition of the two subproblems in APGMM, we have

$$\begin{aligned}&(x-x^{k+1})^{\mathrm{T}}\left[ \nabla _x f(x^{k},y^k)+h'_1(x^{k+1}) -A^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b))\right. \\&\quad \left. +\,G(x^{k+1}-x^k)\right] \geqslant 0 , \quad \forall x\in \mathcal {X}\end{aligned}$$

and

$$\begin{aligned}&(y-y^{k+1})^{\mathrm{T}}\left[ \nabla _y f(x^{k},y^{k})+h'_2(y^{k+1}) -B^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k+1}-b))\right. \\&\quad \left. +\,H(y^{k+1}-y^k)\right] \geqslant 0,\, \quad \forall y\in \mathcal {Y}. \end{aligned}$$

Note that ${\tilde{\lambda }}^k=\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b)$, and by the definition of ${\tilde{w}}^k$, the above two inequalities are equivalent to

$$\begin{aligned} (x-{\tilde{x}}^{k})^{\mathrm{T}}\left[ \nabla _x f(x^k,y^k)+ h'_1({\tilde{x}}^k)-A^{\mathrm{T}}{\tilde{\lambda }}^k +G({\tilde{x}}^k-x^k)\right] \geqslant 0,\quad \forall x\in \mathcal {X} \end{aligned}$$

(7.27)

and

$$\begin{aligned}&(y-{\tilde{y}}^{k})^{\mathrm{T}}\left[ \nabla _y f(x^k,y^k)+h'_2({\tilde{y}}^k)-B^{\mathrm{T}}{\tilde{\lambda }}^k+\gamma B^{\mathrm{T}}B({\tilde{y}}^k-y^{k})\right. \nonumber \\&\quad \left. +\,H({\tilde{y}}^k-y^k)\right] \geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

(7.28)

Notice that

(7.29)

Besides, we also have

$$\begin{aligned} (A{\tilde{x}}^k+B{\tilde{y}}^k-b)-B({\tilde{y}}^k-y^k) -\frac{1}{\gamma }\left( \lambda ^k-{\tilde{\lambda }}^k\right) =0. \end{aligned}$$

Thus

$$\begin{aligned} (\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}}(A{\tilde{x}}^k +B{\tilde{y}}^k-b)=(\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}}\left( -B(y^k -{\tilde{y}}^k) +\frac{1}{\gamma }(\lambda ^k-{\tilde{\lambda }}^k)\right) .\nonumber \\ \end{aligned}$$

(7.30)

By the convexity of $h_1(x)$ and $h_2(y)$, combining (7.30), (7.29), (7.28), and (7.27), we have

$$\begin{aligned}&h(u)-h(\tilde{u}^k)+\frac{L}{2}\left( \Vert x^k-{\tilde{x}}^k\Vert ^2+\Vert y^k-{\tilde{y}}^k\Vert ^2\right) +(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k) \\&\quad +\left( \begin{array}{c} x-{\tilde{x}}^k\\ y-{\tilde{y}}^k\\ \lambda -{\tilde{\lambda }}^k\\ \end{array}\right) ^{\mathrm{T}}\left[ \left( \begin{array}{c} -A^{\mathrm{T}}{\tilde{\lambda }}^k\\ -B^{\mathrm{T}}{\tilde{\lambda }}^k\\ A{\tilde{x}}^k+B{\tilde{y}}^k-b\\ \end{array}\right) -\left( \begin{array}{c} G(x^k-{\tilde{x}}^k)\\ \gamma B^{\mathrm{T}}B(y^k-{\tilde{y}}^k)\\ -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma }(\lambda ^k-{\tilde{\lambda }}^k)\\ \end{array}\right) \right] \geqslant 0 \end{aligned}$$

for any $w\in \Omega $ and ${\tilde{w}}^k$.

By definition of Q, we have shown (7.25) in Proposition 7.3. Equality (7.26) directly follows from (7.3) in Proposition 7.1.

With Proposition 7.3 in place, we can show Theorem 4.1 by exactly following the same steps as in the proof of Theorem 3.1, noting of course the altered assumptions on the matrices G and H. In the meanwhile, we also point out the following proposition which is similar to Proposition 7.2. Since most steps of the proofs are almost identical to that of the previous theorems, we omit the details for succinctness.

Proposition 7.4

Let ${w^k}$ be the sequence generated by the APGMM, ${{\tilde{w}}^k}$ be as defined in (2.6), and H and G are chosen so as to satisfy $H_s:=H-LI_{q}\succ 0$ and $G_s:=G-LI_{p}\succ 0$. Then the following holds :

$$\begin{aligned} \frac{1}{2}\left( \Vert w^*-w^{k}\Vert _{\hat{M}}^2-\Vert w^*-w^{k +1}\Vert _{\hat{M}}^2\right) -\frac{1}{2}\Vert w^k-{\tilde{w}}^k\Vert _{H_d}^2\geqslant 0, \end{aligned}$$

where

$$\begin{aligned} \begin{array}{c@{\quad }c} \hat{M}=\left( \begin{array}{c@{\quad }c@{\quad }c} G &{} 0 &{} 0 \\ 0 &{} \hat{H} &{} 0 \\ 0 &{} 0 &{} \frac{1}{\gamma }I_m \\ \end{array} \right) , \end{array} H_d=\left( \begin{array}{c@{\quad }c@{\quad }c} G_s &{} 0 &{} 0 \\ 0 &{} H_s &{} 0 \\ 0 &{} 0 &{} \frac{1}{\gamma }I_m \\ \end{array} \right) \end{aligned}$$

and $\hat{H}=\gamma B^{\mathrm{T}}B+H$.

Theorem 4.1 follows from the above propositions.

1.4 Appendix 4: Proof of Theorem 4.2

Similar to the analysis for APGMM, we do not need any strong convexity here, but we do need to assume that the gradients $\nabla _x h_1(x)$ and $\nabla _y h_2(y)$ are Lipschitz continuous. Without loss of generality, we further assume that the Lipschitz constant is the same as $\nabla f(x,y)$ which is L, that is,

$$\begin{aligned}&\Vert \nabla _x h_1(x_2) - \nabla _x h_1(x_1)\Vert \leqslant L \Vert x_2-x_1\Vert , \;\forall \, x_1,x_2\in \mathcal {X},\nonumber \nonumber \\&\Vert \nabla _y h_2(y_2) - \nabla _y h_2(y_1)\Vert \leqslant L \Vert y_2-y_1\Vert , \;\forall \, y_1,y_2\in \mathcal {Y}. \end{aligned}$$

(7.31)

Proposition 7.5

Let $\{{\tilde{w}}^k\}$ be defined by (2.6), and the matrices Q, M, P be as given in (2.5), and $G:=\gamma A^{\mathrm{T}}A+\frac{1}{\alpha }I_{p}, H:=\frac{1}{\alpha }I_{q}-\gamma B^{\mathrm{T}}B\succeq 0$. First of all, for any $w\in \Omega $, we have

$$\begin{aligned}&\quad \,\, h(u)-h(\tilde{u}^k)+(w-{\tilde{w}}^k)^\mathrm{T} F({\tilde{w}}^k) \nonumber \\&\geqslant (w-{\tilde{w}}^k)^{\mathrm{T}} Q(w^k-{\tilde{w}}^k)-\left( L(\Vert x^k-{\tilde{x}}^k\Vert ^2+\Vert y^k-{\tilde{y}}^k\Vert ^2)\right. \nonumber \\&\quad \,\,\left. +\,(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k)\right) \end{aligned}$$

(7.32)

$$\begin{aligned}&=\frac{1}{2}\left( \Vert w-w^{k+1}\Vert _{M}^2-\Vert w-w^k\Vert _{M}^2\right) +\frac{1}{2}\Vert x^k-{\tilde{x}}^k\Vert ^2_G+\frac{1}{2\gamma }\Vert \lambda ^k-{\tilde{\lambda }}^k\Vert ^2\nonumber \\&\quad \,\,-\left( L(\Vert x^k-{\tilde{x}}^k\Vert ^2+\Vert y^k-{\tilde{y}}^k\Vert ^2)+(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k)\right) . \end{aligned}$$

(7.33)

Proof

First, by the optimality condition of the two subproblems in AGPMM, we have

$$\begin{aligned}&(x-x^{k+1})^{\mathrm{T}}\left[ x^{k+1}-x^k+\alpha \left( \nabla _x f(x^{k},y^{k})+\nabla _y h_1(x^{k})\right. \right. \\&\quad \left. \left. -A^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k}+By^{k}-b))\right) \right] \geqslant 0,\quad \forall x\in \mathcal {X}\end{aligned}$$

and

$$\begin{aligned}&(y-y^{k+1})^{\mathrm{T}}\left[ y^{k+1}-y^k+\alpha \left( \nabla _y f(x^{k},y^{k})+\nabla _y h_2(y^{k})\right. \right. \\&\quad \left. \left. -B^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b))\right) \right] \geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

Noting ${\tilde{\lambda }}^k=\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b)$ and the definition of ${\tilde{w}}^k$, the above two inequalities are, respectively, equivalent to

$$\begin{aligned}&(x-{\tilde{x}}^{k})^{\mathrm{T}}\left[ \nabla _x f(x^k,y^k)+\nabla _x h_2(x^k)-A^{\mathrm{T}}{\tilde{\lambda }}^k+\gamma A^{\mathrm{T}}A({\tilde{x}}^k-x^k) \right. \nonumber \\&\quad \quad \left. +\frac{1}{\alpha }({\tilde{x}}^k-x^k)\right] \geqslant 0,\quad \forall x\in \mathcal {X}\end{aligned}$$

(7.34)

and

$$\begin{aligned}&\quad (y-{\tilde{y}}^{k})^{\mathrm{T}}\left[ \nabla _y f(x^k,y^k)+\nabla _y h_2(y^k)-B^{\mathrm{T}}{\tilde{\lambda }}^k+\frac{1}{\alpha }({\tilde{y}}^k -y^k)\right] \nonumber \\&\geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

(7.35)

Similar to Proposition 7.3, we have

(7.36)

Moreover, by (2.4), we have

$$\begin{aligned}&(x-{\tilde{x}}^{k})^{\mathrm{T}}\nabla _x h_1(x^k)\leqslant h_1(x)-h_1({\tilde{x}}^k)+\frac{L}{2}\Vert x^k-{\tilde{x}}^k\Vert ^2, \nonumber \\&(y-{\tilde{y}}^{k})^{\mathrm{T}}\nabla _y h_2(y^k)\leqslant h_2(y)-h_2({\tilde{y}}^k)+\frac{L}{2}\Vert y^k-{\tilde{y}}^k\Vert ^2. \end{aligned}$$

(7.37)

Besides,

$$\begin{aligned} (A{\tilde{x}}^k+B{\tilde{y}}^k-b)-B({\tilde{y}}^k-y^k) -\frac{1}{\gamma }\left( \lambda ^k-{\tilde{\lambda }}^k\right) =0. \end{aligned}$$

Thus

$$\begin{aligned}&\quad (\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}}(A{\tilde{x}}^k+B{\tilde{y}}^k-b)\nonumber \\&=(\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}} \left( -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma }\left( \lambda ^k -{\tilde{\lambda }}^k\right) \right) . \end{aligned}$$

(7.38)

Combining (7.38), (7.37), (7.36), (7.35), and (7.34), and noticing that $G:=\gamma A^{\mathrm{T}}A+\frac{1}{\alpha }I_{p}, H:=\frac{1}{\alpha }I_{q}-\gamma B^{\mathrm{T}}B$, we have, for any $w\in \Omega $ and ${\tilde{w}}^k$, that

$$\begin{aligned}&h(u)-h(\tilde{u}^k)+L(\Vert x^k-{\tilde{x}}^k\Vert ^2+\Vert y^k -{\tilde{y}}^k\Vert ^2)+(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k)\nonumber \\&+ \left( \begin{array}{c} x-{\tilde{x}}^k\\ y-{\tilde{y}}^k\\ \lambda -{\tilde{\lambda }}^k\\ \end{array}\right) ^{\mathrm{T}} \left\{ \left( \begin{array}{c} -A^{\mathrm{T}}{\tilde{\lambda }}^k\\ -B^{\mathrm{T}}{\tilde{\lambda }}^k\\ A{\tilde{x}}^k+B{\tilde{y}}^k-b\\ \end{array}\right) -\left( \begin{array}{c} G(x^k-{\tilde{x}}^k)\\ \gamma B^{\mathrm{T}}B(y^k-{\tilde{y}}^k)\\ -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma }(\lambda ^k -{\tilde{\lambda }}^k)\\ \end{array}\right) \right\} \geqslant 0. \end{aligned}$$

Using the definition of Q, (7.32) follows. In view of (7.3) in Proposition 7.1, equality (7.33) also readily follows.

With Proposition 7.5, similar as before, we can show Theorem 4.2 by following the same approach as in the proof of Theorem 3.1. We skip the details here for succinctness.

Proposition 7.6

Let ${w^k}$ be the sequence generated by the AGPMM, ${{\tilde{w}}^k}$ be defined in (2.6), and $G:=\gamma A^{\mathrm{T}}A+\frac{1}{\alpha }I_{p}, H:=\frac{1}{\alpha }I_{q}-\gamma B^{\mathrm{T}}B$. Suppose that $\alpha $ satisfies that $H_s:=H-2 L I_{q}\succ 0$ and $G_s:=G-2 L I_{p}\succ 0$. Then the following holds

$$\begin{aligned} \frac{1}{2}\left( \Vert w^*-w^{k}\Vert _{\hat{M}}^2-\Vert w^* -w^{k+1}\Vert _{\hat{M}}^2\right) -\frac{1}{2}\Vert w^k -{\tilde{w}}^k\Vert _{H_d}^2\geqslant 0, \end{aligned}$$

where

$$\begin{aligned} \begin{array}{l@{\quad }l} \hat{M}=\left( \begin{array}{l@{\quad }l@{\quad }l} G &{} 0 &{} 0 \\ 0 &{} \hat{H} &{} 0 \\ 0 &{} 0 &{} \frac{1}{\gamma }I_m \\ \end{array} \right) , \end{array} H_d=\left( \begin{array}{l@{\quad }l@{\quad }l} G_s &{} 0 &{} 0 \\ 0 &{} H_s &{} 0 \\ 0 &{} 0 &{} \frac{1}{\gamma }I_m \\ \end{array} \right) , \end{aligned}$$

and ${\hat{H}}=\gamma B^{\mathrm{T}}B+H$.

Theorem 4.2 now follows from the above propositions.

1.5 Appendix 5: Proofs of Theorems 4.3 and 4.4

Proposition 6.1

Let $\{{\tilde{w}}^k\}$ be defined by (2.6), and the matrices Q, M, P be given in (2.5). For any $w\in \Omega $, we have

(7.39)

Proof

First, by the optimality condition of the two subproblems in ADM-PG, we have

$$\begin{aligned}&(x-x^{k+1})^{\mathrm{T}}\left[ \nabla _x f(x^{k+1},y^k)+h'_1(x^{k+1}) -A^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b))\right. \nonumber \\&\quad \left. +\,G(x^{k+1}-x^k)\right] \geqslant 0,\quad \forall x\in \mathcal {X}\end{aligned}$$

and

$$\begin{aligned}&(y-y^{k+1})^{\mathrm{T}}\left[ \nabla _y f(x^{k+1},y^{k}) +h'_2(y^{k+1})-B^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k+1}-b)) \right. \\&\quad \left. +\,H(y^{k+1}-y^k)\right] \geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

Noting ${\tilde{\lambda }}^k=\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b)$ and the definition of ${\tilde{w}}^k$, the above two inequalities are equivalent to

$$\begin{aligned}&\quad (x-{\tilde{x}}^{k})^{\mathrm{T}}\left[ \nabla _x f({\tilde{x}}^k,y^k)+\nabla _x h_1({\tilde{x}}^k)-A^{\mathrm{T}}{\tilde{\lambda }}^k+G({\tilde{x}}^k -x^k)\right] \nonumber \\&\geqslant 0,\quad \forall x\in \mathcal {X}\end{aligned}$$

(7.40)

and

$$\begin{aligned}&(y-{\tilde{y}}^{k})^{\mathrm{T}}\left[ \nabla _y f({\tilde{x}}^k,y^k) +g_2({\tilde{y}}^k)-B^{\mathrm{T}}{\tilde{\lambda }}^k+\gamma B^{\mathrm{T}}B({\tilde{y}}^k-y^{k})\right. \nonumber \\&\quad \left. +\,H({\tilde{y}}^k-y^k)\right] \geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

(7.41)

Moreover,

$$\begin{aligned}&\quad \,(x-{\tilde{x}}^{k})^{\mathrm{T}}\nabla _x f({\tilde{x}}^k,y^k) +(y-{\tilde{y}}^{k})^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)\nonumber \\&=(x-{\tilde{x}}^{k})^{\mathrm{T}}\nabla _x f({\tilde{x}}^k,y^k) +(y-y^{k})^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)+(y^k -{\tilde{y}}^{k})^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)\nonumber \\&\leqslant f(x,y)-f({\tilde{x}}^k,y^k)-({\tilde{y}}^{k} -y^k)^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)\nonumber \\&\quad \,\,(\hbox {from}~(2.3))\nonumber \\&\leqslant f(x,y)-f({\tilde{x}}^k, {\tilde{y}}^k)+\frac{L}{2}\Vert y^k-{\tilde{y}}^k\Vert ^2. \end{aligned}$$

(7.42)

Besides,

$$\begin{aligned} (A{\tilde{x}}^k+B{\tilde{y}}^k-b)-B({\tilde{y}}^k-y^k) -\frac{1}{\gamma }\left( \lambda ^k-{\tilde{\lambda }}^k\right) =0, \end{aligned}$$

and so

$$\begin{aligned}&\quad (\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}}(A{\tilde{x}}^k+B{\tilde{y}}^k-b)\nonumber \\&=(\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}}\left( -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma } (\lambda ^k-{\tilde{\lambda }}^k)\right) . \end{aligned}$$

(7.43)

By the convexity of $h_1(x)$ and $h_2(y)$, combining (7.43), (7.42), (7.41), and (7.40), we have

$$\begin{aligned}&h(u)-h(\tilde{u}^k)+\frac{L}{2}\Vert y^k-{\tilde{y}}^k\Vert ^2 +(y-{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k) \\&+\left( \begin{array}{c} x-{\tilde{x}}^k\\ y-{\tilde{y}}^k\\ \lambda -{\tilde{\lambda }}^k\\ \end{array}\right) ^{\mathrm{T}}\left[ \left( \begin{array}{c} -A^{\mathrm{T}}{\tilde{\lambda }}^k\\ -B^{\mathrm{T}}{\tilde{\lambda }}^k\\ A{\tilde{x}}^k+B{\tilde{y}}^k-b\\ \end{array}\right) -\left( \begin{array}{c} G(x^k-{\tilde{x}}^k)\\ \gamma B^{\mathrm{T}}B(y^k-{\tilde{y}}^k)\\ -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma }(\lambda ^k-{\tilde{\lambda }}^k)\\ \end{array}\right) \right] \geqslant 0 \end{aligned}$$

for any $w\in \Omega $ and ${\tilde{w}}^k$.

By similar derivations as in the proofs for Proposition 7.5, (7.39) follows.

With Proposition 6.1 in place, we can prove Theorem 4.3 similarly as in the proof of Theorem 3.1. We skip the details here for succinctness.

For ADM-GP, we do not need strong convexity, but we do need to assume that the gradient $\nabla _y h_2(y)$ of $h_2(y)$ is Lipschitz continuous. Without loss of generality, we further assume that the Lipschitz constant of $\nabla _y h_2(y)$ is the same as $\nabla f(x,y)$ which is L:

$$\begin{aligned} \Vert \nabla _y h_2(y_2) - \nabla _y h_2(y_1)\Vert \leqslant L \Vert y_2-y_1\Vert , \;\forall \, y_1,y_2\in \mathcal {Y}. \end{aligned}$$

(7.44)

Proposition 6.2

Let $\{{\tilde{w}}^k\}$ be defined by (2.6), and the matrices Q, M, P be given in (2.5), and $H:=\frac{1}{\alpha }I_{q}-\gamma B^{\mathrm{T}}B\succeq 0$. For any $w\in \Omega $, we have

(7.45)

Proof

By the optimality condition of the two subproblems in ADMM, we have

$$\begin{aligned}&(x-x^{k+1})^{\mathrm{T}}\left[ \nabla _x f(x^{k+1},y^k)+h'_1(x^{k+1}) -A^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b))\right. \\&\quad \left. +\,G(x^{k+1}-x^k)\right] \geqslant 0, \quad \quad \forall x\in \mathcal {X}\end{aligned}$$

and

$$\begin{aligned}&(y-y^{k+1})^{\mathrm{T}}\left[ y^{k+1}-y^k+\alpha \left( \nabla _y f(x^{k+1},y^{k})+\nabla _y h_2(y^{k})-B^{\mathrm{T}}(\lambda ^k-\gamma (Ax^{k+1} \right. \right. \\&\quad \left. \left. +\,By^{k}-b))\right) \right] \geqslant 0, \quad \quad \forall y\in \mathcal {Y}. \end{aligned}$$

Noting ${\tilde{\lambda }}^k=\lambda ^k-\gamma (Ax^{k+1}+By^{k}-b)$ and the definition of ${\tilde{w}}^k$, the above two inequalities are equivalent to

$$\begin{aligned} (x-{\tilde{x}}^{k})^{\mathrm{T}}\left[ \nabla _x f({\tilde{x}}^k,y^k)+h'_1({\tilde{x}}^k)-A^{\mathrm{T}}{\tilde{\lambda }}^k +G({\tilde{x}}^k-x^k)\right] \geqslant 0,\quad \forall x\in \mathcal {X} \end{aligned}$$

(7.46)

and

$$\begin{aligned}&\quad (y-{\tilde{y}}^{k})^{\mathrm{T}}\left[ \nabla _y f({\tilde{x}}^k,y^k)+\nabla _y h_2(y^k)-B^{\mathrm{T}}{\tilde{\lambda }}^k+\frac{1}{\alpha }({\tilde{y}}^k -y^k)\right] \nonumber \\&\geqslant 0,\quad \forall y\in \mathcal {Y}. \end{aligned}$$

(7.47)

Therefore,

$$\begin{aligned}&\quad \, (x-{\tilde{x}}^{k})^{\mathrm{T}}\nabla _x f({\tilde{x}}^k,y^k)+(y-{\tilde{y}}^{k})^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)\nonumber \\&= (x-{\tilde{x}}^{k})^{\mathrm{T}}\nabla _x f({\tilde{x}}^k,y^k)+(y-y^{k})^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)+(y^k-{\tilde{y}}^{k})^{\mathrm{T}}\nabla _y f({\tilde{x}}^k,y^k)\nonumber \\&\leqslant f(x,y)-f({\tilde{x}}^k,y^k)-({\tilde{y}}^{k}-y^k)^{\mathrm{T}} \nabla _y f({\tilde{x}}^k,y^k)\nonumber \\&\leqslant f(x,y)-f({\tilde{x}}^k,{\tilde{y}}^k)+\frac{L}{2}\Vert y^k -{\tilde{y}}^k\Vert ^2. \end{aligned}$$

(7.48)

Moreover, by (2.4), we have

$$\begin{aligned} (y-{\tilde{y}}^{k})^{\mathrm{T}}\nabla _y h_2(y^k)\leqslant h_2(y)-h_2({\tilde{y}}^k)+\frac{L}{2}\Vert y^k-{\tilde{y}}^k\Vert ^2. \end{aligned}$$

(7.49)

Since

$$\begin{aligned} A{\tilde{x}}^k+B{\tilde{y}}^k-b - B({\tilde{y}}^k-y^k)-\frac{1}{\gamma }\left( \lambda ^k-{\tilde{\lambda }}^k\right) =0, \end{aligned}$$

we have

$$\begin{aligned}&\quad \,(\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}}(A{\tilde{x}}^k +B{\tilde{y}}^k-b)\nonumber \\&=(\lambda -{\tilde{\lambda }}^k)^{\mathrm{T}} \left( -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma } (\lambda ^k-{\tilde{\lambda }}^k)\right) . \end{aligned}$$

(7.50)

By the convexity of $h_1(x)$, combining (7.50), (7.49), (7.48), (7.47), (7.46), and noticing $H:=\frac{1}{\alpha }I_{q}-\gamma B^{\mathrm{T}}B$ for any $w\in \Omega $ and ${\tilde{w}}^k$, we have

$$\begin{aligned}&h(u)-h(\tilde{u}^k)+L\Vert y^k-{\tilde{y}}^k\Vert ^2+(y -{\tilde{y}}^k)^{\mathrm{T}}H({\tilde{y}}^k-y^k)\\&\quad +\left( \begin{array}{c} x-{\tilde{x}}^k\\ y-{\tilde{y}}^k\\ \lambda -{\tilde{\lambda }}^k\\ \end{array}\right) ^{\mathrm{T}}\left\{ \left( \begin{array}{c} -A^{\mathrm{T}}{\tilde{\lambda }}^k\\ -B^{\mathrm{T}}{\tilde{\lambda }}^k\\ A{\tilde{x}}^k+B{\tilde{y}}^k-b\\ \end{array}\right) -\left( \begin{array}{c} G(x^k-{\tilde{x}}^k)\\ \gamma B^{\mathrm{T}}B(y^k-{\tilde{y}}^k)\\ -B(y^k-{\tilde{y}}^k)+\frac{1}{\gamma }(\lambda ^k-{\tilde{\lambda }}^k)\\ \end{array}\right) \right\} \geqslant 0. \end{aligned}$$

As a result, (7.45) follows.

The proof of Theorem 4.4 follows a similar line of derivations as in the proof of Theorem 3.1, and so we omit the details here.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gao, X., Zhang, SZ. First-Order Algorithms for Convex Optimization with Nonseparable Objective and Coupled Constraints . J. Oper. Res. Soc. China 5, 131–159 (2017). https://doi.org/10.1007/s40305-016-0131-5

Download citation

Received: 18 February 2016
Revised: 16 April 2016
Accepted: 09 May 2016
Published: 18 June 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s40305-016-0131-5

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

First-Order Algorithms for Convex Optimization with Nonseparable Objective and Coupled Constraints

Abstract

Access this article

Similar content being viewed by others

A Bregman-style Partially Symmetric Alternating Direction Method of Multipliers for Nonconvex Multi-block Optimization

An inexact proximal generalized alternating direction method of multipliers

The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix: Proofs of the Convergence Theorems

1.1 Appendix 1: Proof of Theorem 3.1

Proposition 7.1

Proof

Proposition 7.2

Proof

1.2 Appendix 2: Proof of Theorem 3.1.

Proof

1.3 Appendix 3: Proof of Theorem 4.1

Proposition 7.3

Proof

Proposition 7.4

1.4 Appendix 4: Proof of Theorem 4.2

Proposition 7.5

Proof

Proposition 7.6

1.5 Appendix 5: Proofs of Theorems 4.3 and 4.4

Proposition 6.1

Proof

Proposition 6.2

Proof

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

First-Order Algorithms for Convex Optimization with Nonseparable Objective and Coupled Constraints

Abstract

Access this article

Similar content being viewed by others

A Bregman-style Partially Symmetric Alternating Direction Method of Multipliers for Nonconvex Multi-block Optimization

An inexact proximal generalized alternating direction method of multipliers

The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix: Proofs of the Convergence Theorems

Appendix: Proofs of the Convergence Theorems

1.1 Appendix 1: Proof of Theorem 3.1

Proposition 7.1

Proof

Proposition 7.2

Proof

1.2 Appendix 2: Proof of Theorem 3.1.

Proof

1.3 Appendix 3: Proof of Theorem 4.1

Proposition 7.3

Proof

Proposition 7.4

1.4 Appendix 4: Proof of Theorem 4.2

Proposition 7.5

Proof

Proposition 7.6

1.5 Appendix 5: Proofs of Theorems 4.3 and 4.4

Proposition 6.1

Proof

Proposition 6.2

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation