
Accelerated primal–dual proximal block coordinate updating methods for constrained convex optimization

Computational Optimization and Applications

Abstract

Block coordinate update (BCU) methods enjoy low per-update computational complexity because each update involves only one or a few of a possibly large number of block variables. They are also easy to parallelize and have thus been particularly popular for solving problems with large-scale datasets and/or many variables. In this paper, we propose a primal–dual BCU method for solving linearly constrained convex programs with multi-block variables. The method is an accelerated version of a primal–dual algorithm proposed by the authors, which applies randomization in selecting the block variables to update and establishes an \(O(1/t)\) convergence rate under a convexity assumption. We show that the rate can be accelerated to \(O(1/t^2)\) if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we show that the algorithm can be modified to achieve a linear rate of convergence. Numerical experiments show that the accelerated method performs stably with a single set of parameters, while the original method needs parameter tuning for different datasets to achieve a comparable level of performance.



Notes

  1. In fact, [15] presents a more general algorithmic framework. It assumes two groups of variables, and each has multi-block structure. Our method in Algorithm 2 is an accelerated version of one special case of Algorithm 1 in [15].

  2. Besides the scenario that g and h are strongly convex, h is smooth, and B is of full row-rank, [13, Theorem 3.1] also shows linear convergence of the linearized ADMM under three other different scenarios.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)


  2. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)


  3. Bredies, K., Sun, H.: Accelerated Douglas-Rachford methods for the solution of convex-concave saddle-point problems. arXiv preprint arXiv:1604.06282 (2016)

  4. Cai, X., Han, D., Yuan, X.: The direct extension of ADMM for three-block separable convex minimization models is convergent when one function is strongly convex. Optim. Online (2014)

  5. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)


  6. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)


  7. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM Rev. 43(1), 129–159 (2001)


  8. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)


  9. Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)


  10. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)


  11. Dang, C., Lan, G.: Randomized methods for saddle point computation. arXiv preprint arXiv:1409.8625 (2014)

  12. Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(O(1/k)\) convergence. J. Sci. Comput. 71(2), 712–736 (2017)

  13. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)


  14. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)


  15. Gao, X., Xu, Y., Zhang, S.: Randomized primal-dual proximal block coordinate updates. arXiv preprint arXiv:1605.05969 (2016)

  16. Gao, X., Zhang, S.-Z.: First-order algorithms for convex optimization with nonseparable objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)

  17. Glowinski, R., Marrocco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM Math. Modell. Numer. Anal. 9(R2), 41–76 (1975)


  18. Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)


  19. He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)


  20. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22(2), 313–340 (2012)


  21. He, B., Yuan, X.: On the acceleration of augmented Lagrangian method for linearly constrained optimization. Optim. Online (2010)

  22. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)


  23. James, G. M., Paulson, C., Rusmevichientong, P.: Penalized and constrained regression. Technical report (2013)

  24. Li, H., Lin, Z.: Optimal nonergodic \(O(1/k)\) convergence rate: when linearized ADM meets Nesterov's extrapolation. arXiv preprint arXiv:1608.06366 (2016)

  25. Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia-Pac. J. Oper. Res. 32(04), 1550024 (2015)


  26. Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multiblock variables. SIAM J. Optim. 25(3), 1478–1497 (2015)


  27. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1–2), 615–642 (2015)


  28. Markowitz, H.: Portfolio selection. J. Finance 7(1), 77–91 (1952)


  29. Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)


  30. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \({O}(1/k^2)\). Soviet Math. Doklady 27(2), 372–376 (1983)


  31. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)


  32. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)


  33. Ouyang, Y., Chen, Y., Lan, G., Pasiliao Jr., E.: An accelerated linearized alternating direction method of multipliers. SIAM J. Imaging Sci. 8(1), 644–681 (2015)


  34. Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)


  35. Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)


  36. Pesquet, J.-C., Repetti, A.: A class of randomized primal-dual algorithms for distributed optimization. arXiv preprint arXiv:1406.6404 (2014)

  37. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1–2), 433–484 (2016)

  38. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)


  39. Xu, Y.: Hybrid Jacobian and Gauss-Seidel proximal block coordinate update methods for linearly constrained convex programming. arXiv preprint arXiv:1608.03928 (2016)

  40. Xu, Y.: Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming. SIAM J. Optim. 27(3), 1459–1484 (2017)


  41. Xu, Y.: Asynchronous parallel primal-dual block update methods. arXiv preprint arXiv:1705.06391 (2017)


Author information


Corresponding author

Correspondence to Yangyang Xu.

Additional information

This work is partly supported by NSF grants DMS-1719549 and CMMI-1462408.

Appendices

Appendix A: Technical proofs: Sect. 2

In this section, we give the detailed proofs of the lemmas and theorems in Sect. 2. The following lemma will be used a few times. Note that when \(S=[M]\), the result is deterministic.

Lemma 7.1

Let S be a uniformly selected subset of [M] with cardinality m, and let \(x^o\) be a vector independent of S. Suppose \(x^+\) is a random vector depending on S whose coordinates outside S are the same as those of \(x^o\). Let \(\beta \in \mathbb {R}\), let \(\lambda ^o\) and \(r^o\) be vectors independent of S, and let W be a positive semidefinite \(M\times M\) block diagonal matrix. If

$$\begin{aligned} \nabla _{S} f(x^o)+\tilde{\nabla }g_S(x_S^+)-A_S^\top (\lambda ^o- \beta r^o)+W_S(x_S^+-x_S^o)=0, \end{aligned}$$

then for any x, it holds that

$$\begin{aligned} \begin{aligned}&~ \mathbb {E}_S\left[ F(x^+)-F(x)+\frac{\mu }{2}\Vert x^+-x\Vert ^2-\left\langle A(x^+-x), \lambda ^o-\beta r^o\right\rangle \right] \\&\quad \le ~ (1- \theta )\left[ F(x^o)-F(x)+\frac{\mu }{2}\Vert x^o-x\Vert ^2-\big \langle A(x^o-x), \lambda ^o-\beta r^o\big \rangle \right] \\&\qquad ~-\frac{1}{2}\mathbb {E}_S\left[ \Vert x^+-x\Vert _W^2-\Vert x^o-x\Vert _W^2+\Vert x^+-x^o\Vert _{W-L_m I}^2\right] , \end{aligned} \end{aligned}$$
(45)

where \(\theta =\frac{m}{M}\), \(L_m\) is given in Assumption 4, and the expectation is taken on S.

Proof

For any x, we have

$$\begin{aligned} \left\langle x_S^+-x_S, \nabla _{S} f(x^o)+\tilde{\nabla }g_S(x_S^+)-A_S^\top (\lambda ^o- \beta r^o)+W_S(x_S^+-x_S^o)\right\rangle =0. \end{aligned}$$

We split the left hand side of the above equation into four terms and bound each of them as below. First, we have

$$\begin{aligned}&~\mathbb {E}_S\left\langle x^+_S-x_S, \nabla _{S} f(x^o)\right\rangle \\&\quad =~\mathbb {E}_S\left\langle x^+-x^o, \nabla f(x^o)\right\rangle + \mathbb {E}_S\left\langle x^o_S-x_S, \nabla _S f(x^o)\right\rangle \\&\quad \ge ~\mathbb {E}_S \left[ f(x^+)-f(x^o)-\frac{L_m}{2}\Vert x^+-x^o\Vert ^2\right] + \theta [f(x^o)-f(x)]\\&\quad =~ \mathbb {E}_S\left[ f(x^+)-f(x)-\frac{L_m}{2}\Vert x^+-x^o\Vert ^2\right] - (1- \theta )[f(x^o)-f(x)], \end{aligned}$$
(46)

where the first equality uses the fact \(x_i^+=x_i^o,\,\forall i\not \in S\), and the inequality follows from the uniform distribution of S, the convexity of f, and the inequality (18).

Secondly, it follows from the strong convexity of g that

$$\begin{aligned} \left\langle x^+_S-x_S, \tilde{\nabla } g_S(x_S^+)\right\rangle \ge g_S(x_S^+) - g_S(x_S) + \sum _{i\in S}\frac{\mu }{2}\Vert x_i^+-x_i\Vert ^2. \end{aligned}$$
(47)

Since \(g_S(x_S^+) - g_S(x_S)=g(x^+) - g(x^o) + g_S(x_S^o)- g_S(x_S)\) and \(\mathbb {E}_S[g_S(x_S^o)- g_S(x_S)]=\theta [g(x^o)-g(x)]\), we have

$$\begin{aligned} \mathbb {E}_S[g_S(x_S^+) - g_S(x_S)]=&~\mathbb {E}_S[g(x^+) - g(x^o)]+\theta [g(x^o)-g(x)] \\=&~\mathbb {E}_S[g(x^+)-g(x)]-(1-\theta ) [g(x^o)-g(x)]. \end{aligned}$$
(48)

Similarly, it holds \(\mathbb {E}_S\sum _{i\in S}\frac{\mu }{2}\Vert x_i^+-x_i\Vert ^2=\frac{\mu }{2}\left( \mathbb {E}_S\Vert x^+-x\Vert ^2-(1-\theta )\Vert x^o-x\Vert ^2\right) .\) Hence, taking expectation on both sides of (47) yields

$$\begin{aligned}&~\mathbb {E}_S\left\langle x^+_S-x_S, \tilde{\nabla } g_S(x_S^+)\right\rangle \nonumber \\&\quad \ge ~\mathbb {E}_S \left[ g(x^+) - g(x)+\frac{\mu }{2}\Vert x^+-x\Vert ^2\right] \nonumber \\&\qquad -(1-\theta )\left[ g(x^o) - g(x)+\frac{\mu }{2}\Vert x^o-x\Vert ^2\right] . \end{aligned}$$
(49)

Thirdly, by essentially the same arguments on showing (48), we have

$$\begin{aligned} \mathbb {E}_S \left\langle x^+_S-x_S, -A_S^\top (\lambda ^o-\beta r^o)\right\rangle =&~ -\mathbb {E}_S \left\langle A(x^+-x), \lambda ^o-\beta r^o\right\rangle \nonumber \\&~ + (1-\theta ) \big \langle A(x^o-x), \lambda ^o-\beta r^o\big \rangle . \end{aligned}$$
(50)

Fourth, note \(\left\langle x^+_S-x_S, W_S(x_S^+-x_S^o)\right\rangle =\left\langle x^+-x, W(x^+-x^o)\right\rangle \), and thus by (6),

$$\begin{aligned} \mathbb {E}_S\left\langle x^+_S-x_S, W_S(x_S^+-x_S^o)\right\rangle =\frac{1}{2}\mathbb {E}_S\left[ \Vert x^+-x\Vert _W^2-\Vert x^o-x\Vert _W^2+\Vert x^+-x^o\Vert _W^2\right] . \end{aligned}$$
(51)

The desired result is obtained by adding (46), (49), (50), and (51), and recalling \(F=f+g\). \(\square \)
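The identity (6) invoked in the last step is the three-point (cosine) identity \(\langle a-c, W(a-b)\rangle =\frac{1}{2}\left[ \Vert a-c\Vert _W^2-\Vert b-c\Vert _W^2+\Vert a-b\Vert _W^2\right] \) for symmetric W. As an illustrative numerical sanity check (not part of the proof; the matrix and vectors below are arbitrary test data), the identity can be verified on random inputs:

```python
import numpy as np

# Illustrative check of the three-point identity (6):
#   <a - c, W(a - b)> = (1/2) [ ||a-c||_W^2 - ||b-c||_W^2 + ||a-b||_W^2 ]
# W and the vectors are arbitrary test data, not quantities from the paper.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
W = M @ M.T                              # random symmetric positive semidefinite W
a, b, c = rng.standard_normal((3, n))    # play the roles of x^+, x^o, x

def wsq(v):
    """Weighted squared norm ||v||_W^2 = v^T W v."""
    return v @ W @ v

lhs = (a - c) @ (W @ (a - b))
rhs = 0.5 * (wsq(a - c) - wsq(b - c) + wsq(a - b))
assert abs(lhs - rhs) < 1e-10
```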

1.1 Proof of Lemma 2.1

From (7a), we have the optimality condition

$$\begin{aligned} \nabla f(x^k)-A^\top (\lambda ^k-\beta _kr^k)+\tilde{\nabla } g(x^{k+1})+P^k(x^{k+1}-x^k)=0. \end{aligned}$$

Hence, for any x such that \(Ax=b\), it follows from the definition of \(\Phi \) in (3) and Lemma 7.1 with \(S=[M]\), \(x^o=x^k\), \(\lambda ^o=\lambda ^k\), \(\beta =\beta _k\), \(x^+=x^{k+1}\), and \(W=P^k\) that

$$\begin{aligned} \Phi (x^{k+1},x,\lambda ) \le&~ \left\langle Ax^{k+1}-b,\lambda ^k-\beta _kr^k\right\rangle - \left\langle Ax^{k+1}-b, \lambda \right\rangle \\&-\frac{1}{2}\mathbb {E}_S\left[ \Vert x^{k+1}-x\Vert _{P^k+\mu I}^2-\Vert x^k-x\Vert _{P^k}^2+\Vert x^{k+1}-x^k\Vert _{P^k-L_f I}^2\right] .\nonumber \\ \end{aligned}$$
(52)

Using the fact \(\lambda ^{k+1}=\lambda ^k-\rho _k(Ax^{k+1}-b)\), we have

$$\begin{aligned} \left\langle Ax^{k+1}-b,\lambda ^k-\lambda \right\rangle =&~\frac{1}{\rho _k}\left\langle \lambda ^k-\lambda ^{k+1},\lambda ^k-\lambda \right\rangle \\\overset{(6)}{=}&~\frac{1}{2\rho _k}\left[ \Vert \lambda -\lambda ^k\Vert ^2-\Vert \lambda -\lambda ^{k+1}\Vert ^2+\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2\right] . \end{aligned}$$
(53)

In addition, we write \(r^k=r^k-r^{k+1}+r^{k+1}=r^{k+1}-A(x^{k+1}-x^k)\) and have

$$\begin{aligned}&~\left\langle Ax^{k+1}-b,-\beta _k r^k\right\rangle \\&\quad =~-\beta _k\Vert r^{k+1}\Vert ^2+\beta _k\left\langle A(x^{k+1}-x), A(x^{k+1}-x^k)\right\rangle \\&\quad \overset{(6)}{=}~-\beta _k\Vert r^{k+1}\Vert ^2+\frac{\beta _k}{2}\left[ \Vert A(x^{k+1}-x)\Vert ^2-\Vert A(x^k-x)\Vert ^2+\Vert A(x^{k+1}-x^k)\Vert ^2\right] \nonumber \\ \end{aligned}$$
(54)

Substituting (53) and (54) into (52) gives the inequality in (8).

1.2 Proof of Theorem 2.2

First, we have

$$\begin{aligned}&\sum _{k=1}^t\frac{k+k_0+1}{2\rho _k}\left[ \Vert \lambda -\lambda ^k\Vert ^2-\Vert \lambda -\lambda ^{k+1}\Vert ^2\right] \nonumber \\&\quad = \frac{k_0+2}{2\rho _1}\Vert \lambda -\lambda ^1\Vert ^2-\frac{t+k_0+1}{2\rho _t}\Vert \lambda -\lambda ^{t+1}\Vert ^2\nonumber \\&\qquad +\sum _{k=2}^t \left( \frac{k+k_0+1}{2\rho _k}-\frac{k+k_0}{2\rho _{k-1}}\right) \Vert \lambda -\lambda ^k\Vert ^2 \nonumber \\&\quad \overset{(10)}{\le }\frac{k_0+2}{2\rho _1}\Vert \lambda -\lambda ^1\Vert ^2 . \end{aligned}$$
(55)

In addition,

$$\begin{aligned}&-\sum _{k=1}^t\frac{k+k_0+1}{2}\left( \Vert x^{k+1}-x\Vert _{P^k-\beta _k A^\top A+\mu I}^2-\Vert x^k-x\Vert _{P^k-\beta _k A^\top A}^2\right) \nonumber \\&\quad = \frac{k_0+2}{2}\Vert x^1-x\Vert _{P^1-\beta _1 A^\top A}^2-\frac{t+k_0+1}{2}\Vert x^{t+1}-x\Vert _{P^t-\beta _t A^\top A+\mu I}^2\nonumber \\&\qquad +\frac{1}{2}\sum _{k=2}^t\left( (k+k_0+1)\Vert x^k-x\Vert _{P^k-\beta _k A^\top A}^2-(k+k_0)\Vert x^k-x\Vert _{P^{k-1}-\beta _{k-1}A^\top A+\mu I}^2\right) \nonumber \\&\quad \overset{(11)}{\le }~\frac{k_0+2}{2}\Vert x^1-x\Vert _{P^1-\beta _1 A^\top A}^2-\frac{t+k_0+1}{2}\Vert x^{t+1}-x\Vert _{P^t-\beta _t A^\top A+\mu I}^2. \end{aligned}$$
(56)

Now multiplying \(k+k_0+1\) to both sides of (8) and adding it over k, we obtain (12) by using (55) and (56), and noting \(\Vert \lambda ^k-\lambda ^{k+1}\Vert ^2=\rho _k^2\Vert r^{k+1}\Vert ^2\) and \(\Vert x^{k+1}-x^k\Vert _{P^k-\beta _k A^\top A - L_f I}^2 \ge 0\).
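The rearrangement in (55) is a standard telescoping (Abel summation) step. The following snippet (a hedged illustration with arbitrary positive data, not iterates of the algorithm) confirms the rearrangement numerically:

```python
import numpy as np

# Confirm the rearrangement used in (55):
#   sum_{k=1}^t c_k (a_k - a_{k+1})
#     = c_1 a_1 - c_t a_{t+1} + sum_{k=2}^t (c_k - c_{k-1}) a_k,
# with c_k = (k + k0 + 1) / (2 rho_k).  All data below are arbitrary test values.
rng = np.random.default_rng(1)
t, k0 = 10, 3
a = rng.random(t + 2)            # a[k] stands for ||lambda - lambda^k||^2
rho = rng.random(t + 1) + 0.1    # rho[k] stands for rho_k > 0

c = lambda k: (k + k0 + 1) / (2 * rho[k])
lhs = sum(c(k) * (a[k] - a[k + 1]) for k in range(1, t + 1))
rhs = (c(1) * a[1] - c(t) * a[t + 1]
       + sum((c(k) - c(k - 1)) * a[k] for k in range(2, t + 1)))
assert abs(lhs - rhs) < 1e-10
```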

1.3 Proof of Theorem 2.3

From the choice of \(k_0\) and the condition \(P-\beta A^\top A \preceq \frac{\mu }{2} I\), it is not difficult to verify

$$\begin{aligned}&(k+k_0+1)\left[ kP-k\beta A^\top A+L_f I\right] \\&\quad \preceq (k+k_0)\left[ (k-1)P-(k-1)\beta A^\top A+(L_f+\mu )I\right] ,\,\forall k\ge 1. \end{aligned}$$

Hence, the condition in (11) holds. In addition, it is easy to see that all conditions in (9) and (10) also hold. Therefore, we have (12), which, by taking parameters in (14) and \(x=x^*\), reduces to

$$\begin{aligned}&\sum _{k=1}^t(k+k_0+1)\Phi (x^{k+1},x^*,\lambda )+\sum _{k=1}^t\frac{k(k+k_0+1)}{2}\beta \Vert r^{k+1}\Vert ^2\nonumber \\&\quad +\frac{t+k_0+1}{2}\Vert x^{t+1}-x^*\Vert ^2_{t(P-\beta A^\top A)+(L_f+\mu ) I} \le \phi _1(x^*,\lambda ), \end{aligned}$$
(57)

where we have used the fact \(\lambda ^1=0\).

Letting \(\lambda =\lambda ^*\), we have from (5) and (57) that (by dropping nonnegative \(\Phi (x^{k+1},x^*,\lambda ^*)\)’s):

$$\begin{aligned} \frac{t(t+k_0+1)}{2}\beta \Vert r^{t+1}\Vert ^2+\frac{t+k_0+1}{2}\Vert x^{t+1}-x^*\Vert ^2_{t(P-\beta A^\top A)+(L_f+\mu ) I} \le \phi _1(x^*,\lambda ^*), \end{aligned}$$

which indicates (15). In addition, from the convexity of F and (57), we have that for any \(\lambda \), it holds \(\frac{t(t+2k_0+3)}{2}\Phi (\bar{x}^{t+1},x^*,\lambda )\le \phi _1(x^*,\lambda ),\) which together with Lemmas 1.2 and 1.3 implies (16).
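The weight \(\frac{t(t+2k_0+3)}{2}\) appearing above is the closed form of \(\sum _{k=1}^t(k+k_0+1)\). A quick illustrative check of this closed form:

```python
# Check sum_{k=1}^t (k + k0 + 1) = t (t + 2 k0 + 3) / 2 for several parameter choices.
# Integer division is exact: t and t + 2*k0 + 3 have opposite parity, so the product is even.
for t in (1, 5, 40):
    for k0 in (0, 3, 11):
        assert sum(k + k0 + 1 for k in range(1, t + 1)) == t * (t + 2 * k0 + 3) // 2
```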

Appendix B: Technical proofs: Sect. 3

In this section, we give the proofs of the lemmas and theorems in Sect. 3.

1.1 Proof of Lemma 3.1

From the update in (17a), we have the optimality condition:

$$\begin{aligned} \nabla _{S_k} f(x^k)-A_{S_k}^\top (\lambda ^k-\beta _k r^k)+\tilde{\nabla } g_{S_k}(x_{S_k}^{k+1})+\eta _k (x_{S_k}^{k+1}-x_{S_k}^k) = 0. \end{aligned}$$
(58)

It follows from the update rule of \(\lambda \) that

$$\begin{aligned} -\langle Ax^{k+1}-b, \lambda ^k\rangle = - \langle Ax^{k+1}-b, \lambda ^{k+1}\rangle - \rho _k\Vert r^{k+1}\Vert ^2. \end{aligned}$$

Plugging (54) and the above equation into (45) with \(S=S_k, \lambda ^o=\lambda ^k, \beta =\beta _k, x^o=x^k\), \(x^+=x^{k+1}\), \(W=\eta _k I\), and x satisfying \(Ax=b\), we have the desired result by taking expectation and recalling the definition of \(\Delta \) in (2) and \(\Phi \) in (3).
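The displayed relation for the dual variable is immediate from the update \(\lambda ^{k+1}=\lambda ^k-\rho _k(Ax^{k+1}-b)\), i.e., \(\lambda ^k=\lambda ^{k+1}+\rho _k r^{k+1}\). An illustrative numerical check with arbitrary data:

```python
import numpy as np

# Check  -<r, lambda^k> = -<r, lambda^{k+1}> - rho ||r||^2,  where r = A x^{k+1} - b
# and lambda^{k+1} = lambda^k - rho r.  The vectors below are arbitrary test data.
rng = np.random.default_rng(2)
rho = 0.7
r = rng.standard_normal(4)           # stands for r^{k+1}
lam_k = rng.standard_normal(4)       # stands for lambda^k
lam_next = lam_k - rho * r           # dual update
lhs = -(r @ lam_k)
rhs = -(r @ lam_next) - rho * (r @ r)
assert abs(lhs - rhs) < 1e-10
```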

1.2 Proof of Theorem 3.2

Let \(\beta _k=\beta , \rho _k=\rho \) and \(\eta _k=\eta \) in (19), and also note \(\mu =0\) and \(\eta \ge L_m+\beta \Vert A\Vert ^2\). We have

$$\begin{aligned}&~\mathbb {E}\left[ \Phi (x^{k+1},x,\lambda ^{k+1})+(\beta -\rho )\Vert r^{k+1}\Vert ^2\right] \\&\quad \le ~(1-\theta )\mathbb {E}\left[ \Phi (x^k,x,\lambda ^k)+\beta \Vert r^k\Vert ^2\right] \\&\qquad - \frac{1}{2}\mathbb {E}\left[ \Vert x^{k+1}-x\Vert ^2_{\eta I-\beta A^\top A}-\Vert x^{k}-x\Vert ^2_{\eta I-\beta A^\top A}\right] . \nonumber \end{aligned}$$

Summing the above inequality over \(k=1\) through t and noting \(\rho \le \theta \beta \) give

$$\begin{aligned}&~\mathbb {E}\left[ \Phi (x^{t+1},x,\lambda ^{t+1})+(\beta -\rho )\Vert r^{t+1}\Vert ^2\right] + \theta \sum _{k=1}^{t-1}\mathbb {E}\Phi (x^{k+1},x,\lambda ^{k+1}) \\&\quad \le ~(1-\theta )\mathbb {E}\left[ \Phi (x^1,x,\lambda ^1)+\beta \Vert r^1\Vert ^2\right] + \frac{1}{2}\Vert x^1-x\Vert ^2_{\eta I-\beta A^\top A}. \nonumber \end{aligned}$$
(59)

By the update of \(\lambda \), it follows that

$$\begin{aligned}&\theta \sum _{k=1}^{t-1} \Phi (x^{k+1},x,\lambda ^{k+1})=~\theta \sum _{k=1}^{t-1} \left[ \Phi (x^{k+1},x,\lambda )+\frac{1}{\rho } \langle \lambda ^{k+1}-\lambda , \lambda ^{k+1}-\lambda ^k\rangle \right] \\&\quad =~\theta \sum _{k=1}^{t-1} \Phi (x^{k+1},x,\lambda )+\frac{\theta }{2\rho }\sum _{k=1}^{t-1} \left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \\&\quad =~\theta \sum _{k=1}^{t-1} \Phi (x^{k+1},x,\lambda )+\frac{\theta }{2\rho }\left[ \Vert \lambda ^{t}-\lambda \Vert ^2-\Vert \lambda ^1-\lambda \Vert ^2+\sum _{k=1}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \end{aligned}$$
(60)

and

$$\begin{aligned} \Phi (x^{t+1},x,\lambda ^{t+1}) =&~\Phi (x^{t+1},x,\lambda ) - \langle \lambda ^t - \lambda - \rho r^{t+1}, r^{t+1}\rangle \\=&~\Phi (x^{t+1},x,\lambda )- \langle \lambda ^t - \lambda , r^{t+1}\rangle +\rho \Vert r^{t+1}\Vert ^2. \end{aligned}$$
(61)

Since \(\rho \le \theta \beta \), by Young’s inequality, it holds

$$\begin{aligned} \beta \Vert r^{t+1}\Vert ^2 - \langle \lambda ^t - \lambda , r^{t+1}\rangle + \frac{\theta }{2\rho }\Vert \lambda ^{t}-\lambda \Vert ^2 \ge 0. \end{aligned}$$
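Indeed, Young's inequality gives \(\langle \lambda ^t-\lambda , r^{t+1}\rangle \le \beta \Vert r^{t+1}\Vert ^2+\frac{1}{4\beta }\Vert \lambda ^t-\lambda \Vert ^2\), and \(\rho \le \theta \beta \) implies \(\frac{\theta }{2\rho }\ge \frac{1}{2\beta }\ge \frac{1}{4\beta }\). A numerical spot check of the nonnegativity claim (the parameters and random vectors are arbitrary samples):

```python
import numpy as np

# Spot-check:  beta ||r||^2 - <u, r> + theta/(2 rho) ||u||^2 >= 0  whenever rho <= theta * beta.
# theta, beta, and the random vectors are arbitrary sample values.
rng = np.random.default_rng(3)
theta, beta = 0.4, 2.0
rho = 0.9 * theta * beta             # any rho <= theta * beta works
for _ in range(1000):
    r, u = rng.standard_normal((2, 6))
    val = beta * (r @ r) - u @ r + theta / (2 * rho) * (u @ u)
    assert val >= -1e-12
```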

Then plugging (60) and (61) into (59), we have

$$\begin{aligned}&~\mathbb {E}\Phi (x^{t+1},x,\lambda ) +\theta \sum _{k=1}^{t-1} \mathbb {E}\Phi (x^{k+1},x,\lambda )\\&\quad \le ~(1-\theta )\mathbb {E}\left[ \Phi (x^1,x,\lambda ^1)+\beta \Vert r^1\Vert ^2\right] + \frac{1}{2}\Vert x^1-x\Vert ^2_{\eta I-\beta A^\top A} + \frac{\theta }{2\rho }\mathbb {E}\Vert \lambda ^1-\lambda \Vert ^2\\&\quad \le ~ \mathbb {E}\phi _2(x,\lambda ), \end{aligned}$$
(62)

where in the last inequality we have used \(\lambda ^1=0\), \(\theta >0\) and \(\Vert r^1\Vert ^2=\Vert x^1-x\Vert ^2_{\beta A^\top A}\).

Therefore, from the convexity of F, it follows that \(\mathbb {E}\Phi (\bar{x}^{t},x^*,\lambda ) \le \frac{1}{1+\theta (t-1)}\mathbb {E}\phi _2(x^*,\lambda ),\,\forall \lambda \), and we obtain the desired result from Lemmas 1.2 and 1.3.

1.3 Proof of Theorem 3.3

We first establish a few inequalities below.

Proposition 8.1

If (21e), (21f) and (21g) hold, then

$$\begin{aligned}&-\sum _{k=1}^t(k+k_0+1) \mathbb {E}\left[ \Delta _{\eta _k I-\beta _k A^\top A}(x^{k+1},x^k,x)-\frac{L_m}{2}\Vert x^{k+1}-x^k\Vert ^2\right] \\&\qquad -\frac{\mu (t+k_0+1)}{2}\mathbb {E}\Vert x^{t+1}-x\Vert ^2-\sum _{k=2}^t\frac{\mu \big (\theta (k+k_0+1)-1\big )}{2}\mathbb {E}\Vert x^k-x\Vert ^2\\&\quad \le \frac{\eta _1(k_0+2)}{2}\mathbb {E}\Vert x^1-x\Vert ^2-\frac{(t+k_0+1)}{2}\mathbb {E}\Vert x^{t+1}-x\Vert ^2_{(\mu +\eta _t)I-\beta _t A^\top A}. \end{aligned}$$
(63)

Proof

This inequality can be easily shown by noting that for any \(1\le k\le t\), the weight matrix of \(\frac{1}{2}\Vert x^{k+1}-x^k\Vert ^2\) is \( \beta _k(k+k_0+1)A^\top A-(k+k_0+1)(\eta _k-L_m)I\), which is negative semidefinite, and for any \(2\le k\le t\), the weight matrix of \(\frac{1}{2}\Vert x^{k}-x\Vert ^2\) is

$$\begin{aligned}&\big [\beta _{k-1}(k+k_0)-\beta _k(k+k_0+1)\big ]A^\top A\\&\quad +\left[ (k+k_0+1)\eta _k-(k+k_0)\eta _{k-1}-\mu \big (\theta (k+k_0+1)-1\big )\right] I, \end{aligned}$$

which is also negative semidefinite.\(\square \)

Proposition 8.2

If (21a), (21c) and (21d) hold, then

$$\begin{aligned}&-\frac{t+k_0+1}{\rho _t}\mathbb {E}\Delta (\lambda ^{t+1},\lambda ^t,\lambda )-\sum _{k=2}^t\frac{\theta (k+k_0+1)-1}{\rho _{k-1}}\mathbb {E}\Delta (\lambda ^{k},\lambda ^{k-1},\lambda )\\&\quad \le \frac{\theta (k_0+3)-1}{2\rho _1}\mathbb {E}\Vert \lambda ^1-\lambda \Vert ^2. \end{aligned}$$
(64)

Proof

On the left hand side of (64), the coefficient of each \(\frac{1}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\) is negative. For \(2\le k\le t-1\), the coefficient of \(\frac{1}{2}\Vert \lambda ^k-\lambda \Vert ^2\) is \(\frac{\theta (k+k_0+2)-1}{\rho _k}-\frac{\theta (k+k_0+1)-1}{\rho _{k-1}}\), which is nonpositive; the coefficient of \(\frac{1}{2}\Vert \lambda ^t-\lambda \Vert ^2\) is \(\frac{t+k_0+1}{\rho _t}-\frac{\theta (t+k_0+1)-1}{\rho _{t-1}}\), which is nonpositive; the coefficient of \(\frac{1}{2}\Vert \lambda ^{t+1}-\lambda \Vert ^2\) is also nonpositive. Hence, dropping these nonpositive terms, we have the desired result.\(\square \)

Now we are ready to prove Theorem 3.3.

Proof of Theorem 3.3

Multiplying \(k+k_0+1\) to both sides of (19), summing it up from \(k=1\) through t, and moving the terms about \(\Phi (x^k,x,\lambda ^k)+\frac{\mu }{2}\Vert x^k-x\Vert ^2\) and \(\Vert r^k\Vert ^2\) to the left hand side for \(2\le k\le t\) give

$$\begin{aligned}&(t+k_0+1) \mathbb {E}\left[ \Phi (x^{t+1},x,\lambda ^{t+1})+(\beta _t-\rho _t)\Vert r^{t+1}\Vert ^2+ \frac{\mu }{2}\Vert x^{t+1}-x\Vert ^2\right] \\&\qquad +\sum _{k=2}^t\big (\theta (k+k_0+1)-1\big )\mathbb {E}\left[ \Phi (x^k,x,\lambda ^k)+\frac{\mu }{2}\Vert x^k-x\Vert ^2\right] \\&\qquad +\sum _{k=2}^t\big ((\beta _{k-1}-\rho _{k-1})(k+k_0)-(1-\theta )(k+k_0+1)\beta _k\big )\mathbb {E}\Vert r^k\Vert ^2\\&\quad \le (1-\theta )(k_0+2)\mathbb {E}\left[ \Phi (x^1,x,\lambda ^1)+\beta _1\Vert r^1\Vert ^2+\frac{\mu }{2}\Vert x^1-x\Vert ^2\right] \\&\qquad -\sum _{k=1}^t(k+k_0+1) \mathbb {E}\left[ \Delta _{\eta _k I-\beta _k A^\top A}(x^{k+1},x^k,x)-\frac{L_m}{2}\Vert x^{k+1}-x^k\Vert ^2\right] . \nonumber \end{aligned}$$
(65)

Hence, from (21b) and (63), it follows that

$$\begin{aligned} \begin{aligned}&~(t+k_0+1) \mathbb {E}\Phi (x^{t+1},x,\lambda ^{t+1})+\sum _{k=2}^t\big (\theta (k+k_0+1)-1\big )\mathbb {E}\Phi (x^k,x,\lambda ^k)\\&\quad \le ~ (1-\theta )(k_0+2)\mathbb {E}\left[ \Phi (x^1,x,\lambda ^1)+\beta _1\Vert r^1\Vert ^2+\frac{\mu }{2}\Vert x^1-x\Vert ^2\right] \\&\qquad ~+\frac{\eta _1(k_0+2)}{2}\mathbb {E}\Vert x^1-x\Vert ^2-\frac{t+k_0+1}{2}\mathbb {E}\Vert x^{t+1}-x\Vert ^2_{(\mu +\eta _t)I-\beta _t A^\top A}. \end{aligned} \end{aligned}$$
(66)

In addition, from the update of \(\lambda \) in (17c), we have

$$\begin{aligned} \langle \lambda ^{k+1}-\lambda , Ax^{k+1}-b\rangle =-\frac{1}{\rho _k}\langle \lambda ^{k+1}-\lambda ,\lambda ^{k+1}-\lambda ^k\rangle =-\frac{1}{\rho _k}\Delta (\lambda ^{k+1},\lambda ^k,\lambda ), \end{aligned}$$
(67)

and thus

$$\begin{aligned}&(t+k_0+1)\mathbb {E}\langle \lambda ^{t+1}-\lambda , Ax^{t+1}-b\rangle +\!\!\sum _{k=2}^t\big (\theta (k+k_0+1)-1\big )\mathbb {E}\langle \lambda ^k-\lambda , Ax^k-b\rangle \\&\quad =-\frac{t+k_0+1}{\rho _t}\mathbb {E}\Delta (\lambda ^{t+1},\lambda ^t,\lambda )-\sum _{k=2}^t\frac{\theta (k+k_0+1)-1}{\rho _{k-1}}\mathbb {E}\Delta (\lambda ^{k},\lambda ^{k-1},\lambda )\\&\quad \overset{(64)}{\le }\frac{\theta (k_0+3)-1}{2\rho _1}\mathbb {E}\Vert \lambda ^1-\lambda \Vert ^2. \end{aligned}$$

Since \(\Phi (x^k,x,\lambda )=\Phi (x^k,x,\lambda ^k)+\langle \lambda ^k-\lambda , Ax^k-b\rangle ,\) we obtain the desired result by adding the above inequality to (66).\(\square \)

1.4 Proof of Proposition 3.4

Note that (24) implies \(k_0\ge \frac{4}{\theta }\), and thus (21a) must hold. Also, (21d) holds with equality by the second equation of (23b). Since \(I\succeq \frac{A^\top A}{\Vert A\Vert _2^2}\), we readily obtain (21f) by plugging in the \(\beta _k\) and \(\eta _k\) defined in (23a) and (23c), respectively.

To verify (21c), we plug in the \(\rho _k\) defined in the first equation of (23b); the condition is then equivalent to requiring that, for any \(2\le k\le t-1\),

$$\begin{aligned} \frac{\theta (k+k_0+1)-1}{\theta (k-1)+2+\theta }\ge \frac{\theta (k+k_0+2)-1}{\theta k+2+\theta } \Longleftrightarrow 1+\frac{\theta (k_0+1)-3}{\theta k+2}\ge 1+\frac{\theta (k_0+1)-3}{\theta k+2+\theta }. \end{aligned}$$

The inequality on the right hand side obviously holds, and thus we have (21c).

Plugging in the formula of \(\beta _k\), (21e) is equivalent to

$$\begin{aligned} (\theta k+2+\theta )(k+k_0+1)\ge (\theta k+2)(k+k_0), \end{aligned}$$

which trivially holds.

With the given \(\beta _k\) and \(\rho _k\), (21b) becomes \(\frac{6}{6-5\theta }(\theta k+2)(k+k_0)\ge (k+k_0+1)(\theta k+2+\theta ),\,\forall 2\le k\le t,\) which is equivalent to \(\frac{6}{6-5\theta }\ge \frac{(k_0+3)(3\theta +2)}{(k_0+2)(2\theta +2)}\). Note that \(\frac{k_0+3}{k_0+2}\) is decreasing with respect to \(k_0\ge 0\) and also \(\frac{6}{6-5\theta }\ge \frac{(\frac{3}{\theta }+3)(3\theta +2)}{(\frac{3}{\theta }+2)(2\theta +2)}\). Hence, (21b) is satisfied from the fact \(k_0\ge \frac{4}{\theta }\).

Finally, we show (21g). Plugging in \(\eta _k\), we have that (21g) becomes

$$\begin{aligned}&(k+k_0)\left( \frac{\mu }{2}\left( \theta k+2\right) +L_m\right) +\mu \big (\theta (k+k_0+1)-1\big )\\&\quad \ge (k+k_0+1)\left( \frac{\mu }{2}\left( \theta k+2+\theta \right) +L_m\right) ,\,\forall k\ge 2, \end{aligned}$$

which is equivalent to \(k_0+1\ge \frac{4}{\theta }+\frac{2L_m}{\theta \mu }\). Hence, for \(k_0\) given in (24), (21g) must hold. Therefore, we have verified all conditions in (21).
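As a sanity check of the last equivalence, the following snippet verifies (21g) numerically at the boundary value \(k_0+1=\frac{4}{\theta }+\frac{2L_m}{\theta \mu }\); the values of \(\theta \), \(\mu \), and \(L_m\) are arbitrary samples, not quantities tied to a particular problem instance:

```python
# Verify (21g):
#   (k+k0)(mu/2 (theta k + 2) + L_m) + mu (theta (k+k0+1) - 1)
#     >= (k+k0+1)(mu/2 (theta k + 2 + theta) + L_m)   for all k >= 2,
# at the boundary k0 + 1 = 4/theta + 2 L_m/(theta mu).  Sample parameter values only.
theta, mu, L_m = 0.3, 1.5, 4.0
k0 = 4 / theta + 2 * L_m / (theta * mu) - 1
for k in range(2, 500):
    lhs = (k + k0) * (mu / 2 * (theta * k + 2) + L_m) + mu * (theta * (k + k0 + 1) - 1)
    rhs = (k + k0 + 1) * (mu / 2 * (theta * k + 2 + theta) + L_m)
    assert lhs >= rhs - 1e-9
```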

1.5 Proof of Theorem 3.5

From Proposition 3.4, we have the inequality in (22) that, as \(\lambda ^1=0\), reduces to

$$\begin{aligned}&(t+k_0+1) \mathbb {E}\Phi (x^{t+1},x,\lambda )+\sum _{k=2}^t\big (\theta (k+k_0+1)-1\big )\mathbb {E}\Phi (x^k,x,\lambda )\nonumber \\&\quad \le \phi _3(x,\lambda )-\frac{t+k_0+1}{2}\mathbb {E}\Vert x^{t+1}-x\Vert _{(\mu +\eta _t) I-\beta _t A^\top A}^2. \end{aligned}$$
(68)

For \(\rho \ge 1\), we have

$$\begin{aligned} (\mu +\eta _t) I-\beta _t A^\top A\succeq \left( \frac{(\rho -1)\mu }{2\rho }(\theta t + \theta + 2) + \mu +L_m\right) I. \end{aligned}$$
(69)

Letting \(x=x^*\) and using the convexity of F, we have from (68) and the above inequality that

$$\begin{aligned} \mathbb {E}\left[ F(\bar{x}^{t+1})-F(x^*)-\big \langle \lambda ,A\bar{x}^{t+1}-b\big \rangle \right] \le \frac{1}{T}\mathbb {E}\phi _3(x^*,\lambda ),\,\forall \lambda , \end{aligned}$$
(70)

which together with Lemmas 1.2 and 1.3 with \(\gamma =\max (2\Vert \lambda ^*\Vert , 1+\Vert \lambda ^*\Vert )\) indicates (25).

In addition, note

$$\begin{aligned} \Phi (x^{t+1},x^*,\lambda ^*)\ge \frac{\mu }{2}\Vert x^{t+1}-x^*\Vert ^2. \end{aligned}$$

Hence, letting \((x,\lambda )=(x^*,\lambda ^*)\) in (68) and using (5), we have from (69) that

$$\begin{aligned} \frac{t+k_0+1}{2}\left( \frac{(\rho -1)\mu }{2\rho }(\theta t + \theta + 2)+2\mu +L_m\right) \mathbb {E}\Vert x^{t+1}-x^*\Vert ^2 \le \phi _3(x^*,\lambda ^*), \end{aligned}$$
(71)

and the proof is completed.

Appendix C: Technical proofs: Sect. 4

In this section, we provide the proofs of the lemmas and theorems in Sect. 4.

1.1 Proof of Lemma 4.1

Note \(r^{k+1}-r^k=A(x^{k+1}-x^k)+B(y^{k+1}-y^k)\). Hence by (6), we have

$$\begin{aligned} \begin{aligned} \left\langle A(x^{k+1}-x),-\beta r^k\right\rangle =&~-\beta \left\langle A(x^{k+1}-x), r^{k+1}\right\rangle +\beta \left\langle A(x^{k+1}-x), B(y^{k+1}-y^k)\right\rangle \\&~+\frac{\beta }{2}\left[ \Vert A(x^{k+1}-x)\Vert ^2-\Vert A(x^k-x)\Vert ^2+\Vert A(x^{k+1}-x^k)\Vert ^2\right] . \end{aligned} \end{aligned}$$
(72)

In addition, \(\langle A(x^{k+1}-x), \lambda ^k\rangle = \langle A(x^{k+1}-x), \lambda ^{k+1}+\rho r^{k+1}\rangle \). Plugging this equation and (72) into (45) with \(x^o=x^k, \lambda ^o=\lambda ^k, x^+=x^{k+1}, W=\eta _x I\) and taking expectation yield

$$\begin{aligned}&~ \mathbb {E}\left[ F(x^{k+1})-F(x)+\frac{\mu }{2}\Vert x^{k+1}-x\Vert ^2-\big \langle A(x^{k+1}-x), \lambda ^{k+1}\big \rangle \right. \nonumber \\&\qquad \left. +(\beta -\rho )\big \langle A(x^{k+1}-x), r^{k+1}\big \rangle \right] \nonumber \\&\qquad ~+\frac{1}{2}\mathbb {E}\left[ \Vert x^{k+1}-x\Vert _P^2-\Vert x^k-x\Vert _P^2+\Vert x^{k+1}-x^k\Vert ^2_{P-L_mI}\right] \nonumber \\&\quad \le ~ (1- \theta )\mathbb {E}\left[ F(x^k)-F(x)+\frac{\mu }{2}\Vert x^k-x\Vert ^2-\big \langle A(x^k-x), \lambda ^k-\beta r^k\big \rangle \right] \\&\qquad ~+\beta \mathbb {E}\left\langle A(x^{k+1}-x), B(y^{k+1}-y^k)\right\rangle ,\nonumber \end{aligned}$$
(73)

where \(P=\eta _x I-\beta A^\top A\).

From (30), the optimality condition for \( \tilde{y}^{k+1}\) is

$$\begin{aligned} \nabla h( \tilde{y}^{k+1})- B^\top \lambda ^k+ \beta B^\top r^{k+\frac{1}{2}} +\eta _y (\tilde{y}^{k+1}-y^k)=0. \end{aligned}$$
(74)

Since \({\mathrm {Prob}}(y^{k+1}=\tilde{y}^{k+1})=\theta ,\, {\mathrm {Prob}}(y^{k+1}=y^k)=1-\theta ,\) we have

$$\begin{aligned}&\mathbb {E}\left\langle y^{k+1}- y, \nabla h( y^{k+1})- B^\top \lambda ^{k} + \beta B^\top r^{k+\frac{1}{2}}+\eta _y(y^{k+1}-y^k)\right\rangle \\&\quad =(1-\theta )\mathbb {E}\left\langle y^k- y, \nabla h( y^k)- B^\top \lambda ^k + \beta B^\top r^{k+\frac{1}{2}}\right\rangle , \end{aligned}$$

or equivalently,

$$\begin{aligned}&\mathbb {E}\left\langle y^{k+1}- y, \nabla h( y^{k+1})- B^\top \lambda ^{k+1} +( \beta -\rho ) B^\top r^{k+1}\right. \nonumber \\&\qquad \left. - \beta B^\top B(y^{k+1}-y^k)+\eta _y(y^{k+1}-y^k)\right\rangle \\&\quad =(1-\theta )\mathbb {E}\left\langle y^k- y, \nabla h( y^k)- B^\top \lambda ^k + \beta B^\top r^k\right\rangle \nonumber \\&\qquad +\,\beta (1-\theta ) \mathbb {E}\left\langle B(y^k- y),A(x^{k+1}-x^k)\right\rangle . \end{aligned}$$
(75)

Recall \(Q=\eta _y I -\beta B^\top B\). We have

$$\begin{aligned}&\left\langle y^{k+1}- y, - \beta B^\top B(y^{k+1}-y^k)+\eta _y(y^{k+1}-y^k)\right\rangle \\&\quad = \frac{1}{2}\left[ \Vert y^{k+1}- y\Vert _Q^2-\Vert y^k- y\Vert _Q^2+\Vert y^{k+1}- y^k\Vert _Q^2\right] . \end{aligned}$$

Therefore, adding (75) to (73), noting \(Ax+By=b\), and plugging in (67) with \(\rho _k=\rho \), we obtain the desired result.

1.2 Proof of Theorem 4.2

Before proving Theorem 4.2, we establish a few inequalities. First, using Young’s inequality, we have the following results.

Lemma 9.1

For any \(\tau _1,\tau _2>0\), it holds that

$$\begin{aligned}&\langle A(x^{k+1}- x^*), B( y^{k+1}- y^k)\rangle \le \frac{1}{2\tau _1}\Vert A(x^{k+1}-x^*)\Vert ^2+\frac{\tau _1}{2}\Vert B(y^{k+1}-y^k)\Vert ^2, \end{aligned}$$
(76)
$$\begin{aligned}&\langle B(y^k- y^*),A(x^{k+1}-x^k)\rangle \le \frac{1}{2\tau _2}\Vert B(y^k- y^*)\Vert ^2+\frac{\tau _2}{2}\Vert A(x^{k+1}-x^k)\Vert ^2. \end{aligned}$$
(77)
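As a numerical sanity check of (76) (not part of the proof; random data, numpy assumed), Young's inequality can be verified for several values of \(\tau _1\):

```python
import numpy as np

# Numerical check of Young's inequality (76) for any tau1 > 0.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4)); B = rng.standard_normal((5, 3))
dx = rng.standard_normal(4)   # plays the role of x^{k+1} - x*
dy = rng.standard_normal(3)   # plays the role of y^{k+1} - y^k

for tau1 in (0.1, 1.0, 10.0):
    lhs = np.dot(A @ dx, B @ dy)
    rhs = np.dot(A @ dx, A @ dx) / (2 * tau1) + tau1 * np.dot(B @ dy, B @ dy) / 2
    assert lhs <= rhs + 1e-12
```

Inequality (77) is checked the same way with the roles of the two factors swapped.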

In addition, we are able to bound the \(\lambda \)-term by the y-term and the residual r. The proofs are given in Appendices C.4 and C.5.

Lemma 9.2

For any \(\delta >0\), we have

$$\begin{aligned}&\mathbb {E}\Vert B^\top (\lambda ^{k+1}-\lambda ^*)\Vert ^2-(1-\theta )(1+\delta )\mathbb {E}\Vert B^\top (\lambda ^k-\lambda ^*)\Vert ^2\\&\quad \le 4\mathbb {E}\big [L_h^2\Vert y^{k+1}-y^*\Vert ^2+\Vert Q(y^{k+1}-y^k)\Vert ^2\big ]+2(\beta -\rho )^2\mathbb {E}\Vert B^\top r^{k+1}\Vert ^2\nonumber \\&\qquad +\,2\rho ^2(1-\theta )\left( 1+\frac{1}{\delta }\right) \mathbb {E}\big [\Vert B^\top r^{k+1}\Vert ^2+\Vert B^\top B(y^{k+1}-y^k)\Vert ^2\big ]. \end{aligned}$$
(78)

Lemma 9.3

Assume (38). Then

$$\begin{aligned}&\frac{\sigma _{\min }(BB^\top )}{2}\big [\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\big ]\\&\quad \le \Vert B^\top (\lambda ^{k+1}-\lambda ^*)\Vert ^2-(1-\theta )(1+\delta )\Vert B^\top (\lambda ^k-\lambda ^*)\Vert ^2+\kappa \Vert B^\top (\lambda ^{k+1}-\lambda ^k)\Vert ^2,\nonumber \\ \end{aligned}$$
(79)

where \(\sigma _{\min }(BB^\top )\) denotes the smallest singular value of \(BB^\top \).

Lemma 9.4

Let \(c,\delta ,\tau _1,\tau _2\) and \(\kappa \) be constants satisfying the conditions in Theorem 4.2. Then

$$\begin{aligned}&~\beta \mathbb {E}\big \langle A(x^{k+1}- x^*), B( y^{k+1}- y^k)\big \rangle +\beta (1 -\theta )\mathbb {E}\big \langle B(y^k- y^*),A(x^{k+1}-x^k)\big \rangle \\&\qquad ~+\frac{c}{2}\sigma _{\min }(BB^\top )\mathbb {E}\big [\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\big ]\\&\quad \le ~ \frac{1}{2}\mathbb {E}\Vert x^{k+1}- x^k\Vert _{P-L_mI}^2+\frac{\beta }{2\tau _1}\mathbb {E}\Vert A(x^{k+1}-x^*)\Vert ^2\\&\qquad ~+\frac{1}{2}\mathbb {E}\Vert y^{k+1}- y^k\Vert _Q^2+\frac{\beta (1-\theta )}{2\tau _2}\mathbb {E}\Vert B(y^k-y^*)\Vert ^2+4cL_h^2\mathbb {E}\Vert y^{k+1}-y^*\Vert ^2\\&\qquad ~+\left[ c\rho ^2\left( \kappa +2(1-\theta )\left( 1+\frac{1}{\delta }\right) \right) +2c(\beta -\rho )^2\right] \mathbb {E}\Vert B^\top r^{k+1}\Vert ^2.\nonumber \end{aligned}$$
(80)

Now we are ready to show Theorem 4.2.

Proof of Theorem 4.2

Letting \((x,y,\lambda )=(x^*,y^*,\lambda ^*)\) in (34), plugging (32) into it, and noting \(Ax^*+By^*=b\), we have

$$\begin{aligned}&\mathbb {E}\Psi (z^{k+1},z^*)+( \beta -\rho )\mathbb {E}\Vert r^{k+1}\Vert ^2+\mathbb {E}\left[ \Delta _P(x^{k+1},x^k, x^*)-\frac{L_m}{2}\Vert x^{k+1}- x^k\Vert ^2\right] \nonumber \\&\qquad +\,\mathbb {E}\Delta _Q(y^{k+1},y^k, y^*)+\frac{\mu }{2}\mathbb {E}\Vert x^{k+1}-x^*\Vert ^2+\frac{1}{\rho }\mathbb {E}\Delta (\lambda ^{k+1},\lambda ^{k},\lambda ^*)\nonumber \\&\quad \le (1-\theta )\mathbb {E}\Psi (z^k,z^*)+\beta (1 -\theta )\mathbb {E}\Vert r^k\Vert ^2+\frac{1-\theta }{\rho }\mathbb {E}\Delta (\lambda ^{k},\lambda ^{k-1},\lambda ^*)\nonumber \\&\qquad +\,\frac{\mu (1-\theta )}{2}\mathbb {E}\Vert x^k-x^*\Vert ^2 +\, \beta \mathbb {E}\big \langle A(x^{k+1}- x^*), B( y^{k+1}- y^k)\big \rangle +\beta (1 -\theta )\nonumber \\&\qquad \mathbb {E}\big \langle B(y^k- y^*),A(x^{k+1}-x^k)\big \rangle , \nonumber \\ \end{aligned}$$
(81)

where \(\Psi \) is defined in (36). Note

$$\begin{aligned}&~\frac{1}{\rho }\Delta (\lambda ^{k+1},\lambda ^{k},\lambda ^*)\\&\quad =~\frac{1}{2\rho }\big [\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\big ]\\&\qquad -\frac{\rho }{2}\left( \frac{1}{\theta }-1\right) \Vert r^{k+1}\Vert ^2 - \frac{\theta }{2\rho }\Vert \lambda ^{k}-\lambda ^*\Vert ^2, \end{aligned}$$

and

$$\begin{aligned}&~\frac{1-\theta }{\rho }\Delta (\lambda ^{k},\lambda ^{k-1},\lambda ^*)\\&\quad =~\frac{1}{2\rho }\big [\Vert \lambda ^{k}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k-1}-\lambda ^*\Vert ^2 +\frac{1}{\theta }\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\big ]\\&\qquad -\frac{\rho }{2}\left( \frac{1}{\theta }-(1-\theta )\right) \Vert r^k\Vert ^2- \frac{\theta }{2\rho }\Vert \lambda ^{k}-\lambda ^*\Vert ^2. \end{aligned}$$

Adding (80) to (81) and plugging the above two equations yield

$$\begin{aligned}&\mathbb {E}\Psi (z^{k+1},z^*)+( \beta -\rho )\mathbb {E}\Vert r^{k+1}\Vert ^2+\mathbb {E}\left[ \Delta _P(x^{k+1},x^k, x^*)-\frac{L_m}{2}\Vert x^{k+1}- x^k\Vert ^2\right] \nonumber \\&\qquad +\mathbb {E}\Delta _Q(y^{k+1},y^k, y^*)+\frac{\mu }{2}\mathbb {E}\Vert x^{k+1}-x^*\Vert ^2-\frac{\rho }{2}\left( \frac{1}{\theta }-1\right) \mathbb {E}\Vert r^{k+1}\Vert ^2 - \frac{\theta }{2\rho }\mathbb {E}\Vert \lambda ^{k}-\lambda ^*\Vert ^2\nonumber \\&\qquad +\left( \frac{1}{2\rho }+\frac{c}{2}\sigma _{\min }(BB^\top )\right) \mathbb {E}\big [\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\big ]\nonumber \\&\quad \le (1-\theta )\mathbb {E}\Psi (z^k,z^*)+\beta (1 -\theta )\mathbb {E}\Vert r^k\Vert ^2 -\frac{\rho }{2}\left( \frac{1}{\theta }-(1-\theta )\right) \mathbb {E}\Vert r^k\Vert ^2- \frac{\theta }{2\rho }\mathbb {E}\Vert \lambda ^{k}-\lambda ^*\Vert ^2 \nonumber \\&\qquad +\frac{1}{2\rho }\mathbb {E}\big [\Vert \lambda ^{k}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k-1}-\lambda ^*\Vert ^2 +\frac{1}{\theta }\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\big ]\nonumber \\&\qquad +\frac{\mu (1-\theta )}{2}\mathbb {E}\Vert x^k-x^*\Vert ^2+\frac{1}{2}\mathbb {E}\Vert x^{k+1}- x^k\Vert _{P-L_mI}^2+\frac{\beta }{2\tau _1}\mathbb {E}\Vert A(x^{k+1}-x^*)\Vert ^2\nonumber \\&\qquad +\frac{1}{2}\mathbb {E}\Vert y^{k+1}- y^k\Vert _Q^2+\frac{\beta (1-\theta )}{2\tau _2}\mathbb {E}\Vert B(y^k-y^*)\Vert ^2+4cL_h^2\mathbb {E}\Vert y^{k+1}-y^*\Vert ^2\\&\qquad +\left[ c\rho ^2\left( \kappa +2(1-\theta )\left( 1+\frac{1}{\delta }\right) \right) +2c(\beta -\rho )^2\right] \mathbb {E}\Vert B^\top r^{k+1}\Vert ^2. \nonumber \end{aligned}$$

Using the definition in (2) to expand \(\Delta _P(x^{k+1},x^k, x^*)\) and \(\Delta _Q(y^{k+1},y^k, y^*)\) in the above inequality, and then rearranging terms, we have

$$\begin{aligned}&\mathbb {E}\Psi (z^{k+1},z^*)+\left( ( \beta -\rho )-\frac{\rho }{2}\left( \frac{1}{\theta }-1\right) \right) \mathbb {E}\Vert r^{k+1}\Vert ^2\\&\qquad -\left[ c\rho ^2\left( \kappa +2(1-\theta )\left( 1+\frac{1}{\delta }\right) \right) +2c(\beta -\rho )^2\right] \mathbb {E}\Vert B^\top r^{k+1}\Vert ^2\\&\qquad +\,\mathbb {E}\left[ \frac{1}{2}\Vert x^{k+1}-x^*\Vert _P^2+\frac{\mu }{2}\Vert x^{k+1}-x^*\Vert ^2-\frac{\beta }{2\tau _1}\Vert A(x^{k+1}-x^*)\Vert ^2\right] \nonumber \\&\qquad +\,\mathbb {E}\left[ \frac{1}{2}\Vert y^{k+1}-y^*\Vert _Q^2-4cL_h^2\Vert y^{k+1}-y^*\Vert ^2\right] \nonumber \\&\qquad +\,\left( \frac{1}{2\rho }+\frac{c}{2}\sigma _{\min }(BB^\top )\right) \mathbb {E}\left[ \Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\quad \le (1-\theta )\mathbb {E}\Psi (z^k,z^*)+\beta (1 -\theta )\mathbb {E}\Vert r^k\Vert ^2 -\frac{\rho }{2}\left( \frac{1}{\theta }-(1-\theta )\right) \mathbb {E}\Vert r^k\Vert ^2 + \frac{1}{2}\mathbb {E}\Vert x^{k}-x^*\Vert _P^2 \nonumber \\&\qquad +\frac{\mu (1-\theta )}{2}\mathbb {E}\Vert x^k-x^*\Vert ^2+\frac{1}{2}\mathbb {E}\Vert y^{k}- y^*\Vert _Q^2+\frac{\beta (1-\theta )}{2\tau _2}\mathbb {E}\Vert B(y^k-y^*)\Vert ^2\nonumber \\&\qquad +\frac{1}{2\rho }\mathbb {E}\left[ \Vert \lambda ^{k}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k-1}-\lambda ^*\Vert ^2 +\frac{1}{\theta }\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] . \end{aligned}$$
(82)

Since \(\rho = \theta \beta \), it holds

$$\begin{aligned} ( \beta -\rho )-\frac{\rho }{2}\left( \frac{1}{\theta }-1\right) = \frac{\beta -\rho }{2}, \quad \beta (1 -\theta )-\frac{\rho }{2}\left( \frac{1}{\theta }-(1-\theta )\right) \le \frac{\beta (1-\theta )}{2}, \end{aligned}$$

and thus the inequality (82) implies

$$\begin{aligned}&\mathbb {E}\Psi (z^{k+1},z^*)+\frac{ \beta -\rho }{2}\mathbb {E}\Vert r^{k+1}\Vert ^2 \nonumber \\&\qquad -\left[ c\rho ^2\left( \kappa +2(1-\theta )\left( 1+\frac{1}{\delta }\right) \right) +2c(\beta -\rho )^2\right] \mathbb {E}\Vert B^\top r^{k+1}\Vert ^2\\&\qquad +\,\mathbb {E}\left[ \frac{1}{2}\Vert x^{k+1}-x^*\Vert _P^2+\frac{\mu }{2}\Vert x^{k+1}-x^*\Vert ^2-\frac{\beta }{2\tau _1}\Vert A(x^{k+1}-x^*)\Vert ^2\right] \nonumber \\&\qquad +\,\mathbb {E}\left[ \frac{1}{2}\Vert y^{k+1}-y^*\Vert _Q^2-4cL_h^2\Vert y^{k+1}-y^*\Vert ^2\right] \nonumber \\&\qquad +\left( \frac{1}{2\rho }+\frac{c}{2}\sigma _{\min }(BB^\top )\right) \mathbb {E}\big [\Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\big ]\nonumber \\&\quad \le \psi (z^k,z^*; P,Q,\beta ,\rho ,c,\tau _2), \end{aligned}$$
(83)

where \(\psi \) is defined in (37).
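The two algebraic relations used above to pass from (82) to (83) under \(\rho =\theta \beta \) can be verified numerically (a sketch in plain Python; not part of the proof):

```python
import random

# With rho = theta*beta:
#   (beta - rho) - (rho/2)(1/theta - 1)          == (beta - rho)/2   (exact identity)
#   beta(1-theta) - (rho/2)(1/theta - (1-theta)) <= beta(1-theta)/2  (inequality)
random.seed(0)
for _ in range(1000):
    theta = random.uniform(1e-3, 1.0)
    beta = random.uniform(1e-3, 10.0)
    rho = theta * beta
    lhs1 = (beta - rho) - (rho / 2) * (1 / theta - 1)
    assert abs(lhs1 - (beta - rho) / 2) < 1e-8
    lhs2 = beta * (1 - theta) - (rho / 2) * (1 / theta - (1 - theta))
    assert lhs2 <= beta * (1 - theta) / 2 + 1e-12
```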

From (33), it follows that

$$\begin{aligned} (1-\alpha )\Psi (z^{k+1},z^*)+\frac{\alpha \mu }{2}\Vert x^{k+1}-x^*\Vert ^2+\alpha \nu \Vert y^{k+1}-y^*\Vert ^2\le \Psi (z^{k+1},z^*) . \end{aligned}$$
(84)

In addition, note that

$$\begin{aligned} \Vert r^{k+1}\Vert ^2= & {} \Vert Ax^{k+1}+By^{k+1}-(Ax^*+By^*)\Vert ^2\\\le & {} 2\Vert A\Vert _2^2\Vert x^{k+1}-x^*\Vert ^2+2\Vert B\Vert _2^2\Vert y^{k+1}-y^*\Vert ^2\\\le & {} \gamma \left( \frac{\alpha \mu }{4}\Vert x^{k+1}-x^*\Vert ^2+\frac{\alpha \nu }{4}\Vert y^{k+1}-y^*\Vert ^2\right) , \end{aligned}$$

and thus

$$\begin{aligned} \frac{1}{\gamma }\Vert r^{k+1}\Vert ^2\le \frac{\alpha \mu }{4}\Vert x^{k+1}-x^*\Vert ^2+\frac{\alpha \nu }{4}\Vert y^{k+1}-y^*\Vert ^2. \end{aligned}$$
(85)

Adding (84) and (85) to (83) gives the desired result.\(\square \)
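The bound behind (85), namely \(\Vert r^{k+1}\Vert ^2\le 2\Vert A\Vert _2^2\Vert x^{k+1}-x^*\Vert ^2+2\Vert B\Vert _2^2\Vert y^{k+1}-y^*\Vert ^2\) (valid since \(Ax^*+By^*=b\)), admits a quick numerical sanity check (random data; numpy assumed, not part of the proof):

```python
import numpy as np

# ||A dx + B dy||^2 <= 2||A||_2^2 ||dx||^2 + 2||B||_2^2 ||dy||^2,
# where dx = x^{k+1} - x*, dy = y^{k+1} - y*, and r = A dx + B dy.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4)); B = rng.standard_normal((6, 3))
dx = rng.standard_normal(4)
dy = rng.standard_normal(3)

r = A @ dx + B @ dy
lhs = np.dot(r, r)
rhs = (2 * np.linalg.norm(A, 2) ** 2 * np.dot(dx, dx)
       + 2 * np.linalg.norm(B, 2) ** 2 * np.dot(dy, dy))
assert lhs <= rhs + 1e-9
```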

1.3 Proof of Theorem 4.3

From \(0<\alpha <\theta \), the full row-rankness of B, and the conditions in (41), it is easy to see that \(\eta >1\). Next we find lower bounds for the terms on the left-hand side of (40). Since \(\eta \le \frac{1-\alpha }{1-\theta }\), we have

$$\begin{aligned} \eta (1-\theta )\Psi (z^{k+1},z^*)\le (1-\alpha )\Psi (z^{k+1},z^*). \end{aligned}$$
(86)

Note \(\Vert A\Vert _2\le 1\) and

$$\begin{aligned} \left( \frac{\alpha \mu }{2}+\mu -\frac{\beta }{\tau _1}\right) I\succeq & {} \frac{\frac{\alpha \mu }{2}+\theta \mu -\frac{\beta }{\tau _1}}{\eta _x+\mu (1-\theta )}(\eta _x I-\beta A^\top A)\\&+ \frac{\frac{\alpha \mu }{2}+\theta \mu -\frac{\beta }{\tau _1}}{\eta _x+\mu (1-\theta )}\mu (1-\theta )I +\mu (1-\theta )I. \end{aligned}$$

Hence, from \(\eta \le 1+\frac{\frac{\alpha \mu }{2}+\theta \mu -\frac{\beta }{\tau _1}}{\eta _x+\mu (1-\theta )}\) and \(P=\eta _x I-\beta A^\top A\), it follows that

$$\begin{aligned} \eta \Vert x^{k+1}- x^*\Vert ^2_{P+\mu (1-\theta )I} \le \Vert x^{k+1}- x^*\Vert ^2_{P+(\frac{\alpha \mu }{2}+\mu )I-\frac{\beta }{\tau _1}A^\top A}. \end{aligned}$$
(87)

Similarly, since

$$\begin{aligned} \left( \frac{3\alpha \nu }{2}-8c L_h^2\right) I\succeq & {} \frac{\frac{3\alpha \nu }{2}-8c L_h^2-\frac{\beta (1-\theta )}{\tau _2}}{\eta _y+\frac{\beta (1-\theta )}{\tau _2}}(\eta _y I-\beta B^\top B) \\&+\frac{\frac{3\alpha \nu }{2}-8c L_h^2-\frac{\beta (1-\theta )}{\tau _2}}{\eta _y+\frac{\beta (1-\theta )}{\tau _2}}\frac{\beta (1-\theta )}{\tau _2}I+\frac{\beta (1-\theta )}{\tau _2}I, \end{aligned}$$

\(Q=\eta _y I-\beta B^\top B\), and \(B^\top B \preceq I\), we have

$$\begin{aligned} \eta \Vert y^{k+1}- y^*\Vert ^2_{Q+\frac{\beta (1 -\theta )}{\tau _2}B^\top B} \le \Vert y^{k+1}- y^*\Vert ^2_{Q+(\frac{3\alpha \nu }{2}-8cL_h^2)I}. \end{aligned}$$
(88)

For the r-term, we note from the definition of \(\eta \) that

$$\begin{aligned} \eta \frac{ \beta (1-\theta )}{2}\le \left( \frac{ \beta (1-\theta )}{2}+\frac{1}{\gamma }\right) -\left( c\rho ^2\left( \kappa +2(1-\theta )\left( 1+\frac{1}{\delta }\right) \right) +2c(\beta -\rho )^2\right) . \end{aligned}$$

In addition, since \(\Vert B\Vert _2\le 1\), it holds that \(\Vert B^\top r^{k+1}\Vert \le \Vert r^{k+1}\Vert \), and thus

$$\begin{aligned} \eta \frac{ \beta (1-\theta )}{2}\Vert r^{k+1}\Vert ^2\le & {} \left( \frac{ \beta (1-\theta )}{2}+\frac{1}{\gamma }\right) \Vert r^{k+1}\Vert ^2\nonumber \\&-\left( c\rho ^2\left( \kappa +2(1-\theta )\left( 1+\frac{1}{\delta }\right) \right) +2c(\beta -\rho )^2\right) \Vert B^\top r^{k+1}\Vert ^2.\nonumber \\ \end{aligned}$$
(89)

Finally, we clearly have

$$\begin{aligned} \begin{aligned}&~ \frac{\eta }{2\rho }\left[ \Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \\&\quad \le ~\left( \frac{1}{2\rho }+\frac{c}{2}\sigma _{\min }(BB^\top )\right) \left[ \Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] . \end{aligned} \end{aligned}$$
(90)

Therefore, adding (86) through (90) and using the definition of \(\psi \), we obtain (42).

1.4 Proof of Lemma 9.2

Let \( \tilde{\lambda }^{k+1}=\lambda ^k-\rho (Ax^{k+1}+B\tilde{y}^{k+1}-b). \) Then from the update of y, we have

$$\begin{aligned} \begin{aligned}&~\mathbb {E}\Vert B^\top (\lambda ^{k+1}-\lambda ^*)\Vert ^2\\&\quad =~\theta \mathbb {E}\Vert B^\top (\tilde{\lambda }^{k+1}-\lambda ^*)\Vert ^2+(1-\theta )\mathbb {E}\Vert B^\top (\lambda ^k-\lambda ^*-\rho (Ax^{k+1}+By^k-b))\Vert ^2. \end{aligned} \end{aligned}$$
(91)

Below we bound the two terms on the right hand side of (91). First, the definition of \(\tilde{\lambda }^{k+1}\) together with (74) implies

$$\begin{aligned} B^\top \tilde{\lambda }^{k+1}=\nabla h(\tilde{y}^{k+1})+Q(\tilde{y}^{k+1}-y^k)+(\beta -\rho )B^\top (Ax^{k+1}+B\tilde{y}^{k+1}-b). \end{aligned}$$
(92)

Hence, by Young's inequality and the condition in (32b), we have

$$\begin{aligned} \begin{aligned}&~\theta \mathbb {E}\Vert B^\top (\tilde{\lambda }^{k+1}-\lambda ^*)\Vert ^2\\&\quad \le ~2\theta \mathbb {E}\Vert \nabla h(\tilde{y}^{k+1})-\nabla h(y^*)+Q(\tilde{y}^{k+1}-y^k)\Vert ^2\\&\qquad +2\theta (\beta -\rho )^2\mathbb {E}\Vert B^\top (Ax^{k+1}+B\tilde{y}^{k+1}-b)\Vert ^2. \end{aligned} \end{aligned}$$
(93)

Since \({\mathrm {Prob}}(y^{k+1}=\tilde{y}^{k+1})=\theta \) and \({\mathrm {Prob}}(y^{k+1}=y^k)=1-\theta \), it follows that

$$\begin{aligned} \begin{aligned}&~\mathbb {E}\Vert \nabla h(y^{k+1})-\nabla h(y^*)+Q(y^{k+1}-y^k)\Vert ^2\\&\quad =~\theta \mathbb {E}\Vert \nabla h(\tilde{y}^{k+1}){-}\nabla h(y^*)+Q(\tilde{y}^{k+1}{-}y^k)\Vert ^2+(1-\theta )\mathbb {E}\Vert \nabla h(y^k)-\nabla h(y^*)\Vert ^2, \end{aligned} \end{aligned}$$

and thus

$$\begin{aligned}&\theta \mathbb {E}\Vert \nabla h(\tilde{y}^{k+1})-\nabla h(y^*)+Q(\tilde{y}^{k+1}-y^k)\Vert ^2\\&\quad \le \mathbb {E}\Vert \nabla h(y^{k+1})-\nabla h(y^*)+Q(y^{k+1}-y^k)\Vert ^2. \end{aligned}$$

Similarly,

$$\begin{aligned} \theta (\beta -\rho )^2\mathbb {E}\Vert B^\top (Ax^{k+1}+B\tilde{y}^{k+1}-b)\Vert ^2\le (\beta -\rho )^2\mathbb {E}\Vert B^\top (Ax^{k+1}+B y^{k+1}-b)\Vert ^2. \end{aligned}$$

Plugging the above two inequalities into (93) and applying Young's inequality and the Lipschitz continuity of \(\nabla h\) give

$$\begin{aligned} \theta \mathbb {E}\Vert B^\top (\tilde{\lambda }^{k+1}-\lambda ^*)\Vert ^2 \le 4\mathbb {E}\big [L_h^2\Vert y^{k+1}-y^*\Vert ^2+\Vert Q(y^{k+1}-y^k)\Vert ^2\big ]+2(\beta -\rho )^2\mathbb {E}\Vert B^\top r^{k+1}\Vert ^2. \end{aligned}$$
(94)

In addition, from Young's inequality, it follows for any \(\delta >0\) that

$$\begin{aligned}&\Vert B^\top (\lambda ^k-\lambda ^*-\rho (Ax^{k+1}+By^k-b))\Vert ^2\\&\quad \le (1+\delta )\Vert B^\top (\lambda ^k-\lambda ^*)\Vert ^2+\rho ^2\left( 1+\frac{1}{\delta }\right) \Vert B^\top (Ax^{k+1}+By^k-b)\Vert ^2. \end{aligned}$$

Note \(\Vert B^\top (Ax^{k+1}+By^k-b)\Vert ^2\le 2\Vert B^\top r^{k+1}\Vert ^2+2\Vert B^\top B(y^{k+1}-y^k)\Vert ^2\). Therefore, plugging (94) and the above two inequalities into (91), we complete the proof.
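The bound just noted follows from \(\Vert a-b\Vert ^2\le 2\Vert a\Vert ^2+2\Vert b\Vert ^2\) with \(a=B^\top r^{k+1}\) and \(b=B^\top B(y^{k+1}-y^k)\), since \(Ax^{k+1}+By^k-b=r^{k+1}-B(y^{k+1}-y^k)\). A numerical sanity check (random data; numpy assumed, not part of the proof):

```python
import numpy as np

# Check: ||B^T(A x1 + B y0 - b)||^2 <= 2||B^T r1||^2 + 2||B^T B (y1 - y0)||^2,
# where r1 = A x1 + B y1 - b is the residual r^{k+1}.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4)); B = rng.standard_normal((5, 3))
x1 = rng.standard_normal(4)        # x^{k+1}
y0 = rng.standard_normal(3)        # y^k
y1 = rng.standard_normal(3)        # y^{k+1}
b = rng.standard_normal(5)

r1 = A @ x1 + B @ y1 - b
v = B.T @ (A @ x1 + B @ y0 - b)    # = B^T r1 - B^T B (y1 - y0)
w = B.T @ B @ (y1 - y0)
lhs = np.dot(v, v)
rhs = 2 * np.dot(B.T @ r1, B.T @ r1) + 2 * np.dot(w, w)
assert lhs <= rhs + 1e-9
```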

1.5 Proof of Lemma 9.3

It is straightforward to verify

$$\begin{aligned}&~\Vert B^\top (\lambda ^{k+1}-\lambda ^*)\Vert ^2-(1-\theta )(1+\delta )\Vert B^\top (\lambda ^k-\lambda ^*)\Vert ^2+\kappa \Vert B^\top (\lambda ^{k+1}-\lambda ^k)\Vert ^2\\&\quad =~\left[ \begin{array}{c}\lambda ^{k+1}-\lambda ^*\\ \lambda ^{k+1}-\lambda ^k \end{array}\right] ^\top \left[ \begin{array}{cc}(1-(1-\theta )(1+\delta )) &{}\quad (1-\theta )(1+\delta )\\ (1-\theta )(1+\delta )&{}\quad (\kappa -(1-\theta )(1+\delta ))\end{array}\right] \otimes BB^\top \left[ \begin{array}{c}(\lambda ^{k+1}-\lambda ^*)\\ (\lambda ^{k+1}-\lambda ^k) \end{array}\right] , \end{aligned}$$

and

$$\begin{aligned}&~\left[ \begin{array}{c}\lambda ^{k+1}-\lambda ^*\\ \lambda ^{k+1}-\lambda ^k \end{array}\right] ^\top \left[ \begin{array}{cc}\theta &{}\quad (1-\theta )\\ (1-\theta ) &{}\quad (\frac{1}{\theta }-(1-\theta ))\end{array}\right] \otimes I\left[ \begin{array}{c}\lambda ^{k+1}-\lambda ^*\\ \lambda ^{k+1}-\lambda ^k \end{array}\right] \\&\quad = ~\left[ \Vert \lambda ^{k+1}-\lambda ^*\Vert ^2-(1-\theta )\Vert \lambda ^{k}-\lambda ^*\Vert ^2+\frac{1}{\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] . \end{aligned}$$

Hence, we have the desired result from (38) and the inequality \(U\otimes V\succeq \sigma _{\min }(V) U\otimes I\) for any PSD matrices U and V.
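The Kronecker-product inequality \(U\otimes V\succeq \sigma _{\min }(V)\, U\otimes I\) used in the last step can be checked numerically for random PSD matrices (a sketch; numpy assumed, not part of the proof):

```python
import numpy as np

# U (x) V - sigma_min(V) * (U (x) I) = U (x) (V - sigma_min(V) I) is PSD
# whenever U and V are PSD.
rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3)); U = M @ M.T   # random PSD U
N = rng.standard_normal((2, 2)); V = N @ N.T   # random PSD V

sigma_min = np.linalg.eigvalsh(V)[0]           # smallest eigenvalue of PSD V
D = np.kron(U, V) - sigma_min * np.kron(U, np.eye(2))
assert np.linalg.eigvalsh(D)[0] >= -1e-9       # D is PSD
```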

1.6 Proof of Lemma 9.4

From (39a) and (39b), we have

$$\begin{aligned} \beta (1 -\theta )\frac{\tau _2}{2}\Vert A(x^{k+1}-x^k)\Vert ^2\le \frac{1}{2}\Vert x^{k+1}- x^k\Vert _{P-L_mI}^2, \end{aligned}$$

and

$$\begin{aligned}&4c\Vert Q(y^{k+1}-y^k)\Vert ^2+2c\rho ^2(1-\theta )\left( 1+\frac{1}{\delta }\right) \Vert B^\top B(y^{k+1}-y^k)\Vert ^2\\&\qquad +\frac{\beta \tau _1}{2}\Vert B(y^{k+1}-y^k)\Vert ^2\nonumber \\&\quad \le \frac{1}{2}\Vert y^{k+1}- y^k\Vert _Q^2. \end{aligned}$$

The desired result is then obtained by adding the above two inequalities together with \(\beta \) times (76), \(\beta (1-\theta )\) times (77), and c times both (78) and (79), and noting that \(\lambda ^{k+1}-\lambda ^k=-\rho r^{k+1}\).

Xu, Y., Zhang, S. Accelerated primal–dual proximal block coordinate updating methods for constrained convex optimization. Comput Optim Appl 70, 91–128 (2018). https://doi.org/10.1007/s10589-017-9972-z
