Smoothing Alternating Direction Methods for Fully Nonsmooth Constrained Convex Optimization

Large-Scale and Distributed Optimization

Part of the book series: Lecture Notes in Mathematics ((LNM,volume 2227))

Abstract

We propose two new alternating direction methods to solve “fully” nonsmooth constrained convex problems. Our algorithms have the best known worst-case iteration-complexity guarantee under mild assumptions for both the objective residual and feasibility gap. Through theoretical analysis, we show how to update all the algorithmic parameters automatically with clear impact on the convergence performance. We also provide a representative numerical example showing the advantages of our methods over the classical alternating direction methods using a well-known feasibility problem.

Acknowledgements

QTD’s work was supported in part by NSF grant No. DMS-1619884, USA. VC’s work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 725594 - time-data). The authors would like to thank Dr. C.B. Vu and Dr. V.Q. Nguyen for their help in verifying the technical proofs and the numerical experiment. The authors also thank Mr. Ahmet Alacaoglu, Mr. Nhan Pham, and Ms. Yuzixuan Zhu for their careful proofreading.

Corresponding author

Correspondence to Quoc Tran-Dinh.

Appendix: Proofs of Technical Results

This appendix provides full proofs of technical results presented in the main text.

4.1.1 Proof of Lemma 2: The Primal-Dual Bounds

First, using the fact that \(-d(\lambda ) \leq -d^{\star } = f^{\star } \leq \mathcal {L}(x, \lambda ^{\star }) = f(x) + \langle \lambda ^{\star }, Au + Bv - c\rangle \leq f(x) + \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \), we get

$$\displaystyle \begin{aligned} -\Vert \lambda^{\star}\Vert\Vert Au + Bv - c\Vert \leq f(x) - f^{\star} \leq f(x) + d(\lambda), \end{aligned} $$
(4.47)

which is exactly the lower bound (4.14).

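Next, since \(A^{\top}\lambda^{\star} \in \partial g(u^{\star})\) due to (4.8), the Fenchel-Young inequality holds with equality, i.e., \(g(u^{\star}) + g^{\ast}(A^{\top}\lambda^{\star}) = \langle A^{\top}\lambda^{\star}, u^{\star}\rangle\), which implies \(g^{\ast}(A^{\top}\lambda^{\star}) = \langle A^{\top}\lambda^{\star}, u^{\star}\rangle - g(u^{\star})\). Using this relation and the definition of \(\varphi_{\gamma}\), we have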

$$\displaystyle \begin{aligned} \varphi_{\gamma}(\lambda) &:= \max\left\{\langle A^{\top}\lambda,u\rangle - g(u) - \gamma b_{\mathcal{U}}(u,\bar{u}^c)\right\} \geq \langle A^{\top}\lambda,u^{\star}\rangle - g(u^{\star}) - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c)\\ &=\langle A^{\top}\lambda^{\star},u^{\star} \rangle - g(u^{\star}) + \langle A^{\top}(\lambda - \lambda^{\star}),u^{\star}\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c)\\ &= g^{\ast}(A^{\top}\lambda^{\star}) + \langle A^{\top}(\lambda - \lambda^{\star}), u^{\star}\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c)\\ &= \varphi(\lambda^{\star}) + \langle \lambda - \lambda^{\star},Au^{\star}\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c). \end{aligned} $$

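Alternatively, we have \(\psi(\lambda) \geq \psi(\lambda^{\star}) + \langle \nabla{\psi}(\lambda^{\star}), \lambda - \lambda^{\star}\rangle\), where \(\nabla{\psi}(\lambda^{\star}) = B\nabla{h^{\ast}}(B^{\top}\lambda^{\star}) - c = Bv^{\star} - c\) due to the last relation in (4.8), and \(\nabla{h^{\ast}}(B^{\top}\lambda^{\star}) \in \partial{h^{\ast}}(B^{\top}\lambda^{\star})\) is a subgradient of \(h^{\ast}\). Hence, \(\psi(\lambda) \geq \psi(\lambda^{\star}) + \langle \lambda - \lambda^{\star}, Bv^{\star} - c\rangle\). Adding this inequality to the last estimate, together with the facts that \(d_{\gamma} = \varphi_{\gamma} + \psi\) and \(d = \varphi + \psi\), we obtain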

$$\displaystyle \begin{aligned} \hspace{-6pt}d_{\gamma}(\lambda) \geq d(\lambda^{\star}) + \langle \lambda - \lambda^{\star}, Au^{\star} + Bv^{\star} - c\rangle - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) \overset{(4.8)}{=} d^{\star} - \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) \end{aligned} $$
(4.48)

Using this inequality with \(d^{\star} = -f^{\star}\) and the definition (4.13) of \(f_{\beta}\), we have

$$\displaystyle \begin{aligned} &f(x) - f^{\star} \overset{(4.13)+(4.48)}{\leq} f_{\beta}(x) + d_{\gamma}(\lambda) + \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) - \frac{1}{2\beta}\Vert Au + Bv - c\Vert^2 \\ &\quad = G_{\gamma\beta}(w) + \gamma b_{\mathcal{U}}(u^{\star},\bar{u}^c) - \frac{1}{2\beta}\Vert Au + Bv - c\Vert^2. \end{aligned} $$
(4.49)

Let \(S := G_{\gamma \beta }(w) + \gamma b_{\mathcal {U}}(u^{\star },\bar {u}^c)\). Then, by dropping the last term \(- \frac {1}{2\beta }\Vert Au + Bv - c\Vert ^2\) in (4.49), we obtain the first inequality of (4.15).

Let \(t := \Vert Au + Bv - c\Vert \). Using (4.47) and (4.49) again, we can see that \(\frac {1}{2\beta }t^2 - \Vert \lambda ^{\star }\Vert t - S \leq 0\). Solving this quadratic inequality w.r.t. \(t\) and noting that \(t \geq 0\), we obtain the second bound of (4.15). The last estimate of (4.15) is a direct consequence of (4.49), the first one of (4.15). Finally, from (4.47), we have \(f(x) \geq f^{\star} - \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \). Substituting this into (4.49) we get \(d(\lambda ) - d^{\star } - \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \leq S - \frac {1}{2\beta }\Vert Au + Bv - c\Vert ^2\), which implies

$$\displaystyle \begin{aligned} d(\lambda) - d^{\star} \leq S - (1/(2\beta))\Vert Au + Bv - c\Vert^2 + \Vert \lambda^{\star}\Vert\Vert Au + Bv - c\Vert. \end{aligned}$$

By discarding \(-(1/(2\beta))\Vert Au + Bv - c\Vert^2\) and substituting the second estimate of (4.15) into the last estimate, we obtain the last inequality of (4.15). \(\square \)
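The quadratic step in this proof can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the closed form is simply the largest root of \(\frac{1}{2\beta}t^2 - \Vert\lambda^{\star}\Vert t - S = 0\), which is what "solving the quadratic inequality" with \(t \geq 0\) yields. It is not claimed to reproduce the exact form in which (4.15) is stated.

```python
import math


def feasibility_bound(beta, lam_star_norm, S):
    """Largest t >= 0 satisfying t**2 / (2*beta) - lam_star_norm * t - S <= 0.

    Hypothetical helper: beta > 0 is the smoothness parameter, lam_star_norm
    stands for ||lambda*||, and S is the quantity defined after (4.49).
    """
    disc = lam_star_norm ** 2 + 2.0 * S / beta
    if beta <= 0 or disc < 0:
        raise ValueError("need beta > 0 and a nonnegative discriminant")
    return beta * (lam_star_norm + math.sqrt(disc))


def quadratic(t, beta, lam_star_norm, S):
    """Left-hand side of the quadratic inequality in t."""
    return t * t / (2.0 * beta) - lam_star_norm * t - S


if __name__ == "__main__":
    t_max = feasibility_bound(beta=0.5, lam_star_norm=2.0, S=3.0)
    print(t_max, quadratic(t_max, 0.5, 2.0, 3.0))
```

Any feasibility gap \(t\) satisfying the inequality must lie below the returned value, so the bound shrinks as \(\beta\) and \(S\) decrease.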

4.1.2 Convergence Analysis of Algorithm 1

We provide full proofs of the lemmas and theorems related to the convergence of Algorithm 1. First, we prove the following key lemma, which will be used to prove Lemma 3.

Lemma 8

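Let \(\bar {\lambda }^{k+1}\) be generated by (SAMA). Then, for any \(\lambda\) and \(\tau_k \in [0, 1]\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq (1-\tau_k)d_{\gamma_{k+1}}(\bar{\lambda}^k) + \tau_k\hat{\ell}_{\gamma_{k+1}}(\lambda) + \frac{1}{\eta_k}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, (1-\tau_k)\bar{\lambda}^k + \tau_k\lambda - \hat{\lambda}^k\rangle \vspace{1ex}\\ &\quad - \left(\frac{1}{\eta_k} - \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2 - (1-\tau_k)\frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k) - \hat{u}^{k+1} \Vert^2, \end{array} \end{aligned} $$
(4.50)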

where

$$\displaystyle \begin{aligned} \begin{array}{ll} \hat{\ell}_{\gamma_{k+1}}(\lambda) &:= \varphi_{\gamma_{k+1}}(\hat{\lambda}^k) + \langle \nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k), \lambda - \hat{\lambda}^k\rangle + \psi(\lambda) \vspace{1ex}\\ & \leq d_{\gamma_{k+1}}(\lambda) - \frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\lambda) - \hat{u}^{k+1} \Vert^2. \end{array} \end{aligned} $$
(4.51)

In addition, for any \(z\) and \(\gamma_k, \gamma_{k+1} > 0\), the function \(g_{\gamma }^{\ast }\) defined by (4.11) satisfies

$$\displaystyle \begin{aligned} g^{\ast}_{\gamma_{k+1}}(z) \leq g^{\ast}_{\gamma_k}(z) + (\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(u^{\ast}_{\gamma_{k+1}}(z), \bar{u}^c). \end{aligned} $$
(4.52)

Proof

First, it is well known that SAMA is equivalent to a proximal-gradient step applied to the smoothed dual problem

$$\displaystyle \begin{aligned} \min_{\lambda}\left\{ \varphi_{\gamma_{k+1}}(\lambda) + \psi(\lambda) : \lambda\in\mathbb{R}^n\right\}. \end{aligned}$$

This proximal-gradient step can be presented as

$$\displaystyle \begin{aligned} \bar{\lambda}^{k+1} := \mathrm{prox}_{\eta_k\psi}\left(\hat{\lambda}^k - \eta_k\nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k)\right). \end{aligned}$$

The optimality condition of the minimization problem defining this step is

$$\displaystyle \begin{aligned} 0 \in \partial{\psi}(\bar{\lambda}^{k+1}) + \nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k) + \eta_k^{-1}(\bar{\lambda}^{k+1} - \hat{\lambda}^k). \end{aligned}$$

Using this condition and the convexity of \(\psi\), for the subgradient \(\nabla {\psi }(\bar {\lambda }^{k+1})\in \partial {\psi }(\bar {\lambda }^{k+1})\) that attains this inclusion and any \(\lambda\), we have

$$\displaystyle \begin{aligned} \psi(\bar{\lambda}^{k+1}) &\leq \psi(\lambda) + \langle \nabla{\psi}(\bar{\lambda}^{k+1}),\bar{\lambda}^{k+1} - \lambda\rangle \\ &= \psi(\lambda) + \langle \nabla{\varphi_{\gamma_{k+1}}}(\hat{\lambda}^k),\lambda - \bar{\lambda}^{k+1}\rangle + \eta_k^{-1}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, \lambda - \bar{\lambda}^{k+1}\rangle. \end{aligned} $$
(4.53)
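The proximal-gradient step and its optimality condition can be illustrated with a small sketch. Everything here is an assumption made for illustration: the nonsmooth term is replaced by the coordinate-wise absolute value (whose prox is soft-thresholding) rather than the chapter's \(\psi\), and the function names and input vectors are hypothetical.

```python
def soft_threshold(x, t):
    """Prox of t * ||.||_1 (stand-in for prox of eta * psi; an assumption)."""
    return [max(abs(xi) - t, 0.0) * (1.0 if xi > 0 else -1.0) for xi in x]


def prox_grad_step(lam_hat, grad_phi, eta, prox):
    """One step: lam_bar = prox_{eta*psi}(lam_hat - eta * grad_phi)."""
    shifted = [l - eta * g for l, g in zip(lam_hat, grad_phi)]
    return prox(shifted, eta)


def optimality_residual_l1(lam_bar, lam_hat, grad_phi, eta):
    """Largest coordinate-wise distance from
    r = -(grad_phi + (lam_bar - lam_hat) / eta)
    to the subdifferential of |.| at lam_bar; zero means the inclusion
    0 in d psi(lam_bar) + grad_phi + (lam_bar - lam_hat) / eta holds.
    """
    worst = 0.0
    for lb, lh, g in zip(lam_bar, lam_hat, grad_phi):
        r = -(g + (lb - lh) / eta)
        if lb > 0:
            dist = abs(r - 1.0)
        elif lb < 0:
            dist = abs(r + 1.0)
        else:
            dist = max(abs(r) - 1.0, 0.0)
        worst = max(worst, dist)
    return worst


if __name__ == "__main__":
    lam_hat = [1.0, -0.2, 0.05]     # hypothetical current dual point
    grad_phi = [0.5, -1.0, 0.3]     # hypothetical gradient of the smooth part
    lam_bar = prox_grad_step(lam_hat, grad_phi, 0.4, soft_threshold)
    print(lam_bar, optimality_residual_l1(lam_bar, lam_hat, grad_phi, 0.4))
```

The residual vanishing (up to rounding) is exactly the inclusion from which (4.53) is derived.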

Next, by the definition \(\varphi _{\gamma }(\lambda ) := g^{\ast }_{\gamma }(A^{\top }\lambda )\), we can show from (4.11) that \(\hat {u}^{k+1} = u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k)\). Since \(g^{\ast }_{\gamma }\) has a \((1/\gamma)\)-Lipschitz continuous gradient, we have

$$\displaystyle \begin{aligned} \frac{\gamma}{2}\Vert \nabla{g}^{\ast}_{\gamma}(z) - \nabla{g}^{\ast}_{\gamma}(\hat{z})\Vert^2 \leq g^{\ast}_{\gamma}(z) - g^{\ast}_{\gamma}(\hat{z}) - \langle \nabla{g}^{\ast}_{\gamma}(\hat{z}), z - \hat{z}\rangle \leq \frac{1}{2\gamma}\Vert z - \hat{z}\Vert^2. \end{aligned}$$

Using this inequality with γ := γ k+1, \(\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\lambda ) = u^{\ast }_{\gamma _{k+1}}(A^{\top }\lambda )\), \(\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\hat {\lambda }^k) = u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k) = \hat {u}^{k+1}\), and \(\nabla {\varphi _{\gamma _{k+1}}}(\lambda ) = A\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\lambda )\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} \frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\lambda) - \hat{u}^{k+1}\Vert^2 &\leq \varphi_{\gamma_{k+1}}(\lambda) - \varphi_{\gamma_{k+1}}(\hat{\lambda}^k)- \langle \nabla{\varphi}_{\gamma_{k+1}}(\hat{\lambda}^k), \lambda - \hat{\lambda}^{k}\rangle \vspace{1ex}\\ & \leq \frac{1}{2\gamma_{k+1}}\Vert A^{\top}(\lambda - \hat{\lambda}^k)\Vert^2 \leq \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \lambda - \hat{\lambda}^k\Vert^2. \end{array}\end{aligned} $$
(4.54)

Using (4.54) with \(\lambda = \bar {\lambda }^{k+1}\), we have

$$\displaystyle \begin{aligned} \varphi_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \leq \varphi_{\gamma_{k+1}}(\hat{\lambda}^k) + \langle \nabla{\varphi}_{\gamma_{k+1}}(\hat{\lambda}^k),\bar{\lambda}^{k+1} - \hat{\lambda}^k\rangle + \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2. \end{aligned} $$

Summing up this inequality and (4.53), then using the definition of \(\hat {\ell }_{\gamma _{k+1}}(\lambda )\) in (4.51), we obtain

$$\displaystyle \begin{aligned} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \leq \hat{\ell}_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, \lambda - \hat{\lambda}^k\rangle - \left(\tfrac{1}{\eta_k} - \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2. \end{aligned} $$
(4.55)

Here, the inequality in (4.51) follows from the first inequality of (4.54).

Now, using (4.55) with \(\lambda := \bar {\lambda }^k\), then combining with (4.51), we get

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq d_{\gamma_{k+1}}(\bar{\lambda}^k) + \frac{1}{\eta_k}\langle \bar{\lambda}^{k+1} - \hat{\lambda}^k, \bar{\lambda}^k - \hat{\lambda}^k\rangle - \left(\frac{1}{\eta_k} - \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2 \vspace{1ex}\\ &\quad - \frac{\gamma_{k+1}}{2}\Vert u^{\ast}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k) - \hat{u}^{k+1} \Vert^2. \end{array} \end{aligned} $$

Multiplying the last inequality by \(1 - \tau_k \in [0, 1]\) and (4.55) by \(\tau_k \in [0, 1]\), then summing up the results, we obtain (4.50).

Finally, from (4.11), \(g^{\ast }_{\gamma }(z) := \max _{u}\{P(u, \gamma ; z) := \langle z, u\rangle - g(u) - \gamma b_{\mathcal {U}}(u, \bar {u}^c)\}\) is the maximum over \(u\) of a function \(P\), parameterized by \(\gamma\) and \(z\), that is concave in \(u\) and linear in \(\gamma\). Hence, \(g^{\ast }_{\gamma }(z)\) is convex w.r.t. \(\gamma > 0\). Moreover, \(\frac {d g^{\ast }_{\gamma }(z)}{d\gamma } = -b_{\mathcal {U}}(u^{\ast }_{\gamma }(z), \bar {u}^c)\). Therefore, using the convexity of \(g^{\ast }_{\gamma }\) w.r.t. \(\gamma > 0\), we have \(g^{\ast }_{\gamma _k}(z) \geq g^{\ast }_{\gamma _{k+1}}(z) - (\gamma _k - \gamma _{k+1})b_{\mathcal {U}}(u^{\ast }_{\gamma_{k+1}}(z), \bar {u}^c)\), which is indeed (4.52). \(\square \)
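Inequality (4.52) can be sanity-checked on a one-dimensional toy instance. The sketch below assumes \(g(u) = |u|\), the quadratic prox-function \(b_{\mathcal{U}}(u, \bar{u}^c) = \frac{1}{2}(u - \bar{u}^c)^2\), and \(\bar{u}^c = 0\). These choices and all function names are illustrative assumptions, not the chapter's setting. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def u_star(z, gamma):
    """Maximizer of z*u - |u| - gamma*u**2/2: soft-thresholding scaled by 1/gamma."""
    z, gamma = Fraction(z), Fraction(gamma)
    mag = max(abs(z) - 1, Fraction(0))
    return (mag if z > 0 else -mag) / gamma


def g_star_smoothed(z, gamma):
    """Smoothed conjugate g*_gamma(z) for the toy choice g = |.|."""
    z, gamma = Fraction(z), Fraction(gamma)
    u = u_star(z, gamma)
    return z * u - abs(u) - gamma * u * u / 2


def prox_distance(u):
    """Toy prox-function b(u, 0) = u**2 / 2."""
    return Fraction(u) ** 2 / 2


def holds_452(z, gamma_k, gamma_next):
    """Check g*_{gamma_next}(z) <= g*_{gamma_k}(z) + (gamma_k - gamma_next) * b(u*_{gamma_next}(z))."""
    lhs = g_star_smoothed(z, gamma_next)
    rhs = g_star_smoothed(z, gamma_k) + (
        Fraction(gamma_k) - Fraction(gamma_next)
    ) * prox_distance(u_star(z, gamma_next))
    return lhs <= rhs


if __name__ == "__main__":
    print(holds_452(Fraction(5, 2), 2, 1), holds_452(-3, Fraction(1, 3), 4))
```

In this toy case the inequality reduces to \((1/\gamma_{k+1} - 1/\gamma_k)^2 \geq 0\), so it holds for increasing as well as decreasing \(\gamma\).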

4.1.2.1 Proof of Lemma 4: Bound on \(G_{\gamma\beta}\) for the First Iteration

Since \(\bar {w}^1 := (\bar {u}^1, \bar {v}^1, \bar {\lambda }^1)\) is updated by (4.19), similar to (SAMA), we can use (4.55) with k = 0, \(\lambda := \hat {\lambda }^0\) and \(\hat {\ell }_{\gamma _1}(\hat {\lambda }^0) \leq d_{\gamma _1}(\hat {\lambda }^0)\) to obtain

$$\displaystyle \begin{aligned} d_{\gamma_1}(\bar{\lambda}^1) \leq d_{\gamma_1}(\hat{\lambda}^0) - \left(\frac{1}{\eta_0} - \frac{\Vert A\Vert^2}{2\gamma_1}\right)\Vert \bar{\lambda}^1 - \hat{\lambda}^0\Vert^2. \end{aligned} $$
(4.56)

Since \(\bar {v}^1\) solves the second problem in (4.19) and \(v^{\ast }(\hat {\lambda }^0) \in \mathrm {dom}\left (h\right )\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} &h(v^{\ast}(\hat{\lambda}^0)) - \langle \hat{\lambda}^0,Bv^{\ast}(\hat{\lambda}^0)\rangle + \frac{\eta_0}{2}\Vert A\bar{u}^1 + Bv^{\ast}(\hat{\lambda}^0) - c\Vert^2 \geq h(\bar{v}^1) \vspace{1ex}\\ &\quad - \langle \hat{\lambda}^0,B\bar{v}^1\rangle + \frac{\eta_0}{2}\Vert A\bar{u}^1 + B\bar{v}^1 - c\Vert^2 + \frac{\eta_0}{2}\Vert B(v^{\ast}(\hat{\lambda}^0) - \bar{v}^1)\Vert^2. \end{array} \end{aligned} $$

Using \(D_f\) in (4.9), this inequality implies

(4.57)

Using the definition of \(d_{\gamma}\), we further estimate (4.56) using (4.57) as follows:

Since \(G_{\gamma _1\beta _1}(\bar {w}^1) = f_{\beta _1}(\bar {x}^1) + d_{\gamma _1}(\bar {\lambda }^1)\), we obtain (4.20) from the last inequality. If \(\beta _1 \geq \frac {2\gamma _1}{\eta _0(5\gamma _1 - 2\Vert A\Vert ^2\eta _0)}\), then (4.20) leads to \(G_{\gamma _1\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 + \frac {1}{\eta _0}\langle \hat {\lambda }^0, \bar {\lambda }^1 - \hat {\lambda }^0\rangle \). \(\square \)

4.1.2.2 Proof of Lemma 3: Gap Reduction Condition

For notational simplicity, we first define the following abbreviations

$$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \bar{z}^k &:= A\bar{u}^k + B\bar{v}^k - c \vspace{0.5ex}\\ \hat{z}^{k+1} &:= A\hat{u}^{k+1} + B\hat{v}^{k+1} - c \vspace{0.5ex}\\ \bar{u}_{k+1}^{*} &:= u^{*}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k)~~\text{the solution of (4.11) at }\bar{\lambda}^k, \vspace{0.5ex}\\ \hat{v}^{*}_k &:= v^{*}(\hat{\lambda}^k) \in\partial{h^{\ast}}(B^{\top}\hat{\lambda}^k) ~~\text{a subgradient of {$h^{\ast}$} defined by (4.5) at }B^{\top}\hat{\lambda}^k,\text{ and}\vspace{0.5ex}\\ D_k &:= \Vert A\hat{u}^{k+1} + B(2\hat{v}^{*}_k - \hat{v}^{k+1}) - c\Vert. \end{array}\right. \end{aligned}$$

From SAMA, we have \(\bar {\lambda }^{k+1} - \hat {\lambda }^k = \eta _k(c - A\hat {u}^{k+1} - B\hat {v}^{k+1}) = -\eta _k\hat {z}^{k+1}\). In addition, by (4.16), we have \(\hat {\lambda }^k = (1-\tau _k)\bar {\lambda }^k + \tau _k\lambda _k^{*}\), which leads to \((1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k - \hat {\lambda }^k = \tau _k(\hat {\lambda }^k - \lambda _k^{*})\). Substituting these expressions into (4.50) with \(\lambda := \hat {\lambda }^k\), and then using (4.51) in the form \(\hat {\ell }_{\gamma _{k+1}}(\hat {\lambda }^k) \leq d_{\gamma _{k+1}}(\hat {\lambda }^k)\), we obtain

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq (1-\tau_k)d_{\gamma_{k+1}}(\bar{\lambda}^k) + \tau_kd_{\gamma_{k+1}}(\hat{\lambda}^k) + \tau_k\langle \hat{z}^{k+1}, \lambda_k^{*} - \hat{\lambda}^k\rangle \vspace{1ex}\\ &\quad - \eta_k\left(1 - \frac{\eta_k\Vert A\Vert^2}{2\gamma_{k+1}}\right)\Vert\hat{z}^{k+1}\Vert^2 - (1-\tau_k)\frac{\gamma_{k+1}}{2}\Vert \bar{u}^{\ast}_{k+1} - \hat{u}^{k+1}\Vert^2. \end{array} \end{aligned} $$
(4.58)

By (4.52) and the fact that \(\varphi _{\gamma }(\lambda ) := g^{\ast }_{\gamma }(A^{\top }\lambda )\), for any \(\gamma_{k+1} > 0\) and \(\gamma_k > 0\), we have

$$\displaystyle \begin{aligned} \varphi_{\gamma_{k+1}}(\bar{\lambda}^k) \leq \varphi_{\gamma_k}(\bar{\lambda}^k) + (\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}_{k+1}^{\ast}, \bar{u}^c). \end{aligned}$$

Using this inequality and the fact that \(d_{\gamma} := \varphi_{\gamma} + \psi\), we have

$$\displaystyle \begin{aligned} d_{\gamma_{k+1}}(\bar{\lambda}^k) \leq d_{\gamma_k}(\bar{\lambda}^k) + (\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}_{k+1}^{\ast}, \bar{u}^c). \end{aligned} $$
(4.59)

Next, using \(\hat {v}^{k+1}\) from SAMA and its optimality condition, we can show that

$$\displaystyle \begin{aligned}\begin{array}{ll} &h^{\ast}(B^{\top}\hat{\lambda}^k) - \frac{\eta_k}{2}\Vert A\hat{u}^{k+1} + B\hat{v}^{*}_k - c\Vert^2 = \langle B^{\top}\hat{\lambda}^k, \hat{v}^{*}_k\rangle - h(\hat{v}^{*}_k) - \frac{\eta_k}{2}\Vert A\hat{u}^{k+1} + B\hat{v}^{*}_k - c\Vert^2 \vspace{1ex}\\ &\quad \leq \langle B^{\top}\hat{\lambda}^k, \hat{v}^{k+1}\rangle - h(\hat{v}^{k+1}) - \frac{\eta_k}{2}\Vert A\hat{u}^{k+1} + B\hat{v}^{k+1} - c\Vert^2 - \frac{\eta_k}{2}\Vert B(\hat{v}^{*}_k - \hat{v}^{k+1})\Vert^2. \end{array}\end{aligned} $$

Since \(\psi(\lambda) := h^{\ast}(B^{\top}\lambda) - c^{\top}\lambda\), this inequality leads to

Now, by this estimate, \(d_{\gamma _{k+1}} = \varphi _{\gamma _{k+1}} + \psi \) and SAMA, we can derive

Combining this inequality, (4.58) and (4.59), we obtain

(4.60)

Now, using the definition of \(G_k\), we have

$$\displaystyle \begin{aligned} \begin{array}{ll} G_k(\bar{w}^k) &:= f_{\beta_k}(\bar{x}^k) + d_{\gamma_k}(\bar{\lambda}^k) = f(\bar{x}^k) + d_{\gamma_k}(\bar{\lambda}^k) + \frac{1}{2\beta_k}\Vert A\bar{u}^k + B\bar{v}^k - c\Vert^2 \vspace{0.5ex}\\ & = f(\bar{x}^k) + d_{\gamma_k}(\bar{\lambda}^k) + \frac{1}{2\beta_k}\Vert\bar{z}^k\Vert^2. \end{array} \end{aligned}$$

Let us define \(\varDelta {G}_k := (1-\tau _k)G_k(\bar {w}^k) - G_{k+1}(\bar {w}^{k+1})\). Then, we can show that

$$\displaystyle \begin{aligned} \begin{array}{ll} \varDelta{G}_k &= (1-\tau_k)f(\bar{x}^k) + (1-\tau_k)d_{\gamma_k}(\bar{\lambda}^k) - f(\bar{x}^{k+1}) - d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \vspace{1ex}\\ &\quad + \frac{(1-\tau_k)}{2\beta_k}\Vert \bar{z}^k\Vert^2 - \frac{1}{2\beta_{k+1}}\Vert\bar{z}^{k+1}\Vert^2. \end{array} \end{aligned} $$
(4.61)

By (4.16), we have \(\bar {z}^{k+1} = (1-\tau _k)\bar {z}^k + \tau _k\hat {z}^{k+1}\). Using this expression and the condition \(\beta_{k+1} \geq (1 - \tau_k)\beta_k\) in (4.17), we can easily show that

$$\displaystyle \begin{aligned} \frac{(1 - \tau_k)}{2\beta_k}\Vert\bar{z}^k\Vert^2 - \frac{1}{2\beta_{k+1}}\Vert\bar{z}^{k+1}\Vert^2 \geq - \frac{\tau_k}{\beta_{k}}\langle \hat{z}^{k+1}, \bar{z}^k\rangle - \frac{\tau_k^2}{2\beta_{k}(1-\tau_k)}\Vert\hat{z}^{k+1}\Vert^2. \end{aligned}$$
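For completeness, the expansion behind this bound can be written out as follows. It is a sketch under the assumptions \(\tau_k \in (0, 1)\) and \(\beta_k > 0\), so that \(\frac{1}{\beta_{k+1}} \leq \frac{1}{(1-\tau_k)\beta_k}\):

$$\displaystyle \begin{aligned} \frac{1}{2\beta_{k+1}}\Vert\bar{z}^{k+1}\Vert^2 &\leq \frac{1}{2(1-\tau_k)\beta_k}\Vert (1-\tau_k)\bar{z}^k + \tau_k\hat{z}^{k+1}\Vert^2 \\ &= \frac{1-\tau_k}{2\beta_k}\Vert\bar{z}^k\Vert^2 + \frac{\tau_k}{\beta_k}\langle \hat{z}^{k+1}, \bar{z}^k\rangle + \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\Vert\hat{z}^{k+1}\Vert^2. \end{aligned} $$

Subtracting both sides from \(\frac{1-\tau_k}{2\beta_k}\Vert\bar{z}^k\Vert^2\) gives the stated inequality.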

Substituting this inequality into (4.61), and using the convexity of f, we further get

(4.62)

Substituting (4.60) into (4.62) and using \(\lambda ^{*}_k := \frac {1}{\beta _k}(c - A\bar {u}^k - B\bar {v}^k) = -\frac {1}{\beta _k}\bar {z}^k\), we obtain

$$\displaystyle \begin{aligned} \varDelta{G}_k \geq \Big[ \eta_k\Big(1 + \frac{\tau_k}{2} - \frac{\Vert A\Vert^2\eta_k}{2\gamma_{k+1}}\Big) - \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\Big]\Vert\hat{z}^{k+1}\Vert^2 + R_k - \frac{\tau_k\eta_k}{2}\Vert \hat{z}^{k+1}\Vert D_k. \end{aligned} $$
(4.63)

where

$$\displaystyle \begin{aligned} R_k := \tfrac{1-\tau_k}{2}\gamma_{k+1}\Vert \bar{u}^{\ast}_{k+1}-\hat{u}^{k+1}\Vert^2 + \tau_k\gamma_{k+1}b_{\mathcal{U}}(\hat{u}^{k+1}, \bar{u}^c) - (1-\tau_k)(\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}^{\ast}_{k+1}, \bar{u}^c). \end{aligned}$$

Furthermore, we have

$$\displaystyle \begin{aligned} \frac{\eta_k}{4}\Vert \hat{z}^{k+1}\Vert^2 - \frac{\tau_k\eta_k}{2}\Vert \hat{z}^{k+1}\Vert D_k = \frac{\eta_k}{4}\big[\Vert \hat{z}^{k+1}\Vert - \tau_kD_k\big]^2 - \frac{\eta_k\tau_k^2D_k^2}{4} \geq - \frac{\eta_k\tau_k^2D_k^2}{4}. \end{aligned}$$

Using this estimate into (4.63), we finally get

$$\displaystyle \begin{aligned} \varDelta{G}_k \geq \Big[ \eta_k\Big(\frac{3}{4} + \frac{\tau_k}{2} - \frac{\Vert A\Vert^2\eta_k}{2\gamma_{k+1}}\Big) - \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\Big]\Vert\hat{z}^{k+1}\Vert^2 + R_k - \frac{\eta_k\tau_k^2D_k^2}{4}. \end{aligned} $$
(4.64)

Next, we estimate \(R_k\). Let \(\bar {a}_k := \bar {u}^{*}_{k+1} - \bar {u}^c\) and \(\hat {a}_k := \hat {u}^{k+1} - \bar {u}^c\). Using the smoothness of \(b_{\mathcal {U}}\), we can estimate \(R_k\) explicitly as

$$\displaystyle \begin{aligned} \begin{array}{ll} 2\gamma_{k+1}^{-1}R_k & \geq (1-\tau_k)\Vert \bar{a}_k - \hat{a}_k\Vert^2 - (1-\tau_k)(\gamma_{k+1}^{-1}\gamma_{k} - 1)L_b\Vert \bar{a}_k\Vert^2 + \tau_k\Vert \hat{a}_k\Vert^2\vspace{1ex}\\ & = \Vert\hat{a}_k - (1-\tau_k)\bar{a}_k\Vert^2 + (1-\tau_k)\left(\tau_k - (\gamma_{k+1}^{-1}\gamma_{k} - 1)L_b\right)\Vert\bar{a}_k\Vert^2. \end{array} \end{aligned} $$
(4.65)
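The equality in (4.65) rests on the following elementary identity, spelled out here as a sketch for the reader's convenience:

$$\displaystyle \begin{aligned} (1-\tau_k)\Vert \bar{a}_k - \hat{a}_k\Vert^2 + \tau_k\Vert \hat{a}_k\Vert^2 &= \Vert \hat{a}_k\Vert^2 - 2(1-\tau_k)\langle \bar{a}_k, \hat{a}_k\rangle + (1-\tau_k)\Vert \bar{a}_k\Vert^2 \\ &= \Vert \hat{a}_k - (1-\tau_k)\bar{a}_k\Vert^2 + \tau_k(1-\tau_k)\Vert \bar{a}_k\Vert^2. \end{aligned} $$

Combining the last term with \(-(1-\tau_k)(\gamma_{k+1}^{-1}\gamma_{k} - 1)L_b\Vert \bar{a}_k\Vert^2\) gives the coefficient \((1-\tau_k)\left(\tau_k - (\gamma_{k+1}^{-1}\gamma_{k} - 1)L_b\right)\).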

By the condition \((1+L_b^{-1}\tau _k)\gamma _{k+1} \geq \gamma _k\) in (4.17), we have \(\tau _k - (\gamma _{k+1}^{-1}\gamma _{k} - 1)L_b\geq 0\). Using this condition in (4.65), we obtain \(R_k \geq 0\). Finally, by (4.9) we can show that \(D_k \leq D_f\). Using this inequality, \(R_k \geq 0\), and the second condition of (4.17), we can show from (4.64) that \(\varDelta {G}_k \geq -\frac {\eta _k\tau _k^2}{4}D_f^2\), which implies (4.18). \(\square \)

4.1.2.3 Proof of Lemma 5: Parameter Updates

The tightest update for \(\gamma_k\) and \(\beta_k\) is \(\gamma _{k+1} := \frac {\gamma _k}{\tau _k+1}\) and \(\beta_{k+1} := (1 - \tau_k)\beta_k\) due to (4.17). Using these updates in the third condition in (4.17) leads to \(\frac {(1-\tau _{k+1})^2}{(1+\tau _{k+1})\tau _{k+1}^2} \geq \frac {1-\tau _k}{\tau _k^2}\). By directly checking this condition, we can see that \(\tau _k = \mathcal {O}(1/k)\), which is the optimal choice.

Clearly, if we choose \(\tau _k := \frac {3}{k+4}\), then \(0 < \tau_k < 1\) for \(k \geq 0\) and \(\tau_0 = 3/4\). Next, we choose \(\gamma _{k+1} := \frac {\gamma _k}{1+\tau _k/3} \geq \frac {\gamma _k}{1+\tau _k}\). Substituting \(\tau _k = \frac {3}{k+4}\) into this formula, we have \(\gamma _{k+1} = \left (\frac {k+4}{k+5}\right )\gamma _k\). By induction, we obtain \(\gamma _{k+1} = \frac {5\gamma _1}{k+5}\). This implies \(\eta _k = \frac {5\gamma _1}{2\Vert A\Vert ^2(k+5)}\). With \(\tau _k = \frac {3}{k+4}\) and \(\gamma _{k+1} = \frac {5\gamma _1}{k+5}\), we choose \(\beta_k\) from the third condition of (4.17) as \(\beta _k = \frac {2\Vert A\Vert ^2\tau _k^2}{(1-\tau _k^2)\gamma _{k+1}} = \frac {18\Vert A\Vert ^2(k+5)}{5\gamma _1(k+1)(k+7)}\) for \(k \geq 1\). Using the values of \(\tau_k\) and \(\beta_k\), we need to check the second condition \(\beta_{k+1} \geq (1 - \tau_k)\beta_k\) of (4.17). Indeed, this condition is equivalent to \(2k^2 + 28k + 88 \geq 0\), which is true for all \(k \geq 0\). From the update rule of \(\beta_k\), it is obvious that \(\beta _k \leq \frac {18\Vert A\Vert ^2}{5\gamma _1(k+1)}\). \(\square \)
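The closed forms in this proof can be verified with exact rational arithmetic. The sketch below is a check of the algebra only. The function names are hypothetical, and the test values \(\gamma_1 = 1\) and \(\Vert A\Vert^2 = 4\) are arbitrary assumptions.

```python
from fractions import Fraction


def tau(k):
    """Step parameter tau_k = 3 / (k + 4)."""
    return Fraction(3, k + 4)


def gamma_next_closed_form(k, gamma1):
    """Closed form gamma_{k+1} = 5 * gamma_1 / (k + 5)."""
    return 5 * Fraction(gamma1) / (k + 5)


def beta(k, gamma1, norm_A_sq):
    """beta_k = 2 ||A||^2 tau_k^2 / ((1 - tau_k^2) gamma_{k+1})."""
    t = tau(k)
    return 2 * Fraction(norm_A_sq) * t**2 / (
        (1 - t**2) * gamma_next_closed_form(k, gamma1)
    )


def check_updates(gamma1, norm_A_sq, num_iters):
    """Return True if every identity and inequality of the proof holds for k = 1..num_iters."""
    gamma1, norm_A_sq = Fraction(gamma1), Fraction(norm_A_sq)
    gamma = gamma1  # starts at gamma_1
    for k in range(1, num_iters + 1):
        gamma = gamma / (1 + tau(k) / 3)  # recursion yields gamma_{k+1}
        closed_beta = 18 * norm_A_sq * (k + 5) / (5 * gamma1 * (k + 1) * (k + 7))
        ok = (
            gamma == gamma_next_closed_form(k, gamma1)
            and beta(k, gamma1, norm_A_sq) == closed_beta
            and beta(k + 1, gamma1, norm_A_sq) >= (1 - tau(k)) * closed_beta
            and closed_beta <= 18 * norm_A_sq / (5 * gamma1 * (k + 1))
        )
        if not ok:
            return False
    return True


if __name__ == "__main__":
    print(check_updates(gamma1=1, norm_A_sq=4, num_iters=100))
```

Using `Fraction` keeps every comparison exact, so a `True` result confirms the identities for the tested range rather than up to floating-point error.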

4.1.2.4 Proof of Theorem 1: Convergence of Algorithm 1

We estimate the term \(\tau _k^2\eta _k\) in (4.18) as

$$\displaystyle \begin{aligned} \tau_k^2\eta_k = \frac{45\gamma_1}{2\Vert A\Vert^2(k+4)^2(k+5)} < \frac{45\gamma_1}{2\Vert A\Vert^2(k+4)(k+5)} - \left(1 - \tau_k\right)\frac{45\gamma_1}{2\Vert A\Vert^2(k+3)(k+4)}. \end{aligned} $$

Combining this estimate and (4.18), we get

$$\displaystyle \begin{aligned} G_{k+1}(\bar{w}^{k+1}) - \frac{45\gamma_1D_f^2}{8\Vert A\Vert^2(k+4)(k+5)} \leq (1-\tau_k)\left[G_k(\bar{w}^k) - \frac{45\gamma_1D_f^2}{8\Vert A\Vert^2(k+3)(k+4)}\right]. \end{aligned}$$

By induction, we have \(G_k(\bar {w}^k) - \frac {45\gamma _1D_f^2}{8\Vert A\Vert ^2(k+3)(k+4)} \leq \omega _k[G_1(\bar {w}^1) - \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2] \leq 0\) whenever \(G_1(\bar {w}^1) \leq \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\), where \(\omega _k := \prod _{i=1}^{k-1}(1-\tau _i)\). Hence, we finally get

$$\displaystyle \begin{aligned} G_{k}(\bar{w}^{k}) \leq \frac{45\gamma_1D_f^2}{8\Vert A\Vert^2(k+3)(k+4)}. \end{aligned} $$
(4.66)

Since \(\eta _0 = \frac {\gamma _1}{2\Vert A\Vert ^2}\), it satisfies the condition \(5\gamma_1 > 2\eta_0\Vert A\Vert^2\) in Lemma 4. In addition, from Lemma 5, we have \(\beta _1 = \frac {27\Vert A\Vert ^2}{20\gamma _1} > \frac {\Vert A\Vert ^2}{\gamma _1}\), which satisfies the second condition in Lemma 4. We also note that \(\beta _k \leq \frac {18\Vert A\Vert ^2}{5\gamma _1(k+1)}\). If we take \(\hat {\lambda }^0 = \boldsymbol {0}^m\), then Lemma 4 shows that \(G_{\gamma _1\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{2}D_f^2 = \frac {\gamma _1}{4\Vert A\Vert ^2}D_f^2 < \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\). Substituting this estimate and (4.66) into Lemma 2, we obtain (4.23). Finally, if we choose \(\gamma_1 := \Vert A\Vert\), then the worst-case iteration-complexity of Algorithm 1 is \(\mathcal {O}(\varepsilon ^{-1})\). \(\square \)
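The induction behind (4.66) can be illustrated by iterating the recursion in its worst case. The sketch below assumes (4.18) has the form \(G_{k+1}(\bar{w}^{k+1}) \leq (1-\tau_k)G_k(\bar{w}^k) + \frac{\eta_k\tau_k^2}{4}D_f^2\), as suggested by the estimate \(\varDelta{G}_k \geq -\frac{\eta_k\tau_k^2}{4}D_f^2\) in the proof of Lemma 3, and treats it as an equality started from the largest admissible \(G_1\). Function names and test values are hypothetical.

```python
from fractions import Fraction


def bound_466(k, gamma1, norm_A_sq, D_f_sq):
    """Right-hand side of (4.66): 45 gamma_1 D_f^2 / (8 ||A||^2 (k+3)(k+4))."""
    return 45 * Fraction(gamma1) * Fraction(D_f_sq) / (
        8 * Fraction(norm_A_sq) * (k + 3) * (k + 4)
    )


def worst_case_gap(num_iters, gamma1, norm_A_sq, D_f_sq):
    """Return [G_1, ..., G_num_iters] for the recursion taken with equality.

    Assumed recursion: G_{k+1} = (1 - tau_k) G_k + (eta_k tau_k^2 / 4) D_f^2,
    with tau_k = 3/(k+4), eta_k = 5 gamma_1 / (2 ||A||^2 (k+5)), and
    G_1 = 9 gamma_1 D_f^2 / (32 ||A||^2).
    """
    gamma1, norm_A_sq, D_f_sq = Fraction(gamma1), Fraction(norm_A_sq), Fraction(D_f_sq)
    G = 9 * gamma1 * D_f_sq / (32 * norm_A_sq)
    history = [G]
    for k in range(1, num_iters):
        tau_k = Fraction(3, k + 4)
        eta_k = 5 * gamma1 / (2 * norm_A_sq * (k + 5))
        G = (1 - tau_k) * G + eta_k * tau_k**2 * D_f_sq / 4
        history.append(G)
    return history


if __name__ == "__main__":
    gaps = worst_case_gap(200, gamma1=1, norm_A_sq=4, D_f_sq=2)
    print(all(g <= bound_466(k, 1, 4, 2) for k, g in enumerate(gaps, start=1)))
```

Even in this worst case the iterates stay below the \(\mathcal{O}(1/k^2)\) envelope of (4.66), which is what the telescoping estimate for \(\tau_k^2\eta_k\) guarantees.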

4.1.3 Proof of Corollary 1: Strong Convexity of g

First, we show that if condition (4.24) holds, then (4.25) holds. Since \(\nabla\varphi\) given by (4.5) is Lipschitz continuous with \(L_{d^g_0} := \mu_g^{-1}\Vert A\Vert^2\), similar to the proof of Lemma 3, we have

$$\displaystyle \begin{aligned} \varDelta{G_{\beta_k}} \geq \left[ \eta_k\left(\frac{3}{4} + \frac{\tau_k}{2} - \frac{\eta_k\Vert A\Vert^2}{2\mu_g}\right) - \frac{\tau_k^2}{2(1-\tau_k)\beta_k}\right]\Vert\hat{z}^{k+1}\Vert^2 - \frac{\tau_k^2\eta_k}{4}D_f^2, \end{aligned} $$
(4.67)

where \(\varDelta {G_{\beta _k}} := (1-\tau _k)G_{\beta _k}(\bar {w}^k) - G_{\beta _{k+1}}(\bar {w}^{k+1})\). Under the condition (4.24), (4.67) implies (4.25).

The update rule (4.27) is in fact derived from (4.24). We finally prove the bounds (4.28). First, we consider the product \(\tau ^2_k\eta _k\). By (4.27) we have

$$\displaystyle \begin{aligned} \tau_k^2\eta_k &= \frac{9\mu_g}{2\Vert A\Vert^2(k+4)^2} < \frac{9\mu_g}{2\Vert A\Vert^2(k+3)(k+4)} \\&= \frac{9\mu_g}{4\Vert A\Vert^2(k+4)} - (1-\tau_k)\frac{9\mu_g}{4\Vert A\Vert^2(k+3)} \end{aligned} $$

By induction, it follows from (4.25) and this last expression that:

$$\displaystyle \begin{aligned} G_{\beta_k}(\bar{w}^k) - \frac{9\mu_gD_f^2}{16\Vert A\Vert^2(k+3)} \leq \omega_k\Big(G_{\beta_1}(\bar{w}^1) - \frac{9\mu_gD_f^2}{64\Vert A\Vert^2}\Big) \leq 0, \end{aligned} $$
(4.68)

whenever \(G_{\beta _1}(\bar {w}^1) \leq \frac {9\mu _gD_f^2}{64\Vert A\Vert ^2}\). Since \(\bar {u}^1\) is given by (4.26), with the same argument as the proof of Lemma 4, we can show that if \(\frac {1}{\beta _1} \leq \frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g}\), then \(G_{\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2\). However, from the update rule (4.27), we can see that \(\eta _0 = \frac {\mu _g}{2\Vert A\Vert ^2}\) and \(\beta _1 = \frac {18\Vert A\Vert ^2}{16\mu _g}\). Using these quantities, we can clearly show that \(\frac {1}{\beta _1} \leq \frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g} = \frac {\mu _g}{\Vert A\Vert ^2}\). Moreover, \(G_{\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 < \frac {9\mu _g}{64\Vert A\Vert ^2}D_f^2\). Hence, (4.68) holds. Finally, it remains to use Lemma 2 to obtain (4.28). The second part in (4.30) is proved similarly. The estimate (4.31) is a direct consequence of (4.68). \(\square \)
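The two elementary facts used in this proof, the telescoping identity for \(\tau_k^2\eta_k\) and the initial condition on \(\beta_1\), can be verified exactly. The sketch below assumes the normalization \(\mu_g = \Vert A\Vert^2 = 1\), so that \(\eta_0 = 1/2\) and \(\beta_1 = 18/16\); the function names are ours.

```python
from fractions import Fraction

def telescoping_identity_holds(k):
    tau = Fraction(3, k + 4)
    middle = Fraction(9, 2 * (k + 3) * (k + 4))
    right = Fraction(9, 4 * (k + 4)) - (1 - tau) * Fraction(9, 4 * (k + 3))
    # tau_k^2 * eta_k = 9 / (2 (k+4)^2) must be strictly below the middle term
    strict = Fraction(9, 2 * (k + 4) ** 2) < middle
    return middle == right and strict

def initial_condition_holds():
    eta0 = Fraction(1, 2)      # mu_g / (2 ||A||^2), normalized (assumption)
    beta1 = Fraction(18, 16)   # beta_1 from the update rule, normalized
    rhs = Fraction(5, 2) * eta0 - eta0 ** 2   # 5 eta_0 / 2 - ||A||^2 eta_0^2 / mu_g
    # 1/beta_1 <= rhs = 1, and eta_0 / 4 < 9/64
    return 1 / beta1 <= rhs and rhs == 1 and eta0 / 4 < Fraction(9, 64)
```

Both functions return `True` for the parameter choices of (4.27) under the stated normalization.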

4.1.4 Convergence Analysis of Algorithm 2

This appendix provides the full proofs of the lemmas and theorems related to the convergence of Algorithm 2.

4.1.4.1 Proof of Lemma 6: Gap Reduction Condition

We first need the following key lemma to analyze the convergence of our SADMM scheme. Its proof is similar to that of (4.55), so we omit the details.

Lemma 9

Let \(\bar{\lambda}^{k+1}\) be generated by SADMM. Then, for \(\lambda \in \mathbb{R}^n\), one has

$$\displaystyle \begin{aligned} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) \leq \tilde{\ell}_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} -\hat{\lambda}^k, \lambda - \hat{\lambda}^k\rangle - \tfrac{1}{\eta_k}\Vert \hat{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2 + \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \tilde{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2, \end{aligned} $$

where \(\tilde {\lambda }^k := \hat {\lambda }^k - \rho _k(A\hat {u}^{k+1} + B\hat {v}^k - c)\) and \(\tilde {\ell }_{\gamma }(\lambda ) := \varphi _{\gamma }(\tilde {\lambda }^k) + \langle \nabla {\varphi _{\gamma }}(\tilde {\lambda }^k), \lambda - \tilde {\lambda }^k\rangle + \psi (\lambda )\).

Now, we can prove Lemma 6. We use the same notation as in the proof of Lemma 3. In addition, we denote \(\hat{u}^{*}_{k+1} := u^{\ast}_{\gamma_{k+1}}(A^{\top}\hat{\lambda}^k)\) and \(\bar{u}^{\ast}_{k+1} := u^{\ast}_{\gamma_{k+1}}(A^{\top}\bar{\lambda}^k)\), as given in (4.12), together with \(\tilde{z}^k := A\hat{u}^{k+1} + B\hat{v}^k - c\) and \(\breve{D}_k := \Vert A\hat{u}^{*}_{k+1} + B\hat{v}^k - c\Vert\).

First, since \(\varphi _{\gamma }(\tilde {\lambda }^k) + \langle \nabla {\varphi _{\gamma }}(\tilde {\lambda }^k), \lambda - \tilde {\lambda }^k\rangle \leq \varphi _{\gamma }(\lambda )\), it follows from Lemma 9 that

$$\displaystyle \begin{aligned} \begin{array}{ll} d_{\gamma_{k+1}}(\bar{\lambda}^{k+1}) &\leq d_{\gamma_{k+1}}(\lambda) + \tfrac{1}{\eta_k}\langle \bar{\lambda}^{k+1} -\hat{\lambda}^k, \lambda - \hat{\lambda}^k\rangle - \tfrac{1}{\eta_k}\Vert \hat{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2 \vspace{1ex}\\ & \quad + \tfrac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \tilde{\lambda}^k - \bar{\lambda}^{k+1}\Vert^2. \end{array} \end{aligned} $$
(4.69)

Next, using [26, Theorem 2.1.5 (2.1.10)] with \(g^{\ast}_{\gamma}\) defined in (4.11) and \(\lambda := (1-\tau_k)\bar{\lambda}^k + \tau_k\hat{\lambda}^k\) for any \(\tau_k \in [0, 1]\), we have

$$\displaystyle \begin{aligned} \varphi_{\gamma_{k+1}}(\lambda) \leq (1-\tau_k)\varphi_{\gamma_{k+1}}(\bar{\lambda}^k) + \tau_k\varphi_{\gamma_{k+1}}(\hat{\lambda}^k) - \frac{\tau_k(1-\tau_k)\gamma_{k+1}}{2}\Vert \hat{u}^{*}_{k+1} - \bar{u}^{*}_{k+1}\Vert^2. \end{aligned} $$
(4.70)

Since \(\psi\) is convex, we also have \(\psi(\lambda) \leq (1-\tau_k)\psi(\bar{\lambda}^k) + \tau_k\psi(\hat{\lambda}^k)\) and \(\lambda - \hat{\lambda}^k = (1-\tau_k)\bar{\lambda}^k + \tau_k\hat{\lambda}^k - \hat{\lambda}^k = \tau_k(\hat{\lambda}^k - \lambda^{\ast}_k)\) due to (4.33). Combining these expressions, the definition \(d_{\gamma} := \varphi_{\gamma} + \psi\), (4.69), and (4.70), we can derive

(4.71)

On the one hand, since \(\hat {u}^{k+1}\) is the solution of the first convex subproblem in SADMM, using its optimality condition, we can show that

$$\displaystyle \begin{aligned} \begin{array}{ll} \varphi_{\gamma_{k+1}}(\hat{\lambda}^k) - \frac{\rho_k}{2}\breve{D}_k^2 &= \langle \hat{\lambda}^k, A\hat{u}^{*}_{k+1}\rangle - g(\hat{u}^{*}_{k+1}) - \gamma_{k+1}b_{\mathcal{U}}(\hat{u}^{*}_{k+1},\bar{u}_c) - \frac{\rho_k}{2}\breve{D}_k^2\vspace{1ex}\\ &\leq \langle \hat{\lambda}^k,A\hat{u}^{k+1}\rangle - g(\hat{u}^{k+1}) - \frac{\rho_k}{2}\Vert\tilde{z}^k\Vert^2 - \gamma_{k+1}b_{\mathcal{U}}(\hat{u}^{k+1}, \bar{u}_c)\vspace{1ex}\\ &\quad - \frac{\rho_k}{2}\Vert A(\hat{u}^{*}_{k+1} - \hat{u}^{k+1})\Vert^2 - \frac{\gamma_{k+1}}{2}\Vert \hat{u}^{*}_{k+1} - \hat{u}^{k+1} \Vert^2. \end{array} \end{aligned} $$
(4.72)

On the other hand, similar to the proof of Lemma 3, we can show that

(4.73)

Combining (4.72) and (4.73) and noting that \(d_{\gamma} := \varphi_{\gamma} + \psi\), we have

(4.74)

Next, using the strong convexity of \(b_{\mathcal {U}}\) with \(\mu _{b_{\mathcal {U}}} = 1\), we can show that

(4.75)

Combining (4.71), (4.59), (4.74) and (4.75), we can derive

(4.76)
$$\displaystyle \begin{aligned} \begin{array}{ll} \hat{R}_k &:= \frac{\gamma_{k+1}}{2}(1-\tau_k)\tau_k\Vert \hat{u}^{\ast}_{k+1} - \bar{u}^{\ast}_{k+1}\Vert^2 + \frac{\gamma_{k+1}}{4}\tau_k\Vert \hat{u}^{\ast}_{k+1} - \bar{u}_c \Vert^2 \vspace{1ex}\\ &\quad - (1 - \tau_k)(\gamma_k - \gamma_{k+1})b_{\mathcal{U}}(\bar{u}^{\ast}_{k+1}, \bar{u}_c). \end{array} \end{aligned} $$
(4.77)

From SADMM, we have \(\bar {\lambda }^{k+1} - \hat {\lambda }^k = -\eta _k\hat {z}^{k+1}\) and \(\tilde {\lambda }^k - \hat {\lambda }^k = -\rho _k\tilde {z}^k\). Plugging these expressions and (4.77) into (4.76) we can simplify this estimate as

(4.78)

Using again the elementary inequality \(\nu \Vert a\Vert ^2 + \kappa \Vert b\Vert ^2 \geq \frac {\nu \kappa }{\nu +\kappa }\Vert a - b\Vert ^2\), under the condition \(\gamma _{k+1} \geq \Vert A\Vert ^2\left (\eta _k + \frac {\rho _k}{\tau _k}\right )\) in (4.34), we can show that

$$\displaystyle \begin{aligned} \frac{1}{2\eta_k}\Vert \bar{\lambda}^{k+1} - \hat{\lambda}^k\Vert^2 + \frac{\tau_k}{2\rho_k}\Vert \tilde{\lambda}^k - \hat{\lambda}^k\Vert^2 - \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\Vert \bar{\lambda}^{k+1} - \tilde{\lambda}^k\Vert^2 \geq 0. \end{aligned} $$
(4.79)
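To see why (4.79) follows, apply the elementary inequality with \(a := \bar{\lambda}^{k+1} - \hat{\lambda}^k\), \(b := \tilde{\lambda}^k - \hat{\lambda}^k\), \(\nu := \frac{1}{2\eta_k}\) and \(\kappa := \frac{\tau_k}{2\rho_k}\), which gives \(\frac{\nu\kappa}{\nu+\kappa} = \frac{1}{2(\eta_k + \rho_k/\tau_k)} \geq \frac{\Vert A\Vert^2}{2\gamma_{k+1}}\) under the stated condition. The elementary inequality itself holds because the gap between its two sides equals \(\frac{\Vert \nu a + \kappa b\Vert^2}{\nu + \kappa}\). The following Python sketch checks it on random data; the helper names, sampling ranges and tolerance are arbitrary choices made for illustration.

```python
import random

def sq_norm(x):
    # Squared Euclidean norm of a list of floats
    return sum(t * t for t in x)

def elementary_gap(nu, kappa, a, b):
    # nu*||a||^2 + kappa*||b||^2 - nu*kappa/(nu+kappa) * ||a - b||^2
    diff = [s - t for s, t in zip(a, b)]
    return (nu * sq_norm(a) + kappa * sq_norm(b)
            - nu * kappa / (nu + kappa) * sq_norm(diff))

def check_elementary_inequality(trials=1000, dim=5, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        nu, kappa = rng.uniform(0.01, 10.0), rng.uniform(0.01, 10.0)
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Allow a small negative value caused by floating-point rounding
        if elementary_gap(nu, kappa, a, b) < -1e-9:
            return False
    return True
```

Equality is attained when \(\nu a = -\kappa b\), for example `elementary_gap(1.0, 1.0, [1.0], [-1.0])` evaluates to zero.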

On the other hand, similar to the proof of Lemma 3, we can show that \(\frac {\eta _k}{4}\Vert \hat {z}^{k+1}\Vert ^2 - \frac {\tau _k\eta _k}{2}\Vert \hat {z}^{k+1}\Vert D_k \geq - \frac {\eta _k\tau _k^2}{4}D_k^2\). Using this inequality, (4.79), and \(\lambda ^{*}_k = -\frac {1}{\beta _k}\bar {z}^k\), we can simplify (4.78) as

(4.80)

Since \(\beta_{k+1} \geq (1-\tau_k)\beta_k\) due to (4.34), similar to the proof of (4.62) we have

(4.81)

Combining (4.80) and (4.81), we get

$$\displaystyle \begin{aligned} \varDelta{G}_k \geq \frac{1}{2}\Big[ \Big(\frac{1}{2} + \tau_k\Big)\eta_k - \frac{\tau_k^2}{(1 - \tau_k)\beta_k}\Big]\Vert\hat{z}^{k+1}\Vert^2 + \hat{R}_k - \left(\frac{\eta_k\tau_k^2}{4}D_k^2 + \frac{\tau_k\rho_k}{2}\breve{D}_k^2\right). \end{aligned} $$
(4.82)

Next, we estimate \(\hat{R}_k\) defined by (4.77) as follows. We define \(\bar{a}_k := \bar{u}^{*}_{k+1} - \bar{u}_c\) and \(\hat{a}_k := \hat{u}^{*}_{k+1} - \bar{u}_c\). Using \(b_{\mathcal{U}}(\bar{u}^{\ast}_{k+1}, \bar{u}_c) \leq \frac{L_b}{2}\Vert \bar{u}^{\ast}_{k+1} - \bar{u}_c\Vert^2\), we can bound \(\hat{R}_k\) from below as

$$\displaystyle \begin{aligned} \begin{array}{ll} \frac{2\hat{R}_k}{\gamma_{k+1}} &\geq (1 - \tau_k)\tau_k\Vert \bar{a}_k - \hat{a}_k\Vert^2 + \frac{\tau_k}{2}\Vert \hat{a}_k\Vert^2 - (1 - \tau_k)\big(\frac{\gamma_{k}}{\gamma_{k+ 1}} - 1\big)L_b\Vert \bar{a}_k\Vert^2\vspace{1ex}\\ &= \tau_k\left(\frac{3}{2} -\tau_k\right)\left\Vert \hat{a}_k - \frac{(1-\tau_k)}{(3/2-\tau_k)}\bar{a}_k\right\Vert^2 + (1-\tau_k)\left[\frac{\tau_k}{3-2\tau_k} + \left(1- \frac{\gamma_k}{\gamma_{k+1}}\right)L_b\right]\Vert\bar{a}_k\Vert^2. \end{array} \end{aligned}$$

Since \(\gamma _{k+1} \geq \left (\frac {3-2\tau _k}{3 - (2-L_b^{-1})\tau _k}\right )\gamma _k\) due to (4.34), it is easy to show that \(\hat {R}_k \geq 0\). In addition, by (4.34), we also have \((1 + 2\tau _k)\eta _k - \frac {2\tau _k^2}{(1 - \tau _k)\beta _k} \geq 0\). Using these conditions, we can show from (4.82) that \(\varDelta {G}_k \geq - \frac {\eta _k\tau _k^2}{4}D_k^2 - \frac {\tau _k\rho _k}{2}\breve {D}_k^2 \geq -\left (\frac {\tau _k^2\eta _k}{4} + \frac {\tau _k\rho _k}{2}\right )D_f^2\), which is indeed the gap reduction condition (4.35). \(\square \)
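The completing-the-square step used for \(\hat{R}_k\) can be checked numerically. The sketch below evaluates both the expanded and the completed expressions for \(2\hat{R}_k/\gamma_{k+1}\) on random vectors, taking the ratio \(\gamma_k/\gamma_{k+1}\) at the largest value allowed by the condition on \(\gamma_{k+1}\). The function names, the sampling ranges (including \(L_b \in [1, 5]\)) and the tolerance are assumptions made for this illustration only.

```python
import random

def sq_norm(x):
    return sum(t * t for t in x)

def r_hat_forms(tau, ratio, L_b, a_bar, a_hat):
    """Return (expanded, completed), where ratio = gamma_k / gamma_{k+1}."""
    diff = [s - t for s, t in zip(a_bar, a_hat)]
    expanded = ((1 - tau) * tau * sq_norm(diff)
                + tau / 2 * sq_norm(a_hat)
                - (1 - tau) * (ratio - 1) * L_b * sq_norm(a_bar))
    c = (1 - tau) / (1.5 - tau)
    shifted = [h - c * b for h, b in zip(a_hat, a_bar)]
    completed = (tau * (1.5 - tau) * sq_norm(shifted)
                 + (1 - tau) * (tau / (3 - 2 * tau) + (1 - ratio) * L_b)
                 * sq_norm(a_bar))
    return expanded, completed

def check_r_hat(trials=500, dim=4, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        tau = rng.uniform(0.01, 0.75)
        L_b = rng.uniform(1.0, 5.0)
        # Largest ratio gamma_k / gamma_{k+1} permitted by the gamma condition
        ratio = (3 - (2 - 1 / L_b) * tau) / (3 - 2 * tau)
        a_bar = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a_hat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        expanded, completed = r_hat_forms(tau, ratio, L_b, a_bar, a_hat)
        # The two forms must agree, and the result must be nonnegative
        if abs(expanded - completed) > 1e-9 or completed < -1e-9:
            return False
    return True
```

At the boundary value of the ratio the coefficient of \(\Vert\bar{a}_k\Vert^2\) vanishes, so only the squared term remains, which is consistent with \(\hat{R}_k \geq 0\).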

4.1.4.2 Proof of Lemma 7: Parameter Updates

Similar to the proof of Lemma 5, we can show that the optimal rate of \(\left\{\tau_k\right\}\) is \(\mathcal{O}(1/k)\). From the conditions (4.34), it is clear that if we choose \(\tau_k := \frac{3}{k+4}\), then \(0 < \tau_k \leq \frac{3}{4} < 1\) for \(k \geq 0\). Next, we choose \(\gamma_{k+1} := \left(\frac{3-2\tau_k}{3-\tau_k}\right)\gamma_k\). Then \(\gamma_k\) satisfies (4.34). Substituting \(\tau_k = \frac{3}{k+4}\) into this formula, we have \(\gamma_{k+1} = \left(\frac{k+2}{k+3}\right)\gamma_k\). By induction, we obtain \(\gamma_{k+1} = \frac{3\gamma_1}{k+3}\). Now, we choose \(\eta_k := \frac{\gamma_{k+1}}{2\Vert A\Vert^2} = \frac{3\gamma_1}{2\Vert A\Vert^2(k+3)}\). Then, from the last condition of (4.34), we choose \(\rho_k := \frac{\tau_k\gamma_{k+1}}{2\Vert A\Vert^2} = \frac{9\gamma_1}{2\Vert A\Vert^2(k+3)(k+4)}\).

To derive an update for \(\beta_k\), we take the third condition of (4.34) with equality, which gives \(\beta_k = \frac{2\tau_k^2}{(1-\tau_k)(1+2\tau_k)\eta_k} = \frac{6\Vert A\Vert^2(k+3)}{\gamma_1(k+1)(k+10)} < \frac{9\Vert A\Vert^2}{5\gamma_1(k+1)}\). We need to check the second condition \(\beta_{k+1} \geq (1-\tau_k)\beta_k\) in (4.34). Indeed, we have \(\beta_{k+1} = \frac{6\Vert A\Vert^2(k+4)}{\gamma_1(k+2)(k+11)} \geq (1-\tau_k)\beta_k = \frac{6\Vert A\Vert^2(k+3)}{\gamma_1(k+4)(k+10)}\), which is equivalent to \(2k^2 + 35k + 94 \geq 0\) and hence true for all \(k \geq 0\). Hence, the second condition of (4.34) holds. \(\square\)
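The recursion for \(\gamma_k\), the formula for \(\rho_k\), and the two inequalities on \(\gamma_{k+1}\) and \(\beta_{k+1}\) can be checked with exact rational arithmetic. The sketch below assumes the normalization \(\gamma_1 = \Vert A\Vert^2 = 1\) and takes the closed form of \(\beta_k\) exactly as stated in this proof; it does not re-derive \(\beta_k\) from (4.34). The function names are hypothetical.

```python
from fractions import Fraction

def sadmm_parameters(k):
    """Return (tau_k, gamma_{k+1}, eta_k, rho_k, beta_k) for gamma_1 = ||A||^2 = 1."""
    tau = Fraction(3, k + 4)
    gamma_next = Fraction(3, k + 3)
    eta = gamma_next / 2
    rho = tau * gamma_next / 2
    beta = Fraction(6 * (k + 3), (k + 1) * (k + 10))  # closed form as stated
    return tau, gamma_next, eta, rho, beta

def check_lemma7(n=200):
    gamma_k = Fraction(1)  # gamma_1
    for k in range(1, n + 1):
        tau, gamma_next, eta, rho, beta = sadmm_parameters(k)
        # Recursion gamma_{k+1} = (3 - 2 tau_k) / (3 - tau_k) * gamma_k
        assert gamma_next == (3 - 2 * tau) / (3 - tau) * gamma_k
        # Closed form of rho_k
        assert rho == Fraction(9, 2 * (k + 3) * (k + 4))
        # Condition gamma_{k+1} >= ||A||^2 (eta_k + rho_k / tau_k)
        assert gamma_next >= eta + rho / tau
        # Condition beta_{k+1} >= (1 - tau_k) beta_k
        beta_next = sadmm_parameters(k + 1)[4]
        assert beta_next >= (1 - tau) * beta
        gamma_k = gamma_next
    return True
```

With these choices the condition on \(\gamma_{k+1}\) holds with equality, since \(\eta_k\) and \(\rho_k/\tau_k\) each equal \(\gamma_{k+1}/(2\Vert A\Vert^2)\).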

4.1.4.3 Proof of Theorem 2: Convergence of Algorithm 2

First, we check the conditions of Lemma 4. From the update rule (4.36), we have \(\eta_0 = \frac{\gamma_1}{2\Vert A\Vert^2}\) and \(\beta_1 = \frac{12\Vert A\Vert^2}{11\gamma_1}\). Hence, \(5\gamma_1 = 10\Vert A\Vert^2\eta_0 > 2\Vert A\Vert^2\eta_0\), which satisfies the first condition of Lemma 4. Now, \(\frac{2\gamma_1}{(5\gamma_1-2\eta_0\Vert A\Vert^2)\eta_0} = \frac{\Vert A\Vert^2}{\gamma_1} < \frac{12\Vert A\Vert^2}{11\gamma_1} = \beta_1\). Hence, the second condition of Lemma 4 holds.

Next, since \(\tau _k = \frac {3}{k+4}\), \(\rho _k = \frac {9\gamma _1}{2\Vert A\Vert ^2(k+3)(k+4)}\) and \(\eta _k = \frac {3\gamma _1}{2\Vert A\Vert ^2(k+3)}\), we can derive

$$\displaystyle \begin{aligned} \begin{array}{ll} \frac{\tau_k^2\eta_k}{4} + \frac{\tau_k\rho_k}{2} &= \frac{81\gamma_1}{8\Vert A\Vert^2(k + 3)(k + 4)^2} \vspace{1ex}\\ & < \frac{81\gamma_1}{8\Vert A\Vert^2(k+3)(k+4)} - \left(1 - \tau_k\right)\frac{81\gamma_1}{8\Vert A\Vert^2(k+2)(k + 3)}. \end{array} \end{aligned} $$

Substituting this inequality into (4.35) and rearranging the result, we obtain

$$\displaystyle \begin{aligned} G_{k+1}(\bar{w}^{k+1}) - \frac{81\gamma_1D_f^2}{8\Vert A\Vert^2(k+3)(k+4)} \leq (1 - \tau_k)\Big[G_k(\bar{w}^k) - \frac{81\gamma_1D_f^2}{8\Vert A\Vert^2(k+2)(k+3)}\Big]. \end{aligned} $$

By induction, we obtain \(G_k(\bar{w}^k) - \frac{81\gamma_1D_f^2}{8\Vert A\Vert^2(k+2)(k+3)} \leq \omega_k\Big[G_0(\bar{w}^0) - \frac{27\gamma_1D_f^2}{16\Vert A\Vert^2}\Big] \leq 0\) as long as \(G_0(\bar{w}^0) \leq \frac{27\gamma_1D_f^2}{16\Vert A\Vert^2}\). Now using Lemma 4, we have \(G_0(\bar{w}^0) \leq \frac{\eta_0}{4}D_f^2 = \frac{\gamma_1}{8\Vert A\Vert^2}D_f^2 < \frac{27\gamma_1D_f^2}{16\Vert A\Vert^2}\). Hence, \(G_k(\bar{w}^k) \leq \frac{81\gamma_1D_f^2}{8\Vert A\Vert^2(k+2)(k+3)}\).
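The induction can be illustrated by running the gap reduction condition (4.35) with equality. The sketch below uses the form \(G_{k+1} \leq (1-\tau_k)G_k + \left(\frac{\tau_k^2\eta_k}{4} + \frac{\tau_k\rho_k}{2}\right)D_f^2\) obtained at the end of the proof of Lemma 6, normalizes \(\gamma_1 D_f^2/\Vert A\Vert^2 = 1\) (an assumption for convenience), and checks both the constant \(81/8\) and the induction bound \(\frac{81}{8(k+2)(k+3)}\). The function name is ours.

```python
from fractions import Fraction

def worst_case_gap_sadmm(n=100):
    # Normalization (assumed): gamma_1 * D_f^2 / ||A||^2 = 1
    G = Fraction(27, 16)  # largest admissible starting value
    for k in range(n):
        # Induction bound: G_k <= 81 / (8 (k+2)(k+3))
        assert G <= Fraction(81, 8 * (k + 2) * (k + 3))
        tau = Fraction(3, k + 4)
        eta = Fraction(3, 2 * (k + 3))
        rho = Fraction(9, 2 * (k + 3) * (k + 4))
        step = tau**2 * eta / 4 + tau * rho / 2
        # The error term has the closed form 81 / (8 (k+3)(k+4)^2)
        assert step == Fraction(81, 8 * (k + 3) * (k + 4) ** 2)
        # Gap reduction condition taken with equality (worst case)
        G = (1 - tau) * G + step
    return G  # this is G_n
```

Because the recursion is run with equality from the largest admissible start, the assertions cover the worst case permitted by the gap reduction condition.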

Finally, by using Lemma 2 with \(\beta_k := \frac{6\Vert A\Vert^2(k+3)}{\gamma_1(k+1)(k+10)}\) and \(\beta_k \leq \frac{9\Vert A\Vert^2}{5\gamma_1(k+1)}\), and simplifying the results, we obtain the bounds in (4.37). If we choose \(\gamma_1 := \Vert A\Vert\), then the worst-case iteration-complexity of Algorithm 2 is \(\mathcal{O}(\varepsilon^{-1})\). \(\square\)

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Tran-Dinh, Q., Cevher, V. (2018). Smoothing Alternating Direction Methods for Fully Nonsmooth Constrained Convex Optimization. In: Giselsson, P., Rantzer, A. (eds) Large-Scale and Distributed Optimization. Lecture Notes in Mathematics, vol 2227. Springer, Cham. https://doi.org/10.1007/978-3-319-97478-1_4