Abstract
We propose two new alternating direction methods to solve “fully” nonsmooth constrained convex problems. Under mild assumptions, our algorithms have the best known worst-case iteration-complexity guarantees for both the objective residual and the feasibility gap. Through theoretical analysis, we show how to update all the algorithmic parameters automatically, with a clear impact on the convergence performance. We also provide a representative numerical example showing the advantages of our methods over classical alternating direction methods on a well-known feasibility problem.
References
A. Alotaibi, P.L. Combettes, N. Shahzad, Best approximation from the Kuhn-Tucker set of composite monotone inclusions. Numer. Funct. Anal. Optim. 36(12), 1513–1532 (2015)
H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer, Berlin, 2011)
A. Beck, M. Teboulle, A fast dual proximal gradient algorithm for convex minimization and applications. Oper. Res. Lett. 42(1), 1–6 (2014)
J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for non-convex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
R.S. Burachik, V. Martín-Márquez, An approach for the convex feasibility problem via monotropic programming. J. Math. Anal. Appl. 453(2), 746–760 (2017)
X. Cai, D. Han, X. Yuan, On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)
E. Candès, B. Recht, Exact matrix completion via convex optimization. Commun. ACM 55(6), 111–119 (2012)
V. Cevher, S. Becker, M. Schmidt, Convex optimization for big data: scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. 31(5), 32–43 (2014)
A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
D. Davis, W. Yin, Convergence rate analysis of several splitting schemes, in Splitting Methods in Communication, Imaging, Science, and Engineering (Springer, Cham, 2016), pp. 115–163
D. Davis, W. Yin, Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42(3), 783–805 (2017)
D. Davis, W. Yin, A three-operator splitting scheme and its optimization applications. Tech. Report (2015)
W. Deng, W. Yin, On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
J. Eckstein, D. Bertsekas, On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
E. Ghadimi, A. Teixeira, I. Shames, M. Johansson, Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
T. Goldstein, B. O'Donoghue, S. Setzer, Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2012)
B. He, X. Yuan, On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2012)
B. He, X. Yuan, On the O(1∕n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
T. Lin, S. Ma, S. Zhang, On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
T. Lin, S. Ma, S. Zhang, Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. J. Sci. Comput. 69(1), 52–81 (2016)
T. Lin, S. Ma, S. Zhang, An extragradient-based alternating direction method for convex minimization. Found. Comput. Math. 17(1), 35–59 (2017)
I. Necoara, J. Suykens, Applications of a smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)
A. Nemirovskii, Prox-method with rate of convergence O(1∕t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
A. Nemirovskii, D. Yudin, Problem Complexity and Method Efficiency in Optimization (Wiley Interscience, New York, 1983)
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87 (Kluwer Academic Publishers, Norwell, 2004)
Y. Nesterov, Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Y. Ouyang, Y. Chen, G. Lan, E.J. Pasiliao, An accelerated linearized alternating direction method of multiplier. SIAM J. Imaging Sci. 8(1), 644–681 (2015)
N. Parikh, S. Boyd, Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)
R.T. Rockafellar, Convex Analysis. Princeton Mathematics Series, vol. 28 (Princeton University Press, Princeton, 1970)
R. Shefi, M. Teboulle, Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)
R. Shefi, M. Teboulle, On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)
S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, New York, 2013)
M. Tao, X. Yuan, On the O(1∕t)-convergence rate of alternating direction method with logarithmic-quadratic proximal regularization. SIAM J. Optim. 22(4), 1431–1448 (2012)
Q. Tran-Dinh, V. Cevher, Constrained convex minimization via model-based excessive gap, in Proceedings of the Neural Information Processing Systems (NIPS), Montreal, vol. 27, Dec. 2014, pp. 721–729
Q. Tran-Dinh, O. Fercoq, V. Cevher, A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28, 96–134 (2018)
P. Tseng, D. Bertsekas, Relaxation methods for problems with strictly convex cost and linear constraints. Math. Oper. Res. 16(3), 462–481 (1991)
W. Wang, A. Banerjee, Bregman alternating direction method of multipliers, in Advances in Neural Information Processing Systems 27 (NIPS 2014), pp. 1–9
E. Wei, A. Ozdaglar, On the O(1∕k)-convergence of asynchronous distributed alternating direction method of multipliers, in Global Conference on Signal and Information Processing (GlobalSIP) (IEEE, Piscataway, 2013), pp. 551–554
Acknowledgements
QTD’s work was supported in part by NSF grant no. DMS-1619884, USA. VC’s work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 725594, time-data). The authors would like to thank Dr. C.B. Vu and Dr. V.Q. Nguyen for their help in verifying the technical proofs and the numerical experiment. The authors also thank Mr. Ahmet Alacaoglu, Mr. Nhan Pham, and Ms. Yuzixuan Zhu for their careful proofreading.
Appendix: Proofs of Technical Results
This appendix provides full proofs of technical results presented in the main text.
4.1.1 Proof of Lemma 2: The Primal-Dual Bounds
First, using the fact that \(-d(\lambda ) \leq -d^{\star } = f^{\star } \leq \mathcal {L}(x, \lambda ^{\star }) = f(x) + \langle \lambda ^{\star }, Au + Bv - c\rangle \leq f(x) + \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \), we get
which is exactly the lower bound (4.14).
Next, since A ⊤λ ⋆ ∈ ∂g(u ⋆) due to (4.8), by Fenchel-Young’s inequality, we have g(u ⋆) + g ∗(A ⊤λ ⋆) = 〈A ⊤λ ⋆, u ⋆〉, which implies g ∗(A ⊤λ ⋆) = 〈A ⊤λ ⋆, u ⋆〉− g(u ⋆). Using this relation and the definition of φ γ, we have
Alternatively, we have ψ(λ) ≥ ψ(λ ⋆) + 〈∇ψ(λ ⋆), λ − λ ⋆〉, where ∇ψ(λ ⋆) = B∇h ∗(B ⊤λ ⋆) − c = Bv ⋆ − c due to the last relation in (4.8), and ∇h ∗(B ⊤λ ⋆) ∈ ∂h ∗(B ⊤λ ⋆) is one subgradient of h ∗. Hence, ψ(λ) ≥ ψ(λ ⋆) + 〈λ − λ ⋆, Bv ⋆ − c〉. Adding this inequality to the last estimate and using the fact that d γ = φ γ + ψ and d = φ + ψ, we obtain
Using this inequality with d ⋆ = −f ⋆ and the definition (4.13) of f β we have
Let \(S := G_{\gamma \beta }(w) + \gamma b_{\mathcal {U}}(u^{\star },\bar {u}^c)\). Then, by dropping the last term \(- \frac {1}{2\beta }\Vert Au + Bv - c\Vert ^2\) in (4.49), we obtain the first inequality of (4.15).
Let t := ∥Au + Bv − c∥. Using again (4.47) and (4.49), we can see that \(\frac {1}{2\beta }t^2 - \Vert \lambda ^{\star }\Vert t - S \leq 0\). Solving this quadratic inequality in t and noting that t ≥ 0, we obtain the second bound of (4.15). The last estimate of (4.15) is a direct consequence of (4.49) and the first estimate of (4.15). Finally, from (4.47), we have f(x) ≥ f ⋆ −∥λ ⋆∥∥Au + Bv − c∥. Substituting this into (4.49), we get \(d(\lambda ) - d^{\star } - \Vert \lambda ^{\star }\Vert \Vert Au + Bv - c\Vert \leq S - \frac {1}{2\beta }\Vert Au + Bv - c\Vert ^2\), which implies
By discarding the term −(1∕(2β))∥Au + Bv − c∥² and substituting the second estimate of (4.15) into the last estimate, we obtain the last inequality of (4.15). \(\square \)
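For concreteness, the quadratic step invoked in this proof can be written out explicitly; this is only the larger-root computation mentioned above and adds nothing new. With \(t := \Vert Au + Bv - c\Vert \geq 0\) and \(S := G_{\gamma \beta }(w) + \gamma b_{\mathcal {U}}(u^{\star },\bar {u}^c)\), the inequality \(\frac {1}{2\beta }t^2 - \Vert \lambda ^{\star }\Vert t - S \leq 0\) means that t lies below the larger root of the corresponding quadratic, i.e.,
\[
t \;\leq\; \beta \Vert \lambda ^{\star }\Vert + \sqrt{\beta ^2\Vert \lambda ^{\star }\Vert ^2 + 2\beta S},
\]
from which the second bound of (4.15) follows, as stated in the proof.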
4.1.2 Convergence Analysis of Algorithm 1
We provide full proofs of the lemmas and theorems related to the convergence of Algorithm 1. First, we prove the following key lemma, which will be used to prove Lemma 3.
Lemma 8
Let \(\bar {\lambda }^{k+1}\) be generated by (SAMA). Then
where
In addition, for any z, γ k, γ k+1 > 0, the function \(g_{\gamma }^{\ast }\) defined by (4.11) satisfies
Proof
First, it is well known that SAMA is equivalent to a proximal-gradient step applied to the smoothed dual problem
This proximal-gradient step can be written as
The optimality condition of the minimization problem corresponding to this step reads
Using this condition and the convexity of ψ, for any \(\nabla {\psi }(\bar {\lambda }^{k+1})\in \partial {\psi }(\bar {\lambda }^{k+1})\), we have
Next, by the definition \(\varphi _{\gamma }(\lambda ) := g^{\ast }_{\gamma }(A^{\top }\lambda )\), we can show from (4.11) that \(\hat {u}^{k+1} = u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k)\). Since \(g^{\ast }_{\gamma }\) has a (1∕γ)-Lipschitz continuous gradient, we have
Using this inequality with γ := γ k+1, \(\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\lambda ) = u^{\ast }_{\gamma _{k+1}}(A^{\top }\lambda )\), \(\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\hat {\lambda }^k) = u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k) = \hat {u}^{k+1}\), and \(\nabla {\varphi _{\gamma _{k+1}}}(\lambda ) = A\nabla {g^{\ast }_{\gamma _{k+1}}}(A^{\top }\lambda )\), we have
Using (4.54) with \(\lambda = \bar {\lambda }^{k+1}\), we have
Summing up this inequality and (4.53), then using the definition of \(\hat {\ell }_{\gamma _{k+1}}(\lambda )\) in (4.51), we obtain
Here, the second inequality in (4.51) follows from the right-hand side of (4.54).
Now, using (4.55) with \(\lambda := \bar {\lambda }^k\), then combining with (4.51), we get
Multiplying the last inequality by 1 − τ k ∈ [0, 1] and (4.55) by τ k ∈ [0, 1], then summing up the results, we obtain (4.50).
Finally, from (4.11), since \(g^{\ast }_{\gamma }(z) := \max _{u}\{P(u, \gamma ; z) := \langle z, u\rangle - g(u) - \gamma b_{\mathcal {U}}(u;\bar {u}^c)\}\) is the maximum over u of the function P(u, γ; z), which is concave in u and linear in γ, the function \(g^{\ast }_{\gamma }(z)\) is convex w.r.t. γ > 0. Moreover, \(\frac {d g^{\ast }_{\gamma }(z)}{d\gamma } = -b_{\mathcal {U}}(u^{\ast }_{\gamma }(z), \bar {u}^c)\). Hence, using the convexity of \(g^{\ast }_{\gamma }\) w.r.t. γ > 0, we have \(g^{\ast }_{\gamma _k}(z) \geq g^{\ast }_{\gamma _{k+1}}(z) - (\gamma _k - \gamma _{k+1})b_{\mathcal {U}}(u^{\ast }_{\gamma }(z), \bar {u}^c)\), which is indeed (4.52). □
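The convexity of \(\gamma \mapsto g^{\ast }_{\gamma }(z)\) used in the last step can also be observed numerically on a toy instance. The sketch below is purely illustrative and not part of the chapter's development: it picks a one-dimensional example g(u) = |u| with prox-function \(b_{\mathcal {U}}(u,0) = \tfrac {1}{2}u^2\) (our own choices), evaluates \(g^{\ast }_{\gamma }(z)\) by maximizing over a grid of u, and checks that the second differences in γ are nonnegative.

```python
import numpy as np

# Toy instance (our own choice, not from the chapter):
#   g(u) = |u|,  b_U(u, 0) = 0.5 * u**2,  so that
#   g*_gamma(z) = max_u { z*u - |u| - 0.5 * gamma * u**2 }.
u_grid = np.linspace(-10.0, 10.0, 20001)   # grid over u
z = 2.5                                    # a fixed dual point

def g_star(gamma):
    """Smoothed conjugate g*_gamma(z), evaluated by grid search over u."""
    vals = z * u_grid - np.abs(u_grid) - 0.5 * gamma * u_grid**2
    return vals.max()

gammas = np.linspace(0.2, 5.0, 200)
vals = np.array([g_star(g) for g in gammas])

# Convexity in gamma <=> nonnegative second differences (up to rounding),
# since g*_gamma(z) is a pointwise maximum of functions linear in gamma.
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]
print("min second difference in gamma:", second_diff.min())
assert second_diff.min() >= -1e-10
```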
4.1.2.1 Proof of Lemma 4: Bound on G γβ for the First Iteration
Since \(\bar {w}^1 := (\bar {u}^1, \bar {v}^1, \bar {\lambda }^1)\) is updated by (4.19), similar to (SAMA), we can use (4.55) with k = 0, \(\lambda := \hat {\lambda }^0\) and \(\hat {\ell }_{\gamma _1}(\hat {\lambda }^0) \leq d_{\gamma _1}(\hat {\lambda }^0)\) to obtain
Since \(\bar {v}^1\) solves the second problem in (4.19) and \(v^{\ast }(\hat {\lambda }^0) \in \mathrm {dom}\left (h\right )\), we have
Using D f in (4.9), this inequality implies
Using the definition of d γ, we further estimate (4.56) using (4.57) as follows:
Since \(G_{\gamma _1\beta _1}(\bar {w}^1) = f_{\beta _1}(\bar {x}^1) + d_{\gamma _1}(\bar {\lambda }^1)\), we obtain (4.20) from the last inequality. If \(\beta _1 \geq \frac {2\gamma _1}{\eta _0(5\gamma _1 - 2\Vert A\Vert ^2\eta _0)}\), then (4.20) leads to \(G_{\gamma _1\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 + \frac {1}{\eta _0}\langle \hat {\lambda }^0, \bar {\lambda }^1 - \hat {\lambda }^0\rangle \). \(\square \)
4.1.2.2 Proof of Lemma 3: Gap Reduction Condition
For notational simplicity, we first define the following abbreviations
From SAMA, we have \(\bar {\lambda }^{k+1} - \hat {\lambda }^k = \eta _k(c - A\hat {u}^{k+1} - B\hat {v}^{k+1}) = -\eta _k\hat {z}^{k+1}\). In addition, by (4.16), we have \(\hat {\lambda }^k = (1-\tau _k)\bar {\lambda }^k + \tau _k\lambda _k^{*}\), which leads to \((1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k - \hat {\lambda }^k = \tau _k(\hat {\lambda }^k - \lambda _k^{*})\). Substituting these expressions into (4.50) with \(\lambda := \hat {\lambda }^k\), and then using (4.51) with \(\hat {\ell }_{\gamma _{k+1}}(\hat {\lambda }^k) \leq d_{\gamma _{k+1}}(\hat {\lambda }^k)\), we obtain
By (4.52) with the fact that \(\varphi _{\gamma }(\lambda ) := g^{\ast }_{\gamma }(A^{\top }\lambda )\), for any γ k+1 > 0 and γ k > 0, we have
Using this inequality and the fact that d γ := φ γ + ψ, we have
Next, using \(\hat {v}^{k+1}\) from SAMA and its optimality condition, we can show that
Since ψ(λ) := h ∗(B ⊤λ) − c ⊤λ, this inequality leads to
Now, by this estimate, \(d_{\gamma _{k+1}} = \varphi _{\gamma _{k+1}} + \psi \) and SAMA, we can derive
Combining this inequality, (4.58) and (4.59), we obtain
Now, using the definition of G k, we have
Let us define \(\varDelta {G}_k := (1-\tau _k)G_k(\bar {w}^k) - G_{k+1}(\bar {w}^{k+1})\). Then, we can show that
By (4.16), we have \(\bar {z}^{k+1} = (1-\tau _k)\bar {z}^k + \tau _k\hat {z}^{k+1}\). Using this expression and the condition β k+1 ≥ (1 − τ k)β k in (4.17), we can easily show that
Substituting this inequality into (4.61), and using the convexity of f, we further get
Substituting (4.60) into (4.62) and using \(\lambda ^{*}_k := \frac {1}{\beta _k}(c - A\bar {u}^k - B\bar {v}^k) = -\frac {1}{\beta _k}\bar {z}^k\), we obtain
where
Furthermore, we have
Using this estimate into (4.63), we finally get
In the next step, we estimate R k. Let \(\bar {a}_k := \bar {u}^{*}_{k+1} - \bar {u}_c\) and \(\hat {a}_k := \hat {u}^{k+1} - \bar {u}_c\). Using the smoothness of \(b_{\mathcal {U}}\), we can estimate R k explicitly as
By the condition \((1+L_b^{-1}\tau _k)\gamma _{k+1} \geq \gamma _k\) in (4.17), we have \(\tau _k - (\gamma _{k+1}^{-1}\gamma _{k} - 1)L_b\geq 0\). Using this condition in (4.65), we obtain R k ≥ 0. Finally, by (4.9) we can show that D k ≤ D f. Using this inequality, R k ≥ 0, and the second condition of (4.17), we can show from (4.63) that \(\varDelta {G}_k \geq -\frac {\eta _k\tau _k^2}{4}D_f^2\), which implies (4.18). \(\square \)
4.1.2.3 Proof of Lemma 5: Parameter Updates
The tightest update for γ k and β k is \(\gamma _{k+1} := \frac {\gamma _k}{\tau _k+1}\) and β k+1 := (1 − τ k)β k due to (4.17). Using these updates in the third condition in (4.17) leads to \(\frac {(1-\tau _{k+1})^2}{(1+\tau _{k+1})\tau _{k+1}^2} \geq \frac {1-\tau _k}{\tau _k^2}\). By directly checking this condition, we can see that \(\tau _k = \mathcal {O}(1/k)\) which is the optimal choice.
Clearly, if we choose \(\tau _k := \frac {3}{k+4}\), then 0 < τ k < 1 for k ≥ 0 and τ 0 = 3∕4. Next, we choose \(\gamma _{k+1} := \frac {\gamma _k}{1+\tau _k/3} \geq \frac {\gamma _k}{1+\tau _k}\). Substituting \(\tau _k = \frac {3}{k+4}\) into this formula we have \(\gamma _{k+1} = \left (\frac {k+4}{k+5}\right )\gamma _k\). By induction, we obtain \(\gamma _{k+1} = \frac {5\gamma _1}{k+5}\). This implies \(\eta _k = \frac {5\gamma _1}{2\Vert A\Vert ^2(k+5)}\). With \(\tau _k = \frac {3}{k+4}\) and \(\gamma _{k+1} = \frac {5\gamma _1}{k+5}\), we choose β k from the third condition of (4.17) as \(\beta _k = \frac {2\Vert A\Vert ^2\tau _k^2}{(1-\tau _k^2)\gamma _{k+1}} = \frac {18\Vert A\Vert ^2(k+5)}{5\gamma _1(k+1)(k+7)}\) for k ≥ 1. Using the value of τ k and β k, we need to check the second condition β k+1 ≥ (1 − τ k)β k of (4.17). Indeed, this condition is equivalent to 2k 2 + 28k + 88 ≥ 0, which is true for all k ≥ 0. From the update rule of β k, it is obvious that \(\beta _k \leq \frac {18\Vert A\Vert ^2}{5\gamma _1(k+1)}\). \(\square \)
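The closed-form choices above admit a quick numerical sanity check. The following sketch (with placeholder values for \(\Vert A\Vert \) and \(\gamma _1\), which are problem data) verifies, for a range of k, exactly the relations stated in this proof: the γ-update \(\gamma _{k+1} = \gamma _k/(1+\tau _k/3)\), \(\eta _k = \gamma _{k+1}/(2\Vert A\Vert ^2)\), the third condition of (4.17) taken with equality, \(\beta _{k+1} \geq (1-\tau _k)\beta _k\), and \(\beta _k \leq 18\Vert A\Vert ^2/(5\gamma _1(k+1))\).

```python
import math

normA, gamma1 = 2.0, 1.0   # placeholder problem constants (illustrative only)

def tau(k):
    return 3.0 / (k + 4)

def gamma(k):
    # gamma_k = 5*gamma_1/(k+4) for k >= 1, i.e., gamma_{k+1} = 5*gamma_1/(k+5)
    return 5.0 * gamma1 / (k + 4)

def eta(k):
    return 5.0 * gamma1 / (2.0 * normA**2 * (k + 5))

def beta(k):
    return 18.0 * normA**2 * (k + 5) / (5.0 * gamma1 * (k + 1) * (k + 7))

for k in range(1, 10001):
    tk, gk1 = tau(k), gamma(k + 1)
    # gamma-update: gamma_{k+1} = gamma_k / (1 + tau_k/3)
    assert math.isclose(gk1 * (1.0 + tk / 3.0), gamma(k))
    # eta_k = gamma_{k+1} / (2 ||A||^2)
    assert math.isclose(eta(k), gk1 / (2.0 * normA**2))
    # third condition of (4.17) with equality: beta_k = 2||A||^2 tau_k^2 / ((1 - tau_k^2) gamma_{k+1})
    assert math.isclose(beta(k), 2.0 * normA**2 * tk**2 / ((1.0 - tk**2) * gk1))
    # second condition of (4.17): beta_{k+1} >= (1 - tau_k) beta_k  (equivalent to 2k^2 + 28k + 88 >= 0)
    assert beta(k + 1) >= (1.0 - tk) * beta(k)
    # stated bound: beta_k <= 18||A||^2 / (5 gamma_1 (k+1))
    assert beta(k) <= 18.0 * normA**2 / (5.0 * gamma1 * (k + 1))
print("Parameter relations of Lemma 5 hold for k = 1,...,10000")
```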
4.1.2.4 Proof of Theorem 1: Convergence of Algorithm 1
We estimate the term \(\tau _k^2\eta _k\) in (4.18) as
Combining this estimate with (4.18), we get
By induction, we have \(G_k(\bar {w}^k) - \frac {45\gamma _1D_f^2}{8\Vert A\Vert ^2(k+3)(k+4)} \leq \omega _k[G_1(\bar {w}^1) - \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2] \leq 0\) whenever \(G_1(\bar {w}^1) \leq \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\), where \(\omega _k := \prod _{i=1}^{k-1}(1-\tau _i)\). Hence, we finally get
Since \(\eta _0 = \frac {\gamma _1}{2\Vert A\Vert ^2}\), the condition \(5\gamma _1 > 2\eta _0\Vert A\Vert ^2\) of Lemma 4 is satisfied. In addition, from Lemma 5, we have \(\beta _1 = \frac {27\Vert A\Vert ^2}{20\gamma _1} > \frac {\Vert A\Vert ^2}{\gamma _1}\), which satisfies the second condition of Lemma 4. We also note that \(\beta _k \leq \frac {18\Vert A\Vert ^2}{5\gamma _1(k+1)}\). If we take \(\hat {\lambda }^0 = \boldsymbol {0}^m\), then Lemma 4 shows that \(G_{\gamma _1\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{2}D_f^2 = \frac {\gamma _1}{4\Vert A\Vert ^2}D_f^2 < \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\). Using this estimate and (4.66) in Lemma 2, we obtain (4.23). Finally, if we choose \(\gamma _1 := \Vert A\Vert \), then the worst-case iteration-complexity of Algorithm 1 is \(\mathcal {O}(\varepsilon ^{-1})\). \(\square \)
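For the reader's convenience, the constants invoked in this proof follow by direct substitution from Lemma 4 and Lemma 5; the following is only a worked arithmetic check and adds nothing new:
\[
\beta _1 = \frac {18\Vert A\Vert ^2(1+5)}{5\gamma _1(1+1)(1+7)} = \frac {27\Vert A\Vert ^2}{20\gamma _1} > \frac {\Vert A\Vert ^2}{\gamma _1},
\qquad
\frac {2\gamma _1}{(5\gamma _1 - 2\eta _0\Vert A\Vert ^2)\eta _0}\Big|_{\eta _0 = \frac {\gamma _1}{2\Vert A\Vert ^2}}
= \frac {2\gamma _1}{4\gamma _1\cdot \frac {\gamma _1}{2\Vert A\Vert ^2}} = \frac {\Vert A\Vert ^2}{\gamma _1},
\]
and \(\frac {\eta _0}{2}D_f^2 = \frac {\gamma _1}{4\Vert A\Vert ^2}D_f^2 = \frac {8\gamma _1}{32\Vert A\Vert ^2}D_f^2 < \frac {9\gamma _1}{32\Vert A\Vert ^2}D_f^2\).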
4.1.3 Proof of Corollary 1: Strong Convexity of g
First, we show that if condition (4.24) holds, then (4.25) holds. Since ∇φ given by (4.5) is Lipschitz continuous with \(L_{d^g_0} := \mu _g^{-1}\Vert A\Vert ^2\), similar to the proof of Lemma 3, we have
where \(\varDelta {G_{\beta _k}} := (1-\tau _k)G_{\beta _k}(\bar {w}^k) - G_{\beta _{k+1}}(\bar {w}^{k+1})\). Under the condition (4.24), (4.67) implies (4.25).
The update rule (4.27) is in fact derived from (4.24). We finally prove the bounds (4.28). First, we consider the product \(\tau ^2_k\eta _k\). By (4.27) we have
By induction, it follows from (4.25) and this last expression that:
whenever \(G_{\beta _1}(\bar {w}^1) \leq \frac {9\mu _gD_f^2}{64\Vert A\Vert ^2}\). Since \(\bar {u}^1\) is given by (4.26), by the same argument as in the proof of Lemma 4, we can show that if \(\frac {1}{\beta _1} \leq \frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g}\), then \(G_{\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2\). Now, from the update rule (4.27), we have \(\eta _0 = \frac {\mu _g}{2\Vert A\Vert ^2}\) and \(\beta _1 = \frac {18\Vert A\Vert ^2}{16\mu _g}\). Using these quantities, we can verify that \(\frac {1}{\beta _1} \leq \frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g} = \frac {\mu _g}{\Vert A\Vert ^2}\). Moreover, \(G_{\beta _1}(\bar {w}^1) \leq \frac {\eta _0}{4}D_f^2 < \frac {9\mu _g}{64\Vert A\Vert ^2}D_f^2\). Hence, (4.68) holds. Finally, it remains to apply Lemma 2 to obtain (4.28). The second part in (4.30) is proved similarly. The estimate (4.31) is a direct consequence of (4.68). \(\square \)
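The arithmetic behind the last chain of inequalities is elementary but worth recording; with \(\eta _0 = \frac {\mu _g}{2\Vert A\Vert ^2}\) and \(\beta _1 = \frac {18\Vert A\Vert ^2}{16\mu _g}\) as above,
\[
\frac {5\eta _0}{2} - \frac {\Vert A\Vert ^2\eta _0^2}{\mu _g}
= \frac {5\mu _g}{4\Vert A\Vert ^2} - \frac {\mu _g}{4\Vert A\Vert ^2}
= \frac {\mu _g}{\Vert A\Vert ^2}
\;\geq\; \frac {16\mu _g}{18\Vert A\Vert ^2} = \frac {1}{\beta _1},
\qquad
\frac {\eta _0}{4}D_f^2 = \frac {\mu _g}{8\Vert A\Vert ^2}D_f^2 = \frac {8\mu _g}{64\Vert A\Vert ^2}D_f^2 < \frac {9\mu _g}{64\Vert A\Vert ^2}D_f^2,
\]
which is exactly what the proof uses.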
4.1.4 Convergence Analysis of Algorithm 2
This subsection provides full proofs of the lemmas and theorems related to the convergence of Algorithm 2.
4.1.4.1 Proof of Lemma 6: Gap Reduction Condition
We first need the following key lemma to analyze the convergence of our SADMM scheme; its proof is similar to that of (4.55), so we omit the details here.
Lemma 9
Let \(\bar {\lambda }^{k+1}\) be generated by SADMM. Then, for \(\lambda \in \mathbb {R}^n\), one has
where \(\tilde {\lambda }^k := \hat {\lambda }^k - \rho _k(A\hat {u}^{k+1} + B\hat {v}^k - c)\) and \(\tilde {\ell }_{\gamma }(\lambda ) := \varphi _{\gamma }(\tilde {\lambda }^k) + \langle \nabla {\varphi _{\gamma }}(\tilde {\lambda }^k), \lambda - \tilde {\lambda }^k\rangle + \psi (\lambda )\).
Now, we can prove Lemma 6. We use the same notation as in the proof of Lemma 3. In addition, let \(\hat {u}^{*}_{k+1} := u^{\ast }_{\gamma _{k+1}}(A^{\top }\hat {\lambda }^k)\) and \(\bar {u}^{\ast }_{k+1} := u^{\ast }_{\gamma _{k+1}}(A^{\top }\bar {\lambda }^k)\) be given by (4.12), and define \(\tilde {z}^k := A\hat {u}^{k+1} + B\hat {v}^k - c\) and \(\breve {D}_k := \Vert A\hat {u}^{*}_{k+1} + B\hat {v}^k - c\Vert \).
First, since \(\varphi _{\gamma }(\tilde {\lambda }^k) + \langle \nabla {\varphi _{\gamma }}(\tilde {\lambda }^k), \lambda - \tilde {\lambda }^k\rangle \leq \varphi _{\gamma }(\lambda )\), it follows from Lemma 9 that
Next, using [26, Theorem 2.1.5 (2.1.10)] with \(g^{\ast }_{\gamma }\) defined in (4.11) and \(\lambda := (1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k\) for any τ k ∈ [0, 1], we have
Since ψ is convex, we also have \(\psi (\lambda ) \leq (1-\tau _k)\psi (\bar {\lambda }^k) + \tau _k\psi (\hat {\lambda }^k)\) and \(\lambda - \hat {\lambda }^k = (1-\tau _k)\bar {\lambda }^k + \tau _k\hat {\lambda }^k - \hat {\lambda }^k = \tau _k(\hat {\lambda }^k - \lambda ^{\ast }_k)\) due to (4.33). Combining these expressions, the definition d γ := φ γ + ψ, (4.69), and (4.70), we can derive
On the one hand, since \(\hat {u}^{k+1}\) is the solution of the first convex subproblem in SADMM, using its optimality condition, we can show that
On the other hand, similar to the proof of Lemma 3, we can show that
Combining (4.72) and (4.73) and noting that d γ := φ γ + ψ, we have
Next, using the strong convexity of \(b_{\mathcal {U}}\) with \(\mu _{b_{\mathcal {U}}} = 1\), we can show that
Combining (4.71), (4.59), (4.74) and (4.75), we can derive
From SADMM, we have \(\bar {\lambda }^{k+1} - \hat {\lambda }^k = -\eta _k\hat {z}^{k+1}\) and \(\tilde {\lambda }^k - \hat {\lambda }^k = -\rho _k\tilde {z}^k\). Plugging these expressions and (4.77) into (4.76) we can simplify this estimate as
Using again the elementary inequality \(\nu \Vert a\Vert ^2 + \kappa \Vert b\Vert ^2 \geq \frac {\nu \kappa }{\nu +\kappa }\Vert a - b\Vert ^2\), under the condition \(\gamma _{k+1} \geq \Vert A\Vert ^2\left (\eta _k + \frac {\rho _k}{\tau _k}\right )\) in (4.34), we can show that
On the other hand, similar to the proof of Lemma 3, we can show that \(\frac {\eta _k}{4}\Vert \hat {z}^{k+1}\Vert ^2 - \frac {\tau _k\eta _k}{2}\Vert \hat {z}^{k+1}\Vert D_k \geq - \frac {\eta _k\tau _k^2}{4}D_k^2\). Using this inequality, (4.79), and \(\lambda ^{*}_k = -\frac {1}{\beta _k}\bar {z}^k\), we can simplify (4.78) as
Since β k+1 ≥ (1 − τ k)β k due to (4.34), similar to the proof of (4.62) we have
Combining (4.80) and (4.81), we get
Next, we estimate \(\hat {R}_k\) defined by (4.77) as follows. We define \(\bar {a}_k := \bar {u}^{*}_{k+1} - \bar {u}_c\), \(\hat {a}_k := \hat {u}^{*}_{k+1} - \bar {u}_c\). Using \(b_{\mathcal {U}}(\bar {u}^{\ast }_{k+1}, \bar {u}^c) \leq \frac {L_b}{2}\Vert \bar {u}^{\ast }_{k+1} - \bar {u}^c\Vert ^2\), we can write \(\hat {R}_k\) explicitly as
Since \(\gamma _{k+1} \geq \left (\frac {3-2\tau _k}{3 - (2-L_b^{-1})\tau _k}\right )\gamma _k\) due to (4.34), it is easy to show that \(\hat {R}_k \geq 0\). In addition, by (4.34), we also have \((1 + 2\tau _k)\eta _k - \frac {2\tau _k^2}{(1 - \tau _k)\beta _k} \geq 0\). Using these conditions, we can show from (4.82) that \(\varDelta {G}_k \geq - \frac {\eta _k\tau _k^2}{4}D_k^2 - \frac {\tau _k\rho _k}{2}\breve {D}_k^2 \geq -\left (\frac {\tau _k^2\eta _k}{4} + \frac {\tau _k\rho _k}{2}\right )D_f^2\), which is indeed the gap reduction condition (4.35). \(\square \)
4.1.4.2 Proof of Lemma 7: Parameter Updates
Similar to the proof of Lemma 5, we can show that the optimal rate of \(\left \{\tau _k\right \}\) is \(\mathcal {O}(1/k)\). From the conditions (4.34), it is clear that if we choose \(\tau _k := \frac {3}{k+4}\) then \(0 < \tau _k \leq \frac {3}{4} < 1\) for k ≥ 0. Next, we choose \(\gamma _{k+1} := \left (\frac {3-2\tau _k}{3-\tau _k}\right )\gamma _k\). Then γ k satisfies (4.34). Substituting \(\tau _k = \frac {3}{k+4}\) into this formula we have \(\gamma _{k+1} = \left (\frac {k+2}{k+3}\right )\gamma _k\). By induction, we obtain \(\gamma _{k+1} = \frac {3\gamma _1}{k+3}\). Now, we choose \(\eta _k := \frac {\gamma _{k+1}}{2\Vert A\Vert ^2} = \frac {3\gamma _1}{2\Vert A\Vert ^2(k+3)}\). Then, from the last condition of (4.34), we choose \(\rho _k := \frac {\tau _k\gamma _{k+1}}{2\Vert A\Vert ^2} = \frac {9\gamma _1}{2\Vert A\Vert ^2(k+3)(k+4)}\).
To derive an update for β k, from the third condition of (4.34) with equality, we obtain \(\beta _k = \frac {2\tau _k^2}{(1-\tau _k)(1+2\tau _k)\eta _k} = \frac {6\Vert A\Vert ^2(k+3)}{\gamma _1(k+1)(k+10)} < \frac {9\Vert A\Vert ^2}{5\gamma _1(k+1)}\). It remains to check the second condition β k+1 ≥ (1 − τ k)β k of (4.34). Indeed, we have \(\beta _{k+1} = \frac {6\Vert A\Vert ^2(k+4)}{\gamma _1(k+2)(k+11)} \geq (1 - \tau _k)\beta _k = \frac {6\Vert A\Vert ^2(k+3)}{\gamma _1(k+4)(k+10)}\), which holds for all k ≥ 0. Hence, the second condition of (4.34) holds. \(\square \)
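As with Lemma 5, the relations that can be read off directly from these updates admit a quick numerical sanity check. The sketch below uses placeholder values for \(\Vert A\Vert \) and \(\gamma _1\) (problem data) and only verifies the recursions and inequalities written explicitly in this proof and used in the proof of Theorem 2: the γ-recursion, the coupling \(\gamma _{k+1} \geq \Vert A\Vert ^2(\eta _k + \rho _k/\tau _k)\), the monotonicity \(\beta _{k+1} \geq (1-\tau _k)\beta _k\), and the value \(\beta _1 = \frac {12\Vert A\Vert ^2}{11\gamma _1}\).

```python
import math

normA, gamma1 = 2.0, 1.0   # placeholder problem constants (illustrative only)

def tau(k):
    return 3.0 / (k + 4)

def gamma(k):
    # gamma_k = 3*gamma_1/(k+2) for k >= 1, i.e., gamma_{k+1} = 3*gamma_1/(k+3)
    return 3.0 * gamma1 / (k + 2)

def eta(k):
    return 3.0 * gamma1 / (2.0 * normA**2 * (k + 3))

def rho(k):
    return 9.0 * gamma1 / (2.0 * normA**2 * (k + 3) * (k + 4))

def beta(k):
    return 6.0 * normA**2 * (k + 3) / (gamma1 * (k + 1) * (k + 10))

# beta_1 = 12 ||A||^2 / (11 gamma_1), the value used in the proof of Theorem 2
assert math.isclose(beta(1), 12.0 * normA**2 / (11.0 * gamma1))

for k in range(1, 10001):
    tk, gk1 = tau(k), gamma(k + 1)
    # gamma-recursion: gamma_{k+1} = ((3 - 2 tau_k)/(3 - tau_k)) gamma_k
    assert math.isclose(gk1, (3.0 - 2.0 * tk) / (3.0 - tk) * gamma(k))
    # coupling in (4.34): gamma_{k+1} >= ||A||^2 (eta_k + rho_k / tau_k)   (here it holds with equality)
    assert gk1 >= normA**2 * (eta(k) + rho(k) / tk) - 1e-12
    # monotonicity: beta_{k+1} >= (1 - tau_k) beta_k
    assert beta(k + 1) >= (1.0 - tk) * beta(k)
print("Stated parameter relations of Lemma 7 hold for k = 1,...,10000")
```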
4.1.4.3 Proof of Theorem 2: Convergence of Algorithm 2
First, we check the conditions of Lemma 4. From the update rule (4.36), we have \(\eta _0 = \frac {\gamma _1}{2\Vert A\Vert ^2}\) and \(\beta _1 = \frac {12\Vert A\Vert ^2}{11\gamma _1}\). Hence, 5γ 1 = 10∥A∥2η 0 > 2∥A∥2η 0, which satisfies the first condition of Lemma 4. Now, \(\frac {2\gamma _1}{(5\gamma _1-2\eta _0\Vert A\Vert ^2)\eta _0} = \frac {\Vert A\Vert ^2}{\gamma _1} < \frac {12\Vert A\Vert ^2}{11\gamma _1} = \beta _1\). Hence, the second condition of Lemma 4 holds.
Next, since \(\tau _k = \frac {3}{k+4}\), \(\rho _k = \frac {9\gamma _1}{2\Vert A\Vert ^2(k+3)(k+4)}\) and \(\eta _k = \frac {3\gamma _1}{2\Vert A\Vert ^2(k+3)}\), we can derive
Substituting this inequality into (4.35) and rearranging the result, we obtain
By induction, we obtain \(G_k(\bar {w}^k) - \frac {81\gamma _1D_f^2}{8\Vert A\Vert ^2(k+2)(k+3)} \leq \omega _k\Big [G_0(\bar {w}^0) -\frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2}\Big ] \leq 0\) as long as \(G_0(\bar {w}^0) \leq \frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2}\). Now using Lemma 4, we have \(G_0(\bar {w}^0) \leq \frac {\eta _0}{4}D_f^2 = \frac {\gamma _1}{8\Vert A\Vert ^2}D_f^2 < \frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2}\). Hence, \(G_k(\bar {w}^k) \leq \frac {27\gamma _1D_f^2}{16\Vert A\Vert ^2(k+2)(k+3)}\).
Finally, by using Lemma 2 with \(\beta _k := \frac {6\Vert A\Vert ^2(k+3)}{\gamma _1(k+1)(k+10)}\) and \(\beta _k \leq \frac {9\Vert A\Vert ^2}{5\gamma _1(k+1)}\), and simplifying the results, we obtain the bounds in (4.37). If we choose \(\gamma _1 := \Vert A\Vert \), then the worst-case iteration-complexity of Algorithm 2 is \(\mathcal {O}(\varepsilon ^{-1})\). \(\square \)
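Filling in the arithmetic behind the estimate of \(\frac {\tau _k^2\eta _k}{4} + \frac {\tau _k\rho _k}{2}\) used at the beginning of this proof (a routine substitution of the parameter values from Lemma 7, recorded here only for convenience):
\[
\frac {\tau _k^2\eta _k}{4} + \frac {\tau _k\rho _k}{2}
= \frac {9}{(k+4)^2}\cdot \frac {3\gamma _1}{8\Vert A\Vert ^2(k+3)}
+ \frac {3}{k+4}\cdot \frac {9\gamma _1}{4\Vert A\Vert ^2(k+3)(k+4)}
= \frac {27\gamma _1 + 54\gamma _1}{8\Vert A\Vert ^2(k+3)(k+4)^2}
= \frac {81\gamma _1}{8\Vert A\Vert ^2(k+3)(k+4)^2},
\]
which is consistent with the induction constant \(\frac {81\gamma _1D_f^2}{8\Vert A\Vert ^2(k+2)(k+3)}\) appearing above.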