Generalized alternating direction method of multipliers: new theoretical insights and applications

  • Full Length Paper
  • Published in: Mathematical Programming Computation

Abstract

Recently, the alternating direction method of multipliers (ADMM) has received intensive attention from a broad spectrum of areas. The generalized ADMM (GADMM) proposed by Eckstein and Bertsekas is an efficient and simple acceleration scheme of ADMM. In this paper, we take a deeper look at the linearized version of GADMM where one of its subproblems is approximated by a linearization strategy. This linearized version is particularly efficient for a number of applications arising from different areas. Theoretically, we show the worst-case \({\mathcal {O}}(1/k)\) convergence rate measured by the iteration complexity (\(k\) represents the iteration counter) in both the ergodic and a nonergodic senses for the linearized version of GADMM. Numerically, we demonstrate the efficiency of this linearized version of GADMM by some rather new and core applications in statistical learning. Code packages in Matlab for these applications are also developed.

Figs. 1–4 (images not reproduced here)

Notes

  1. As is well known from [2, 8, 14, 25], \(\alpha \in (1,2)\) usually accelerates the GADMM. We thus do not report numerical results for \(\alpha \in (0,1)\).

References

  1. Anderson, T.W.: An introduction to multivariate statistical analysis, 3rd edn. Wiley (2003)

  2. Bertsekas, D.P.: Constrained optimization and Lagrange multiplier methods. Academic Press, New York (1982)

  3. Bickel, P.J., Levina, E.: Some theory for Fisher’s linear discriminant function, naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 6, 989–1010 (2004)

  4. Blum, E., Oettli, W.: Mathematische Optimierung. Grundlagen und Verfahren. Ökonometrie und Unternehmensforschung. Springer, Berlin (1975)

  5. Boley, D.: Local linear convergence of ADMM on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  6. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011)

  7. Cai, T.T., Liu, W.: A direct estimation approach to sparse linear discriminant analysis. J. Am. Stat. Assoc. 106, 1566–1577 (2011)

  8. Cai, X., Gu, G., He, B., Yuan, X.: A proximal point algorithm revisit on alternating direction method of multipliers. Sci. China Math. 56(10), 2179–2186 (2013)

  9. Candès, E.J., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Stat. 35, 2313–2351 (2007)

  10. Clemmensen, L., Hastie, T., Witten, D., Ersbøll, B.: Sparse discriminant analysis. Technometrics 53, 406–413 (2011)

  11. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. Manuscript (2012)

  12. Eckstein, J.: Parallel alternating direction multiplier decomposition of convex programs. J. Optim. Theory Appli. 80(1), 39–62 (1994)

  13. Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Report RRR 32–2012 (2012)

  14. Eckstein, J., Bertsekas, D.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)

  15. Fan, J., Fan, Y.: High dimensional classification using features annealed independence rules. Ann. Stat. 36, 2605–2637 (2008)

  16. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  17. Fan, J., Feng, Y., Tong, X.: A road to classification in high dimensional space: the regularized optimal affine discriminant. J. R. Stat. Soc. Series B Stat. Methodol. 74, 745–771 (2012)

  18. Fan, J., Zhang, J., Yu, K.: Vast portfolio selection with gross-exposure constraints. J. Am. Stat. Assoc. 107, 592–606 (2012)

  19. Fazel, M., Hindi, H., Boyd, S.: A rank minimization heuristic with application to minimum order system approximation. In: Proc. Am. Control Conf. (2001)

  20. Fortin, M., Glowinski, R.: Augmented Lagrangian methods: applications to the numerical solution of boundary value problems. Stud. Math. Appl. 15. North-Holland, Amsterdam (1983)

  21. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, pp. 299–331. North-Holland, Amsterdam (1983)

  22. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)

  23. Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. Springer Proceedings of a Conference Dedicated to J. Periaux (to appear)

  24. Glowinski, R., Marrocco, A.: Approximation par éléments finis d’ordre un et résolution par pénalisation-dualité d’une classe de problèmes non linéaires. R.A.I.R.O., R2, pp. 41–76 (1975)

  25. Gol’shtein, E.G., Tret’yakov, N.V.: Modified Lagrangians in convex programming and their generalizations. Math. Program. Study 10, 86–97 (1979)

  26. Grier, H.E., Krailo, M.D., Tarbell, N.J., Link, M.P., Fryer, C.J., Pritchard, D.J., Gebhardt, M.C., Dickman, P.S., Perlman, E.J., Meyers, P.A.: Addition of ifosfamide and etoposide to standard chemotherapy for Ewing’s sarcoma and primitive neuroectodermal tumor of bone. New Eng. J. Med. 348, 694–701 (2003)

  27. Han, D., Yuan, X.: Local linear convergence of the alternating direction method of multipliers for quadratic programs. SIAM J. Numer. Anal. 51(6), 3446–3457 (2013)

  28. Hans, C.P., Weisenburger, D.D., Greiner, T.C., Gascoyne, R.D., Delabie, J., Ott, G., Müller-Hermelink, H., Campo, E., Braziel, R., Elaine, S.: Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood 103, 275–282 (2004)

  29. He, B., Liao, L.-Z., Han, D.R., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92, 103–118 (2002)

  30. He, B., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Let. 23, 151–161 (1998)

  31. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)

  32. He, B., Yuan, X.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numerische Mathematik (to appear)

  33. He, B., Yuan, X.: On convergence rate of the Douglas–Rachford operator splitting method. Math. Program (to appear)

  34. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 302–320 (1969)

  35. James, G.M., Paulson, C., Rusmevichientong, P.: The constrained LASSO. Manuscript (2012)

  36. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)

  37. Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française d’Inform. Recherche Opér. 4, 154–159 (1970)

  38. McCall, M.N., Bolstad, B.M., Irizarry, R.A.: Frozen robust multiarray analysis (fRMA). Biostatistics 11, 242–253 (2010)

  39. Nemirovsky, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization, Wiley-Interscience series in discrete mathematics. Wiley, New York (1983)

  40. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/{k^2})\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)

  41. Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM J. Sci. Comput. 33(4), 1643–1668 (2011)

  42. Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press (1969)

  43. Shao, J., Wang, Y., Deng, X., Wang, S.: Sparse linear discriminant analysis by thresholding for high dimensional data. Ann. Stat. 39, 1241–1265 (2011)

  44. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B Stat. Methodol. 58, 267–288 (1996)

  45. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Series B Stat. Methodol. 67, 91–108 (2005)

  46. Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39, 1335–1371 (2011)

  47. Wang, L., Zhu, J., Zou, H.: Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics 24, 412–419 (2008)

  48. Wang, X., Yuan, X.: The linearized alternating direction method of multipliers for Dantzig Selector. SIAM J. Sci. Comput. 34, 2782–2811 (2012)

  49. Witten, D.M., Tibshirani, R.: Penalized classification using Fisher’s linear discriminant. J. R. Stat. Soc. Series B Stat. Methodol. 73, 753–772 (2011)

  50. Yang, J., Yuan, X.: Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math. Comput. 82, 301–329 (2013)

  51. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  52. Zhang, X.Q., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 6, 20–46 (2010)

  53. Zhang, X.Q., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imag. Sci. 3(3), 253–276 (2010)

  54. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

Author information

Corresponding author

Correspondence to Xiaoming Yuan.

Additional information

Xiaoming Yuan: This author was supported by the Faculty Research Grant from HKBU: FRG2/13-14/061 and the General Research Fund from Hong Kong Research Grants Council: 203613.

Bingsheng He: This author was supported by the NSFC Grant 11471156.

Appendices

We show that our analysis in Sects. 3 and 4 can be extended to the case where both the \(\mathbf {x}\)- and \(\mathbf {y}\)-subproblems in (3) are linearized. The resulting scheme, called doubly linearized version of the GADMM (“DL-GADMM” for short), reads as

$$\begin{aligned} \mathbf {x}^{t+1}&= ~ \mathop {{\mathrm {argmin}}}_{\mathbf {x}\in {\mathcal {X}}}\Big \{f_1(\mathbf {x})-\mathbf {x}^T\mathbf {A}^T{\gamma }^t + \frac{\rho }{2}\Vert \mathbf {A}\mathbf {x}+\mathbf {B}\mathbf {y}^t-\mathbf {b}\Vert ^2 + \frac{1}{2}\Vert \mathbf {x}- \mathbf {x}^t \Vert ^2_{\mathbf {G}_1}\Big \}, \quad \nonumber \\ \mathbf {y}^{t+1}&= ~ \mathop {{\mathrm {argmin}}}_{\mathbf {y}\in {\mathcal {Y}}} \Big \{f_2(\mathbf {y}) -\mathbf {y}^T\mathbf {B}^T{\gamma }^t + \frac{\rho }{2} \left\| \alpha \mathbf {A}\mathbf {x}^{t+1}\right. \nonumber \\&\left. +(1-\alpha )(\mathbf {b}-\mathbf {B}\mathbf {y}^t) +\mathbf {B}\mathbf {y}-\mathbf {b}\right\| ^2 + \frac{1}{2}\Vert \mathbf {y}-\mathbf {y}^t\Vert ^2_{\mathbf {G}_2}\Big \}, \nonumber \\ {\gamma }^{t+1}&= ~ {\gamma }^t-\rho \Big (\alpha \mathbf {A}\mathbf {x}^{t+1} +(1-\alpha )(\mathbf {b}-\mathbf {B}\mathbf {y}^t) + \mathbf {B}\mathbf {y}^{t+1}-\mathbf {b}\Big ), \end{aligned}$$
(79)

where the matrices \(\mathbf {G}_1\in {\mathbb {R}}^{n_1\times n_1}\) and \(\mathbf {G}_2\in {\mathbb {R}}^{n_2\times n_2}\) are both symmetric and positive definite.
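To make the scheme concrete, the following is a minimal numerical sketch of (79) for the toy problem \(\min \frac{1}{2}\Vert \mathbf {x}-\mathbf {c}\Vert ^2 + \lambda \Vert \mathbf {y}\Vert _1\) subject to \(\mathbf {x}-\mathbf {y}=\mathbf {0}\) (so \(\mathbf {A}=\mathbf {I}\), \(\mathbf {B}=-\mathbf {I}\), \(\mathbf {b}=\mathbf {0}\)), with scalar proximal matrices \(\mathbf {G}_1 = g_1\mathbf {I}\) and \(\mathbf {G}_2 = g_2\mathbf {I}\); the function names and all parameter values are our own illustrative choices, not from the paper. Both subproblems then reduce to closed-form updates (a componentwise linear solve and a soft-thresholding step):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (closed-form y-subproblem)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dl_gadmm_lasso_toy(c, lam, rho=1.0, alpha=1.5, g1=0.5, g2=0.5, iters=2000):
    """DL-GADMM (79) for min 0.5||x - c||^2 + lam*||y||_1  s.t.  x - y = 0.

    Here A = I, B = -I, b = 0, G1 = g1*I, G2 = g2*I, so both linearized
    subproblems are solvable in closed form.
    """
    n = c.size
    x = np.zeros(n); y = np.zeros(n); gamma = np.zeros(n)
    for _ in range(iters):
        # x-subproblem: 0.5||x-c||^2 - x^T gamma + rho/2 ||x - y^t||^2 + g1/2 ||x - x^t||^2
        x = (c + gamma + rho * y + g1 * x) / (1.0 + rho + g1)
        # relaxed point used by the y- and gamma-updates (alpha in (0,2))
        u = alpha * x + (1.0 - alpha) * y
        # y-subproblem: lam||y||_1 + y^T gamma + rho/2 ||y - u||^2 + g2/2 ||y - y^t||^2
        v = (rho * u + g2 * y - gamma) / (rho + g2)
        y_new = soft_threshold(v, lam / (rho + g2))
        # multiplier update
        gamma = gamma - rho * (u - y_new)
        y = y_new
    return x, y

c = np.array([3.0, 0.5, -2.0])
x, y = dl_gadmm_lasso_toy(c, lam=1.0)
# The unique minimizer of 0.5||x - c||^2 + lam||x||_1 is soft_threshold(c, lam).
```

For this strongly convex toy instance the iterates settle quickly onto the soft-thresholded solution, and any \(\alpha \in (0,2)\) with positive \(g_1, g_2\) is covered by the convergence theory above.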

For further analysis, we define two matrices, which are analogous to \(\mathbf {H}\) and \(\mathbf {Q}\) in (11), respectively, as

$$\begin{aligned} \begin{aligned} \mathbf {H}_2&= \begin{pmatrix} \mathbf {G}_1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad \frac{\rho }{\alpha }\mathbf {B}^T\mathbf {B}+\mathbf {G}_2 &{}\quad \frac{1-\alpha }{\alpha }\mathbf {B}^T\\ 0 &{}\quad \frac{1-\alpha }{\alpha }\mathbf {B}&{}\quad \frac{1}{\alpha \rho }\mathbf {I}_n \end{pmatrix}, \\ \mathbf {Q}_2&= \begin{pmatrix} \mathbf {G}_1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad \rho \mathbf {B}^T\mathbf {B}+\mathbf {G}_2 &{}\quad (1-\alpha )\mathbf {B}^T\\ 0 &{}\quad -\mathbf {B}&{}\quad \frac{1}{\rho }\mathbf {I}_n \end{pmatrix}. \end{aligned} \end{aligned}$$
(80)

Obviously, we have

$$\begin{aligned} \mathbf {Q}_2 = \mathbf {H}_2\mathbf {M}, \end{aligned}$$
(81)

where \(\mathbf {M}\) is defined in (10). Note that the equalities (8) and (9) still hold.
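The identity (81) can be checked numerically. The sketch below assembles \(\mathbf {H}_2\), \(\mathbf {Q}_2\) and \(\mathbf {M}\) as dense matrices for random data; the block form of \(\mathbf {M}\) used here is our reconstruction of the definition in (10) and should be treated as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n = 3, 4, 5            # dims of x, y, gamma (n = rows of A and B)
rho, alpha = 0.8, 1.5

B = rng.standard_normal((n, n2))

def spd(k):
    """Random symmetric positive definite matrix (for G1, G2)."""
    C = rng.standard_normal((k, k))
    return C @ C.T + k * np.eye(k)

G1, G2 = spd(n1), spd(n2)
I1, I2, In = np.eye(n1), np.eye(n2), np.eye(n)
Z = lambda r, c: np.zeros((r, c))

# H2 and Q2 exactly as in (80)
H2 = np.block([
    [G1,        Z(n1, n2),                      Z(n1, n)],
    [Z(n2, n1), (rho / alpha) * B.T @ B + G2,   ((1 - alpha) / alpha) * B.T],
    [Z(n, n1),  ((1 - alpha) / alpha) * B,      (1.0 / (alpha * rho)) * In],
])
Q2 = np.block([
    [G1,        Z(n1, n2),           Z(n1, n)],
    [Z(n2, n1), rho * B.T @ B + G2,  (1 - alpha) * B.T],
    [Z(n, n1),  -B,                  (1.0 / rho) * In],
])
# assumed block form of M (reconstructed from (10))
M = np.block([
    [I1,        Z(n1, n2), Z(n1, n)],
    [Z(n2, n1), I2,        Z(n2, n)],
    [Z(n, n1),  -rho * B,  alpha * In],
])

print(np.allclose(Q2, H2 @ M))   # identity (81); prints: True
```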

1.1 A worst-case \({\mathcal {O}}(1/k)\) convergence rate in the ergodic sense for (79)

We first establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in the ergodic sense for the DL-GADMM (79). Indeed, using the relationship (81), the resulting proof is nearly the same as that in Sect. 3 for the L-GADMM (4). We thus only list two lemmas (analogous to Lemmas 1 and 2) and one theorem (analogous to Theorem 2) that establish this rate, and omit the detailed proofs.

Lemma 7

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated sequence \(\{\widetilde{\mathbf {w}}^t\}\) be defined in (7). Then we have

$$\begin{aligned} \begin{aligned} f(\mathbf {u}) - f(\widetilde{\mathbf {u}}^t) +\left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^TF(\widetilde{\mathbf {w}}^t) \ge \left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^T\mathbf {Q}_2\left( \mathbf {w}^{t}-\widetilde{\mathbf {w}}^t\right) , \;\; \forall \mathbf {w}\in \varOmega , \end{aligned} \end{aligned}$$
(82)

where \(\mathbf {Q}_2\) is defined in (80).

Lemma 8

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated sequence \(\{\widetilde{\mathbf {w}}^t\}\) be defined in (7). Then for any \(\mathbf {w}\in \varOmega \), we have

$$\begin{aligned}&\left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^T\mathbf {Q}_2\left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) \nonumber \\&\quad = \frac{1}{2}\left( \Vert \mathbf {w}-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2 - \Vert \mathbf {w}-\mathbf {w}^t\Vert _{\mathbf {H}_2}^2\right) + \frac{1}{2}\Vert \mathbf {x}^t-\widetilde{\mathbf {x}}^{t}\Vert _{\mathbf {G}_1}^2\nonumber \\&\quad \quad + \frac{1}{2}\Vert \mathbf {y}^t - \widetilde{\mathbf {y}}^t\Vert _{\mathbf {G}_2}^2 + \frac{2-\alpha }{2\rho }\Vert {\gamma }^t-\widetilde{{\gamma }}^t\Vert ^2. \end{aligned}$$
(83)

Theorem 7

Let \(\mathbf {H}_2\) be given by (80) and \(\{\mathbf {w}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). For any integer \(k>0\), let \(\widehat{\mathbf {w}}_k\) be defined by

$$\begin{aligned} \widehat{\mathbf {w}}_k = \frac{1}{k+1} \sum _{t=0}^k \widetilde{\mathbf {w}}^t, \end{aligned}$$
(84)

where \(\widetilde{\mathbf {w}}^t\) is defined in (7). Then, \(\widehat{\mathbf {w}}_k\in \varOmega \) and

$$\begin{aligned} f(\widehat{\mathbf {u}}_k) -f(\mathbf {u}) + \left( \widehat{\mathbf {w}}_{k}-\mathbf {w}\right) ^T F(\mathbf {w}) \le \frac{1}{2(k+1)}\Vert \mathbf {w}-\mathbf {w}^0\Vert _{\mathbf {H}_2}^2,\quad \forall \mathbf {w}\in \varOmega . \end{aligned}$$

1.2 A worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for (79)

Next, we prove a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79). Note that Lemma 4 still holds with \(\mathbf {H}\) replaced by \(\mathbf {H}_2\). That is, if \(\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2 = 0\), then \(\widetilde{\mathbf {w}}^t\) defined in (7) is an optimal solution of (5). Thus, for the sequence \(\{\mathbf {w}^t\}\) generated by the DL-GADMM (79), it is reasonable to measure the accuracy of an iterate by \(\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\).
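Computationally, this \(\mathbf {H}_2\)-weighted residual can be evaluated blockwise without assembling \(\mathbf {H}_2\); the following is a minimal sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def h2_residual_sq(dx, dy, dg, B, G1, G2, rho, alpha):
    """||w^t - w^{t+1}||_{H2}^2 evaluated blockwise, with H2 as in (80).

    dx, dy, dg are the successive differences of the x-, y- and
    gamma-iterates, respectively.
    """
    Bdy = B @ dy
    return (dx @ G1 @ dx
            + dy @ G2 @ dy
            + (rho / alpha) * Bdy @ Bdy
            + 2.0 * ((1.0 - alpha) / alpha) * Bdy @ dg
            + (1.0 / (alpha * rho)) * dg @ dg)
```

For \(\alpha \in (0,2)\) and positive definite \(\mathbf {G}_1, \mathbf {G}_2\), the cross term is dominated (since \(|1-\alpha |<1\)), so \(\mathbf {H}_2\) is positive definite and the residual is a valid merit function.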

Proofs of the following two lemmas are analogous to those of Lemmas 5 and 6, respectively. We thus omit them.

Lemma 9

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\), let the associated \(\{ \widetilde{\mathbf {w}}^t\}\) be defined in (7), and let the matrix \(\mathbf {Q}_2\) be defined in (80). Then, we have

$$\begin{aligned} \left( \widetilde{\mathbf {w}}^t - \widetilde{\mathbf {w}}^{t+1}\right) ^T\mathbf {Q}_2 \left[ \left( \mathbf {w}^t-\mathbf {w}^{t+1}\right) - \left( \widetilde{\mathbf {w}}^t-\widetilde{\mathbf {w}}^{t+1}\right) \right] \ge 0. \end{aligned}$$

Lemma 10

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\), let the associated \(\{ \widetilde{\mathbf {w}}^t\}\) be defined in (7), and let the matrices \(\mathbf {M}\), \(\mathbf {H}_2\) and \(\mathbf {Q}_2\) be defined in (10) and (80). Then, we have

$$\begin{aligned} \begin{aligned}&\left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) ^T\mathbf {M}^T\mathbf {H}_2\mathbf {M}\left[ \left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) -\left( \mathbf {w}^{t+1}- \widetilde{\mathbf {w}}^{t+1}\right) \right] \\&\quad \ge \frac{1}{2} \left\| \left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) - \left( \mathbf {w}^{t+1} -\widetilde{\mathbf {w}}^{t+1}\right) \right\| ^2_{\left( \mathbf {Q}_2^T+\mathbf {Q}_2\right) }. \end{aligned} \end{aligned}$$

Based on the above two lemmas, we see that the sequence \(\{\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}\}\) is monotonically non-increasing. That is, we have the following theorem.

Theorem 8

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) and the matrix \(\mathbf {H}_2\) be defined in (80). Then, we have

$$\begin{aligned} \Vert \mathbf {w}^{t+1} - \mathbf {w}^{t+2}\Vert _{\mathbf {H}_2}^2 \le \Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2. \end{aligned}$$

Note that for the DL-GADMM (79), the \(\mathbf {y}\)-subproblem is also proximally regularized, so the inequality (31) cannot be extended directly to this new case. This is indeed the main difficulty in proving a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79), and a more elaborate analysis is needed. We first prove a lemma that bounds the left-hand side of (31).

Lemma 11

Let \(\{\mathbf {y}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). Then, we have

$$\begin{aligned} \left( \mathbf {y}^t - \mathbf {y}^{t+1}\right) ^T\mathbf {B}^T\left( {\gamma }^t-{\gamma }^{t+1}\right) \ge \frac{1}{2} \Vert \mathbf {y}^t-\mathbf {y}^{t+1}\Vert _{\mathbf {G}_2}^2 -\frac{1}{2}\Vert \mathbf {y}^{t-1}-\mathbf {y}^t\Vert _{\mathbf {G}_2}^2. \end{aligned}$$
(85)

Proof

It follows from the optimality condition of the \(\mathbf {y}\)-subproblem in (79) that

$$\begin{aligned} f_2(\mathbf {y}) - f_2(\mathbf {y}^{t+1}) + \left( \mathbf {y}- \mathbf {y}^{t+1}\right) ^T \big [-\mathbf {B}^T{\gamma }^{t+1} + \mathbf {G}_2(\mathbf {y}^{t+1}-\mathbf {y}^t)\big ] \ge 0, \quad \forall \mathbf {y}\in {\mathcal {Y}}. \end{aligned}$$
(86)

Similarly, we also have,

$$\begin{aligned} f_2(\mathbf {y}) - f_2(\mathbf {y}^{t}) + (\mathbf {y}- \mathbf {y}^{t})^T \big [-\mathbf {B}^T{\gamma }^{t} + \mathbf {G}_2(\mathbf {y}^{t}-\mathbf {y}^{t-1})\big ] \ge 0, \quad \forall \mathbf {y}\in {\mathcal {Y}}. \end{aligned}$$
(87)

Setting \(\mathbf {y}= \mathbf {y}^{t}\) in (86) and \(\mathbf {y}=\mathbf {y}^{t+1}\) in (87), and summing them up, we have

$$\begin{aligned} \begin{aligned} \left( \mathbf {y}^t - \mathbf {y}^{t+1}\right) ^T\mathbf {B}^T\left( {\gamma }^t-{\gamma }^{t+1}\right)&\ge (\mathbf {y}^{t+1}-\mathbf {y}^t)^T\mathbf {G}_2 (\mathbf {y}^{t+1} - \mathbf {y}^t + \mathbf {y}^{t-1} -\mathbf {y}^{t})\\&\ge \Vert \mathbf {y}^t-\mathbf {y}^{t+1}\Vert _{\mathbf {G}_2}^2 - \frac{1}{2}\Vert \mathbf {y}^t-\mathbf {y}^{t+1}\Vert _{\mathbf {G}_2}^2 - \frac{1}{2}\Vert \mathbf {y}^{t-1}-\mathbf {y}^{t}\Vert _{\mathbf {G}_2}^2\\&= \frac{1}{2}\Vert \mathbf {y}^t-\mathbf {y}^{t+1}\Vert _{\mathbf {G}_2}^2 - \frac{1}{2}\Vert \mathbf {y}^{t-1}-\mathbf {y}^{t}\Vert _{\mathbf {G}_2}^2, \end{aligned} \end{aligned}$$

where the second inequality holds by the fact that \(\mathbf {a}^T\mathbf {b}\ge - \frac{1}{2}(\Vert \mathbf {a}\Vert ^2 + \Vert \mathbf {b}\Vert ^2)\). The assertion (85) is proved. \(\square \)

We need two more lemmas to establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).

Lemma 12

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated \(\{ \widetilde{\mathbf {w}}^t\}\) be defined in (7). Then we have

$$\begin{aligned} \begin{aligned} c_\alpha \left( \Vert \mathbf {x}^t-\widetilde{\mathbf {x}}^{t}\Vert _{\mathbf {G}_1}^2+\Vert \mathbf {y}^t - \widetilde{\mathbf {y}}^t\Vert _{\mathbf {G}_2}^2 + \frac{\alpha }{\rho }\Vert {\gamma }^t-\widetilde{{\gamma }}^t\Vert ^2\right) \le \Vert \mathbf {w}^t - \widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}_2^T + \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2 \end{aligned} \end{aligned}$$
(88)

where \(c_\alpha \) is defined in (37).

Proof

By the definition of \(\mathbf {Q}_2\), \(\mathbf {M}\) and \(\mathbf {H}_2\), we have

$$\begin{aligned} \begin{aligned}&\ \ \Vert \mathbf {w}^t - \widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}_2^T + \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2\\&\quad =\Vert \mathbf {x}^t-\widetilde{\mathbf {x}}^{t}\Vert _{\mathbf {G}_1}^2+\Vert \mathbf {y}^t - \widetilde{\mathbf {y}}^t\Vert _{\mathbf {G}_2}^2 + \frac{2-\alpha }{\rho }\Vert {\gamma }^t-\widetilde{{\gamma }}^{t}\Vert ^2\\&\quad \ge \min \left\{ \frac{2-\alpha }{\alpha }, 1\right\} \left( \Vert \mathbf {x}^t-\widetilde{\mathbf {x}}^{t}\Vert _{\mathbf {G}_1}^2+\Vert \mathbf {y}^t - \widetilde{\mathbf {y}}^t\Vert _{\mathbf {G}_2}^2 + \frac{\alpha }{\rho }\Vert {\gamma }^t-\widetilde{{\gamma }}^t\Vert ^2\right) , \end{aligned} \end{aligned}$$

which implies the assertion (88) immediately. \(\square \)

In the next lemma, we refine the bound of \((\mathbf {w}-\widetilde{\mathbf {w}}^t)^T\mathbf {Q}_2(\mathbf {w}^{t}-\widetilde{\mathbf {w}}^t)\) given in (82). The refined bound involves the terms \(\Vert \mathbf {w}-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\) recursively, which is favorable for establishing a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).

Lemma 13

Let \(\{\mathbf {w}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). Then, \(\widetilde{\mathbf {w}}^t\in \varOmega \) and

$$\begin{aligned} \begin{aligned} f(\mathbf {u}) - f(\widetilde{\mathbf {u}}^t) + (\mathbf {w}-\widetilde{\mathbf {w}}^t)^TF(\mathbf {w})&\ge \frac{1}{2}\big (\Vert \mathbf {w}-\mathbf {w}^{t+1}\Vert ^2_{\mathbf {H}_2} - \Vert \mathbf {w}- \mathbf {w}^t\Vert _{\mathbf {H}_2}^2\big )\\&\quad + \frac{1}{2}\Vert \mathbf {w}^t - \widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}_2^T + \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2, \quad \forall \mathbf {w}\in \varOmega , \end{aligned} \end{aligned}$$
(89)

where \(\mathbf {M}\) is defined in (10), and \(\mathbf {H}_2\) and \(\mathbf {Q}_2\) are defined in (80).

Proof

By the identity \(\mathbf {Q}_2(\mathbf {w}^t - \widetilde{\mathbf {w}}^t) = \mathbf {H}_2(\mathbf {w}^t - \mathbf {w}^{t+1})\), it holds that

$$\begin{aligned} \left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^T \mathbf {Q}_2 \left( \mathbf {w}^t - \widetilde{\mathbf {w}}^t\right) = \left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^T \mathbf {H}_2\left( \mathbf {w}^t-\mathbf {w}^{t+1}\right) , \quad \forall \mathbf {w}\in \varOmega . \end{aligned}$$

Setting \(\mathbf {a}=\mathbf {w}\), \(\mathbf {b}=\widetilde{\mathbf {w}}^t\), \(\mathbf {c}=\mathbf {w}^t\) and \(\mathbf {d}= \mathbf {w}^{t+1}\) in the identity

$$\begin{aligned} (\mathbf {a}-\mathbf {b})^T\mathbf {H}_2(\mathbf {c}-\mathbf {d}) {=} \frac{1}{2}\left( \Vert \mathbf {a}{-}\mathbf {d}\Vert ^2_{\mathbf {H}_2}-\Vert \mathbf {a}{-}\mathbf {c}\Vert ^2_{\mathbf {H}_2}\right) {+} \frac{1}{2}\left( \Vert \mathbf {c}{-}\mathbf {b}\Vert ^2_{\mathbf {H}_2}-\Vert \mathbf {d}-\mathbf {b}\Vert ^2_{\mathbf {H}_2}\right) , \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned}&2\left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^T\mathbf {Q}_2\left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) \\&\quad = \Vert \mathbf {w}-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2 - \Vert \mathbf {w}-\mathbf {w}^{t}\Vert _{\mathbf {H}_2}^2 + \Vert \mathbf {w}^t-\widetilde{\mathbf {w}}^{t}\Vert _{\mathbf {H}_2}^2 - \Vert \mathbf {w}^{t+1}-\widetilde{\mathbf {w}}^{t}\Vert _{\mathbf {H}_2}^2. \end{aligned} \end{aligned}$$
(90)

Meanwhile, we have

$$\begin{aligned} \begin{aligned} \Vert \mathbf {w}^t-\widetilde{\mathbf {w}}^{t}\Vert _{\mathbf {H}_2}^2 - \Vert \mathbf {w}^{t+1}-\widetilde{\mathbf {w}}^{t}\Vert _{\mathbf {H}_2}^2&= \Vert \mathbf {w}^t-\widetilde{\mathbf {w}}^{t}\Vert _{\mathbf {H}_2}^2 - \Vert (\mathbf {w}^t-\widetilde{\mathbf {w}}^{t}) - \left( \mathbf {w}^t-\mathbf {w}^{t+1}\right) \Vert _{\mathbf {H}_2}^2\\&= \Vert \mathbf {w}^t- \widetilde{\mathbf {w}}^{t}\Vert _{\mathbf {H}_2}^2 - \Vert (\mathbf {w}^t-\widetilde{\mathbf {w}}^{t}) - \mathbf {M}(\mathbf {w}^t-\widetilde{\mathbf {w}}^{t})\Vert _{\mathbf {H}_2}^2\\&= \left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) ^T(2\mathbf {H}_2\mathbf {M}- \mathbf {M}^T\mathbf {H}_2\mathbf {M})\left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) \\&= \left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) ^T(\mathbf {Q}^T_2+ \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M})\left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) , \end{aligned} \end{aligned}$$

where the last equality comes from the identity \(\mathbf {Q}_2 = \mathbf {H}_2\mathbf {M}\).

Substituting the above identity into (90), we have, for all \(\mathbf {w}\in \varOmega \),

$$\begin{aligned} 2\left( \mathbf {w}-\widetilde{\mathbf {w}}^t\right) ^T\mathbf {Q}_2\left( \mathbf {w}^t-\widetilde{\mathbf {w}}^t\right) = \Vert \mathbf {w}-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2 - \Vert \mathbf {w}-\mathbf {w}^{t}\Vert _{\mathbf {H}_2}^2 + \Vert \mathbf {w}^t - \widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}_2^T + \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2. \end{aligned}$$

Plugging this identity into (82), our claim follows immediately. \(\square \)

Then, we show the boundedness of the sequence \(\{\mathbf {w}^t\}\) generated by the DL-GADMM (79), which essentially implies the convergence of \(\{\mathbf {w}^t\}\).

Theorem 9

Let \(\{\mathbf {w}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). Then, it holds that

$$\begin{aligned} \sum _{t=0}^\infty \Vert \mathbf {w}^{t} -\widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}^T_2+\mathbf {Q}_2-\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2 \le \Vert \mathbf {w}^0-\mathbf {w}^*\Vert _{\mathbf {H}_2}^2 , \end{aligned}$$
(91)

where \(\mathbf {H}_2\) is defined in (80) and \(\mathbf {w}^*\) is an arbitrary solution point of (5).

Proof

Setting \(\mathbf {w}= \mathbf {w}^*\) in (89), we have

$$\begin{aligned} \begin{aligned} f(\mathbf {u}^*) - f(\widetilde{\mathbf {u}}^t) + \left( \mathbf {w}^*-\widetilde{\mathbf {w}}^t\right) ^TF(\mathbf {w}^*)&\ge \frac{1}{2}\big (\Vert \mathbf {w}^*-\mathbf {w}^{t+1}\Vert ^2_{\mathbf {H}_2} - \Vert \mathbf {w}^* - \mathbf {w}^t\Vert _{\mathbf {H}_2}^2\big )\\&\quad + \frac{1}{2}\Vert \mathbf {w}^t - \widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}_2^T + \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2. \end{aligned} \end{aligned}$$

Since \(\mathbf {w}^*\) is a solution point of (5), the left-hand side above is nonpositive, and hence

$$\begin{aligned} \Vert \mathbf {w}^{t} -\widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}^T_2+\mathbf {Q}_2-\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2\le \Vert \mathbf {w}^{t} -\mathbf {w}^*\Vert _{\mathbf {H}_2}^2 - \Vert \mathbf {w}^{t+1} -\mathbf {w}^*\Vert _{\mathbf {H}_2}^2. \end{aligned}$$

It is easy to see that \(\mathbf {Q}^T_2+\mathbf {Q}_2-\mathbf {M}^T\mathbf {H}_2\mathbf {M}\succeq {\varvec{0}}\). Thus, it holds

$$\begin{aligned} \sum _{t=0}^\infty \Vert \mathbf {w}^{t} -\widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}^T_2+\mathbf {Q}_2-\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2 \le \Vert \mathbf {w}^0-\mathbf {w}^*\Vert _{\mathbf {H}_2}^2, \end{aligned}$$

which completes the proof. \(\square \)

Finally, we establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).

Theorem 10

Let the sequence \(\{\mathbf {w}^t\}\) be generated by the scheme DL-GADMM (79) with \(\alpha \in (0,2)\). It holds that

$$\begin{aligned} \Vert \mathbf {w}^k-\mathbf {w}^{k+1}\Vert _{\mathbf {H}_2}^2 = {\mathcal {O}}(1/k). \end{aligned}$$
(92)

Proof

By the definition of \(\mathbf {H}_2\) in (80), we have

$$\begin{aligned} \Vert \mathbf {w}^t - \mathbf {w}^{t+1}\Vert ^2_{\mathbf {H}_2}&= \Vert \mathbf {x}^t - \widetilde{\mathbf {x}}^t\Vert ^2_{\mathbf {G}_1} +\Vert \mathbf {y}^t - \widetilde{\mathbf {y}}^t\Vert ^2_{\mathbf {G}_2}+ \frac{1}{\alpha \rho } \Big ( \Vert \rho \mathbf {B}\left( \mathbf {y}^t - \mathbf {y}^{t+1}\right) \Vert ^2 + \Vert {\gamma }^t-{\gamma }^{t+1}\Vert ^2 \nonumber \\&\quad + 2(1 - \alpha )\rho \left( \mathbf {y}^t - \mathbf {y}^{t+1}\right) ^T\mathbf {B}^T\left( {\gamma }^t-{\gamma }^{t+1}\right) \Big ) \nonumber \\&= \Vert \mathbf {x}^t - \widetilde{\mathbf {x}}^t\Vert ^2_{\mathbf {G}_1} +\Vert \mathbf {y}^t - \widetilde{\mathbf {y}}^t\Vert ^2_{\mathbf {G}_2}+\frac{\alpha }{\rho }\Vert {\gamma }^t-\widetilde{{\gamma }}^t\Vert ^2 -2\left( \mathbf {y}^t - \mathbf {y}^{t+1}\right) ^T\mathbf {B}^T\left( {\gamma }^t-{\gamma }^{t+1}\right) . \end{aligned}$$
(93)

Using (85), (88), (91) and (93), we obtain

$$\begin{aligned} \begin{aligned} \sum _{t=1}^k \Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2&\le \frac{1}{c_\alpha }\sum _{t=1}^k\Vert \mathbf {w}^t - \widetilde{\mathbf {w}}^t\Vert _{\mathbf {Q}_2^T + \mathbf {Q}_2 -\mathbf {M}^T\mathbf {H}_2\mathbf {M}}^2\\&\quad \quad +\sum _{t=1}^k\left( \Vert \mathbf {y}^{t-1}-\mathbf {y}^t\Vert _{\mathbf {G}_2}^2-\Vert \mathbf {y}^t-\mathbf {y}^{t+1}\Vert _{\mathbf {G}_2}^2\right) \\&\le \frac{1}{c_\alpha }\Vert \mathbf {w}^0-\mathbf {w}^*\Vert _{\mathbf {H}_2}^2+\Vert \mathbf {y}^{0}-\mathbf {y}^1\Vert _{\mathbf {G}_2}^2. \end{aligned} \end{aligned}$$

By Theorem 8, the sequence \(\{\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\}\) is non-increasing. Thus, we have

$$\begin{aligned} \begin{aligned} k \Vert \mathbf {w}^k - \mathbf {w}^{k+1} \Vert _{\mathbf {H}_2}^2\le&\sum _{t=1}^k \Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2 \\ \le&\frac{1}{c_\alpha }\Vert \mathbf {w}^0-\mathbf {w}^*\Vert _{\mathbf {H}_2}^2+\Vert \mathbf {y}^{0}- \mathbf {y}^1\Vert _{\mathbf {G}_2}^2, \end{aligned} \end{aligned}$$

and the assertion (92) is proved. \(\square \)
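The final step of this proof is a generic fact: if a nonnegative sequence \(a_t\) is non-increasing and \(\sum _t a_t \le C\), then \(k\, a_k \le \sum _{t=1}^k a_t \le C\), i.e. \(a_k = {\mathcal {O}}(1/k)\); here \(a_t = \Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\) and \(C = \frac{1}{c_\alpha }\Vert \mathbf {w}^0-\mathbf {w}^*\Vert _{\mathbf {H}_2}^2+\Vert \mathbf {y}^{0}-\mathbf {y}^1\Vert _{\mathbf {G}_2}^2\). A tiny numeric illustration with an example sequence of our own choosing:

```python
# Monotone + summable => O(1/k): for a nonnegative non-increasing sequence a_t
# with sum_t a_t <= C, we get k * a_k <= sum_{t<=k} a_t <= C.
a = [1.0 / (t * t) for t in range(1, 10001)]  # illustrative sequence, a_k = 1/k^2
C = sum(a)
prefix = 0.0
for k, ak in enumerate(a, start=1):
    prefix += ak
    assert k * ak <= prefix + 1e-12   # k * a_k bounded by the partial sum
    assert prefix <= C + 1e-12        # partial sums bounded by C
```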

Recall that for the sequence \(\{\mathbf {w}^t\}\) generated by the DL-GADMM (79), it is reasonable to measure the accuracy of an iterate by \(\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\). Thus, Theorem 10 demonstrates a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).


Cite this article

Fang, E.X., He, B., Liu, H. et al. Generalized alternating direction method of multipliers: new theoretical insights and applications. Math. Prog. Comp. 7, 149–187 (2015). https://doi.org/10.1007/s12532-015-0078-2
