Abstract
Recently, the alternating direction method of multipliers (ADMM) has received intensive attention from a broad spectrum of areas. The generalized ADMM (GADMM) proposed by Eckstein and Bertsekas is an efficient and simple acceleration scheme for ADMM. In this paper, we take a deeper look at the linearized version of GADMM, in which one of its subproblems is approximated by a linearization strategy. This linearized version is particularly efficient for a number of applications arising from different areas. Theoretically, we show a worst-case \({\mathcal {O}}(1/k)\) convergence rate measured by the iteration complexity (\(k\) represents the iteration counter) in both the ergodic sense and a nonergodic sense for the linearized version of GADMM. Numerically, we demonstrate the efficiency of this linearized version of GADMM on several recent and important applications in statistical learning. Code packages in Matlab for these applications are also developed.
References
Anderson, T.W.: An introduction to multivariate statistical analysis, 3rd edn. Wiley (2003)
Bertsekas, D.P.: Constrained optimization and Lagrange multiplier methods. Academic Press, New York (1982)
Bickel, P.J., Levina, E.: Some theory for Fisher’s linear discriminant function, naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010 (2004)
Blum, E., Oettli, W.: Mathematische Optimierung. Grundlagen und Verfahren. Ökonometrie und Unternehmensforschung. Springer, Berlin (1975)
Boley, D.: Local linear convergence of ADMM on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011)
Cai, T.T., Liu, W.: A direct estimation approach to sparse linear discriminant analysis. J. Am. Stat. Assoc. 106, 1566–1577 (2011)
Cai, X., Gu, G., He, B., Yuan, X.: A proximal point algorithm revisit on alternating direction method of multipliers. Sci. China Math. 56(10), 2179–2186 (2013)
Candès, E.J., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Stat. 35, 2313–2351 (2007)
Clemmensen, L., Hastie, T., Witten, D., Ersbøll, B.: Sparse discriminant analysis. Technometrics 53, 406–413 (2011)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. Manuscript (2012)
Eckstein, J.: Parallel alternating direction multiplier decomposition of convex programs. J. Optim. Theory Appl. 80(1), 39–62 (1994)
Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Report RRR 32–2012 (2012)
Eckstein, J., Bertsekas, D.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Fan, J., Fan, Y.: High dimensional classification using features annealed independence rules. Ann. Stat. 36, 2605–2637 (2008)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, J., Feng, Y., Tong, X.: A road to classification in high dimensional space: the regularized optimal affine discriminant. J. R. Stat. Soc. Series B Stat. Methodol. 74, 745–771 (2012)
Fan, J., Zhang, J., Yu, K.: Vast portfolio selection with gross-exposure constraints. J. Am. Stat. Assoc. 107, 592–606 (2012)
Fazel, M., Hindi, H., Boyd, S.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American Control Conference (2001)
Fortin, M., Glowinski, R.: Augmented Lagrangian methods: applications to the numerical solution of boundary-value problems. Stud. Math. Appl. 15. North-Holland, Amsterdam (1983)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, pp. 299–331. North-Holland, Amsterdam (1983)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)
Glowinski, R.: On alternating direction methods of multipliers: a historical perspective. Springer Proceedings of a Conference Dedicated to J. Periaux (to appear)
Glowinski, R., Marrocco, A.: Approximation par éléments finis d’ordre un et résolution par pénalisation-dualité d’une classe de problèmes non linéaires. R.A.I.R.O., R2, pp. 41–76 (1975)
Gol’shtein, E.G., Tret’yakov, N.V.: Modified Lagrangians in convex programming and their generalizations. Math. Program. Study 10, 86–97 (1979)
Grier, H.E., Krailo, M.D., Tarbell, N.J., Link, M.P., Fryer, C.J., Pritchard, D.J., Gebhardt, M.C., Dickman, P.S., Perlman, E.J., Meyers, P.A.: Addition of ifosfamide and etoposide to standard chemotherapy for Ewing’s sarcoma and primitive neuroectodermal tumor of bone. New Eng. J. Med. 348, 694–701 (2003)
Han, D., Yuan, X.: Local linear convergence of the alternating direction method of multipliers for quadratic programs. SIAM J. Numer. Anal. 51(6), 3446–3457 (2013)
Hans, C.P., Weisenburger, D.D., Greiner, T.C., Gascoyne, R.D., Delabie, J., Ott, G., Müller-Hermelink, H., Campo, E., Braziel, R., Elaine, S.: Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood 103, 275–282 (2004)
He, B., Liao, L.-Z., Han, D.R., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92, 103–118 (2002)
He, B., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
He, B., Yuan, X.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numerische Mathematik (to appear)
He, B., Yuan, X.: On convergence rate of the Douglas–Rachford operator splitting method. Math. Program. (to appear)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 302–320 (1969)
James, G.M., Paulson, C., Rusmevichientong, P.: The constrained LASSO. Manuscript (2012)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française d’Inform. Recherche Opér. 4, 154–159 (1970)
McCall, M.N., Bolstad, B.M., Irizarry, R.A.: Frozen robust multiarray analysis (fRMA). Biostatistics 11, 242–253 (2010)
Nemirovsky, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization. Wiley-Interscience series in discrete mathematics. Wiley, New York (1983)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/{k^2})\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Ng, M.K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM J. Sci. Comput. 33(4), 1643–1668 (2011)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press (1969)
Shao, J., Wang, Y., Deng, X., Wang, S.: Sparse linear discriminant analysis by thresholding for high dimensional data. Ann. Stat. 39, 1241–1265 (2011)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B Stat. Methodol. 58, 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Series B Stat. Methodol. 67, 91–108 (2005)
Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39, 1335–1371 (2011)
Wang, L., Zhu, J., Zou, H.: Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics 24, 412–419 (2008)
Wang, X., Yuan, X.: The linearized alternating direction method of multipliers for Dantzig Selector. SIAM J. Sci. Comput. 34, 2782–2811 (2012)
Witten, D.M., Tibshirani, R.: Penalized classification using Fisher’s linear discriminant. J. R. Stat. Soc. Series B Stat. Methodol. 73, 753–772 (2011)
Yang, J., Yuan, X.: Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math. Comput. 82, 301–329 (2013)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, X.Q., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46, 20–46 (2011)
Zhang, X.Q., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Additional information
Xiaoming Yuan: This author was supported by the Faculty Research Grant from HKBU: FRG2/13-14/061 and the General Research Fund from Hong Kong Research Grants Council: 203613.
Bingsheng He: This author was supported by the NSFC Grant 11471156.
Appendices
We show that our analysis in Sects. 3 and 4 can be extended to the case where both the \(\mathbf {x}\)- and \(\mathbf {y}\)-subproblems in (3) are linearized. The resulting scheme, called the doubly linearized version of the GADMM (“DL-GADMM” for short), reads as
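(the display of (79) did not survive extraction; the following is a reconstruction offered as a sketch, assuming the standard GADMM setting \(\min \{\theta _1(\mathbf {x})+\theta _2(\mathbf {y}) \,|\, \mathbf {A}\mathbf {x}+\mathbf {B}\mathbf {y}=\mathbf {b}\}\) with penalty parameter \(\beta >0\) and relaxation factor \(\alpha \in (0,2)\); the symbols \(\theta _1\), \(\theta _2\), \(\mathbf {A}\), \(\mathbf {B}\), \(\mathbf {b}\) and \(\beta \) are assumed from that setting rather than reproduced from the paper)

$$\begin{aligned} \left\{ \begin{array}{l} \mathbf {x}^{t+1} = \arg \min \limits _{\mathbf {x}} \Big \{\theta _1(\mathbf {x}) - (\varvec{\lambda }^t)^T\mathbf {A}\mathbf {x} + \frac{\beta }{2}\Vert \mathbf {A}\mathbf {x}+\mathbf {B}\mathbf {y}^t-\mathbf {b}\Vert ^2 + \frac{1}{2}\Vert \mathbf {x}-\mathbf {x}^t\Vert _{\mathbf {G}_1}^2\Big \},\\ \mathbf {y}^{t+1} = \arg \min \limits _{\mathbf {y}} \Big \{\theta _2(\mathbf {y}) - (\varvec{\lambda }^t)^T\mathbf {B}\mathbf {y} + \frac{\beta }{2}\Vert \alpha \mathbf {A}\mathbf {x}^{t+1}+(1-\alpha )(\mathbf {b}-\mathbf {B}\mathbf {y}^t)+\mathbf {B}\mathbf {y}-\mathbf {b}\Vert ^2 + \frac{1}{2}\Vert \mathbf {y}-\mathbf {y}^t\Vert _{\mathbf {G}_2}^2\Big \},\\ \varvec{\lambda }^{t+1} = \varvec{\lambda }^t - \beta \big (\alpha \mathbf {A}\mathbf {x}^{t+1}+(1-\alpha )(\mathbf {b}-\mathbf {B}\mathbf {y}^t)+\mathbf {B}\mathbf {y}^{t+1}-\mathbf {b}\big ), \end{array}\right. \end{aligned}$$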
where the matrices \(\mathbf {G}_1\in {\mathbb {R}}^{n_1\times n_1}\) and \(\mathbf {G}_2\in {\mathbb {R}}^{n_2\times n_2}\) are both symmetric and positive definite.
For further analysis, we define two matrices, which are analogous to \(\mathbf {H}\) and \(\mathbf {Q}\) in (11), respectively, as
Obviously, we have
where \(\mathbf {M}\) is defined in (10). Note that the equalities (8) and (9) still hold.
1.1 A worst-case \({\mathcal {O}}(1/k)\) convergence rate in the ergodic sense for (79)
We first establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in the ergodic sense for the DL-GADMM (79). Indeed, thanks to the relationship (81), the proof is nearly identical to that in Sect. 3 for the L-GADMM (4). We thus only list two lemmas (analogous to Lemmas 1 and 2) and one theorem (analogous to Theorem 2) that establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in the ergodic sense for (79), and omit the proofs.
Lemma 7
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated sequence \(\{\widetilde{\mathbf {w}}^t\}\) be defined in (7). Then we have
where \(\mathbf {Q}_2\) is defined in (80).
Lemma 8
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated sequence \(\{\widetilde{\mathbf {w}}^t\}\) be defined in (7). Then for any \(\mathbf {w}\in \varOmega \), we have
Theorem 7
Let \(\mathbf {H}_2\) be given by (80) and \(\{\mathbf {w}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). For any integer \(k>0\), let \(\widehat{\mathbf {w}}_k\) be defined by
where \(\widetilde{\mathbf {w}}^t\) is defined in (7). Then, \(\widehat{\mathbf {w}}_k\in \varOmega \) and
1.2 A worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for (79)
Next, we prove a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79). Note that Lemma 4 still holds with \(\mathbf {H}\) replaced by \(\mathbf {H}_2\). That is, if \(\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2 = 0\), then \(\widetilde{\mathbf {w}}^t\) defined in (7) is an optimal solution point of (5). Thus, for the sequence \(\{\mathbf {w}^t\}\) generated by the DL-GADMM (79), it is reasonable to measure the accuracy of an iterate by \(\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\).
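In implementations, this quantity can serve directly as a stopping criterion. The following is a minimal Matlab sketch (not taken from the paper's code packages); the names `H2`, `w_t`, `w_tp1` and `tol` are hypothetical placeholders for the matrix in (80), two consecutive stacked iterates, and a user tolerance.

```matlab
% Minimal sketch: nonergodic accuracy measure as a stopping rule.
n     = 5;
H2    = eye(n);             % placeholder for the matrix assembled per (80)
w_t   = randn(n, 1);        % current iterate, stacked as (x; y; lambda)
w_tp1 = 0.999 * w_t;        % next iterate
tol   = 1e-6;
d   = w_t - w_tp1;
res = d' * (H2 * d);        % equals ||w^t - w^{t+1}||_{H2}^2
if res < tol^2
    disp('stop: w_tilde_t from (7) is an acceptable approximate solution');
end
```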
Proofs of the following two lemmas are analogous to those of Lemmas 5 and 6, respectively. We thus omit them.
Lemma 9
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated \(\{ \widetilde{\mathbf {w}}^t\}\) be defined in (7); the matrix \(\mathbf {Q}_2\) be defined in (80). Then, we have
Lemma 10
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated \(\{ \widetilde{\mathbf {w}}^t\}\) be defined in (7); the matrices \(\mathbf {M}\), \(\mathbf {H}_2\), \(\mathbf {Q}_2\) be defined in (10) and (80). Then, we have
Based on the above two lemmas, we see that the sequence \(\{\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}\}\) is monotonically non-increasing. More precisely, we have the following theorem.
Theorem 8
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) and the matrix \(\mathbf {H}_2\) be defined in (80). Then, we have
Note that for the DL-GADMM (79), the \(\mathbf {y}\)-subproblem is also proximally regularized, so we cannot extend the inequality (31) to this new case. This is indeed the main difficulty in proving a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79); a more elaborate analysis is needed. We first prove a lemma that bounds the left-hand side of (31).
Lemma 11
Let \(\{\mathbf {y}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). Then, we have
Proof
It follows from the optimality condition of the \(\mathbf {y}\)-subproblem in (79) that
Similarly, we also have,
Setting \(\mathbf {y}= \mathbf {y}^{t}\) in (86) and \(\mathbf {y}=\mathbf {y}^{t+1}\) in (87), and summing them up, we have
where the second inequality holds by the fact that \(\mathbf {a}^T\mathbf {b}\ge - \frac{1}{2}(\Vert \mathbf {a}\Vert ^2 + \Vert \mathbf {b}\Vert ^2)\). The assertion (85) is proved. \(\square \)
Two more lemmas should be proved in order to establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).
Lemma 12
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM (79) with \(\alpha \in (0,2)\) and the associated \(\{ \widetilde{\mathbf {w}}^t\}\) be defined in (7). Then we have
where \(c_\alpha \) is defined in (37).
Proof
By the definitions of \(\mathbf {Q}_2\), \(\mathbf {M}\) and \(\mathbf {H}_2\), we have
which implies the assertion (88) immediately. \(\square \)
In the next lemma, we refine the bound of \((\mathbf {w}-\widetilde{\mathbf {w}}^t)^T\mathbf {Q}_2(\mathbf {w}^{t}-\widetilde{\mathbf {w}}^t)\) in (82). The refined bound involves the term \(\Vert \mathbf {w}-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\) recursively, which is favorable for establishing a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).
Lemma 13
Let \(\{\mathbf {w}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). Then, \(\widetilde{\mathbf {w}}^t\in \varOmega \) and
where \(\mathbf {M}\) is defined in (10), and \(\mathbf {H}_2\) and \(\mathbf {Q}_2\) are defined in (80).
Proof
By the identity \(\mathbf {Q}_2(\mathbf {w}^t - \widetilde{\mathbf {w}}^t) = \mathbf {H}_2(\mathbf {w}^t - \mathbf {w}^{t+1})\), it holds that
Setting \(\mathbf {a}=\mathbf {w}\), \(\mathbf {b}=\widetilde{\mathbf {w}}^t\), \(\mathbf {c}=\mathbf {w}^t\) and \(\mathbf {d}= \mathbf {w}^{t+1}\) in the identity
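(the display of this identity did not survive extraction; presumably it is the standard polarization identity for a symmetric matrix, stated here for \(\mathbf {H}_2\) and verifiable by direct expansion)

$$(\mathbf {a}-\mathbf {b})^T\mathbf {H}_2(\mathbf {c}-\mathbf {d}) = \frac{1}{2}\left( \Vert \mathbf {a}-\mathbf {d}\Vert _{\mathbf {H}_2}^2-\Vert \mathbf {a}-\mathbf {c}\Vert _{\mathbf {H}_2}^2\right) +\frac{1}{2}\left( \Vert \mathbf {c}-\mathbf {b}\Vert _{\mathbf {H}_2}^2-\Vert \mathbf {d}-\mathbf {b}\Vert _{\mathbf {H}_2}^2\right) ,$$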
we have
Meanwhile, we have
where the last equality comes from the identity \(\mathbf {Q}_2 = \mathbf {H}_2\mathbf {M}\).
Substituting the above identity into (90), we have, for all \(\mathbf {w}\in \varOmega \),
Plugging this identity into (82), our claim follows immediately. \(\square \)
Then, we show the boundedness of the sequence \(\{\mathbf {w}^t\}\) generated by the DL-GADMM (79), which essentially implies the convergence of \(\{\mathbf {w}^t\}\).
Theorem 9
Let \(\{\mathbf {w}^t\}\) be the sequence generated by the DL-GADMM (79) with \(\alpha \in (0,2)\). Then, it holds that
where \(\mathbf {H}_2\) is defined in (80).
Proof
Setting \(\mathbf {w}= \mathbf {w}^*\) in (89), we have
Then, recalling (5), we have
It is easy to see that \(\mathbf {Q}^T_2+\mathbf {Q}_2-\mathbf {M}^T\mathbf {H}_2\mathbf {M}\succeq {\varvec{0}}\). Thus, it holds that
which completes the proof. \(\square \)
Finally, we establish a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).
Theorem 10
Let the sequence \(\{\mathbf {w}^t\}\) be generated by the DL-GADMM scheme (79) with \(\alpha \in (0,2)\). It holds that
Proof
By the definition of \(\mathbf {H}_2\) in (80), we have
Using (85), (88), (91) and (93), we obtain
By Theorem 8, the sequence \(\{\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\}\) is non-increasing. Thus, we have
and the assertion (92) is proved. \(\square \)
Recall that for the sequence \(\{\mathbf {w}^t\}\) generated by the DL-GADMM (79), it is reasonable to measure the accuracy of an iterate by \(\Vert \mathbf {w}^t-\mathbf {w}^{t+1}\Vert _{\mathbf {H}_2}^2\). Thus, Theorem 10 demonstrates a worst-case \({\mathcal {O}}(1/k)\) convergence rate in a nonergodic sense for the DL-GADMM (79).
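To make the scheme concrete, we close with a minimal Matlab sketch of the DL-GADMM on a toy instance of the model \(\min \{\theta _1(\mathbf {x})+\theta _2(\mathbf {y})\,|\,\mathbf {A}\mathbf {x}+\mathbf {B}\mathbf {y}=\mathbf {b}\}\): \(\theta _1(\mathbf {x})=\Vert \mathbf {x}\Vert _1\), \(\theta _2(\mathbf {y})=\frac{1}{2}\Vert \mathbf {y}-\mathbf {c}\Vert ^2\), \(\mathbf {A}=\mathbf {I}\), \(\mathbf {B}=-\mathbf {I}\), \(\mathbf {b}=\mathbf {0}\). The instance, the parameter values, and the choices \(\mathbf {G}_i=(\tau _i-\beta )\mathbf {I}\) with \(\tau _i>\beta \) are our assumptions for illustration; this sketch is not the paper's code package.

```matlab
% DL-GADMM sketch on: min ||x||_1 + 0.5*||y - c||^2  s.t.  x - y = 0.
% With A = I, B = -I, b = 0 and G_i = (tau_i - beta)*I (tau_i > beta),
% both subproblems of (79) have closed-form solutions.
rng(0);
n    = 200;
c    = randn(n, 1);
beta = 1;  alpha = 1.5;            % relaxation factor alpha in (0,2)
tau1 = 1.5*beta;  tau2 = 1.5*beta; % so G_1, G_2 are positive definite
x = zeros(n,1);  y = zeros(n,1);  lam = zeros(n,1);
soft = @(v, t) sign(v) .* max(abs(v) - t, 0);   % soft-thresholding
for k = 1:1000
    % x-subproblem: prox of ||.||_1 after the proximal linearization
    x_new = soft((lam + beta*y + (tau1 - beta)*x) / tau1, 1/tau1);
    % relaxed point alpha*A*x^{t+1} + (1-alpha)*(b - B*y^t)
    r = alpha*x_new + (1 - alpha)*y;
    % y-subproblem: quadratic, solved exactly
    y_new = (c - lam + beta*r + (tau2 - beta)*y) / (1 + tau2);
    % multiplier update
    lam = lam - beta*(r - y_new);
    if norm(x_new - x) + norm(y_new - y) < 1e-10
        x = x_new;  y = y_new;  break;
    end
    x = x_new;  y = y_new;
end
% For this instance the minimizer is x = y = soft(c, 1).
fprintf('error vs. closed form: %.2e\n', norm(x - soft(c, 1)));
```

At the fixed point, the multiplier satisfies \(\varvec{\lambda }=\mathbf {c}-\mathbf {y}\), which lies in \(\partial \Vert \mathbf {x}\Vert _1\) at \(\mathbf {x}=\mathbf {y}\); this is a quick consistency check on the sketch.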
Cite this article
Fang, E.X., He, B., Liu, H., Yuan, X.: Generalized alternating direction method of multipliers: new theoretical insights and applications. Math. Prog. Comp. 7, 149–187 (2015). https://doi.org/10.1007/s12532-015-0078-2
Keywords
- Convex optimization
- Alternating direction method of multipliers
- Convergence rate
- Variable selection
- Discriminant analysis
- Statistical learning