Abstract
This paper introduces a parallel and distributed algorithm for solving the following minimization problem with linear constraints:
$$\begin{aligned} \min _{\mathbf{x}_1,\ldots ,\mathbf{x}_N}~\sum _{i=1}^N f_i(\mathbf{x}_i)\quad \text {subject to}\quad \sum _{i=1}^N A_i\mathbf{x}_i=c,\quad \mathbf{x}_i\in {\mathcal {X}}_i,~i=1,\ldots ,N, \end{aligned}$$
where \(N \ge 2\), \(f_i\) are convex functions, \(A_i\) are matrices, and \({\mathcal {X}}_i\) are feasible sets for the variables \(\mathbf{x}_i\). Our algorithm extends the alternating direction method of multipliers (ADMM): it decomposes the original problem into N smaller subproblems and solves them in parallel at each iteration. This paper shows that the classic ADMM can be extended to the N-block Jacobi fashion and preserve convergence in the following two cases: (i) the matrices \(A_i\) are mutually near-orthogonal and have full column rank, or (ii) proximal terms are added to the N subproblems (but without any assumption on the matrices \(A_i\)). In the latter case, certain proximal terms allow the subproblems to be solved in more flexible and efficient ways. We show that \(\Vert {\mathbf {x}}^{k+1} - {\mathbf {x}}^k\Vert _M^2\) converges at a rate of o(1/k), where M is a symmetric positive semi-definite matrix. Since the parameters used in the convergence analysis are conservative, we introduce a strategy for automatically tuning the parameters, which substantially accelerates our algorithm in practice. We implemented our algorithm (for case ii above) on Amazon EC2 and tested it on basis pursuit problems with more than 300 GB of distributed data. This is the first reported instance of successfully solving a compressive sensing problem at such a large scale.
References
Awanou, G., Lai, M.J., Wenston, P.: The multivariate spline method for numerical solution of partial differential equations and scattered data interpolation. In: Chen, G., Lai, M.J. (eds.) Wavelets and Splines, pp. 24–74. Nashboro Press, Nashville (2006)
Bertsekas, D., Tsitsiklis, J.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena Scientific, Belmont (1997)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)
Chen, C., He, B.S., Ye, Y.Y., Yuan, X.M.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1), 57–79 (2016)
Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstr. Appl. Anal. 2013, 183961 (2013)
Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64(1), 81–101 (1994)
Corman, E., Yuan, X.M.: A generalized proximal point algorithm and its convergence rate. SIAM J. Optim. 24(4), 1614–1638 (2014)
Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. UCLA CAM Report 14-51 (2014)
Davis, D., Yin, W.: Convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. UCLA CAM Report 14-58 (2014)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. UCLA CAM Report 15-13 (2015)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Everett, H.: Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Oper. Res. 11(3), 399–417 (1963)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R.: Numerical methods for nonlinear variational problems. Springer Series in Computational Physics. Springer, Berlin (1984)
Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Laboria (1975)
Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)
Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)
He, B.S.: A class of projection and contraction methods for monotone variational inequalities. Appl. Math. Optim. 35(1), 69–76 (1997)
He, B.S.: Parallel splitting augmented Lagrangian methods for monotone structured variational inequalities. Comput. Optim. Appl. 42(2), 195–212 (2009)
He, B.S., Hou, L.S., Yuan, X.M.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25, 2274–2312 (2015)
He, B.S., Tao, M., Yuan, X.M.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22(2), 313–340 (2012)
He, B.S., Yuan, X.M.: On the \(O(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B.S., Yuan, X.M.: On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)
Hong, M., Luo, Z.Q.: On the Linear Convergence of the Alternating Direction Method of Multipliers. arXiv:1208.3922 (2012)
Li, M., Sun, D., Toh, K.C.: A Convergent 3-Block Semi-Proximal ADMM for Convex Minimization Problems with One Strongly Convex Block. arXiv:1410.7933 [math] (2014)
Lin, T., Ma, S., Zhang, S.: On the Convergence Rate of Multi-Block ADMM. arXiv:1408.4265 [math] (2014)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Mota, J.F., Xavier, J.M., Aguiar, P.M., Puschel, M.: D-ADMM: a communication-efficient distributed algorithm for separable optimization. IEEE Trans. Signal Process. 61, 2718–2723 (2013)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233–2246 (2012)
Peng, Z., Yan, M., Yin, W.: Parallel and distributed sparse optimization. In: IEEE Asilomar Conference on Signals Systems and Computers (2013)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1997)
Shor, N.Z., Kiwiel, K.C., Ruszczyński, A.: Minimization Methods for Non-differentiable Functions. Springer, New York (1985)
Tao, M., Yuan, X.M.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)
Wang, X.F., Hong, M.Y., Ma, S.Q., Luo, Z.Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. Pac. J. Optim. 11(4), 57–81 (2015)
Yang, J.F., Zhang, Y.: Alternating direction algorithms for \(\ell _1\)-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011)
Zhang, X., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46(1), 20–46 (2011)
Acknowledgements
Wei Deng is partially supported by NSF grant ECCS-1028790. Ming-Jun Lai is partially supported by a Simons collaboration grant for 2013–2018 and by NSF grant DMS-1521537. Zhimin Peng and Wotao Yin are partially supported by NSF grants DMS-0748839 and DMS-1317602, and ARO/ARL MURI grant FA9550-10-1-0567. The authors would like to thank the anonymous reviewers for their careful reading and suggestions.
Appendix: On o(1 / k) Convergence Rate of ADMM
The convergence of the standard two-block ADMM has long been established in the literature [14, 16]. Its convergence rate has been actively studied; see [9, 10, 12, 17, 23–25, 28] and the references therein. In the following, we briefly review the convergence analysis for ADMM (\(N=2\)) and then slightly improve the O(1/k) convergence rate established in [24] to o(1/k), using the same technique as in Sect. 2.2.
As suggested in [24], the quantity \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\) can be used to measure the optimality of the iterations of ADMM, where
$$\begin{aligned} {\mathbf {w}}=\begin{pmatrix}\mathbf{x}_2\\ \lambda \end{pmatrix},\qquad H=\begin{pmatrix}\rho A_2^\top A_2&0\\ 0&\frac{1}{\rho }{\mathbf {I}}\end{pmatrix}, \end{aligned}$$
and \({\mathbf {I}}\) is the identity matrix of size \(m\times m\). Note that \(\mathbf{x}_1\) is not part of \({\mathbf {w}}\) because \(\mathbf{x}_1\) can be regarded as an intermediate variable in the iterations of ADMM, whereas \((\mathbf{x}_2,\lambda )\) are the essential variables [3]. In fact, if \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2=0\), then \({\mathbf {w}}^{k+1}\) is optimal. The reasons are as follows. Recall the subproblems of ADMM:
$$\begin{aligned} \mathbf{x}_1^{k+1}&=\mathop {\mathrm{arg\,min}}_{\mathbf{x}_1\in {\mathcal {X}}_1}~f_1(\mathbf{x}_1)-\langle \lambda ^k,A_1\mathbf{x}_1\rangle +\frac{\rho }{2}\Vert A_1\mathbf{x}_1+A_2\mathbf{x}_2^k-c\Vert ^2,\\ \mathbf{x}_2^{k+1}&=\mathop {\mathrm{arg\,min}}_{\mathbf{x}_2\in {\mathcal {X}}_2}~f_2(\mathbf{x}_2)-\langle \lambda ^k,A_2\mathbf{x}_2\rangle +\frac{\rho }{2}\Vert A_1\mathbf{x}_1^{k+1}+A_2\mathbf{x}_2-c\Vert ^2,\\ \lambda ^{k+1}&=\lambda ^k-\rho \left(A_1\mathbf{x}_1^{k+1}+A_2\mathbf{x}_2^{k+1}-c\right). \end{aligned}$$
By the formula for \(\lambda ^{k+1}\), their optimality conditions can be written as:
$$\begin{aligned} \langle \mathbf{x}_1-\mathbf{x}_1^{k+1},~g_1^{k+1}-A_1^\top \lambda ^{k+1}+\rho A_1^\top A_2\left(\mathbf{x}_2^k-\mathbf{x}_2^{k+1}\right)\rangle \ge 0,\quad \forall \mathbf{x}_1\in {\mathcal {X}}_1, \end{aligned}$$(6.3)
$$\begin{aligned} \langle \mathbf{x}_2-\mathbf{x}_2^{k+1},~g_2^{k+1}-A_2^\top \lambda ^{k+1}\rangle \ge 0,\quad \forall \mathbf{x}_2\in {\mathcal {X}}_2, \end{aligned}$$(6.4)
where \(g_i^{k+1}\in \partial f_i(\mathbf{x}_i^{k+1})\) denotes a subgradient.
In comparison with the KKT conditions (1.14a) and (1.14b), we can see that \(\mathbf{u}^{k+1}=\left( \mathbf{x}^{k+1}_1,\mathbf{x}^{k+1}_2,\lambda ^{k+1}\right) \) is a solution of (1.1) if and only if the following holds:
$$\begin{aligned} \rho A_1^\top A_2\left(\mathbf{x}_2^k-\mathbf{x}_2^{k+1}\right)=0 \end{aligned}$$(6.5)
and
$$\begin{aligned} {\mathbf {r}}_p:=A_1\mathbf{x}_1^{k+1}+A_2\mathbf{x}_2^{k+1}-c=0. \end{aligned}$$(6.6)
By the update formula for \(\lambda ^{k+1}\), we can write \({\mathbf {r}}_p\) equivalently as
$$\begin{aligned} {\mathbf {r}}_p=\frac{1}{\rho }\left(\lambda ^k-\lambda ^{k+1}\right). \end{aligned}$$(6.7)
Clearly, if \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2=0\), then the optimality conditions (6.5) and (6.6) are satisfied, so \({\mathbf {w}}^{k+1}\) is a solution. On the other hand, if \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\) is large, then \({\mathbf {w}}^{k+1}\) is likely to be far from a solution. Therefore, the quantity \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\) can be viewed as a measure of the distance between the iterate \({\mathbf {w}}^{k+1}\) and the solution set. Furthermore, based on the variational inequality (1.15) and the variational characterization of the iterations of ADMM, it is reasonable to use the quadratic term \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\) rather than \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H\) to measure the convergence rate of ADMM (see [24] for more details).
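To make this residual concrete, the following small numerical sketch (not from the paper) runs standard two-block ADMM on a toy quadratic instance and records \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\); it assumes the dual update \(\lambda ^{k+1}=\lambda ^k-\rho (A_1\mathbf{x}_1^{k+1}+A_2\mathbf{x}_2^{k+1}-c)\) and \(H={\text {blkdiag}}(\rho A_2^\top A_2,\,\rho ^{-1}{\mathbf {I}})\), and the matrices, sizes, and \(\rho \) are arbitrary illustrative choices.

```python
import numpy as np

# Toy two-block instance: min 0.5||x1||^2 + 0.5||x2||^2  s.t.  A1 x1 + A2 x2 = c.
rng = np.random.default_rng(0)
m, n1, n2, rho = 8, 5, 5, 1.0
A1 = rng.standard_normal((m, n1))
A2 = rng.standard_normal((m, n2))
c = rng.standard_normal(m)

def H_norm_sq(dx2, dlam):
    # ||(dx2, dlam)||_H^2 with H = blkdiag(rho * A2^T A2, (1/rho) * I)
    return rho * np.dot(A2 @ dx2, A2 @ dx2) + np.dot(dlam, dlam) / rho

x2, lam = np.zeros(n2), np.zeros(m)
res = []
for k in range(200):
    # x1-subproblem: (I + rho A1^T A1) x1 = A1^T (lam - rho (A2 x2 - c))
    x1 = np.linalg.solve(np.eye(n1) + rho * A1.T @ A1,
                         A1.T @ (lam - rho * (A2 @ x2 - c)))
    # x2-subproblem uses the fresh x1 (Gauss-Seidel order)
    x2_new = np.linalg.solve(np.eye(n2) + rho * A2.T @ A2,
                             A2.T @ (lam - rho * (A1 @ x1 - c)))
    lam_new = lam - rho * (A1 @ x1 + A2 @ x2_new - c)
    res.append(H_norm_sq(x2 - x2_new, lam - lam_new))  # ||w^k - w^{k+1}||_H^2
    x2, lam = x2_new, lam_new

# The recorded residuals should be monotonically non-increasing, cf. (6.9).
assert all(res[k + 1] <= res[k] + 1e-12 for k in range(len(res) - 1))
```

On this strongly convex toy problem the residual decays geometrically; the monotone decrease observed here is exactly the property (6.9) established below.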
The work [24] proves that \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\) converges to zero at a rate of O(1 / k). The key steps of the proof are to establish the following properties:
- the sequence \(\{{\mathbf {w}}^k\}\) is contractive:
$$\begin{aligned} \Vert {\mathbf {w}}^k-{\mathbf {w}}^*\Vert ^2_H-\Vert {\mathbf {w}}^{k+1}-{\mathbf {w}}^*\Vert ^2_H\ge \Vert {\mathbf {w}}^{k}-{\mathbf {w}}^{k+1}\Vert ^2_H, \end{aligned}$$(6.8)
- the sequence \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert _H^2\) is monotonically non-increasing:
$$\begin{aligned} \Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\le \Vert {\mathbf {w}}^{k-1}-{\mathbf {w}}^k\Vert ^2_H. \end{aligned}$$(6.9)
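For completeness, these two properties together yield the O(1/k) rate of [24] directly: summing (6.8) over the first \(k+1\) iterations and using the monotonicity of the residual,
$$\begin{aligned} (k+1)\,\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\le \sum _{j=0}^{k}\Vert {\mathbf {w}}^j-{\mathbf {w}}^{j+1}\Vert ^2_H\le \Vert {\mathbf {w}}^0-{\mathbf {w}}^*\Vert ^2_H, \end{aligned}$$
so \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\le \Vert {\mathbf {w}}^0-{\mathbf {w}}^*\Vert ^2_H/(k+1)\).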
The contraction property (6.8) has long been established; its proof dates back to [14, 16]. Inspired by [24], we provide a shorter proof of (6.9) than the one in [24].
Proof (of (6.9))
Let \(\Delta \mathbf{x}_i^{k+1}=\mathbf{x}_i^k-\mathbf{x}_i^{k+1}\) and \(\Delta \lambda ^{k+1}=\lambda ^k-\lambda ^{k+1}\). By Lemma 1.1, i.e., (1.17), the optimality condition (6.3) at the k-th and \((k+1)\)-th iterations yields: \(\langle \Delta \mathbf{x}_1^{k+1},~A_1^\top \Delta \lambda ^{k+1}-\rho A_1^\top A_2(\Delta \mathbf{x}_2^k-\Delta \mathbf{x}_2^{k+1})\rangle \ge 0. \) Similarly for (6.4), we obtain \( \langle \Delta \mathbf{x}_2^{k+1},~A_2^\top \Delta \lambda ^{k+1}\rangle \ge 0. \) Adding the above two inequalities together, we have
$$\begin{aligned} \left(A_1\Delta \mathbf{x}_1^{k+1}+A_2\Delta \mathbf{x}_2^{k+1}\right)^\top \Delta \lambda ^{k+1}-\rho \left(A_1\Delta \mathbf{x}_1^{k+1}\right)^\top A_2\left(\Delta \mathbf{x}_2^k-\Delta \mathbf{x}_2^{k+1}\right)\ge 0. \end{aligned}$$(6.10)
Using the following equality, which holds by (6.7):
$$\begin{aligned} A_1\Delta \mathbf{x}_1^{k+1}=\frac{1}{\rho }\left(\Delta \lambda ^k-\Delta \lambda ^{k+1}\right)-A_2\Delta \mathbf{x}_2^{k+1}, \end{aligned}$$(6.11)
(6.10) becomes
$$\begin{aligned} \frac{1}{\rho }\left(\Delta \lambda ^k-\Delta \lambda ^{k+1}\right)^\top \Delta \lambda ^{k+1}-\left(\Delta \lambda ^k-\Delta \lambda ^{k+1}-\rho A_2\Delta \mathbf{x}_2^{k+1}\right)^\top A_2\left(\Delta \mathbf{x}_2^k-\Delta \mathbf{x}_2^{k+1}\right)\ge 0. \end{aligned}$$
After rearranging the terms, we get
$$\begin{aligned}&\frac{1}{\rho }\left(\Delta \lambda ^k\right)^\top \Delta \lambda ^{k+1}+\left(\Delta \lambda ^k\right)^\top A_2\Delta \mathbf{x}_2^{k+1}+\left(\Delta \lambda ^{k+1}\right)^\top A_2\Delta \mathbf{x}_2^k+\rho \left(A_2\Delta \mathbf{x}_2^k\right)^\top A_2\Delta \mathbf{x}_2^{k+1}\\&\quad -\left(\Delta \lambda ^k\right)^\top A_2\Delta \mathbf{x}_2^k-\left(\Delta \lambda ^{k+1}\right)^\top A_2\Delta \mathbf{x}_2^{k+1}\ge \frac{1}{\rho }\Vert \Delta \lambda ^{k+1}\Vert ^2+\rho \Vert A_2\Delta \mathbf{x}_2^{k+1}\Vert ^2. \end{aligned}$$(6.12)
By the Cauchy–Schwarz inequality, we have \((a_1+b_1)^\top (a_2+b_2)\le (\Vert a_1+b_1\Vert ^2+\Vert a_2+b_2\Vert ^2)/2, \) or equivalently, \((a_1+b_1)^\top (a_2+b_2)-a_1^\top b_1-a_2^\top b_2 \le (\Vert a_1\Vert ^2+\Vert b_1\Vert ^2+\Vert a_2\Vert ^2+\Vert b_2\Vert ^2)/2. \) Applying this inequality to the left-hand side of (6.12) with \(a_1=\frac{1}{\sqrt{\rho }}\Delta \lambda ^k\), \(b_1=\sqrt{\rho }\,A_2\Delta \mathbf{x}_2^k\), \(a_2=\frac{1}{\sqrt{\rho }}\Delta \lambda ^{k+1}\), and \(b_2=\sqrt{\rho }\,A_2\Delta \mathbf{x}_2^{k+1}\), we have
$$\begin{aligned} \Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\le \frac{1}{2}\left(\Vert {\mathbf {w}}^{k-1}-{\mathbf {w}}^k\Vert ^2_H+\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\right), \end{aligned}$$
and thus (6.9) follows immediately.
We are now ready to improve the convergence rate from O(1 / k) to o(1 / k).
Theorem 6.1
The sequence \(\{{\mathbf {w}}^k\}\) generated by Algorithm 2 (for \(N=2\)) converges to a solution \({\mathbf {w}}^*\) of problem (1.1) in the H-norm, i.e., \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^*\Vert _H^2\rightarrow 0\), and \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H = o(1/k)\). Therefore,
$$\begin{aligned} \Vert A_1\mathbf{x}_1^k-A_1\mathbf{x}_1^{k+1}\Vert ^2=o(1/k),\quad \Vert A_2\mathbf{x}_2^k-A_2\mathbf{x}_2^{k+1}\Vert ^2=o(1/k),\quad \Vert \lambda ^k-\lambda ^{k+1}\Vert ^2=o(1/k). \end{aligned}$$(6.13)
Proof
Using the contractive property (6.8) of the sequence \(\{{\mathbf {w}}^k\}\) along with the optimality conditions, \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^*\Vert _H^2\rightarrow 0\) follows from the standard analysis for contraction methods [19].
By (6.8), we have
$$\begin{aligned} \sum _{k=1}^{K}\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\le \Vert {\mathbf {w}}^1-{\mathbf {w}}^*\Vert ^2_H\quad \text {for all }K\ge 1. \end{aligned}$$
Therefore, \(\sum _{k=1}^{\infty } \Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H<\infty \). By (6.9), \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\) is monotonically non-increasing and nonnegative. So Lemma 1.1 indicates that \(\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H = o(1/k)\), which further implies that \(\Vert A_2\mathbf{x}_2^k - A_2\mathbf{x}_2^{k+1}\Vert ^2 = o(1/k)\) and \(\Vert \lambda ^k- \lambda ^{k+1}\Vert ^2 = o(1/k)\). By (6.11), we also have \(\Vert A_1\mathbf{x}_1^k- A_1\mathbf{x}_1^{k+1}\Vert ^2= o(1/k)\). Thus (6.13) follows immediately. \(\square \)
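The o(1/k) conclusion rests on a standard fact about nonnegative, summable, non-increasing sequences: if \(a_k\ge 0\), \(a_{k+1}\le a_k\), and \(\sum _k a_k<\infty \), then
$$\begin{aligned} k\,a_{2k}\le \sum _{j=k+1}^{2k}a_j\rightarrow 0\quad \text {as }k\rightarrow \infty , \end{aligned}$$
so \(a_{2k}=o(1/k)\) and, by monotonicity, \(a_k=o(1/k)\). Here this fact is applied with \(a_k=\Vert {\mathbf {w}}^k-{\mathbf {w}}^{k+1}\Vert ^2_H\).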
Remark 6.1
The proof technique based on Lemma 1.1 can be applied to improve some other existing convergence rates of O(1 / k) (e.g., [8, 21]) to o(1 / k) as well.
Deng, W., Lai, MJ., Peng, Z. et al. Parallel Multi-Block ADMM with o(1 / k) Convergence. J Sci Comput 71, 712–736 (2017). https://doi.org/10.1007/s10915-016-0318-2