
Global Convergence of ADMM in Nonconvex Nonsmooth Optimization

Abstract

In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, \(\phi (x_0,\ldots ,x_p,y)\), subject to coupled linear equality constraints. Our ADMM updates each of the primal variables \(x_0,\ldots ,x_p,y\), followed by updating the dual variable. We separate the variable y from \(x_i\)’s as it has a special role in our analysis. The developed convergence guarantee covers a variety of nonconvex functions such as piecewise linear functions, \(\ell _q\) quasi-norm, Schatten-q quasi-norm (\(0<q<1\)), minimax concave penalty (MCP), and smoothly clipped absolute deviation penalty. It also allows nonconvex constraints such as compact manifolds (e.g., spherical, Stiefel, and Grassman manifolds) and linear complementarity constraints. Also, the \(x_0\)-block can be almost any lower semi-continuous function. By applying our analysis, we show, for the first time, that several ADMM algorithms applied to solve nonconvex models in statistical learning, optimization on manifold, and matrix decomposition are guaranteed to converge. Our results provide sufficient conditions for ADMM to converge on (convex or nonconvex) monotropic programs with three or more blocks, as they are special cases of our model. ADMM has been regarded as a variant to the augmented Lagrangian method (ALM). We present a simple example to illustrate how ADMM converges but ALM diverges with bounded penalty parameter \(\beta \). Indicated by this example and other analysis in this paper, ADMM might be a better choice than ALM for some nonconvex nonsmooth problems, because ADMM is not only easier to implement, it is also more likely to converge for the concerned scenarios.

Notes

  1. This is the best one can hope for (except for very specific problems), since [62, Section 1] exhibits a convex 2-block problem on which ADMM fails to converge.

  2. “Globally” here means regardless of where the initial point is.

  3. A nonnegative sequence \(\{a_k\}\) induces its running best sequence \(b_k=\min \{a_i : i\le k\}\); we say \(\{a_k\}\) has a running best rate of \(o(1/k)\) if \(b_k=o(1/k)\).
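
For illustration only (not from the paper), the running best sequence in this note can be computed as below; the example sequence \(a_k\) is hypothetical.

    import math

    def running_best(a):
        # b_k = min_{i <= k} a_i
        best, out = math.inf, []
        for ak in a:
            best = min(best, ak)
            out.append(best)
        return out

    # Hypothetical sequence: large on odd k, 1/k^2 on even k; its running best is o(1/k).
    a = [1.0 if k % 2 else 1.0 / k**2 for k in range(1, 10001)]
    b = running_best(a)
    print(len(b) * b[-1])  # k * b_k, which tends to 0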

References

  1. Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  2. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4(1), 1–106 (2012)

  3. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, London (2014)

  4. Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization, vol. 10. SIAM, Philadelphia (2014)

  5. Bolte, J., Daniilidis, A., Lewis, A.: The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)

  6. Bouaziz, S., Tagliasacchi, A., Pauly, M.: Sparse iterative closest point. In: Computer Graphics Forum, vol. 32, pp. 113–123. Wiley Online Library (2013)

  7. Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  8. Chartrand, R.: Nonconvex splitting for regularized low-rank \(+\) sparse decomposition. IEEE Trans. Signal Process. 60(11), 5810–5819 (2012)

  9. Chartrand, R., Wohlberg, B.: A nonconvex ADMM algorithm for group sparsity with sparse groups. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6009–6013. IEEE (2013)

  10. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155, 57–79 (2016)

  11. Chen C., Yuan, X., Zeng, S., Zhang, J.: Penalty splitting methods for solving mathematical program with equilibrium constraints. Manuscript (private communication) (2016)

  12. Conn, A.R., Gould, N.I., Toint, P.: A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28(2), 545–572 (1991)

  13. Cottle, R., Dantzig, G.: Complementary pivot theory of mathematical programming. Linear Algebra Appl. 1, 103–125 (1968)

  14. Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)

  15. Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science and Engineering. Springer, New York (2016)

  16. Davis, D., Yin, W.: Convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42(3), 783–805 (2017)

  17. Deng, W., Lai, M.J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o (1/k)\) convergence. J. Sci. Comput. 71, 712–736 (2017)

  18. Ding, C., Sun, D., Sun, J., Toh, K.C.: Spectral operators of matrices. Math. Program. 168(1–2), 509–531 (2018)

  19. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  20. Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer Series in Computational Physics. Springer, New York (1984)

  21. Glowinski, R., Marroco, A.: On the approximation by finite elements of order one, and resolution by penalisation-duality, of a class of nonlinear Dirichlet problems. ESAIM Math. Model. Numer. Anal. 9(R2), 41–76 (1975)

  22. He, B., Yuan, X.: On the \(o(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  23. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(5), 303–320 (1969)

  24. Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)

  25. Hu, Y., Chi, E., Allen, G.I.: ADMM algorithmic regularization paths for sparse statistical machine learning. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science and Engineering. Springer, New York (2016)

  26. Ivanov, M., Zlateva, N.: Abstract subdifferential calculus and semi-convex functions. Serdica Math. J. 23(1), 35p–58p (1997)

  27. Iutzeler, F., Bianchi, P., Ciblat, P., Hachem, W.: Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In: 2013 IEEE 52nd Annual Conference On Decision and Control (CDC), pp. 3671–3676. IEEE (2013)

  28. Jiang, B., Ma, S., Zhang, S.: Alternating direction method of multipliers for real and complex polynomial optimization models. Optimization 63(6), 883–898 (2014)

  29. Knopp, K.: Infinite Sequences and Series. Courier Corporation, Chelmsford (1956)

  30. Kryštof, V., Zajíček, L.: Differences of two semiconvex functions on the real line. Preprint (2015)

  31. Lai, R., Osher, S.: A splitting method for orthogonality constrained problems. J. Sci. Comput. 58(2), 431–449 (2014)

  32. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  33. Li, R.C., Stewart, G.: A new relative perturbation theorem for singular subspaces. Linear Algebra Appl. 313(1), 41–51 (2000)

  34. Liavas, A.P., Sidiropoulos, N.D.: Parallel algorithms for constrained tensor factorization via the alternating direction method of multipliers. IEEE Trans. Signal Process. 63(20), 5450–5463 (2015)

  35. Łojasiewicz, S.: Sur la géométrie semi-et sous-analytique. Ann. Inst. Fourier (Grenoble) 43(5), 1575–1595 (1993)

  36. Lu, Z., Zhang, Y.: An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 135(1–2), 149–193 (2012)

  37. Magnússon, S., Weeraddana, P.C., Rabbat, M.G., Fischione, C.: On the convergence of alternating direction Lagrangian methods for nonconvex structured optimization problems. IEEE Trans. Control Netw. Syst. 3(3), 296–309 (2015)

  38. Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control Optim. 15(6), 959–972 (1977)

  39. Miksik, O., Vineet, V., Pérez, P., Torr, P.H., Cesson Sévigné, F.: Distributed non-convex ADMM-inference in large-scale random fields. In: British Machine Vision Conference. BMVC (2014)

  40. Möllenhoff, T., Strekalovskiy, E., Moeller, M., Cremers, D.: The primal-dual hybrid gradient method for semiconvex splittings. SIAM J. Imaging Sci. 8(2), 827–857 (2015)

  41. Oymak, S., Mohan, K., Fazel, M., Hassibi, B.: A simplified approach to recovery conditions for low rank matrices. In: 2011 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 2318–2322. IEEE (2011)

  42. Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)

  43. Poliquin, R., Rockafellar, R.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348(5), 1805–1838 (1996)

  44. Powell, M.J.: A method for non-linear constraints in minimization problems. UKAEA (1967)

  45. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer Science & Business Media (2009)

  46. Rosenberg, J., et al.: Applications of analysis on Lipschitz manifolds. In: Proceedings of Miniconferences on Harmonic Analysis and Operator Algebras (Canberra, 1987), Proceedings Centre for Mathematical Analysis, vol. 16, pp. 269–283 (1988)

  47. Shen, Y., Wen, Z., Zhang, Y.: Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization. Optim. Methods Softw. 29(2), 239–263 (2014)

  48. Slavakis, K., Giannakis, G., Mateos, G.: Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Sig. Process. Mag. 31(5), 18–31 (2014)

  49. Sun, D.L., Fevotte, C.: Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6201–6205. IEEE (2014)

  50. Sun, R., Luo, Z.-Q., Ye, Y.: On the expected convergence of randomly permuted ADMM. arXiv preprint arXiv:1503.06387 (2015)

  51. Wang, F., Cao, W., Xu, Z.: Convergence of multi-block Bregman ADMM for nonconvex composite problems. arXiv preprint arXiv:1505.03063 (2015)

  52. Wang, F., Xu, Z., Xu, H.K.: Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625 (2014)

  53. Wang, X., Hong, M., Ma, S., Luo, Z.Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. arXiv preprint arXiv:1308.5294 (2013)

  54. Wang, Y., Zeng, J., Peng, Z., Chang, X., Xu, Z.: Linear convergence of adaptively iterative thresholding algorithm for compressed sensing. IEEE Trans. Signal Process. 63(11), 2957–2971 (2015)

  55. Watson, G.A.: Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170, 33–45 (1992)

  56. Wen, Z., Peng, X., Liu, X., Sun, X., Bai, X.: Asset allocation under the Basel Accord risk measures. arXiv preprint arXiv:1308.1321 (2013)

  57. Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Prob. 28(11), 115010 (2012)

  58. Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013)

  59. Wikipedia: Schatten norm—Wikipedia, the free encyclopedia (2015). (Online; Accessed 18 Oct 2015)

  60. Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)

  61. Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix completion with nonnegative factors. Front. Math. China 7(2), 365–384 (2012)

  62. Yan, M., Yin, W.: Self equivalence of the alternating direction method of multipliers. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science and Engineering, pp. 165–194. Springer, New York (2016)

  63. Yang, L., Pong, T.K., Chen, X.: Alternating direction method of multipliers for nonconvex background/foreground extraction. SIAM J. Imaging Sci. 10(1), 74–110 (2017)

  64. You, S., Peng, Q.: A non-convex alternating direction method of multipliers heuristic for optimal power flow. In: 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), pp. 788–793. IEEE (2014)

  65. Zeng, J., Lin, S., Xu, Z.: Sparse regularization: convergence of iterative jumping thresholding algorithm. IEEE Trans. Signal Process. 64(19), 5106–5117 (2016)

  66. Zeng, J., Peng, Z., Lin, S.: A Gauss–Seidel iterative thresholding algorithm for \(\ell_q\) regularized least squares regression. J. Comput. Appl. Math. 319, 220–235 (2017)

  67. Zeng, J., Lin, S., Wang, Y., Xu, Z.: \(L_{1/2}\) regularization: convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 62(9), 2317–2329 (2014)

Acknowledgements

We would like to thank Drs. Wei Shi, Ting Kei Pong, and Qing Ling for their insightful comments, and Drs. Xin Liu and Yangyang Xu for helpful discussions. We thank the three anonymous reviewers for their review and helpful comments.

Author information

Corresponding author

Correspondence to Jinshan Zeng.

Additional information

The work of W. Yin was supported in part by NSF Grants DMS-1720237 and ECCS-1462397, and ONR Grant N00014171216. The work of J. Zeng was supported in part by the NSFC Grants (61603162, 11501440, 61772246, 61603163) and the doctoral start-up foundation of Jiangxi Normal University.

Appendix

Proof of Proposition 1

The fact that convex functions and \(C^1\) regular functions are prox-regular has been proved in the literature; see, e.g., [43]. Here we only prove the second part of the proposition.

(1): For functions \(r( x) = \sum _{i} |x_i|^q\) where \(0< q < 1\), the set of general subgradient of \(r(\cdot )\) is

$$\begin{aligned} \partial r(x) = \left\{ d=[d_1;\ldots ;d_n]\left| d_i = q\cdot \mathrm {sign}(x_i)|x_i|^{q-1} \text { if } x_i \ne 0 \text {; } d_i\in \mathbb {R}\text { if } x_i = 0\right. \right\} . \end{aligned}$$

For any two positive constants \(C>0\) and \(M>1\), take \(\gamma = \max \left\{ \frac{4({n}C^q+MC)}{c^2},q(1-q)c^{q-2}\right\} \), where \(c \triangleq \frac{1}{3}(\frac{q}{M})^{\frac{1}{1-q}}\). The exclusion set \(S_{M}\) contains the set \(\{ x|\min _{x_i\ne 0} |x_i|\le 3c\}\). For any point \(z\in \mathbb {B}(0,C)\setminus S_{M}\) and \(y\in \mathbb {B}(0,C)\), if \(\Vert z-y\Vert \le c\), then \(\mathrm {supp}(z) \subset \mathrm {supp}(y)\) and \(\Vert z\Vert _0 \le \Vert y\Vert _0\), where \(\mathbb {B}(0,C) \triangleq \{x| \Vert x\Vert <C\}\), \(\mathrm {supp}(z)\) denotes the index set of all nonzero elements of \(z\), and \(\Vert z\Vert _0\) denotes the cardinality of \(\mathrm {supp}(z)\). Define

$$\begin{aligned} y'_{i} = \left\{ \begin{array}{cc} y_{i} &{} \quad i\in \mathrm {supp}(z)\\ 0 &{} \quad i\not \in \mathrm {supp}(z) \end{array}\right. ,~i=1,\ldots , n. \end{aligned}$$

Then for any \(d\in \partial r(z)\), the following chain of inequalities holds:

$$\begin{aligned} \Vert y\Vert _q^q-\Vert z\Vert _q^q - \big<d,y-z\big> {\mathop {\ge }\limits ^{(a)}}\,&\Vert y'\Vert _q^q-\Vert z\Vert _q^q - \big <d,y'-z\big >\nonumber \\ {\mathop {\ge }\limits ^{(b)}}&- \frac{q(1-q)}{2}c^{q-2}\Vert z-y'\Vert ^2\nonumber \\ {\mathop {\ge }\limits ^{(c)}}&- \frac{q(1-q)}{2}c^{q-2}\Vert z-y\Vert ^2, \end{aligned}$$
(51)

where (a) holds because \(\Vert y\Vert _q^q = \Vert y'\Vert _q^q + \Vert y-y'\Vert _q^q\) by the definition of \(y'\), (b) holds because \(r(x)\) is twice differentiable along the line segment connecting \(z\) and \(y'\), with second-order derivative no bigger than \(q(1-q)c^{q-2}\), and (c) holds because \(\Vert z-y\Vert \ge \Vert z-y'\Vert \). If instead \(\Vert z-y\Vert > c\), then for any \(d\in \partial r(z)\), we have

$$\begin{aligned} \Vert y\Vert _q^q-\Vert z\Vert _q^q-\big <d,y-z\big >\ge -(2nC^q + 2MC) \ge -\frac{2nC^q+2MC}{c^2}\Vert y-z\Vert ^2. \end{aligned}$$
(52)

Combining (51) and (52) yields the result.
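
As a numerical sanity check (not part of the proof), one can sample points outside the exclusion set and verify that the bounds in (51) and (52) hold for \(r(x)=\sum _i |x_i|^q\); the values of C, M, n, and q below are arbitrary test choices.

    import numpy as np

    q, n, C, M = 0.5, 5, 10.0, 2.0
    c = (q / M) ** (1.0 / (1.0 - q)) / 3.0
    gamma = max(4 * (n * C**q + M * C) / c**2, q * (1 - q) * c ** (q - 2))
    r = lambda x: np.sum(np.abs(x) ** q)

    rng = np.random.default_rng(0)
    for _ in range(1000):
        # z has all entries of magnitude at least 3c, so z lies outside S_M
        z = rng.uniform(3 * c, 1.0, n) * rng.choice([-1, 1], n)
        y = z + 0.5 * c * rng.uniform(-1.0, 1.0, n)   # y close to z
        d = q * np.sign(z) * np.abs(z) ** (q - 1)     # subgradient from the formula above
        lhs = r(y) - r(z) - d @ (y - z)
        assert lhs >= -0.5 * gamma * np.sum((y - z) ** 2) - 1e-12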

(2): We now verify that the Schatten-\(q\) quasi-norm \({\Vert \cdot \Vert }_q\) is restricted prox-regular. Without loss of generality, suppose \(A\in \mathbb {R}^{n\times n}\) is a square matrix.

Suppose the singular value decomposition (SVD) of A is

$$\begin{aligned} A = U\varSigma V^T = [U_1,U_2]\cdot \left[ \begin{array}{cc} \varSigma _1 &{} 0\\ 0 &{} 0 \end{array}\right] \cdot \left[ \begin{array}{c} V_1^T\\ V_2^T \end{array}\right] , \end{aligned}$$
(53)

where \(U,V\in \mathbb {R}^{n\times n}\) are orthogonal matrices, and \(\varSigma _1\in \mathbb {R}^{K\times K}\) is diagonal whose diagonal elements are \(\sigma _i(A)\), \(i=1,\ldots ,K\). Then the general subgradient of \({\Vert A \Vert }_q^q\) [55] is

$$\begin{aligned} \partial {\Vert A \Vert }_q^q = U_1DV_1^T + \{U_2\varGamma V_2^T\big | \varGamma \text { is an arbitrary matrix}\}, \end{aligned}$$

where \(D\in \mathbb {R}^{K\times K}\) is a diagonal matrix whose ith diagonal element is \( d_i = q\sigma _i(A)^{q-1}\).

Now we are going to prove that \({\Vert \cdot \Vert }_q^q\) is restricted prox-regular, i.e., for any positive parameters \(M, P>0\), there exists \(\gamma >0\) such that for any \({\Vert B \Vert }_F<P\), \({\Vert A \Vert }_F<P\), \(A\not \in S_M = \{A| \forall ~X\in \partial {\Vert A \Vert }_q^q,~{\Vert X \Vert }_F>M\}\), and \(T = U_{1} D V_{1}^T + U_{2}\varGamma V_{2}^T\in \partial {\Vert A \Vert }_q^q\) with \({\Vert T \Vert }_F\le M\), it holds that

$$\begin{aligned} {\Vert B \Vert }_q^q - {\Vert A \Vert }_q^q - \big <T,B - A\big > \ge -\frac{\gamma }{2}{\Vert A-B \Vert }^2_F. \end{aligned}$$
(54)

Let \(\epsilon _0 = \frac{1}{3}(M/q)^{1/(q - 1)}\). If \({\Vert B - A \Vert } > \epsilon _0\), we have

$$\begin{aligned}&{\Vert B \Vert }_q^q - {\Vert A \Vert }_q^q - \big <T,B - A\big > \ge - n^2P^q - M\cdot \Vert B - A\Vert _F\nonumber \\&\quad \ge -(M\epsilon _0^{-1}+n^2P^q\epsilon _0^{-2}){\Vert A-B \Vert }_F^2. \end{aligned}$$
(55)

If \({\Vert B-A \Vert }_F<\epsilon _0\), consider the decomposition of \(B = U_B \varSigma ^B V_B^T = B_1 + B_2\) where \(B_1 = U_B \varSigma ^B_1 V_B^T\), \(\varSigma ^B_1\) is the diagonal matrix preserving elements of \(\varSigma ^B\) bigger than \(\frac{1}{3}(M/q)^{1/(q - 1)}\), and \(B_2 = U_B \varSigma ^B_2 V_B^T\) where \(\varSigma ^B_2 = \varSigma ^B - \varSigma ^B_1\).

Define the set \(S' \triangleq \{T\in {\mathbb {R}}^{n \times n}|{\Vert T \Vert }_F \le P,~ \min _{\sigma _i>0} \sigma _i(T) \ge \epsilon _0\}\). Let us prove that \(A, B_1\in S'\). If \(\min _{\sigma _i >0} \sigma _i(A) < (M/q)^{1/(q - 1)}\), then for any \(X\in \partial {\Vert A \Vert }_q^q\), \(X = U_1DV_1^T + U_2\varGamma V_2^T\) and

$$\begin{aligned} {\Vert X \Vert }_F \ge {\Vert U_1DV_1^T \Vert }_F \ge q\big (\min _{\sigma _i>0} \sigma _i\big )^{q-1} \ge M, \end{aligned}$$

which contradicts the fact that \(A\not \in S_M\). As for \(B_1\), since \({\Vert A - B \Vert }_F\le \epsilon _0\) and \(\min _{\sigma _i >0} \sigma _i(A) \ge (M/q)^{1/(q - 1)}\), applying Weyl's inequalities gives \(B_1\in S'\).

Define the function \(F:S'\subset \mathbb {R}^{n\times n}\rightarrow \mathbb {R}^{n\times n}\) by \(F(A) = U_1 D V_1^T\) for \(A = U_1\varSigma V_1^T\), where

$$\begin{aligned} D = \mathrm {diag}(q\sigma _1(A)^{q-1},\ldots ,q\sigma _n(A)^{q-1}) \end{aligned}$$

with the convention \(0^{q-1} = 0\). Based on [18, Theorem 4.1] and the compactness of \(S'\), \(F\) is Lipschitz continuous on \(S'\), i.e., there exists \(L>0\) such that for any two matrices \(A, B\in S'\), \({\Vert F(A) - F(B) \Vert }_F\le L{\Vert A - B \Vert }_F\). This implies

$$\begin{aligned} {\Vert B_1 \Vert }_q^q - {\Vert A \Vert }_q^q - \big <U_1DV_1^T,B_1 - A\big > \ge -\frac{L}{2}{\Vert B_1 - A \Vert }_F^2. \end{aligned}$$
(56)

In addition, because \({\Vert U_{2}^TU_B \Vert }_F< {\Vert B_1 - A \Vert }_F/\epsilon _0\) and \({\Vert V_{2}^TV_B \Vert }_F < {\Vert B_1 - A \Vert }_F/\epsilon _0\) (see [33]),

$$\begin{aligned} \big<U_{2}\varGamma V_{2}^T,B_1 - A\big>= \big <\varGamma ,U_{2}^TU_B\varSigma _BV_B^TV_{2}\big > \ge - \frac{M^2}{\epsilon _0^2} {\Vert B_1 - A \Vert }_F^2. \end{aligned}$$
(57)

Furthermore, \({\Vert B_2 \Vert }_q^q - \big <T,B_2\big > \ge 0\) and \({\Vert B_1 - A \Vert }_F \le {\Vert B - A \Vert }_F+{\Vert B - B_1 \Vert }_F\le 2{\Vert B - A \Vert }_F\); combining these with (56) and (57), we have

$$\begin{aligned} {\Vert B \Vert }_q^q - {\Vert A \Vert }_q^q - \big< T,B - A\big>&= {\Vert B_1 \Vert }_q^q - {\Vert A \Vert }_q^q - \big<T,B_1 - A\big> + {\Vert B_2 \Vert }_q^q - \big <T,B_2\big >\nonumber \\&\ge -\left( \frac{L}{2} + \frac{M^2}{\epsilon _0^2}\right) {\Vert B_1 - A \Vert }_F^2 \ge -\left( 2L + \frac{4M^2}{\epsilon _0^2}\right) {\Vert B - A \Vert }_F^2. \end{aligned}$$
(58)

Combining (55) and (58), we finally prove (54) with appropriate \(\gamma \).
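
For a numerical illustration (again not part of the proof), the Schatten-\(q\) quasi-norm and the smooth part \(U_1DV_1^T\) of its subgradient can be formed from the SVD as in (53); the matrix size, the perturbation scale, and \(q\) below are arbitrary.

    import numpy as np

    def schatten_q(A, q):
        return np.sum(np.linalg.svd(A, compute_uv=False) ** q)

    def subgrad_smooth_part(A, q, tol=1e-12):
        U, s, Vt = np.linalg.svd(A)
        d = np.where(s > tol, q * np.maximum(s, tol) ** (q - 1.0), 0.0)  # convention 0^{q-1} = 0
        return (U * d) @ Vt                                              # U_1 D V_1^T

    q = 0.5
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    T = subgrad_smooth_part(A, q)
    ratios = []
    for _ in range(200):
        B = A + 1e-3 * rng.standard_normal((4, 4))
        lhs = schatten_q(B, q) - schatten_q(A, q) - np.sum(T * (B - A))
        ratios.append(lhs / np.sum((B - A) ** 2))
    print(min(ratios))   # bounded below, consistent with (54)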

(3): We need to show that the indicator function \(\iota _S\) of a p-dimensional compact \(C^2\) manifold S is restricted prox-regular. First of all, by definition, the exclusion set \(S_M\) of \(\iota _S\) is empty for any \(M>0\). Since S is compact and \(C^2\), there exist finitely many \(C^2\) homeomorphisms \(h_\eta : \mathbb {R}^{p} \mapsto \mathbb {R}^n\), \(\eta \in \{1,\ldots , m\}\), and \(\delta >0\) such that for any \(x\in S\), there exist an \(\eta \) and an \(\alpha _x\) satisfying \(x = h_\eta (\alpha _x)\in S\). Furthermore, for any \(y\in S\) with \(\Vert y - x\Vert \le \delta \), we can find an \(\alpha _y\) satisfying \(y = h_\eta (\alpha _y)\).

Note that \(\partial \iota _{S}(x)= \mathrm {Im}(J_{h_\eta }(x))^\perp \), where \(J_{h_\eta }\) is the Jacobian of \(h_\eta \). For any \(d\in \partial \iota _S(x)\), \(\Vert d\Vert \le M\) and \(\Vert x-y\Vert \le \delta \),

$$\begin{aligned} \iota _S(y) - \iota _S(x) - \big<d,y - x\big> \nonumber =&-\big<d,h_\eta (\alpha _y) - h_\eta (\alpha _x)\big> \\ \nonumber =&-\big <d,h_\eta (\alpha _y) - h_\eta (\alpha _x) - J_{h_\eta }(\alpha _y - \alpha _x)\big > \\ \nonumber \ge&-\Vert d\Vert \cdot \gamma \Vert \alpha _y - \alpha _x\Vert ^2 \\ \ge&-M \gamma C^2 \Vert x - y\Vert ^2, \end{aligned}$$
(59)

where \(\gamma \) and C are the Lipschitz constants of \(\nabla h_\eta \) and \( h^{-1}_\eta \), respectively. For any \(\Vert y-x\Vert \ge \delta \),

$$\begin{aligned} \iota _S(y) - \iota _S(x) - \big<d,y - x\big> \nonumber =&- \big <d,y - x\big > \\ \nonumber \ge&- \Vert d\Vert \cdot \Vert y - x\Vert \\ \ge&-\frac{M}{\delta }\Vert y - x\Vert ^2, \end{aligned}$$
(60)

where M is the maximum of \(\Vert d\Vert \) over \(\partial \iota _S(x)\). Combining (59) and (60) shows that \(\iota _{S}\) is restricted prox-regular. \(\square \)
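
As a concrete instance of (3) (an illustration, not taken from the paper), let S be the unit sphere \(\{x\in \mathbb {R}^n : \Vert x\Vert = 1\}\). Then \(\partial \iota _S(x) = \{\lambda x : \lambda \in \mathbb {R}\}\), and for any \(x, y\in S\) and \(d = \lambda x\) with \(\Vert d\Vert \le M\),

$$\begin{aligned} \iota _S(y) - \iota _S(x) - \big <d,y - x\big > = -\lambda \big <x,y - x\big > = \frac{\lambda }{2}\Vert y - x\Vert ^2 \ge -\frac{M}{2}\Vert y - x\Vert ^2, \end{aligned}$$

because \(\big <x,y - x\big > = \big <x,y\big > - 1 = -\frac{1}{2}\Vert y - x\Vert ^2\); hence the restricted prox-regularity inequality holds with \(\gamma = M\).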

Proof

(Lemma 1) By the definitions of H in A3(a) and \(y^k\), we have \(y^k = H(By^k)\). Therefore, \(\Vert y^{k_1} - y^{k_2}\Vert =\Vert H(By^{k_1}) - H(By^{k_2})\Vert \le {\bar{M}} \Vert By^{k_1} - By^{k_2}\Vert .\) Similarly, by the optimality of \(x^k_i\), we have \(x^k_i = F_i(A_ix_i^k)\). Therefore, \(\Vert x^{k_1}_i - x_i^{k_2}\Vert =\Vert F_i(A_ix_i^{k_1}) - F_i(A_ix_i^{k_2})\Vert \le {\bar{M}} \Vert A_ix_i^{k_1} - A_ix_i^{k_2}\Vert .\) \(\square \)

Proof

(Lemma 2) Let us first show that the y-subproblem is well defined. To begin with, we will show that h(y) is lower bounded by a quadratic function of By:

$$\begin{aligned} h(y) \ge h(H(0)) - \left( {\bar{M}}\Vert \nabla h(H(0))\Vert \right) \cdot \Vert By\Vert -\frac{L_h{\bar{M}}^2}{2} \Vert By\Vert ^2. \end{aligned}$$

By A3, we know h(y) is lower bounded by h(H(By)):

$$\begin{aligned} h(y)\ge h(H(By)). \end{aligned}$$

Because of A5 and A3, h(H(By)) is lower bounded by a quadratic function of By:

$$\begin{aligned} h(H(By))&\ge h(H(0)) + \big <\nabla h(H(0)),H(By) - H(0)\big > -\frac{L_h}{2} \Vert H(By) - H(0)\Vert ^2 \end{aligned}$$
(61)
$$\begin{aligned}&\ge h(H(0)) - \Vert \nabla h(H(0))\Vert \cdot {\bar{M}}\cdot \Vert By\Vert -\frac{L_h{\bar{M}}^2}{2} \Vert By\Vert ^2 \end{aligned}$$
(62)

Therefore h(y) is also bounded by the quadratic function:

$$\begin{aligned} h(y) \ge h(H(0)) - \Vert \nabla h(H(0))\Vert \cdot {\bar{M}}\cdot \Vert By\Vert -\frac{L_h{\bar{M}}^2}{2} \Vert By\Vert ^2. \end{aligned}$$

Recall that the \(y\)-subproblem minimizes the augmented Lagrangian with respect to \(y\); neglecting constants, it is equivalent to minimizing

$$\begin{aligned} P(y) := h(y) + \big <w^{k} + \beta {\mathbf {A}}{\mathbf {x}}^+, By\big > + \frac{\beta }{2}\Vert By\Vert ^2. \end{aligned}$$
(63)

Because h(y) is lower bounded by \(-\frac{L_h{\bar{M}}^2}{2}\Vert By\Vert ^2\) up to a linear term and a constant in \(\Vert By\Vert \), when \(\beta > L_h{\bar{M}}\) we have \(P(y)\rightarrow \infty \) as \(\Vert By\Vert \rightarrow \infty \); that is, the \(y\)-subproblem is coercive with respect to \(By\). Because P(y) is lower semi-continuous and \({{\mathrm{argmin}}}h(y) \ \text {s.t.} \ By = u\) has a unique solution for each u, a minimizer of P(y) exists and the \(y\)-subproblem is well defined.

As for the \(x_i\)-subproblem, \(i = 0,\ldots , p\), ignoring the constants yields

$$\begin{aligned} {{\mathrm{argmin}}}\ {\mathcal L}_\beta (x^{+}_{<i},x_i,x^{k}_{>i},y^k,w^k)&={{\mathrm{argmin}}}\ f(x^{+}_{<i},x_i,x^k_{>i}) \nonumber \\&\quad + \frac{\beta }{2}\Vert \frac{1}{\beta }w^k + A_{<i}x^+_{<i} + A_{>i}x^k_{>i} + A_ix_i + By^k\Vert ^2 \end{aligned}$$
(64)
$$\begin{aligned}&={{\mathrm{argmin}}}\ f(x^{+}_{<i},x_i,x^k_{>i}) + h(u) - h(u) \nonumber \\&\quad + \frac{\beta }{2}\Vert Bu-By^k-\frac{1}{\beta }w^k\Vert ^2. \end{aligned}$$
(65)

where \(u = H(-A_{<i}x^+_{<i} - A_{>i}x^k_{>i} - A_ix_i)\). The first two terms are coercive because \(A_{<i}x^+_{<i} + A_{>i}x^k_{>i} + A_ix_i + Bu = 0\) and A1 holds. The third and fourth terms are lower bounded because h is Lipschitz differentiable. Because the objective is lower semi-continuous, all the subproblems are well defined. \(\square \)
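
To make the update order concrete, the following is a schematic sketch (not the authors' code) of the ADMM analyzed in this paper for minimizing \(\phi (x_0,\ldots ,x_p,y)\) subject to \(\sum _i A_ix_i + By = 0\); the solvers solve_x and solve_y are hypothetical oracles assumed to return minimizers of the augmented Lagrangian over one block with the others fixed.

    def admm(x, y, w, A, B, beta, solve_x, solve_y, iters=100):
        # x: list of blocks x_0,...,x_p; A: list of matrices A_0,...,A_p (e.g., numpy arrays)
        for _ in range(iters):
            for i in range(len(x)):
                rest = sum(A[j] @ x[j] for j in range(len(x)) if j != i) + B @ y
                x[i] = solve_x[i](rest, w, beta)       # x_i <- argmin of L_beta over x_i
            rest = sum(A[j] @ x[j] for j in range(len(x)))
            y = solve_y(rest, w, beta)                 # y <- argmin of L_beta over y
            w = w + beta * (rest + B @ y)              # dual update w <- w + beta*(Ax + By)
        return x, y, w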

Proof

(Proposition 1) Define the augmented Lagrangian function to be

$$\begin{aligned} {\mathcal L}_{\beta }(x,y,w) = x^2 - y^2 + w(x-y) + \frac{\beta }{2} \Vert x - y\Vert ^2. \end{aligned}$$

It is clear that when \(\beta =0\), \({\mathcal L}_{\beta }\) is not lower bounded for any w. We are going to show that for any \(\beta >2\), the duality gap is not zero.

$$\begin{aligned} \inf _{x\in [-1,1],y\in \mathbb {R}}\sup _{w\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) > \sup _{w\in \mathbb {R}}\inf _{x\in [-1,1],y\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w). \end{aligned}$$

On one hand, because \(\sup _{w\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) = +\infty \) when \(x\ne y\) and \(\sup _{w\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) = 0\) when \(x=y\), we have

$$\begin{aligned} \inf _{x\in [-1,1],y\in \mathbb {R}}\sup _{w\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) = 0. \end{aligned}$$

On the other hand, let \(t=x-y\),

$$\begin{aligned}&\sup _{w\in \mathbb {R}}\inf _{x\in [-1,1],y\in \mathbb {R}} {\mathcal L}_{\beta }(x,y,w) = \sup _{w\in \mathbb {R}}\inf _{x\in [-1,1],t\in \mathbb {R}} t(2x-t)+ wt+\frac{\beta }{2} t^2 \nonumber \\&\quad = \sup _{w\in \mathbb {R}}\inf _{x\in [-1,1],t\in \mathbb {R}} (w+2x)t+\frac{\beta -2}{2} t^2 \end{aligned}$$
(66)
$$\begin{aligned}&\quad = \sup _{w\in \mathbb {R}}\inf _{x\in [-1,1]} -\frac{(w+2x)^2}{2(\beta -2)}= \sup _{w\in \mathbb {R}} -\frac{\max \{(w-2)^2,(w+2)^2\}}{2(\beta -2)}= -\frac{2}{\beta - 2}. \end{aligned}$$
(67)

This shows the duality gap is not zero (but it goes to 0 as \(\beta \) tends to \(\infty \)).
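
As a quick numerical check of (66)–(67) (illustration only; the grid sizes and \(\beta \) are arbitrary), one can evaluate the inner minimum on a grid and compare the resulting dual value with \(-2/(\beta -2)\):

    import numpy as np

    beta = 6.0
    # inner minimum over x in [-1,1] and t = x - y on a grid, for w = 0
    xs = np.linspace(-1.0, 1.0, 401)
    ts = np.linspace(-5.0, 5.0, 4001)
    X, T = np.meshgrid(xs, ts)
    L0 = T * (2 * X - T) + beta / 2 * T**2
    print(L0.min())                              # approximately -0.5

    # outer supremum over w, using the closed-form inner minimum from (67)
    ws = np.linspace(-10.0, 10.0, 2001)
    dual = (-np.maximum((ws - 2) ** 2, (ws + 2) ** 2) / (2 * (beta - 2))).max()
    print(dual, -2 / (beta - 2))                 # both approximately -0.5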

Then let us show that ALM does not converge if \(\beta ^k\) is bounded, i.e., there exists \(\beta >0\) such that \(\beta ^k\le \beta \) for any \(k\in {\mathbb {N}}\). Without loss of generality, we assume that \(\beta ^k\) equals the constant \(\beta \) for all \(k\in {\mathbb {N}}\); this does not affect the proof. ALM consists of two steps:

  1. \((x^{k+1},y^{k+1}) = \text {argmin}_{x,y} {\mathcal L}_{\beta }(x,y,w^k),\)

  2. \(w^{k+1} = w^k + \tau (x^{k+1} - y^{k+1}).\)

Since \((x^{k+1} - y^{k+1})\in \partial \psi (w^k)\) where \(\psi (w) = \inf _{x,y} {\mathcal L}_{\beta }(x,y,w)\), and we already know

$$\begin{aligned} \inf _{x,y} {\mathcal L}_{\beta }(x,y,w) = -\frac{\max ((w-2)^2,(w+2)^2)}{2(\beta -2)}, \end{aligned}$$

we have

$$\begin{aligned} w^{k+1} = \left\{ \begin{array}{cc} (1-\frac{\tau }{\beta -2}) w^k - \frac{2\tau }{\beta -2} &{} \text { if } w^{k} \ge 0\\ (1-\frac{\tau }{\beta -2}) w^k + \frac{2\tau }{\beta - 2} &{} \text { if } w^{k} \le 0 \end{array} \right. . \end{aligned}$$

Note that when \(w^k = 0\), the optimization problem \(\inf _{x,y} {\mathcal L}_{\beta }(x,y,0)\) has two distinct minimizers, which lead to two different values of \(x^{k+1}-y^{k+1}\). This shows that, no matter how small \(\tau \) is, \(w^k\) oscillates around 0 and never converges.
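
Iterating this recursion numerically (illustration only; \(\beta \), \(\tau \), and the starting point below are arbitrary) makes the oscillation visible:

    beta, tau = 6.0, 0.1
    w, hist = 0.5, []
    for k in range(200):
        if w >= 0:
            w = (1 - tau / (beta - 2)) * w - 2 * tau / (beta - 2)
        else:
            w = (1 - tau / (beta - 2)) * w + 2 * tau / (beta - 2)
        hist.append(w)
    print(hist[-4:])   # signs alternate: w^k oscillates around 0 and does not converge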

However, although the duality gap is not zero, ADMM still converges in this case. There are two ways to prove it. The first way is to check all the conditions in Theorem 1. Another way is to check the iterates directly. The ADMM iterates are

$$\begin{aligned} x^{k+1}&= \max \left( -1,\min (1,\frac{\beta }{\beta + 2}(y^k - \frac{w^k}{\beta }))\right) , \quad y^{k+1} = \frac{\beta }{\beta - 2}\big (x^{k+1}+\frac{w^k}{\beta }\big ),\nonumber \\ w^{k+1}&= w^k + \beta (x^{k+1} - y^{k+1}). \end{aligned}$$
(68)

The second and third equalities together show that \(w^{k} = -2y^k\) for all \(k\ge 1\); substituting this into the first two equalities, we obtain

$$\begin{aligned} x^{k+1} = \max \{-1,\min \{1,y^k\}\},\quad y^{k+1} = \frac{1}{\beta - 2}\left( \beta x^{k+1} - 2y^k\right) . \end{aligned}$$
(69)

Here \(|y^{k+1}| \le \frac{\beta }{\beta -2} + \frac{2}{\beta -2}|y^k|\). Thus, after finitely many iterations, \(|y^{k}| \le 2\) (assuming \(\beta >4\)). If \(|y^k| \le 1\), the ADMM sequence obviously converges. If \(|y^k| > 1\), we may assume without loss of generality that \(1<y^k<2\). Then \(x^{k+1} = 1\), which implies \(0<y^{k+1}<1\), so the ADMM sequence converges. Hence, for any initial point \(y^0\) and \(w^0\), ADMM converges. \(\square \)
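
Running the reduced recursion (69) (illustration only; \(\beta >4\) and the starting point \(y^0\) are arbitrary) confirms the convergence:

    beta, y = 6.0, 10.0
    for k in range(100):
        x = max(-1.0, min(1.0, y))             # x^{k+1} from (69)
        y = (beta * x - 2 * y) / (beta - 2)    # y^{k+1} from (69)
    print(x, y)                                # the iterates settle at x = y in [-1, 1]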

Proof

(Theorem 2) Similar to the proof of Theorem 1, we only need to verify P1–P4 in Proposition 2.

Proof of P2: Similar to Lemmas 4 and 5, we have

$$\begin{aligned}&{\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)-{\mathcal L}_\beta ({\mathbf {x}}^{k+1},y^{k+1},w^{k+1}) \nonumber \\&\quad \ge -\frac{1}{\beta }\Vert w^{k} - w^{k+1}\Vert ^2 + \sum _{i=1}^p\frac{\beta - L_\phi {\bar{M}}}{2}\Vert A_ix_i^k-A_ix_i^{k+1}\Vert ^2 \nonumber \\&\quad + \frac{\beta - L_\phi {\bar{M}}}{2}\Vert By^k - By^{k+1}\Vert ^2. \end{aligned}$$
(70)

Since \(B^Tw^k = - \partial _y \phi ({\mathbf {x}}^k,y^k)\) for any \(k\in {\mathbb {N}}\), we have

$$\begin{aligned} \Vert w^k - w^{k+1}\Vert \le C_1L_\phi \left( \sum _{i=0}^p \Vert x_i^k - x_i^{k+1}\Vert + \Vert y^k - y^{k+1}\Vert \right) , \end{aligned}$$

where \(C_1 = \sigma _{\min }(B)\), with \(\sigma _{\min }(B)\) denoting the smallest positive singular value of B, and \(L_\phi \) is the Lipschitz constant of \(\phi \). Therefore, we have

$$\begin{aligned}&{\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)-{\mathcal L}_\beta ({\mathbf {x}}^{k+1},y^{k+1},w^{k+1}) \nonumber \\&\quad \ge \left( \frac{\beta - L_\phi {\bar{M}}}{2} - \frac{C_1L_\phi {\bar{M}}}{\beta }\right) \sum _{i=0}^p\Vert A_ix_i^k-A_ix_i^{k+1}\Vert ^2\nonumber \\&\qquad + \left( \frac{\beta - L_\phi {\bar{M}}}{2}-\frac{C_1L_\phi {\bar{M}}}{\beta }\right) \Vert By^k - By^{k+1}\Vert ^2. \end{aligned}$$
(71)

When \(\beta > \max \{1,L_\phi {\bar{M}} + 2C_1L_\phi {\bar{M}}\}\), P2 holds.

Proof of P1: First of all, we have already shown \({\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)\ge {\mathcal L}_\beta ({\mathbf {x}}^{k+1},y^{k+1},w^{k+1})\), which means \({\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)\) is monotonically nonincreasing. There exists \(y'\) such that \({\mathbf {A}}{\mathbf {x}}^k + By' = 0\) and \(y' = H(By')\). In order to show \({\mathcal L}_\beta ({\mathbf {x}}^k,y^k,w^k)\) is lower bounded, we apply A1–A3 to get

$$\begin{aligned}&{\mathcal L}_\beta ({\mathbf {x}}^{k},y^k,w^k)=\phi ({\mathbf {x}}^{k},y^k)+\big<w^{k}, \sum _{i=0}^p A_ix^{k}_i + By^k\big>+ \frac{\beta }{2}\Vert \sum _{i=0}^p A_ix^{k}_i + By^k\Vert ^2\nonumber \\&\quad = \phi ({\mathbf {x}}^{k},y^k)+\big <d_y^k,y' - y^k\big>+ \frac{\beta }{2}\Vert By^k - By'\Vert ^2 \ge \phi ({\mathbf {x}}^{k},y')\nonumber \\&\qquad +\frac{\beta }{4}\Vert \sum _{i=0}^p A_ix^{k}_i + By^k\Vert ^2 > -\infty , \end{aligned}$$
(72)

for some \(d_y^k \in \partial _y \phi ({\mathbf {x}}^{k},y^k)\). This shows that \(\mathcal{L}_{\beta }({\mathbf {x}}^{k},y^k,w^k)\) is lower bounded. Reading (72) in the other direction, we observe that

$$\begin{aligned} \phi ({\mathbf {x}}^k,y')+\frac{\beta }{4}\Vert \sum _{i=0}^p A_ix^{k}_i + By^k\Vert ^2 \end{aligned}$$

is upper bounded by \({\mathcal L}_\beta ({\mathbf {x}}^0,y^0,w^0)\). Then A1 ensures that \(\{{\mathbf {x}}^k,y^k\}\) is bounded. Therefore, \(w^k\) is bounded too.

Proof of P3, P4: These follow directly from the Lipschitz differentiability of \(\phi \), so we omit the details.

\(\square \)

Cite this article

Wang, Y., Yin, W. & Zeng, J. Global Convergence of ADMM in Nonconvex Nonsmooth Optimization. J Sci Comput 78, 29–63 (2019). https://doi.org/10.1007/s10915-018-0757-z
