Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications

Journal of Global Optimization

Abstract

In this paper, we study a class of nonconvex, nonsmooth optimization problems with bilinear constraints, which have wide applications in machine learning and signal processing. We propose an algorithm based on the alternating direction method of multipliers (ADMM) and rigorously analyze its convergence to the set of stationary solutions. To test the performance of the proposed method, we specialize it to the nonnegative matrix factorization (NMF) problem and a sparse principal component analysis (PCA) problem. Extensive experiments on real and synthetic data sets demonstrate the effectiveness and broad applicability of the proposed methods.
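To make the algorithmic template concrete, the following Python sketch runs an ADMM-type iteration for the bilinear-constrained model min f(Z) + r1(X) + r2(Y) subject to Z = XY, instantiated for NMF with f(Z) = ½‖Z − M‖²_F and nonnegativity constraints on X and Y. This is a minimal illustration only, not the authors' reference implementation: the function name admm_nmf_sketch, the parameters rho, beta, inner, the inexact projected-gradient inner solves, and the sequential (rather than joint) X/Z updates are our own simplifying assumptions and may differ from the paper's exact updates.

# A minimal, illustrative ADMM-style sketch (NOT the authors' code) for
#     min  f(Z) + r1(X) + r2(Y)   s.t.  Z = X Y,
# specialized to NMF: f(Z) = 0.5*||Z - M||_F^2, r1/r2 = nonnegativity indicators.
# The X- and Y-subproblems are solved inexactly by a few projected-gradient steps.
import numpy as np


def admm_nmf_sketch(M, k, rho=1.0, beta=0.1, iters=200, inner=5, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = np.abs(rng.standard_normal((m, k)))   # factor X >= 0
    Y = np.abs(rng.standard_normal((k, n)))   # factor Y >= 0
    Z = X @ Y                                 # splitting variable, Z ~ X Y
    Lam = np.zeros_like(M)                    # dual variable for Z - X Y = 0

    for _ in range(iters):
        # Y-step: approximately minimize rho/2*||Z + Lam/rho - X Y||^2
        #         + beta/2*||Y - Y_prev||^2 over Y >= 0 (projected gradient).
        Y_prev, T = Y.copy(), Z + Lam / rho
        step = 1.0 / (rho * np.linalg.norm(X.T @ X, 2) + beta + 1e-12)
        for _ in range(inner):
            grad = rho * X.T @ (X @ Y - T) + beta * (Y - Y_prev)
            Y = np.maximum(Y - step * grad, 0.0)

        # X-step: same structure, over X >= 0, with proximal weight beta.
        X_prev = X.copy()
        step = 1.0 / (rho * np.linalg.norm(Y @ Y.T, 2) + beta + 1e-12)
        for _ in range(inner):
            grad = rho * (X @ Y - T) @ Y.T + beta * (X - X_prev)
            X = np.maximum(X - step * grad, 0.0)

        # Z-step: closed form, since f(Z) = 0.5*||Z - M||_F^2 is quadratic.
        Z = (M - Lam + rho * X @ Y) / (1.0 + rho)

        # Dual step: standard ADMM ascent on the constraint Z - X Y = 0.
        Lam = Lam + rho * (Z - X @ Y)

    return X, Y, Z, Lam


if __name__ == "__main__":
    M = np.abs(np.random.default_rng(1).standard_normal((40, 30)))
    X, Y, Z, Lam = admm_nmf_sketch(M, k=5)
    print("relative fit:", np.linalg.norm(M - X @ Y) / np.linalg.norm(M))
    print("constraint violation:", np.linalg.norm(Z - X @ Y))

On this kind of synthetic data the constraint violation and the fit both shrink as the iterations proceed, which mirrors the convergence behavior analyzed in the appendix.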

Notes

  1. MULT was implemented by the authors of this paper.

  2. Code is available at https://sites.google.com/a/umn.edu/huang663/publications.

  3. Code is available at http://smallk.github.io/

  4. Code is available at http://www.math.ucla.edu/%7Ewotaoyin/papers/bcu/matlab.html.

References

  1. Ames, B., Hong, M.: Alternating directions method of multipliers for l1-penalized zero variance discriminant analysis and principal component analysis (2014). arXiv:1401.5492v2

  2. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)

  3. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014)

  4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (2011)

  5. Brunet, J.P., Tamayo, P., Golub, T.R., Mesirov, J.P.: Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. 101(12), 4164–4169 (2004)

  6. d’Aspremont, A., Bach, F., Ghaoui, L.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9, 1269–1294 (2008)

  7. d’Aspremont, A., Ghaoui, L.E., Jordan, M., Lanckriet, G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49, 434–448 (2007)

  8. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization (1989). Ph.D Thesis, Operations Research Center, MIT

  9. Eckstein, J., Yao, W.: Augmented lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Reports 32 (2012)

  10. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)

  11. Giannakis, G.B., Ling, Q., Mateos, G., Schizas, I.D., Zhu, H.: Decentralized learning for wireless communications and networking. arXiv preprint arXiv:1503.08855 (2015)

  12. Gillis, N.: The why and how of nonnegative matrix factorization (2015). Book Chapter available at arXiv:1401.5226v2

  13. Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 9, 41–76 (1975)

  14. Hajinezhad, D., Chang, T.H., Wang, X., Shi, Q., Hong, M.: Nonnegative matrix factorization using ADMM: algorithm and convergence analysis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016)

  15. Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2015)

  16. Hajinezhad, D., Hong, M., Zhao, T., Wang, Z.: NESTT: a nonconvex primal-dual splitting method for distributed and stochastic optimization. Adv. Neural Inf. Process. Syst. 29, 3207–3215 (2016)

  17. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Appl. 4, 303–320 (1969)

  18. Hong, M., Chang, T.H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization (2013). Preprint, available online arXiv:1401.7079

  19. Hong, M., Hajinezhad, D., Zhao, M.M.: Prox-PDA: The proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks. In: Precup, D., Teh, Y.W. (eds) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, pp. 1529–1538. PMLR, International Convention Centre, Sydney, Australia (2017)

  20. Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems (2014). arXiv:1410.1390v1

  21. Huang, K., Sidiropoulos, N., Liavas, A.P.: A flexible and efficient algorithmic framework for constrained matrix and tensor factorization. arXiv preprint arXiv:1506.04209 (2015)

  22. Jeffers, J.: Two case studies in the application of principal component analysis. Appl. Stat. 16, 225–236 (1967)

  23. Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. arXiv preprint arXiv:1605.02408 (2016)

  24. Jolliffe, I.: Principal Component Analysis. Springer, New York (2002)

  25. Jolliffe, I., Trendafilov, N., Uddin, M.: A modified principal component technique based on the lasso. J. Comput. Graph. Stat. 12, 531–547 (2003)

  26. Journee, M., Nesterov, Y., Richtarik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)

  27. Kim, H., Park, H.: Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23(12), 1495–1502 (2007)

  28. Kim, J., Park, H.: Fast nonnegative matrix factorization: an active-set-like method and comparisons. SIAM J. Sci. Comput. 33(6), 3261–3281 (2011)

  29. AT&T Laboratories Cambridge: The ORL database of faces. http://www.uk.research.att.com/facedatabase.html

  30. Lee, D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, pp. 556–562. MIT Press (2001)

  31. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  32. Lin, C.H.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)

  33. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)

  34. Mackey, L.: Deflation methods for sparse PCA. Adv. Neural Inf. Process. Syst. 21, 1017–1024 (2008)

  35. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (2006)

  36. Mordukhovich, B.S., Nam, N.M., Yen, N.D.: Fréchet subdifferential calculus and optimality conditions in nondifferentiable programming (2005). Mathematics Research Reports. Paper 29

  37. Pauca, V.P., Shahnaz, F., Berry, M.W., Plemmons, R.J.: Text Mining using Non-Negative Matrix Factorizations, chap. 45, pp. 452–456

  38. Rahmani, M., Atia, G.: High dimensional low rank plus sparse matrix decomposition. arXiv preprint arXiv:1502.00182 (2015)

  39. Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)

  40. Richtarik, P., Takac, M., Ahipasaoglu, S.D.: Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes (2012). arXiv:1212.4137

  41. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)

  42. Sani, A., Vosoughi, A.: Distributed vector estimation for power-and bandwidth-constrained wireless sensor networks. IEEE Trans. Signal Process. 64(15), 3879–3894 (2016)

  43. Schizas, I., Ribeiro, A., Giannakis, G.: Consensus in ad hoc wsns with noisy links—part I: distributed estimation of deterministic signals. IEEE Trans. Signal Process. 56(1), 350–364 (2008)

  44. Shen, H., Huang, J.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034 (2008)

  45. Song, D., Meyer, D.A., Min, M.R.: Fast nonnegative matrix factorization with rank-one ADMM. In: NIPS 2014 Workshop on Optimization for Machine Learning (OPT2014) (2014)

  46. Sun, D.L., Fevotte, C.: Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In: The Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2014)

  47. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 103(9), 475–494 (2001)

  48. Turkmen, A.C.: A review of nonnegative matrix factorization methods for clustering (2015). Preprint, available at arXiv:1507.03194v2

  49. Wang, F., Xu, Z., Xu, H.K.: Convergence of bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625 (2014)

  50. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv preprint arXiv:1511.06324 (2015)

  51. Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Prob. 28(11), 1–18 (2012)

  52. Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix completion with nonnegative factors. Front. Math. China, pp. 365–384 (2011)

  53. Zdunek, R.: Alternating direction method for approximating smooth feature vectors in nonnegative matrix factorization. In: Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on, pp. 1–6 (2014)

  54. Zhang, R., Kwok, J.T.: Asynchronous distributed ADMM for consensus optimization. In: Proceedings of the 31st International Conference on Machine Learning (2014)

  55. Zhang, Y.: An alternating direction algorithm for nonnegative matrix factorization (2010). Preprint

  56. Zhao, Q., Meng, D., Xu, Z., Gao, C.: A block coordinate descent approach for sparse principal component analysis. J. Neurocomput. 153, 180–190 (2015)

  57. Zlobec, S.: On the Liu-Floudas convexification of smooth programs. J. Glob. Optim. 32(3), 401–407 (2005)

  58. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc.: Ser. B (Statistical Methodology) 67, 301–320 (2005)

  59. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)

Acknowledgements

We would like to thank Dr. Mingyi Hong for sharing his pearls of wisdom with us during this research, and we also thank the anonymous reviewers for their careful and insightful reviews.

Corresponding author

Correspondence to Davood Hajinezhad.

Additional information

The conference versions of this work appear in [14, 15].

Appendix

1.1 Proof of Lemma 1

Since \(\left( X, Z\right) ^{r+1}\) is an optimal solution of (22b), it satisfies the optimality condition

$$\begin{aligned} \nabla f(Z^{r+1}) + {\varLambda }^r + \rho (Z^{r+1} - X^{r+1}Y^{r+1}) =0. \end{aligned}$$
(38)

Combining Eq. (38) with the dual update (22c), we obtain

$$\begin{aligned} {\varLambda }^{r+1} = -\nabla f(Z^{r+1}). \end{aligned}$$
(39)
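For completeness, the substitution behind this step can be spelled out. Assuming the dual update (22c) takes the standard augmented-Lagrangian form (the update itself is not reproduced in this excerpt), namely \({\varLambda }^{r+1} = {\varLambda }^{r} + \rho (Z^{r+1}-X^{r+1}Y^{r+1})\), the last two terms in (38) are exactly \({\varLambda }^{r+1}\):

$$\begin{aligned} 0 = \nabla f(Z^{r+1}) + \underbrace{{\varLambda }^{r} + \rho \left( Z^{r+1}-X^{r+1}Y^{r+1}\right) }_{={\varLambda }^{r+1}} = \nabla f(Z^{r+1}) + {\varLambda }^{r+1}, \end{aligned}$$

which is precisely (39).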

Applying Eq. (23), we obtain

$$\begin{aligned} \Vert {\varLambda }^{r+1}-{\varLambda }^r\Vert _F^2&= \Vert \nabla f(Z^{r+1}) - \nabla f(Z^{r})\Vert _F^2\le L^2\Vert Z^{r+1}-Z^r\Vert _F^2. \end{aligned}$$
(40)

The lemma is proved. Q.E.D.

1.2 Proof of Lemma 2

Part 1 First let us prove that \(r_1(X^{r+1})\le u_1(X^{r+1},X^r)\), where \(r_1\) and \(u_1\) are given in (18) and (19); the inequality \(r_2(Y^{r+1})\le u_2(Y^{r+1},Y^r)\) follows analogously. When \(r_1\) is convex we have \(u_1(X,X^r)=r_1(X)\), so setting \(X=X^{r+1}\) gives \(u_1(X^{r+1},X^r)=r_1(X^{r+1})\) and the desired inequality holds with equality. In the second case, when \(r_1\) is concave, we have

$$\begin{aligned} u_1(X,X^r)=r_1(X^r) + h'_1(l_1(X^r))\left[ l_1(X) - l_1(X^r)\right] . \end{aligned}$$

For simplicity, make the change of variable \(Z:=l_1(X)\), so that \(r_1(X)=h_1(l_1(X))=h_1(Z)\). By the defining property of a concave function (its linear approximation is a global over-estimator), for every \(Z,\hat{Z}\in \text{ dom }\,(h_1)\) we have

$$\begin{aligned} h_1(Z)\le h_1(\hat{Z}) + h'_1(\hat{Z})(Z-\hat{Z}). \end{aligned}$$

Substituting back \(Z:=l_1(X)\) and \(\hat{Z}:=l_1(\hat{X})\) in the above inequality, we have

$$\begin{aligned} h_1(l_1(X))\le h_1(l_1(\hat{X}))+ h'_1(l_1(\hat{X}))\big [l_1(X)-l_1(\hat{X})\big ]. \end{aligned}$$

Now setting \(X=X^{r+1}\) and \(\hat{X}=X^r\), we obtain

$$\begin{aligned} h_1(l_1(X^{r+1}))\le h_1(l_1(X^r))+ h'_1(l_1(X^r))\big [l_1(X^{r+1})-l_1(X^r)\big ] \end{aligned}$$

which is equivalent to \(r_1(X^{r+1})\le u_1(X^{r+1},X^r).\)
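As a concrete, purely illustrative instance of this majorization (not an example taken from the paper), take \(l_1(X)=\Vert X\Vert _1\) and \(h_1(t)=\log (1+t)\), which is concave and non-decreasing on \(t\ge 0\), so that \(r_1(X)=\log (1+\Vert X\Vert _1)\) and \(h'_1(t)=\frac{1}{1+t}\). Then

$$\begin{aligned} u_1(X,X^r)=\log \left( 1+\Vert X^r\Vert _1\right) +\frac{\Vert X\Vert _1-\Vert X^r\Vert _1}{1+\Vert X^r\Vert _1}\;\ge \;\log \left( 1+\Vert X\Vert _1\right) =r_1(X), \end{aligned}$$

with equality at \(X=X^r\), exactly as the general argument above guarantees.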

Next, for brevity define \(W^r:=(Y^{r}, (X, Z)^{r}; {\varLambda }^{r})\). Then, adding and subtracting the term \(L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]\), the successive difference of the augmented Lagrangian can be written as

$$\begin{aligned} L_\rho [W^{r+1}]-L_\rho [W^{r}]=&L_\rho [W^{r+1}]-L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}] \nonumber \\&+L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [W^{r}]. \end{aligned}$$
(41)

First we bound \(L_\rho [W^{r+1}]-L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]\). Using Lemma 1 we have

$$\begin{aligned}&L_\rho [W^{r+1}]-L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]\nonumber \\&\quad =\langle {\varLambda }^{r+1} - {\varLambda }^{r}, Z^{r+1}-X^{r+1}Y^{r+1}\rangle = \frac{1}{\rho }\Vert {\varLambda }^{r+1} - {\varLambda }^{r}\Vert _F^2 \nonumber \\&\quad {\mathop {\le }\limits ^{(40)}}\frac{L^2}{\rho }\Vert Z^{r+1}-Z^{r}\Vert _F^2. \end{aligned}$$
(42)

Next let us bound \(L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [W^{r}]\).

$$\begin{aligned}&L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [W^{r}]\nonumber \\&\quad =L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]\nonumber \\&\quad +L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]-L_\rho [W^{r}] \end{aligned}$$
(43)

Suppose that \(\xi _X\in \partial _X H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]\) is a subgradient of function \(H[(X,Z); X^r, Y^{r+1}, {\varLambda }^r]\) at the point \(X^{r+1}\). Then we have

$$\begin{aligned}&L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}] \nonumber \\&\quad =H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]-H[(X,Z)^{r}; X^r, Y^{r+1}, {\varLambda }^r]-\frac{\beta }{2}\Vert X^{r+1}-X^r\Vert _F^2\nonumber \\&\qquad +\left[ r_1(X^{r+1})-u_1(X^{r+1},X^r)\right] \nonumber \\&\quad {\mathop {\le }\limits ^\mathrm{(i)}}H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]-H[(X,Z)^{r}; X^r, Y^{r+1}, {\varLambda }^r]-\frac{\beta }{2}\Vert X^{r+1}-X^r\Vert _F^2\nonumber \\&\quad {\mathop {\le }\limits ^\mathrm{(ii)}}\left\langle \xi _X, X^{r+1}-X^r\right\rangle +\langle \nabla _Z H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r], Z^{r+1}-Z^r \rangle \nonumber \\&\qquad -\frac{\gamma _x}{2}\Vert X^{r+1}-X^r\Vert _F^2-\frac{\gamma _z}{2}\Vert Z^{r+1}-Z^r\Vert _F^2-\frac{\beta }{2}\Vert X^{r+1}-X^r\Vert _F^2\nonumber \\&\quad {\mathop {=}\limits ^\mathrm{(iii)}}-\bigg (\frac{\gamma _x}{2}+\frac{\beta }{2}\bigg )\Vert X^{r+1}-X^r\Vert _F^2-\frac{\gamma _z}{2}\Vert Z^{r+1}-Z^r\Vert _F^2, \end{aligned}$$
(44)

where \(\mathrm{(i)}\) holds because from Eqs. (18) and (19) we conclude that \(r_1(X^{r+1})\le u_1(X^{r+1},X^{r})\), \(\mathrm{(ii)}\) follows from the strong convexity of \(H[(X,Z); X^r, Y^{r+1}, {\varLambda }^r]\) with respect to X and Z with moduli \(\gamma _x\) and \(\gamma _z\), respectively, and \(\mathrm{(iii)}\) is due to the optimality condition for problem (22b), which is given by

$$\begin{aligned}&\nabla _Z H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]=0, \,\text {and}\quad 0\in \partial _X H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r], \end{aligned}$$

thus the first term in the second inequality vanishes because we may choose \(\xi _X=0\).

Next suppose \(\xi _Y\in \partial _Y G[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]\). Similarly we bound \(L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]-L_\rho [W^{r}]\) as follows

$$\begin{aligned} L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]-L_\rho [W^{r}]&= G[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]-G[Y^r; (X,Z)^r, Y^r,{\varLambda }^r]\nonumber \\&\quad -\frac{\beta }{2}\Vert Y^{r+1}-Y^r\Vert _F^2 +\left[ r_2(Y^{r+1})-u_2(Y^{r+1},Y^r)\right] \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(i)}}}G[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]-G[Y^r; (X,Z)^r, Y^r,{\varLambda }^r]\nonumber \\&\quad -\frac{\beta }{2}\Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}}\left\langle \xi _Y, Y^{r+1}-Y^r\right\rangle -\frac{\gamma _y}{2}\Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&\quad -\frac{\beta }{2}\Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&{\mathop {=}\limits ^{\mathrm{(iii)}}}-\left( \frac{\gamma _y}{2}+\frac{\beta }{2}\right) \Vert Y^{r+1}-Y^r\Vert _F^2, \end{aligned}$$
(45)

where \({\mathrm{(i)}}\) holds because from Eqs. (18) and (19) we conclude that \(r_2(Y^{r+1})\le u_2(Y^{r+1},Y^{r})\), \({\mathrm{(ii)}}\) follows from the strong convexity of \(G[Y;(X,Z)^r,Y^r, {\varLambda }^r]\) with respect to Y with modulus \(\gamma _y\), and \(\mathrm{(iii)}\) is due to the optimality condition for problem (22a), namely \(0\in \partial _YG[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]\), where again we may choose \(\xi _Y=0\). Combining Eqs. (42), (44) and (45), we have

$$\begin{aligned}&L_\rho [W^{r+1}]-L_\rho [W^{r}]\nonumber \\&\quad \le -\left( \frac{\gamma _{z}}{2}-\frac{L^2}{\rho }\right) \Vert Z^{r+1}-Z^r\Vert ^2_F-\left( \frac{\gamma _y}{2}+\frac{\beta }{2}\right) \Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&\quad -\left( \frac{\gamma _x}{2}+\frac{\beta }{2}\right) \Vert X^{r+1}-X^r\Vert ^2_F. \end{aligned}$$
(46)

To complete the proof we only need to set \(C_z=\frac{\gamma _z}{2}-\frac{L^2}{\rho }\), \(C_y=\frac{\gamma _y+\beta }{2}\), and \(C_x=\frac{\gamma _x+\beta }{2}\). Furthermore, since the subproblems (22a) and (22b) are strongly convex with moduli \(\gamma _y\ge 0\) and \(\gamma _x\ge 0\), respectively, we have \(C_y\ge 0\) and \(C_x\ge 0\). Consequently, when \(\rho \ge \frac{2L^2}{\gamma _{z}}\) we also have \(C_z\ge 0\). Thus, the augmented Lagrangian function is nonincreasing along the iterations.
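As a quick numerical sanity check with purely hypothetical values (not taken from the paper): if \(L=2\), \(\gamma _z=1\), \(\gamma _x=\gamma _y=0\), and \(\beta =1\), then \(C_x=C_y=\tfrac{1}{2}\), the condition reads \(\rho \ge \frac{2L^2}{\gamma _z}=8\), and any such \(\rho \) gives \(C_z=\tfrac{1}{2}-\tfrac{4}{\rho }\ge 0\).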

Part 2 Now we show that the augmented Lagrangian function is lower bounded

$$\begin{aligned} L_\rho [W^{r+1}]&= f(Z^{r+1}) +r_1(X^{r+1})+r_2(Y^{r+1})+ \langle {\varLambda }^{r+1}, Z^{r+1}-X^{r+1}Y^{r+1} \rangle \nonumber \\&\quad +\frac{\rho }{2}\Vert Z^{r+1}-X^{r+1}Y^{r+1}\Vert ^2_F\nonumber \\&{\mathop {=}\limits ^\mathrm{(i)}} f(Z^{r+1})+r_1(X^{r+1})+r_2(Y^{r+1})+\langle \nabla f(Z^{r+1}), X^{r+1}Y^{r+1}-Z^{r+1} \rangle \nonumber \\&\quad +\frac{\rho }{2}\Vert Z^{r+1}-X^{r+1}Y^{r+1}\Vert ^2_F \nonumber \\&{\mathop {\ge }\limits ^\mathrm{(ii)}}f(Z^{r+1})+r_1(X^{r+1})+r_2(Y^{r+1})+\langle \nabla f(Z^{r+1}), X^{r+1}Y^{r+1}-Z^{r+1} \rangle \nonumber \\&\quad +\frac{L}{2}\Vert Z^{r+1}-X^{r+1}Y^{r+1}\Vert ^2_F \nonumber \\&{\mathop {\ge }\limits ^\mathrm{(iii)}}f(X^{r+1}Y^{r+1}) + r_1(X^{r+1})+r_2(Y^{r+1}) \nonumber \\&= g(X^{r+1}, Y^{r+1}, X^{r+1}Y^{r+1}), \end{aligned}$$
(47)

where in \(\mathrm{(i)}\) we use Eq. (39), \(\mathrm{(ii)}\) holds because we have picked \(\rho \ge L\), and \(\mathrm{(iii)}\) follows from the descent lemma for a function f with an L-Lipschitz continuous gradient:

$$\begin{aligned} f(x)\le f(y)+\langle \nabla f(y), x-y \rangle +\frac{L}{2}\Vert x-y\Vert ^2\quad \forall x,y \in \text {Dom(f)}. \end{aligned}$$

Due to the lower boundedness of \(g(X, Y, Z)\) (Assumption A) we have \(g(X^{r+1}, Y^{r+1}, X^{r+1}Y^{r+1})\ge \underline{g}\). Together with Eq. (47), we can set \(\underline{L}=\underline{g}\).

The proof is complete. Q.E.D.

1.3 Proof of Theorem 1

Part 1 Clearly, Eq. (29), together with the fact that the augmented Lagrangian is lower bounded, implies

$$\begin{aligned} Z^{r+1}-Z^r\rightarrow 0;~ Y^{r+1}-Y^r\rightarrow 0;~ X^{r+1}-X^r\rightarrow 0. \end{aligned}$$
(48)

Further, applying Lemma 1 we have

$$\begin{aligned} {\varLambda }^{r+1}-{\varLambda }^r\rightarrow 0. \end{aligned}$$

Utilizing the dual update (22c), we then obtain

$$\begin{aligned} X^{r+1}Y^{r+1}-Z^{r+1}\rightarrow 0 . \end{aligned}$$
(49)

Part 1 is proved; in particular, condition (26d) follows.
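To make the last step explicit, again assuming (22c) has the standard form \({\varLambda }^{r+1}={\varLambda }^{r}+\rho (Z^{r+1}-X^{r+1}Y^{r+1})\):

$$\begin{aligned} Z^{r+1}-X^{r+1}Y^{r+1}=\frac{1}{\rho }\left( {\varLambda }^{r+1}-{\varLambda }^{r}\right) \rightarrow 0, \end{aligned}$$

since \(\rho >0\) is a fixed constant; this is exactly (49), i.e., condition (26d).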

Part 2 Since \(Y^{r+1}\) minimizes problem (22a), we have

$$\begin{aligned} 0\in h'_2(l_2(Y^r))\partial l_2(Y^{r+1})-\rho (X^r)^\top \left( Z^r-X^{r}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\right) +\beta \left( Y^{r+1}-Y^r\right) . \end{aligned}$$

Thus, there exists \(\xi \in \partial l_2(Y^{r+1})\) such that for every Y

$$\begin{aligned} \langle Y-Y^{r+1} , {\varPhi }_2^r\xi - \rho (X^r)^\top \left( Z^r-X^{r}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\right) +\beta (Y^{r+1}-Y^r) \rangle \ge 0, \end{aligned}$$
(50)

where we have set \({\varPhi }_2^r = h'_2(l_2(Y^r))\) for notational simplicity. Because \(l_2(Y)\) is a convex function we have the following inequality for every \(\xi \in \partial l_2(Y^{r+1})\)

$$\begin{aligned} l_2(Y) - l_2(Y^{r+1})\ge \langle \xi , Y- Y^{r+1}\rangle ;\quad \forall ~Y. \end{aligned}$$
(51)

Further, since \(h_2\) is assumed to be non-decreasing, we have \( h'_2(l_2(Y^r))\ge 0\). Combining this fact with (51) yields

$$\begin{aligned} {\varPhi }_2^rl_2(Y) - {\varPhi }_2^rl_2(Y^{r+1})\ge \langle {\varPhi }_2^r\xi , Y- Y^{r+1}\rangle ;\quad \forall ~Y. \end{aligned}$$
(52)

Plugging Eq. (52) into (50), we obtain

$$\begin{aligned}&{\varPhi }_2^rl_2(Y) - {\varPhi }_2^rl_2(Y^{r+1}) \nonumber \\&+ \bigg \langle Y^{r+1}-Y , \rho (X^r)^\top \Big (Z^r-X^{r}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\Big )-\beta (Y^{r+1}-Y^r)\bigg \rangle \ge 0. \end{aligned}$$
(53)

Next, taking the limit in (53) and utilizing the facts that \(\Vert Y^{r+1}-Y^r\Vert _F\rightarrow 0\) and \(\Vert X^{r+1}Y^{r+1}-Z^{r+1}\Vert _F\rightarrow 0\), we obtain

$$\begin{aligned} {\varPhi }_2^*l_2(Y) - {\varPhi }_2^*l_2(Y^{*})+ \langle Y^*-Y , (X^*)^\top {\varLambda }^* \rangle \ge 0\quad \forall ~Y, \end{aligned}$$
(54)

where \({\varPhi }_2^* = h'_2(l_2(Y^*))\). From Eq. (54) we can conclude that

$$\begin{aligned}&{\varPhi }_2^*l_2(Y) + \langle Y^*-Y , (X^*)^\top {\varLambda }^* \rangle \ge {\varPhi }_2^*l_2(Y^{*})+\langle Y^*-Y^{*} , (X^*)^\top {\varLambda }^* \rangle ; \quad \forall ~Y, \end{aligned}$$
(55)

which further implies

$$\begin{aligned} Y^*\in \mathop {\mathrm{argmin}}_{Y}\left( {\varPhi }_2^*l_2(Y)+\langle Y^*-Y, (X^*)^\top {\varLambda }^* \rangle \right) . \end{aligned}$$

This is equivalent to

$$\begin{aligned} (X^*)^\top {\varLambda }^*\in h'_2\left( l_2(Y^*)\right) \partial l_2(Y^*). \end{aligned}$$

Applying the chain rule for the Clarke subdifferential given in Eq. (27), we have

$$\begin{aligned} (X^*)^\top {\varLambda }^*\in \partial ^c (h_2\circ l_2)(Y^*) = \partial ^c r_2(Y^*). \end{aligned}$$

From this we conclude that

$$\begin{aligned} \text {dist}\left( \partial ^c r_2(Y^*),(X^*)^\top {\varLambda }^*\right) =0, \end{aligned}$$

which proves condition (26c).

Now let us consider the \((X, Z)\)-step (22b). Similarly, define \({\varPhi }_1 = h'_1(l_1(X^{r}))\). Since \((X,Z)^{r+1}\) minimizes (22b), we have

$$\begin{aligned}&\nabla f(Z^{r+1})+{\varLambda }^{r+1}+\rho \left( Z^{r+1}-X^{r+1}Y^{r+1}\right) = 0; \end{aligned}$$
(56)
$$\begin{aligned}&\quad 0\in {\varPhi }_1\partial l_1(X^{r+1})-\rho \left( Z^{r+1}-X^{r+1}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\right) (Y^{r+1})^\top +\beta (X^{r+1}-X^r). \end{aligned}$$
(57)

Let us take the limit in Eqs. (56) and (57). Then, invoking (48) and (49) and following the same process used to prove (26c), it follows that

$$\begin{aligned}&\nabla f(Z^*)+{\varLambda }^* = 0; \end{aligned}$$
(58)
$$\begin{aligned}&\quad \text {dist}\left( \partial ^c r_1(X^*),{\varLambda }^*(Y^*)^\top \right) =0 \end{aligned}$$
(59)

which verify (26a) and (26b), respectively.

The theorem is proved. Q.E.D.

Cite this article

Hajinezhad, D., Shi, Q. Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications. J Glob Optim 70, 261–288 (2018). https://doi.org/10.1007/s10898-017-0594-x
