Abstract
The stable principal component pursuit (SPCP) is a non-smooth convex optimization problem whose solution enables one to reliably recover the low-rank and sparse components of a data matrix corrupted by a dense noise matrix, even when only a fraction of the data entries are observable. In this paper, we propose a new algorithm for solving SPCP. The proposed algorithm is a modification of the alternating direction method of multipliers (ADMM) in which we use an increasing sequence of penalty parameters instead of a fixed penalty. The algorithm is based on partial variable splitting and works directly with the non-smooth objective function. We show that both the primal and dual iterate sequences converge under mild conditions on the sequence of penalty parameters. To the best of our knowledge, this is the first convergence result for a variable-penalty ADMM in which the penalties are unbounded, the objective function is non-smooth, and its subdifferential is not uniformly bounded. Together, partial variable splitting and an increasing sequence of penalty multipliers significantly reduce the number of iterations required to achieve feasibility in practice. Our preliminary computational tests show that the proposed algorithm works very well in practice and outperforms ASALM, a state-of-the-art ADMM algorithm for the SPCP problem with a constant penalty parameter.
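To make the idea concrete, the following is a minimal numerical sketch of ADMM with an increasing penalty sequence, applied to the simplified fully observed, noiseless case \(\min \Vert L\Vert _* + \xi \Vert S\Vert _1\) s.t. \(L+S=D\). This is *not* the paper's ADMIP algorithm (which uses partial variable splitting and handles partial observations and noise); all parameter names and settings (`rho0`, `growth`, the toy problem sizes) are illustrative assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding: prox of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: prox of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_increasing_penalty(D, xi, rho0=1e-2, growth=1.05, iters=300):
    """Toy ADMM for min ||L||_* + xi*||S||_1 s.t. L + S = D,
    with an increasing penalty sequence rho_k (a sketch, not ADMIP)."""
    m, n = D.shape
    L = np.zeros((m, n)); S = np.zeros((m, n)); Y = np.zeros((m, n))
    rho = rho0
    for _ in range(iters):
        L = svt(D - S - Y / rho, 1.0 / rho)        # L-subproblem
        S = shrink(D - L - Y / rho, xi / rho)      # S-subproblem
        Y = Y + rho * (L + S - D)                  # dual update
        rho *= growth                              # increasing penalty
    return L, S

# Toy instance: rank-2 low-rank part plus 10% sparse spikes.
rng = np.random.default_rng(0)
m = n = 20
L0 = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))
S0 = np.zeros((m, n))
idx = rng.random((m, n)) < 0.1
S0[idx] = 5.0 * rng.standard_normal(np.count_nonzero(idx))
D = L0 + S0
L, S = admm_increasing_penalty(D, xi=1.0 / np.sqrt(n))
feas = np.linalg.norm(L + S - D) / np.linalg.norm(D)
print(feas)  # growing penalties drive the primal residual toward 0
```

The point this illustrates is the one made in the abstract: because the dual iterates stay bounded while \(\rho _k\) grows, the primal residual \(L_{k+1}-Z_{k+1}=\rho _k^{-1}(Y_{k+1}-Y_k)\) is forced to zero much faster than with a fixed penalty.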
Notes
In an earlier preprint, we named it the Non-Smooth Augmented Lagrangian (NSA) algorithm.
The modified version is available from http://svt.stanford.edu/code.html
References
Aybat, N.S., Iyengar, G.: A unified approach for minimizing composite norms. Math. Progr. Ser. A 144, 181–226 (2014)
Aybat, N.S., Goldfarb, D., Ma, S.: Efficient algorithms for robust and stable principal component pursuit problems. Comput. Optim. Appl. 58, 1–29 (2014)
Aybat, N.S., Zarmehri, S., Kumara, S.: An ADMM algorithm for clustering partially observed networks. In: Proceedings of the 2015 SIAM International Conference on Data Mining, to appear (2015). Preprint available at http://arxiv.org/abs/1410.3898
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Boyer, C., Merzbach, U.: A History of Mathematics, 2nd edn, pp. 286–287. Wiley, New York (1991)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), 1–37 (2011)
Chandrasekaran, V., Sanghavi, S., Parrilo, P., Willsky, A.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004)
Eckstein, J.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Report RRR 32-2012, Rutgers Center for Operations Research (2012)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Fukushima, M.: Application of the alternating direction method of multipliers to separable convex programming problems. Comput. Optim. Appl. 1, 93–111 (1992). doi:10.1007/BF00247655
Glowinski, R.: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. Studies in Mathematics and its Applications. Elsevier Science (2000)
Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program. Ser. A 141(1–2), 349–382 (2013)
He, B., Yang, H.: Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23, 151–161 (1998)
He, B., Yang, H., Wang, S.: Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 106(2), 337–356 (2000)
He, B.S., Liao, L.Z., Han, D.R., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. Ser. A 92, 103–118 (2002)
Kontogiorgis, S., Meyer, R.R.: A variable-penalty alternating direction method for convex optimization. Math. Program. 83, 29–53 (1998)
Larsen, R.: Lanczos bidiagonalization with partial reorthogonalization. Technical report DAIMI PB-357, Department of Computer Science, Aarhus University (1998)
Li, L., Huang, W., Gu, I., Tian, Q.: Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 13, 1459–1472 (2004)
Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. Tech. rep., UIUC Technical Report UILU-ENG-09-2214 (2009)
Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv:1009.5055v2 (2011)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer-Verlag, New York (1999)
Rockafellar, R.: Convex Analysis. Princeton University Press (1997)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. SIAM J. Optim. (2008)
Wright, J., Peng, Y., Ma, Y., Ganesh, A., Rao, S.: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Proceedings of Neural Information Processing Systems (NIPS) (2009)
Zhou, Z., Li, X., Wright, J., Candès, E., Ma, Y.: Stable principal component pursuit. In: Proceedings of the International Symposium on Information Theory (2010)
Acknowledgments
We would like to thank Min Tao for providing the ASALM code. The work of N. S. Aybat was supported by NSF Grant CMMI-1400217. The work of G. Iyengar was supported by NIH Grant R21 AA021909-01 and NSF Grants CMMI-1235023 and DMS-1016571.
Appendix: Proofs
1.1 Proof of Lemma 1
Suppose \(\delta >0\). Let \((Z^*,S^*)\) be an optimal solution to problem \((P_{ns})\), let \(\theta ^*\) denote the optimal Lagrange multiplier for the constraint \((Z,S)\in \chi \) written as \(\frac{1}{2}\Vert \pi _{\varOmega }\left( Z+S-D\right) \Vert ^2_F\le \frac{\delta ^2}{2}\), and let \(\pi ^*_{\varOmega }\) denote the adjoint operator of \(\pi _{\varOmega }\). Note that \(\pi ^*_{\varOmega }=\pi _{\varOmega }\). Then the KKT conditions for this problem are given by
where (38) and (39) follow from the fact that \(\pi _{\varOmega } \pi _{\varOmega }=\pi _{\varOmega }\).
and
where \(q(\tilde{Z})=\tilde{Z}-\rho ^{-1}~Q\). From (44) it follows that
From the second equation in (45), we get
The Eq. (46) and \(\pi _{{\varOmega }^c}\left( G\right) =\mathbf{0}\) are precisely the first-order optimality conditions for the “shrinkage” problem
The expression for \(S^*\) in (10) is the optimal solution to this “shrinkage” problem, and \(Z^*\) given in (11) follows from the first equation in (43) and the first row of (45). Hence, given optimal Lagrangian dual \(\theta ^*\), \(S^*\) and \(Z^*\) computed from Eqs. (10) and (11), respectively, satisfy KKT conditions (38) and (39).
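As a numerical aside, the closed-form solution of such a "shrinkage" problem is elementwise soft-thresholding. The sketch below (with an illustrative threshold `tau` rather than the specific weight \(\xi \frac{\rho +\theta ^*}{\rho \theta ^*}\) appearing in (10)) verifies the first-order optimality condition \(0\in \tau \,\partial \Vert S\Vert _1+(S-X)\) directly.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: closed-form minimizer of
       tau * ||S||_1 + 0.5 * ||S - X||_F^2 over S."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

print(shrink(np.array([3.0, -0.5, 1.2]), 1.0))  # yields [2., 0., 0.2]

# Optimality check on a random instance: G = X - S must satisfy
# |G_ij| <= tau everywhere, with G_ij = tau * sign(S_ij) where S_ij != 0,
# i.e. G is tau times a subgradient of ||S||_1 at the minimizer.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5))
tau = 0.3
S = shrink(X, tau)
G = X - S
print(np.max(np.abs(G)) <= tau + 1e-12)
```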
Next, we show how to compute the optimal dual \(\theta ^*\). We consider two cases.
(i)
Suppose \(\Vert \pi _{\varOmega }\left( D-q(\tilde{Z})\right) \Vert _F\le \delta \). In this case, let \(\theta ^*=0\). Setting \(\theta ^*=0\) in (10) and (11), we find \(S^*=\mathbf{0}\) and \(Z^*=q(\tilde{Z})\). By construction, \(S^*\), \(Z^*\) and \(\theta ^*\) satisfy conditions (38) and (39). It is easy to check that this choice of \(\theta ^*=0\) trivially satisfies the rest of the conditions as well. Hence, \(\theta ^*=0\) is an optimal Lagrangian dual.
(ii)
Next, suppose \(\Vert \pi _{\varOmega }\left( D-q(\tilde{Z})\right) \Vert _F>\delta \). From (11), we have
$$\begin{aligned} \pi _{\varOmega }\left( Z^*+S^*-D\right) = \frac{\rho }{\rho +\theta ^*}~\pi _{\varOmega }\left( S^*+q(\tilde{Z})-D\right) . \end{aligned}$$(47)Therefore,
$$\begin{aligned}&\Vert \pi _{\varOmega }\left( Z^*+S^*-D\right) \Vert _F \nonumber \\&\quad =\frac{\rho }{\rho +\theta ^*}~\left\| \pi _{\varOmega }\left( S^*+q(\tilde{Z})-D\right) \right\| _F, \nonumber \\&\quad =\frac{\rho }{\rho +\theta ^*} \left\| \pi _{\varOmega }\left( \max \left\{ |D-q(\tilde{Z})| -\xi \frac{(\rho +\theta ^*)}{\rho \theta ^*} E,\ \mathbf{0}\right\} -|D-q(\tilde{Z})|\right) \right\| _F,\nonumber \\&\quad = \frac{\rho }{\rho +\theta ^*}~\left\| \pi _{\varOmega }\left( \min \left\{ \xi \frac{(\rho +\theta ^*)}{\rho \theta ^*}~E,\ |D-q(\tilde{Z})|\right\} \right) \right\| _F,\nonumber \\&\quad =\left\| \min \left\{ \frac{\xi }{\theta ^*}~E,\ \frac{\rho }{\rho +\theta ^*}~\left| \pi _{\varOmega }\left( D-q(\tilde{Z})\right) \right| \right\} \right\| _F, \end{aligned}$$(48)where the second equation is obtained after substituting (10) for \(S^*\) and then componentwise dividing the resulting expression inside the norm by \(\hbox {sgn}\left( D-q(\tilde{Z})\right) \). Define \(\phi :\mathbb {R}_+\rightarrow \mathbb {R}\),
$$\begin{aligned} \phi (\theta ):= \left\| \min \left\{ \frac{\xi }{\theta }~E,\ \frac{\rho }{\rho +\theta }~\left| \pi _{\varOmega }\left( D-q(\tilde{Z})\right) \right| \right\} \right\| _F. \end{aligned}$$(49)It is easy to show that \(\phi \) is a strictly decreasing function of \(\theta \). Since \(\lim _{\theta \rightarrow \infty }\phi (\theta )=0\) and \(\phi (0)=\Vert \pi _{\varOmega }\left( D-q(\tilde{Z})\right) \Vert _F>\delta \), there exists a unique \(\theta ^*>0\) such that \(\phi (\theta ^*)=\delta \). Moreover, since \(\theta ^*>0\) and \(\phi (\theta ^*)=\delta \), (48) implies that \(Z^*\), \(S^*\) and \(\theta ^*\) satisfy the rest of KKT conditions (40), (41) and (42) as well. Thus, the unique \(\theta ^*>0\) that satisfies \(\phi (\theta ^*)=\delta \) is the optimal Lagrangian dual. We now show that \(\theta ^*\) can be computed in \(\mathcal {O}(|{\varOmega }|\log (|{\varOmega }|))\) time. Let \(A:=|\pi _{\varOmega }\left( D-q(\tilde{Z})\right) |\) and \(0\le a_{(1)}\le a_{(2)}\le \cdots \le a_{(|{\varOmega }|)}\) be the \(|{\varOmega }|\) elements of the matrix \(A\) corresponding to the indices \((i,j)\in {\varOmega }\) sorted in increasing order, which can be done in \(\mathcal {O}(|{\varOmega }|\log (|{\varOmega }|))\) time. Defining \(a_{(0)}:=0\) and \(a_{(|{\varOmega }|+1)}:=\infty \), we then have for all \(j\in \{0,1,\ldots ,|{\varOmega }|\}\) that
$$\begin{aligned} \frac{\rho }{\rho +\theta }~a_{(j)} \le \frac{\xi }{\theta } \le \frac{\rho }{\rho +\theta }~a_{(j+1)} \Leftrightarrow \frac{1}{\xi }~a_{(j)}-\frac{1}{\rho } \le \frac{1}{\theta } \le \frac{1}{\xi }~a_{(j+1)}-\frac{1}{\rho }. \end{aligned}$$(50)Let \(\bar{k}:=\max \left\{ j: a_{(j)}\le \frac{\xi }{\rho },\ 0\le j\le |{\varOmega }| \right\} \), and for all \(\bar{k}< j\le |{\varOmega }|\) define \(\theta _j:=\frac{1}{\frac{1}{\xi }~a_{(j)}-\frac{1}{\rho }}\). Then for all \(\bar{k}< j\le |{\varOmega }|\), we have
$$\begin{aligned} \phi (\theta _j)=\sqrt{\left( \frac{\rho }{\rho +\theta _j}\right) ^2~\sum _{i=0}^j a^2_{(i)}+(|{\varOmega }|-j)~\left( \frac{\xi }{\theta _j}\right) ^2}. \end{aligned}$$(51)Also define \(\theta _{\bar{k}}:=\infty \) and \(\theta _{|{\varOmega }|+1}:=0\) so that \(\phi (\theta _{\bar{k}}):=0\) and \(\phi (\theta _{|{\varOmega }|+1})=\phi (0)=\Vert A\Vert _F>\delta \). Note that \(\{\theta _j\}_{\{\bar{k}< j\le |{\varOmega }|\}}\) contains all the points at which \(\phi (\theta )\) may not be differentiable for \(\theta \ge 0\). Define \(j^*:=\max \{j:\ \phi (\theta _j)\le \delta ,\ \bar{k}\le j\le |{\varOmega }|\}\). Then \(\theta ^*\) is the unique solution of the system
$$\begin{aligned} \sqrt{\left( \frac{\rho }{\rho +\theta }\right) ^2~\sum _{i=0}^{j^*} a^2_{(i)}+(|{\varOmega }|-j^*)~\left( \frac{\xi }{\theta }\right) ^2}=\delta \,\hbox {and}\, \theta >0, \end{aligned}$$(52)since \(\phi (\theta )\) is continuous and strictly decreasing in \(\theta \) for \(\theta \ge 0\). Solving the equation in (52) requires finding the roots of a fourth-order polynomial (also known as a quartic function). Lodovico Ferrari showed in 1540 that the roots of a quartic can be expressed in closed form; thus, \(\theta ^*>0\) can be computed in \(\mathcal {O}(1)\) operations. Note that if \(\bar{k}=|{\varOmega }|\), then \(\theta ^*\) is the solution of the equation
$$\begin{aligned} \sqrt{\left( \frac{\rho }{\rho +\theta ^*}\right) ^2~\sum _{i=1}^{|{\varOmega }|} a^2_{(i)}}=\delta , \end{aligned}$$(53)i.e. \(\theta ^*= \rho \left( \frac{\Vert A\Vert _F}{\delta }-1\right) = \rho \left( \frac{\Vert \pi _{\varOmega }\left( D-q(\tilde{Z})\right) \Vert _F}{\delta }-1\right) \).
Hence, we have proved that problem \((P_{ns})\) can be solved efficiently when \(\delta > 0\).
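The construction of \(\theta ^*\) above can be checked numerically. The sketch below does not implement the paper's \(\mathcal {O}(|{\varOmega }|\log (|{\varOmega }|))\) sorting-plus-quartic procedure; instead it exploits the same monotonicity of \(\phi \) used in the proof and brackets \(\theta ^*\) by bisection, then verifies the closed form \(\theta ^*=\rho (\Vert A\Vert _F/\delta -1)\) for the case \(\bar{k}=|{\varOmega }|\). The test data (`a`, `xi`, `rho`, `delta`) are arbitrary illustrative values.

```python
import numpy as np

def phi(theta, a, xi, rho):
    """phi(theta) from (49): Frobenius norm of the elementwise minimum
       of xi/theta and rho/(rho+theta) * a, where a = |pi_Omega(D - q)|."""
    return np.linalg.norm(np.minimum(xi / theta, rho / (rho + theta) * a))

def solve_theta(a, xi, rho, delta, lo=1e-12, hi=1e12):
    """Bisection for the unique theta* > 0 with phi(theta*) = delta,
       valid when phi(0+) = ||a||_F > delta (phi is strictly decreasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid, a, xi, rho) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(2)
xi, rho = 0.25, 2.0
a = np.abs(rng.standard_normal(50))
delta = 0.5 * np.linalg.norm(a)            # 0 < delta < phi(0) = ||a||_F
theta = solve_theta(a, xi, rho, delta)
print(abs(phi(theta, a, xi, rho) - delta))  # ~0

# Special case k_bar = |Omega| (all a_(j) <= xi/rho): closed form from (53),
# theta* = rho * (||A||_F / delta - 1).
a_small = np.minimum(a, 0.9 * xi / rho)
delta2 = 0.5 * np.linalg.norm(a_small)
theta_cf = rho * (np.linalg.norm(a_small) / delta2 - 1.0)
print(abs(phi(theta_cf, a_small, xi, rho) - delta2))  # ~0
```

Bisection costs \(\mathcal {O}(|{\varOmega }|)\) per step rather than the proof's one-shot quartic solve, but it makes the role of the strict monotonicity of \(\phi \) transparent.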
Now, suppose \(\delta =0\). Since \(\pi _{\varOmega }\left( Z^*+S^*-D\right) =0\), problem \((P_{ns})\) can be written as
Then (13) and \(Z^*=\pi _{\varOmega }\left( D-S^*\right) +\pi _{{\varOmega }^c}\left( q(\tilde{Z})\right) \) trivially follow from first-order optimality conditions for the above problem.
1.2 Proof of Lemma 2
Let \(W^*:=-Q+\rho (\tilde{Z}-Z^*)\). Then (38) in the proof of Lemma 1 implies that \(W^*=\theta ^*~\pi _{\varOmega }\left( Z^*+S^*-D\right) \). From the first-order optimality conditions of \((P_{ns})\) in (9), we have that \((W^*,-W)\in \partial \mathbf{1}_\chi (Z^*,S^*)\) for some \(W\in \partial \xi \Vert S^*\Vert _1\). From (38) and (39), it follows that \(-W^*\in \partial \xi \Vert S^*\Vert _1\). The definition of \(\chi \), the chain rule for subdifferentials (see Theorem 23.9 in Rockafellar [26]), and \(-W^*\in \partial \xi \Vert S^*\Vert _1\) together imply that \((W^*,W^*)\in \partial \mathbf{1}_\chi (Z^*,S^*)\).
1.3 Proof of Lemma 3
Since \(L_{k+1}\) is the optimal solution to the subproblem in Step 4 of ADMIP corresponding to the \(k\)-th iteration, it follows that
Let \(\theta _k\ge 0\) denote the optimal Lagrange multiplier for the quadratic constraint in Step 5 sub-problem in the \(k\)-th iteration. Since \((Z_{k+1},S_{k+1})\) is the optimal solution, the first-order optimality conditions imply that
From (55), it follows that \(-\hat{Y}_{k+1}\in \partial \Vert L_{k+1}\Vert _*\). From (56) and (57), it follows that \(-Y_{k+1}\in \xi ~\partial \Vert S_{k+1}\Vert _1\). Since \(\partial \Vert L\Vert _*\) and \(\partial \Vert S\Vert _1\) are uniformly bounded sets for all \(L, S\in \mathbb {R}^{m\times n}\), it follows that \(\{\hat{Y}_k\}_{k\in \mathbb {Z}_+}\) and \(\{Y_k\}_{k\in \mathbb {Z}_+}\) are bounded sequences. Moreover, (57) implies that \(\pi _{\varOmega }\left( Y_k\right) =Y_k\) for all \(k\ge 1\).
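The uniform boundedness used in this proof is a standard fact about these two norms: every subgradient of \(\Vert \cdot \Vert _1\) has entries in \([-1,1]\), and every subgradient of \(\Vert \cdot \Vert _*\) has spectral norm at most \(1\). The sketch below checks this for the canonical subgradients \(\hbox {sgn}(S)\) and \(UV^\top \) (valid at a generic point: no zero entries, full rank); the matrices are arbitrary illustrative data.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 6
S = rng.standard_normal((m, n))
L = rng.standard_normal((m, n))

# Canonical subgradient of ||S||_1 at a generic S (no zero entries): sgn(S).
# Entries lie in {-1, 1}, so its Frobenius norm is sqrt(m*n), independent of S.
G1 = np.sign(S)
print(np.linalg.norm(G1))           # = sqrt(m*n), whatever S is

# Canonical subgradient of ||L||_* at a generic (full-rank) L: U @ Vt from a
# thin SVD. Its singular values are all 1, so its spectral norm is 1 and its
# Frobenius norm is sqrt(min(m, n)), again independent of L.
U, s, Vt = np.linalg.svd(L, full_matrices=False)
G2 = U @ Vt
print(np.linalg.norm(G2, 2))        # spectral norm: 1.0 up to rounding
```

This is exactly why the dual sequences \(\{\hat{Y}_k\}\) and \(\{Y_k\}\) in Lemma 3 are bounded regardless of how the primal iterates behave.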
1.4 Proof of Lemma 4
For all \(k \ge 0\), since \(Y_{k+1}=Y_k+\rho _k(L_{k+1}-Z_{k+1})\) and \(\hat{Y}_{k+1}:=Y_k+\rho _k(L_{k+1}-Z_k)\), we have that \(Y_{k+1}-\hat{Y}_{k+1}=\rho _k(Z_k-Z_{k+1})\). Using these relations, we obtain the following equality
Moreover, we also have
where the second equality follows from rewriting the last term in (59) using (58), and the last equality follows from the relation \(L_{k+1}-Z_{k+1} = \rho _k^{-1}(Y_{k+1}-Y_k)\).
Since \(Y^*\) and \(\theta ^*\) are optimal Lagrangian dual variables, we have
From first-order optimality conditions, we get
Hence, \(-Y^*\in \partial \Vert L^*\Vert _*\) and \(-Y^*\in \xi ~\partial \Vert S^*\Vert _1\). Moreover, from Lemma 3, we also have that \(-Y_k\in \partial \xi ~\Vert S_k\Vert _1\) for all \(k\ge 1\). Since \(\xi ~\Vert .\Vert _1\) is convex, it follows that
Since \(\rho _{k+1}\ge \rho _k\) for all \(k\ge 1\), first adding (61) to (60), then adding and subtracting (62), we get
Lemma 2 applied to the Step 5 sub-problem corresponding to the \(k\)-th iteration gives \((Y_{k+1},Y_{k+1})\in \partial \mathbf{1}_{\chi }(Z_{k+1},S_{k+1})\). Using an argument similar to that used in the proof of Lemma 2, one can also show that \((Y^*,Y^*)\in \partial \mathbf{1}_{\chi }(L^*,S^*)\). Moreover, since \(-Y^*\in \partial \xi ~\Vert S^*\Vert _1\), \(-Y^*\in \partial \Vert L^*\Vert _*\), and \(-Y_{k}\in \partial \xi ~\Vert S_k\Vert _1\), \(-\hat{Y}_{k}\in \partial \Vert L_k\Vert _*\) for all \(k\ge 1\), we have that for all \(k \ge 0\),
This set of inequalities and (63) together imply that \(\{\Vert Z_{k}-L^*\Vert _F^2+\rho _{k}^{-2}\Vert Y_{k}-Y^*\Vert _F^2\}_{k\in \mathbb {Z}_+}\) is a non-increasing sequence. Using this fact, rewriting (63) and summing over \(k\in \mathbb {Z}_+\), we get
This inequality is sufficient to prove the rest of the lemma.
Cite this article
Aybat, N.S., Iyengar, G. An alternating direction method with increasing penalty for stable principal component pursuit. Comput Optim Appl 61, 635–668 (2015). https://doi.org/10.1007/s10589-015-9736-6