Abstract
We propose a first-order augmented Lagrangian algorithm (FALC) to solve the composite norm minimization problem
where \(\sigma (X)\) denotes the vector of singular values of \(X \in \mathbb{R }^{m\times n}\), the matrix norm \(\Vert \sigma (X)\Vert _{\alpha }\) denotes either the Frobenius, the nuclear, or the \(\ell _2\)-operator norm of \(X\); the vector norm \(\Vert .\Vert _{\beta }\) denotes either the \(\ell _1\)-norm, the \(\ell _2\)-norm, or the \(\ell _{\infty }\)-norm; \(\mathcal{Q }\) is a closed convex set; and \(\mathcal{A }(.)\), \(\mathcal{C }(.)\), \(\mathcal{F }(.)\) are linear operators from \(\mathbb{R }^{m\times n}\) to vector spaces of appropriate dimensions. Basis pursuit, matrix completion, robust principal component pursuit (PCP), and stable PCP problems are all special cases of the composite norm minimization problem. Thus, FALC is able to solve all these problems in a unified manner. We show that any limit point of the FALC iterate sequence is an optimal solution of the composite norm minimization problem. We also show that, for all \(\epsilon >0\), the FALC iterates are \(\epsilon \)-feasible and \(\epsilon \)-optimal after \(\mathcal{O }(\log (\epsilon ^{-1}))\) iterations, which require \(\mathcal{O }(\epsilon ^{-1})\) constrained shrinkage operations and Euclidean projections onto the set \(\mathcal{Q }\). Surprisingly, on the problem sets we tested, FALC required only \(\mathcal{O }(\log (\epsilon ^{-1}))\) constrained shrinkage operations, instead of the \(\mathcal{O }(\epsilon ^{-1})\) worst-case bound, to compute an \(\epsilon \)-feasible and \(\epsilon \)-optimal solution. To the best of our knowledge, FALC is the first algorithm with a known complexity bound that solves the stable PCP problem.
References
Aybat, N.S., Chakraborty, A.: Fast reconstruction of CT images from parsimonious angular measurements via compressed sensing. Technical report, Siemens Corporate Research (2009)
Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)
Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Becker, S., Bobin, J., Candès, E.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4, 1–39 (2011)
Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2008)
Candès, E., Romberg, J.: Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math. 6, 227–254 (2006)
Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489–509 (2006)
Candès, E., Tao, T.: Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2006)
Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? (2009). Submitted for publication
Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2008)
d’Aspremont, A., Bach, F.R., Ghaoui, L.E.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9, 1269–1294 (2008)
d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49, 434–448 (2007)
Daubechies, I., Fornasier, M., Loris, I.: Accelerated projected gradient method for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl. 14, 764–792 (2008)
Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
El Ghaoui, L., Gahinet, P.: Rank minimization under LMI constraints: a framework for output feedback problems. In: Proceedings of the European control conference (1993)
Fazel, M., Hindi, H., Boyd, S.: Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In: Proceedings of the American control conference, Denver, Colorado (2003)
Fazel, M., Hindi, H., Boyd, S.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American control conference, pp. 2156–2162 (2003)
Fazel, M., Hindi, H., Boyd, S.: Rank minimization and applications in system theory. In: American control conference, pp. 3273–3278 (2004)
Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications in system identification and realization (2012). Submitted for publication
Figueiredo, M.A., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1, 586–597 (2007)
Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions (2010). arXiv:0912.4571v2
Hale, E.T., Yin, W., Zhang, Y.: A fixed-point continuation method for \(\ell _1\)-regularized minimization with applications to compressed sensing. Technical report, Rice University (2007)
Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _1\)-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)
Koh, K., Kim, S.J., Boyd, S.: Solver for \(\ell _1\)-regularized least squares problems. Technical report, Stanford University (2007)
Larsen, R.: Lanczos bidiagonalization with partial reorthogonalization. Technical report DAIMI PB-357, Department of Computer Science, Aarhus University (1998)
Lewis, A.S.: The convex analysis of unitarily invariant matrix norms. J. Convex Anal. 2, 173–183 (1995)
Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv:1009.5055v2 (2011)
Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. Technical Report UILU-ENG-09-2214, UIUC (2009)
Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 215–245 (1995)
Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM. J. Matrix Anal. Appl. 31, 1235–1256 (2009)
Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. Ser. A 128, 321–353 (2011)
Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
Toh, K., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems (2010). (Preprint)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization (2008). Submitted to SIAM J. Optim.
Van den Berg, E., Friedlander, M.P.: Probing the pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)
Wen, Z., Yin, W., Goldfarb, D., Zhang, Y.: A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation. SIAM J. Sci. Comput. (2009). To appear
Yang, J., Zhang, Y.: Alternating direction algorithms for \(\ell _1\)-problems in compressive sensing. Technical Report TR09-37, CAAM, Rice University (2009)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\) minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)
Zhou, Z., Li, X., Wright, J., Candès, E., Ma, Y.: Stable principal component pursuit. In: Proceedings of International Symposium on Information Theory (2010)
Additional information
Research partially supported by ONR N000140310514, DOE DE-FG02-08ER25856, DOE DE-AR0000235 and NSF DMS 10-16571 grants.
Appendices
Appendix A: Proofs of technical results
1.1 Lemma 5 and proof
Lemma 5
Let \(\mathcal{Q }\subset \mathbb{R }^q\) be a nonempty closed convex set such that \(\{X\in \mathbb{R }^{m\times n}: \mathcal{A }(X)-b\in \mathcal{Q }\}\ne \emptyset \), where \(\mathcal{A }\) is surjective; and let \((X^{(k)}_{*},s^{(k)}_{*},y^{(k)}_{*})\) be an optimal solution to (15). Then, for all \(k\ge 1\),
Proof
From the first-order optimality conditions for (15), we have \(y^{(k)}_{*}=\varPi _\mathcal{Q }(\mathcal{A }(X^{(k)}_{*})-b-\lambda ^{(k)}\theta _1^{(k)})\). Since the Euclidean projection is nonexpansive, we have
The result now follows from the triangle inequality. \(\square \)
This result implies several simple bounds on \(\Vert y^{(k)}_{*}\Vert _2\). Since the initial iterate \(X^{(0)}\) is feasible, i.e. \(\mathcal{A }(X^{(0)}) - b \in \mathcal{Q }\), it follows that
Suppose \(0 \in \mathcal{Q }\). Then \(\Vert y^{(k)}_{*}\Vert _2\le \eta _2^{(k)}:= \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2\). When \(\mathcal{Q }\) is bounded with \(\mathcal{Q }\subseteq \{y: \Vert y\Vert _2 \le \eta _2\}\), one can set \(\eta _2^{(k)}:=\eta _2\) for all \(k\ge 1\).
1.2 Lemma 6 and proof
Lemma 6
Fix \(\alpha \), \(\beta \in \{1,2,\infty \}\). Let
where
Suppose \((\bar{X},\bar{s},\bar{y})\) is \(\epsilon \)-optimal for the problem \(\min _{X,s,y}\{P(X,s,y):~y\in \mathcal{Q }\}\), i.e.
Then we have
where \(M = \left({\small \begin{array}{lll} -I&\quad 0&\quad C \\ 0&\quad -I&\quad A \\ \end{array}}\right),\) \(\frac{1}{\alpha ^*}+\frac{1}{\alpha }=1\) (resp. \(\frac{1}{\beta ^*}+\frac{1}{\beta }=1\)) is the Hölder conjugate of \(\alpha \) (resp. \(\beta \)) and the functions \(I(\cdot )\) and \(J(\cdot )\) are defined in (21).
In order to prove Lemma 6, we need the following result.
Theorem 5
Let \(f:\mathbb{R }^{m\times n}\times \mathbb{R }^p \times \mathbb{R }^q \rightarrow \mathbb{R }\) denote a convex function whose gradient \(\nabla f\) is Lipschitz continuous with constant \(L\) with respect to the norm \(\Vert (X,s,y)\Vert =\sqrt{\Vert X\Vert _F^2+\Vert s\Vert _2^2+\Vert y\Vert _2^2}\).
Let \((X_*,s_*,y_*) \in \mathop {\mathrm{argmin}}_{X,s,y}\{\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta )+f(X,s,y): y\in \mathcal{Q }\}\). Suppose \((\bar{X},\bar{s},\bar{y}) \in \mathbb{R }^{m\times n} \times \mathbb{R }^p\times \mathbb{R }^q\) such that \(\bar{y}\in \mathcal{Q }\) satisfies
for some \(\epsilon >0\). Then
where \(\frac{1}{\alpha ^*}+\frac{1}{\alpha }=1\) (resp. \(\frac{1}{\beta ^*}+\frac{1}{\beta }=1\)) is the Hölder conjugate of \(\alpha \) (resp. \(\beta \)) and the functions \(I(\cdot )\) and \(J(\cdot )\) are defined in (22).
Proof
Since \(\nabla f\) is Lipschitz continuous with constant \(L\), the triangle inequality for \(\Vert \sigma (.)\Vert _\alpha \) and \(\Vert .\Vert _\beta \) implies that for any \(X\in \mathbb{R }^{m\times n}\), \(s\in \mathbb{R }^p\) and \(y\in \mathbb{R }^q\)
where \(\left\langle X,Y \right\rangle =\mathop {\mathbf{Tr}}(X^T Y)\in \mathbb{R }\) denotes the usual Euclidean inner product of \(X\in \mathbb{R }^{m\times n}\) and \(Y\in \mathbb{R }^{m\times n}\). Since \(X\), \(s\) and \(y\) are arbitrary, it follows that
The first minimization problem on the right hand side of (81) can be simplified as follows:
where \(X^*(W)=\bar{X}-\frac{\nabla _X f(\bar{X},\bar{s},\bar{y})+W}{L}\) is the minimizer of the inner minimization problem in (82).
The second minimization problem on the right hand side of (81) can be simplified as follows:
\(s^*(u)=\bar{s}-\frac{\nabla _s f(\bar{X},\bar{s},\bar{y})+u}{L}\) is the minimizer of the inner minimization problem in (84).
Since \(\bar{y}\in \mathcal{Q }\), the following is true for the third minimization problem on the right hand side of (81).
Thus, (81), (83), (85) and (86) together imply that
Since \(\Big (\lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+f(\bar{X},\bar{s},\bar{y})\Big ) -\Big (\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta )+f(X_*,s_*,y_*)\Big )\le \epsilon \), we have that
From (21), it follows that \(\Vert W\Vert _F\le I(\alpha ^*)\Vert \sigma (W)\Vert _{\alpha ^*}\). Thus, (87) implies that
Suppose \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F> I(\alpha ^*)\lambda \mu _1\). Then the optimal solution of the optimization problem in (88) is
Substituting this optimal solution into (88), it follows from (87) that \((\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F-I(\alpha ^*)\lambda \mu _1)^2\le 2L\epsilon \), i.e. \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F \le \sqrt{2L\epsilon }+I(\alpha ^*)\lambda \mu _1\). This bound holds trivially when \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F\le I(\alpha ^*)\lambda \mu _1\). Therefore, we can conclude that in all cases
A similar analysis establishes that \(\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})\Vert _2\le \sqrt{2L\epsilon }+J(\beta ^*)\lambda \mu _2\). \(\square \)
Now we are ready to prove Lemma 6.
Proof
Let \(f(X,s,y)=\frac{1}{2} \Vert \mathcal{A }(X)-y-b-\lambda \theta _1\Vert _2^2+\frac{1}{2} \Vert \mathcal{C }(X)-s-d-\lambda \theta _2\Vert _2^2\) and let \(\Vert (X,s,y)\Vert =\sqrt{\Vert X\Vert _F^2+\Vert s\Vert _2^2+\Vert y\Vert _2^2}\), then for any \(X_1, X_2 \in \mathbb{R }^{m\times n}\), \(s_1, s_2 \in \mathbb{R }^p\) and \(y_1, y_2 \in \mathbb{R }^q\), we have
Hence,
where \(\sigma _{\max }(M)\) is the maximum singular value of \(M\). Thus, \(f:\mathbb{R }^{m\times n}\times \mathbb{R }^p\times \mathbb{R }^q \rightarrow \mathbb{R }\) is a convex function and \(\nabla f\) is Lipschitz continuous with respect to \(\Vert .\Vert \) with Lipschitz constant \(L=\sigma _{\max }^2(M)\).
Since \((\bar{X},\bar{s},\bar{y})\) is an \(\epsilon \)-optimal solution to the problem \(\min \{P(X,s,y):X\in \mathbb{R }^{m\times n}, s\in \mathbb{R }^p, y\in \mathcal{Q }\subset \mathbb{R }^q\}\), Theorem 5 guarantees that
\(\square \)
1.3 Lemma 7 and proof
Lemma 7
Let \(\mathcal{Q }\subset \mathbb{R }^q\) be a nonempty, closed, and convex set. Then for all \(\tilde{y}\in \mathbb{R }^q\) and \(\lambda >0\), we have \(\varPi _\mathcal{Q }(\lambda \tilde{y})=\lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y})\), or equivalently, \(\varPi _\mathcal{Q }(\tilde{y})=\lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y}/\lambda )\), where \(\mathcal{Q }/\lambda = \{x: \lambda x \in \mathcal{Q }\}\).
Proof
Fix \(\tilde{y}\in \mathbb{R }^q\) and \(\lambda >0\). Then
\(\square \)
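As a quick numerical illustration of Lemma 7 (not part of the original argument), the following sketch checks the scaling identity for \(\mathcal{Q }\) taken to be a Euclidean ball of radius \(r\), in which case \(\mathcal{Q }/\lambda \) is the ball of radius \(r/\lambda \); the helper `proj_ball` is our own.

```python
import math

def proj_ball(y, radius):
    """Euclidean projection of y onto the ball {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= radius:
        return list(y)
    return [v * radius / norm for v in y]

# Check Pi_Q(lam * y) == lam * Pi_{Q/lam}(y) with Q the ball of radius r.
lam, r = 2.5, 1.0
y = [3.0, -4.0]                                   # ||y||_2 = 5
lhs = proj_ball([lam * v for v in y], r)          # Pi_Q(lam * y)
rhs = [lam * v for v in proj_ball(y, r / lam)]    # lam * Pi_{Q/lam}(y)
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```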
1.4 Lemma 8 and proof
Lemma 8
Let \((X_{*},s_{*},y_{*})\) be an optimal solution to (13) and suppose that \(\Vert \varPi _{\mathcal{Q }}\left(y^{(k)}_p\right) -y^{(k)}\Vert _2\le \xi ^{(k)}\) for some \(k\ge 1\), where \(y^{(k)}_p:=y^{(k)}-\frac{1}{L}\nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\). Then we have
Proof
From the definition of \(\varPi _\mathcal{Q }(.)\), we have
Since \(y_{*}\in \mathcal{Q }\), \(y^{(k)}-y^{(k)}_p=\frac{1}{L}\nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\) and \(\Vert \varPi _{\mathcal{Q }}\left(y^{(k)}_p\right)-y^{(k)}\Vert _2\le \xi ^{(k)}\), (92) follows from (93). \(\square \)
Appendix B: Auxiliary results for simple optimization problems
Lemma 9
Let \((\mathcal{E },\Vert .\Vert )\) be a normed vector space, \(f:\mathcal{E }\rightarrow \mathbb{R }\) be a strictly convex function and \(\chi \subset \mathcal{E }\) be a closed, convex set with a non-empty interior. Let \(\bar{x}=\mathop {\mathrm{argmin}}_{x\in \chi }f(x)\) and \(x^*=\mathop {\mathrm{argmin}}_{x\in \mathcal{E }}f(x)\). If \(x^*\not \in \chi \), then \(\bar{x}\in \mathop {\mathbf{bd}}\chi \), where \(\mathop {\mathbf{bd}}\chi \) denotes the boundary of \(\chi \).
Proof
We will establish the result by contradiction. Assume \(\bar{x}\) is in the interior of \(\chi \), i.e. \(\bar{x}\in \mathop {\mathbf{int}}(\chi )\). Then \(\exists \;\epsilon >0\) such that \(B(\bar{x},\epsilon )=\{x\in \mathcal{E }\;:\;\Vert x-\bar{x}\Vert <\epsilon \}\subset \chi \). Since \(f\) is strictly convex and \(x^*\ne \bar{x}\), \(f(x^*)<f(\bar{x})\). Choose \(0<\lambda <\frac{\epsilon }{\Vert \bar{x}-x^*\Vert }<1\) so that \(\lambda x^* +(1-\lambda ) \bar{x} \in B(\bar{x},\epsilon ) \subset \chi \). Since \(f\) is strictly convex,
However, \(\lambda x^* + (1-\lambda ) \bar{x} \in B(\bar{x},\epsilon )\subset \chi \) and \(f(\lambda x^* + (1-\lambda ) \bar{x})<f(\bar{x})\) contradict the fact that \(f(\bar{x})\le f(x)\) for all \(x\in \chi \). Therefore, \(\bar{x} \not \in \mathop {\mathbf{int}}(\chi )\). Since \(\bar{x}\in \chi \), it follows that \(\bar{x}\in \mathop {\mathbf{bd}}\chi \). \(\square \)
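A one-dimensional sketch of Lemma 9 (our illustration; the function and set below are chosen only for the example): with \(f(x)=(x-2)^2\) and \(\chi =[-1,1]\), the unconstrained minimizer \(x^*=2\) is infeasible, so the constrained minimizer must lie on the boundary of \(\chi \).

```python
# f(x) = (x - 2)^2 is strictly convex; chi = [-1, 1].
f = lambda x: (x - 2.0) ** 2

lo, hi = -1.0, 1.0
# A crude grid search over chi is enough for this 1-D sketch.
grid = [lo + i * (hi - lo) / 10000 for i in range(10001)]
x_bar = min(grid, key=f)

# The constrained minimizer sits on the boundary point x = 1.
assert abs(x_bar - hi) < 1e-3
```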
Next, we collect together complexity results for optimization problems of the form
that need to be solved in each update step of Algorithm APG, displayed in Fig. 1.
Lemma 10
Let \(\bar{X} = \mathop {\mathrm{argmin}}_{X \in \mathbb{R }^{m\times n}}\big \{\lambda \Vert \sigma (X)\Vert _{\alpha } + \frac{1}{2}\Vert X-\tilde{X}\Vert _F^2: \Vert \sigma (X)\Vert _{\alpha } \le \eta \big \}\) of the constrained matrix shrinkage problem. Then
where \(U\mathop {\mathbf{diag}}(\sigma )V^T\) denotes the SVD of \(\tilde{X}\) such that \(\sigma \in \mathbb{R }_+^r\) and \(r=\mathop {\mathbf{rank}}(\tilde{X})\); and \(\bar{s}\) denotes the optimal solution of the constrained vector shrinkage problem
Since the worst-case complexity of computing the SVD of \(\tilde{X}\) is \(\mathcal{O }(\min \{n^2m,m^2n\})\), the complexity of computing \(\bar{X}\) is \(\mathcal{O }(\min \{n^2m,m^2n\} + T_v(r,\alpha ))\), where \(T_v(r,\alpha )\) denotes the complexity of computing the solution of an \(r\)-dimensional constrained vector shrinkage problem with norm \(\Vert .\Vert _{\alpha }\). The complexity \(T_v(r,\alpha )\) for each choice of \(\alpha \in \{1,2,\infty \}\) is established in the proof below.
Proof
Standard results in non-linear convex optimization over matrices imply that \(\bar{X}\) is of the form \(\bar{X} = U \mathop {\mathbf{diag}}(\bar{s}) V^T\) (see Corollary 2.5 in [28]).
Now, consider the constrained vector shrinkage problem
(i) \(\beta = 1\): First consider the unconstrained case, i.e. \(\eta =\infty \). The unconstrained solution \(s^*\) has the closed form \(s^*=\text{ sign}(\tilde{s})\,\odot \,\max \{|\tilde{s}|-\lambda \mathbf{1}, \mathbf{0}\}\) and can be computed with \(\mathcal{O }(p)\) complexity, where \(\odot \) denotes componentwise multiplication and \(\mathbf{1}\) is a vector of ones. When \(\eta <\infty \), the constrained optimal solution, \(\bar{s}\), can be computed with \(\mathcal{O }(p\log (p))\) complexity. See Lemma A.4 in [1].
(ii) \(\beta = 2\): First consider the unconstrained case, i.e. \(\eta =\infty \). Since the \(\ell _2\)-norm is self-dual, \(\lambda \Vert s\Vert _2 = \max \{u^Ts: \Vert u\Vert _2 \le \lambda \}\). Thus,
$$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} =\min _{s\in \mathbb{R }^p}\ \max _{u:\ \Vert u\Vert _2\le \lambda } \left\{ u^T s + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _2\le \lambda }\ \min _{s \in \mathbb{R }^p}\left\{ u^Ts +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _2\le \lambda } \left\{ u^T (\tilde{s}-u)+\frac{1}{2}\Vert u\Vert _2^2\right\} , \\&\quad =\frac{1}{2}\Vert \tilde{s}\Vert _2^2-\min _{u:\ \Vert u\Vert _2\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2, \nonumber \end{aligned}$$(96) where (96) follows from the fact that \(s^*(u):=\mathop {\mathrm{argmin}}_{s\in \mathbb{R }^p} \{u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\}=\tilde{s}-u\). Define
$$\begin{aligned} u^*:=\mathop {\mathrm{argmin}}_{u:\ \Vert u\Vert _2\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2=\tilde{s}~\min \left\{ \frac{\lambda }{\Vert \tilde{s}\Vert _2},\ 1\right\} . \end{aligned}$$Then the unconstrained optimal solution \(s^*=s^*(u^*)=\tilde{s}\max \left\{ 1-\frac{\lambda }{\Vert \tilde{s}\Vert _2},\ 0\right\} \) and the complexity of computing \(s^*\) is \(\mathcal{O }(p)\). Next, consider the constrained problem, i.e. \(\eta <\infty \). The constrained optimum \(\bar{s}=s^*\) whenever \(s^*\) is feasible, i.e. \(\Vert s^*\Vert _2 \le \eta \). Since \(f(s):=\lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\) is strongly convex, Lemma 9 implies that \(\Vert \bar{s}\Vert _{2} = \eta \) whenever \(\Vert s^*\Vert _2 > \eta \). Thus,
$$\begin{aligned} \min \Big \{\lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\ \Vert s\Vert _2\le \eta \Big \} = \lambda \eta + \min \Big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2 = \eta \Big \}. \end{aligned}$$The unique KKT point of the problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \frac{1}{2}\Vert s\Vert _2^2 = \frac{\eta ^2}{2}\big \}\) is given by \(\bar{s} = \eta \frac{\tilde{s}}{\Vert \tilde{s}\Vert _2}\), and the KKT multiplier for the constraint \(\frac{1}{2}\Vert s\Vert _2^2 = \frac{\eta ^2}{2}\) is \(\vartheta = \frac{\Vert \tilde{s}\Vert _2}{\eta } -1\). It is easy to check that \(\vartheta > 0\) whenever \(\Vert s^*\Vert _2 > \eta \). Thus, \(\bar{s}\) is optimal for the convex problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2 \le \eta \big \}\) and, consequently, optimal for the equality-constrained problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2 = \eta \big \}\). Hence, the complexity of computing \(\bar{s}\) is \(\mathcal{O }(p)\).
(iii) \(\beta = \infty \): First consider the unconstrained problem. Since the \(\ell _1\)-norm is the dual norm of the \(\ell _{\infty }\)-norm, we have that
$$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _\infty +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} =\min _{s\in \mathbb{R }^p}\ \max _{u:\ \Vert u\Vert _1\le \lambda } \left\{ u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _1\le \lambda }\ \min _{s \in \mathbb{R }^p}\left\{ u^Ts+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _1\le \lambda } \left\{ u^T (\tilde{s}-u) + \frac{1}{2}\Vert u\Vert _2^2\right\} , \\&\quad =\frac{1}{2}\Vert \tilde{s}\Vert _2^2-\min _{u:\ \Vert u\Vert _1\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2, \nonumber \end{aligned}$$(97) where (97) follows from the fact that \(s^*(u):=\mathop {\mathrm{argmin}}_{s\in \mathbb{R }^p}\{u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\}=\tilde{s}-u\). The result in (i) implies that the complexity of computing \(u^*=\mathop {\mathrm{argmin}}_{u:\ \Vert u\Vert _1\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2\) is \(\mathcal{O }(p\log (p))\). Thus, the unconstrained optimal solution \(s^* = s^*(u^*) = \tilde{s} - u^*\) can be computed in \(\mathcal{O }(p\log (p))\) operations. Next, consider the constrained problem. The constrained optimum \(\bar{s}=s^*\) whenever \(s^*\) is feasible, i.e. \(\Vert s^*\Vert _{\infty } \le \eta \). Since \(f(s) = \lambda \Vert s\Vert _{\infty } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\) is strictly convex, Lemma 9 implies that \(\Vert \bar{s}\Vert _{\infty } = \eta \) whenever \(\Vert s^*\Vert _{\infty } > \eta \). Therefore,
$$\begin{aligned} \min \left\{ \lambda \Vert s\Vert _\infty +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty \le \eta \right\} = \lambda \eta + \min \left\{ \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty =\eta \right\} . \end{aligned}$$Then, it is easy to check that \(\text{ sign}(\bar{s}_i) = \text{ sign}(\tilde{s}_i)\) for all \(i = 1, \ldots , p\). Moreover, \(\Vert s^*\Vert _\infty >\eta \) implies that \(\Vert \tilde{s}\Vert _\infty >\eta \). These two facts imply that
$$\begin{aligned} \min \left\{ \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty =\eta \right\} = \min \left\{ \frac{1}{2}\Vert s-\left|\tilde{s}\right|\Vert _2^2: 0 \le s_i \le \eta \right\} . \end{aligned}$$For \(1\!\le \! i\!\le \! p\), we have \(\min \{\left|\tilde{s}_i\right|,\eta \}\!=\!\mathop {\mathrm{argmin}}_{s_i\in \mathbb{R }}\big \{\frac{1}{2}(s_i \!-\! \left|\tilde{s}_i\right|)^2: 0 \!\le \! s_i \!\le \! \eta \big \}\). Thus, it follows that \(\bar{s}\!=\! \text{ sign}(\tilde{s}) \odot \min \{|\tilde{s}|, \eta \mathbf{1}\}\). Hence the complexity of computing \(\bar{s}\) is \(\mathcal{O }(p\log (p))\).
\(\square \)
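The closed-form shrinkage solutions derived in cases (i)–(iii) above can be sketched in a few lines; the function names below are our own, and the \(\ell _{\infty }\) branch shows only the final boundary-clipping step (the dual \(\ell _1\)-ball projection used to compute \(u^*\) is omitted).

```python
import math

def shrink_l1(s_tilde, lam):
    # Case (i), unconstrained: s* = sign(s~) . max(|s~| - lam, 0).
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in s_tilde]

def shrink_l2(s_tilde, lam, eta=math.inf):
    # Case (ii): scale s~ by max(1 - lam/||s~||_2, 0); if that point is
    # infeasible, Lemma 9 puts the solution on the sphere ||s||_2 = eta.
    norm = math.sqrt(sum(v * v for v in s_tilde))
    scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
    if scale * norm <= eta:
        return [scale * v for v in s_tilde]
    return [eta * v / norm for v in s_tilde]

def clip_linf(s_tilde, eta):
    # Final step of case (iii) when ||s*||_inf > eta:
    # s_bar = sign(s~) . min(|s~|, eta).
    return [math.copysign(min(abs(v), eta), v) for v in s_tilde]

# Small numerical sanity checks of the closed forms.
assert shrink_l1([3.0, -0.5, 1.0], 1.0) == [2.0, 0.0, 0.0]
s = shrink_l2([3.0, 4.0], 1.0, eta=2.0)      # ||s~||_2 = 5, infeasible s*
assert abs(math.hypot(*s) - 2.0) < 1e-12     # lands on the eta-sphere
assert clip_linf([3.0, -0.2], 1.0) == [1.0, -0.2]
```

Each routine runs in \(\mathcal{O }(p)\) time, matching the complexities stated in the proof (the \(\mathcal{O }(p\log (p))\) terms come from the constrained \(\ell _1\) and \(\ell _{\infty }\) subproblems not shown here).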
Aybat, N.S., Iyengar, G. A unified approach for minimizing composite norms. Math. Program. 144, 181–226 (2014). https://doi.org/10.1007/s10107-012-0622-z
Keywords
- Norm minimization
- Convex optimization
- Conic constraints
- Augmented Lagrangian method
- First order method
- Iteration complexity
- \(\ell _1\)-Minimization
- Nuclear norm
- Basis pursuit
- Principal component pursuit
- Sparse optimization