
A unified approach for minimizing composite norms

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

We propose a first-order augmented Lagrangian algorithm (FALC) to solve the composite norm minimization problem

$$\begin{aligned} \begin{array}{ll} \min \limits _{X\in \mathbb{R}^{m\times n}}&\mu _1\Vert \sigma (\mathcal{F}(X)-G)\Vert _\alpha +\mu _2\Vert \mathcal{C}(X)-d\Vert _\beta ,\\ \text{subject to}&\mathcal{A}(X)-b\in \mathcal{Q}, \end{array} \end{aligned}$$

where \(\sigma (X)\) denotes the vector of singular values of \(X \in \mathbb{R }^{m\times n}\); the matrix norm \(\Vert \sigma (X)\Vert _{\alpha }\) denotes either the Frobenius, the nuclear, or the \(\ell _2\)-operator norm of \(X\); the vector norm \(\Vert .\Vert _{\beta }\) denotes either the \(\ell _1\)-norm, the \(\ell _2\)-norm, or the \(\ell _{\infty }\)-norm; \(\mathcal{Q }\) is a closed convex set; and \(\mathcal{A }(.)\), \(\mathcal{C }(.)\), \(\mathcal{F }(.)\) are linear operators from \(\mathbb{R }^{m\times n}\) to vector spaces of appropriate dimensions. Basis pursuit, matrix completion, robust principal component pursuit (PCP), and stable PCP problems are all special cases of the composite norm minimization problem; thus, FALC is able to solve all of these problems in a unified manner. We show that any limit point of the FALC iterate sequence is an optimal solution of the composite norm minimization problem. We also show that, for all \(\epsilon >0\), the FALC iterates are \(\epsilon \)-feasible and \(\epsilon \)-optimal after \(\mathcal{O }(\log (\epsilon ^{-1}))\) iterations, which require \(\mathcal{O }(\epsilon ^{-1})\) constrained shrinkage operations and Euclidean projections onto the set \(\mathcal{Q }\). Surprisingly, on the problem sets we tested, FALC required only \(\mathcal{O }(\log (\epsilon ^{-1}))\) constrained shrinkage operations, instead of the \(\mathcal{O }(\epsilon ^{-1})\) worst-case bound, to compute an \(\epsilon \)-feasible and \(\epsilon \)-optimal solution. To the best of our knowledge, FALC is the first algorithm with a known complexity bound that solves the stable PCP problem.
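
For concreteness, two of these special cases can be obtained from the template above with, for instance, the following choices of operators and parameters (an illustrative mapping, not necessarily the parameterization used in the body of the paper):

$$\begin{aligned} \text{basis pursuit } \Big (\min _{x}\Vert x\Vert _1 \text{ s.t. } Ax=b\Big ):&\quad \mu _1=0,\ \mu _2=1,\ \mathcal{C }(x)=x,\ d=0,\ \beta =1,\ \mathcal{A }(x)=Ax,\ \mathcal{Q }=\{0\};\\ \text{matrix completion } \Big (\min _{X}\Vert \sigma (X)\Vert _1 \text{ s.t. } P_{\Omega }(X)=P_{\Omega }(M)\Big ):&\quad \mu _2=0,\ \mu _1=1,\ \mathcal{F }(X)=X,\ G=0,\ \alpha =1\ (\text{nuclear}),\ \mathcal{A }=P_{\Omega },\ b=P_{\Omega }(M),\ \mathcal{Q }=\{0\}. \end{aligned}$$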


References

  1. Aybat, N.S., Chakraborty, A.: Fast reconstruction of CT images from parsimonious angular measurements via compressed sensing. Technical report, Siemens Corporate Research (2009)

  2. Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)


  3. Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)


  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)


  5. Becker, S., Bobin, J., Candès, E.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4, 1–39 (2011)


  6. Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2008)


  7. Candès, E., Romberg, J.: Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math. 6, 227–254 (2006)


  8. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489–509 (2006)


  9. Candès, E., Tao, T.: Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2006)


  10. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? (2009). Submitted for publication

  11. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2008)


  12. d’Aspremont, A., Bach, F.R., Ghaoui, L.E.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9, 1269–1294 (2008)


  13. d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49, 434–448 (2007)


  14. Daubechies, I., Fornasier, M., Loris, I.: Accelerated projected gradient method for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl. 14, 764–792 (2008)


  15. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)


  16. El Ghaoui, L., Gahinet, P.: Rank minimization under LMI constraints: a framework for output feedback problems. In: Proceedings of the European Control Conference (1993)

  17. Fazel, M., Hindi, H., Boyd, S.: Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In: Proceedings of the American control conference, Denver, Colorado (2003)

  18. Fazel, M., Hindi, H., Boyd, S.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American control conference, pp. 2156–2162 (2003)

  19. Fazel, M., Hindi, H., Boyd, S.: Rank minimization and applications in system theory. In: American control conference, pp. 3273–3278 (2004)

  20. Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications in system identification and realization (2012). Submitted for publication

  21. Figueiredo, M.A., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1, 586–597 (2007)


  22. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions (2010). ArXiv:0912.4571v2

  23. Hale, E.T., Yin, W., Zhang, Y.: A fixed-point continuation for \(\ell _1\)-regularized minimization with applications to compressed sensing. Technical report, Rice University (2007)

  24. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _1\)-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)


  25. Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)


  26. Koh, K., Kim, S.J., Boyd, S.: Solver for \(\ell _1\)-regularized least squares problems. Technical report, Stanford University (2007)

  27. Larsen, R.: Lanczos bidiagonalization with partial reorthogonalization. Technical report DAIMI PB-357, Department of Computer Science, Aarhus University (1998)

  28. Lewis, A.S.: The convex analysis of unitarily invariant matrix norms. J. Convex Anal. 2, 173–183 (1995)


  29. Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv:1009.5055v2 (2011)

  30. Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. Technical Report UILU-ENG-09-2214, UIUC (2009)

  31. Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 215–245 (1995)


  32. Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM. J. Matrix Anal. Appl. 31, 1235–1256 (2009)


  33. Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. Ser. A 128, 321–353 (2011)


  34. http://www.netflixprize.com/

  35. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)


  36. Toh, K., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems (2010). (Preprint)

  37. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)

  38. Van den Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)


  39. Wen, Z., Yin, W., Goldfarb, D., Zhang, Y.: A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation. SIAM J. Sci. Comput. (2009) (to appear)

  40. Yang, J., Zhang, Y.: Alternating direction algorithms for l1-problems in compressive sensing. Technical Report TR09-37, CAAM, Rice University (2009)

  41. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\) minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)


  42. Zhou, Z., Li, X., Wright, J., Candès, E., Ma, Y.: Stable principal component pursuit. In: Proceedings of the International Symposium on Information Theory (2010)


Author information


Corresponding author

Correspondence to N. S. Aybat.

Additional information

Research partially supported by ONR N000140310514, DOE DE-FG02-08ER25856, DOE DE-AR0000235 and NSF DMS 10-16571 grants.

Appendices

Appendix A: Proofs of technical results

1.1 Lemma 5 and proof

Lemma 5

Let \(\mathcal{Q }\subset \mathbb{R }^q\) be a nonempty closed convex set such that \(\{X\in \mathbb{R }^{m\times n}: \mathcal{A }(X)-b\in \mathcal{Q }\}\ne \emptyset \), where \(\mathcal{A }\) is surjective, and let \((X^{(k)}_{*},s^{(k)}_{*},y^{(k)}_{*})\) be an optimal solution to (15). Then, for all \(k\ge 1\),

$$\begin{aligned} \Vert y^{(k)}_{*}\Vert _2\le \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2+ 2~\min _{\tilde{y} \in \mathcal{Q }}\{\Vert \tilde{y}\Vert _2\}. \end{aligned}$$
(78)

Proof

From the first order optimality conditions for (15), we have \(y^{(k)}_{*}=\varPi _\mathcal{Q }(\mathcal{A }(X^{(k)}_{*})-b-\lambda ^{(k)}\theta _1^{(k)})\). Since Euclidean projection is nonexpansive, we have

$$\begin{aligned} \Vert y^{(k)}_{*}-\tilde{y}\Vert _2\le \Vert \mathcal{A }(X^{(k)}_{*})-b -\lambda ^{(k)}\theta _1^{(k)}-\tilde{y}\Vert _2 \quad \forall \tilde{y}\in \mathcal{Q }. \end{aligned}$$
(79)

The result now follows from the triangle inequality. \(\square \)

This result implies several simple bounds on \(\Vert y^{(k)}_{*}\Vert _2\). Since the initial iterate \(X^{(0)}\) is feasible, i.e. \(\mathcal{A }(X^{(0)}) - b \in \mathcal{Q }\), it follows that

$$\begin{aligned} \Vert y^{(k)}_{*}\Vert _2\le \eta _2^{(k)}:= \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2+ 2\Vert \mathcal{A }(X^{(0)}) - b\Vert _2.\quad \end{aligned}$$
(80)

Suppose \(0 \in \mathcal{Q }\). Then \(\Vert y^{(k)}_{*}\Vert _2\le \eta _2^{(k)}:= \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2.\) When \(\mathcal{Q }\) is bounded, i.e. \(\mathcal{Q }\subseteq \{y: \Vert y\Vert _2 \le \eta _2\}\) for some \(\eta _2<\infty \), one can set \(\eta _2^{(k)}:=\eta _2\) for all \(k\ge 1\).
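
As an illustration of the bound (78), the following minimal numerical sketch checks it on random data, assuming \(\mathcal{Q }\) is a Euclidean ball so that the projection and \(\min _{\tilde{y}\in \mathcal{Q }}\Vert \tilde{y}\Vert _2\) have closed forms; the function and variable names below are illustrative assumptions, not the paper's notation.

```python
# Numerical sanity check of (78) for Q = {y : ||y - c||_2 <= r}.
import numpy as np

rng = np.random.default_rng(0)
m, n, q = 6, 5, 7
A = rng.standard_normal((q, m * n))           # matrix representation: A(X) = A vec(X)
X_star = rng.standard_normal((m, n))
b = rng.standard_normal(q)
lam_theta1 = rng.standard_normal(q)           # plays the role of lambda^(k) * theta_1^(k)
c, r = rng.standard_normal(q), 1.5            # Q = {y : ||y - c||_2 <= r}

def proj_ball(y, c, r):
    """Euclidean projection onto the ball {y : ||y - c||_2 <= r}."""
    d = y - c
    nd = np.linalg.norm(d)
    return y if nd <= r else c + r * d / nd

# first-order optimality of (15):  y* = Pi_Q(A(X*) - b - lambda*theta_1)
y_star = proj_ball(A @ X_star.ravel() - b - lam_theta1, c, r)
min_norm_in_Q = max(np.linalg.norm(c) - r, 0.0)     # min_{y in Q} ||y||_2 for a ball
rhs = (np.linalg.svd(A, compute_uv=False)[0] * np.linalg.norm(X_star, 'fro')
       + np.linalg.norm(b + lam_theta1) + 2 * min_norm_in_Q)
assert np.linalg.norm(y_star) <= rhs + 1e-10
```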

1.2 Lemma 6 and proof

Lemma 6

Fix \(\alpha \), \(\beta \in \{1,2,\infty \}\). Let

$$\begin{aligned} P(X,s,y)=\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta ) + f(X,s,y) \end{aligned}$$

where

$$\begin{aligned} f(X,s,y) = \frac{1}{2} \Vert \mathcal{A }(X)-y-b-\lambda \theta _1\Vert _2^2+\frac{1}{2} \Vert \mathcal{C }(X)-s-d-\lambda \theta _2\Vert _2^2. \end{aligned}$$

Suppose \((\bar{X},\bar{s},\bar{y})\) is \(\epsilon \)-optimal for the problem \(\min _{X,s,y}\{P(X,s,y):~y\in \mathcal{Q }\}\), i.e.

$$\begin{aligned} 0\le P(\bar{X},\bar{s},\bar{y})- \min _{X \in \mathbb{R }^{m\times n},~s\in \mathbb{R }^p,~y\in \mathcal{Q }\subset \mathbb{R }^q}P(X,s,y) \le \epsilon . \end{aligned}$$

Then we have

$$\begin{aligned}&\Vert \mathcal{C }(\bar{X})-\bar{s}-d-\lambda \theta _2\Vert _2 \le J(\beta ^*)\mu _2\lambda +\sigma _{max}(M)\sqrt{2\epsilon },\\&\Vert \mathcal{A }^*\left(\mathcal A (\bar{X})\!-\!\bar{y}\!-\!b\!-\!\lambda \theta _1\right)\!+\mathcal C ^*\left(\mathcal C (\bar{X})\!-\!\bar{s}\!-\!d\!-\!\lambda \theta _2\right)\Vert _F \!\le \! I(\alpha ^*)\mu _1\lambda \! +\! \sigma _{max}(M)\sqrt{2\epsilon }, \end{aligned}$$

where \(M = \left({\small \begin{array}{lll} -I&\quad 0&\quad C \\ 0&\quad -I&\quad A \\ \end{array}}\right),\) \(\frac{1}{\alpha ^*}+\frac{1}{\alpha }=1\) (resp. \(\frac{1}{\beta ^*}+\frac{1}{\beta }=1\)) is the Hölder conjugate of \(\alpha \) (resp. \(\beta \)) and the functions \(I(\cdot )\) and \(J(\cdot )\) are defined in (21).

In order to prove Lemma 6, we need the following result.

Theorem 5

Let \(f:\mathbb{R }^{m\times n}\times \mathbb{R }^p \times \mathbb{R }^q \rightarrow \mathbb{R }\) denote a convex function whose gradient \(\nabla f\) is Lipschitz continuous with constant \(L\) with respect to the norm \(\Vert (X,s,y)\Vert =\sqrt{\Vert X\Vert _F^2+\Vert s\Vert _2^2+\Vert y\Vert _2^2}\).

Let \((X_*,s_*,y_*) \in \mathop {\mathrm{argmin}}_{X,s,y}\{\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta )+f(X,s,y): y\in \mathcal{Q }\}\). Suppose \((\bar{X},\bar{s},\bar{y}) \in \mathbb{R }^{m\times n} \times \mathbb{R }^p\times \mathbb{R }^q\) such that \(\bar{y}\in \mathcal{Q }\) satisfies

$$\begin{aligned}&\lambda \big (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta \big )+f(\bar{X},\bar{s},\bar{y}) \le \lambda \big (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta \big )\\&\quad +\,f(X_*,s_*,y_*)+ \epsilon \end{aligned}$$

for some \(\epsilon >0\). Then

$$\begin{aligned}&\Vert \nabla _X f(\bar{X}, \bar{s}, \bar{y})\Vert _F \le \big (\sqrt{2L\epsilon }+I(\alpha ^*) \lambda \mu _1\big ), \\&\Vert \nabla _s f(\bar{X}, \bar{s}, \bar{y})\Vert _2 \le \big (\sqrt{2L\epsilon } + J(\beta ^*)\lambda \mu _2\big ), \end{aligned}$$

where \(\frac{1}{\alpha ^*}+\frac{1}{\alpha }=1\) (resp. \(\frac{1}{\beta ^*}+\frac{1}{\beta }=1\)) is the Hölder conjugate of \(\alpha \) (resp. \(\beta \)) and the functions \(I(\cdot )\) and \(J(\cdot )\) are defined in (22).

Proof

Since \(\nabla f\) is Lipschitz continuous with constant \(L\), the triangle inequality for \(\Vert \sigma (.)\Vert _\alpha \) and \(\Vert .\Vert _\beta \) implies that for any \(X\in \mathbb{R }^{m\times n}\), \(s\in \mathbb{R }^p\) and \(y\in \mathbb{R }^q\)

$$\begin{aligned}&\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta )+f(X,s,y)\nonumber \\&\le \lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+ f(\bar{X},\bar{s},\bar{y})+\lambda (\mu _1\Vert \sigma (X-\bar{X})\Vert _\alpha +\mu _2\Vert s-\bar{s}\Vert _\beta )\nonumber \\&\quad + \left\langle \nabla _X f(\bar{X},\bar{s},\bar{y}),(X-\bar{X}) \right\rangle +\nabla _s f(\bar{X}, \bar{s},\bar{y})^T (s-\bar{s})+\nabla _y f(\bar{X}, \bar{s},\bar{y})^T (y-\bar{y})\nonumber \\&\quad +\frac{L}{2}\Vert X-\bar{X}\Vert _F^2 +\frac{L}{2}\Vert s-\bar{s}\Vert _2^2+\frac{L}{2}\Vert y-\bar{y}\Vert _2^2, \nonumber \end{aligned}$$

where \(\left\langle X,Y \right\rangle =\mathop {\mathbf{Tr}}(X^T Y)\in \mathbb{R }\) denotes the usual Euclidean inner product of \(X\in \mathbb{R }^{m\times n}\) and \(Y\in \mathbb{R }^{m\times n}\). Since \(X\), \(s\) and \(y\) are arbitrary, it follows that

$$\begin{aligned}&\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta )+f(X_*,s_*,y_*) \nonumber \\&\quad \le \lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+f(\bar{X},\bar{s},\bar{y}) \nonumber \\&\qquad +\min _{X\in \mathbb{R }^{m\times n}}\left\{ \left\langle \nabla _X f(\bar{X},\bar{s},\bar{y}),X-\bar{X} \right\rangle +\frac{L}{2}\Vert X-\bar{X}\Vert _F^2+\lambda \mu _1\Vert \sigma (X-\bar{X})\Vert _\alpha \right\} \nonumber \\&\qquad +\min _{s\in \mathbb{R }^p}\left\{ \nabla _s f(\bar{X},\bar{s},\bar{y})^T(s-\bar{s})+\frac{L}{2}\Vert s-\bar{s}\Vert _2^2+\lambda \mu _2\Vert s-\bar{s}\Vert _\beta \right\} \nonumber \\&\qquad +\min _{y\in \mathcal{Q }\subset \mathbb{R }^q}\left\{ \nabla _y f(\bar{X},\bar{s},\bar{y})^T(y-\bar{y})+\frac{L}{2}\Vert y-\bar{y}\Vert _2^2\right\} . \end{aligned}$$
(81)

The first minimization problem on the right hand side of (81) can be simplified as follows:

$$\begin{aligned}&\min _{X\in \mathbb{R }^{m\times n}}\left\{ \left\langle \nabla _X f(\bar{X},\bar{s},\bar{y}),X-\bar{X} \right\rangle +\frac{L}{2}\Vert X-\bar{X}\Vert _F^2 + \lambda \mu _1\Vert \sigma (X-\bar{X})\Vert _\alpha \right\} \nonumber \\&\quad =\max _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\min _{X\in \mathbb{R }^{m\times n}}\left\{ \frac{L}{2}\Vert X-\bar{X}\Vert _F^2+\left\langle \nabla _X f(\bar{X},\bar{s},\bar{y})+ W,~X-\bar{X} \right\rangle \right\} ,\nonumber \\ \end{aligned}$$
(82)
$$\begin{aligned}&\quad =\max _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\left\{ \frac{L}{2}\Vert X^*(W)-\bar{X}\Vert _F^2+\left\langle \nabla _X f(\bar{X},\bar{s},\bar{y})+W,~X^*(W)-\bar{X} \right\rangle \right\} , \nonumber \\&\quad =-\min _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\frac{\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})+W\Vert _F^2}{2L}, \end{aligned}$$
(83)

where \(X^*(W)=\bar{X}-\frac{\nabla _X f(\bar{X},\bar{s},\bar{y})+W}{L}\) is the minimizer of the inner minimization problem in (82).

The second minimization problem on the right hand side of (81) can be simplified as follows:

$$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \nabla _s f(\bar{X},\bar{s},\bar{y})^T(s-\bar{s})+\frac{L}{2}\Vert s-\bar{s}\Vert _2^2+\lambda \mu _2\Vert s-\bar{s}\Vert _\beta \right\} \nonumber \\&\quad =\max _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\min _{s\in \mathbb{R }^p}\left\{ \frac{L}{2}\Vert s-\bar{s}\Vert _2^2 +(\nabla _s f(\bar{X},\bar{s},\bar{y})+ u)^T(s-\bar{s})\right\} , \end{aligned}$$
(84)
$$\begin{aligned}&\quad =\max _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\left\{ \frac{L}{2}\Vert s^*(u)-\bar{s}\Vert _2^2 +(\nabla _sf(\bar{X},\bar{s},\bar{y})+u)^T(s^*(u)-\bar{s})\right\} , \nonumber \\&\quad =-\min _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\frac{\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})+u\Vert _2^2}{2L}, \end{aligned}$$
(85)

where \(s^*(u)=\bar{s}-\frac{\nabla _s f(\bar{X},\bar{s},\bar{y})+u}{L}\) is the minimizer of the inner minimization problem in (84).

Since \(\bar{y}\in \mathcal{Q }\), the following is true for the third minimization problem on the right hand side of (81).

$$\begin{aligned} \min _{y\in \mathcal{Q }\subset \mathbb{R }^q}\left\{ \nabla _y f(\bar{X},\bar{s},\bar{y})^T(y-\bar{y})+\frac{L}{2}\Vert y-\bar{y}\Vert _2^2\right\} \le 0. \end{aligned}$$
(86)

Thus, (81), (83), (85) and (86) together imply that

$$\begin{aligned}&\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha \!+\!\mu _2\Vert s_*\Vert _\beta )\!+\!f(X_*,s_*,y_*) \le \lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha \!+\!\mu _2\Vert \bar{s}\Vert _\beta )\!\!+\!\!f(\bar{X},\bar{s},\bar{y}) \\&\quad -\min _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\frac{\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})+W\Vert _F^2}{2L}\\&\quad -\min _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\frac{\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})+u\Vert _2^2}{2L}. \end{aligned}$$

Since \(\Big (\lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+f(\bar{X},\bar{s},\bar{y})\Big ) -\Big (\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta )+f(X_*,s_*,y_*)\Big )\le \epsilon \), we have that

$$\begin{aligned} \min _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\!+\!W\Vert _F^2 +\! \min _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})+u\Vert _2^2\le 2L \epsilon .\nonumber \\ \end{aligned}$$
(87)

From (21), it follows that \(\Vert W\Vert _F\le I(\alpha ^*)\Vert \sigma (W)\Vert _{\alpha ^*}\). Thus, (87) implies that

$$\begin{aligned} \min _{W:\Vert W\Vert _F\le I(\alpha ^*)\lambda \mu _1}\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})+W\Vert _F^2\le 2L \epsilon . \end{aligned}$$
(88)

Suppose \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F> I(\alpha ^*)\lambda \mu _1\). Then the optimal solution of the optimization problem in (88) is

$$\begin{aligned} W^*=-I(\alpha ^*)\lambda \mu _1 \cdot \frac{\nabla _X f(\bar{X},\bar{s},\bar{y})}{\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F}. \end{aligned}$$

Then (88) implies that \((\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F-I(\alpha ^*)\lambda \mu _1)^2\le 2L\epsilon \), i.e. \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F \le \sqrt{2L\epsilon }+I(\alpha ^*)\lambda \mu _1\). This bound holds trivially when \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F\le I(\alpha ^*)\lambda \mu _1\). Therefore, in either case,

$$\begin{aligned} \Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F\le \sqrt{2L\epsilon }+I(\alpha ^*)\lambda \mu _1. \end{aligned}$$

A similar analysis establishes that \(\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})\Vert _2\le \sqrt{2L\epsilon }+J(\beta ^*)\lambda \mu _2\). \(\square \)

Now we are ready to prove Lemma 6.

Proof

Let \(f(X,s,y)=\frac{1}{2} \Vert \mathcal{A }(X)-y-b-\lambda \theta _1\Vert _2^2+\frac{1}{2} \Vert \mathcal{C }(X)-s-d-\lambda \theta _2\Vert _2^2\) and let \(\Vert (X,s,y)\Vert =\sqrt{\Vert X\Vert _F^2+\Vert s\Vert _2^2+\Vert y\Vert _2^2}\). Then, for any \(X_1, X_2 \in \mathbb{R }^{m\times n}\), \(s_1, s_2 \in \mathbb{R }^p\) and \(y_1, y_2 \in \mathbb{R }^q\), we have

$$\begin{aligned}&\Vert \nabla f(X_1, s_1, y_1)-\nabla f(X_2, s_2, y_2)\Vert ^2\\&\quad =\left\Vert\left( \begin{array}{l} \nabla _X f(X_1,s_1,y_1)-\nabla _X f(X_2,s_2,y_2)\\ \nabla _s f(X_1,s_1,y_1)-\nabla _s f(X_2,s_2, y_2)\\ \nabla _y f(X_1,s_1,y_1)-\nabla _y f(X_2,s_2, y_2) \end{array} \right)\right\Vert^2, \\&\quad = \Vert \nabla _X f(X_1,s_1,y_1)-\nabla _X f(X_2,s_2,y_2)\Vert _F^2+\Vert \nabla _s f(X_1,s_1,y_1)\\&\quad \quad -\,\nabla _s f(X_2,s_2,y_2)\Vert _2^2\\&\quad \quad +\Vert \nabla _y f(X_1,s_1,y_1)-\nabla _y f(X_2,s_2,y_2)\Vert _2^2,\\&\quad =\Vert \mathcal{A }^*(\mathcal{A }(X_1-X_2)-y_1+y_2)+\mathcal{C }^*(\mathcal{C }(X_1-X_2)-s_1+s_2)\Vert _F^2\\&\quad \quad +\Vert \mathcal{C }(X_1-X_2)-s_1+s_2\Vert _2^2+\Vert \mathcal{A }(X_1-X_2)-y_1+y_2\Vert _2^2,\\&\quad =\Vert A^T(A\ {\mathop {\mathbf{vec}}}(X_1-X_2)-y_1+y_2)+C^T(C\ {\mathop {\mathbf{vec}}}(X_1-X_2)-s_1+s_2)\Vert _2^2\\&\quad \quad +\Vert C\ {\mathop {\mathbf{vec}}}(X_1-X_2)-s_1+s_2\Vert _2^2+\Vert A\ {\mathop {\mathbf{vec}}}(X_1-X_2)-y_1+y_2\Vert _2^2,\\&\quad =\left\Vert M^TM \left( \begin{array}{l} s_1-s_2 \\ y_1-y_2 \\ {\mathop {\mathbf{vec}}}(X_1-X_2)\\ \end{array} \right)\right\Vert^2_2. \end{aligned}$$

Hence,

$$\begin{aligned}&\Vert \nabla f(X_1, s_1, y_1)-\nabla f(X_2, s_2, y_2)\Vert \le \ \sigma _{\max }^2(M)~\left\Vert \left( \begin{array}{l} s_1-s_2 \\ y_1-y_2 \\ {\mathop {\mathbf{vec}}}(X_1-X_2)\\ \end{array} \right)\right\Vert_2, \\&\quad =\ \sigma _{\max }^2(M)~\sqrt{\Vert X_1-X_2\Vert _F^2+\Vert s_1-s_2\Vert _2^2+\Vert y_1-y_2\Vert _2^2}, \nonumber \\&\quad =\ \sigma _{\max }^2(M)~\Vert (X_1,s_1,y_1)-(X_2,s_2,y_2)\Vert , \end{aligned}$$

where \(\sigma _{\max }(M)\) is the maximum singular value of \(M\). Thus, \(f:\mathbb{R }^{m\times n}\times \mathbb{R }^p\times \mathbb{R }^q \rightarrow \mathbb{R }\) is a convex function and \(\nabla f\) is Lipschitz continuous with respect to \(\Vert .\Vert \) with Lipschitz constant \(L=\sigma _{\max }^2(M)\).

Since \((\bar{X},\bar{s},\bar{y})\) is an \(\epsilon \)-optimal solution to the problem \(\min \{P(X,s,y):X\in \mathbb{R }^{m\times n}, s\in \mathbb{R }^p, y\in \mathcal{Q }\subset \mathbb{R }^q\}\), Theorem 5 guarantees that

$$\begin{aligned} \Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F&= \Vert \mathcal{A }^*(\mathcal A (\bar{X})-\bar{y}-b-\lambda \theta _1) + \mathcal C ^*(\mathcal C (\bar{X})-\bar{s}-d-\lambda \theta _2)\Vert _F \nonumber \\&\le \sqrt{2\epsilon }~\sigma _{\max }(M)+I(\alpha ^*)\lambda \mu _1, \end{aligned}$$
(89)
$$\begin{aligned} \Vert \nabla _s f(\bar{X},\bar{s},\bar{y})\Vert _2&= \Vert \mathcal{C }(\bar{X})-\bar{s}-d-\lambda \theta _2\Vert _2 \le \sqrt{2\epsilon }~\sigma _{\max }(M)+J(\beta ^*)\lambda \mu _2.\nonumber \\ \end{aligned}$$
(90)

\(\square \)
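
As a sanity check of the Lipschitz constant \(L=\sigma _{\max }^2(M)\) derived above, the following minimal sketch verifies the gradient inequality numerically on random data; the dimensions and names are illustrative assumptions only.

```python
# Check that grad f is Lipschitz with constant sigma_max(M)^2, M = [-I 0 C; 0 -I A].
import numpy as np

rng = np.random.default_rng(1)
N, p, q = 12, 4, 5                                   # N = m*n, i.e. the size of vec(X)
A, C = rng.standard_normal((q, N)), rng.standard_normal((p, N))
b, d = rng.standard_normal(q), rng.standard_normal(p)
t1, t2 = rng.standard_normal(q), rng.standard_normal(p)   # lambda*theta_1, lambda*theta_2

def grad(x, s, y):
    """Gradient of f(X, s, y) with x = vec(X), returned as (grad_x, grad_s, grad_y)."""
    ra = A @ x - y - b - t1
    rc = C @ x - s - d - t2
    return np.concatenate([A.T @ ra + C.T @ rc, -rc, -ra])

M = np.block([[-np.eye(p), np.zeros((p, q)), C],
              [np.zeros((q, p)), -np.eye(q), A]])
L = np.linalg.svd(M, compute_uv=False)[0] ** 2       # sigma_max(M)^2

for _ in range(100):
    z1, z2 = rng.standard_normal(N + p + q), rng.standard_normal(N + p + q)
    g1 = grad(z1[:N], z1[N:N + p], z1[N + p:])
    g2 = grad(z2[:N], z2[N:N + p], z2[N + p:])
    assert np.linalg.norm(g1 - g2) <= L * np.linalg.norm(z1 - z2) + 1e-10
```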

1.3 Lemma 7 and proof

Lemma 7

Let \(\mathcal{Q }\subset \mathbb{R }^q\) be a nonempty, closed, and convex set. Then for all \(\tilde{y}\in \mathbb{R }^q\) and \(\lambda >0\), we have \(\varPi _\mathcal{Q }(\lambda \tilde{y})=\lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y})\), or equivalently, \(\varPi _\mathcal{Q }(\tilde{y})=\lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y}/\lambda )\), where \(\mathcal{Q }/\lambda = \{x: \lambda x \in \mathcal{Q }\}\).

Proof

Fix \(\tilde{y}\in \mathbb{R }^q\) and \(\lambda >0\). Then

$$\begin{aligned} \varPi _\mathcal{Q }(\lambda \tilde{y})=\mathop {\mathrm{argmin}}_{x\in \mathcal{Q }}\Vert x-\lambda \tilde{y}\Vert _2 =\lambda \mathop {\mathrm{argmin}}_{y\in \mathcal{Q }/\lambda }\Vert y-\tilde{y}\Vert _2 = \lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y}). \end{aligned}$$
(91)

\(\square \)
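
For illustration, a minimal sketch verifying the scaling identity of Lemma 7 numerically, assuming \(\mathcal{Q }\) is a Euclidean ball so that both projections have closed forms (for a ball with center \(c\) and radius \(r\), \(\mathcal{Q }/\lambda \) is again a ball with center \(c/\lambda \) and radius \(r/\lambda \)):

```python
# Numerical check of Pi_Q(lambda*y) = lambda * Pi_{Q/lambda}(y) for a ball Q.
import numpy as np

rng = np.random.default_rng(2)

def proj_ball(y, c, r):
    """Euclidean projection onto {x : ||x - c||_2 <= r}."""
    d = y - c
    nd = np.linalg.norm(d)
    return y if nd <= r else c + r * d / nd

q, lam = 8, 2.7
c, r = rng.standard_normal(q), 1.3
y_tilde = 3.0 * rng.standard_normal(q)

lhs = proj_ball(lam * y_tilde, c, r)                  # Pi_Q(lambda * y)
rhs = lam * proj_ball(y_tilde, c / lam, r / lam)      # lambda * Pi_{Q/lambda}(y)
assert np.allclose(lhs, rhs)
```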

1.4 Lemma 8 and proof

Lemma 8

Let \((X_{*},s_{*},y_{*})\) be an optimal solution to (13) and suppose that \(\Vert \varPi _{\mathcal{Q }}\left(y^{(k)}_p\right) -y^{(k)}\Vert _2\le \xi ^{(k)}\) for some \(k\ge 1\), where \(y^{(k)}_p:=y^{(k)}-\frac{1}{L}\nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\). Then we have

$$\begin{aligned}&-\left\langle \nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)}),~y_{*}-y^{(k)} \right\rangle \le L\xi ^{(k)}\Vert y_{*}-y^{(k)}\Vert _2\nonumber \\&\quad +\xi ^{(k)}\Vert \nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\Vert _2. \end{aligned}$$
(92)

Proof

From the definition of \(\varPi _\mathcal{Q }(.)\), we have

$$\begin{aligned}&\left\langle \varPi _\mathcal{Q }(y^{(k)}_p)-y^{(k)}_p,~y-\varPi _\mathcal{Q }(y^{(k)}_p) \right\rangle \ge 0, \quad \forall ~y\in \mathcal{Q },\nonumber \\&\quad \Rightarrow \left\langle \varPi _\mathcal{Q }(y^{(k)}_p)-y^{(k)},~y-y^{(k)} \right\rangle +\left\langle \varPi _\mathcal{Q }(y^{(k)}_p)-y^{(k)},~y^{(k)}-\varPi _\mathcal{Q }(y^{(k)}_p) \right\rangle \nonumber \\&\qquad \quad +\,\left\langle y^{(k)}-y^{(k)}_p,~y-y^{(k)} \right\rangle +\left\langle y^{(k)}-y^{(k)}_p,~y^{(k)}-\varPi _\mathcal{Q }(y^{(k)}_p) \right\rangle \ge 0, \quad \forall ~y\in \mathcal{Q }.\nonumber \\ \end{aligned}$$
(93)

Since \(y_{*}\in \mathcal{Q }\), \(y^{(k)}-y^{(k)}_p=\frac{1}{L}\nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\) and \(\Vert \varPi _{\mathcal{Q }}\left(y^{(k)}_p\right)-y^{(k)}\Vert _2\le \xi ^{(k)}\), (92) follows from (93). \(\square \)
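
Since (92) uses only \(y_{*}\in \mathcal{Q }\) and the projection inequality (93), it can be checked numerically for generic data. A minimal sketch, assuming \(\mathcal{Q }\) is a Euclidean ball and treating the gradient \(\nabla _y f^{(k)}\) as an arbitrary vector (an assumption made purely for illustration):

```python
# Numerical check of inequality (92) for Q a Euclidean ball.
import numpy as np

rng = np.random.default_rng(3)

def proj_ball(y, c, r):
    d = y - c
    nd = np.linalg.norm(d)
    return y if nd <= r else c + r * d / nd

q, L = 9, 4.0
c, r = rng.standard_normal(q), 1.0
y_k = rng.standard_normal(q)
g = rng.standard_normal(q)                 # stands in for grad_y f^(k)(X^(k), s^(k), y^(k))
y_p = y_k - g / L
xi = np.linalg.norm(proj_ball(y_p, c, r) - y_k)

for _ in range(200):
    u = rng.standard_normal(q)
    y_star = c + rng.uniform() * r * u / np.linalg.norm(u)   # any point of Q works as y_*
    lhs = -g @ (y_star - y_k)
    rhs = L * xi * np.linalg.norm(y_star - y_k) + xi * np.linalg.norm(g)
    assert lhs <= rhs + 1e-10
```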

Appendix B: Auxiliary results for simple optimization problems

Lemma 9

Let \((\mathcal{E },\Vert .\Vert )\) be a normed vector space, \(f:\mathcal{E }\rightarrow \mathbb{R }\) be a strictly convex function and \(\chi \subset \mathcal{E }\) be a closed, convex set with a non-empty interior. Let \(\bar{x}=\mathop {\mathrm{argmin}}_{x\in \chi }f(x)\) and \(x^*=\mathop {\mathrm{argmin}}_{x\in \mathcal{E }}f(x)\). If \(x^*\not \in \chi \), then \(\bar{x}\in \mathop {\mathbf{bd}}\chi \), where \(\mathop {\mathbf{bd}}\chi \) denotes the boundary of \(\chi \).

Proof

We will establish the result by contradiction. Assume \(\bar{x}\) is in the interior of \(\chi \), i.e. \(\bar{x}\in \mathop {\mathbf{int}}(\chi )\). Then \(\exists \;\epsilon >0\) such that \(B(\bar{x},\epsilon )=\{x\in \mathcal{E }\;:\;\Vert x-\bar{x}\Vert <\epsilon \}\subset \chi \). Since \(f\) is strictly convex and \(x^*\ne \bar{x}\), \(f(x^*)<f(\bar{x})\). Choose \(0<\lambda <\frac{\epsilon }{\Vert \bar{x}-x^*\Vert }<1\) so that \(\lambda x^* +(1-\lambda ) \bar{x} \in B(\bar{x},\epsilon ) \subset \chi \). Since \(f\) is strictly convex,

$$\begin{aligned} f(\lambda x^* + (1-\lambda ) \bar{x})<\lambda f(x^*) + (1-\lambda ) f(\bar{x}) < f(\bar{x}). \end{aligned}$$
(94)

However, \(\lambda x^* + (1-\lambda ) \bar{x} \in B(\bar{x},\epsilon )\subset \chi \) and \(f(\lambda x^* + (1-\lambda ) \bar{x})<f(\bar{x})\) contradicts the fact that \(f(\bar{x})\le f(x)\) for all \(x\in \chi \). Therefore, \(\bar{x} \not \in \mathop {\mathbf{int}}(\chi )\). Since \(\bar{x}\in \chi \), it follows that \(\bar{x}\in \mathop {\mathbf{bd}}\chi \). \(\square \)

Next, we collect together complexity results for optimization problems of the form

$$\begin{aligned}&\min _{X\in \mathbb{R }^{m\times n}}\left\{ \lambda \Vert \sigma (X)\Vert _{\alpha }+ \frac{1}{2}\Vert X-\tilde{X}\Vert _F^2: \Vert \sigma (X)\Vert _{\alpha } \le \eta \right\} ,\\&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _{\beta } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _{\beta } \le \eta \right\} \end{aligned}$$

that need to be solved in each update step of Algorithm APG, displayed in Fig. 1.

Lemma 10

Let \(\bar{X} = \mathop {\mathrm{argmin}}_{X \in \mathbb{R }^{m\times n}}\big \{\lambda \Vert \sigma (X)\Vert _{\alpha } + \frac{1}{2}\Vert X-\tilde{X}\Vert _F^2: \Vert \sigma (X)\Vert _{\alpha } \le \eta \big \}\) denote the optimal solution of the constrained matrix shrinkage problem. Then

$$\begin{aligned} \bar{X} = U \mathop {\mathbf{diag}}(\bar{s}) V^T, \end{aligned}$$

where \(U\mathop {\mathbf{diag}}(\sigma )V^T\) denotes the SVD of \(\tilde{X}\) such that \(\sigma \in \mathbb{R }_+^r\) and \(r=\mathop {\mathbf{rank}}(\tilde{X})\); and \(\bar{s}\) denotes the optimal solution of the constrained vector shrinkage problem

$$\begin{aligned} \min _{s \in \mathbb{R }^r}\Big \{\lambda \Vert s\Vert _{\alpha } + \frac{1}{2}\Vert s-\sigma \Vert _2^2:~\Vert s\Vert _{\alpha } \le \eta \Big \}. \end{aligned}$$

Since the worst case complexity of computing the SVD of \(\tilde{X}\) is \(\mathcal{O }(\min \{n^2m,m^2n\})\), the complexity of computing \(\bar{X}\) is \(\mathcal{O }(\min \{n^2m,m^2n\} + T_v(r,\alpha ))\), where \(T_v(r,\alpha )\) denotes the complexity of computing the solution of an \(r\)-dimensional constrained vector shrinkage problem with norm \(\Vert .\Vert _{\alpha }\). The function \(T_v\) satisfies

$$\begin{aligned} T_v(p,\alpha ) = \left\{ \begin{array}{ll} \mathcal{O }(p\log (p))&\alpha = 1, \infty ,\\ \mathcal{O }(p),&\alpha = 2,\\ \end{array}\right. \end{aligned}$$
(95)

Proof

Standard results on convex optimization over matrices imply that \(\bar{X}\) is of the form \(\bar{X} = U \mathop {\mathbf{diag}}(\bar{s}) V^T\) (see Corollary 2.5 in [28]).

Now, consider the constrained vector shrinkage problem

$$\begin{aligned} \min _{s\in \mathbb{R }^p}\Big \{\lambda \Vert s\Vert _{\beta } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:~\Vert s\Vert _{\beta } \le \eta \Big \}. \end{aligned}$$
  1. (i)

    \(\beta = 1\): First consider the unconstrained case, i.e. \(\eta =\infty \). The unconstrained solution \(s^*\) has a closed form \(s^*=\text{ sign}(\tilde{s})\,\odot \,\max \{|\tilde{s}|-\lambda \mathbf{1}, \mathbf{0}\}\) and can be computed with \(\mathcal{O }(p)\) complexity, where \(\odot \) denotes componentwise multiplication and \(\mathbf{1}\) is a vector of ones. When \(\eta <\infty \), the constrained optimal solution, \(\bar{s}\), can be computed with \(\mathcal{O }(p\log (p))\) complexity. See Lemma A.4 in [1].

  2. (ii)

    \(\beta = 2\): First consider the unconstrained case, i.e. \(\eta =\infty \). Since the \(\ell _2\)-norm is self-dual, \(\lambda \Vert s\Vert _2 = \max \{u^Ts: \Vert u\Vert _2 \le \lambda \}\). Thus,

    $$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} =\min _{s\in \mathbb{R }^p}\ \max _{u:\ \Vert u\Vert _2\le \lambda } \left\{ u^T s + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _2\le \lambda }\ \min _{s \in \mathbb{R }^p}\left\{ u^Ts +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _2\le \lambda } \left\{ u^T (\tilde{s}-u)+\frac{1}{2}\Vert u\Vert _2^2\right\} , \\&\quad =\frac{1}{2}\Vert \tilde{s}\Vert _2^2-\min _{u:\ \Vert u\Vert _2\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2, \nonumber \end{aligned}$$
    (96)

    where (96) follows from the fact that \(s^*(u):=\mathop {\mathrm{argmin}}_{s\in \mathbb{R }^p} \{u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\}=\tilde{s}-u\). Define

    $$\begin{aligned} u^*:=\mathop {\mathrm{argmin}}_{u:\ \Vert u\Vert _2\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2=\tilde{s}~\min \left\{ \frac{\lambda }{\Vert \tilde{s}\Vert _2},\ 1\right\} . \end{aligned}$$

    Then the unconstrained optimal solution \(s^*=s^*(u^*)=\tilde{s}\max \left\{ 1-\frac{\lambda }{\Vert \tilde{s}\Vert _2},\ 0\right\} \) and the complexity of computing \(s^*\) is \(\mathcal{O }(p)\). Next, consider the constrained optimization problem, i.e. \(\eta <\infty \). The constrained optimum \(\bar{s}=s^*\) whenever \(s^*\) is feasible, i.e. \(\Vert s^*\Vert _2 \le \eta \). Since \(f(s):=\lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\) is strongly convex, Lemma 9 implies that \(\Vert \bar{s}\Vert _{2} = \eta \) whenever \(\Vert s^*\Vert _2 > \eta \). Thus,

    $$\begin{aligned} \min \Big \{\lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\ \Vert s\Vert _2\le \eta \Big \} = \lambda \eta + \min \Big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2^2 = \eta ^2\Big \}. \end{aligned}$$

    The unique KKT point for the optimization problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \frac{1}{2}\Vert s\Vert _2^2 = \frac{\eta ^2}{2}\big \}\) is given by \(\bar{s} = \eta \frac{\tilde{s}}{\Vert \tilde{s}\Vert _2}\), and the KKT multiplier for the constraint \(\frac{1}{2}\Vert s\Vert _2^2 = \frac{\eta ^2}{2}\) is \(\vartheta = \frac{\Vert \tilde{s}\Vert _2}{\eta } -1\). It is easy to check that \(\vartheta > 0\) whenever \(\Vert s^*\Vert _2 > \eta \). Thus, \(\bar{s}\) is optimal for the convex optimization problem \(\min \Big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2^2 \le \eta ^2\Big \}\) and, consequently, optimal for the equality-constrained problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2 = \eta \big \}\). Hence, the complexity of computing \(\bar{s}\) is \(\mathcal{O }(p)\).

  3. (iii)

    \(\beta = \infty \): First consider the unconstrained problem. Since the \(\ell _1\)-norm is the dual norm of the \(\ell _{\infty }\)-norm, we have that

    $$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _\infty +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} =\min _{s\in \mathbb{R }^p}\ \max _{u:\ \Vert u\Vert _1\le \lambda } \left\{ u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _1\le \lambda }\ \min _{s \in \mathbb{R }^p}\left\{ u^Ts+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _1\le \lambda } \left\{ u^T (\tilde{s}-u) + \frac{1}{2}\Vert u\Vert _2^2\right\} , \\&\quad =\frac{1}{2}\Vert \tilde{s}\Vert _2^2-\min _{u:\ \Vert u\Vert _1\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2, \nonumber \end{aligned}$$
    (97)

    where (97) follows from the fact that \(s^*(u):=\mathop {\mathrm{argmin}}_{s\in \mathbb{R }^p}\{u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\}=\tilde{s}-u\). The result in (i) implies that the complexity of computing \(u^*=\mathop {\mathrm{argmin}}_{u:\ \Vert u\Vert _1\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2\) is \(\mathcal{O }(p\log (p))\). Thus, the unconstrained optimal solution \(s^* = s^*(u^*) = \tilde{s} - u^*\) can be computed in \(\mathcal{O }(p\log (p))\) operations. Next, consider the constrained optimization problem. The constrained optimum \(\bar{s}=s^*\) whenever \(s^*\) is feasible, i.e. \(\Vert s^*\Vert _{\infty } \le \eta \). Since \(f(s) = \lambda \Vert s\Vert _{\infty } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\) is strictly convex, Lemma 9 implies that \(\Vert \bar{s}\Vert _{\infty } = \eta \) whenever \(\Vert s^*\Vert _{\infty } > \eta \). Therefore,

    $$\begin{aligned} \min \left\{ \lambda \Vert s\Vert _\infty +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty \le \eta \right\} = \lambda \eta + \min \left\{ \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty =\eta \right\} . \end{aligned}$$

    Then, it is easy to check that \(\text{ sign}(\bar{s}_i) = \text{ sign}(\tilde{s}_i)\) for all \(i = 1, \ldots , p\). Moreover, \(\Vert s^*\Vert _\infty >\eta \) implies that \(\Vert \tilde{s}\Vert _\infty >\eta \). These two facts imply that

    $$\begin{aligned} \min \left\{ \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty =\eta \right\} = \min \left\{ \frac{1}{2}\Vert s-\left|\tilde{s}\right|\Vert _2^2: 0 \le s_i \le \eta \right\} . \end{aligned}$$

    For \(1\le i\le p\), we have \(\min \{\left|\tilde{s}_i\right|,\eta \}=\mathop {\mathrm{argmin}}_{s_i\in \mathbb{R }}\big \{\frac{1}{2}(s_i - \left|\tilde{s}_i\right|)^2: 0 \le s_i \le \eta \big \}\). Thus, it follows that \(\bar{s}= \text{ sign}(\tilde{s}) \odot \min \{|\tilde{s}|, \eta \mathbf{1}\}\). Hence, the complexity of computing \(\bar{s}\) is \(\mathcal{O }(p\log (p))\) (a numerical sketch of these shrinkage routines follows the proof below).

\(\square \)
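
The closed forms derived in items (i)–(iii) and the SVD reduction of Lemma 10 can be collected into a short routine. The following is a minimal numpy sketch under the stated closed forms; the helper names are illustrative, and the constrained \(\beta =1\) case is handled here by composing soft-thresholding with an \(\ell _1\)-ball projection (one valid alternative to the \(\mathcal{O }(p\log (p))\) routine of Lemma A.4 in [1]).

```python
# Sketch of the constrained vector/matrix shrinkage subproblems (Lemma 10, items (i)-(iii)).
import numpy as np

def proj_l1_ball(v, radius):
    """Euclidean projection onto {u : ||u||_1 <= radius} (sort-based, O(p log p))."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, v.size + 1) > css - radius)[0][-1]
    tau = (css[k] - radius) / (k + 1)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def shrink_l1(s, lam, eta=np.inf):
    """min lam*||s||_1 + 0.5*||s - s_tilde||_2^2  s.t. ||s||_1 <= eta   (item (i))."""
    out = np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)        # soft-thresholding
    return out if np.abs(out).sum() <= eta else proj_l1_ball(out, eta)

def shrink_l2(s, lam, eta=np.inf):
    """min lam*||s||_2 + 0.5*||s - s_tilde||_2^2  s.t. ||s||_2 <= eta   (item (ii))."""
    ns = np.linalg.norm(s)
    out = s * max(1.0 - lam / ns, 0.0) if ns > 0 else np.zeros_like(s)
    return out if np.linalg.norm(out) <= eta else eta * s / ns

def shrink_linf(s, lam, eta=np.inf):
    """min lam*||s||_inf + 0.5*||s - s_tilde||_2^2  s.t. ||s||_inf <= eta   (item (iii))."""
    out = s - proj_l1_ball(s, lam)                             # dual l1-ball projection
    return out if np.abs(out).max() <= eta else np.sign(s) * np.minimum(np.abs(s), eta)

def shrink_matrix(X, lam, eta=np.inf, alpha='nuclear'):
    """Constrained matrix shrinkage of Lemma 10: shrink the singular values of X."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    vec_shrink = {'nuclear': shrink_l1, 'frobenius': shrink_l2, 'operator': shrink_linf}[alpha]
    return U @ np.diag(vec_shrink(sig, lam, eta)) @ Vt
```

For instance, under these assumptions, `shrink_matrix(X, lam, eta, 'operator')` would return the solution of the matrix subproblem when \(\Vert \sigma (\cdot )\Vert _{\alpha }\) is the \(\ell _2\)-operator norm.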

About this article

Aybat, N.S., Iyengar, G. A unified approach for minimizing composite norms. Math. Program. 144, 181–226 (2014). https://doi.org/10.1007/s10107-012-0622-z