
A unified approach for minimizing composite norms

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

We propose a first-order augmented Lagrangian algorithm (FALC) to solve the composite norm minimization problem

$$\begin{aligned} \begin{array}{ll} \min \limits _{X\in \mathbb{R}^{m\times n}}&\mu _1\Vert \sigma (\mathcal{F}(X)-G)\Vert _\alpha +\mu _2\Vert \mathcal{C}(X)-d\Vert _\beta ,\\ \text{subject to}&\mathcal{A}(X)-b\in \mathcal{Q}, \end{array} \end{aligned}$$

where \(\sigma (X)\) denotes the vector of singular values of \(X \in \mathbb{R }^{m\times n}\); the matrix norm \(\Vert \sigma (X)\Vert _{\alpha }\) denotes either the Frobenius, the nuclear, or the \(\ell _2\)-operator norm of \(X\); the vector norm \(\Vert .\Vert _{\beta }\) denotes either the \(\ell _1\)-norm, the \(\ell _2\)-norm, or the \(\ell _{\infty }\)-norm; \(\mathcal{Q }\) is a closed convex set; and \(\mathcal{A }(.)\), \(\mathcal{C }(.)\), \(\mathcal{F }(.)\) are linear operators from \(\mathbb{R }^{m\times n}\) to vector spaces of appropriate dimensions. Basis pursuit, matrix completion, robust principal component pursuit (PCP), and stable PCP problems are all special cases of the composite norm minimization problem; thus, FALC is able to solve all of these problems in a unified manner. We show that any limit point of the FALC iterate sequence is an optimal solution of the composite norm minimization problem. We also show that, for all \(\epsilon >0\), the FALC iterates are \(\epsilon \)-feasible and \(\epsilon \)-optimal after \(\mathcal{O }(\log (\epsilon ^{-1}))\) iterations, which require \(\mathcal{O }(\epsilon ^{-1})\) constrained shrinkage operations and Euclidean projections onto the set \(\mathcal{Q }\). Surprisingly, on the problem sets we tested, FALC required only \(\mathcal{O }(\log (\epsilon ^{-1}))\) constrained shrinkage operations, instead of the \(\mathcal{O }(\epsilon ^{-1})\) worst-case bound, to compute an \(\epsilon \)-feasible and \(\epsilon \)-optimal solution. To the best of our knowledge, FALC is the first algorithm with a known complexity bound that solves the stable PCP problem.
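
For concreteness, two of these special cases can be obtained from the template above with, for instance, the following choices of operators and parameters (an illustrative mapping, not necessarily the parameterization used in the body of the paper):

$$\begin{aligned} \text{basis pursuit } \Big (\min _{x}\Vert x\Vert _1 \text{ s.t. } Ax=b\Big ):&\quad \mu _1=0,\ \mu _2=1,\ \mathcal{C }(x)=x,\ d=0,\ \beta =1,\ \mathcal{A }(x)=Ax,\ \mathcal{Q }=\{0\};\\ \text{matrix completion } \Big (\min _{X}\Vert \sigma (X)\Vert _1 \text{ s.t. } P_{\Omega }(X)=P_{\Omega }(M)\Big ):&\quad \mu _2=0,\ \mu _1=1,\ \mathcal{F }(X)=X,\ G=0,\ \alpha =1\ (\text{nuclear}),\ \mathcal{A }=P_{\Omega },\ b=P_{\Omega }(M),\ \mathcal{Q }=\{0\}. \end{aligned}$$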


References

  1. Aybat, N.S., Chakraborty, A.: Fast reconstruction of CT images from parsimonious angular measurements via compressed sensing. Technical report, Siemens Corporate Research (2009)

  2. Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)


  3. Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)


  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)


  5. Becker, S., Bobin, J., Candès, E.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4, 1–39 (2011)


  6. Cai, J., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2008)


  7. Candès, E., Romberg, J.: Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math. 6, 227–254 (2006)


  8. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489–509 (2006)


  9. Candès, E., Tao, T.: Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2006)


  10. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? (2009). Submitted for publication

  11. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2008)


  12. d’Aspremont, A., Bach, F.R., Ghaoui, L.E.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9, 1269–1294 (2008)


  13. d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49, 434–448 (2007)


  14. Daubechies, I., Fornasier, M., Loris, I.: Accelerated projected gradient method for linear inverse problems with sparsity constraints. J. Fourier Anal. Appl. 14, 764–792 (2008)


  15. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)


  16. El Ghaoui, L., Gahinet, P.: Rank minimization under LMI constraints: a framework for output feedback problems. In: Proceedings of the European Control Conference (1993)

  17. Fazel, M., Hindi, H., Boyd, S.: Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In: Proceedings of the American control conference, Denver, Colorado (2003)

  18. Fazel, M., Hindi, H., Boyd, S.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American control conference, pp. 2156–2162 (2003)

  19. Fazel, M., Hindi, H., Boyd, S.: Rank minimization and applications in system theory. In: American control conference, pp. 3273–3278 (2004)

  20. Fazel, M., Pong, T.K., Sun, D., Tseng, P.: Hankel matrix rank minimization with applications in system identification and realization (2012). Submitted for publication

  21. Figueiredo, M.A., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1, 586–597 (2007)


  22. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions (2010). ArXiv:0912.4571v2

  23. Hale, E.T., Yin, W., Zhang, Y.: A fixed-point continuation for \(\ell _1\)-regularized minimization with applications to compressed sensing. Technical report, Rice University (2007)

  24. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _1\)-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)


  25. Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)


  26. Koh, K., Kim, S.J., Boyd, S.: Solver for \(\ell _1\)-regularized least squares problems. Technical report, Stanford University (2007)

  27. Larsen, R.: Lanczos bidiagonalization with partial reorthogonalization. Technical report DAIMI PB-357, Department of Computer Science, Aarhus University (1998)

  28. Lewis, A.S.: The convex analysis of unitarily invariant matrix norms. J. Convex Anal. 2, 173–183 (1995)


  29. Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv:1009.5055v2 (2011)

  30. Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. Technical Report UILU-ENG-09-2214, UIUC (2009)

  31. Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 215–245 (1995)


  32. Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM. J. Matrix Anal. Appl. 31, 1235–1256 (2009)


  33. Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. Ser. A 128, 321–353 (2011)


  34. http://www.netflixprize.com/

  35. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)


  36. Toh, K., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems (2010). (Preprint)

  37. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)

  38. Van den Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)


  39. Wen, Z., Yin, W., Goldfarb, D., Zhang, Y.: A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation. SIAM J. Sci. Comput. (2009) (to appear)

  40. Yang, J., Zhang, Y.: Alternating direction algorithms for l1-problems in compressive sensing. Technical Report TR09-37, CAAM, Rice University (2009)

  41. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\) minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)


  42. Zhou, Z., Li, X., Wright, J., Candès, E., Ma, Y.: Stable principal component pursuit. In: Proceedings of the International Symposium on Information Theory (2010)


Author information


Corresponding author

Correspondence to N. S. Aybat.

Additional information

Research partially supported by ONR N000140310514, DOE DE-FG02-08ER25856, DOE DE-AR0000235 and NSF DMS 10-16571 grants.

Appendices

Appendix A: Proofs of technical results

1.1 Lemma 5 and proof

Lemma 5

Let \(\mathcal{Q }\subset \mathbb{R }^q\) be a nonempty closed convex set such that \(\{X\in \mathbb{R }^{m\times n}: \mathcal{A }(X)-b\in \mathcal{Q }\}\ne \emptyset \), where \(\mathcal{A }\) is surjective, and let \((X^{(k)}_{*},s^{(k)}_{*},y^{(k)}_{*})\) be an optimal solution to (15). Then, for all \(k\ge 1\),

$$\begin{aligned} \Vert y^{(k)}_{*}\Vert _2\le \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2+ 2~\min _{\tilde{y} \in \mathcal{Q }}\{\Vert \tilde{y}\Vert _2\}. \end{aligned}$$
(78)

Proof

From the first order optimality conditions for (15), we have \(y^{(k)}_{*}=\varPi _\mathcal{Q }(\mathcal{A }(X^{(k)}_{*})-b-\lambda ^{(k)}\theta _1^{(k)})\). Since Euclidean projection is nonexpansive, we have

$$\begin{aligned} \Vert y^{(k)}_{*}-\tilde{y}\Vert _2\le \Vert \mathcal{A }(X^{(k)}_{*})-b -\lambda ^{(k)}\theta _1^{(k)}-\tilde{y}\Vert _2 \quad \forall \tilde{y}\in \mathcal{Q }. \end{aligned}$$
(79)

The result now follows from the triangle inequality. \(\square \)

This result implies several simple bounds on \(\Vert y^{(k)}_{*}\Vert _2\). Since the initial iterate \(X^{(0)}\) is feasible, i.e. \(\mathcal{A }(X^{(0)}) - b \in \mathcal{Q }\), it follows that

$$\begin{aligned} \Vert y^{(k)}_{*}\Vert _2\le \eta _2^{(k)}:= \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2+ 2\Vert \mathcal{A }(X^{(0)}) - b\Vert _2.\quad \end{aligned}$$
(80)

Suppose \(0 \in \mathcal{Q }\). Then \(\Vert y^{(k)}_{*}\Vert _2\le \eta _2^{(k)}:= \sigma _{\max }(A)\Vert X^{(k)}_{*}\Vert _F +\Vert b+\lambda ^{(k)}\theta _1^{(k)}\Vert _2.\) When \(\mathcal{Q }\) is bounded, i.e. \(\mathcal{Q }\subseteq \{y: \Vert y\Vert _2 \le \eta _2\}\) for some \(\eta _2<\infty \), one can set \(\eta _2^{(k)}:=\eta _2\) for all \(k\ge 1\).
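
As an illustration of the bound (78), the following minimal numerical sketch checks it on random data, assuming \(\mathcal{Q }\) is a Euclidean ball so that the projection and \(\min _{\tilde{y}\in \mathcal{Q }}\Vert \tilde{y}\Vert _2\) have closed forms; the function and variable names below are illustrative assumptions, not the paper's notation.

```python
# Numerical sanity check of (78) for Q = {y : ||y - c||_2 <= r}.
import numpy as np

rng = np.random.default_rng(0)
m, n, q = 6, 5, 7
A = rng.standard_normal((q, m * n))           # matrix representation: A(X) = A vec(X)
X_star = rng.standard_normal((m, n))
b = rng.standard_normal(q)
lam_theta1 = rng.standard_normal(q)           # plays the role of lambda^(k) * theta_1^(k)
c, r = rng.standard_normal(q), 1.5            # Q = {y : ||y - c||_2 <= r}

def proj_ball(y, c, r):
    """Euclidean projection onto the ball {y : ||y - c||_2 <= r}."""
    d = y - c
    nd = np.linalg.norm(d)
    return y if nd <= r else c + r * d / nd

# first-order optimality of (15):  y* = Pi_Q(A(X*) - b - lambda*theta_1)
y_star = proj_ball(A @ X_star.ravel() - b - lam_theta1, c, r)
min_norm_in_Q = max(np.linalg.norm(c) - r, 0.0)     # min_{y in Q} ||y||_2 for a ball
rhs = (np.linalg.svd(A, compute_uv=False)[0] * np.linalg.norm(X_star, 'fro')
       + np.linalg.norm(b + lam_theta1) + 2 * min_norm_in_Q)
assert np.linalg.norm(y_star) <= rhs + 1e-10
```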

1.2 Lemma 6 and proof

Lemma 6

Fix \(\alpha \), \(\beta \in \{1,2,\infty \}\). Let

$$\begin{aligned} P(X,s,y)=\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta ) + f(X,s,y) \end{aligned}$$

where

$$\begin{aligned} f(X,s,y) = \frac{1}{2} \Vert \mathcal{A }(X)-y-b-\lambda \theta _1\Vert _2^2+\frac{1}{2} \Vert \mathcal{C }(X)-s-d-\lambda \theta _2\Vert _2^2. \end{aligned}$$

Suppose \((\bar{X},\bar{s},\bar{y})\) is \(\epsilon \)-optimal for the problem \(\min _{X,s,y}\{P(X,s,y):~y\in \mathcal{Q }\}\), i.e.

$$\begin{aligned} 0\le P(\bar{X},\bar{s},\bar{y})- \min _{X \in \mathbb{R }^{m\times n},~s\in \mathbb{R }^p,~y\in \mathcal{Q }\subset \mathbb{R }^q}P(X,s,y) \le \epsilon . \end{aligned}$$

Then we have

$$\begin{aligned}&\Vert \mathcal{C }(\bar{X})-\bar{s}-d-\lambda \theta _2\Vert _2 \le J(\beta ^*)\mu _2\lambda +\sigma _{max}(M)\sqrt{2\epsilon },\\&\Vert \mathcal{A }^*\left(\mathcal A (\bar{X})\!-\!\bar{y}\!-\!b\!-\!\lambda \theta _1\right)\!+\mathcal C ^*\left(\mathcal C (\bar{X})\!-\!\bar{s}\!-\!d\!-\!\lambda \theta _2\right)\Vert _F \!\le \! I(\alpha ^*)\mu _1\lambda \! +\! \sigma _{max}(M)\sqrt{2\epsilon }, \end{aligned}$$

where \(M = \left({\small \begin{array}{lll} -I&\quad 0&\quad C \\ 0&\quad -I&\quad A \\ \end{array}}\right),\) \(\frac{1}{\alpha ^*}+\frac{1}{\alpha }=1\) (resp. \(\frac{1}{\beta ^*}+\frac{1}{\beta }=1\)) is the Hölder conjugate of \(\alpha \) (resp. \(\beta \)) and the functions \(I(\cdot )\) and \(J(\cdot )\) are defined in (21).

In order to prove Lemma 6, we need the following result.

Theorem 5

Let \(f:\mathbb{R }^{m\times n}\times \mathbb{R }^p \times \mathbb{R }^q \rightarrow \mathbb{R }\) denote a convex function whose gradient \(\nabla f\) is Lipschitz continuous with constant \(L\) with respect to the norm \(\Vert (X,s,y)\Vert =\sqrt{\Vert X\Vert _F^2+\Vert s\Vert _2^2+\Vert y\Vert _2^2}\).

Let \((X_*,s_*,y_*) \in \mathop {\mathrm{argmin}}_{X,s,y}\{\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta )+f(X,s,y): y\in \mathcal{Q }\}\). Suppose \((\bar{X},\bar{s},\bar{y}) \in \mathbb{R }^{m\times n} \times \mathbb{R }^p\times \mathbb{R }^q\) such that \(\bar{y}\in \mathcal{Q }\) satisfies

$$\begin{aligned}&\lambda \big (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta \big )+f(\bar{X},\bar{s},\bar{y}) \le \lambda \big (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta \big )\\&\quad +\,f(X_*,s_*,y_*)+ \epsilon \end{aligned}$$

for some \(\epsilon >0\). Then

$$\begin{aligned}&\Vert \nabla _X f(\bar{X}, \bar{s}, \bar{y})\Vert _F \le \big (\sqrt{2L\epsilon }+I(\alpha ^*) \lambda \mu _1\big ), \\&\Vert \nabla _s f(\bar{X}, \bar{s}, \bar{y})\Vert _2 \le \big (\sqrt{2L\epsilon } + J(\beta ^*)\lambda \mu _2\big ), \end{aligned}$$

where \(\frac{1}{\alpha ^*}+\frac{1}{\alpha }=1\) (resp. \(\frac{1}{\beta ^*}+\frac{1}{\beta }=1\)) is the Hölder conjugate of \(\alpha \) (resp. \(\beta \)) and the functions \(I(\cdot )\) and \(J(\cdot )\) are defined in (22).

Proof

Since \(\nabla f\) is Lipschitz continuous with constant \(L\), the triangle inequality for \(\Vert \sigma (.)\Vert _\alpha \) and \(\Vert .\Vert _\beta \) implies that for any \(X\in \mathbb{R }^{m\times n}\), \(s\in \mathbb{R }^p\) and \(y\in \mathbb{R }^q\)

$$\begin{aligned}&\lambda (\mu _1\Vert \sigma (X)\Vert _\alpha +\mu _2\Vert s\Vert _\beta )+f(X,s,y)\nonumber \\&\le \lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+ f(\bar{X},\bar{s},\bar{y})+\lambda (\mu _1\Vert \sigma (X-\bar{X})\Vert _\alpha +\mu _2\Vert s-\bar{s}\Vert _\beta )\nonumber \\&\quad + \left\langle \nabla _X f(\bar{X},\bar{s},\bar{y}),(X-\bar{X}) \right\rangle +\nabla _s f(\bar{X}, \bar{s},\bar{y})^T (s-\bar{s})+\nabla _y f(\bar{X}, \bar{s},\bar{y})^T (y-\bar{y})\nonumber \\&\quad +\frac{L}{2}\Vert X-\bar{X}\Vert _F^2 +\frac{L}{2}\Vert s-\bar{s}\Vert _2^2+\frac{L}{2}\Vert y-\bar{y}\Vert _2^2, \nonumber \end{aligned}$$

where \(\left\langle X,Y \right\rangle =\mathop {\mathbf{Tr}}(X^T Y)\in \mathbb{R }\) denotes the usual Euclidean inner product of \(X\in \mathbb{R }^{m\times n}\) and \(Y\in \mathbb{R }^{m\times n}\). Since \(X\), \(s\) and \(y\) are arbitrary, it follows that

$$\begin{aligned}&\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta )+f(X_*,s_*,y_*) \nonumber \\&\quad \le \lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+f(\bar{X},\bar{s},\bar{y}) \nonumber \\&\qquad +\min _{X\in \mathbb{R }^{m\times n}}\left\{ \left\langle \nabla _X f(\bar{X},\bar{s},\bar{y}),X-\bar{X} \right\rangle +\frac{L}{2}\Vert X-\bar{X}\Vert _F^2+\lambda \mu _1\Vert \sigma (X-\bar{X})\Vert _\alpha \right\} \nonumber \\&\qquad +\min _{s\in \mathbb{R }^p}\left\{ \nabla _s f(\bar{X},\bar{s},\bar{y})^T(s-\bar{s})+\frac{L}{2}\Vert s-\bar{s}\Vert _2^2+\lambda \mu _2\Vert s-\bar{s}\Vert _\beta \right\} \nonumber \\&\qquad +\min _{y\in \mathcal{Q }\subset \mathbb{R }^q}\left\{ \nabla _y f(\bar{X},\bar{s},\bar{y})^T(y-\bar{y})+\frac{L}{2}\Vert y-\bar{y}\Vert _2^2\right\} . \end{aligned}$$
(81)

The first minimization problem on the right hand side of (81) can be simplified as follows:

$$\begin{aligned}&\min _{X\in \mathbb{R }^{m\times n}}\left\{ \left\langle \nabla _X f(\bar{X},\bar{s},\bar{y}),X-\bar{X} \right\rangle +\frac{L}{2}\Vert X-\bar{X}\Vert _F^2 + \lambda \mu _1\Vert \sigma (X-\bar{X})\Vert _\alpha \right\} \nonumber \\&\quad =\max _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\min _{X\in \mathbb{R }^{m\times n}}\left\{ \frac{L}{2}\Vert X-\bar{X}\Vert _F^2+\left\langle \nabla _X f(\bar{X},\bar{s},\bar{y})+ W,~X-\bar{X} \right\rangle \right\} ,\nonumber \\ \end{aligned}$$
(82)
$$\begin{aligned}&\quad =\max _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\left\{ \frac{L}{2}\Vert X^*(W)-\bar{X}\Vert _F^2+\left\langle \nabla _X f(\bar{X},\bar{s},\bar{y})+W,~X^*(W)-\bar{X} \right\rangle \right\} , \nonumber \\&\quad =-\min _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\frac{\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})+W\Vert _F^2}{2L}, \end{aligned}$$
(83)

where \(X^*(W)=\bar{X}-\frac{\nabla _X f(\bar{X},\bar{s},\bar{y})+W}{L}\) is the minimizer of the inner minimization problem in (82).

The second minimization problem on the right hand side of (81) can be simplified as follows:

$$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \nabla _s f(\bar{X},\bar{s},\bar{y})^T(s-\bar{s})+\frac{L}{2}\Vert s-\bar{s}\Vert _2^2+\lambda \mu _2\Vert s-\bar{s}\Vert _\beta \right\} \nonumber \\&\quad =\max _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\min _{s\in \mathbb{R }^p}\left\{ \frac{L}{2}\Vert s-\bar{s}\Vert _2^2 +(\nabla _s f(\bar{X},\bar{s},\bar{y})+ u)^T(s-\bar{s})\right\} , \end{aligned}$$
(84)
$$\begin{aligned}&\quad =\max _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\left\{ \frac{L}{2}\Vert s^*(u)-\bar{s}\Vert _2^2 +(\nabla _sf(\bar{X},\bar{s},\bar{y})+u)^T(s^*(u)-\bar{s})\right\} , \nonumber \\&\quad =-\min _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\frac{\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})+u\Vert _2^2}{2L}, \end{aligned}$$
(85)

where \(s^*(u)=\bar{s}-\frac{\nabla _s f(\bar{X},\bar{s},\bar{y})+u}{L}\) is the minimizer of the inner minimization problem in (84).

Since \(\bar{y}\in \mathcal{Q }\), the following is true for the third minimization problem on the right hand side of (81).

$$\begin{aligned} \min _{y\in \mathcal{Q }\subset \mathbb{R }^q}\left\{ \nabla _y f(\bar{X},\bar{s},\bar{y})^T(y-\bar{y})+\frac{L}{2}\Vert y-\bar{y}\Vert _2^2\right\} \le 0. \end{aligned}$$
(86)

Thus, (81), (83), (85) and (86) together imply that

$$\begin{aligned}&\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha \!+\!\mu _2\Vert s_*\Vert _\beta )\!+\!f(X_*,s_*,y_*) \le \lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha \!+\!\mu _2\Vert \bar{s}\Vert _\beta )\!\!+\!\!f(\bar{X},\bar{s},\bar{y}) \\&\quad -\min _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\frac{\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})+W\Vert _F^2}{2L}\\&\quad -\min _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\frac{\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})+u\Vert _2^2}{2L}. \end{aligned}$$

Since \(\Big (\lambda (\mu _1\Vert \sigma (\bar{X})\Vert _\alpha +\mu _2\Vert \bar{s}\Vert _\beta )+f(\bar{X},\bar{s},\bar{y})\Big ) -\Big (\lambda (\mu _1\Vert \sigma (X_*)\Vert _\alpha +\mu _2\Vert s_*\Vert _\beta )+f(X_*,s_*,y_*)\Big )\le \epsilon \), we have that

$$\begin{aligned} \min _{W:\Vert \sigma (W)\Vert _{\alpha ^*}\le \lambda \mu _1}\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\!+\!W\Vert _F^2 +\! \min _{u:\Vert u\Vert _{\beta ^*}\le \lambda \mu _2}\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})+u\Vert _2^2\le 2L \epsilon .\nonumber \\ \end{aligned}$$
(87)

From (21), it follows that \(\Vert W\Vert _F\le I(\alpha ^*)\Vert \sigma (W)\Vert _{\alpha ^*}\). Thus, (87) implies that

$$\begin{aligned} \min _{W:\Vert W\Vert _F\le I(\alpha ^*)\lambda \mu _1}\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})+W\Vert _F^2\le 2L \epsilon . \end{aligned}$$
(88)

Suppose \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F> I(\alpha ^*)\lambda \mu _1\). Then the optimal solution of the optimization problem in (88) is

$$\begin{aligned} W^*=-I(\alpha ^*)\lambda \mu _1 \cdot \frac{\nabla _X f(\bar{X},\bar{s},\bar{y})}{\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F}. \end{aligned}$$

Then (88) implies that \((\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F-I(\alpha ^*)\lambda \mu _1)^2\le 2L\epsilon \), i.e. \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F \le \sqrt{2L\epsilon }+I(\alpha ^*)\lambda \mu _1\). This bound holds trivially when \(\Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F\le I(\alpha ^*)\lambda \mu _1\). Therefore, in either case,

$$\begin{aligned} \Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F\le \sqrt{2L\epsilon }+I(\alpha ^*)\lambda \mu _1. \end{aligned}$$

A similar analysis establishes that \(\Vert \nabla _s f(\bar{X},\bar{s},\bar{y})\Vert _2\le \sqrt{2L\epsilon }+J(\beta ^*)\lambda \mu _2\). \(\square \)

Now we are ready to prove Lemma 6.

Proof

Let \(f(X,s,y)=\frac{1}{2} \Vert \mathcal{A }(X)-y-b-\lambda \theta _1\Vert _2^2+\frac{1}{2} \Vert \mathcal{C }(X)-s-d-\lambda \theta _2\Vert _2^2\) and let \(\Vert (X,s,y)\Vert =\sqrt{\Vert X\Vert _F^2+\Vert s\Vert _2^2+\Vert y\Vert _2^2}\). Then, for any \(X_1, X_2 \in \mathbb{R }^{m\times n}\), \(s_1, s_2 \in \mathbb{R }^p\) and \(y_1, y_2 \in \mathbb{R }^q\), we have

$$\begin{aligned}&\Vert \nabla f(X_1, s_1, y_1)-\nabla f(X_2, s_2, y_2)\Vert ^2\\&\quad =\left\Vert\left( \begin{array}{l} \nabla _X f(X_1,s_1,y_1)-\nabla _X f(X_2,s_2,y_2)\\ \nabla _s f(X_1,s_1,y_1)-\nabla _s f(X_2,s_2, y_2)\\ \nabla _y f(X_1,s_1,y_1)-\nabla _y f(X_2,s_2, y_2) \end{array} \right)\right\Vert^2, \\&\quad = \Vert \nabla _X f(X_1,s_1,y_1)-\nabla _X f(X_2,s_2,y_2)\Vert _F^2+\Vert \nabla _s f(X_1,s_1,y_1)\\&\quad \quad -\,\nabla _s f(X_2,s_2,y_2)\Vert _2^2\\&\quad \quad +\Vert \nabla _y f(X_1,s_1,y_1)-\nabla _y f(X_2,s_2,y_2)\Vert _2^2,\\&\quad =\Vert \mathcal{A }^*(\mathcal{A }(X_1-X_2)-y_1+y_2)+\mathcal{C }^*(\mathcal{C }(X_1-X_2)-s_1+s_2)\Vert _F^2\\&\quad \quad +\Vert \mathcal{C }(X_1-X_2)-s_1+s_2\Vert _2^2+\Vert \mathcal{A }(X_1-X_2)-y_1+y_2\Vert _2^2,\\&\quad =\Vert A^T(A\ {\mathop {\mathbf{vec}}}(X_1-X_2)-y_1+y_2)+C^T(C\ {\mathop {\mathbf{vec}}}(X_1-X_2)-s_1+s_2)\Vert _2^2\\&\quad \quad +\Vert C\ {\mathop {\mathbf{vec}}}(X_1-X_2)-s_1+s_2\Vert _2^2+\Vert A\ {\mathop {\mathbf{vec}}}(X_1-X_2)-y_1+y_2\Vert _2^2,\\&\quad =\left\Vert M^TM \left( \begin{array}{l} s_1-s_2 \\ y_1-y_2 \\ {\mathop {\mathbf{vec}}}(X_1-X_2)\\ \end{array} \right)\right\Vert^2_2. \end{aligned}$$

Hence,

$$\begin{aligned}&\Vert \nabla f(X_1, s_1, y_1)-\nabla f(X_2, s_2, y_2)\Vert \le \ \sigma _{\max }^2(M)~\left\Vert \left( \begin{array}{l} s_1-s_2 \\ y_1-y_2 \\ {\mathop {\mathbf{vec}}}(X_1-X_2)\\ \end{array} \right)\right\Vert_2, \\&\quad =\ \sigma _{\max }^2(M)~\sqrt{\Vert X_1-X_2\Vert _F^2+\Vert s_1-s_2\Vert _2^2+\Vert y_1-y_2\Vert _2^2}, \nonumber \\&\quad =\ \sigma _{\max }^2(M)~\Vert (X_1,s_1,y_1)-(X_2,s_2,y_2)\Vert , \end{aligned}$$

where \(\sigma _{\max }(M)\) is the maximum singular value of \(M\). Thus, \(f:\mathbb{R }^{m\times n}\times \mathbb{R }^p\times \mathbb{R }^q \rightarrow \mathbb{R }\) is a convex function and \(\nabla f\) is Lipschitz continuous with respect to \(\Vert .\Vert \) with Lipschitz constant \(L=\sigma _{\max }^2(M)\).

Since \((\bar{X},\bar{s},\bar{y})\) is an \(\epsilon \)-optimal solution to the problem \(\min \{P(X,s,y):X\in \mathbb{R }^{m\times n}, s\in \mathbb{R }^p, y\in \mathcal{Q }\subset \mathbb{R }^q\}\), Theorem 5 guarantees that

$$\begin{aligned} \Vert \nabla _X f(\bar{X},\bar{s},\bar{y})\Vert _F&= \Vert \mathcal{A }^*(\mathcal A (\bar{X})-\bar{y}-b-\lambda \theta _1) + \mathcal C ^*(\mathcal C (\bar{X})-\bar{s}-d-\lambda \theta _2)\Vert _F \nonumber \\&\le \sqrt{2\epsilon }~\sigma _{\max }(M)+I(\alpha ^*)\lambda \mu _1, \end{aligned}$$
(89)
$$\begin{aligned} \Vert \nabla _s f(\bar{X},\bar{s},\bar{y})\Vert _2&= \Vert \mathcal{C }(\bar{X})-\bar{s}-d-\lambda \theta _2\Vert _2 \le \sqrt{2\epsilon }~\sigma _{\max }(M)+J(\beta ^*)\lambda \mu _2.\nonumber \\ \end{aligned}$$
(90)

\(\square \)
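
As a sanity check of the Lipschitz constant \(L=\sigma _{\max }^2(M)\) derived above, the following minimal sketch verifies the gradient inequality numerically on random data; the dimensions and names are illustrative assumptions only.

```python
# Check that grad f is Lipschitz with constant sigma_max(M)^2, M = [-I 0 C; 0 -I A].
import numpy as np

rng = np.random.default_rng(1)
N, p, q = 12, 4, 5                                   # N = m*n, i.e. the size of vec(X)
A, C = rng.standard_normal((q, N)), rng.standard_normal((p, N))
b, d = rng.standard_normal(q), rng.standard_normal(p)
t1, t2 = rng.standard_normal(q), rng.standard_normal(p)   # lambda*theta_1, lambda*theta_2

def grad(x, s, y):
    """Gradient of f(X, s, y) with x = vec(X), returned as (grad_x, grad_s, grad_y)."""
    ra = A @ x - y - b - t1
    rc = C @ x - s - d - t2
    return np.concatenate([A.T @ ra + C.T @ rc, -rc, -ra])

M = np.block([[-np.eye(p), np.zeros((p, q)), C],
              [np.zeros((q, p)), -np.eye(q), A]])
L = np.linalg.svd(M, compute_uv=False)[0] ** 2       # sigma_max(M)^2

for _ in range(100):
    z1, z2 = rng.standard_normal(N + p + q), rng.standard_normal(N + p + q)
    g1 = grad(z1[:N], z1[N:N + p], z1[N + p:])
    g2 = grad(z2[:N], z2[N:N + p], z2[N + p:])
    assert np.linalg.norm(g1 - g2) <= L * np.linalg.norm(z1 - z2) + 1e-10
```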

1.3 Lemma 7 and proof

Lemma 7

Let \(\mathcal{Q }\subset \mathbb{R }^q\) be a nonempty, closed, and convex set. Then for all \(\tilde{y}\in \mathbb{R }^q\) and \(\lambda >0\), we have \(\varPi _\mathcal{Q }(\lambda \tilde{y})=\lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y})\), or equivalently, \(\varPi _\mathcal{Q }(\tilde{y})=\lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y}/\lambda )\), where \(\mathcal{Q }/\lambda = \{x: \lambda x \in \mathcal{Q }\}\).

Proof

Fix \(\tilde{y}\in \mathbb{R }^q\) and \(\lambda >0\). Then

$$\begin{aligned} \varPi _\mathcal{Q }(\lambda \tilde{y})=\mathop {\mathrm{argmin}}_{x\in \mathcal{Q }}\Vert x-\lambda \tilde{y}\Vert _2 =\lambda \mathop {\mathrm{argmin}}_{y\in \mathcal{Q }/\lambda }\Vert y-\tilde{y}\Vert _2 = \lambda ~\varPi _{\mathcal{Q }/\lambda }(\tilde{y}). \end{aligned}$$
(91)

\(\square \)
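
For illustration, a minimal sketch verifying the scaling identity of Lemma 7 numerically, assuming \(\mathcal{Q }\) is a Euclidean ball so that both projections have closed forms (for a ball with center \(c\) and radius \(r\), \(\mathcal{Q }/\lambda \) is again a ball with center \(c/\lambda \) and radius \(r/\lambda \)):

```python
# Numerical check of Pi_Q(lambda*y) = lambda * Pi_{Q/lambda}(y) for a ball Q.
import numpy as np

rng = np.random.default_rng(2)

def proj_ball(y, c, r):
    """Euclidean projection onto {x : ||x - c||_2 <= r}."""
    d = y - c
    nd = np.linalg.norm(d)
    return y if nd <= r else c + r * d / nd

q, lam = 8, 2.7
c, r = rng.standard_normal(q), 1.3
y_tilde = 3.0 * rng.standard_normal(q)

lhs = proj_ball(lam * y_tilde, c, r)                  # Pi_Q(lambda * y)
rhs = lam * proj_ball(y_tilde, c / lam, r / lam)      # lambda * Pi_{Q/lambda}(y)
assert np.allclose(lhs, rhs)
```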

1.4 Lemma 8 and proof

Lemma 8

Let \((X_{*},s_{*},y_{*})\) be an optimal solution to (13) and suppose that \(\Vert \varPi _{\mathcal{Q }}\left(y^{(k)}_p\right) -y^{(k)}\Vert _2\le \xi ^{(k)}\) for some \(k\ge 1\), where \(y^{(k)}_p:=y^{(k)}-\frac{1}{L}\nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\). Then we have

$$\begin{aligned}&-\left\langle \nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)}),~y_{*}-y^{(k)} \right\rangle \le L\xi ^{(k)}\Vert y_{*}-y^{(k)}\Vert _2\nonumber \\&\quad +\xi ^{(k)}\Vert \nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\Vert _2. \end{aligned}$$
(92)

Proof

From the definition of \(\varPi _\mathcal{Q }(.)\), we have

$$\begin{aligned}&\left\langle \varPi _\mathcal{Q }(y^{(k)}_p)-y^{(k)}_p,~y-\varPi _\mathcal{Q }(y^{(k)}_p) \right\rangle \ge 0, \quad \forall ~y\in \mathcal{Q },\nonumber \\&\quad \Rightarrow \left\langle \varPi _\mathcal{Q }(y^{(k)}_p)-y^{(k)},~y-y^{(k)} \right\rangle +\left\langle \varPi _\mathcal{Q }(y^{(k)}_p)-y^{(k)},~y^{(k)}-\varPi _\mathcal{Q }(y^{(k)}_p) \right\rangle \nonumber \\&\qquad \quad +\,\left\langle y^{(k)}-y^{(k)}_p,~y-y^{(k)} \right\rangle +\left\langle y^{(k)}-y^{(k)}_p,~y^{(k)}-\varPi _\mathcal{Q }(y^{(k)}_p) \right\rangle \ge 0, \quad \forall ~y\in \mathcal{Q }.\nonumber \\ \end{aligned}$$
(93)

Since \(y_{*}\in \mathcal{Q }\), \(y^{(k)}-y^{(k)}_p=\frac{1}{L}\nabla _y f^{(k)}(X^{(k)},s^{(k)},y^{(k)})\) and \(\Vert \varPi _{\mathcal{Q }}\left(y^{(k)}_p\right)-y^{(k)}\Vert _2\le \xi ^{(k)}\), (92) follows from (93). \(\square \)
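
Since (92) uses only \(y_{*}\in \mathcal{Q }\) and the projection inequality (93), it can be checked numerically for generic data. A minimal sketch, assuming \(\mathcal{Q }\) is a Euclidean ball and treating the gradient \(\nabla _y f^{(k)}\) as an arbitrary vector (an assumption made purely for illustration):

```python
# Numerical check of inequality (92) for Q a Euclidean ball.
import numpy as np

rng = np.random.default_rng(3)

def proj_ball(y, c, r):
    d = y - c
    nd = np.linalg.norm(d)
    return y if nd <= r else c + r * d / nd

q, L = 9, 4.0
c, r = rng.standard_normal(q), 1.0
y_k = rng.standard_normal(q)
g = rng.standard_normal(q)                 # stands in for grad_y f^(k)(X^(k), s^(k), y^(k))
y_p = y_k - g / L
xi = np.linalg.norm(proj_ball(y_p, c, r) - y_k)

for _ in range(200):
    u = rng.standard_normal(q)
    y_star = c + rng.uniform() * r * u / np.linalg.norm(u)   # any point of Q works as y_*
    lhs = -g @ (y_star - y_k)
    rhs = L * xi * np.linalg.norm(y_star - y_k) + xi * np.linalg.norm(g)
    assert lhs <= rhs + 1e-10
```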

Appendix B: Auxiliary results for simple optimization problems

Lemma 9

Let \((\mathcal{E },\Vert .\Vert )\) be a normed vector space, \(f:\mathcal{E }\rightarrow \mathbb{R }\) be a strictly convex function and \(\chi \subset \mathcal{E }\) be a closed, convex set with a non-empty interior. Let \(\bar{x}=\mathop {\mathrm{argmin}}_{x\in \chi }f(x)\) and \(x^*=\mathop {\mathrm{argmin}}_{x\in \mathcal{E }}f(x)\). If \(x^*\not \in \chi \), then \(\bar{x}\in \mathop {\mathbf{bd}}\chi \), where \(\mathop {\mathbf{bd}}\chi \) denotes the boundary of \(\chi \).

Proof

We will establish the result by contradiction. Assume \(\bar{x}\) is in the interior of \(\chi \), i.e. \(\bar{x}\in \mathop {\mathbf{int}}(\chi )\). Then \(\exists \;\epsilon >0\) such that \(B(\bar{x},\epsilon )=\{x\in \mathcal{E }\;:\;\Vert x-\bar{x}\Vert <\epsilon \}\subset \chi \). Since \(f\) is strictly convex and \(x^*\ne \bar{x}\), \(f(x^*)<f(\bar{x})\). Choose \(0<\lambda <\frac{\epsilon }{\Vert \bar{x}-x^*\Vert }<1\) so that \(\lambda x^* +(1-\lambda ) \bar{x} \in B(\bar{x},\epsilon ) \subset \chi \). Since \(f\) is strictly convex,

$$\begin{aligned} f(\lambda x^* + (1-\lambda ) \bar{x})<\lambda f(x^*) + (1-\lambda ) f(\bar{x}) < f(\bar{x}). \end{aligned}$$
(94)

However, \(\lambda x^* + (1-\lambda ) \bar{x} \in B(\bar{x},\epsilon )\subset \chi \) and \(f(\lambda x^* + (1-\lambda ) \bar{x})<f(\bar{x})\) contradicts the fact that \(f(\bar{x})\le f(x)\) for all \(x\in \chi \). Therefore, \(\bar{x} \not \in \mathop {\mathbf{int}}(\chi )\). Since \(\bar{x}\in \chi \), it follows that \(\bar{x}\in \mathop {\mathbf{bd}}\chi \). \(\square \)

Next, we collect together complexity results for optimization problems of the form

$$\begin{aligned}&\min _{X\in \mathbb{R }^{m\times n}}\left\{ \lambda \Vert \sigma (X)\Vert _{\alpha }+ \frac{1}{2}\Vert X-\tilde{X}\Vert _F^2: \Vert \sigma (X)\Vert _{\alpha } \le \eta \right\} ,\\&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _{\beta } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _{\beta } \le \eta \right\} \end{aligned}$$

that need to be solved in each update step of Algorithm APG, displayed in Fig. 1.

Lemma 10

Let \(\bar{X} = \mathop {\mathrm{argmin}}_{X \in \mathbb{R }^{m\times n}}\big \{\lambda \Vert \sigma (X)\Vert _{\alpha } + \frac{1}{2}\Vert X-\tilde{X}\Vert _F^2: \Vert \sigma (X)\Vert _{\alpha } \le \eta \big \}\) denote the optimal solution of the constrained matrix shrinkage problem. Then

$$\begin{aligned} \bar{X} = U \mathop {\mathbf{diag}}(\bar{s}) V^T, \end{aligned}$$

where \(U\mathop {\mathbf{diag}}(\sigma )V^T\) denotes the SVD of \(\tilde{X}\) such that \(\sigma \in \mathbb{R }_+^r\) and \(r=\mathop {\mathbf{rank}}(\tilde{X})\); and \(\bar{s}\) denotes the optimal solution of the constrained vector shrinkage problem

$$\begin{aligned} \min _{s \in \mathbb{R }^r}\Big \{\lambda \Vert s\Vert _{\alpha } + \frac{1}{2}\Vert s-\sigma \Vert _2^2:~\Vert s\Vert _{\alpha } \le \eta \Big \}. \end{aligned}$$

Since the worst case complexity of computing the SVD of \(\tilde{X}\) is \(\mathcal{O }(\min \{n^2m,m^2n\})\), the complexity of computing \(\bar{X}\) is \(\mathcal{O }(\min \{n^2m,m^2n\} + T_v(r,\alpha ))\), where \(T_v(r,\alpha )\) denotes the complexity of computing the solution of an \(r\)-dimensional constrained vector shrinkage problem with norm \(\Vert .\Vert _{\alpha }\). The function \(T_v\) satisfies

$$\begin{aligned} T_v(p,\alpha ) = \left\{ \begin{array}{ll} \mathcal{O }(p\log (p))&\alpha = 1, \infty ,\\ \mathcal{O }(p),&\alpha = 2,\\ \end{array}\right. \end{aligned}$$
(95)

Proof

Standard results on convex optimization over matrices imply that \(\bar{X}\) is of the form \(\bar{X} = U \mathop {\mathbf{diag}}(\bar{s}) V^T\) (see Corollary 2.5 in [28]).

Now, consider the constrained vector shrinkage problem

$$\begin{aligned} \min _{s\in \mathbb{R }^p}\Big \{\lambda \Vert s\Vert _{\beta } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:~\Vert s\Vert _{\beta } \le \eta \Big \}. \end{aligned}$$
  1. (i)

    \(\beta = 1\): First consider the unconstrained case, i.e. \(\eta =\infty \). The unconstrained solution \(s^*\) has a closed form \(s^*=\text{ sign}(\tilde{s})\,\odot \,\max \{|\tilde{s}|-\lambda \mathbf{1}, \mathbf{0}\}\) and can be computed with \(\mathcal{O }(p)\) complexity, where \(\odot \) denotes componentwise multiplication and \(\mathbf{1}\) is a vector of ones. When \(\eta <\infty \), the constrained optimal solution, \(\bar{s}\), can be computed with \(\mathcal{O }(p\log (p))\) complexity. See Lemma A.4 in [1].

  2. (ii)

    \(\beta = 2\): First consider the unconstrained case, i.e. \(\eta =\infty \). Since the \(\ell _2\)-norm is self-dual, \(\lambda \Vert s\Vert _2 = \max \{u^Ts: \Vert u\Vert _2 \le \lambda \}\). Thus,

    $$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} =\min _{s\in \mathbb{R }^p}\ \max _{u:\ \Vert u\Vert _2\le \lambda } \left\{ u^T s + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _2\le \lambda }\ \min _{s \in \mathbb{R }^p}\left\{ u^Ts +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _2\le \lambda } \left\{ u^T (\tilde{s}-u)+\frac{1}{2}\Vert u\Vert _2^2\right\} , \\&\quad =\frac{1}{2}\Vert \tilde{s}\Vert _2^2-\min _{u:\ \Vert u\Vert _2\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2, \nonumber \end{aligned}$$
    (96)

    where (96) follows from the fact that \(s^*(u):=\mathop {\mathrm{argmin}}_{s\in \mathbb{R }^p} \{u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\}=\tilde{s}-u\). Define

    $$\begin{aligned} u^*:=\mathop {\mathrm{argmin}}_{u:\ \Vert u\Vert _2\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2=\tilde{s}~\min \left\{ \frac{\lambda }{\Vert \tilde{s}\Vert _2},\ 1\right\} . \end{aligned}$$

    Then the unconstrained optimal solution \(s^*=s^*(u^*)=\tilde{s}\max \left\{ 1-\frac{\lambda }{\Vert \tilde{s}\Vert _2},\ 0\right\} \) and the complexity of computing \(s^*\) is \(\mathcal{O }(p)\). Next, consider the constrained optimization problem, i.e. \(\eta <\infty \). The constrained optimum \(\bar{s}=s^*\) whenever \(s^*\) is feasible, i.e. \(\Vert s^*\Vert _2 \le \eta \). Since \(f(s):=\lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\) is strongly convex, Lemma 9 implies that \(\Vert \bar{s}\Vert _{2} = \eta \) whenever \(\Vert s^*\Vert _2 > \eta \). Thus,

    $$\begin{aligned} \min \Big \{\lambda \Vert s\Vert _2+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\ \Vert s\Vert _2\le \eta \Big \} = \lambda \eta + \min \Big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2^2 = \eta ^2\Big \}. \end{aligned}$$

    The unique KKT point for the optimization problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \frac{1}{2}\Vert s\Vert _2^2 = \frac{\eta ^2}{2}\big \}\) is given by \(\bar{s} = \eta \frac{\tilde{s}}{\Vert \tilde{s}\Vert _2}\), and the KKT multiplier for the constraint \(\frac{1}{2}\Vert s\Vert _2^2 = \frac{\eta ^2}{2}\) is \(\vartheta = \frac{\Vert \tilde{s}\Vert _2}{\eta } -1\). It is easy to check that \(\vartheta > 0\) whenever \(\Vert s^*\Vert _2 > \eta \). Thus, \(\bar{s}\) is optimal for the convex optimization problem \(\min \Big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2^2 \le \eta ^2\Big \}\) and, consequently, optimal for the equality-constrained problem \(\min \big \{ \frac{1}{2}\Vert s - \tilde{s}\Vert _2^2 : \Vert s\Vert _2 = \eta \big \}\). Hence, the complexity of computing \(\bar{s}\) is \(\mathcal{O }(p)\).

  3. (iii)

    \(\beta = \infty \): First consider the unconstrained problem. Since the \(\ell _1\)-norm is the dual norm of the \(\ell _{\infty }\)-norm, we have that

    $$\begin{aligned}&\min _{s\in \mathbb{R }^p}\left\{ \lambda \Vert s\Vert _\infty +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} =\min _{s\in \mathbb{R }^p}\ \max _{u:\ \Vert u\Vert _1\le \lambda } \left\{ u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _1\le \lambda }\ \min _{s \in \mathbb{R }^p}\left\{ u^Ts+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\right\} , \nonumber \\&\quad =\max _{u:\ \Vert u\Vert _1\le \lambda } \left\{ u^T (\tilde{s}-u) + \frac{1}{2}\Vert u\Vert _2^2\right\} , \\&\quad =\frac{1}{2}\Vert \tilde{s}\Vert _2^2-\min _{u:\ \Vert u\Vert _1\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2, \nonumber \end{aligned}$$
    (97)

    where (97) follows from the fact that \(s^*(u):=\mathop {\mathrm{argmin}}_{s\in \mathbb{R }^p}\{u^T s+\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\}=\tilde{s}-u\). The result in (i) implies that the complexity of computing \(u^*=\mathop {\mathrm{argmin}}_{u:\ \Vert u\Vert _1\le \lambda } \frac{1}{2}\Vert u-\tilde{s}\Vert _2^2\) is \(\mathcal{O }(p\log (p))\). Thus, the unconstrained optimal solution \(s^* = s^*(u^*) = \tilde{s} - u^*\) can be computed in \(\mathcal{O }(p\log (p))\) operations. Next, consider the constrained optimization problem. The constrained optimum \(\bar{s}=s^*\) whenever \(s^*\) is feasible, i.e. \(\Vert s^*\Vert _{\infty } \le \eta \). Since \(f(s) = \lambda \Vert s\Vert _{\infty } + \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2\) is strictly convex, Lemma 9 implies that \(\Vert \bar{s}\Vert _{\infty } = \eta \) whenever \(\Vert s^*\Vert _{\infty } > \eta \). Therefore,

    $$\begin{aligned} \min \left\{ \lambda \Vert s\Vert _\infty +\frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty \le \eta \right\} = \lambda \eta + \min \left\{ \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty =\eta \right\} . \end{aligned}$$

    Then, it is easy to check that \(\text{ sign}(\bar{s}_i) = \text{ sign}(\tilde{s}_i)\) for all \(i = 1, \ldots , p\). Moreover, \(\Vert s^*\Vert _\infty >\eta \) implies that \(\Vert \tilde{s}\Vert _\infty >\eta \). These two facts imply that

    $$\begin{aligned} \min \left\{ \frac{1}{2}\Vert s-\tilde{s}\Vert _2^2:\Vert s\Vert _\infty =\eta \right\} = \min \left\{ \frac{1}{2}\Vert s-\left|\tilde{s}\right|\Vert _2^2: 0 \le s_i \le \eta \right\} . \end{aligned}$$

    For \(1\le i\le p\), we have \(\min \{\left|\tilde{s}_i\right|,\eta \}=\mathop {\mathrm{argmin}}_{s_i\in \mathbb{R }}\big \{\frac{1}{2}(s_i - \left|\tilde{s}_i\right|)^2: 0 \le s_i \le \eta \big \}\). Thus, it follows that \(\bar{s}= \text{ sign}(\tilde{s}) \odot \min \{|\tilde{s}|, \eta \mathbf{1}\}\). Hence, the complexity of computing \(\bar{s}\) is \(\mathcal{O }(p\log (p))\) (a numerical sketch of these shrinkage routines follows the proof below).

\(\square \)
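
The closed forms derived in items (i)–(iii) and the SVD reduction of Lemma 10 can be collected into a short routine. The following is a minimal numpy sketch under the stated closed forms; the helper names are illustrative, and the constrained \(\beta =1\) case is handled here by composing soft-thresholding with an \(\ell _1\)-ball projection (one valid alternative to the \(\mathcal{O }(p\log (p))\) routine of Lemma A.4 in [1]).

```python
# Sketch of the constrained vector/matrix shrinkage subproblems (Lemma 10, items (i)-(iii)).
import numpy as np

def proj_l1_ball(v, radius):
    """Euclidean projection onto {u : ||u||_1 <= radius} (sort-based, O(p log p))."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, v.size + 1) > css - radius)[0][-1]
    tau = (css[k] - radius) / (k + 1)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def shrink_l1(s, lam, eta=np.inf):
    """min lam*||s||_1 + 0.5*||s - s_tilde||_2^2  s.t. ||s||_1 <= eta   (item (i))."""
    out = np.sign(s) * np.maximum(np.abs(s) - lam, 0.0)        # soft-thresholding
    return out if np.abs(out).sum() <= eta else proj_l1_ball(out, eta)

def shrink_l2(s, lam, eta=np.inf):
    """min lam*||s||_2 + 0.5*||s - s_tilde||_2^2  s.t. ||s||_2 <= eta   (item (ii))."""
    ns = np.linalg.norm(s)
    out = s * max(1.0 - lam / ns, 0.0) if ns > 0 else np.zeros_like(s)
    return out if np.linalg.norm(out) <= eta else eta * s / ns

def shrink_linf(s, lam, eta=np.inf):
    """min lam*||s||_inf + 0.5*||s - s_tilde||_2^2  s.t. ||s||_inf <= eta   (item (iii))."""
    out = s - proj_l1_ball(s, lam)                             # dual l1-ball projection
    return out if np.abs(out).max() <= eta else np.sign(s) * np.minimum(np.abs(s), eta)

def shrink_matrix(X, lam, eta=np.inf, alpha='nuclear'):
    """Constrained matrix shrinkage of Lemma 10: shrink the singular values of X."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    vec_shrink = {'nuclear': shrink_l1, 'frobenius': shrink_l2, 'operator': shrink_linf}[alpha]
    return U @ np.diag(vec_shrink(sig, lam, eta)) @ Vt
```

For instance, under these assumptions, `shrink_matrix(X, lam, eta, 'operator')` would return the solution of the matrix subproblem when \(\Vert \sigma (\cdot )\Vert _{\alpha }\) is the \(\ell _2\)-operator norm.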

About this article

Aybat, N.S., Iyengar, G. A unified approach for minimizing composite norms. Math. Program. 144, 181–226 (2014). https://doi.org/10.1007/s10107-012-0622-z