
Decomposable norm minimization with proximal-gradient homotopy algorithm


Abstract

We study the convergence rate of the proximal-gradient homotopy algorithm applied to norm-regularized linear least squares problems, for a general class of norms. The homotopy algorithm reduces the regularization parameter in a series of steps, and uses a proximal-gradient algorithm to solve the problem at each step. The proximal-gradient algorithm has a linear rate of convergence provided that the objective function is strongly convex and the gradient of the smooth component of the objective function is Lipschitz continuous. In many applications, the objective function in this type of problem is not strongly convex, especially when the problem is high-dimensional and regularizers are chosen that induce sparsity or low-dimensionality. We show that if the linear sampling matrix satisfies certain assumptions and the regularizing norm is decomposable, the proximal-gradient homotopy algorithm converges at a linear rate even though the objective function is not strongly convex. Our result generalizes results on the linear convergence of the homotopy algorithm for \(\ell _1\)-regularized least squares problems. Numerical experiments are presented that support the theoretical convergence rate analysis.


Notes

  1. For a general positive semidefinite B, the examples are \(A_i = B^{-\frac{1}{2}} A_i' \) with \(A_{i}' \sim \mathcal {N}(0,I_n)\) or with \(A_{i,j}'\) Rademacher for all j.

References

  1. Agarwal, A., Negahban, S., Wainwright, M.J.: Fast global convergence rates of gradient methods for high-dimensional statistical recovery. In: NIPS, vol. 23, pp. 37–45 (2010)

  2. Bunea, F., Tsybakov, A., Wegkamp, M.: Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1, 169–194 (2007)

  3. Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  4. Candès, E., Plan, Y.: Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inf. Theory 57(4), 2342–2359 (2011)

  5. Candès, E., Recht, B.: Simple bounds for recovering low-complexity models. Math. Program. 141(1–2), 577–589 (2013)

  6. Candès, E., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006)

  7. Candès, E., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Stat. 35(6), 2313–2351 (2007)

  8. Candès, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)

  9. Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012)

  10. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  11. Gordon, Y.: Some inequalities for Gaussian processes and applications. Israel J. Math. 50(4), 265–289 (1985)

  12. Gross, D.: Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory 57(3), 1548–1566 (2011)

  13. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _1\)-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)

  14. Hou, K., Zhou, Z., So, A.M., Luo, Z.Q.: On the linear convergence of the proximal gradient method for trace norm regularization. In: Advances in Neural Information Processing Systems, pp. 710–718 (2013)

  15. Jain, P., Meka, R., Dhillon, I.S.: Guaranteed rank minimization via singular value projection. In: NIPS, vol. 23, pp. 937–945 (2010)

  16. Jin, R., Yang, T., Zhu, S.: A new analysis of compressive sensing by stochastic proximal gradient descent. CoRR abs/1304.4680 (2013)

  17. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes, vol. 23. Springer, New York (2013)

  18. Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl. 31(3), 1235–1256 (2009)

  19. Lounici, K., Pontil, M., Van De Geer, S., Tsybakov, A.B.: Oracle inequalities and optimal inference under group sparsity. Ann. Stat. 39(4), 2164–2204 (2011)

  20. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1–2), 615–642 (2015)

  21. Luo, Z.Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30(2), 408–425 (1992)

  22. Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128(1), 321–353 (2011)

  23. Mazumder, R., Hastie, T., Tibshirani, R.: Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 11, 2287–2322 (2010)

  24. Mendelson, S., Pajor, A., Tomczak-Jaegermann, N.: Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal. 17(4), 1248–1282 (2007)

  25. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien (in French). C. R. Acad. Sci. Paris 255, 2897–2899 (1962)

  26. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009)

  27. Negahban, S.N., Ravikumar, P., Wainwright, M.J., Yu, B.: A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. Stat. Sci. 27(4), 538–557 (2012)

  28. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  29. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  30. Nesterov, Y., Nemirovski, A.: On first-order algorithms for \(\ell _1\)/nuclear norm minimization. Acta Numer. 22, 509–575 (2013)

  31. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Amsterdam (2004)

  32. Nguyen, N., Needell, D., Woolf, T.: Linear convergence of stochastic iterative greedy algorithms with sparse constraints. arXiv preprint arXiv:1407.0088 (2014)

  33. Raskutti, G., Wainwright, M.J., Yu, B.: Minimax rates of estimation for high-dimensional linear regression over \(\ell _q\)-balls. IEEE Trans. Inf. Theory 57(10), 6976–6994 (2011)

  34. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  35. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  36. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  37. Rohde, A., Tsybakov, A.B.: Estimation of high-dimensional low-rank matrices. Ann. Stat. 39(2), 887–930 (2011)

  38. Shalev-Shwartz, S., Gonen, A., Shamir, O.: Large-scale convex minimization with a low-rank constraint. arXiv preprint arXiv:1106.1622 (2011)

  39. Shalev-Shwartz, S., Srebro, N., Zhang, T.: Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 20(6), 2807–2832 (2010)

  40. Talagrand, M.: The Generic Chaining, vol. 154. Springer, Berlin (2005)

  41. Toh, K.C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optim. 6(3), 615–640 (2010)

  42. Van De Geer, S.A., Bühlmann, P.: On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3, 1360–1392 (2009)

  43. Wen, Z., Yin, W., Goldfarb, D., Zhang, Y.: A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation. SIAM J. Sci. Comput. 32(4), 1832–1857 (2010)

  44. Wright, S.J., Nowak, R.D., Figueiredo, M.A.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)

  45. Xiao, L., Zhang, T.: A proximal-gradient homotopy method for the sparse least-squares problem. SIAM J. Optim. 23(2), 1062–1091 (2013)

  46. Zhang, H., Jiang, J., Luo, Z.Q.: On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems. J. Oper. Res. Soc. China 1(2), 163–186 (2013)


Acknowledgments

The authors are greatly indebted to Dr. Lin Xiao from Microsoft Research, Redmond, for his many valuable comments and suggestions. We thank Amin Jalali for his comments and helpful discussions.

Author information

Correspondence to Reza Eghbali.

Additional information

This material is based upon work supported by the National Science Foundation under Grant No. ECCS-0847077, and in part by the Office of Naval Research under Grant No. N00014-12-1-1002.

Appendices

Appendix 1

In this section we give a lower bound on the number of measurements m that suffices for the existence of \(r>1\) in Assumption 1 with high probability, when A is sampled from a certain class of distributions. To simplify the notation we assume that \(B = I\); therefore, \(\langle {x} , {y} \rangle = x^T y\). Given a random variable z, the sub-Gaussian norm of z is defined as:

$$\begin{aligned} {\left\| z\right\| }_{\psi _2} = \inf \{\beta > 0 \,| \, \mathbb {E} \psi _2\left( \frac{|z|}{\beta }\right) \le 1\}, \end{aligned}$$

where \(\psi _2(x) = e^{x^2}-1\). For an n-dimensional random vector \(w \sim P\), the sub-Gaussian norm is defined as

$$\begin{aligned} {\left\| w\right\| }_{\psi _2} = \sup _{u \in S^{n-1}} {\left\| \langle {w} , {u} \rangle \right\| }_{\psi _2}. \end{aligned}$$
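For concreteness, a routine calculation (ours, not from the paper) evaluates the scalar definition for the two examples mentioned below: for a Rademacher variable z we have \(\mathbb {E}\, \psi _2(|z|/\beta ) = e^{1/\beta ^2}-1 \le 1\) exactly when \(\beta \ge 1/\sqrt{\ln 2}\), so \({\left\| z\right\| }_{\psi _2} = 1/\sqrt{\ln 2}\), while for \(z \sim \mathcal {N}(0,1)\),

$$\begin{aligned} \mathbb {E}\, e^{z^2/\beta ^2} = \big (1-2/\beta ^2\big )^{-1/2} \le 2 \quad \Longleftrightarrow \quad \beta ^2 \ge 8/3, \end{aligned}$$

so \({\left\| z\right\| }_{\psi _2} = \sqrt{8/3}\).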

P is called isotropic if \(\mathbb {E} \big [\langle {w} , {u} \rangle ^2\big ] = 1\) for all \(u \in S^{n-1}\). Two important examples of sub-Gaussian random variables are Gaussian and bounded random variables. Suppose \(A: \mathbb {R}^{n} \rightarrow \mathbb {R}^{m}\) is given by:

$$\begin{aligned} (Ax)_i = \frac{1}{\sqrt{m}}\langle {A_i} , {x} \rangle \quad \forall i \in \{1,2,\ldots ,m\}, \end{aligned}$$
(35)

where \(A_i\), \(1\le i \le m\), are i.i.d. samples from an isotropic sub-Gaussian distribution P on \(\mathbb {R}^n\). Two important examples are the standard Gaussian vector \(A_{i}\sim \mathcal {N}(0,I_n)\) and a random vector of independent Rademacher variables (see footnote 1). We want to bound the following probabilities for \(\theta \in (0,1)\):

$$\begin{aligned} P(\rho _{-}(A,k)<&1- \theta ) \end{aligned}$$
(36)
$$\begin{aligned} P(\rho _{+}(A,k)>&1+ \theta ). \end{aligned}$$
(37)

When \(A_{i} \sim \mathcal {N}(0,I_n)\) for all i, one can use the generalization of Slepian's lemma by Gordon [11] alongside concentration inequalities for Lipschitz functions of Gaussian random variables to derive (see, for example, [17, Chapter 15]):

$$\begin{aligned}&P(\sqrt{\rho _{-}(A,k)} < \sqrt{\frac{m}{m+1}} - \theta ) \le e^{-\frac{m {\theta }^2}{8}},\\&P(\sqrt{\rho _{+}(A,k)} > 1+ \theta ) \le e^{-\frac{m {\theta }^2}{8}}, \end{aligned}$$

whenever,

$$\begin{aligned} {\theta } \ge \frac{2 G(k)}{\sqrt{m}}. \end{aligned}$$

Here, G is defined as:

$$\begin{aligned} G(k) := \mathbb {E} \sup _{u \in \sqrt{k} \mathcal {B}_{{\left\| \cdot \right\| }} \cap S^{n-1}}|\langle {u} , {g} \rangle |, \end{aligned}$$

where \(g \sim \mathcal {N}(0,I_n)\). For the sub-Gaussian case, we use a result by Mendelson et al. [24, Theorem 2.3]. Using Talagrand's generic chaining theorem [40, Theorem 2.1.1], the authors give a bound which, as in the Gaussian case, depends on G(k). Their result in our notation states:

Proposition 3

Suppose A is given by (35). If P is an isotropic distribution and \({\left\| A_1\right\| }_{\psi _2} \le \alpha \), then there exist constants \(c_1\) and \(c_2\) such that

$$\begin{aligned} \rho _{-}(A/\sqrt{m},k)\ge & {} 1 - \theta , \end{aligned}$$
(38)
$$\begin{aligned} \rho _{+}(A/\sqrt{m},k)\le & {} 1 + \theta , \end{aligned}$$
(39)

with probability exceeding \(1 - \exp {(-c_2 \theta ^2 m / \alpha ^4)}\) whenever

$$\begin{aligned} \theta \ge \frac{c_1 \alpha ^2 G(k)}{\sqrt{m}}. \end{aligned}$$

Suppose \(\lambda _\mathrm{tgt} = 4 {\left\| A^* z\right\| }^*\), which sets \(\gamma = \frac{5+4\delta }{3-4\delta }\) (see the arithmetic following (58) in the proof of Lemma 1 below). We can state the following proposition based on Proposition 3:

Proposition 4

Let \(r>1\), \({\tilde{k}} = 36 r c k_{0} (1+\gamma )\gamma _\mathrm{inc}\) and \({\bar{k}} = c k_0(1+\gamma )^2\). If \(m \ge \frac{{c_1} \alpha ^4 }{(r-1)^2} (G(2{\tilde{k}})^2 + r^2 G({\bar{k}})^2)\), then r satisfies Assumption 1 with probability exceeding \(1 - \exp (-c_2 (r-1)^2 m / (r^2\alpha ^2))\).
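To read the constants in Proposition 4 concretely (our arithmetic, not a statement from the paper): taking \(\delta = 1/4\) gives

$$\begin{aligned} \gamma = \frac{5+4\delta }{3-4\delta } = 3, \qquad {\tilde{k}} = 36\, r c k_{0} (1+\gamma )\gamma _\mathrm{inc} = 144\, r c k_{0} \gamma _\mathrm{inc}, \qquad {\bar{k}} = c k_0(1+\gamma )^2 = 16\, c k_0, \end{aligned}$$

so for fixed \(r > 1\) the measurement requirement \(m \ge \frac{c_1 \alpha ^4}{(r-1)^2}(G(2{\tilde{k}})^2 + r^2 G({\bar{k}})^2)\) is driven entirely by the Gaussian-width terms \(G(\cdot )^2\).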

The proof of Proposition 4 is a simple adaptation of the proof of Theorem 1.4 in [24], which we omit here. To compare this with the number of measurements sufficient for successful recovery within a given accuracy, combining (59) in the proof of Lemma 1 with Proposition 3 we get:

Proposition 5

Let \(r>1\), \({\bar{k}} = c k_0(1+\gamma )^2\) and \(x^* \in {{\mathrm{\arg \min }}}\phi _{\lambda }(x)\). If \(m \ge \frac{{c_1} \alpha ^4 r^2}{(r-1)^2} G({\bar{k}})^2\), then \({\left\| x^* - x_0\right\| }_2 \le {c_2 r\lambda \sqrt{ck_{0}} }\) with probability exceeding \(1 - \exp (-c_2 (r-1)^2 m / (r^2\alpha ^2))\).

Note that in the case of the \(\ell _1\), \(\ell _{1,2}\), and nuclear norms, this bound on m matches, up to constant factors, the lower bounds given by the minimax rates in [19, 33] and [37].
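As a sanity check on the quantities appearing above, the following is a minimal Monte Carlo sketch (ours, for the \(\ell _1\) case only; it is not from the paper, and the min/max over random k-sparse directions are only empirical proxies for \(\rho _{-}(A,k)\) and \(\rho _{+}(A,k)\)):

    import numpy as np

    # Rows A_i ~ N(0, I_n), scaled by 1/sqrt(m) as in (35).
    rng = np.random.default_rng(0)
    n, m, k, trials = 256, 120, 5, 2000
    A = rng.standard_normal((m, n)) / np.sqrt(m)

    vals = []
    for _ in range(trials):
        support = rng.choice(n, size=k, replace=False)
        u = np.zeros(n)
        u[support] = rng.standard_normal(k)
        u /= np.linalg.norm(u)   # k-sparse unit vector; note ||u||_1 <= sqrt(k),
                                 # so u lies in sqrt(k)*B_1 intersected with S^{n-1}
        vals.append(np.linalg.norm(A @ u) ** 2)

    print(f"empirical proxy for rho_-: {min(vals):.3f}")
    print(f"empirical proxy for rho_+: {max(vals):.3f}")

With these (hypothetical) dimensions, the empirical spread of \({\left\| Au\right\| }_2^2\) around 1 shrinks as m grows, which is the qualitative content of Propositions 3-5.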

Appendix 2

1.1 Proof of Theorem 1

Sufficiency. First consider the case where \(k = 1\) and \(x = \gamma _1 a_1\) with \(\gamma _1 > 0\). Note that \(a_1 \in \partial {\left\| x\right\| }=\partial {\left\| a_1\right\| }\) because \({\left\| a_1\right\| }^*= 1\) for all \(a_1 \in \mathcal {G}_{{\left\| \cdot \right\| }}\) and \(\langle {a_1} , {x} \rangle = \gamma _1 = {\left\| x\right\| }\). Define:

$$\begin{aligned} C = \{\xi - a_1 | \xi \in \partial {\left\| a_1\right\| }\}. \end{aligned}$$

Note that C is a convex set that contains the origin. Moreover, C is orthogonal to \(a_1\). We claim that (9) is satisfied with \(T_{a_1}^\bot = {{\mathrm{span}}}{C}\). To establish the claim, we first prove that C is symmetric and is contained in the dual norm ball. Let \(v\in C\) and \(\xi = a_1 + v \in \partial {{\left\| a_1\right\| }}\). By (4), \(\langle {a_1} , {\xi } \rangle = {\left\| \xi \right\| }^* = 1\). Therefore,

$$\begin{aligned} a_1 \in {{\mathrm{\arg \max }}}_{a \in \mathcal {G}_{{\left\| \cdot \right\| }}} \langle {a} , {\xi } \rangle \end{aligned}$$

and we can apply the hypothesis of the theorem (in particular statement I) to obtain an orthonormal representation for \(\xi \):

$$\begin{aligned} \xi = a_1+\sum _{i=1}^{l}{\eta _i b_i}. \end{aligned}$$

Now by statement II in the hypothesis we get:

$$\begin{aligned} {\left\| v\right\| }^*=\max _{i}{\eta _i} \le {\left\| \xi \right\| }^* \le 1. \end{aligned}$$

Let \(\xi ' = a_1-\sum _{i=1}^{l}{\eta _i b_i}\). By the hypothesis, \({\left\| \xi '\right\| }^* = \max \{1,\max _{i}{\eta _i}\} = 1\). Also, \(\langle {\xi '} , {a_1} \rangle =1\) hence \(\xi ' \in \partial {\left\| a_1\right\| }\) and \(-v \in C\).

Let \(v \in {{\mathrm{span}}}{C}\) with \({\left\| v\right\| }^*\le 1\). Since C is a symmetric convex set, there exists \(\lambda \in (0,1]\) such that \(\lambda v \in C\) (i.e., C is absorbing in \({{\mathrm{span}}}{C}\)). Define \(z = a_1+\lambda v\) which is in \(\partial {{\left\| a_1\right\| }}\). Since \(\langle {a_1} , {z} \rangle = {\left\| z\right\| }^* = 1\), we can write z as

$$\begin{aligned} z = a_1 + \sum _{i=1}^{k'}{\nu _i c_i}, \end{aligned}$$

where \(\{c_i|i=1,\ldots , k'\} \subset \mathcal {G}_{{\left\| \cdot \right\| }}\) and \(\{\nu _i \ge 0| i=1, \ldots , k'\}\) satisfy the hypothesis of the theorem. In particular, since \(v = \frac{1}{\lambda } \sum _{i=1}^{k'}{\nu _i c_i}\), we have \(\max _i{\nu _i/\lambda } \le 1\). Hence \({\left\| a_1 + v\right\| }^* = \max \{1,\nu _1/\lambda , \ldots , \nu _{k'}/\lambda \} = 1\) and \(a_1+v \in \partial {{\left\| a_1\right\| }}\). Therefore,

$$\begin{aligned} \partial {{\left\| a_1\right\| }} = \{a_1+v|v \in {{\mathrm{span}}}{C}, {\left\| v\right\| }^*\le 1\} . \end{aligned}$$

Now suppose that \(x = \sum _{i=1}^{k}{\gamma _i a_i}\) with \(k > 1\). Note that \({\sum _{i=1}^{k}{a_i}} \in \partial {\left\| x\right\| }\) since \({\left\| \sum _{i=1}^{k}{a_i}\right\| }^* = 1\) and \(\langle {\sum _{i=1}^{k}{a_i}} , {x} \rangle = \sum _{i=1}^{k}\gamma _i = {\left\| x\right\| }\). Let \(\xi \in \partial {{\left\| x\right\| }}\) and define \(v = \xi - \sum _{i=1}^{k}{a_i}\). We can write:

$$\begin{aligned}&{\left\| x\right\| } = \sum _{i=1}^{k}{\gamma _i} = \langle {\xi } , {x} \rangle = \sum _{i=1}^{k}{\gamma _i \langle {\xi } , {a_i} \rangle }\nonumber \\&\quad \Rightarrow \forall i\in \{1,2,\ldots ,k\}: \, \langle {\xi } , {a_i} \rangle = 1 \Rightarrow \forall i \in \{1,2,\ldots ,k\}:\, \xi \in \partial {{\left\| a_i\right\| }}. \end{aligned}$$
(40)

Also, since \(\sum _{i=1}^{k}{a_i} \in \partial {\left\| a_i\right\| }\), (40) results in:

$$\begin{aligned} \forall i\in \{1,2,\ldots ,k\}: \quad v \in {T_{a_{i}}^{\bot }}. \end{aligned}$$
(41)

Since \(\xi = \sum _{i=1}^{k}{a_i} + v \in \partial {\left\| a_1\right\| }\), we have \({\left\| \sum _{i=2}^{k}{a_i} + v\right\| }^* = 1\) hence \(\sum _{i=2}^{k}{a_i} + v \in \partial {\left\| a_2\right\| }\). By induction, we conclude that \(a_k + v \in \partial {\left\| a_k\right\| }\). This implies \({\left\| v\right\| }^*\le 1\).

Let \(v' \in \cap _{i \in \{1,2,\ldots ,k\}}{T_{a_{i}}^{\bot }}\) with \({\left\| v'\right\| }^*\le 1\) and define \(\xi ' = \sum _{i=1}^{k}{a_i}+ v'\). We will prove that \({\left\| \xi '\right\| }^* \le 1\), and hence \(\xi ' \in \partial {{\left\| x\right\| }}\). To prove this we use induction. Define

$$\begin{aligned} z_l = \sum _{i=k-l+1}^{k}{a_i}+v' \qquad \forall l \in \{1,2,\ldots ,k\}. \end{aligned}$$

Note that \({\left\| z_1\right\| }^* \le 1\) since \(z_1 = a_k+v' \in \partial {\left\| a_k\right\| }\). Suppose \({\left\| z_{l'}\right\| }^* \le 1\) for some \(l' < k \). We prove that \({\left\| z_{l'+1}\right\| }^* \le 1\). We have \(\sum _{i=k-l'+1}^{k}{a_i} \in T_{a_{k-l'}}^{\bot }\) because \(\sum _{i=k-l'}^{k}{a_i} = a_{k-l'} + \sum _{i=k-l'+1}^{k}{a_i} \in \partial {\left\| a_{k-l'}\right\| }\). Combining this with the fact that \(v' \in T_{a_{k-l'}}^{\bot }\), we get \(z_{l'} \in T_{a_{k-l'}}^{\bot }\). Therefore, \(z_{l'+1}= a_{k-l'}+z_{l'} \in \partial {\left\| a_{k-l'}\right\| }\) hence \({\left\| z_{l'+1}\right\| }^*\le 1\). Thus \({\left\| \xi '\right\| }^* = {\left\| z_k\right\| }^*\le 1\). We conclude that:

$$\begin{aligned} \partial&{\left\| x\right\| } = \left\{ \sum _{i=1}^{k}{a_i}+v| v \in \bigcap _{i=1}^{k}{T_{a_i}^{\bot }}, {\left\| v\right\| }^*\le 1\right\} . \end{aligned}$$
(42)

Necessity. For any \(a\in \mathcal {G}_{{\left\| \cdot \right\| }}\), we have:

$$\begin{aligned}&\langle {a} , {a} \rangle = 1, \nonumber \\ \forall b \in \mathcal {G}_{{\left\| \cdot \right\| }}: \quad&\langle {b} , {a} \rangle \le {\left\| b\right\| }_2{\left\| a\right\| }_2= 1. \end{aligned}$$

That implies \({\left\| a\right\| }^* = 1\) and \(a \in \partial {\left\| a\right\| }\). Since \(a\in T_a\), we conclude that:

$$\begin{aligned} \partial {\left\| a\right\| } = \big \{a+v|v \in {T_{a}^{\bot }}, {\left\| v\right\| }^*\le 1\big \}. \end{aligned}$$
(43)

Take \(\gamma _1 = \langle {a_1} , {x} \rangle ={\left\| x\right\| }^*\) and let \({\varDelta }_1 = x - {\gamma _1} a_1\). If \({\varDelta }_1 = 0\), then take \(k =1\) and \(x = \gamma _1 a_1\). Suppose \({\varDelta }_1 \ne 0\). Since \({\left\| \frac{1}{\gamma _1}x\right\| }^* = 1\) and \(\langle {a_1} , {{\frac{1}{\gamma _1}x}} \rangle = {\left\| a_1\right\| } = 1\), we can conclude that \(\frac{1}{\gamma _1}x \in \partial {\left\| a_1\right\| }\). Furthermore, we have

$$\begin{aligned} P_{{{T}_{a_1}}^{\bot }}(x)&= x - \gamma _1 P_{{T}_{a_1}}(\frac{1}{\gamma _1}x) = x - {\gamma _1} a_1= {\varDelta }_1\\&\Rightarrow {\varDelta }_1 \in T_{a_1}^{\bot }.\nonumber \end{aligned}$$
(44)

Now we introduce a lemma that will be used in the rest of the proof.

Lemma 4

Suppose \(a \in \mathcal {G}_{{\left\| \cdot \right\| }}\) and \(y \in T^{\perp }_{a} - \{0\}\). If \(z \in \mathcal {B}_{{\left\| \cdot \right\| }}\) is such that \({\left\| y\right\| }^* = \langle {y} , {z} \rangle \), then \(z \in T^{\perp }_{a}\).

Proof

Without loss of generality assume that \({\left\| y\right\| }^* = 1\). It suffices to show that if \(b \in \mathcal {G}_{{\left\| \cdot \right\| }}\) and \( \langle {y} , {b} \rangle = 1\), then \(b \in T^{\perp }_{a}\). Consider such \(b \in \mathcal {G}_{{\left\| \cdot \right\| }}\). By (43), \({\left\| a+y\right\| }^* = 1\). That results in:

$$\begin{aligned} 1 \ge \langle {a+y} , {b} \rangle = \langle {a} , {b} \rangle +1 \Rightarrow 0 \ge \langle {a} , {b} \rangle . \end{aligned}$$

By considering \(-y\) and \(-b\) we get that \(\langle {a} , {b} \rangle = 0\). Since \(\langle {a+y} , {b} \rangle = {\left\| b\right\| } = 1\), we can conclude that \(a+y \in \partial {\left\| b\right\| }\). Since \( \langle {y} , {b} \rangle = 1\) and \({\left\| y\right\| }^* = 1\), \(y \in \partial {\left\| b\right\| }\). Combining these two conclusions, we get:

$$\begin{aligned} y \in \partial {\left\| b\right\| },\; a+y \in \partial {\left\| b\right\| } \;\Rightarrow \; a \in T^{\perp }_{b} \mathop {\Longrightarrow }\limits ^{(43)} {\left\| a+b\right\| }^* \le 1 \;\Rightarrow \; a+b \in \partial {\left\| a\right\| } \;\Rightarrow \; b \in T^{\perp }_{a}. \end{aligned}$$

\(\square \)

Suppose that there exist \(l \in \{1,2,\ldots ,k\}\), an orthogonal set \(\{a_i \in \mathcal {G}_{{\left\| \cdot \right\| }}|\, i=1,2,\ldots , l\}\), and a set of coefficients \(\{\gamma _i \ge 0 |\, i = 1,2,\ldots ,l\}\) such that \(x = \sum _{i=1}^{l}{\gamma _i a_i} + {\varDelta }_l\), \({\varDelta }_l \in \cap _{i=1}^{l}{T_{a_i}^{\bot }} \), and:

$$\begin{aligned} \partial&{\left\| \sum _{i=1}^{l}{a_i}\right\| } = \left\{ \sum _{i=1}^{l}{a_i}+v|v \in \bigcap _{i=1}^{l}{T_{a_i}^{\bot }}, {\left\| v\right\| }^*\le 1\right\} . \end{aligned}$$
(45)

By Lemma 4, there exists \(a_{l+1} \in \mathcal {G}_{{\left\| \cdot \right\| }}\) such that \(a_{l+1} \in \cap _{i=1}^{l}{T_{a_i}^{\bot }}\) and \(\langle {a_{l+1}} , {{\varDelta }_l} \rangle = {\left\| {\varDelta }_l\right\| }^*\). Take \(\gamma _{l+1} = \langle {a_{l+1}} , {{\varDelta }_l} \rangle = {\left\| {\varDelta }_l\right\| }^*\) and let \({\varDelta }_{l+1} = {\varDelta }_l - \gamma _{l+1} a_{l+1}\). We have \({\varDelta }_{l+1} \in \bigcap _{i=1}^{l}{T_{a_i}^{\bot }}\) because \(\{{\varDelta }_l, a_{l+1}\} \subset \bigcap _{i=1}^{l}{T_{a_i}^{\bot }}\). Since \({\left\| \frac{1}{\gamma _{l+1}}{\varDelta }_{l}\right\| }^* = 1\) and \(\langle {a_{l+1}} , {{\frac{1}{\gamma _{l+1}}{\varDelta }_{l}}} \rangle = {\left\| a_{l+1}\right\| } = 1\), we can conclude that \(\frac{1}{\gamma _{l+1}}{\varDelta }_l \in \partial {\left\| a_{l+1}\right\| }\). Using the same reasoning as in (44), we have \({\varDelta }_{l+1} \in T_{a_{l+1}}^{\bot }\) hence \( {\varDelta }_{l+1} \in \bigcap _{i=1}^{l+1}{T_{a_i}^{\bot }}\).

By the decomposability assumption, there exist \(e\in \mathbb {R}^n\) and a subspace T such that:

$$\begin{aligned} \partial {\left\| \sum _{i=1}^{l+1}{a_i}\right\| } = \big \{e+v|v \in {T^{\bot }}, {\left\| v\right\| }^*\le 1\big \}. \end{aligned}$$
(46)

We claim that

$$\begin{aligned} e&= \sum _{i=1}^{l+1}{a_i} \end{aligned}$$
(47)
$$\begin{aligned} {T^{\bot }}&= \bigcap _{i=1}^{l+1}{T_{a_i}^{\bot }}. \end{aligned}$$
(48)

To prove the first claim, it is enough to show that \(\sum _{i=1}^{l+1}{a_i} \in \partial {\left\| \sum _{i=1}^{l+1}{a_i}\right\| }\). Note that \( {\left\| \sum _{i=1}^{l+1}{a_i} \right\| }^*\le 1\) since \(\sum _{i=1}^{l+1}{a_i}= \sum _{i=1}^{l}{a_i}+a_{l+1} \in \partial {\left\| \sum _{i=1}^{l}{a_i}\right\| }\) which is given by (45). Now we can write:

$$\begin{aligned} l+1 = \left\langle {\sum _{i=1}^{l+1}{a_i}} , {\sum _{i=1}^{l+1}{a_i}}\right\rangle \le {\left\| \sum _{i=1}^{l+1}{a_i}\right\| } {\left\| \sum _{i=1}^{l+1}{a_i}\right\| }^* \le {\left\| \sum _{i=1}^{l+1}{a_i}\right\| }. \end{aligned}$$

On the other hand, by triangle inequality,

$$\begin{aligned} {\left\| \sum _{i=1}^{l+1}{a_i}\right\| } \le {\left\| \sum _{i=1}^{l}{a_i}\right\| }+{\left\| a_{l+1}\right\| }=l+1, \end{aligned}$$

thus

$$\begin{aligned} {\left\| \sum _{i=1}^{l+1}{a_i}\right\| } = \left\langle {\sum _{i=1}^{l+1}{a_i}} , {\sum _{i=1}^{l+1}{a_i}}\right\rangle . \end{aligned}$$

Therefore, \(\sum _{i=1}^{l+1}{a_i} \in \partial {{\left\| \sum _{i=1}^{l+1}{a_i}\right\| }}\). Since \(\sum _{i=1}^{l+1}{a_i}\in T_{\sum _{i=1}^{l+1}{a_i}} = T\), we conclude that:

$$\begin{aligned} \partial {\left\| \sum _{i=1}^{l+1}{a_i}\right\| } = \left\{ \sum _{i=1}^{l+1}{a_i}+v|v \in {T^{\bot }}, {\left\| v\right\| }^*\le 1\right\} . \end{aligned}$$

To prove (48), we first show that \(\bigcap _{i=1}^{l+1}{T_{a_i}^{\bot }} \subseteq T^{\bot }\). Let \(\xi = e+v\) with \(v \in \bigcap _{i=1}^{l+1}{T_{a_i}^{\bot }}\). Note that \({\left\| a_{l+1}+v\right\| }^*\le 1\) since \(a_{l+1}+v \in \partial {{\left\| a_{l+1}\right\| }}\). Furthermore, \(a_{l+1}+v \in \bigcap _{i=1}^{l}{T_{a_i}^{\bot }}\), which in turn implies \({\sum _{i=1}^{l+1}{a_i}+v} \in \partial {{\left\| \sum _{i=1}^{l}{a_i}\right\| }}\) hence \({\left\| \sum _{i=1}^{l+1}{a_i}+v\right\| }^* \le 1\). Additionally, we have:

$$\begin{aligned} \left\langle {\sum _{i=1}^{l+1}{a_i}+v} , {\sum _{i=1}^{l+1}{a_i}}\right\rangle = {\left\| \sum _{i=1}^{l+1}{a_i}\right\| } = l+1. \end{aligned}$$

Hence \(\xi \in \partial {\left\| \sum _{i=1}^{l+1}{a_i}\right\| }\) and \(v \in T^{\bot }\).

Now, let \(\xi ' = \sum _{i=1}^{l+1}{a_i}+v' \in \partial {\left\| \sum _{i=1}^{l+1}{a_i}\right\| }\). Note that:

$$\begin{aligned} \left\langle {\xi '} , {\sum _{i=1}^{l+1}{a_i}}\right\rangle&= \left\langle {\xi '} , {\sum _{i=1}^{l}{a_i}}\right\rangle +\left\langle {\xi '} , {a_{l+1}}\right\rangle = l+1 \\ &\Rightarrow \left\langle {\xi '} , {\sum _{i=1}^{l}{a_i}}\right\rangle = l, \quad \left\langle {\xi '} , {a_{l+1}}\right\rangle = 1 \\ &\Rightarrow \xi ' \in \partial {\left\| \sum _{i=1}^{l}{a_i}\right\| } \cap \partial {\left\| {a_{l+1}}\right\| } \\ &\Rightarrow v' \in \bigcap _{i=1}^{l}{T_{a_i}^{\bot }}, \quad \sum _{i=1}^{l}{a_i} + v' \in T_{a_{l+1}}^{\bot }; \end{aligned}$$

moreover, \(\sum _{i=1}^{l}{a_i} \in T_{a_{l+1}}^{\bot }\) since \(\sum _{i=1}^{l+1}{a_i} \in \partial {\left\| a_{l+1}\right\| }\). This implies \(v' \in \bigcap _{i=1}^{l+1}{T_{a_i}^{\bot }}\), which completes the proof of (48).

Because \(a_i \notin T_{a_i}^{\bot }\) for all \(i \in \{1,2,\ldots ,l+1\}\), \(\dim (\cap _{i = 1}^{l+1}T_{a_i}^{\bot } )\le n-l-1 \). Hence there exist \(k \le n\), an orthogonal set \(\{a_i \in \mathcal {G}_{{\left\| \cdot \right\| }},\, i = 1, 2, \ldots , k\}\), and a set of coefficients \(\{\gamma _i \ge 0,\, i = 1,2,\ldots ,k\}\) such that \(x = \sum _{i=1}^{k}{\gamma _i a_i}\) and:

$$\begin{aligned} \partial {\left\| \sum _{i=1}^{k}{a_i}\right\| } = \left\{ \sum _{i=1}^{k}{a_i}+v\;|\; v \in \bigcap _{i=1}^{k}{T_{a_i}^{\bot }}, {\left\| v\right\| }^*\le 1 \right\} . \end{aligned}$$
(49)

That proves \({\left\| x\right\| } = \langle {\sum _{i=1}^{k}{a_i}} , {x} \rangle = \sum _{i=1}^{k}{\gamma _i}.\)

To prove statement II, we first prove that \(a_i \in T_{a_j}^{\bot }\) for all \(i \ne j\) in \(\{1,2,\ldots ,k\}\). By (49), \({\left\| \sum _{i=1}^{k}{a_i}\right\| }^*\le 1\). We can write:

$$\begin{aligned} \left\langle {\sum _{i=1}^{k}{a_i}} , {a_j}\right\rangle = 1 \Rightarrow \sum _{i=1}^{k}{a_i} \in \partial {\left\| a_j\right\| } \Rightarrow \sum _{i=1}^{k}{a_i} - a_j \in T_{a_j}^{\bot }, \; {\left\| \sum _{i=1}^{k}{a_i} - a_j\right\| }^*\le 1. \end{aligned}$$

Now the claim follows from Lemma 4.

Let \(l=| \{\eta _i|\eta _i \ne 0\}|\). If \(l = 0\), the statement is trivially true. Suppose the statement is true when \(l=l'-1\) for some \(l' \in \{1,\ldots ,n\}\) and consider the case where \(l=l'\). Suppose that \(|\eta _j| = \max _{i}{|\eta _i|}\). By proper normalization we can assume that \(\eta _j = 1\). Let \(y = \sum _{i\ne j}{\eta _ia_{i}}\). We can deduce the following properties for y:

$$\begin{aligned}&\forall i\ne j: \, a_i \in T_{a_j}^{\bot } \Rightarrow y \in T_{a_j}^{\bot },\\&{\left\| y\right\| }^* = \max _{i\ne j}{|\eta _i|} \le 1. \end{aligned}$$

By the decomposability assumption, \(\sum _{i=1}^{k}{\eta _ia_{i}}= a_j + y \in \partial {\left\| a_j\right\| }\), so \({\left\| \sum _{i =1}^{k}{\eta _ia_{i}}\right\| }^*\le 1\). Since \(\langle {\sum _{i=1}^{k}{\eta _i a_i}} , {a_j} \rangle = \eta _j = 1\), in fact \({\left\| \sum _{i=1}^{k}{\eta _ia_{i}}\right\| }^* = 1\). \(\square \)
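To make Theorem 1 concrete, consider the standard \(\ell _1\) example (well known, e.g., from the decomposable-regularizer literature [27]; we recall it only for illustration). Here \(\mathcal {G}_{{\left\| \cdot \right\| }} = \{\pm e_1, \ldots , \pm e_n\}\), and a vector x with support S has the orthogonal representation

$$\begin{aligned} x = \sum _{i \in S} |x_i| \big (\mathrm{sgn}(x_i)\, e_i\big ), \qquad \bigcap _{i \in S} T_{\mathrm{sgn}(x_i) e_i}^{\bot } = \{v \;|\; \mathrm{supp}(v) \cap S = \emptyset \}, \end{aligned}$$

so (42) reduces to the familiar \(\partial {\left\| x\right\| }_1 = \{\mathrm{sgn}(x) + v \;|\; \mathrm{supp}(v) \cap S = \emptyset ,\; {\left\| v\right\| }_{\infty } \le 1\}\).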

Remark 1

Let \(x = \sum _{i=1}^{K(x)}\gamma _i a_i\). Since \({T^{\bot }_{x}} = \bigcap _{i=1}^{K(x)}{T_{a_i}^{\bot }}\), a more general version of Lemma 4 holds:

Lemma 5

Suppose \(x \in \mathbb {R}^n\) and \(y \in T^{\perp }_{x} - \{0\}\). If \(z \in \mathcal {B}_{{\left\| \cdot \right\| }}\) is such that \({\left\| y\right\| }^* = \langle {y} , {z} \rangle \), then \(z \in T^{\perp }_{x}\).

We state and prove a dual version of Lemma 5, which will be used in the proofs of Lemmas 1 and 2.

Lemma 6

Let \(x \in \mathbb {R}^n\). If \(y \in T^{\perp }_{x}\), then there exists \(z \in T_{x}^{\bot } \cap \mathcal {B}_{{\left\| \cdot \right\| }^*}\) such that \({\left\| y\right\| } = \langle {y} , {z} \rangle \).

Proof

If \(y = 0\), then the lemma is trivially true. If \(y \ne 0\), then:

$$\begin{aligned} \frac{y}{{\left\| y\right\| }} \in T_{x}^{\bot } \cap \{w\;|\; {\left\| w\right\| } = 1\} \Rightarrow \exists z \in T_{x}^{\bot } \text { such that } \frac{y}{{\left\| y\right\| }} \in \mathop {{{\mathrm{\arg \max }}}}\limits _{a \in T_{x}^{\bot } \cap \mathcal {B}_{{\left\| \cdot \right\| }}}{\langle {a} , {z} \rangle }. \end{aligned}$$

Therefore, by Lemma 5, we get

$$\begin{aligned} {\left\| z\right\| }^* = \max _{a \in T_{x}^{\bot } \cap \mathcal {G}_{{\left\| \cdot \right\| }}}{\langle {a} , {z} \rangle } \le \langle {\frac{y}{{\left\| y\right\| }}} , {z} \rangle \le {\left\| z\right\| }^* \Rightarrow \left\langle {\frac{y}{{\left\| y\right\| }}} , {z}\right\rangle = {\left\| z\right\| }^* \Rightarrow \left\langle {y} , {\frac{z}{{\left\| z\right\| }^*}}\right\rangle ={\left\| y\right\| } \end{aligned}$$

\(\square \)

1.2 Proof of Theorem 2

First, we introduce a lemma.

Lemma 7

Let \(\{a_1,\ldots , a_k\}\) be an orthogonal subset of \(\mathcal {G}_{{\left\| \cdot \right\| }}\) that satisfies II in Theorem 1, and let \(y = \sum _{i=1}^{k}\beta _i a_i\) with \(\beta _i \in \mathbb {R}\) for all i. Then

$$\begin{aligned} K(y) = |\{i\;|\; \beta _i \ne 0\}|. \end{aligned}$$

Proof

Let \(k' = |\{i\;|\; \beta _i \ne 0\}|\). Without loss of generality assume that \(\beta _i \ne 0\) for \(i \le k'\) and \(\beta _i = 0\) for \(i > k'\). Let \(\eta _i = \mathrm{sgn}(\beta _i)\) and \(a'_i = \mathrm{sgn}(\beta _i) a_i\) for all \(i \le k'\). Since \(a_1, \ldots , a_k\) satisfy Condition II in the orthogonal representation theorem, so do \(a'_1,\ldots , a'_{k'}\).

Now we show that y and \(a'_1,\ldots , a'_{k'}\) satisfy Condition I. By (11), \({\left\| \sum _{i=1}^{k'} a'_i\right\| }^* \le 1\). Therefore,

$$\begin{aligned}&{\left\| y\right\| } \ge \left\langle {\sum _{i=1}^{k'} a'_i} , {y}\right\rangle = \sum _{i=1}^{k'} |\beta _i|, \quad {\left\| y\right\| } = {\left\| \sum _{i=1}^{k'} \beta _i a'_i\right\| } \le \sum _{i=1}^{k'} |\beta _i| \quad \Rightarrow \quad {\left\| y\right\| } = \sum _{i=1}^{k'} |\beta _i|. \end{aligned}$$

Therefore, by the orthogonal representation theorem, \(e_y = \sum _{i=1}^{k'} a'_i\). Thus \(K(y) = {\left\| e_y\right\| }_2^2 = k'\). \(\square \)
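For orientation, here are the two standard instances of K (again recalled for illustration, not restated from the paper): for the \(\ell _1\) norm, \(e_x = \mathrm{sgn}(x)\) and

$$\begin{aligned} K(x) = {\left\| \mathrm{sgn}(x)\right\| }_2^2 = |\mathrm{supp}(x)|, \end{aligned}$$

while for the nuclear norm with compact SVD \(X = U\Sigma V^T\), \(e_X = UV^T\) and \(K(X) = {\left\| UV^T\right\| }_F^2 = \mathrm{rank}(X)\).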

For any \(x \in \mathbb {R}^n - \{0\}\) define

$$\begin{aligned} l(x) = \min \left\{ l \; \Big | \; x = \sum _{i=1}^{l} \alpha _i b_i, \; \{b_1, \ldots , b_l\} \subseteq \mathcal {G}_{{\left\| \cdot \right\| }}, \; \alpha _i \in \mathbb {R} \right\} . \end{aligned}$$

Define \(l(0) = 0\). Now the proof is a simple consequence of the following lemma:

Lemma 8

For all \(x \in \mathbb {R}^n\), \(l(x) = K(x)\).

Proof

\(K(x) \ge l(x)\) by the definition of l(x). We prove that \(K(x) = l(x)\) by induction on K(x). When \(K(x) \in \{0,1\}\), the statement is trivially true. Suppose the statement is true when \(K(x) \in \{0,1,2,\ldots ,k-1\}\). Consider the case where \(K(x) = k\). By way of contradiction, suppose \(l(x) < K(x)\). Let

$$\begin{aligned} x = \sum _{i=1}^{k} \gamma _i a_i, \end{aligned}$$
(50)

where \(\gamma _1, \ldots , \gamma _k\) and \(a_1,\ldots , a_k\) are given by the orthogonal representation theorem. If \(l(x) = 1\), then:

$$\begin{aligned} \sum _{i=1}^{k} \gamma _i a_i = \alpha _1 b_1, \end{aligned}$$

for some \(\alpha _1 \ne 0\) and \(b_1 \in \mathcal {G}_{{\left\| \cdot \right\| }}\). Since \(|\alpha _1|= {\left\| \alpha _1 b_1\right\| } = {\left\| x\right\| } = \sum _{i=1}^{k} \gamma _i \), either \(b_1\) or \(-b_1\) can be written as a convex combination of \(a_1, \ldots , a_k\), which contradicts the fact that \(b_1\in \mathcal {G}_{{\left\| \cdot \right\| }}\).

If \(l(x) = l > 1\), we can write x as:

$$\begin{aligned} x = \sum _{i=1}^{l} \alpha _i b_i, \end{aligned}$$
(51)

with \(\{b_1, \ldots , b_l\} \subseteq \mathcal {G}_{{\left\| \cdot \right\| }}\). By replacing \(b_i\) with \(-b_i\) where needed, we may assume without loss of generality that \(\alpha _i > 0\) for all i. Let \(u = 2 \alpha _1 b_1 \) and \(v = 2 \sum _{i=2}^{l} \alpha _i b_i\), and note that \(x = (u+v)/2\). Let \(C = \mathrm{Cone}{\{a_1, a_2,\ldots , a_k\}}\), and let \(\mathrm{int}\, C\) and \(\mathrm{bd}\, C\) denote the interior and the boundary of C, respectively. Note that \(u \notin \mathrm{int}\, C\): by Lemma 7, \(u \in \mathrm{int}\, C\) would give \(K(u) = k > 1\), whereas \(l(u) = 1\) forces \(K(u) = 1\). Now we consider two cases for v.

  1. Case 1.

    If \(v \in \mathrm{int C}\), then we can write v as a conic combination of \(a_1, a_2, \ldots , a_k\) with positive coefficients:

    $$\begin{aligned} v = 2 \sum _{i=2}^{l} \alpha _i b_i = \sum _{i=1}^{k} c_i a_i, \end{aligned}$$

    where \(c_i > 0\) for all i.

  2. Case 2.

    If \(v \notin \mathrm{int}\, C\), let \(L = \{\theta u + (1-\theta ) v \; | \; \theta \in [0,1]\}\). Since L intersects the interior of C at x and neither u nor v lies in \(\mathrm{int}\, C\), there exist \(u',v'\) such that \(L \cap \mathrm{bd}\, C = \{u',v'\}\). Suppose \(v'\) is on the line segment between v and x (see Fig. 5). Let \(L'= \{\theta u + (1-\theta ) v' \; | \; \theta \in [0,1]\}\) and note that \(x \in L'\). Since \(v' \in \mathrm{bd}\, C\), it can be written as a conic combination of at most \(k-1\) of \(a_1, \ldots , a_k\). Without loss of generality assume that \(v' = \sum _{i=2}^{k}{\beta _i a_i}\). For some \(\theta \in (0,1)\):

    $$\begin{aligned} x = \theta u + (1-\theta ) v' = \alpha _1' b_1 + \sum _{i=2}^{k}{\beta _i' a_i}, \end{aligned}$$

    where \(\alpha _1' = 2 \theta \alpha _1\) and \(\beta '_i = (1 -\theta ) \beta _i\). Using the representation in (50), we get:

    $$\begin{aligned} \alpha _1' b_1&= \gamma _1 a_1 + \sum _{i=2}^{k} (\gamma _i - \beta _i') a_i. \end{aligned}$$

    We have \(l(\alpha _1' b_1) = 1\), and by Lemma 7, \(K(\alpha _1' b_1) = 1 + |\{i | \gamma _i \ne \beta _i', i = 2, \ldots , k \}|\). Therefore, \(\gamma _i = \beta _i'\) for all \(i = 2,\ldots ,k\) and \(b_1 = a_1\). Combining the previous fact with (50) and (51), we get:

    $$\begin{aligned} x - \alpha _1 a_1 = (\gamma _1 - \alpha _1) a_1 + \sum _{i=2}^{k} \gamma _i a_i = \sum _{i=2}^{l} \alpha _i b_i. \end{aligned}$$
    (52)

    If \(\gamma _{1} = \alpha _1\), the induction hypothesis gives \(k = l\), which is a contradiction. Now, suppose \(\gamma _1 - \alpha _1 \ne 0\). In both cases we have produced a point y (namely v in Case 1, and \(x - \alpha _1 a_1\) in Case 2) such that \(K(y) = k\) and \(l(y) \le l-1\). We can continue this procedure until we get a y such that \(K(y) = k\) and \(l(y) = 1\), which gives us the contradiction. \(\square \)

Fig. 5: Relative position of \(u'\) and \(v'\) on the line segment between u and v

1.3 Proof of Proposition 2

In iteration \(t+1\), when the backtracking procedure stops, the following inequality holds:

$$\begin{aligned} \phi _{\lambda }(x^{(t+1)})\le m_{M_{t+1}}(x^{(t)},x^{(t+1)}) =&\min _{x}f(x^{(t)}) + \langle \nabla f(x^{(t)}) , x-x^{(t)} \rangle \nonumber \\&+ \frac{M_{t+1}}{2} {\left\| x-x^{(t)}\right\| }_2^2 + \lambda {\left\| x\right\| }\nonumber \\ \le&\min _{x} \phi _{\lambda }(x) + \frac{M_{t+1}}{2} {\left\| x-x^{(t)}\right\| }_2^2. \end{aligned}$$
(53)

On the other hand, by (19), we have

$$\begin{aligned} \phi _{\lambda }(x^{(t+1)})\le m_{L_f}(x^{(t)},x^{(t+1)}), \end{aligned}$$

which ensures \(M_{t+1} \le \gamma _\mathrm{inc} L_f\), since \(m_{L}(x^{(t)},x^{(t+1)})\) is non-decreasing in L. By (17), we have:

$$\begin{aligned} \phi _{\lambda }(x^{(t)}) \ge \phi _{\lambda }(x^{*}) + \frac{\mu _f}{2} {\left\| x^{(t)}-x^{*}\right\| }_2^2. \end{aligned}$$
(54)

If we confine x to \(\{\alpha x^{*}+(1-\alpha )x^{(t)}\,|\, 0 \le \alpha \le 1\}\), inequality (53) combined with (54) results in

$$\begin{aligned} \phi _{\lambda }(x^{(t+1)})&\le \min _{\alpha \in [0 ,1]}\left\{ {\phi _{\lambda }(\alpha x^{*}+(1-\alpha )x^{(t)})+\frac{\alpha ^2 M_{t+1}}{2}{{\left\| x^{(t)}-x^{*}\right\| }_2^2}}\right\} \\&\le \min _{\alpha \in [0 ,1]}\left\{ {\alpha \phi _{\lambda }(x^{*})+(1-\alpha )\phi _{\lambda }(x^{(t)})+\frac{\alpha ^2 M_{t+1}}{2}{{\left\| x^{(t)}-x^{*}\right\| }_2^2}}\right\} \\&\le \min _{\alpha \in [0 ,1]}\left\{ {\alpha \phi _{\lambda }(x^{*})+(1-\alpha )\phi _{\lambda }(x^{(t)})+\frac{\alpha ^2 \gamma _\mathrm{inc} L_f}{ \mu _f}(\phi _{\lambda }(x^{(t)})-\phi _{\lambda }(x^{*}))}\right\} . \end{aligned}$$

The right-hand side of the above inequality is minimized at \(\alpha ^* = \min \{1,\frac{\mu _f}{2 \gamma _\mathrm{inc} L_f}\}\). Therefore, we get

$$\begin{aligned} \phi _{\lambda }(x^{(t+1)}) - \phi _{\lambda }(x^{*})\le & {} \left( 1-\alpha ^* + \frac{{\alpha ^*}^2 \gamma _\mathrm{inc} L_f}{\mu _f}\right) (\phi _{\lambda }(x^{(t)})-\phi _{\lambda }(x^{*}))\\\le & {} \left( 1-\frac{\mu _f}{4 \gamma _\mathrm{inc} L_f}\right) (\phi _{\lambda }(x^{(t)})-\phi _{\lambda }(x^{*})). \end{aligned}$$

To prove (23), we note that the backtracking stopping criterion ensures

$$\begin{aligned} \phi _{\lambda }(x^{(t+1)})\,\le \,&f(x^{(t)}) + \big \langle \nabla f(x^{(t)}) , x^{(t+1)}-x^{(t)} \big \rangle \nonumber \\&+\,\frac{M_{t+1}}{2} {\left\| x^{(t+1)}-x^{(t)}\right\| }_2^2 + \lambda {\left\| x^{(t+1)}\right\| }\nonumber \\ \le&f(x^{(t)}) - \big \langle M_{t+1} (x^{(t+1)}-x^{(t)}) \nonumber \\&+\,\xi , x^{(t+1)}-x^{(t)} \big \rangle + \frac{M_{t+1}}{2} {\left\| x^{(t+1)}-x^{(t)}\right\| }_2^2 + \lambda {\left\| x^{(t+1)}\right\| }\nonumber \\ \le&{f(x^{(t)}) - \frac{M_{t+1}}{2} {\left\| x^{(t+1)}-x^{(t)}\right\| }_2^2 +\big \langle \xi , x^{(t)} - x^{(t+1)} \big \rangle +\lambda {\left\| x^{(t+1)}\right\| }}\nonumber \\ \le \,&\phi _{\lambda }(x^{(t)}) - \frac{M_{t+1}}{2} {\left\| x^{(t+1)}-x^{(t)}\right\| }_2^2. \end{aligned}$$
(55)

The hypothesis (18) ensures \(M_{t+1} \ge \mu _f\). Combining (16) and (55), and using the lower and upper bounds on \(M_{t+1}\), we get the desired result:

$$\begin{aligned} \omega _{\lambda }(x^{(t+1)})&\le {\left\| M_{t+1} (x^{(t)} - x^{(t+1)}) +\nabla f(x^{(t+1)})- \nabla f(x^{(t)})\right\| }^*\\&\le \theta ({M_{t+1}} + {L'_f}) {\left\| x^{(t+1)} - x^{(t)}\right\| }_2 \\&\le \theta \left( 1 + \frac{L'_f}{M_{t+1}}\right) \sqrt{2 M_{t+1}(\phi _{\lambda }(x^{(t)})-\phi _{\lambda }(x^{(t+1)}))} \\&\le \theta \left( 1+\frac{L'_f}{\mu _f}\right) \sqrt{ 2 \gamma _\mathrm{inc}L_f (\phi _{\lambda }(x^{(t)})-\phi _{\lambda }(x^*)) }. \end{aligned}$$
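To see the backtracking rule used in this proof in executable form, here is a minimal sketch of one step for the \(\ell _1\) special case (the names and the simple doubling rule are ours; the acceptance test mirrors the majorization inequality behind (53), not the paper's full homotopy method):

    import numpy as np

    def soft_threshold(v, t):
        # prox of t*||.||_1, the l1 instance of the proximal step
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def prox_grad_step(A, b, x, lam, M, gamma_inc=2.0):
        # One backtracking proximal-gradient step for
        #   phi_lam(x) = 0.5*||Ax - b||_2^2 + lam*||x||_1.
        # M is increased until the quadratic model majorizes f at the
        # candidate point, which is exactly the stopping test in (53).
        f = lambda z: 0.5 * np.linalg.norm(A @ z - b) ** 2
        g = A.T @ (A @ x - b)                 # gradient of the smooth part
        while True:
            x_new = soft_threshold(x - g / M, lam / M)
            d = x_new - x
            if f(x_new) <= f(x) + g @ d + 0.5 * M * (d @ d):
                return x_new, M               # majorization holds: accept
            M *= gamma_inc                    # otherwise backtrack (increase M)

Iterating prox_grad_step at a fixed \(\lambda \), and then decreasing \(\lambda \) in stages, gives a toy version of the homotopy scheme whose inner loop Proposition 2 analyzes.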

1.4 Proof of Lemma 1

By the hypothesis there exists \(\xi \in \partial {\left\| x\right\| }\) such that \({\left\| A^*(A x-b)+\lambda \xi \right\| }^* \le \delta \lambda \). Therefore, we can write

$$\begin{aligned} \delta \lambda {\left\| x-x_{0}\right\| }&\ge {\left\| x-x_{0}\right\| } {\left\| A^*(A x-b)+\lambda \xi \right\| }^* \ge \langle (x-x_{0}) , A^*(A x-b)+\lambda \xi \rangle \nonumber \\&= \langle (x-x_{0}) , A^*(A(x-x_{0}))-A^* z+\lambda \xi \rangle \nonumber \\&= {\left\| A(x-x_{0})\right\| }_2^2 - \langle x-x_{0} , A^* z \rangle +\lambda \langle x-x_{0} , \xi \rangle \nonumber \\&\ge {\left\| A(x-x_{0})\right\| }_2^2 - {\left\| x-x_{0}\right\| } {\left\| A^* z\right\| }^* +\lambda ({\left\| x\right\| } - {\left\| x_{0}\right\| }). \end{aligned}$$
(56)

Now we lower-bound \({\left\| x\right\| }\):

$$\begin{aligned} {\left\| x\right\| } = {\left\| x-x_0+x_0\right\| } \ge {\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})+x_{0}\right\| } - {\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }. \end{aligned}$$

By Lemma 6, there exists \(s \in T_{x_{0}}^{\bot }\) such that \(\langle {s} , {P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})} \rangle = {\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| }\) and \({\left\| s\right\| }^* = 1\). Note that \(e_{x_0}+s \in \partial {{\left\| x_{0}\right\| }}\) hence \({\left\| e_{x_0}+s\right\| }^* \le 1\). Therefore, we get:

$$\begin{aligned} {\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})+x_{0}\right\| }\ge & {} \langle {e_{x_0}+s} , {P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})+x_{0}} \rangle \ge {\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| }+{\left\| x_{0}\right\| },\nonumber \\ {\left\| x\right\| } - {\left\| x_{0}\right\| }\ge & {} {\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| } - {\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| } . \end{aligned}$$
(57)

Combining (57) and (56), we get

$$\begin{aligned} \delta \lambda {\left\| x-x_{0}\right\| } \ge \lambda \big ({\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| } - {\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }\big ) - {\left\| x-x_{0}\right\| } {\left\| A^* z\right\| }^* + {\left\| A(x-x_{0})\right\| }_2^2. \end{aligned}$$

By applying the triangle inequality to \({\left\| x-x_0\right\| }\), we obtain

$$\begin{aligned}&(\lambda (1+ \delta ) +{\left\| A^* z\right\| }^*){\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }\ge (\lambda (1-\delta ) \nonumber \\&\quad -{\left\| A^* z\right\| }^*){\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| }+{\left\| A(x-x_{0})\right\| }_2^2. \end{aligned}$$
(58)

That yields

$$\begin{aligned} \frac{{\left\| x-x_0\right\| }}{{\left\| x-x_0\right\| }_2}&\le \frac{{\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }+{\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| }}{{\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }_2}\\&\le (1+\gamma ) \frac{{\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }}{{\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }_2}\le (1+\gamma )\sqrt{ck_{0}}. \end{aligned}$$
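The constant \(\gamma \) used in the last step is exactly the ratio of the two coefficients in (58) (our arithmetic, making the value quoted before Proposition 4 explicit): dropping the nonnegative term \({\left\| A(x-x_{0})\right\| }_2^2\) in (58) and using \({\left\| A^* z\right\| }^* \le \lambda /4\), which holds once \(\lambda \ge \lambda _\mathrm{tgt} = 4 {\left\| A^* z\right\| }^*\), gives

$$\begin{aligned} {\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| } \le \frac{\lambda (1+\delta ) + {\left\| A^* z\right\| }^*}{\lambda (1-\delta ) - {\left\| A^* z\right\| }^*}\, {\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| } \le \frac{5+4\delta }{3-4\delta }\, {\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| } = \gamma \, {\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }. \end{aligned}$$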

Using the definition of the lower restricted isometry constant, we derive

$$\begin{aligned}&\rho _{-}(A,{ c(1+\gamma )^2 k_{0}}){\left\| x-x_{0}\right\| }_2^2 \le {\left\| A(x-x_{0})\right\| }_2^2 \\&\quad \le \,{((1+\delta )\lambda + {\left\| A^* z\right\| }^*)}{\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| } \\&\quad \le \,{\sqrt{ck_{0}}((1+\delta )\lambda + {\left\| A^* z\right\| }^*)}{\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }_2\\&\quad \le \,{\sqrt{ck_{0}}((1+\delta )\lambda + {\left\| A^* z\right\| }^*)}{\left\| {x-x_{0}}\right\| }_2, \end{aligned}$$

which yields the following bounds

$$\begin{aligned} {\left\| x-x_{0}\right\| }_2 \le&\frac{{\sqrt{ck_{0}}((1+\delta )\lambda + {\left\| A^* z\right\| }^*)}}{\rho _{-}(A,{ c(1+\gamma )^2 k_{0}})}, \end{aligned}$$
(59)
$$\begin{aligned} {\left\| x-x_{0}\right\| } \le&\frac{{ck_{0}(1+\gamma )((1+\delta )\lambda + {\left\| A^* z\right\| }^*)}}{\rho _{-}(A,{ c(1+\gamma )^2 k_{0}})}. \end{aligned}$$
(60)

By convexity of \(\phi _{\lambda }\),

$$\begin{aligned} \phi _{\lambda }(x)-\phi _{\lambda }(x_0)&\le \langle {\lambda \xi +A^*(A x-b)} , {x-x_0} \rangle \nonumber \\&\le \,\frac{{ck_{0}\delta \lambda (1+\gamma )((1+\delta )\lambda + {\left\| A^* z\right\| }^*)}}{\rho _{-}(A,{ c(1+\gamma )^2 k_{0}})}. \end{aligned}$$

1.5 Proof of Lemma 2

Let \({\varDelta }=\frac{3 ck_{0} \lambda (1+\gamma )}{2 \rho _{-}(A,{ c(1+\gamma )^2 k_{0}})}\). We can write

$$\begin{aligned} \phi _{\lambda }(x)&\le \phi _{\lambda }(x_{0})+ \delta \lambda {\varDelta }\nonumber \\ \Rightarrow \quad \frac{1}{2} {\left\| A x-b\right\| }_2^2 -\frac{1}{2} {\left\| Ax_{0}-b\right\| }_2^2&\le \lambda ({\left\| x_{0}\right\| } - {\left\| x\right\| }) + \delta \lambda {\varDelta }\nonumber \\&\le \lambda {\left\| x_{0}-x\right\| }+ \delta \lambda {\varDelta }\end{aligned}$$
(61)

If \({\left\| x-x_{0}\right\| } \le {\varDelta }\), half of the conclusion is immediate. To get the second half, we expand the left-hand side of (61) and use \({\left\| A^*z\right\| }^* \le \lambda /4\) together with \(\delta \le 1/4\) to get:

$$\begin{aligned} \frac{1}{2} {\left\| A(x-x_{0})\right\| }_2^2&\le \lambda {\left\| x-x_0\right\| } +\langle x-x_{0} , A^* z\rangle + \delta \lambda {\varDelta }\\&\le (\lambda +{\left\| A^*z\right\| }^*) {\left\| x-x_0\right\| }+ \delta \lambda {\varDelta }\\&\le \left( \frac{5}{4}+\delta \right) \lambda {\varDelta }\le \lambda \frac{3 {\varDelta }}{2}. \end{aligned}$$

Suppose \({\left\| x-x_{0}\right\| } > {\varDelta }\); then from (61) we get:

$$\begin{aligned} \lambda ({\left\| x\right\| }-{\left\| x_{0}\right\| })&\le \frac{1}{2} {\left\| A x_{0}-b\right\| }_2^2 - \frac{1}{2} {\left\| A x-b\right\| }_2^2 + \delta \lambda {\left\| x-x_{0}\right\| }\\&\le -\frac{1}{2} {\left\| A(x-x_{0})\right\| }_2^2 + \langle x-x_{0} , A^* z\rangle + \delta \lambda {\left\| x-x_{0}\right\| }\\&\le -\frac{1}{2} {\left\| A(x-x_{0})\right\| }_2^2+{\left\| A^* z\right\| }^*{\left\| x-x_{0}\right\| } + \delta \lambda {\left\| x-x_{0}\right\| }. \end{aligned}$$

By using (57) and the triangle inequality, we get:

$$\begin{aligned}&(\lambda (1+ \delta ) +{\left\| A^* z\right\| }^*){\left\| P_{{T}_{x_{0}}}(x-x_{0})\right\| }\ge (\lambda (1-\delta ') \\&\quad -{\left\| A^* z\right\| }^* ){\left\| P_{{{T}_{x_{0}}}^{\bot }}(x-x_{0})\right\| }+\frac{1}{2}{\left\| A(x-x_{0})\right\| }_2^2 . \end{aligned}$$

Using the same reasoning as in the proof of Lemma 1, we get the desired results.

1.6 Proof of Lemma 3

By the first-order optimality condition, there exists \(\xi \in \partial {\left\| x^{+}\right\| }\) such that:

$$\begin{aligned} \lambda \xi&= L(x-x^{+}) - \nabla f(x) \\&= L(x-x^{+}) - A^*(A x-b) \\&= L(x-x^{+}) - A^*(A (x-x_0)) + A^* z \end{aligned}$$

Note that \(\xi = e_{x^{+}} + v\) for some \(v \in T_{x^{+}}^{\bot }\). By Lemma 6, there exists \(v' \in T_{x^{+}}^{\bot } \cap \mathcal {B}_{{\left\| \cdot \right\| }^*}\) such that \(\langle {v'} , {v} \rangle = {\left\| v\right\| }\). Since \(e_{x^+} + v' \in \partial {\left\| x^{+}\right\| }\), \({\left\| e_{x^+} + v'\right\| }^* \le 1\). Therefore, we can write:

$$\begin{aligned} {\left\| \xi \right\| } =&{\left\| e_{x^+} + v\right\| } \ge \langle {e_{x^+} + v'} , {e_{x^+} + v} \rangle = {\left\| e_{x^+}\right\| } + {\left\| v\right\| }\\ \Rightarrow K(x^{+}) =&{\left\| e_{x^+}\right\| } \le {\left\| \xi \right\| }. \end{aligned}$$

Let \(\xi = \sum _{i=1}^{l}\gamma _i a_i\), where \(a_1,\ldots ,a_{l}\) and \(\gamma _1, \ldots , \gamma _{l}\) are given by the orthogonal representation theorem. Since \(\gamma _i \le 1\) for all i, \(l \ge {\left\| \xi \right\| }\). If \({\left\| \xi \right\| } > {\tilde{k}}\), we can define \(u = \sum _{i=1}^{{\tilde{k}}}{a_i}\), then

$$\begin{aligned} {\tilde{k}} \lambda \le \langle {u} , {\lambda \xi } \rangle= & {} \langle {u} , {L(x^{+}-x)} \rangle - \langle {A u} , {A(x-x_0)} \rangle + \langle {u} , {A^*z} \rangle \nonumber \\\le & {} L {\left\| x^{+}-x\right\| } + \sqrt{\rho _{+}(A,{\tilde{k}}) {\tilde{k}}} {\left\| A(x-x_0)\right\| }_2+ {\tilde{k}} {\left\| A^* z\right\| }^*\nonumber \\\Rightarrow & {} \frac{3{\tilde{k}}\lambda }{4} \le L {\left\| x^{+}-x\right\| } + \sqrt{\rho _{+}(A,{\tilde{k}}) {\tilde{k}}} {\left\| A(x-x_0)\right\| }_2. \end{aligned}$$
(62)

Since \(\phi _{\lambda }(x^{+}) \le \phi _{\lambda }{(x)}\), by Lemma 2, we have:

$$\begin{aligned} {\left\| x^{+}-x\right\| } \le {\left\| x^{+}-x_0\right\| }+{\left\| x-x_0\right\| } \le \frac{9 ck_{0} \lambda (1+\gamma )}{ \rho _{-}(A,{ c(1+\gamma )^2 k_{0}})},\\ {\left\| A(x-x_0)\right\| }_2^2 \le \frac{9 ck_{0} \lambda ^2 (1+\gamma )}{ \rho _{-}(A,{ c(1+\gamma )^2 k_{0}})}. \end{aligned}$$

Define

$$\begin{aligned} \alpha&= \gamma _\mathrm{inc} \rho _{+}\big (A,2{\tilde{k}}\big )\frac{9 ck_{0} (1+\gamma )}{ \rho _{-}(A,{ c(1+\gamma )^2 k_{0}})},\\ \beta ^2&= \rho _{+}(A,{\tilde{k}})\frac{9 ck_{0} (1+\gamma )}{ \rho _{-}(A,{ c(1+\gamma )^2 k_{0}})}. \end{aligned}$$

We can rewrite (62) as:

$$\begin{aligned} \frac{3{\tilde{k}}}{4} -\alpha -\beta \sqrt{{\tilde{k}}}< 0 \Rightarrow \sqrt{{\tilde{k}}} < \frac{2}{3} \left( \beta + \sqrt{\beta ^2 + 3 \alpha }\right) \le 2 \sqrt{\alpha }, \end{aligned}$$

where the last inequality uses \(\beta ^2 \le \alpha \) (which holds since \(\rho _{+}(A,{\tilde{k}}) \le \rho _{+}(A,2{\tilde{k}})\) and \(\gamma _\mathrm{inc} \ge 1\)), so that \(\beta ^2 + 3\alpha \le 4\alpha \). But this contradicts Assumption 1, so \({\left\| \xi \right\| } \le {\tilde{k}}\) and hence \(K(x^{+}) \le {\tilde{k}}\).


Cite this article

Eghbali, R., Fazel, M. Decomposable norm minimization with proximal-gradient homotopy algorithm. Comput Optim Appl 66, 345–381 (2017). https://doi.org/10.1007/s10589-016-9871-8
