
Recovering Structured Signals in Noise: Least-Squares Meets Compressed Sensing

Chapter in Compressed Sensing and its Applications

Abstract

The typical scenario that arises in most “big data” problems is one where the ambient dimension of the signal is very large (e.g., high-resolution images, gene expression data from a DNA microarray, social network data, etc.), yet the desired information lies in some low-dimensional structure (sparsity, low-rankness, clusters, etc.). In the modern viewpoint, the goal is to devise efficient algorithms that reveal these structures and for which, under suitable conditions, one can give theoretical guarantees. We specifically consider the problem of recovering such a structured signal (sparse, low-rank, block-sparse, etc.) from noisy compressed measurements. A general algorithm for such problems, commonly referred to as the generalized LASSO, attempts to solve this problem by minimizing a least-squares cost with an added “structure-inducing” regularization term (ℓ1 norm, nuclear norm, mixed ℓ2/ℓ1 norm, etc.). While the LASSO algorithm has been around for 20 years and has enjoyed great success in practice, there has been relatively little analysis of its performance. In this chapter, we will provide a full performance analysis and compute, in closed form, the mean-square error of the reconstructed signal. We will highlight some of the mathematical vignettes necessary for the analysis, make connections to noiseless compressed sensing and proximal denoising, and will emphasize the central role of the “statistical dimension” of a structured signal.


Notes

1.

    Assuming that the entries of the noise vector z are i.i.d., it is well known that a sensible choice of τ in (4.8) must scale with the standard deviation σ of the noise components [6, 12, 42]. On the other hand, (4.7) eliminates the need to know or to pre-estimate σ [4].

2.

C-LASSO in (4.17) stands for “Constrained LASSO.” The algorithm assumes a priori knowledge of f(x_0).

3.

In the statistics literature, the variant of the LASSO algorithm in (4.7) is mostly known as the “square-root LASSO” [4]. For the purposes of our presentation, we stick to the more compact term “ℓ2-LASSO” [47].

4.

The statements in this section hold true with high probability in A, z and under mild assumptions. See Section 4.5 for the formal statement of the results.

5.

    The formula below is subject to some simplifications meant to highlight the essential structure. See Section 4.6 for the details.

6.

It is conjectured in [60] that the factor of 2 in (4.23) is not essential and only appears as an artifact of the proof technique therein. See also Section 4.6.2.1.

7.

The tools used in [1] and [57] differ. Amelunxen et al. [1] use tools from conic integral geometry, while Stojnic [57] relies on a comparison lemma for Gaussian processes (see Lemma 3).

8.

    In [14], \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) shows up indirectly via a closely related notion, that of the “Gaussian width” [34] of the restricted tangent cone \(\mathcal{T}_{f}(\mathbf{x}_{0}) \cap \mathcal{S}^{n-1}\). In the terminology used in [1, 40], \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) corresponds to the “statistical dimension” of \(\mathcal{T}_{f}(\mathbf{x}_{0}) = \left (\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))\right )^{\circ }\).

9.

    When referring to [47] keep in mind the following: a) in [47] the entries of A have variance 1 and not 1∕m as here, b) [47] uses slightly different notation for \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) (\(\mathbf{D}_{f}(\mathbf{x}_{0},\lambda )\) and \(\mathbf{D}_{f}(\mathbf{x}_{0},\mathbf{R}^{+})\), respectively).

10.

    We follow this convention throughout: use the symbol “\(\ \tilde{}\)” over variables that are associated with the approximated problems. To distinguish, use the symbol “\(\ \hat{}\)” for the variables associated with the original problem.

11.

Observe that the dependence of η and γ on λ, m, and ∂f(x_0) is implicit in this definition.

12.

    Precisely: assuming m ≈ m − 1 and ignoring the t’s in the bound of Theorem 4.

13.

    It is conjectured in [60] and supported by simulations (e.g., Figure 4.7) that the factor of 2 in Theorem 5 is an artifact of the proof technique and not essential.

    Fig. 4.7

    Figure 4.7 illustrates the bound of Theorem 5, which is given in red for n = 340, m = 140, k = 10 and for A having \(\mathcal{N}(0, \frac{1} {m})\) entries. The upper bound of Theorem 2, which is asymptotic in m and only applies to i.i.d. Gaussian z, is given in black. In our simulations, we assume x_0 is a random unit-norm vector over its support and consider both i.i.d. \(\mathcal{N}(0,\sigma ^{2})\) and non-Gaussian noise vectors z. We have plotted the realizations of the normalized error for different values of λ and σ. As noted, the bound of Theorem 2 is occasionally violated, since it requires very large m as well as i.i.d. Gaussian noise. On the other hand, the bound of Theorem 5 always holds.

14.

    For proofs of those claims, see Section 8 and in particular Lemma 8.1 in [47].

15.

    We say \(\mathbf{x}_{0} \in \mathbb{R}^{n}\) is block-sparse if it can be grouped into t known blocks of size \(b = n/t\) each, so that only k of these t blocks are nonzero. To induce this structure, the standard approach is to use the ℓ1,2 norm, which sums up the ℓ2 norms of the blocks [29, 48, 54, 58]. In particular, denoting the subvector corresponding to the ith block of a vector x by x_i, the ℓ1,2 norm is defined as \(\|\mathbf{x}\|_{1,2} =\sum _{ i=1}^{t}\|\mathbf{x}_{i}\|_{2}\) (a short numerical sketch of this norm follows these notes).
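To make the block notation concrete, here is a minimal numerical sketch of the ℓ1,2 norm (our own illustration, not from the chapter; the function name and the equal-size block layout are assumptions):

```python
# Sketch (illustrative): the l_{1,2} norm of Footnote 15 for a vector x split into
# t known blocks of equal size b = n / t.
import numpy as np

def norm_1_2(x, t):
    blocks = np.reshape(x, (t, -1))                      # t blocks of size n // t
    return float(np.linalg.norm(blocks, axis=1).sum())   # sum of per-block l2 norms
```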

References

1. Amelunxen, D., Lotz, M., McCoy, M.B., Tropp, J.A.: Living on the edge: a geometric theory of phase transitions in convex optimization. arXiv preprint arXiv:1303.6672 (2013)

2. Bayati, M., Montanari, A.: The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57(2), 764–785 (2011)

3. Bayati, M., Montanari, A.: The LASSO risk for Gaussian matrices. IEEE Trans. Inf. Theory 58(4), 1997–2017 (2012)

4. Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)

5. Bertsekas, D., Nedic, A., Ozdaglar, A.: Convex Analysis and Optimization. Athena Scientific (2003)

6. Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)

7. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples, vol. 3. Springer, New York (2010)

8. Cai, J.-F., Xu, W.: Guarantees of total variation minimization for signal recovery. arXiv preprint arXiv:1301.6791 (2013)

9. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)

10. Candès, E., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006)

11. Candès, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)

12. Candès, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)

13. Chandrasekaran, V., Jordan, M.I.: Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. 110(13), E1181–E1190 (2013)

14. Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012)

15. Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41(3), 613–627 (1995)

16. Donoho, D.L.: High-dimensional data analysis: the curses and blessings of dimensionality. Aide-memoire of a lecture at the “AMS Conference on Math Challenges of the 21st Century” (2000)

17. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

18. Donoho, D.L.: High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35(4), 617–652 (2006)

19. Donoho, D.L., Tanner, J.: Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102(27), 9452–9457 (2005)

20. Donoho, D.L., Tanner, J.: Sparse nonnegative solution of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102(27), 9446–9451 (2005)

21. Donoho, D.L., Tanner, J.: Thresholds for the recovery of sparse solutions via ℓ1 minimization. In: The 40th Annual Conference on Information Sciences and Systems, 2006, pp. 202–206. IEEE, New York (2006)

22. Donoho, D., Tanner, J.: Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci. 367(1906), 4273–4293 (2009)

23. Donoho, D.L., Tanner, J.: Precise undersampling theorems. Proc. IEEE 98(6), 913–924 (2010)

24. Donoho, D.L., Elad, M., Temlyakov, V.N.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory 52(1), 6–18 (2006)

25. Donoho, D.L., Maleki, A., Montanari, A.: Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. 106(45), 18914–18919 (2009)

26. Donoho, D.L., Maleki, A., Montanari, A.: The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inf. Theory 57(10), 6920–6941 (2011)

27. Donoho, D., Johnstone, I., Montanari, A.: Accurate prediction of phase transitions in compressed sensing via a connection to minimax denoising. IEEE Trans. Inf. Theory 59(6), 3396–3433 (2013)

28. Donoho, D.L., Gavish, M., Montanari, A.: The phase transition of matrix recovery from Gaussian measurements matches the minimax MSE of matrix denoising. Proc. Natl. Acad. Sci. 110(21), 8405–8410 (2013)

29. Eldar, Y.C., Kuppinger, P., Bolcskei, H.: Block-sparse signals: uncertainty relations and efficient recovery. IEEE Trans. Signal Process. 58(6), 3042–3054 (2010)

30. Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis (2002)

31. Foygel, R., Mackey, L.: Corrupted sensing: novel guarantees for separating structured signals. arXiv preprint arXiv:1305.2524 (2013)

32. Gandy, S., Recht, B., Yamada, I.: Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Prob. 27(2), 025010 (2011)

33. Gordon, Y.: Some inequalities for Gaussian processes and applications. Isr. J. Math. 50(4), 265–289 (1985)

34. Gordon, Y.: On Milman's Inequality and Random Subspaces Which Escape Through a Mesh in \(\mathbb{R}^{n}\). Springer, New York (1988)

35. Härdle, W., Simar, L.: Applied Multivariate Statistical Analysis, vol. 2. Springer, Berlin (2007)

36. Kressner, D., Steinlechner, M., Vandereycken, B.: Low-rank tensor completion by Riemannian optimization. BIT Numer. Math. 54(2), 447–468 (2014)

37. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes, vol. 23. Springer, Berlin (1991)

38. Maleki, M.A.: Approximate Message Passing Algorithms for Compressed Sensing. Stanford University, Stanford (2010)

39. Maleki, A., Anitori, L., Yang, Z., Baraniuk, R.G.: Asymptotic analysis of complex LASSO via complex approximate message passing (CAMP). IEEE Trans. Inf. Theory 59(7), 4290–4308 (2013)

40. McCoy, M.B., Tropp, J.A.: From Steiner formulas for cones to concentration of intrinsic volumes. Discrete Comput. Geom. 51(4), 926–963 (2014)

41. Merriman, M.: On the history of the method of least squares. Analyst 4, 33–36 (1877)

42. Negahban, S.N., Ravikumar, P., Wainwright, M.J., Yu, B.: A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat. Sci. 27(4), 538–557 (2012)

43. Oymak, S., Hassibi, B.: Sharp MSE bounds for proximal denoising. arXiv preprint arXiv:1305.2714 (2013)

44. Oymak, S., Mohan, K., Fazel, M., Hassibi, B.: A simplified approach to recovery conditions for low rank matrices. In: IEEE International Symposium on Information Theory Proceedings (ISIT), 2011, pp. 2318–2322. IEEE, New York (2011)

45. Oymak, S., Jalali, A., Fazel, M., Eldar, Y.C., Hassibi, B.: Simultaneously structured models with application to sparse and low-rank matrices. arXiv preprint arXiv:1212.3753 (2012)

46. Oymak, S., Thrampoulidis, C., Hassibi, B.: Simple bounds for noisy linear inverse problems with exact side information. arXiv preprint arXiv:1312.0641 (2013)

47. Oymak, S., Thrampoulidis, C., Hassibi, B.: The squared-error of generalized LASSO: a precise analysis. arXiv preprint arXiv:1311.0830 (2013)

48. Rao, N., Recht, B., Nowak, R.: Tight measurement bounds for exact recovery of structured sparse signals. arXiv preprint arXiv:1106.4355 (2011)

49. Raskutti, G., Wainwright, M.J., Yu, B.: Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 99, 2241–2259 (2010)

50. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

51. Richard, E., Savalle, P.-A., Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. arXiv preprint arXiv:1206.6474 (2012)

52. Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1997)

53. Stigler, S.M.: Gauss and the invention of least squares. Ann. Stat. 9, 465–474 (1981)

54. Stojnic, M.: Block-length dependent thresholds in block-sparse compressed sensing. arXiv preprint arXiv:0907.3679 (2009)

55. Stojnic, M.: Various thresholds for ℓ1-optimization in compressed sensing. arXiv preprint arXiv:0907.3666 (2009)

56. Stojnic, M.: A framework to characterize performance of LASSO algorithms. arXiv preprint arXiv:1303.7291 (2013)

57. Stojnic, M.: A rigorous geometry-probability equivalence in characterization of ℓ1-optimization. arXiv preprint arXiv:1303.7287 (2013)

58. Stojnic, M., Parvaresh, F., Hassibi, B.: On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. Signal Process. 57(8), 3075–3085 (2009)

59. Taylor, J., et al.: The geometry of least squares in the 21st century. Bernoulli 19(4), 1449–1464 (2013)

60. Thrampoulidis, C., Oymak, S., Hassibi, B.: Simple error bounds for regularized noisy linear inverse problems. In: 2014 IEEE International Symposium on Information Theory (ISIT), pp. 3007–3011. IEEE (2014)

61. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B (Methodological) 58, 267–288 (1996)

62. Vandenberghe, L.: Subgradients. http://www.seas.ucla.edu/~vandenbe/236C/lectures/subgradients.pdf (2013)

63. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027 (2010)

64. Wainwright, M.J.: Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory 55(5), 2183–2202 (2009)

65. Wright, J., Ganesh, A., Min, K., Ma, Y.: Compressive principal component pursuit. Inf. Infer. 2(1), 32–68 (2013)

66. Zhao, P., Yu, B.: On model selection consistency of Lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)


Acknowledgements

The authors gratefully acknowledge the anonymous reviewers for their attention and their helpful comments.

Author information

Correspondence to Christos Thrampoulidis.

Appendix

The upper bounds on the NSE of the generalized LASSO presented in Sections 4.5–4.7 are in terms of the summary parameters \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\). While the bounds are simple, concise, and nicely resemble the corresponding ones in the case of OLS, it may appear to the reader that the formulae are rather abstract, because of the presence of \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\).

However, as discussed here, for a large number of widely used convex regularizers f(⋅), one can calculate (tight) upper bounds or even explicit formulae for these quantities. For example, for the estimation of a k-sparse signal x_0 with \(f(\cdot ) =\| \cdot \|_{1}\), it has been shown that \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))) \lesssim 2k(\log \frac{n} {k} + 1)\). Substituting this into Theorems 1 and 4 results in the “closed-form” upper bounds given in (4.20) and (4.22), i.e., bounds expressed only in terms of m, n, and k. Analogous results have been derived [14, 31, 44, 54] for other well-known signal models as well, including low-rankness and block-sparsity (see Footnote 15). The first column of Table 4.1 summarizes some of the results for \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) found in the literature [14, 31]. The second column provides closed-form results on \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) when λ is sufficiently large [47]. Note that, by setting λ to its lower bound in the second column, one approximately obtains the corresponding result in the first column. This should not be surprising due to (4.29). Also, this value of λ is a good proxy for the optimal regularizer λ_best of the ℓ2-LASSO, as was discussed in Sections 4.5.3.4 and 4.6.2.1.

Table 4.1 Closed form upper bounds for \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\).

We refer the reader to [1, 14, 31, 47] for the details and state-of-the-art bounds on \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\). Identifying the subdifferential ∂f(x_0) and calculating D(λ∂f(x_0)) for all λ ≥ 0 are the critical steps. Once those are available, computing \(\min _{\lambda \geq 0}\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) provides an upper approximation for \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\). This idea was first introduced by Stojnic [55] and was subsequently refined and generalized in [14]. Most recently, [1, 31] proved (4.29), thus showing that the resulting approximation of \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) is in fact highly accurate. Section 4 of [1] is an excellent reference for further details, and the notation used there is close to ours.

We should emphasize that examples of regularizers are not limited to the ones discussed here and presented in Table 4.1. An increasing number of signal classes exhibit low-dimensionality and fall within the scope of the theorems of Sections 4.5–4.7. Some of these are as follows.

  • Non-negativity constraint: x_0 has non-negative entries [20].

  • Low-rank plus sparse matrices: x_0 can be represented as the sum of a low-rank and a sparse matrix [65].

  • Signals with sparse gradient: rather than x_0 itself, its gradient \(\mathbf{d} _{\mathbf{x}_{0}}(i) = \mathbf{x}_{0}(i) -\mathbf{x}_{0}(i - 1)\) is sparse [8].

  • Low-rank tensors: x_0 is a tensor and its unfoldings are low-rank matrices [32, 36].

  • Simultaneously sparse and low-rank matrices: for instance, \(\mathbf{x}_{0} = \mathbf{s}\mathbf{s}^{T}\) for a sparse vector s [45, 51].

Establishing new and tighter analytic bounds for \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) for more regularizers f is certainly an interesting direction for future research. When such analytic bounds do not already exist in the literature or are hard to derive, one can numerically estimate \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\), provided a characterization of the subdifferential ∂f(x_0) is available. Using the concentration of \(\mathop{\mathrm{dist}}\nolimits ^{2}(\mathbf{h},\lambda \partial f(\mathbf{x}_{0}))\) around \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) when \(\mathbf{h} \sim \mathcal{N}(0,\mathbf{I}_{n})\), we can estimate \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) as follows:

  1.

    Draw a vector \(\mathbf{h} \sim \mathcal{N}(0,\mathbf{I}_{n})\);

  2.

    return the optimal value of the convex program \(\min _{\mathbf{s}\in \partial f(\mathbf{x}_{0})}\|\mathbf{h} -\lambda \mathbf{s}\|^{2}\).

Computing \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) can be built on the same recipe by recognizing \(\mathop{\mathrm{dist}}\nolimits ^{2}(\mathbf{h},\text{cone}(\partial f(\mathbf{x}_{0})))\) as \(\min _{\lambda \geq 0,\mathbf{s}\in \partial f(\mathbf{x}_{0})}\|\mathbf{h} -\lambda \mathbf{s}\|^{2}\).
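For concreteness, here is a minimal numerical sketch of this recipe (our own illustration, not from the chapter; the function names, the number of Monte Carlo trials, and the bounded search interval for λ are assumptions) for the ℓ1 norm at a k-sparse x_0, where the distance to λ∂‖·‖_1 has the closed form worked out in the next subsection:

```python
# Minimal sketch (illustrative): Monte Carlo estimates of D(lambda*subdiff) and
# D(cone(subdiff)) for f = l1-norm at a k-sparse x0, following the recipe above.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def dist2_l1(h, x0, lam):
    """dist^2(h, lam * subdifferential of ||.||_1 at x0)."""
    S = x0 != 0
    on_support = np.sum((h[S] - lam * np.sign(x0[S])) ** 2)          # s_i = sign(x0_i) on S
    off_support = np.sum(np.maximum(np.abs(h[~S]) - lam, 0.0) ** 2)  # shrink^2(h_i, lam) off S
    return on_support + off_support

def D_lambda(x0, lam, trials=2000):
    """Monte Carlo estimate of D(lam*subdiff) = E[dist^2(h, lam*subdiff)]."""
    n = x0.size
    return float(np.mean([dist2_l1(rng.standard_normal(n), x0, lam) for _ in range(trials)]))

def D_cone(x0, trials=500):
    """Monte Carlo estimate of D(cone(subdiff)) = E[min_{lam>=0} dist^2(h, lam*subdiff)]."""
    n = x0.size
    vals = []
    for _ in range(trials):
        h = rng.standard_normal(n)
        # For the l1 case, dist^2(h, lam*subdiff) is convex in lam, so a bounded
        # scalar search suffices; the upper end of the interval is a generous heuristic.
        res = minimize_scalar(lambda lam: dist2_l1(h, x0, lam),
                              bounds=(0.0, 10.0 * np.sqrt(np.log(n))), method="bounded")
        vals.append(res.fun)
    return float(np.mean(vals))

if __name__ == "__main__":
    n, k = 1000, 10
    x0 = np.zeros(n)
    x0[:k] = rng.standard_normal(k)                  # a k-sparse signal
    lam = np.sqrt(2 * np.log(n / k))
    print("D(lam*subdiff)   ~", D_lambda(x0, lam))
    print("D(cone(subdiff)) ~", D_cone(x0))
    print("(lam^2 + 3) k    =", (lam ** 2 + 3) * k)  # the simpler bound (4.53)
```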

To sum up, any bound on D(λ∂f(x_0)) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) translates, through Theorems 1–6, into corresponding upper bounds on the NSE of the generalized LASSO. For purposes of illustration and completeness, we next review the details of computing \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) for the celebrated case where x_0 is sparse and the ℓ1-norm is used as the regularizer.

4.1.1 Sparse signals

Suppose x_0 is a k-sparse signal and \(f(\cdot ) =\| \cdot \|_{1}\). Denote by S the support set of x_0 and by S^c its complement. The subdifferential at x_0 is [52],

$$\displaystyle{\partial f(\mathbf{x}_{0}) =\{ \mathbf{s} \in \mathbb{R}^{n}\ \vert \ \|\mathbf{s}\|_{ \infty }\leq 1\text{ and }\mathbf{s}_{i} = \text{sign}((\mathbf{x}_{0})_{i}),\forall i \in S\}.}$$

Let \(\mathbf{h} \in \mathbb{R}^{n}\) have i.i.d. \(\mathcal{N}(0,1)\) entries and define

$$\displaystyle{\text{shrink}(\chi,\lambda ) = \left \{\begin{array}{@{}l@{\quad }l@{}} \chi -\lambda \quad &,\chi> \lambda, \\ 0 \quad &,-\lambda \leq \chi \leq \lambda, \\ \chi +\lambda \quad &,\chi <-\lambda. \end{array} \right.}$$

Then, D(λ∂f(x_0)) is equal to ([1, 14])

$$\displaystyle\begin{array}{rcl} \mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))& =& \mathbb{E}[\mathop{\mathrm{dist}}\nolimits ^{2}(\mathbf{h},\lambda \partial f(\mathbf{x}_{0}))] \\ & =& \sum _{i\in S}\mathbb{E}[(\mathbf{h}_{i} -\lambda \text{sign}((\mathbf{x}_{0})_{i}))^{2}] +\sum _{i\in S^{c}}\mathbb{E}[\text{shrink}^{2}(\mathbf{h}_{i},\lambda )] \\ & =& k(1 +\lambda ^{2}) + (n - k)\sqrt{\frac{2} {\pi }}\left [(1 +\lambda ^{2})\int _{\lambda }^{\infty }e^{-t^{2}/2 }\mathrm{d}t -\lambda \exp (-\lambda ^{2}/2)\right ].{}\end{array}$$
(4.50)

Note that D(λ∂f(x_0)) depends only on n, λ, and k = |S|, and not explicitly on S itself (which is not known). Substituting the expression in (4.50) in place of D(λ∂f(x_0)) in Theorems 2 and 5 yields explicit expressions for the corresponding upper bounds in terms of n, m, k, and λ.
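For a quick numerical handle on (4.50), the following small sketch (ours; the function name and the choice of n, k, and λ are only for illustration) evaluates the closed form using \(\int _{\lambda }^{\infty }e^{-t^{2}/2 }\mathrm{d}t = \sqrt{\pi /2}\,\mathrm{erfc}(\lambda /\sqrt{2})\), and compares it with the simpler bound (4.53) derived below:

```python
# Sketch (illustrative): exact evaluation of (4.50) for f = l1-norm at a k-sparse x0.
import numpy as np
from scipy.special import erfc

def D_lambda_closed_form(n, k, lam):
    gauss_tail = np.sqrt(np.pi / 2) * erfc(lam / np.sqrt(2))   # int_lam^inf e^{-t^2/2} dt
    per_entry = np.sqrt(2 / np.pi) * ((1 + lam ** 2) * gauss_tail - lam * np.exp(-lam ** 2 / 2))
    return k * (1 + lam ** 2) + (n - k) * per_entry            # eq. (4.50)

n, k = 1000, 10
lam = np.sqrt(2 * np.log(n / k))
print(D_lambda_closed_form(n, k, lam))   # exact expectation (4.50) for these n, k, lam
print((lam ** 2 + 3) * k)                # the simpler bound (4.53), roughly 122.1 here
```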

We can obtain an even simpler upper bound on D(λ∂f(x_0)), one which does not involve error functions, as we show below. Denote by \(Q(t) = \frac{1} {\sqrt{2\pi }}\int _{t}^{\infty }e^{-\tau ^{2}/2 }\mathrm{d}\tau\) the complementary c.d.f. of a standard normal random variable. Then,

$$\displaystyle\begin{array}{rcl} \frac{1} {2}\mathbb{E}[\text{shrink}^{2}(\mathbf{h}_{ i},\lambda )]& =& \int _{\lambda }^{\infty }(t -\lambda )^{2}\mathrm{d}(-Q(t)) \\ & =& -\left [(t -\lambda )^{2}Q(t)\right ]_{\lambda }^{\infty } + 2\int _{\lambda }^{\infty }(t -\lambda )Q(t)\mathrm{d}t \\ & \leq & \int _{\lambda }^{\infty }(t -\lambda )e^{-t^{2}/2 }\mathrm{d}t {}\end{array}$$
(4.51)
$$\displaystyle\begin{array}{rcl} & \leq & e^{-\lambda ^{2}/2 } - \frac{\lambda ^{2}} {\lambda ^{2} + 1}e^{-\lambda ^{2}/2 } \\ & =& \frac{1} {\lambda ^{2} + 1}e^{-\lambda ^{2}/2 }. {}\end{array}$$
(4.52)

(4.51) and (4.52) follow from standard upper and lower tail bounds on normal random variables, namely \(\frac{1} {\sqrt{2\pi }} \frac{t} {t^{2}+1}e^{-t^{2}/2 } \leq Q(t) \leq \frac{1} {2}e^{-t^{2}/2 }\). From this, we find that

$$\displaystyle\begin{array}{rcl} \mathbf{D}(\lambda \partial f(\mathbf{x}_{0})) \leq k(1 + \lambda ^{2}) + (n - k) \frac{2} {\lambda ^{2} + 1}e^{-\lambda ^{2}/2 }.& & {}\\ \end{array}$$

Letting \(\lambda \geq \sqrt{2\log (\frac{n} {k})}\) in the above expression, so that \(e^{-\lambda ^{2}/2 } \leq k/n\) and hence the second term is at most \(2k/(\lambda ^{2} + 1) \leq 2k\), recovers the corresponding entry in Table 4.1:

$$\displaystyle\begin{array}{rcl} \mathbf{D}(\lambda \partial f(\mathbf{x}_{0})) \leq (\lambda ^{2} + 3)k,\text{ when }\lambda \geq \sqrt{2\log (\frac{n} {k})}.& &{}\end{array}$$
(4.53)

Substituting (4.53) in Theorems 2 and 5 recovers the bounds in (4.21) and (4.23), respectively.

Setting \(\lambda = \sqrt{2\log (\frac{n} {k})}\) in (4.53) provides an approximation to \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\). In particular, \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))) \leq 2k(\log (\frac{n} {k}) + 3/2)\). Starting again from (4.50), but using different tail bounds for Gaussians, [14] obtains the even tighter bound \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))) \leq 2k(\log (\frac{n} {k}) + 3/4)\). We refer the reader to Proposition 3.10 in [14] for the exact details.
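As a quick numerical sense check (our own arithmetic with the illustrative values n = 1000 and k = 10, which are not from the chapter), these two bounds evaluate to

$$\displaystyle{ 2k\left (\log \tfrac{n} {k} + \tfrac{3} {2}\right ) \approx 122.1,\qquad 2k\left (\log \tfrac{n} {k} + \tfrac{3} {4}\right ) \approx 107.1, }$$

so for a 10-sparse signal in \(\mathbb{R}^{1000}\) the statistical dimension is on the order of a hundred, far below the ambient dimension.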


Copyright information

© 2015 Springer International Publishing Switzerland

Cite this chapter

Thrampoulidis, C., Oymak, S., Hassibi, B. (2015). Recovering Structured Signals in Noise: Least-Squares Meets Compressed Sensing. In: Boche, H., Calderbank, R., Kutyniok, G., Vybíral, J. (eds) Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-16042-9_4
