
Recovering Structured Signals in Noise: Least-Squares Meets Compressed Sensing

Chapter in Compressed Sensing and its Applications

Abstract

The typical scenario that arises in most “big data” problems is one where the ambient dimension of the signal is very large (e.g., high-resolution images, gene expression data from a DNA microarray, social network data, etc.), yet the desired information lies in some low-dimensional structure (sparsity, low-rankness, clusters, etc.). In the modern viewpoint, the goal is to devise efficient algorithms that reveal these structures and for which, under suitable conditions, one can give theoretical guarantees. We specifically consider the problem of recovering such a structured signal (sparse, low-rank, block-sparse, etc.) from noisy compressed measurements. A general algorithm for such problems, commonly referred to as the generalized LASSO, attempts to solve this problem by minimizing a least-squares cost with an added “structure-inducing” regularization term (ℓ1 norm, nuclear norm, mixed ℓ2/ℓ1 norm, etc.). While the LASSO algorithm has been around for 20 years and has enjoyed great success in practice, there has been relatively little analysis of its performance. In this chapter, we will provide a full performance analysis and compute, in closed form, the mean-square error of the reconstructed signal. We will highlight some of the mathematical vignettes necessary for the analysis, make connections to noiseless compressed sensing and proximal denoising, and will emphasize the central role of the “statistical dimension” of a structured signal.


Notes

1.

    Assuming that the entries of the noise vector z are i.i.d., it is well known that a sensible choice of τ in (4.8) must scale with the standard deviation σ of the noise components [6, 12, 42]. On the other hand, (4.7) eliminates the need to know or to pre-estimate σ [4].

2.

C-LASSO in (4.17) stands for “Constrained LASSO.” The algorithm assumes a priori knowledge of f(x_0).

3.

In the statistics literature, the variant of the LASSO algorithm in (4.7) is mostly known as the “square-root LASSO” [4]. For the purposes of our presentation, we stick to the more compact term “ℓ2-LASSO” [47].

4.

The statements in this section hold true with high probability in A, z and under mild assumptions. See Section 4.5 for the formal statement of the results.

5.

    The formula below is subject to some simplifications meant to highlight the essential structure. See Section 4.6 for the details.

6.

It is conjectured in [60] that the factor of 2 in (4.23) is not essential and only appears as an artifact of the proof technique therein. See also Section 4.6.2.1.

7.

The tools used in [1] and [57] differ. Amelunxen et al. [1] use tools from conic integral geometry, while Stojnic [57] relies on a comparison lemma for Gaussian processes (see Lemma 3).

8.

    In [14], \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) shows up indirectly via a closely related notion, that of the “Gaussian width” [34] of the restricted tangent cone \(\mathcal{T}_{f}(\mathbf{x}_{0}) \cap \mathcal{S}^{n-1}\). In the terminology used in [1, 40], \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) corresponds to the “statistical dimension” of \(\mathcal{T}_{f}(\mathbf{x}_{0}) = \left (\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))\right )^{\circ }\).

9.

    When referring to [47] keep in mind the following: a) in [47] the entries of A have variance 1 and not 1∕m as here, b) [47] uses slightly different notation for \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) (\(\mathbf{D}_{f}(\mathbf{x}_{0},\lambda )\) and \(\mathbf{D}_{f}(\mathbf{x}_{0},\mathbf{R}^{+})\), respectively).

10.

    We follow this convention throughout: use the symbol “\(\ \tilde{}\)” over variables that are associated with the approximated problems. To distinguish, use the symbol “\(\ \hat{}\)” for the variables associated with the original problem.

11.

Observe that the dependence of η and γ on λ, m, and ∂f(x_0) is implicit in this definition.

12.

    Precisely: assuming m ≈ m − 1 and ignoring the t’s in the bound of Theorem 4.

13.

    It is conjectured in [60] and supported by simulations (e.g., Figure 4.7) that the factor of 2 in Theorem 5 is an artifact of the proof technique and not essential.

    Fig. 4.7

    Figure 4.7 illustrates the bound of Theorem 5, which is given in red for n = 340, m = 140, k = 10 and for A having \(\mathcal{N}(0, \frac{1} {m})\) entries. The upper bound of Theorem 2, which is asymptotic in m and only applies to i.i.d. Gaussian z, is given in black. In our simulations, we assume x_0 is a random unit-norm vector over its support and consider both i.i.d. \(\mathcal{N}(0,\sigma ^{2})\) and non-Gaussian noise vectors z. We have plotted the realizations of the normalized error for different values of λ and σ. As noted, the bound of Theorem 2 is occasionally violated, since it requires very large m as well as i.i.d. Gaussian noise. On the other hand, the bound of Theorem 5 always holds.

14.

    For proofs of those claims, see Section 8 and in particular Lemma 8.1 in [47].

15.

    We say \(\mathbf{x}_{0} \in \mathbb{R}^{n}\) is block-sparse if it can be grouped into t known blocks of size \(b = n/t\) each, so that only k of these t blocks are nonzero. To induce this structure, the standard approach is to use the ℓ1,2 norm, which sums up the ℓ2 norms of the blocks [29, 48, 54, 58]. In particular, denoting the subvector corresponding to the ith block of a vector x by x_i, the ℓ1,2 norm is defined as \(\|\mathbf{x}\|_{1,2} =\sum _{ i=1}^{t}\|\mathbf{x}_{i}\|_{2}\) (a short numerical sketch of this norm follows these notes).
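To make the block notation concrete, here is a minimal numerical sketch of the ℓ1,2 norm (our own illustration, not from the chapter; the function name and the equal-size block layout are assumptions):

```python
# Sketch (illustrative): the l_{1,2} norm of Footnote 15 for a vector x split into
# t known blocks of equal size b = n / t.
import numpy as np

def norm_1_2(x, t):
    blocks = np.reshape(x, (t, -1))                      # t blocks of size n // t
    return float(np.linalg.norm(blocks, axis=1).sum())   # sum of per-block l2 norms
```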

References

1. Amelunxen, D., Lotz, M., McCoy, M.B., Tropp, J.A.: Living on the edge: a geometric theory of phase transitions in convex optimization. arXiv preprint arXiv:1303.6672 (2013)

2. Bayati, M., Montanari, A.: The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57(2), 764–785 (2011)

3. Bayati, M., Montanari, A.: The LASSO risk for Gaussian matrices. IEEE Trans. Inf. Theory 58(4), 1997–2017 (2012)

4. Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)

5. Bertsekas, D., Nedic, A., Ozdaglar, A.: Convex Analysis and Optimization. Athena Scientific (2003)

6. Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)

7. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples, vol. 3. Springer, New York (2010)

8. Cai, J.-F., Xu, W.: Guarantees of total variation minimization for signal recovery. arXiv preprint arXiv:1301.6791 (2013)

9. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)

10. Candès, E., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006)

11. Candès, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)

12. Candès, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)

13. Chandrasekaran, V., Jordan, M.I.: Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. 110(13), E1181–E1190 (2013)

14. Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012)

15. Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41(3), 613–627 (1995)

16. Donoho, D.L.: High-dimensional data analysis: the curses and blessings of dimensionality. Aide-memoire of a lecture at the “AMS Conference on Math Challenges of the 21st Century” (2000)

17. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

18. Donoho, D.L.: High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35(4), 617–652 (2006)

19. Donoho, D.L., Tanner, J.: Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102(27), 9452–9457 (2005)

20. Donoho, D.L., Tanner, J.: Sparse nonnegative solution of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102(27), 9446–9451 (2005)

21. Donoho, D.L., Tanner, J.: Thresholds for the recovery of sparse solutions via ℓ1 minimization. In: The 40th Annual Conference on Information Sciences and Systems, 2006, pp. 202–206. IEEE, New York (2006)

22. Donoho, D., Tanner, J.: Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci. 367(1906), 4273–4293 (2009)

23. Donoho, D.L., Tanner, J.: Precise undersampling theorems. Proc. IEEE 98(6), 913–924 (2010)

24. Donoho, D.L., Elad, M., Temlyakov, V.N.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory 52(1), 6–18 (2006)

25. Donoho, D.L., Maleki, A., Montanari, A.: Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. 106(45), 18914–18919 (2009)

26. Donoho, D.L., Maleki, A., Montanari, A.: The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inf. Theory 57(10), 6920–6941 (2011)

27. Donoho, D., Johnstone, I., Montanari, A.: Accurate prediction of phase transitions in compressed sensing via a connection to minimax denoising. IEEE Trans. Inf. Theory 59(6), 3396–3433 (2013)

28. Donoho, D.L., Gavish, M., Montanari, A.: The phase transition of matrix recovery from Gaussian measurements matches the minimax MSE of matrix denoising. Proc. Natl. Acad. Sci. 110(21), 8405–8410 (2013)

29. Eldar, Y.C., Kuppinger, P., Bolcskei, H.: Block-sparse signals: uncertainty relations and efficient recovery. IEEE Trans. Signal Process. 58(6), 3042–3054 (2010)

30. Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis (2002)

31. Foygel, R., Mackey, L.: Corrupted sensing: novel guarantees for separating structured signals. arXiv preprint arXiv:1305.2524 (2013)

32. Gandy, S., Recht, B., Yamada, I.: Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Prob. 27(2), 025010 (2011)

33. Gordon, Y.: Some inequalities for Gaussian processes and applications. Isr. J. Math. 50(4), 265–289 (1985)

34. Gordon, Y.: On Milman's Inequality and Random Subspaces Which Escape Through a Mesh in \(\mathbb{R}^{n}\). Springer, New York (1988)

35. Härdle, W., Simar, L.: Applied Multivariate Statistical Analysis, vol. 2. Springer, Berlin (2007)

36. Kressner, D., Steinlechner, M., Vandereycken, B.: Low-rank tensor completion by Riemannian optimization. BIT Numer. Math. 54(2), 447–468 (2014)

37. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes, vol. 23. Springer, Berlin (1991)

38. Maleki, M.A.: Approximate Message Passing Algorithms for Compressed Sensing. Stanford University, Stanford (2010)

39. Maleki, A., Anitori, L., Yang, Z., Baraniuk, R.G.: Asymptotic analysis of complex LASSO via complex approximate message passing (CAMP). IEEE Trans. Inf. Theory 59(7), 4290–4308 (2013)

40. McCoy, M.B., Tropp, J.A.: From Steiner formulas for cones to concentration of intrinsic volumes. Discrete Comput. Geom. 51(4), 926–963 (2014)

41. Merriman, M.: On the history of the method of least squares. Analyst 4, 33–36 (1877)

42. Negahban, S.N., Ravikumar, P., Wainwright, M.J., Yu, B.: A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat. Sci. 27(4), 538–557 (2012)

43. Oymak, S., Hassibi, B.: Sharp MSE bounds for proximal denoising. arXiv preprint arXiv:1305.2714 (2013)

44. Oymak, S., Mohan, K., Fazel, M., Hassibi, B.: A simplified approach to recovery conditions for low rank matrices. In: IEEE International Symposium on Information Theory Proceedings (ISIT), 2011, pp. 2318–2322. IEEE, New York (2011)

45. Oymak, S., Jalali, A., Fazel, M., Eldar, Y.C., Hassibi, B.: Simultaneously structured models with application to sparse and low-rank matrices. arXiv preprint arXiv:1212.3753 (2012)

46. Oymak, S., Thrampoulidis, C., Hassibi, B.: Simple bounds for noisy linear inverse problems with exact side information. arXiv preprint arXiv:1312.0641 (2013)

47. Oymak, S., Thrampoulidis, C., Hassibi, B.: The squared-error of generalized LASSO: a precise analysis. arXiv preprint arXiv:1311.0830 (2013)

48. Rao, N., Recht, B., Nowak, R.: Tight measurement bounds for exact recovery of structured sparse signals. arXiv preprint arXiv:1106.4355 (2011)

49. Raskutti, G., Wainwright, M.J., Yu, B.: Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 99, 2241–2259 (2010)

50. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

51. Richard, E., Savalle, P.-A., Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. arXiv preprint arXiv:1206.6474 (2012)

52. Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1997)

53. Stigler, S.M.: Gauss and the invention of least squares. Ann. Stat. 9, 465–474 (1981)

54. Stojnic, M.: Block-length dependent thresholds in block-sparse compressed sensing. arXiv preprint arXiv:0907.3679 (2009)

55. Stojnic, M.: Various thresholds for ℓ1-optimization in compressed sensing. arXiv preprint arXiv:0907.3666 (2009)

56. Stojnic, M.: A framework to characterize performance of LASSO algorithms. arXiv preprint arXiv:1303.7291 (2013)

57. Stojnic, M.: A rigorous geometry-probability equivalence in characterization of ℓ1-optimization. arXiv preprint arXiv:1303.7287 (2013)

58. Stojnic, M., Parvaresh, F., Hassibi, B.: On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. Signal Process. 57(8), 3075–3085 (2009)

59. Taylor, J., et al.: The geometry of least squares in the 21st century. Bernoulli 19(4), 1449–1464 (2013)

60. Thrampoulidis, C., Oymak, S., Hassibi, B.: Simple error bounds for regularized noisy linear inverse problems. In: 2014 IEEE International Symposium on Information Theory (ISIT), pp. 3007–3011. IEEE (2014)

61. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B (Methodological) 58, 267–288 (1996)

62. Vandenberghe, L.: Subgradients. http://www.seas.ucla.edu/~vandenbe/236C/lectures/subgradients.pdf (2013)

63. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027 (2010)

64. Wainwright, M.J.: Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory 55(5), 2183–2202 (2009)

65. Wright, J., Ganesh, A., Min, K., Ma, Y.: Compressive principal component pursuit. Inf. Infer. 2(1), 32–68 (2013)

66. Zhao, P., Yu, B.: On model selection consistency of Lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)


Acknowledgements

The authors gratefully acknowledge the anonymous reviewers for their attention and their helpful comments.

Author information

Correspondence to Christos Thrampoulidis.

Appendix

The upper bounds on the NSE of the generalized LASSO presented in Sections 4.5–4.7 are in terms of the summary parameters \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\). While the bounds are simple, concise, and nicely resemble the corresponding ones in the case of OLS, it may appear to the reader that the formulae are rather abstract, because of the presence of \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\).

However, as discussed here, for a large number of widely used convex regularizers f(⋅), one can calculate (tight) upper bounds or even explicit formulae for these quantities. For example, for the estimation of a k-sparse signal x_0 with \(f(\cdot ) =\| \cdot \|_{1}\), it has been shown that \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))) \lesssim 2k(\log \frac{n} {k} + 1)\). Substituting this into Theorems 1 and 4 results in the “closed-form” upper bounds given in (4.20) and (4.22), i.e., bounds expressed only in terms of m, n, and k. Analogous results have been derived [14, 31, 44, 54] for other well-known signal models as well, including low-rankness and block-sparsity (see Footnote 15). The first column of Table 4.1 summarizes some of the results for \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) found in the literature [14, 31]. The second column provides closed-form results on \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) when λ is sufficiently large [47]. Note that, by setting λ to its lower bound in the second column, one approximately obtains the corresponding result in the first column. This should not be surprising due to (4.29). Also, this value of λ is a good proxy for the optimal regularizer λ_best of the ℓ2-LASSO, as was discussed in Sections 4.5.3.4 and 4.6.2.1.

Table 4.1 Closed form upper bounds for \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\).

We refer the reader to [1, 14, 31, 47] for the details and state-of-the-art bounds on \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\). Identifying the subdifferential ∂f(x_0) and calculating D(λ∂f(x_0)) for all λ ≥ 0 are the critical steps. Once those are available, computing \(\min _{\lambda \geq 0}\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) provides an upper approximation for \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\). This idea was first introduced by Stojnic [55] and was subsequently refined and generalized in [14]. Most recently, [1, 31] proved (4.29), thus showing that the resulting approximation of \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) is in fact highly accurate. Section 4 of [1] is an excellent reference for further details, and the notation used there is close to ours.

We should emphasize that examples of regularizers are not limited to the ones discussed here and presented in Table 4.1. An increasing number of signal classes exhibit low-dimensionality and fall within the scope of the theorems of Sections 4.5–4.7. Some of these are as follows.

  • Non-negativity constraint: x_0 has non-negative entries [20].

  • Low-rank plus sparse matrices: x_0 can be represented as the sum of a low-rank and a sparse matrix [65].

  • Signals with sparse gradient: rather than x_0 itself, its gradient \(\mathbf{d} _{\mathbf{x}_{0}}(i) = \mathbf{x}_{0}(i) -\mathbf{x}_{0}(i - 1)\) is sparse [8].

  • Low-rank tensors: x_0 is a tensor and its unfoldings are low-rank matrices [32, 36].

  • Simultaneously sparse and low-rank matrices: for instance, \(\mathbf{x}_{0} = \mathbf{s}\mathbf{s}^{T}\) for a sparse vector s [45, 51].

Establishing new and tighter analytic bounds for \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) for more regularizers f is certainly an interesting direction for future research. When such analytic bounds do not already exist in the literature or are hard to derive, one can numerically estimate \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\), provided a characterization of the subdifferential ∂f(x_0) is available. Using the concentration of \(\mathop{\mathrm{dist}}\nolimits ^{2}(\mathbf{h},\lambda \partial f(\mathbf{x}_{0}))\) around \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) when \(\mathbf{h} \sim \mathcal{N}(0,\mathbf{I}_{n})\), we can estimate \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) as follows:

  1.

    Draw a vector \(\mathbf{h} \sim \mathcal{N}(0,\mathbf{I}_{n})\);

  2.

    return the optimal value of the convex program \(\min _{\mathbf{s}\in \partial f(\mathbf{x}_{0})}\|\mathbf{h} -\lambda \mathbf{s}\|^{2}\).

Computing \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) can be built on the same recipe by recognizing \(\mathop{\mathrm{dist}}\nolimits ^{2}(\mathbf{h},\text{cone}(\partial f(\mathbf{x}_{0})))\) as \(\min _{\lambda \geq 0,\mathbf{s}\in \partial f(\mathbf{x}_{0})}\|\mathbf{h} -\lambda \mathbf{s}\|^{2}\).
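For concreteness, here is a minimal numerical sketch of this recipe (our own illustration, not from the chapter; the function names, the number of Monte Carlo trials, and the bounded search interval for λ are assumptions) for the ℓ1 norm at a k-sparse x_0, where the distance to λ∂‖·‖_1 has the closed form worked out in the next subsection:

```python
# Minimal sketch (illustrative): Monte Carlo estimates of D(lambda*subdiff) and
# D(cone(subdiff)) for f = l1-norm at a k-sparse x0, following the recipe above.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def dist2_l1(h, x0, lam):
    """dist^2(h, lam * subdifferential of ||.||_1 at x0)."""
    S = x0 != 0
    on_support = np.sum((h[S] - lam * np.sign(x0[S])) ** 2)          # s_i = sign(x0_i) on S
    off_support = np.sum(np.maximum(np.abs(h[~S]) - lam, 0.0) ** 2)  # shrink^2(h_i, lam) off S
    return on_support + off_support

def D_lambda(x0, lam, trials=2000):
    """Monte Carlo estimate of D(lam*subdiff) = E[dist^2(h, lam*subdiff)]."""
    n = x0.size
    return float(np.mean([dist2_l1(rng.standard_normal(n), x0, lam) for _ in range(trials)]))

def D_cone(x0, trials=500):
    """Monte Carlo estimate of D(cone(subdiff)) = E[min_{lam>=0} dist^2(h, lam*subdiff)]."""
    n = x0.size
    vals = []
    for _ in range(trials):
        h = rng.standard_normal(n)
        # For the l1 case, dist^2(h, lam*subdiff) is convex in lam, so a bounded
        # scalar search suffices; the upper end of the interval is a generous heuristic.
        res = minimize_scalar(lambda lam: dist2_l1(h, x0, lam),
                              bounds=(0.0, 10.0 * np.sqrt(np.log(n))), method="bounded")
        vals.append(res.fun)
    return float(np.mean(vals))

if __name__ == "__main__":
    n, k = 1000, 10
    x0 = np.zeros(n)
    x0[:k] = rng.standard_normal(k)                  # a k-sparse signal
    lam = np.sqrt(2 * np.log(n / k))
    print("D(lam*subdiff)   ~", D_lambda(x0, lam))
    print("D(cone(subdiff)) ~", D_cone(x0))
    print("(lam^2 + 3) k    =", (lam ** 2 + 3) * k)  # the simpler bound (4.53)
```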

To sum up, any bound on D(λ∂f(x_0)) and \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) translates, through Theorems 1–6, into corresponding upper bounds on the NSE of the generalized LASSO. For purposes of illustration and completeness, we next review the details of computing \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\) and \(\mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))\) for the celebrated case where x_0 is sparse and the ℓ1-norm is used as the regularizer.

4.1.1 Sparse signals

Suppose x_0 is a k-sparse signal and \(f(\cdot ) =\| \cdot \|_{1}\). Denote by S the support set of x_0 and by S^c its complement. The subdifferential at x_0 is [52],

$$\displaystyle{\partial f(\mathbf{x}_{0}) =\{ \mathbf{s} \in \mathbb{R}^{n}\ \vert \ \|\mathbf{s}\|_{ \infty }\leq 1\text{ and }\mathbf{s}_{i} = \text{sign}((\mathbf{x}_{0})_{i}),\forall i \in S\}.}$$

Let \(\mathbf{h} \in \mathbb{R}^{n}\) have i.i.d. \(\mathcal{N}(0,1)\) entries and define

$$\displaystyle{\text{shrink}(\chi,\lambda ) = \left \{\begin{array}{@{}l@{\quad }l@{}} \chi -\lambda \quad &,\chi> \lambda, \\ 0 \quad &,-\lambda \leq \chi \leq \lambda, \\ \chi +\lambda \quad &,\chi <-\lambda. \end{array} \right.}$$

Then, D(λ∂f(x_0)) is equal to ([1, 14])

$$\displaystyle\begin{array}{rcl} \mathbf{D}(\lambda \partial f(\mathbf{x}_{0}))& =& \mathbb{E}[\mathop{\mathrm{dist}}\nolimits ^{2}(\mathbf{h},\lambda \partial f(\mathbf{x}_{0}))] \\ & =& \sum _{i\in S}\mathbb{E}[(\mathbf{h}_{i} -\lambda \text{sign}((\mathbf{x}_{0})_{i}))^{2}] +\sum _{i\in S^{c}}\mathbb{E}[\text{shrink}^{2}(\mathbf{h}_{i},\lambda )] \\ & =& k(1 +\lambda ^{2}) + (n - k)\sqrt{\frac{2} {\pi }}\left [(1 +\lambda ^{2})\int _{\lambda }^{\infty }e^{-t^{2}/2 }\mathrm{d}t -\lambda \exp (-\lambda ^{2}/2)\right ].{}\end{array}$$
(4.50)

Note that D(λ∂f(x_0)) depends only on n, λ, and k = |S|, and not explicitly on S itself (which is not known). Substituting the expression in (4.50) in place of D(λ∂f(x_0)) in Theorems 2 and 5 yields explicit expressions for the corresponding upper bounds in terms of n, m, k, and λ.
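For a quick numerical handle on (4.50), the following small sketch (ours; the function name and the choice of n, k, and λ are only for illustration) evaluates the closed form using \(\int _{\lambda }^{\infty }e^{-t^{2}/2 }\mathrm{d}t = \sqrt{\pi /2}\,\mathrm{erfc}(\lambda /\sqrt{2})\), and compares it with the simpler bound (4.53) derived below:

```python
# Sketch (illustrative): exact evaluation of (4.50) for f = l1-norm at a k-sparse x0.
import numpy as np
from scipy.special import erfc

def D_lambda_closed_form(n, k, lam):
    gauss_tail = np.sqrt(np.pi / 2) * erfc(lam / np.sqrt(2))   # int_lam^inf e^{-t^2/2} dt
    per_entry = np.sqrt(2 / np.pi) * ((1 + lam ** 2) * gauss_tail - lam * np.exp(-lam ** 2 / 2))
    return k * (1 + lam ** 2) + (n - k) * per_entry            # eq. (4.50)

n, k = 1000, 10
lam = np.sqrt(2 * np.log(n / k))
print(D_lambda_closed_form(n, k, lam))   # exact expectation (4.50) for these n, k, lam
print((lam ** 2 + 3) * k)                # the simpler bound (4.53), roughly 122.1 here
```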

We can obtain an even simpler upper bound on D(λ∂f(x_0)), one which does not involve error functions, as we show below. Denote by \(Q(t) = \frac{1} {\sqrt{2\pi }}\int _{t}^{\infty }e^{-\tau ^{2}/2 }\mathrm{d}\tau\) the complementary c.d.f. of a standard normal random variable. Then,

$$\displaystyle\begin{array}{rcl} \frac{1} {2}\mathbb{E}[\text{shrink}^{2}(\mathbf{h}_{ i},\lambda )]& =& \int _{\lambda }^{\infty }(t -\lambda )^{2}\mathrm{d}(-Q(t)) \\ & =& -\left [(t -\lambda )^{2}Q(t)\right ]_{\lambda }^{\infty } + 2\int _{\lambda }^{\infty }(t -\lambda )Q(t)\mathrm{d}t \\ & \leq & \int _{\lambda }^{\infty }(t -\lambda )e^{-t^{2}/2 }\mathrm{d}t {}\end{array}$$
(4.51)
$$\displaystyle\begin{array}{rcl} & \leq & e^{-\lambda ^{2}/2 } - \frac{\lambda ^{2}} {\lambda ^{2} + 1}e^{-\lambda ^{2}/2 } \\ & =& \frac{1} {\lambda ^{2} + 1}e^{-\lambda ^{2}/2 }. {}\end{array}$$
(4.52)

(4.51) and (4.52) follow from standard upper and lower tail bounds on normal random variables, namely \(\frac{1} {\sqrt{2\pi }} \frac{t} {t^{2}+1}e^{-t^{2}/2 } \leq Q(t) \leq \frac{1} {2}e^{-t^{2}/2 }\). From this, we find that

$$\displaystyle\begin{array}{rcl} \mathbf{D}(\lambda \partial f(\mathbf{x}_{0})) \leq k(1 + \lambda ^{2}) + (n - k) \frac{2} {\lambda ^{2} + 1}e^{-\lambda ^{2}/2 }.& & {}\\ \end{array}$$

Letting \(\lambda \geq \sqrt{2\log (\frac{n} {k})}\) in the above expression, so that \(e^{-\lambda ^{2}/2 } \leq k/n\) and hence the second term is at most \(2k/(\lambda ^{2} + 1) \leq 2k\), recovers the corresponding entry in Table 4.1:

$$\displaystyle\begin{array}{rcl} \mathbf{D}(\lambda \partial f(\mathbf{x}_{0})) \leq (\lambda ^{2} + 3)k,\text{ when }\lambda \geq \sqrt{2\log (\frac{n} {k})}.& &{}\end{array}$$
(4.53)

Substituting (4.53) in Theorems 2 and 5 recovers the bounds in (4.21) and (4.23), respectively.

Setting \(\lambda = \sqrt{2\log (\frac{n} {k})}\) in (4.53) provides an approximation to \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0})))\). In particular, \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))) \leq 2k(\log (\frac{n} {k}) + 3/2)\). Starting again from (4.50), but using different tail bounds for Gaussians, [14] obtains the even tighter bound \(\mathbf{D}(\mathop{\mathrm{cone}}\nolimits (\partial f(\mathbf{x}_{0}))) \leq 2k(\log (\frac{n} {k}) + 3/4)\). We refer the reader to Proposition 3.10 in [14] for the exact details.
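As a quick numerical sense check (our own arithmetic with the illustrative values n = 1000 and k = 10, which are not from the chapter), these two bounds evaluate to

$$\displaystyle{ 2k\left (\log \tfrac{n} {k} + \tfrac{3} {2}\right ) \approx 122.1,\qquad 2k\left (\log \tfrac{n} {k} + \tfrac{3} {4}\right ) \approx 107.1, }$$

so for a 10-sparse signal in \(\mathbb{R}^{1000}\) the statistical dimension is on the order of a hundred, far below the ambient dimension.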


Copyright information

© 2015 Springer International Publishing Switzerland

Cite this chapter

Thrampoulidis, C., Oymak, S., Hassibi, B. (2015). Recovering Structured Signals in Noise: Least-Squares Meets Compressed Sensing. In: Boche, H., Calderbank, R., Kutyniok, G., Vybíral, J. (eds) Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-16042-9_4
